The Document Database That Also Does Everything Else
MongoDB redefined how developers think about databases -- schema flexibility, horizontal scaling, and a developer-friendly document model made it the default NoSQL choice for over a decade. But modern applications demand more than documents. They need vector search for AI, full-text search for discovery, relational JOINs for analytics, time-travel for auditing, and ACID transactions that actually work across collections. HeliosDB delivers all of this in a single binary with a familiar PostgreSQL wire protocol, eliminating the "MongoDB for documents, Postgres for relations, Pinecone for vectors" architecture that has become the norm.
| Feature | MongoDB | HeliosDB |
|---|---|---|
| Data model | BSON documents in collections | Tables + JSONB + vectors + relational |
| Schema | Schema-optional (validation rules) | Schema-enforced + JSONB for flexibility |
| Query language | MongoDB Query Language (MQL) | Standard SQL (PostgreSQL-compatible) |
| Wire protocol | MongoDB wire protocol | PostgreSQL wire protocol + REST + gRPC |
| JOINs | $lookup (limited, no nested) | Full SQL JOINs (INNER, LEFT, RIGHT, FULL, CROSS, LATERAL) |
| Vector search | Atlas Vector Search (separate add-on) | Native HNSW + Product Quantization + SIMD |
| Full-text search | Atlas Search (Lucene-based add-on) | Native @@ operator, integrated with optimizer |
| ACID transactions | Multi-document (since 4.0, with caveats) | Full MVCC, snapshot isolation, savepoints |
| Time-travel queries | Not supported | SELECT * FROM t AT TIME '2026-01-01' |
| Data branching | Not supported | Git-like branches with merge |
| Encryption | At-rest (WiredTiger) + in-transit (TLS) | TDE (AES-256-GCM) + Zero-Knowledge Encryption |
| Embedded deployment | Not available (server-only) | 60MB binary, in-process or standalone |
| Replication | Replica sets (primary-secondary) | 3-tier: WAL streaming, multi-primary, sharding |
| Sharding | Hash or range-based, mongos router | Hash, range, or list partitioning + proxy |
| Change streams | Native (oplog-based) | WAL-based change feeds (HeliosDB-Full) |
| Aggregation | Aggregation pipeline (stages) | SQL: GROUP BY, window functions, CTEs, subqueries |
| Indexes | B-tree, hashed, text, 2dsphere, wildcard | ART (Adaptive Radix Tree), bloom filters, zone maps, HNSW |
When your application outgrows documents, you bolt on additional databases:
+--------------+ +--------------+ +--------------+
| MongoDB | | PostgreSQL | | Pinecone |
| (documents) | | (relational) | | (vectors) |
+------+-------+ +------+-------+ +------+-------+
| | |
+--------+-----------+------------+-------+
| |
+------+------+ +------+------+
| Application | | Sync Layer |
| (joins | | (keep DBs |
| in code) | | in sync) |
+-------------+ +-------------+
- 3+ databases to deploy and keep in sync
- JOINs happen in application code
- No cross-database transactions
- Vector search is a separate billing dimension
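The missing-transaction problem above can be sketched with stub clients. Everything here is hypothetical illustration: the `StubStore` class and its `write()` method stand in for real driver objects, and only the control flow matters.

```python
# Stub stores standing in for MongoDB, PostgreSQL, and Pinecone clients.
# The class and method names are made up -- the point is the control flow.
class StubStore:
    def __init__(self, name):
        self.name = name
        self.rows = {}

    def write(self, key, value, fail=False):
        if fail:
            raise IOError(f"{self.name} write failed")
        self.rows[key] = value

mongo = StubStore("mongo")
postgres = StubStore("postgres")
pinecone = StubStore("pinecone")

def save_product(product_id, doc, row, embedding, fail_vector=False):
    # Three writes, no shared transaction: if the third one fails,
    # the application itself must undo the first two (the "sync layer").
    mongo.write(product_id, doc)
    postgres.write(product_id, row)
    try:
        pinecone.write(product_id, embedding, fail=fail_vector)
    except IOError:
        # Manual compensation logic -- easy to get wrong under crashes
        mongo.rows.pop(product_id, None)
        postgres.rows.pop(product_id, None)
        raise
```

A single-engine design replaces all of this compensation code with one BEGIN/COMMIT.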
+-------------------------------------------------+
| HeliosDB |
| |
| +----------+ +----------+ +--------------+ |
| | JSONB | |Relational| | Vector Search| |
| |20+ funcs | | Tables | | HNSW + PQ | |
| +----+-----+ +----+-----+ +------+-------+ |
| | | | |
| +----+-------------+---------------+----------+ |
| | MVCC Storage Engine + SQL Optimizer | |
| | (ACID + TDE + WAL + Branching + FTS) | |
| +---------------------------------------------+ |
| |
| PostgreSQL wire protocol (psql, any PG driver) |
+-------------------------------------------------+
- 1 binary, 60MB, embeddable or clustered
- Documents + relations + vectors in one ACID transaction
- Standard SQL -- every developer already knows it
// MongoDB: flexible schema, but no guarantees without validation
db.products.insertMany([
{
name: "Widget",
price: 29.99,
tags: ["hardware", "bestseller"],
specs: { weight: 0.5, color: "blue" }
},
{
// Oops: "price" is a string, "tags" is missing
name: "Gadget",
price: "forty-two",
specifications: { mass: 1.2 } // different field name
}
]);
// Schema validation (added later, optional)
db.createCollection("products", {
validator: {
$jsonSchema: {
bsonType: "object",
required: ["name", "price"],
properties: {
name: { bsonType: "string" },
price: { bsonType: "number" }
}
}
}
});
-- Structured columns enforce data quality
-- JSONB column provides document-style flexibility
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
price NUMERIC(10,2) NOT NULL CHECK (price > 0),
tags TEXT[] DEFAULT '{}',
specs JSONB DEFAULT '{}',
embedding VECTOR(384),
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Type-safe inserts: "forty-two" would be rejected immediately
INSERT INTO products (name, price, tags, specs, embedding)
VALUES (
'Widget',
29.99,
ARRAY['hardware', 'bestseller'],
'{"weight": 0.5, "color": "blue", "dimensions": {"l": 10, "w": 5, "h": 3}}',
'[0.12, -0.45, 0.89, ...]'::vector(384)
);
-- Best of both worlds:
-- Structured data (name, price) has schema enforcement
-- Flexible data (specs) uses JSONB with 20+ query functions
-- Vector embeddings stored alongside for AI search
// Find products with nested field filter
db.products.find({
"specs.weight": { $lt: 1.0 },
tags: { $in: ["hardware"] },
price: { $gte: 10, $lte: 50 }
}).sort({ price: -1 }).limit(10);
// Aggregation pipeline: multi-stage data transformation
db.orders.aggregate([
{ $match: { status: "completed", date: { $gte: ISODate("2026-01-01") } } },
{ $unwind: "$items" },
{ $group: {
_id: "$items.product_id",
total_revenue: { $sum: { $multiply: ["$items.price", "$items.qty"] } },
order_count: { $sum: 1 }
}},
{ $sort: { total_revenue: -1 } },
{ $limit: 20 },
// $lookup is MongoDB's JOIN -- limited to one collection, no nesting
{ $lookup: {
from: "products",
localField: "_id",
foreignField: "_id",
as: "product"
}},
{ $unwind: "$product" },
{ $project: {
product_name: "$product.name",
total_revenue: 1,
order_count: 1
}}
]);
-- Same filter query: readable, standard SQL
SELECT name, price, specs->>'weight' AS weight
FROM products
WHERE (specs->>'weight')::numeric < 1.0
AND 'hardware' = ANY(tags)
AND price BETWEEN 10 AND 50
ORDER BY price DESC
LIMIT 10;
-- Same aggregation: SQL with JOINs (no pipeline gymnastics)
SELECT
p.name AS product_name,
SUM(oi.price * oi.qty) AS total_revenue,
COUNT(DISTINCT o.id) AS order_count
FROM orders o
JOIN order_items oi ON oi.order_id = o.id
JOIN products p ON p.id = oi.product_id
WHERE o.status = 'completed'
AND o.order_date >= '2026-01-01'
GROUP BY p.name
ORDER BY total_revenue DESC
LIMIT 20;
-- HeliosDB advantage: CTEs and window functions
WITH monthly_revenue AS (
SELECT
p.name,
date_trunc('month', o.order_date) AS month,
SUM(oi.price * oi.qty) AS revenue
FROM orders o
JOIN order_items oi ON oi.order_id = o.id
JOIN products p ON p.id = oi.product_id
WHERE o.order_date >= '2025-01-01'
GROUP BY p.name, date_trunc('month', o.order_date)
)
SELECT
name,
month,
revenue,
LAG(revenue) OVER (PARTITION BY name ORDER BY month) AS prev_month,
ROUND(
(revenue - LAG(revenue) OVER (PARTITION BY name ORDER BY month))
/ NULLIF(LAG(revenue) OVER (PARTITION BY name ORDER BY month), 0) * 100,
2
) AS growth_pct
FROM monthly_revenue
ORDER BY name, month;
HeliosDB's JSONB support is not a thin wrapper -- it provides full PostgreSQL-compatible JSONB with over 20 functions and operators:
-- Navigation and extraction
SELECT
data->'address'->>'city' AS city, -- nested extraction
data#>>'{contacts,0,email}' AS first_email, -- path extraction
jsonb_extract_path(data, 'address', 'zip') AS zip,
jsonb_typeof(data->'age') AS age_type -- "number"
FROM customers;
-- Containment and existence
SELECT * FROM customers
WHERE data @> '{"tier": "enterprise"}' -- contains
AND data ? 'phone' -- key exists
AND data ?| ARRAY['email', 'phone']; -- any key exists
-- Modification (immutable: returns new document)
UPDATE customers
SET data = jsonb_set(
data,
'{address,verified}',
'true'::jsonb
)
WHERE id = 42;
-- Aggregation: build JSON from query results
SELECT jsonb_agg(
jsonb_build_object(
'name', name,
'total', order_total
)
) AS summary
FROM (
SELECT c.name, SUM(o.total) AS order_total
FROM customers c
JOIN orders o ON o.customer_id = c.id
GROUP BY c.name
ORDER BY order_total DESC
LIMIT 5
) top_customers;
-- Expand JSONB arrays and objects
SELECT
    p.name,
    tag
FROM products p,
     jsonb_array_elements_text(p.data->'tags') AS tag
WHERE tag LIKE '%sale%';
-- JSONB in JOINs with relational data (impossible in MongoDB without $lookup)
SELECT
c.name,
c.data->>'tier' AS tier,
COUNT(o.id) AS orders,
SUM(o.total) AS revenue,
AVG(o.total) AS avg_order
FROM customers c
JOIN orders o ON o.customer_id = c.id
WHERE c.data @> '{"region": "EU"}'
AND o.order_date >= '2026-01-01'
GROUP BY c.name, c.data->>'tier'
HAVING SUM(o.total) > 10000
ORDER BY revenue DESC;
// MongoDB Atlas Vector Search requires:
// 1. Atlas cluster (cloud-only, not self-hosted)
// 2. Separate search index definition
// 3. $vectorSearch aggregation stage (Atlas-only)
// Create search index (Atlas UI or API, not MQL)
// { "type": "vectorSearch", "fields": [{ "path": "embedding", ... }] }
// Query using aggregation pipeline
db.documents.aggregate([
{
$vectorSearch: {
index: "vector_index",
path: "embedding",
queryVector: [0.12, -0.45, 0.89 /* ... */],
numCandidates: 100,
limit: 10
}
},
{
$project: {
title: 1,
content: 1,
score: { $meta: "vectorSearchScore" }
}
}
]);
// Limitations:
// - Atlas cloud only (not available in self-hosted MongoDB)
// - Separate index type, separate billing
// - Cannot combine with $match in same pipeline stage
// - No hybrid vector + full-text in single operation
// - Limited to Atlas M10+ clusters
-- Create HNSW vector index (works everywhere: embedded, server, cluster)
CREATE INDEX idx_docs_embedding ON documents USING HNSW (embedding);
-- Vector similarity search: standard SQL
SELECT title, content,
embedding <-> '[0.12, -0.45, 0.89, ...]'::vector(384) AS distance
FROM documents
ORDER BY embedding <-> '[0.12, -0.45, 0.89, ...]'::vector(384)
LIMIT 10;
-- Hybrid search: vector + full-text + SQL filters in ONE query
SELECT title, content,
embedding <-> $1::vector(384) AS semantic_distance
FROM documents
WHERE content @@ 'database AND transactions'
AND metadata @> '{"category": "technical"}'
AND created_at >= '2026-01-01'
ORDER BY embedding <-> $1::vector(384)
LIMIT 10;
-- Vector search across JOINed data (impossible in MongoDB)
SELECT
d.title,
d.content,
a.name AS author,
c.name AS category,
d.embedding <-> $1::vector(384) AS distance
FROM documents d
JOIN authors a ON a.id = d.author_id
JOIN categories c ON c.id = d.category_id
WHERE c.name IN ('engineering', 'research')
ORDER BY d.embedding <-> $1::vector(384)
LIMIT 10;
| Aspect | MongoDB Atlas Vector Search | HeliosDB Vector Search |
|---|---|---|
| Availability | Atlas cloud only (M10+) | Everywhere: embedded, server, cluster |
| Index type | Separate vector index | HNSW (native, integrated with optimizer) |
| Quantization | Not available | Product Quantization for memory efficiency |
| SIMD acceleration | Not documented | Native SIMD for similarity computation |
| Hybrid search | Separate stages | Single query: vector + FTS + SQL |
| JOIN support | No (aggregation only) | Full SQL JOINs with vector ORDER BY |
| Self-hosted | Not available | Yes (60MB binary) |
| ACID | Not guaranteed in search | Full MVCC transaction isolation |
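Product Quantization, listed in the table above, is a standard memory-compression technique for vector indexes. Below is a minimal NumPy sketch of the general idea, with toy sizes and plain k-means; it is not HeliosDB's implementation.

```python
# Toy Product Quantization: compress vectors to a few byte-sized codes and
# approximate distances via lookup tables. Parameters are illustration-sized.
import numpy as np

rng = np.random.default_rng(0)

def train_pq(vectors, n_subspaces=8, n_centroids=16, iters=10):
    """Learn one small k-means codebook per subspace."""
    n, d = vectors.shape
    sub_d = d // n_subspaces
    codebooks = []
    for s in range(n_subspaces):
        sub = vectors[:, s * sub_d:(s + 1) * sub_d]
        centroids = sub[rng.choice(n, n_centroids, replace=False)].copy()
        for _ in range(iters):
            assign = ((sub[:, None, :] - centroids[None]) ** 2).sum(-1).argmin(1)
            for c in range(n_centroids):
                members = sub[assign == c]
                if len(members):
                    centroids[c] = members.mean(0)
        codebooks.append(centroids)
    return codebooks

def encode(vectors, codebooks):
    """Replace each subvector with the index of its nearest centroid."""
    sub_d = vectors.shape[1] // len(codebooks)
    codes = np.empty((vectors.shape[0], len(codebooks)), dtype=np.uint8)
    for s, cb in enumerate(codebooks):
        sub = vectors[:, s * sub_d:(s + 1) * sub_d]
        codes[:, s] = ((sub[:, None, :] - cb[None]) ** 2).sum(-1).argmin(1)
    return codes

def adc_distances(query, codes, codebooks):
    """Asymmetric distance: full-precision query vs. quantized database."""
    sub_d = query.shape[0] // len(codebooks)
    tables = np.stack([
        ((query[s * sub_d:(s + 1) * sub_d][None] - cb) ** 2).sum(-1)
        for s, cb in enumerate(codebooks)
    ])  # shape: (n_subspaces, n_centroids)
    return tables[np.arange(len(codebooks)), codes].sum(1)

vecs = rng.normal(size=(500, 64)).astype(np.float32)
codebooks = train_pq(vecs)
codes = encode(vecs, codebooks)  # 256 bytes/vector -> 8 bytes/vector
approx = adc_distances(vecs[0], codes, codebooks)
exact = ((vecs - vecs[0]) ** 2).sum(1)
```

Each 64-float vector (256 bytes) shrinks to 8 one-byte codes, and query-time distances are approximated from per-subspace lookup tables rather than full-precision arithmetic.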
MongoDB has no equivalent for either time-travel queries or data branching.
// MongoDB has no time-travel capability
// To query historical data, you must:
// 1. Implement change data capture (CDC) manually
// 2. Store snapshots in a separate collection
// 3. Use change streams + application logic
// Change stream (captures changes, but doesn't query history)
const changeStream = db.collection('orders').watch([], {
  fullDocument: 'updateLookup'  // ensure updates carry the full document
});
changeStream.on('change', (change) => {
// Save to history collection manually
db.collection('orders_history').insertOne({
...change.fullDocument,
_change_type: change.operationType,
_changed_at: new Date()
});
});
// Querying "what was the state at time X?" requires
// replaying all changes up to that timestamp -- expensive and complex
-- Time-travel: query data as it existed at any timestamp
SELECT * FROM orders AT TIME '2026-01-15 09:30:00';
-- Audit trail: compare before and after
SELECT
a.order_id,
a.status AS status_before,
b.status AS status_after,
a.total AS total_before,
b.total AS total_after
FROM orders AT TIME '2026-02-01' a
JOIN orders b ON a.order_id = b.order_id
WHERE a.status != b.status;
-- Data branching: git-like workflows for data
CREATE BRANCH pricing_test FROM main;
CHECKOUT pricing_test;
-- Experiment with pricing changes
UPDATE products SET price = price * 0.9 WHERE category = 'seasonal';
-- Run analytics on the experimental data
SELECT category, AVG(price) AS avg_price, COUNT(*) AS products
FROM products
GROUP BY category;
-- Safe: production data on 'main' is completely untouched
CHECKOUT main;
SELECT category, AVG(price) FROM products GROUP BY category;
-- (original prices, unchanged)
-- Merge if satisfied, or drop the branch
MERGE pricing_test INTO main;
-- or: DROP BRANCH pricing_test;
// B-tree (default)
db.orders.createIndex({ customer_id: 1, order_date: -1 });
// Compound index
db.orders.createIndex({ status: 1, total: -1 });
// Text index (limited to one per collection)
db.products.createIndex({ name: "text", description: "text" });
// 2dsphere (geospatial)
db.locations.createIndex({ coordinates: "2dsphere" });
// Wildcard (all fields in document)
db.logs.createIndex({ "$**": 1 });
-- ART (Adaptive Radix Tree): faster than B-tree for skewed distributions
CREATE INDEX idx_orders_customer ON orders (customer_id);
-- Composite index
CREATE INDEX idx_orders_status ON orders (status, total DESC);
-- HNSW vector index (no equivalent in MongoDB without Atlas)
CREATE INDEX idx_products_embedding ON products USING HNSW (embedding);
-- HeliosDB also uses automatically:
-- Bloom filters: fast negative lookups (key definitely not in block)
-- Zone maps: min/max per data block for range query pruning
-- These are transparent -- no CREATE INDEX needed
| Index Type | MongoDB | HeliosDB |
|---|---|---|
| Primary key | B-tree on _id | ART on PRIMARY KEY |
| Secondary | B-tree | ART |
| Text search | 1 text index per collection | FTS on any text column (unlimited) |
| Vector | Atlas Vector Search (cloud only) | HNSW (native, everywhere) |
| Geospatial | 2dsphere, 2d | Not built-in |
| Bloom filter | Not available | Automatic per data block |
| Zone maps | Not available | Automatic min/max per block |
| Wildcard | $** (all fields) | Not applicable (schema-defined) |
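The bloom filters and zone maps described above can be illustrated with a toy sketch. The bit-array size, hashing scheme, and block layout below are illustrative choices, not HeliosDB internals.

```python
# Toy versions of the two automatic pruning structures.
import hashlib

class BloomFilter:
    """Answers 'definitely absent' or 'maybe present' -- never a false negative."""

    def __init__(self, size_bits=1024, n_hashes=3):
        self.size = size_bits
        self.n_hashes = n_hashes
        self.bits = 0  # an int used as a bit array

    def _positions(self, key):
        for i in range(self.n_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))

# Zone map: per-block min/max lets a range scan skip whole blocks
blocks = [list(range(0, 100)), list(range(100, 200)), list(range(200, 300))]
zone_map = [(min(b), max(b)) for b in blocks]

def blocks_to_scan(lo, hi):
    """Keep only blocks whose [min, max] interval overlaps [lo, hi]."""
    return [i for i, (bmin, bmax) in enumerate(zone_map)
            if bmax >= lo and bmin <= hi]

bf = BloomFilter()
bf.add("order-42")
```

A point lookup for a key the filter has never seen is usually rejected without touching the block, and a range predicate like `k BETWEEN 150 AND 160` scans only the one block whose min/max overlaps it.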
// Multi-document transactions require a replica set (or sharded cluster)
const db = client.db("shop");
const orders = db.collection("orders");
const inventory = db.collection("inventory");
const accounts = db.collection("accounts");
const session = client.startSession();
try {
session.startTransaction({
readConcern: { level: "snapshot" },
writeConcern: { w: "majority" }
});
await orders.insertOne({ customer_id: 42, total: 100 }, { session });
await inventory.updateOne(
{ product_id: 7, stock: { $gte: 1 } },
{ $inc: { stock: -1 } },
{ session }
);
await accounts.updateOne(
{ id: 42 },
{ $inc: { balance: -100 } },
{ session }
);
await session.commitTransaction();
} catch (error) {
await session.abortTransaction();
} finally {
session.endSession();
}
// Caveats:
// - 60-second default timeout (TransactionLifetimeLimitSeconds)
// - Performance overhead (~2x latency vs non-transactional)
// - No savepoints (all-or-nothing only)
// - Oplog size limits for large transactions
// - Retryable writes required for production
BEGIN;
-- Insert order
INSERT INTO orders (customer_id, total) VALUES (42, 100.00);
-- Create savepoint before inventory update
SAVEPOINT before_inventory;
-- Update inventory
UPDATE inventory SET stock = stock - 1 WHERE product_id = 7 AND stock >= 1;
-- If inventory was insufficient, roll back just that part
-- ROLLBACK TO SAVEPOINT before_inventory;
-- Update account balance
UPDATE accounts SET balance = balance - 100.00 WHERE id = 42;
-- Nested savepoint for audit logging
SAVEPOINT before_audit;
INSERT INTO audit_log (action, details)
VALUES ('purchase', '{"customer": 42, "total": 100.00}');
COMMIT;
-- Transactions in HeliosDB:
-- - No arbitrary timeout limits
-- - MVCC: readers never block writers
-- - Savepoints for partial rollback
-- - Snapshot isolation by default
-- - WAL-backed durability with fsync
| Aspect | MongoDB | HeliosDB |
|---|---|---|
| Multi-document | Yes (since 4.0) | Yes (native MVCC) |
| Savepoints | No | Yes (SAVEPOINT / ROLLBACK TO) |
| Isolation level | Snapshot (read concern) | Snapshot isolation (default) |
| Timeout | 60s default | Configurable, no arbitrary limit |
| Performance overhead | ~2x latency | Minimal (MVCC is the default path) |
| Cross-shard | Yes (with overhead) | Yes (via sharding tier) |
| Nested transactions | No | Yes (via savepoints) |
// MongoDB encryption options:
// 1. At-rest: WiredTiger encrypts storage files (Enterprise only)
// 2. In-transit: TLS for client connections
// 3. CSFLE: Client-Side Field-Level Encryption (Enterprise or Atlas)
// CSFLE example (complex setup required)
const client = new MongoClient(uri, {
autoEncryption: {
keyVaultNamespace: "encryption.__keyVault",
kmsProviders: {
aws: { accessKeyId: "...", secretAccessKey: "..." }
},
schemaMap: {
"mydb.patients": {
bsonType: "object",
properties: {
ssn: {
encrypt: {
bsonType: "string",
algorithm: "AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic"
}
}
}
}
}
}
});
// Limitations:
// - At-rest encryption: Enterprise edition only
// - CSFLE: complex schema map configuration
// - No encryption of indexes (query patterns leak information)
// - No zero-knowledge option (server can read non-CSFLE fields)
-- TDE: transparent, everything encrypted at rest
-- No application code changes, no schema annotations
-- Data pages, WAL entries, indexes -- all encrypted with AES-256-GCM
INSERT INTO patients (name, ssn, diagnosis)
VALUES ('Jane Doe', '123-45-6789', 'confidential');
-- Queries work transparently -- decryption happens automatically
SELECT * FROM patients WHERE name = 'Jane Doe';
-- Zero-Knowledge Encryption: even the database admin cannot read data
-- Application holds the encryption key
-- HeliosDB processes encrypted data without seeing plaintext
-- Ring + AWS-LC FIPS-validated providers
| Aspect | MongoDB | HeliosDB |
|---|---|---|
| At-rest encryption | Enterprise only (WiredTiger) | All editions (TDE, AES-256-GCM) |
| Scope | Storage files | Data pages + WAL + indexes |
| Field-level encryption | CSFLE (complex setup) | TDE encrypts everything transparently |
| Zero-knowledge | No (server sees non-CSFLE data) | Yes (application-held keys) |
| Index encryption | No | Yes |
| FIPS compliance | Depends on provider | Ring + AWS-LC FIPS providers |
| Configuration | Schema maps, KMS setup | Single config option |
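The key-holding pattern behind zero-knowledge encryption can be sketched as follows. The cipher here is a stdlib-only stand-in (SHA-256 keystream plus an HMAC tag) purely for illustration; a real deployment uses AES-256-GCM as described above, and HeliosDB's actual ZKE wire format is not shown.

```python
# Illustrative client-side encryption: the key lives only in the application,
# so the server (and its admin) sees ciphertext only. NOT a production cipher.
import hashlib
import hmac
import os

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def seal(key: bytes, plaintext: bytes) -> bytes:
    """Encrypt + authenticate on the client before data reaches the database."""
    nonce = os.urandom(16)
    ct = bytes(a ^ b for a, b in zip(plaintext, _keystream(key, nonce, len(plaintext))))
    tag = hmac.new(key, nonce + ct, hashlib.sha256).digest()
    return nonce + ct + tag

def open_sealed(key: bytes, blob: bytes) -> bytes:
    nonce, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    if not hmac.compare_digest(tag, hmac.new(key, nonce + ct, hashlib.sha256).digest()):
        raise ValueError("ciphertext tampered with")
    return bytes(a ^ b for a, b in zip(ct, _keystream(key, nonce, len(ct))))

app_key = os.urandom(32)  # held by the application, never sent to the database
blob = seal(app_key, b"123-45-6789")
# The server stores and replicates `blob` but cannot recover the SSN without app_key
```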
MongoDB is a server-only database. There is no way to embed it inside your application process.
Application Process MongoDB Server (separate process)
+-----------------+ +----------------------+
| Your App | TCP | mongod |
| (Node/Python) | ------> | (1GB+ RAM minimum) |
| | | (WiredTiger engine) |
+-----------------+ +----------------------+
- Minimum resource: ~1GB RAM for mongod
- Separate deployment, monitoring, upgrades
- Network round-trip for every operation
- Cannot run on IoT/edge devices
# Embedded mode: database runs inside your application process
# No separate server, no network round-trips, no deployment overhead
import psycopg2
# Option 1: Connect to HeliosDB server (like MongoDB)
conn = psycopg2.connect("host=localhost port=5432 dbname=myapp")
# Option 2: HeliosDB as embedded library (no server needed)
# The database engine runs in-process
# 60MB binary, starts in milliseconds
# Perfect for: edge devices, IoT, desktop apps, CI/CD, testing
// TypeScript/Node.js: connect via PostgreSQL protocol
import { Client } from 'pg';
const client = new Client({
host: 'localhost',
port: 5432,
database: 'myapp',
});
await client.connect();
// Every PostgreSQL client library works out of the box
// No MongoDB driver, no Mongoose, no special ODM required
const result = await client.query(
'SELECT * FROM products WHERE data @> $1 LIMIT 10',
[JSON.stringify({ category: 'electronics' })]
);
// Go: standard database/sql with any PostgreSQL driver
package main

import (
    "database/sql"
    "log"

    _ "github.com/lib/pq"
)

func main() {
    db, err := sql.Open("postgres", "host=localhost port=5432 dbname=myapp sslmode=disable")
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    rows, err := db.Query(`
        SELECT name, data->>'category' AS category
        FROM products
        WHERE data @> '{"in_stock": true}'
        ORDER BY (data->>'price')::numeric DESC
        LIMIT 20
    `)
    if err != nil {
        log.Fatal(err)
    }
    defer rows.Close()
    // Standard Go database patterns -- nothing MongoDB-specific to learn
}
Replica Set:
+----------+ +----------+ +----------+
| Primary |---->|Secondary |---->|Secondary |
| (R/W) | | (R/O) | | (R/O) |
+----------+ +----------+ +----------+
Oplog-based async replication
Automatic failover (election)
Sharded Cluster:
+---------+ +---------+ +---------+
| mongos | | mongos | | mongos | (routers)
+----+----+ +----+----+ +----+----+
+------+-----+------+-----+
+----------+ +----------+ +----------+
| Shard 1 | | Shard 2 | | Shard 3 |
| (replica | | (replica | | (replica |
| set) | | set) | | set) |
+----------+ +----------+ +----------+
+ Config servers (another replica set)
Minimum for sharded HA: 13 processes
(3 config, 3x3 shard replicas, 1+ mongos)
Tier 1: WAL Streaming (simple HA)
+----------+ WAL stream +----------+
| Primary | ---------------> | Replica |
| (R/W) | | (R/O) |
+----------+ +----------+
Minimum for HA: 2 nodes
Tier 2: Multi-Primary (write scaling)
+----------+ <-----------> +----------+
| Primary1 | | Primary2 |
| (R/W) | | (R/W) |
+----------+ +----------+
Both nodes accept writes
Tier 3: Sharding (horizontal scaling)
+----------+ +----------+ +----------+
| Shard 1 | | Shard 2 | | Shard 3 |
|(hash/rng | |(hash/rng | |(hash/rng |
| /list) | | /list) | | /list) |
+----------+ +----------+ +----------+
+ Proxy with 13 built-in features
(connection pooling, query routing, failover, etc.)
| Aspect | MongoDB | HeliosDB |
|---|---|---|
| Basic HA | 3 nodes (replica set) | 2 nodes (primary + replica) |
| Sharded HA | 13+ processes | 3+ shards + proxy |
| Replication method | Oplog (async) | WAL streaming (sync or async) |
| Multi-primary | No (single primary per replica set) | Yes (Tier 2) |
| Shard key selection | Must choose carefully, hard to change | Hash, range, or list partitioning |
| Config servers | Separate replica set required | Not needed (proxy handles routing) |
| Resharding | Online (since 6.0, but slow) | Partition management via SQL |
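Toy routing functions for the three partitioning schemes in the table above. The shard count, range bounds, and region map are made-up illustration values, not HeliosDB's router.

```python
# Illustrative routers for hash, range, and list partitioning.
import hashlib

N_SHARDS = 3

def route_hash(key) -> int:
    """Hash partitioning: uniform spread, no locality."""
    digest = hashlib.sha256(str(key).encode()).hexdigest()
    return int(digest, 16) % N_SHARDS

RANGE_BOUNDS = [1000, 5000]  # shard 0: id < 1000, shard 1: id < 5000, shard 2: rest

def route_range(key: int) -> int:
    """Range partitioning: preserves order, good for range scans."""
    for shard, bound in enumerate(RANGE_BOUNDS):
        if key < bound:
            return shard
    return len(RANGE_BOUNDS)

LIST_MAP = {"EU": 0, "US": 1, "APAC": 2}

def route_list(region: str) -> int:
    """List partitioning: explicit placement, e.g. for data residency."""
    return LIST_MAP[region]
```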
For teams migrating from MongoDB, HeliosDB-Full provides a DocumentStore API with familiar concepts:
# HeliosDB-Full DocumentStore: MongoDB-style API
# Collections, queries, aggregation pipeline, validation, change streams
import psycopg2
import json
conn = psycopg2.connect("host=localhost port=5432 dbname=helios")
cur = conn.cursor()
# Create a "collection" (table with JSONB document column)
cur.execute("""
CREATE TABLE IF NOT EXISTS products (
id SERIAL PRIMARY KEY,
doc JSONB NOT NULL,
embedding VECTOR(384),
created_at TIMESTAMPTZ DEFAULT NOW()
)
""")
# Insert documents (like db.collection.insertOne)
cur.execute("""
INSERT INTO products (doc, embedding)
VALUES (%s, %s)
""", (
json.dumps({
"name": "Widget",
"price": 29.99,
"tags": ["hardware"],
"specs": {"weight": 0.5}
}),
"[0.12, -0.45, 0.89]"
))
# Query with containment (like MongoDB $match)
cur.execute("""
SELECT doc->>'name' AS name, doc->>'price' AS price
FROM products
WHERE doc @> '{"tags": ["hardware"]}'
AND (doc->>'price')::numeric < 50
ORDER BY (doc->>'price')::numeric
""")
# Aggregation (like MongoDB aggregation pipeline, but standard SQL)
cur.execute("""
SELECT
doc->>'category' AS category,
COUNT(*) AS count,
AVG((doc->>'price')::numeric) AS avg_price,
jsonb_agg(doc->>'name') AS product_names
FROM products
GROUP BY doc->>'category'
HAVING COUNT(*) > 5
ORDER BY avg_price DESC
""")
conn.commit()
MongoDB is the right choice when:

- Your workload depends on geospatial queries ($geoWithin, $near)

HeliosDB is the right choice when:

- You have hit $lookup limitations and need full SQL JOINs

# Step 1: Export from MongoDB
from pymongo import MongoClient
import psycopg2
import json
mongo = MongoClient('mongodb://localhost:27017')
mongo_db = mongo['myapp']
# Step 2: Create equivalent tables in HeliosDB
pg = psycopg2.connect("host=localhost port=5432 dbname=helios")
cur = pg.cursor()
# Option A: Keep document-style (minimal changes)
cur.execute("""
CREATE TABLE products (
id TEXT PRIMARY KEY,
doc JSONB NOT NULL,
embedding VECTOR(384)
)
""")
# Option B: Extract key fields into typed columns (recommended)
cur.execute("""
CREATE TABLE products (
id TEXT PRIMARY KEY,
name TEXT NOT NULL,
price NUMERIC(10,2) NOT NULL,
category TEXT,
tags TEXT[],
specs JSONB DEFAULT '{}',
embedding VECTOR(384),
created_at TIMESTAMPTZ DEFAULT NOW()
)
""")
# Step 3: Migrate data
for doc in mongo_db.products.find():
# Option A: store entire document as JSONB
doc_id = str(doc.pop('_id'))
cur.execute(
"INSERT INTO products (id, doc) VALUES (%s, %s)",
(doc_id, json.dumps(doc, default=str))
)
pg.commit()
| MongoDB (MQL) | HeliosDB (SQL) |
|---|---|
| db.c.find({x: 1}) | SELECT * FROM c WHERE doc->>'x' = '1' |
| db.c.find({x: {$gt: 5}}) | SELECT * FROM c WHERE (doc->>'x')::int > 5 |
| db.c.find({tags: {$in: ["a"]}}) | SELECT * FROM c WHERE doc->'tags' ? 'a' |
| db.c.find({"a.b": 1}) | SELECT * FROM c WHERE doc->'a'->>'b' = '1' |
| db.c.aggregate([{$group: ...}]) | SELECT ... GROUP BY ... HAVING ... |
| db.c.aggregate([{$lookup: ...}]) | SELECT ... JOIN ... ON ... |
| db.c.countDocuments({x: 1}) | SELECT COUNT(*) FROM c WHERE doc->>'x' = '1' |
| db.c.distinct("x") | SELECT DISTINCT doc->>'x' FROM c |
| db.c.updateOne({_id: 1}, {$set: {x: 2}}) | UPDATE c SET doc = jsonb_set(doc, '{x}', '2') WHERE id = '1' |
| db.c.deleteMany({status: "old"}) | DELETE FROM c WHERE doc->>'status' = 'old' |
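The first few cheat-sheet patterns are mechanical enough to automate. Here is a toy translator for flat MQL filters (equality plus $gt/$gte/$lt/$lte); it is illustrative only and interpolates values directly into the string, so real code must use bind parameters instead.

```python
# Toy MQL-to-SQL translator for flat filters. NOT injection-safe:
# values are interpolated for readability; use parameterized queries in practice.
OPS = {"$gt": ">", "$gte": ">=", "$lt": "<", "$lte": "<="}

def mql_to_where(filter_doc: dict) -> str:
    clauses = []
    for field, cond in filter_doc.items():
        if isinstance(cond, dict):
            for op, value in cond.items():
                # comparison operators need a cast from JSONB text to numeric
                clauses.append(f"(doc->>'{field}')::numeric {OPS[op]} {value}")
        else:
            clauses.append(f"doc->>'{field}' = '{cond}'")
    return " AND ".join(clauses)

# mql_to_where({"x": 1}) -> "doc->>'x' = '1'"
```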
| Dimension | MongoDB | HeliosDB |
|---|---|---|
| Best at | Schema-flexible document storage at scale | Unified documents + relations + vectors + time-travel |
| Query language | MQL (MongoDB-specific) | Standard SQL (PostgreSQL-compatible) |
| Document support | Native BSON documents | JSONB with 20+ functions + typed columns |
| Vector search | Atlas add-on (cloud only) | Native HNSW + PQ + SIMD (everywhere) |
| Full-text search | Atlas Search (Lucene add-on) | Native @@ operator (integrated with optimizer) |
| JOINs | $lookup (limited) | Full SQL JOINs (all types including LATERAL) |
| Transactions | Multi-document (since 4.0, no savepoints) | Full MVCC with savepoints and nested rollback |
| Time-travel | Not supported | AT TIME queries on any table |
| Branching | Not supported | Git-like CREATE/MERGE/DROP BRANCH |
| Encryption | Enterprise-only at-rest + CSFLE | TDE + ZKE in all editions |
| Embedded mode | Not available | 60MB binary, in-process |
| Ecosystem | MongoDB drivers, Mongoose, Atlas cloud | PostgreSQL drivers, any PG tool, REST, gRPC |
MongoDB is a proven, mature document database with a vast ecosystem and a strong managed cloud offering in Atlas. HeliosDB is for teams who have outgrown the "one database per concern" architecture and want documents, relations, vectors, full-text search, time-travel, and branching in a single ACID-compliant engine that speaks PostgreSQL.
Get started with HeliosDB in minutes. Open source, free to use.