The Document Database That Also Does Everything Else
MongoDB redefined how developers think about databases -- schema flexibility, horizontal scaling, and a developer-friendly document model made it the default NoSQL choice for over a decade. But modern applications demand more than documents. They need vector search for AI, full-text search for discovery, relational JOINs for analytics, time-travel for auditing, and ACID transactions that actually work across collections. HeliosDB delivers all of this in a single binary with a familiar PostgreSQL wire protocol, eliminating the "MongoDB for documents, Postgres for relations, Pinecone for vectors" architecture that has become the norm.
| Feature | MongoDB | HeliosDB |
|---|---|---|
| Data model | BSON documents in collections | Tables + JSONB + vectors + relational |
| Schema | Schema-optional (validation rules) | Schema-enforced + JSONB for flexibility |
| Query language | MongoDB Query Language (MQL) | Standard SQL (PostgreSQL-compatible) |
| Wire protocol | MongoDB wire protocol | PostgreSQL wire protocol + REST + gRPC |
| JOINs | $lookup (limited, no nested) | Full SQL JOINs (INNER, LEFT, RIGHT, FULL, CROSS, LATERAL) |
| Vector search | Atlas Vector Search (separate add-on) | Native HNSW + Product Quantization + SIMD |
| Full-text search | Atlas Search (Lucene-based add-on) | Native @@ operator, integrated with optimizer |
| ACID transactions | Multi-document (since 4.0, with caveats) | Full MVCC, snapshot isolation, savepoints |
| Time-travel queries | Not supported | SELECT * FROM t AT TIME '2026-01-01' |
| Data branching | Not supported | Git-like branches with merge |
| Encryption | At-rest (WiredTiger) + in-transit (TLS) | TDE (AES-256-GCM) + Zero-Knowledge Encryption |
| Embedded deployment | Not available (server-only) | 60MB binary, in-process or standalone |
| Replication | Replica sets (primary-secondary) | 3-tier: WAL streaming, multi-primary, sharding |
| Sharding | Hash or range-based, mongos router | Hash, range, or list partitioning + proxy |
| Change streams | Native (oplog-based) | WAL-based change feeds (HeliosDB-Full) |
| Aggregation | Aggregation pipeline (stages) | SQL: GROUP BY, window functions, CTEs, subqueries |
| Indexes | B-tree, hashed, text, 2dsphere, wildcard | ART (Adaptive Radix Tree), bloom filters, zone maps, HNSW |
When your application outgrows documents, you bolt on additional databases:
+--------------+ +--------------+ +--------------+
| MongoDB | | PostgreSQL | | Pinecone |
| (documents) | | (relational) | | (vectors) |
+------+-------+ +------+-------+ +------+-------+
| | |
+--------+-----------+------------+-------+
| |
+------+------+ +------+------+
| Application | | Sync Layer |
| (joins | | (keep DBs |
| in code) | | in sync) |
+-------------+ +-------------+
- 3+ databases to deploy and keep in sync
- JOINs happen in application code
- No cross-database transactions
- Vector search is a separate billing dimension
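The missing-transaction problem above can be sketched with stub clients. Everything here is hypothetical illustration: the `StubStore` class and its `write()` method stand in for real driver objects, and only the control flow matters.

```python
# Stub stores standing in for MongoDB, PostgreSQL, and Pinecone clients.
# The class and method names are made up -- the point is the control flow.
class StubStore:
    def __init__(self, name):
        self.name = name
        self.rows = {}

    def write(self, key, value, fail=False):
        if fail:
            raise IOError(f"{self.name} write failed")
        self.rows[key] = value

mongo = StubStore("mongo")
postgres = StubStore("postgres")
pinecone = StubStore("pinecone")

def save_product(product_id, doc, row, embedding, fail_vector=False):
    # Three writes, no shared transaction: if the third one fails,
    # the application itself must undo the first two (the "sync layer").
    mongo.write(product_id, doc)
    postgres.write(product_id, row)
    try:
        pinecone.write(product_id, embedding, fail=fail_vector)
    except IOError:
        # Manual compensation logic -- easy to get wrong under crashes
        mongo.rows.pop(product_id, None)
        postgres.rows.pop(product_id, None)
        raise
```

A single-engine design replaces all of this compensation code with one BEGIN/COMMIT.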
+-------------------------------------------------+
| HeliosDB |
| |
| +----------+ +----------+ +--------------+ |
| | JSONB | |Relational| | Vector Search| |
| |20+ funcs | | Tables | | HNSW + PQ | |
| +----+-----+ +----+-----+ +------+-------+ |
| | | | |
| +----+-------------+---------------+----------+ |
| | MVCC Storage Engine + SQL Optimizer | |
| | (ACID + TDE + WAL + Branching + FTS) | |
| +---------------------------------------------+ |
| |
| PostgreSQL wire protocol (psql, any PG driver) |
+-------------------------------------------------+
- 1 binary, 60MB, embeddable or clustered
- Documents + relations + vectors in one ACID transaction
- Standard SQL -- every developer already knows it
// MongoDB: flexible schema, but no guarantees without validation
db.products.insertMany([
{
name: "Widget",
price: 29.99,
tags: ["hardware", "bestseller"],
specs: { weight: 0.5, color: "blue" }
},
{
// Oops: "price" is a string, "tags" is missing
name: "Gadget",
price: "forty-two",
specifications: { mass: 1.2 } // different field name
}
]);
// Schema validation (added later, optional)
db.createCollection("products", {
validator: {
$jsonSchema: {
bsonType: "object",
required: ["name", "price"],
properties: {
name: { bsonType: "string" },
price: { bsonType: "number" }
}
}
}
});
-- Structured columns enforce data quality
-- JSONB column provides document-style flexibility
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
price NUMERIC(10,2) NOT NULL CHECK (price > 0),
tags TEXT[] DEFAULT '{}',
specs JSONB DEFAULT '{}',
embedding VECTOR(384),
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Type-safe inserts: "forty-two" would be rejected immediately
INSERT INTO products (name, price, tags, specs, embedding)
VALUES (
'Widget',
29.99,
ARRAY['hardware', 'bestseller'],
'{"weight": 0.5, "color": "blue", "dimensions": {"l": 10, "w": 5, "h": 3}}',
'[0.12, -0.45, 0.89, ...]'::vector(384)
);
-- Best of both worlds:
-- Structured data (name, price) has schema enforcement
-- Flexible data (specs) uses JSONB with 20+ query functions
-- Vector embeddings stored alongside for AI search
// Find products with nested field filter
db.products.find({
"specs.weight": { $lt: 1.0 },
tags: { $in: ["hardware"] },
price: { $gte: 10, $lte: 50 }
}).sort({ price: -1 }).limit(10);
// Aggregation pipeline: multi-stage data transformation
db.orders.aggregate([
{ $match: { status: "completed", date: { $gte: ISODate("2026-01-01") } } },
{ $unwind: "$items" },
{ $group: {
_id: "$items.product_id",
total_revenue: { $sum: { $multiply: ["$items.price", "$items.qty"] } },
order_count: { $sum: 1 }
}},
{ $sort: { total_revenue: -1 } },
{ $limit: 20 },
// $lookup is MongoDB's JOIN -- limited to one collection, no nesting
{ $lookup: {
from: "products",
localField: "_id",
foreignField: "_id",
as: "product"
}},
{ $unwind: "$product" },
{ $project: {
product_name: "$product.name",
total_revenue: 1,
order_count: 1
}}
]);
-- Same filter query: readable, standard SQL
SELECT name, price, specs->>'weight' AS weight
FROM products
WHERE (specs->>'weight')::numeric < 1.0
AND 'hardware' = ANY(tags)
AND price BETWEEN 10 AND 50
ORDER BY price DESC
LIMIT 10;
-- Same aggregation: SQL with JOINs (no pipeline gymnastics)
SELECT
p.name AS product_name,
SUM(oi.price * oi.qty) AS total_revenue,
COUNT(DISTINCT o.id) AS order_count
FROM orders o
JOIN order_items oi ON oi.order_id = o.id
JOIN products p ON p.id = oi.product_id
WHERE o.status = 'completed'
AND o.order_date >= '2026-01-01'
GROUP BY p.name
ORDER BY total_revenue DESC
LIMIT 20;
-- HeliosDB advantage: CTEs and window functions
WITH monthly_revenue AS (
SELECT
p.name,
date_trunc('month', o.order_date) AS month,
SUM(oi.price * oi.qty) AS revenue
FROM orders o
JOIN order_items oi ON oi.order_id = o.id
JOIN products p ON p.id = oi.product_id
WHERE o.order_date >= '2025-01-01'
GROUP BY p.name, date_trunc('month', o.order_date)
)
SELECT
name,
month,
revenue,
LAG(revenue) OVER (PARTITION BY name ORDER BY month) AS prev_month,
ROUND(
(revenue - LAG(revenue) OVER (PARTITION BY name ORDER BY month))
/ NULLIF(LAG(revenue) OVER (PARTITION BY name ORDER BY month), 0) * 100,
2
) AS growth_pct
FROM monthly_revenue
ORDER BY name, month;
HeliosDB's JSONB support is not a thin wrapper -- it provides full PostgreSQL-compatible JSONB with over 20 functions and operators:
-- Navigation and extraction
SELECT
data->'address'->>'city' AS city, -- nested extraction
data#>>'{contacts,0,email}' AS first_email, -- path extraction
jsonb_extract_path(data, 'address', 'zip') AS zip,
jsonb_typeof(data->'age') AS age_type -- "number"
FROM customers;
-- Containment and existence
SELECT * FROM customers
WHERE data @> '{"tier": "enterprise"}' -- contains
AND data ? 'phone' -- key exists
AND data ?| ARRAY['email', 'phone']; -- any key exists
-- Modification (immutable: returns new document)
UPDATE customers
SET data = jsonb_set(
data,
'{address,verified}',
'true'::jsonb
)
WHERE id = 42;
-- Aggregation: build JSON from query results
SELECT jsonb_agg(
jsonb_build_object(
'name', name,
'total', order_total
)
) AS summary
FROM (
SELECT c.name, SUM(o.total) AS order_total
FROM customers c
JOIN orders o ON o.customer_id = c.id
GROUP BY c.name
ORDER BY order_total DESC
LIMIT 5
) top_customers;
-- Expand JSONB arrays and objects
SELECT
    p.name,
    tag
FROM products p,
     jsonb_array_elements_text(p.data->'tags') AS tag
WHERE tag LIKE '%sale%';
-- JSONB in JOINs with relational data (impossible in MongoDB without $lookup)
SELECT
c.name,
c.data->>'tier' AS tier,
COUNT(o.id) AS orders,
SUM(o.total) AS revenue,
AVG(o.total) AS avg_order
FROM customers c
JOIN orders o ON o.customer_id = c.id
WHERE c.data @> '{"region": "EU"}'
AND o.order_date >= '2026-01-01'
GROUP BY c.name, c.data->>'tier'
HAVING SUM(o.total) > 10000
ORDER BY revenue DESC;
// MongoDB Atlas Vector Search requires:
// 1. Atlas cluster (cloud-only, not self-hosted)
// 2. Separate search index definition
// 3. $vectorSearch aggregation stage (Atlas-only)
// Create search index (Atlas UI or API, not MQL)
// { "type": "vectorSearch", "fields": [{ "path": "embedding", ... }] }
// Query using aggregation pipeline
db.documents.aggregate([
{
$vectorSearch: {
index: "vector_index",
path: "embedding",
queryVector: [0.12, -0.45, 0.89 /* ... */],
numCandidates: 100,
limit: 10
}
},
{
$project: {
title: 1,
content: 1,
score: { $meta: "vectorSearchScore" }
}
}
]);
// Limitations:
// - Atlas cloud only (not available in self-hosted MongoDB)
// - Separate index type, separate billing
// - Cannot combine with $match in same pipeline stage
// - No hybrid vector + full-text in single operation
// - Limited to Atlas M10+ clusters
-- Create HNSW vector index (works everywhere: embedded, server, cluster)
CREATE INDEX idx_docs_embedding ON documents USING HNSW (embedding);
-- Vector similarity search: standard SQL
SELECT title, content,
embedding <-> '[0.12, -0.45, 0.89, ...]'::vector(384) AS distance
FROM documents
ORDER BY embedding <-> '[0.12, -0.45, 0.89, ...]'::vector(384)
LIMIT 10;
-- Hybrid search: vector + full-text + SQL filters in ONE query
SELECT title, content,
embedding <-> $1::vector(384) AS semantic_distance
FROM documents
WHERE content @@ 'database AND transactions'
AND metadata @> '{"category": "technical"}'
AND created_at >= '2026-01-01'
ORDER BY embedding <-> $1::vector(384)
LIMIT 10;
-- Vector search across JOINed data (impossible in MongoDB)
SELECT
d.title,
d.content,
a.name AS author,
c.name AS category,
d.embedding <-> $1::vector(384) AS distance
FROM documents d
JOIN authors a ON a.id = d.author_id
JOIN categories c ON c.id = d.category_id
WHERE c.name IN ('engineering', 'research')
ORDER BY d.embedding <-> $1::vector(384)
LIMIT 10;
| Aspect | MongoDB Atlas Vector Search | HeliosDB Vector Search |
|---|---|---|
| Availability | Atlas cloud only (M10+) | Everywhere: embedded, server, cluster |
| Index type | Separate vector index | HNSW (native, integrated with optimizer) |
| Quantization | Not available | Product Quantization for memory efficiency |
| SIMD acceleration | Not documented | Native SIMD for similarity computation |
| Hybrid search | Separate stages | Single query: vector + FTS + SQL |
| JOIN support | No (aggregation only) | Full SQL JOINs with vector ORDER BY |
| Self-hosted | Not available | Yes (60MB binary) |
| ACID | Not guaranteed in search | Full MVCC transaction isolation |
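Product Quantization, listed in the table above, is a standard memory-compression technique for vector indexes. Below is a minimal NumPy sketch of the general idea, with toy sizes and plain k-means; it is not HeliosDB's implementation.

```python
# Toy Product Quantization: compress vectors to a few byte-sized codes and
# approximate distances via lookup tables. Parameters are illustration-sized.
import numpy as np

rng = np.random.default_rng(0)

def train_pq(vectors, n_subspaces=8, n_centroids=16, iters=10):
    """Learn one small k-means codebook per subspace."""
    n, d = vectors.shape
    sub_d = d // n_subspaces
    codebooks = []
    for s in range(n_subspaces):
        sub = vectors[:, s * sub_d:(s + 1) * sub_d]
        centroids = sub[rng.choice(n, n_centroids, replace=False)].copy()
        for _ in range(iters):
            assign = ((sub[:, None, :] - centroids[None]) ** 2).sum(-1).argmin(1)
            for c in range(n_centroids):
                members = sub[assign == c]
                if len(members):
                    centroids[c] = members.mean(0)
        codebooks.append(centroids)
    return codebooks

def encode(vectors, codebooks):
    """Replace each subvector with the index of its nearest centroid."""
    sub_d = vectors.shape[1] // len(codebooks)
    codes = np.empty((vectors.shape[0], len(codebooks)), dtype=np.uint8)
    for s, cb in enumerate(codebooks):
        sub = vectors[:, s * sub_d:(s + 1) * sub_d]
        codes[:, s] = ((sub[:, None, :] - cb[None]) ** 2).sum(-1).argmin(1)
    return codes

def adc_distances(query, codes, codebooks):
    """Asymmetric distance: full-precision query vs. quantized database."""
    sub_d = query.shape[0] // len(codebooks)
    tables = np.stack([
        ((query[s * sub_d:(s + 1) * sub_d][None] - cb) ** 2).sum(-1)
        for s, cb in enumerate(codebooks)
    ])  # shape: (n_subspaces, n_centroids)
    return tables[np.arange(len(codebooks)), codes].sum(1)

vecs = rng.normal(size=(500, 64)).astype(np.float32)
codebooks = train_pq(vecs)
codes = encode(vecs, codebooks)  # 256 bytes/vector -> 8 bytes/vector
approx = adc_distances(vecs[0], codes, codebooks)
exact = ((vecs - vecs[0]) ** 2).sum(1)
```

Each 64-float vector (256 bytes) shrinks to 8 one-byte codes, and query-time distances are approximated from per-subspace lookup tables rather than full-precision arithmetic.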
MongoDB has no equivalent for either time-travel queries or data branching.
// MongoDB has no time-travel capability
// To query historical data, you must:
// 1. Implement change data capture (CDC) manually
// 2. Store snapshots in a separate collection
// 3. Use change streams + application logic
// Change stream (captures changes, but doesn't query history)
const changeStream = db.collection('orders').watch([], {
  fullDocument: 'updateLookup'  // ensure updates carry the full document
});
changeStream.on('change', (change) => {
// Save to history collection manually
db.collection('orders_history').insertOne({
...change.fullDocument,
_change_type: change.operationType,
_changed_at: new Date()
});
});
// Querying "what was the state at time X?" requires
// replaying all changes up to that timestamp -- expensive and complex
-- Time-travel: query data as it existed at any timestamp
SELECT * FROM orders AT TIME '2026-01-15 09:30:00';
-- Audit trail: compare before and after
SELECT
a.order_id,
a.status AS status_before,
b.status AS status_after,
a.total AS total_before,
b.total AS total_after
FROM orders AT TIME '2026-02-01' a
JOIN orders b ON a.order_id = b.order_id
WHERE a.status != b.status;
-- Data branching: git-like workflows for data
CREATE BRANCH pricing_test FROM main;
CHECKOUT pricing_test;
-- Experiment with pricing changes
UPDATE products SET price = price * 0.9 WHERE category = 'seasonal';
-- Run analytics on the experimental data
SELECT category, AVG(price) AS avg_price, COUNT(*) AS products
FROM products
GROUP BY category;
-- Safe: production data on 'main' is completely untouched
CHECKOUT main;
SELECT category, AVG(price) FROM products GROUP BY category;
-- (original prices, unchanged)
-- Merge if satisfied, or drop the branch
MERGE pricing_test INTO main;
-- or: DROP BRANCH pricing_test;
// B-tree (default)
db.orders.createIndex({ customer_id: 1, order_date: -1 });
// Compound index
db.orders.createIndex({ status: 1, total: -1 });
// Text index (limited to one per collection)
db.products.createIndex({ name: "text", description: "text" });
// 2dsphere (geospatial)
db.locations.createIndex({ coordinates: "2dsphere" });
// Wildcard (all fields in document)
db.logs.createIndex({ "$**": 1 });
-- ART (Adaptive Radix Tree): faster than B-tree for skewed distributions
CREATE INDEX idx_orders_customer ON orders (customer_id);
-- Composite index
CREATE INDEX idx_orders_status ON orders (status, total DESC);
-- HNSW vector index (no equivalent in MongoDB without Atlas)
CREATE INDEX idx_products_embedding ON products USING HNSW (embedding);
-- HeliosDB also uses automatically:
-- Bloom filters: fast negative lookups (key definitely not in block)
-- Zone maps: min/max per data block for range query pruning
-- These are transparent -- no CREATE INDEX needed
| Index Type | MongoDB | HeliosDB |
|---|---|---|
| Primary key | B-tree on _id | ART on PRIMARY KEY |
| Secondary | B-tree | ART |
| Text search | 1 text index per collection | FTS on any text column (unlimited) |
| Vector | Atlas Vector Search (cloud only) | HNSW (native, everywhere) |
| Geospatial | 2dsphere, 2d | Not built-in |
| Bloom filter | Not available | Automatic per data block |
| Zone maps | Not available | Automatic min/max per block |
| Wildcard | $** (all fields) | Not applicable (schema-defined) |
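The bloom filters and zone maps described above can be illustrated with a toy sketch. The bit-array size, hashing scheme, and block layout below are illustrative choices, not HeliosDB internals.

```python
# Toy versions of the two automatic pruning structures.
import hashlib

class BloomFilter:
    """Answers 'definitely absent' or 'maybe present' -- never a false negative."""

    def __init__(self, size_bits=1024, n_hashes=3):
        self.size = size_bits
        self.n_hashes = n_hashes
        self.bits = 0  # an int used as a bit array

    def _positions(self, key):
        for i in range(self.n_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))

# Zone map: per-block min/max lets a range scan skip whole blocks
blocks = [list(range(0, 100)), list(range(100, 200)), list(range(200, 300))]
zone_map = [(min(b), max(b)) for b in blocks]

def blocks_to_scan(lo, hi):
    """Keep only blocks whose [min, max] interval overlaps [lo, hi]."""
    return [i for i, (bmin, bmax) in enumerate(zone_map)
            if bmax >= lo and bmin <= hi]

bf = BloomFilter()
bf.add("order-42")
```

A point lookup for a key the filter has never seen is usually rejected without touching the block, and a range predicate like `k BETWEEN 150 AND 160` scans only the one block whose min/max overlaps it.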
// Multi-document transactions require a replica set (or sharded cluster)
const db = client.db("shop");
const orders = db.collection("orders");
const inventory = db.collection("inventory");
const accounts = db.collection("accounts");
const session = client.startSession();
try {
session.startTransaction({
readConcern: { level: "snapshot" },
writeConcern: { w: "majority" }
});
await orders.insertOne({ customer_id: 42, total: 100 }, { session });
await inventory.updateOne(
{ product_id: 7, stock: { $gte: 1 } },
{ $inc: { stock: -1 } },
{ session }
);
await accounts.updateOne(
{ id: 42 },
{ $inc: { balance: -100 } },
{ session }
);
await session.commitTransaction();
} catch (error) {
await session.abortTransaction();
} finally {
session.endSession();
}
// Caveats:
// - 60-second default timeout (TransactionLifetimeLimitSeconds)
// - Performance overhead (~2x latency vs non-transactional)
// - No savepoints (all-or-nothing only)
// - Oplog size limits for large transactions
// - Retryable writes required for production
BEGIN;
-- Insert order
INSERT INTO orders (customer_id, total) VALUES (42, 100.00);
-- Create savepoint before inventory update
SAVEPOINT before_inventory;
-- Update inventory
UPDATE inventory SET stock = stock - 1 WHERE product_id = 7 AND stock >= 1;
-- If inventory was insufficient, roll back just that part
-- ROLLBACK TO SAVEPOINT before_inventory;
-- Update account balance
UPDATE accounts SET balance = balance - 100.00 WHERE id = 42;
-- Nested savepoint for audit logging
SAVEPOINT before_audit;
INSERT INTO audit_log (action, details)
VALUES ('purchase', '{"customer": 42, "total": 100.00}');
COMMIT;
-- Transactions in HeliosDB:
-- - No arbitrary timeout limits
-- - MVCC: readers never block writers
-- - Savepoints for partial rollback
-- - Snapshot isolation by default
-- - WAL-backed durability with fsync
| Aspect | MongoDB | HeliosDB |
|---|---|---|
| Multi-document | Yes (since 4.0) | Yes (native MVCC) |
| Savepoints | No | Yes (SAVEPOINT / ROLLBACK TO) |
| Isolation level | Snapshot (read concern) | Snapshot isolation (default) |
| Timeout | 60s default | Configurable, no arbitrary limit |
| Performance overhead | ~2x latency | Minimal (MVCC is the default path) |
| Cross-shard | Yes (with overhead) | Yes (via sharding tier) |
| Nested transactions | No | Yes (via savepoints) |
// MongoDB encryption options:
// 1. At-rest: WiredTiger encrypts storage files (Enterprise only)
// 2. In-transit: TLS for client connections
// 3. CSFLE: Client-Side Field-Level Encryption (Enterprise or Atlas)
// CSFLE example (complex setup required)
const client = new MongoClient(uri, {
autoEncryption: {
keyVaultNamespace: "encryption.__keyVault",
kmsProviders: {
aws: { accessKeyId: "...", secretAccessKey: "..." }
},
schemaMap: {
"mydb.patients": {
bsonType: "object",
properties: {
ssn: {
encrypt: {
bsonType: "string",
algorithm: "AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic"
}
}
}
}
}
}
});
// Limitations:
// - At-rest encryption: Enterprise edition only
// - CSFLE: complex schema map configuration
// - No encryption of indexes (query patterns leak information)
// - No zero-knowledge option (server can read non-CSFLE fields)
-- TDE: transparent, everything encrypted at rest
-- No application code changes, no schema annotations
-- Data pages, WAL entries, indexes -- all encrypted with AES-256-GCM
INSERT INTO patients (name, ssn, diagnosis)
VALUES ('Jane Doe', '123-45-6789', 'confidential');
-- Queries work transparently -- decryption happens automatically
SELECT * FROM patients WHERE name = 'Jane Doe';
-- Zero-Knowledge Encryption: even the database admin cannot read data
-- Application holds the encryption key
-- HeliosDB processes encrypted data without seeing plaintext
-- Ring + AWS-LC FIPS-validated providers
| Aspect | MongoDB | HeliosDB |
|---|---|---|
| At-rest encryption | Enterprise only (WiredTiger) | All editions (TDE, AES-256-GCM) |
| Scope | Storage files | Data pages + WAL + indexes |
| Field-level encryption | CSFLE (complex setup) | TDE encrypts everything transparently |
| Zero-knowledge | No (server sees non-CSFLE data) | Yes (application-held keys) |
| Index encryption | No | Yes |
| FIPS compliance | Depends on provider | Ring + AWS-LC FIPS providers |
| Configuration | Schema maps, KMS setup | Single config option |
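The key-holding pattern behind zero-knowledge encryption can be sketched as follows. The cipher here is a stdlib-only stand-in (SHA-256 keystream plus an HMAC tag) purely for illustration; a real deployment uses AES-256-GCM as described above, and HeliosDB's actual ZKE wire format is not shown.

```python
# Illustrative client-side encryption: the key lives only in the application,
# so the server (and its admin) sees ciphertext only. NOT a production cipher.
import hashlib
import hmac
import os

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def seal(key: bytes, plaintext: bytes) -> bytes:
    """Encrypt + authenticate on the client before data reaches the database."""
    nonce = os.urandom(16)
    ct = bytes(a ^ b for a, b in zip(plaintext, _keystream(key, nonce, len(plaintext))))
    tag = hmac.new(key, nonce + ct, hashlib.sha256).digest()
    return nonce + ct + tag

def open_sealed(key: bytes, blob: bytes) -> bytes:
    nonce, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    if not hmac.compare_digest(tag, hmac.new(key, nonce + ct, hashlib.sha256).digest()):
        raise ValueError("ciphertext tampered with")
    return bytes(a ^ b for a, b in zip(ct, _keystream(key, nonce, len(ct))))

app_key = os.urandom(32)  # held by the application, never sent to the database
blob = seal(app_key, b"123-45-6789")
# The server stores and replicates `blob` but cannot recover the SSN without app_key
```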
MongoDB is a server-only database. There is no way to embed it inside your application process.
Application Process MongoDB Server (separate process)
+-----------------+ +----------------------+
| Your App | TCP | mongod |
| (Node/Python) | ------> | (1GB+ RAM minimum) |
| | | (WiredTiger engine) |
+-----------------+ +----------------------+
- Minimum resource: ~1GB RAM for mongod
- Separate deployment, monitoring, upgrades
- Network round-trip for every operation
- Cannot run on IoT/edge devices
# Embedded mode: database runs inside your application process
# No separate server, no network round-trips, no deployment overhead
import psycopg2
# Option 1: Connect to HeliosDB server (like MongoDB)
conn = psycopg2.connect("host=localhost port=5432 dbname=myapp")
# Option 2: HeliosDB as embedded library (no server needed)
# The database engine runs in-process
# 60MB binary, starts in milliseconds
# Perfect for: edge devices, IoT, desktop apps, CI/CD, testing
// TypeScript/Node.js: connect via PostgreSQL protocol
import { Client } from 'pg';
const client = new Client({
host: 'localhost',
port: 5432,
database: 'myapp',
});
await client.connect();
// Every PostgreSQL client library works out of the box
// No MongoDB driver, no Mongoose, no special ODM required
const result = await client.query(
'SELECT * FROM products WHERE data @> $1 LIMIT 10',
[JSON.stringify({ category: 'electronics' })]
);
// Go: standard database/sql with any PostgreSQL driver
package main

import (
    "database/sql"
    "log"

    _ "github.com/lib/pq"
)

func main() {
    db, err := sql.Open("postgres", "host=localhost port=5432 dbname=myapp sslmode=disable")
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    rows, err := db.Query(`
        SELECT name, data->>'category' AS category
        FROM products
        WHERE data @> '{"in_stock": true}'
        ORDER BY (data->>'price')::numeric DESC
        LIMIT 20
    `)
    if err != nil {
        log.Fatal(err)
    }
    defer rows.Close()
    // Standard Go database patterns -- nothing MongoDB-specific to learn
}
Replica Set:
+----------+ +----------+ +----------+
| Primary |---->|Secondary |---->|Secondary |
| (R/W) | | (R/O) | | (R/O) |
+----------+ +----------+ +----------+
Oplog-based async replication
Automatic failover (election)
Sharded Cluster:
+---------+ +---------+ +---------+
| mongos | | mongos | | mongos | (routers)
+----+----+ +----+----+ +----+----+
+------+-----+------+-----+
+----------+ +----------+ +----------+
| Shard 1 | | Shard 2 | | Shard 3 |
| (replica | | (replica | | (replica |
| set) | | set) | | set) |
+----------+ +----------+ +----------+
+ Config servers (another replica set)
Minimum for sharded HA: 13 processes
(3 config, 3x3 shard replicas, 1+ mongos)
Tier 1: WAL Streaming (simple HA)
+----------+ WAL stream +----------+
| Primary | ---------------> | Replica |
| (R/W) | | (R/O) |
+----------+ +----------+
Minimum for HA: 2 nodes
Tier 2: Multi-Primary (write scaling)
+----------+ <-----------> +----------+
| Primary1 | | Primary2 |
| (R/W) | | (R/W) |
+----------+ +----------+
Both nodes accept writes
Tier 3: Sharding (horizontal scaling)
+----------+ +----------+ +----------+
| Shard 1 | | Shard 2 | | Shard 3 |
|(hash/rng | |(hash/rng | |(hash/rng |
| /list) | | /list) | | /list) |
+----------+ +----------+ +----------+
+ Proxy with 13 built-in features
(connection pooling, query routing, failover, etc.)
| Aspect | MongoDB | HeliosDB |
|---|---|---|
| Basic HA | 3 nodes (replica set) | 2 nodes (primary + replica) |
| Sharded HA | 13+ processes | 3+ shards + proxy |
| Replication method | Oplog (async) | WAL streaming (sync or async) |
| Multi-primary | No (single primary per replica set) | Yes (Tier 2) |
| Shard key selection | Must choose carefully, hard to change | Hash, range, or list partitioning |
| Config servers | Separate replica set required | Not needed (proxy handles routing) |
| Resharding | Online (since 6.0, but slow) | Partition management via SQL |
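Toy routing functions for the three partitioning schemes in the table above. The shard count, range bounds, and region map are made-up illustration values, not HeliosDB's router.

```python
# Illustrative routers for hash, range, and list partitioning.
import hashlib

N_SHARDS = 3

def route_hash(key) -> int:
    """Hash partitioning: uniform spread, no locality."""
    digest = hashlib.sha256(str(key).encode()).hexdigest()
    return int(digest, 16) % N_SHARDS

RANGE_BOUNDS = [1000, 5000]  # shard 0: id < 1000, shard 1: id < 5000, shard 2: rest

def route_range(key: int) -> int:
    """Range partitioning: preserves order, good for range scans."""
    for shard, bound in enumerate(RANGE_BOUNDS):
        if key < bound:
            return shard
    return len(RANGE_BOUNDS)

LIST_MAP = {"EU": 0, "US": 1, "APAC": 2}

def route_list(region: str) -> int:
    """List partitioning: explicit placement, e.g. for data residency."""
    return LIST_MAP[region]
```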
For teams migrating from MongoDB, HeliosDB-Full provides a DocumentStore API with familiar concepts:
# HeliosDB-Full DocumentStore: MongoDB-style API
# Collections, queries, aggregation pipeline, validation, change streams
import psycopg2
import json
conn = psycopg2.connect("host=localhost port=5432 dbname=helios")
cur = conn.cursor()
# Create a "collection" (table with JSONB document column)
cur.execute("""
CREATE TABLE IF NOT EXISTS products (
id SERIAL PRIMARY KEY,
doc JSONB NOT NULL,
embedding VECTOR(384),
created_at TIMESTAMPTZ DEFAULT NOW()
)
""")
# Insert documents (like db.collection.insertOne)
cur.execute("""
INSERT INTO products (doc, embedding)
VALUES (%s, %s)
""", (
json.dumps({
"name": "Widget",
"price": 29.99,
"tags": ["hardware"],
"specs": {"weight": 0.5}
}),
"[0.12, -0.45, 0.89]"
))
# Query with containment (like MongoDB $match)
cur.execute("""
SELECT doc->>'name' AS name, doc->>'price' AS price
FROM products
WHERE doc @> '{"tags": ["hardware"]}'
AND (doc->>'price')::numeric < 50
ORDER BY (doc->>'price')::numeric
""")
# Aggregation (like MongoDB aggregation pipeline, but standard SQL)
cur.execute("""
SELECT
doc->>'category' AS category,
COUNT(*) AS count,
AVG((doc->>'price')::numeric) AS avg_price,
jsonb_agg(doc->>'name') AS product_names
FROM products
GROUP BY doc->>'category'
HAVING COUNT(*) > 5
ORDER BY avg_price DESC
""")
conn.commit()
MongoDB is the right choice when:

- Your workload depends on geospatial queries ($geoWithin, $near)

HeliosDB is the right choice when:

- You have hit $lookup limitations and need full SQL JOINs

# Step 1: Export from MongoDB
from pymongo import MongoClient
import psycopg2
import json
mongo = MongoClient('mongodb://localhost:27017')
mongo_db = mongo['myapp']
# Step 2: Create equivalent tables in HeliosDB
pg = psycopg2.connect("host=localhost port=5432 dbname=helios")
cur = pg.cursor()
# Option A: Keep document-style (minimal changes)
cur.execute("""
CREATE TABLE products (
id TEXT PRIMARY KEY,
doc JSONB NOT NULL,
embedding VECTOR(384)
)
""")
# Option B: Extract key fields into typed columns (recommended)
cur.execute("""
CREATE TABLE products (
id TEXT PRIMARY KEY,
name TEXT NOT NULL,
price NUMERIC(10,2) NOT NULL,
category TEXT,
tags TEXT[],
specs JSONB DEFAULT '{}',
embedding VECTOR(384),
created_at TIMESTAMPTZ DEFAULT NOW()
)
""")
# Step 3: Migrate data
for doc in mongo_db.products.find():
# Option A: store entire document as JSONB
doc_id = str(doc.pop('_id'))
cur.execute(
"INSERT INTO products (id, doc) VALUES (%s, %s)",
(doc_id, json.dumps(doc, default=str))
)
pg.commit()
| MongoDB (MQL) | HeliosDB (SQL) |
|---|---|
| db.c.find({x: 1}) | SELECT * FROM c WHERE doc->>'x' = '1' |
| db.c.find({x: {$gt: 5}}) | SELECT * FROM c WHERE (doc->>'x')::int > 5 |
| db.c.find({tags: {$in: ["a"]}}) | SELECT * FROM c WHERE doc->'tags' ? 'a' |
| db.c.find({"a.b": 1}) | SELECT * FROM c WHERE doc->'a'->>'b' = '1' |
| db.c.aggregate([{$group: ...}]) | SELECT ... GROUP BY ... HAVING ... |
| db.c.aggregate([{$lookup: ...}]) | SELECT ... JOIN ... ON ... |
| db.c.countDocuments({x: 1}) | SELECT COUNT(*) FROM c WHERE doc->>'x' = '1' |
| db.c.distinct("x") | SELECT DISTINCT doc->>'x' FROM c |
| db.c.updateOne({_id: 1}, {$set: {x: 2}}) | UPDATE c SET doc = jsonb_set(doc, '{x}', '2') WHERE id = '1' |
| db.c.deleteMany({status: "old"}) | DELETE FROM c WHERE doc->>'status' = 'old' |
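The first few cheat-sheet patterns are mechanical enough to automate. Here is a toy translator for flat MQL filters (equality plus $gt/$gte/$lt/$lte); it is illustrative only and interpolates values directly into the string, so real code must use bind parameters instead.

```python
# Toy MQL-to-SQL translator for flat filters. NOT injection-safe:
# values are interpolated for readability; use parameterized queries in practice.
OPS = {"$gt": ">", "$gte": ">=", "$lt": "<", "$lte": "<="}

def mql_to_where(filter_doc: dict) -> str:
    clauses = []
    for field, cond in filter_doc.items():
        if isinstance(cond, dict):
            for op, value in cond.items():
                # comparison operators need a cast from JSONB text to numeric
                clauses.append(f"(doc->>'{field}')::numeric {OPS[op]} {value}")
        else:
            clauses.append(f"doc->>'{field}' = '{cond}'")
    return " AND ".join(clauses)

# mql_to_where({"x": 1}) -> "doc->>'x' = '1'"
```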
| Dimension | MongoDB | HeliosDB |
|---|---|---|
| Best at | Schema-flexible document storage at scale | Unified documents + relations + vectors + time-travel |
| Query language | MQL (MongoDB-specific) | Standard SQL (PostgreSQL-compatible) |
| Document support | Native BSON documents | JSONB with 20+ functions + typed columns |
| Vector search | Atlas add-on (cloud only) | Native HNSW + PQ + SIMD (everywhere) |
| Full-text search | Atlas Search (Lucene add-on) | Native @@ operator (integrated with optimizer) |
| JOINs | $lookup (limited) | Full SQL JOINs (all types including LATERAL) |
| Transactions | Multi-document (since 4.0, no savepoints) | Full MVCC with savepoints and nested rollback |
| Time-travel | Not supported | AT TIME queries on any table |
| Branching | Not supported | Git-like CREATE/MERGE/DROP BRANCH |
| Encryption | Enterprise-only at-rest + CSFLE | TDE + ZKE in all editions |
| Embedded mode | Not available | 60MB binary, in-process |
| Ecosystem | MongoDB drivers, Mongoose, Atlas cloud | PostgreSQL drivers, any PG tool, REST, gRPC |
MongoDB is a proven, mature document database with a vast ecosystem and a strong managed cloud offering in Atlas. HeliosDB is for teams who have outgrown the "one database per concern" architecture and want documents, relations, vectors, full-text search, time-travel, and branching in a single ACID-compliant engine that speaks PostgreSQL.
Get started with HeliosDB in minutes. Open source, free to use.