HeliosDB vs MinIO

MinIO is the gold standard for self-hosted S3-compatible object storage -- fast, reliable, and battle-tested at petabyte scale. But storing objects and understanding them are two very different problems. HeliosDB is a unified database engine that combines SQL, full-text search, vector search, ACID transactions, and AI-native capabilities in a single binary. When your application needs to query, search, and reason about the data inside those objects, MinIO alone falls short -- and HeliosDB fills that gap. Better yet, HeliosDB-Full can query MinIO directly via its S3 Foreign Data Wrapper, giving you the best of both worlds.

Quick Comparison

Feature	MinIO	HeliosDB
Primary purpose	S3-compatible object/blob storage	Unified SQL database + documents + vectors
Query language	S3 Select (CSV/JSON/Parquet filtering)	Full SQL (JOINs, CTEs, window functions, subqueries)
Full-text search	Not built-in (requires OpenSearch/MeiliSearch)	Native `@@` operator with boolean, phrase, fuzzy
Vector search	Not supported (requires Milvus/Weaviate)	Native HNSW indexes with SIMD + Product Quantization
ACID transactions	Single-object atomicity only	Full multi-statement ACID with MVCC
Schema enforcement	None (opaque blobs)	Typed columns, constraints, foreign keys
Encryption	SSE-S3, SSE-KMS, SSE-C (object-level)	TDE (AES-256-GCM) + Zero-Knowledge Encryption
Data branching	Not supported	Git-like branching with merge
Time-travel queries	Object versioning (linear)	MVCC snapshots at any timestamp
Replication	Erasure coding + site replication	WAL streaming, multi-primary, sharding
Deployment size	60MB binary, 4+ nodes for HA	60MB binary, single-node to sharded cluster
Wire protocol	S3 API (REST)	PostgreSQL wire protocol + REST + gRPC

The Architecture Problem

With MinIO: The Object Storage Stack

When you need to search, query, or analyze data stored in MinIO, you end up building a pipeline of external services:

+--------------+     +--------------+     +--------------+
|   MinIO      |---->|  ETL / Spark |---->|  Data        |
|  (storage)   |     |  (transform) |     |  Warehouse   |
+--------------+     +--------------+     +--------------+
       |                                         |
       |              +--------------+           |
       +----------->  |  OpenSearch  |<----------+
                      |  (FTS)       |
                      +--------------+
       |              +--------------+
       +----------->  |  Milvus      |
                      |  (vectors)   |
                      +--------------+

 5 services to deploy, monitor, and keep in sync
 No ACID guarantees across the pipeline
 Data freshness measured in minutes to hours

With HeliosDB: One Engine, Everything Built In

+---------------------------------------------+
|                 HeliosDB                     |
|                                             |
|  +---------+  +------+  +--------------+   |
|  |   SQL   |  |  FTS |  | Vector Search|   |
|  |  Engine |  |  @@  |  |   HNSW+PQ    |   |
|  +----+----+  +--+---+  +------+-------+   |
|       |          |              |            |
|  +----+----------+--------------+----------+ |
|  |        MVCC Storage Engine              | |
|  |    (ACID + TDE + WAL + Branching)       | |
|  +-----------------------------------------+ |
|                                             |
|  Optional: S3 FDW --> MinIO (bulk storage) |
+---------------------------------------------+

 1 binary, everything ACID-compliant
 Query, search, and analyze in one SQL statement
 Real-time: data available instantly on INSERT

S3 Select vs Full SQL Optimizer

MinIO supports S3 Select for basic filtering of CSV, JSON, and Parquet objects. HeliosDB provides a complete SQL engine with a cost-based optimizer, parallel execution, and 69+ plan node types.

MinIO: S3 Select

import boto3

s3 = boto3.client('s3', endpoint_url='http://minio:9000',
                  aws_access_key_id='minioadmin',
                  aws_secret_access_key='minioadmin')

# S3 Select: filter a single CSV object
result = s3.select_object_content(
    Bucket='sales-data',
    Key='2026/01/transactions.csv',
    Expression="SELECT s.amount FROM s3object s WHERE s.region = 'EU'",
    ExpressionType='SQL',
    InputSerialization={'CSV': {'FileHeaderInfo': 'USE'}},
    OutputSerialization={'CSV': {}}
)

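# The response is an event stream; matching rows arrive in 'Records' events
for event in result['Payload']:
    if 'Records' in event:
        print(event['Records']['Payload'].decode('utf-8'))

# Limitations:
#  - Cannot JOIN across objects
#  - No GROUP BY (SUM/AVG/COUNT apply only to the whole object)
#  - Cannot use subqueries or CTEs
#  - One object at a time
#  - No window functions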
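
Because S3 Select reads one object per request, combining two objects means fetching each result and joining them in application code. A minimal sketch of that client-side join follows, using in-memory CSV strings as stand-ins for two Select responses. The sample data and the `read_rows` and `join_on` helpers are hypothetical and only illustrate the extra work involved.

```python
import csv
import io

# Stand-ins for the bodies of two separate S3 Select responses (hypothetical data).
transactions_csv = "region,amount\nEU,100.0\nUS,250.0\nEU,50.0\n"
regions_csv = "region,manager\nEU,Ines\nUS,Omar\n"


def read_rows(text):
    """Parse CSV text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def join_on(left, right, key):
    """Hash join: index the right side by key, then probe with each left row."""
    index = {row[key]: row for row in right}
    return [{**lrow, **index[lrow[key]]} for lrow in left if lrow[key] in index]


joined = join_on(read_rows(transactions_csv), read_rows(regions_csv), "region")

# Grouping happens in application code, one more step the client has to own.
totals = {}
for row in joined:
    totals[row["manager"]] = totals.get(row["manager"], 0.0) + float(row["amount"])

print(totals)
```

Every additional object adds another round trip and another in-memory index, which is the work a SQL engine's planner would otherwise take on.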

HeliosDB: Full SQL with Optimizer

-- Connect via any PostgreSQL client (psql, pgAdmin, DBeaver, etc.)

-- Complex analytics query: JOINs, CTEs, window functions, aggregation
WITH monthly_sales AS (
    SELECT
        region,
        date_trunc('month', sale_date) AS month,
        SUM(amount) AS total,
        COUNT(*) AS txn_count
    FROM transactions
    WHERE sale_date >= '2026-01-01'
    GROUP BY region, date_trunc('month', sale_date)
),
ranked AS (
    SELECT *,
        RANK() OVER (PARTITION BY month ORDER BY total DESC) AS region_rank,
        LAG(total) OVER (PARTITION BY region ORDER BY month) AS prev_month
    FROM monthly_sales
)
SELECT
    region,
    month,
    total,
    txn_count,
    region_rank,
    ROUND((total - prev_month) / NULLIF(prev_month, 0) * 100, 2) AS growth_pct
FROM ranked
WHERE region_rank <= 3
ORDER BY month, region_rank;

-- HeliosDB-Full: Query Parquet files on MinIO via S3 Foreign Data Wrapper
CREATE FOREIGN TABLE minio_logs (
    timestamp TIMESTAMPTZ,
    user_id   INTEGER,
    action    TEXT,
    payload   JSONB
) SERVER s3_server
OPTIONS (
    bucket   'analytics',
    key      'logs/2026/*.parquet',
    format   'parquet'
);

-- JOIN MinIO data with local tables in one query
SELECT u.name, COUNT(*) AS actions, MAX(l.timestamp) AS last_seen
FROM minio_logs l
JOIN users u ON u.id = l.user_id
WHERE l.timestamp > NOW() - INTERVAL '7 days'
GROUP BY u.name
ORDER BY actions DESC
LIMIT 20;
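
To experiment with the CTE and window-function pattern shown earlier without a HeliosDB instance, the same shape of query can be run against SQLite from Python's standard library. This is only a stand-in and says nothing about HeliosDB's planner. SQLite has no `date_trunc`, so the sketch uses `strftime`, the table contents are made up, and window functions require SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (region TEXT, sale_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [
        ("EU", "2026-01-10", 100.0),
        ("EU", "2026-02-10", 150.0),
        ("US", "2026-01-15", 300.0),
        ("US", "2026-02-15", 270.0),
    ],
)

rows = conn.execute(
    """
    WITH monthly_sales AS (
        SELECT region,
               strftime('%Y-%m', sale_date) AS month,  -- SQLite stand-in for date_trunc
               SUM(amount) AS total
        FROM transactions
        GROUP BY region, month
    ),
    ranked AS (
        SELECT *,
               RANK() OVER (PARTITION BY month ORDER BY total DESC) AS region_rank,
               LAG(total) OVER (PARTITION BY region ORDER BY month) AS prev_month
        FROM monthly_sales
    )
    SELECT region, month, total, region_rank,
           -- NULLIF guards against division by zero; the first month has no prev_month
           ROUND((total - prev_month) / NULLIF(prev_month, 0) * 100, 2) AS growth_pct
    FROM ranked
    ORDER BY month, region_rank
    """
).fetchall()

for row in rows:
    print(row)
```

The first month of each region has no previous value, so `growth_pct` is NULL there rather than an error.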

Encryption: SSE vs TDE + Zero-Knowledge

MinIO: Server-Side Encryption

MinIO offers three encryption modes, all operating at the object level:

Mode	Key Management	Scope
SSE-S3	MinIO-managed keys	Per-object
SSE-KMS	External KMS (Vault, AWS KMS)	Per-object
SSE-C	Client-provided keys per request	Per-object

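# MinIO SSE-C: client must send key with every request
import io
import os

from minio import Minio
from minio.sse import SseCustomerKey

key = os.urandom(32)  # 256-bit key; the application must store it safely
sse = SseCustomerKey(key)

data = io.BytesIO(b'example document bytes')  # placeholder content
length = data.getbuffer().nbytes

# SSE-C is only accepted over TLS, so the client must connect with secure=True
client = Minio('minio:9000', access_key='admin', secret_key='password', secure=True)
client.put_object('secure-bucket', 'doc.pdf', data, length, sse=sse)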

# Problem: lose the key, lose the data
# Problem: key transmitted over the wire on every request
# Problem: no encryption of indexes or metadata queries
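
The "key on every request" point follows from how SSE-C works in the S3 API: the client sends the base64-encoded 256-bit key, plus a base64-encoded MD5 digest of it, as request headers. A minimal sketch of building those headers with the standard library is shown here. The `sse_c_headers` helper is illustrative and not part of any SDK; in practice the client library does this for you.

```python
import base64
import hashlib
import os


def sse_c_headers(key: bytes) -> dict:
    """Build the S3 SSE-C request headers for a 256-bit customer key."""
    if len(key) != 32:
        raise ValueError("SSE-C requires a 32-byte (256-bit) key")
    return {
        "X-Amz-Server-Side-Encryption-Customer-Algorithm": "AES256",
        "X-Amz-Server-Side-Encryption-Customer-Key": base64.b64encode(key).decode(),
        "X-Amz-Server-Side-Encryption-Customer-Key-MD5": base64.b64encode(
            hashlib.md5(key).digest()
        ).decode(),
    }


key = os.urandom(32)
headers = sse_c_headers(key)
for name, value in headers.items():
    print(name, "=", value)
```

Since the raw key travels in a header, the transport has to be TLS, and any proxy that logs request headers becomes part of the key-handling surface.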

HeliosDB: Transparent Data Encryption + Zero-Knowledge Encryption

-- TDE: everything encrypted at rest automatically (AES-256-GCM)
-- WAL entries, data pages, indexes -- all encrypted transparently
-- No application code changes needed

INSERT INTO medical_records (patient_id, diagnosis, notes)
VALUES (42, 'confidential', 'encrypted at rest automatically');

-- Queries work normally -- decryption is transparent
SELECT * FROM medical_records WHERE patient_id = 42;

-- Zero-Knowledge Encryption: even the DB admin cannot read data
-- Application holds the key; HeliosDB never sees plaintext
-- Ring + AWS-LC FIPS-validated cryptographic providers

Aspect	MinIO SSE	HeliosDB TDE + ZKE
Granularity	Per-object	Entire database (pages, WAL, indexes)
Transparency	Application must specify per request	Fully transparent to queries
Index encryption	No (metadata in plaintext)	Yes (encrypted indexes)
Key rotation	Manual per-object re-encryption	Online key rotation
Zero-knowledge option	No	Yes (application-held keys)
FIPS compliance	Depends on KMS	Ring + AWS-LC FIPS providers
Algorithm	AES-256 (under SSE-C the client supplies the key, not the algorithm)	AES-256-GCM

Document Intelligence: Opaque Blobs vs Queryable Data

MinIO: Store and Retrieve

# Upload a JSON document to MinIO
import io, json

payload = json.dumps(invoice).encode()
client.put_object('documents', 'invoice-1234.json',
    io.BytesIO(payload), len(payload),  # put_object needs the byte length
    content_type='application/json')

# To search this document, you need:
# 1. Download it
# 2. Parse it
# 3. Index it in OpenSearch/MeiliSearch
# 4. Query the search engine
# 5. Fetch the matching objects back from MinIO
# Total: 5 steps, 3 services, no ACID guarantees

HeliosDB: Store, Query, Search, Analyze -- All in SQL

-- Store the document as JSONB with full queryability
INSERT INTO invoices (id, data, embedding)
VALUES (
    1234,
    '{"vendor": "Acme Corp", "total": 4500.00,
      "items": [{"desc": "Widget", "qty": 100, "price": 45.00}],
      "tags": ["hardware", "bulk"]}',
    '[0.12, -0.45, 0.89, ...]'::vector(384)
);

-- Query fields inside the JSON document
SELECT
    data->>'vendor' AS vendor,
    (data->>'total')::numeric AS total,
    jsonb_array_length(data->'items') AS line_items
FROM invoices
WHERE data @> '{"tags": ["hardware"]}'
  AND (data->>'total')::numeric > 1000
ORDER BY total DESC;

-- Full-text search across all invoice data
SELECT id, data->>'vendor' AS vendor
FROM invoices
WHERE data::text @@ 'widget AND bulk';

-- Vector similarity search (find semantically similar invoices)
SELECT id, data->>'vendor' AS vendor,
       embedding <-> '[0.11, -0.44, 0.88, ...]'::vector(384) AS distance
FROM invoices
ORDER BY distance
LIMIT 5;

-- Hybrid search: combine text + vector + SQL filters
SELECT id, data->>'vendor' AS vendor
FROM invoices
WHERE data::text @@ 'hardware'
  AND embedding <-> '[0.11, -0.44, 0.88, ...]'::vector(384) < 0.3
  AND (data->>'total')::numeric > 500
ORDER BY embedding <-> '[0.11, -0.44, 0.88, ...]'::vector(384);
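
Conceptually, the hybrid query applies a text filter, a distance threshold, and a SQL predicate, then orders by vector distance. A brute-force sketch of that logic in Python follows. It assumes `<->` denotes Euclidean (L2) distance, as it does in pgvector; HeliosDB's operator semantics are not specified here. The rows are hypothetical, the vectors are 3-dimensional for readability, and a naive word match stands in for full-text search.

```python
import math

# Hypothetical rows: (id, vendor, text, total, embedding)
invoices = [
    (1, "Acme Corp", "widget hardware bulk", 4500.0, [0.12, -0.45, 0.89]),
    (2, "Bolt Ltd", "hardware fasteners", 800.0, [0.10, -0.40, 0.90]),
    (3, "Cloud Inc", "software license", 9000.0, [0.11, -0.44, 0.88]),
    (4, "Drill Co", "hardware tools", 300.0, [0.11, -0.44, 0.88]),
]


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hybrid_search(rows, term, query_vec, max_distance, min_total):
    """Keep rows passing all three filters, ordered by ascending distance."""
    hits = []
    for row_id, vendor, text, total, emb in rows:
        dist = l2_distance(emb, query_vec)
        if term in text.split() and dist < max_distance and total > min_total:
            hits.append((dist, row_id, vendor))
    return [(row_id, vendor) for _, row_id, vendor in sorted(hits)]


results = hybrid_search(invoices, "hardware", [0.11, -0.44, 0.88], 0.3, 500)
print(results)
```

A real engine would use an index such as HNSW for the vector part instead of scanning every row; the sketch only shows how the three conditions combine.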

Replication and High Availability

MinIO: Erasure Coding + Site Replication

MinIO Cluster (4-32 nodes)
+-----+  +-----+  +-----+  +-----+
| N1  |  | N2  |  | N3  |  | N4  |
|EC 4 |  |EC 4 |  |EC 4 |  |EC 4 |
+--+--+  +--+--+  +--+--+  +--+--+
   +----+----+----+----+----+--+
        |  Erasure Set (16 drives)
        |  Survives N/2 drive failures

Site Replication:
  Site A <--------> Site B
  (async bucket/object/IAM sync)

Erasure coding: tolerates up to N/2 drive failures per set at maximum parity (fewer at lower parity settings)
Site replication: async, bucket-level, eventual consistency
No cross-object transactions during replication
Minimum 4 nodes for production HA
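
The trade-off behind these numbers can be made concrete with a little arithmetic. The sketch below assumes an erasure set of `n_drives` with `parity` parity shards per object, and a quorum rule as commonly described for MinIO: reads need `n_drives - parity` drives, and writes need the same, plus one when parity is exactly half the set. The `erasure_set_profile` helper is hypothetical, and its quorum rule should be checked against the MinIO documentation for the version in use.

```python
def erasure_set_profile(n_drives: int, parity: int) -> dict:
    """Summarize capacity and failure tolerance for one erasure set."""
    if not 0 < parity <= n_drives // 2:
        raise ValueError("parity must be between 1 and n_drives // 2")
    data = n_drives - parity
    read_quorum = data
    # Assumed rule: with parity at exactly half the set, writes need one extra drive.
    write_quorum = data + 1 if parity * 2 == n_drives else data
    return {
        "usable_fraction": data / n_drives,
        "read_failures_tolerated": n_drives - read_quorum,
        "write_failures_tolerated": n_drives - write_quorum,
    }


for parity in (4, 8):
    print(parity, erasure_set_profile(16, parity))
```

Higher parity buys more failure tolerance at the cost of usable capacity, which is why the N/2 figure applies only at the maximum parity setting.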

HeliosDB: 3-Tier Replication Architecture

Tier 1: WAL Streaming (Primary -> Replica)
+----------+    WAL Stream    +----------+
| Primary  | ---------------> | Replica  |
| (R/W)    |                  | (R/O)    |
+----------+                  +----------+
  Automatic failover, zero data loss

Tier 2: Multi-Primary (All nodes accept writes)
+----------+ <-----------> +----------+
| Primary1 |              | Primary2 |
| (R/W)    |              | (R/W)    |
+----------+              +----------+
  Conflict-free replication, write anywhere

Tier 3: Sharding (Horizontal partitioning)
+----------+  +----------+  +----------+
| Shard 1  |  | Shard 2  |  | Shard 3  |
| keys A-M |  | keys N-S |  | keys T-Z |
+----------+  +----------+  +----------+
  Linear write scaling by shard key
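
Range-based routing of the kind drawn in Tier 3 can be sketched in a few lines. This is an illustration of the general technique, not HeliosDB's router. The `route` function and the boundary lists are hypothetical and simply mirror the A-M, N-S, T-Z ranges in the diagram.

```python
import bisect

# Inclusive upper bound of each shard's key range, mirroring the diagram.
SHARD_UPPER_BOUNDS = ["M", "S", "Z"]
SHARD_NAMES = ["Shard 1", "Shard 2", "Shard 3"]


def route(key: str) -> str:
    """Return the shard whose range contains the key's first letter."""
    first = key[:1].upper()
    if not ("A" <= first <= "Z"):
        raise ValueError(f"key {key!r} is outside the A-Z ranges")
    # bisect_left finds the first boundary that is >= the key's first letter.
    return SHARD_NAMES[bisect.bisect_left(SHARD_UPPER_BOUNDS, first)]


for key in ("alice", "nadia", "tom", "Sam"):
    print(key, "->", route(key))
```

Range sharding keeps neighbouring keys together, which helps range scans, but skewed key distributions can overload a single shard.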

Aspect	MinIO	HeliosDB
Data protection	Erasure coding (N/2 failures)	WAL + replicas + fsync
Multi-site	Site replication (async)	WAL streaming or multi-primary
Consistency	Eventual (site replication)	Strong (WAL streaming) or causal (multi-primary)
Write scaling	All nodes accept writes	Multi-primary or sharding
Minimum HA nodes	4	2 (primary + replica)
Failover	Manual or load-balancer	Automatic with proxy (13 features)
Transaction safety	Single-object	Full ACID across replicas

AI and RAG Pipelines

MinIO: Storage Layer for AI

# Typical MinIO + AI pipeline (5 services required)

# 1. Store documents in MinIO
client.put_object('docs', 'manual.pdf', pdf_data, len(pdf_data))

# 2. Extract text (external service: Tika, Textract, etc.)
text = tika_client.extract('s3://docs/manual.pdf')

# 3. Chunk text (external: LangChain, LlamaIndex)
chunks = text_splitter.split(text)

# 4. Generate embeddings (external: OpenAI API)
embeddings = openai.embeddings.create(input=chunks, model='text-embedding-3-small')

# 5. Store vectors (external: Milvus/Weaviate using MinIO as backend)
milvus_client.insert(collection, embeddings)

# 6. Search requires querying Milvus, then fetching from MinIO
results = milvus_client.search(query_embedding, limit=5)
docs = [client.get_object('docs', r.key) for r in results]
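
The chunking step in the pipeline is usually the simplest piece to reason about. A minimal character-based splitter with overlap is sketched below. The `split_text` function and its defaults are hypothetical; libraries such as LangChain or LlamaIndex provide token-aware and structure-aware splitters that are usually a better fit in production.

```python
def split_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into fixed-size chunks where each chunk repeats the
    last `overlap` characters of the previous one."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop once the remaining tail is already covered by the previous chunk.
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


sample = "".join(chr(97 + i % 26) for i in range(500))
chunks = split_text(sample, chunk_size=200, overlap=40)
print([len(c) for c in chunks])
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk, at the cost of embedding some text twice.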

HeliosDB: Integrated RAG in SQL

-- Everything in one database, one transaction

-- 1. Store documents with embeddings
INSERT INTO knowledge_base (title, content, embedding, metadata)
VALUES (
    'Product Manual v3',
    'HeliosDB supports ACID transactions with MVCC...',
    '[0.12, -0.33, 0.78, ...]'::vector(384),
    '{"category": "docs", "version": 3}'
);

-- 2. RAG retrieval: hybrid search (FTS + vector + metadata filter)
SELECT title, content,
       embedding <-> $1::vector(384) AS semantic_distance
FROM knowledge_base
WHERE content @@ 'ACID transactions'
  AND metadata @> '{"category": "docs"}'
ORDER BY embedding <-> $1::vector(384)
LIMIT 5;

-- 3. All results are ACID-consistent, encrypted at rest,
--    and available for time-travel queries
SELECT title, content
FROM knowledge_base AT TIME '2026-01-15 00:00:00'
WHERE content @@ 'transactions';

# Python: simple RAG with HeliosDB over the PostgreSQL wire protocol (psycopg2)
import psycopg2

conn = psycopg2.connect("host=localhost port=5432 dbname=helios")
cur = conn.cursor()

# Single query for full-text match + semantic distance filter
cur.execute("""
    SELECT title, content
    FROM knowledge_base
    WHERE content @@ %s
      AND embedding <-> %s::vector(384) < 0.5
    ORDER BY embedding <-> %s::vector(384)
    LIMIT 5
""", (user_query, query_embedding, query_embedding))

context_docs = cur.fetchall()
# Feed directly to LLM -- no intermediate services

Time-Travel and Data Branching

MinIO provides linear object versioning -- you can retrieve a previous version of an object by its version ID. HeliosDB provides MVCC time-travel queries and git-like data branching.
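
The idea behind a time-travel read can be sketched with a toy multi-version store: every write keeps its timestamp, and a read "as of" some moment returns the newest version written at or before it. This is a conceptual illustration only, not HeliosDB's storage engine. The `VersionedStore` class is hypothetical, and ISO date strings are used as timestamps because they sort lexicographically.

```python
import bisect


class VersionedStore:
    """Toy multi-version store: writes append versions, reads pick one by time."""

    def __init__(self):
        self._timestamps = {}  # key -> ascending list of write timestamps
        self._values = {}      # key -> values, parallel to the timestamps

    def put(self, key, value, ts):
        stamps = self._timestamps.setdefault(key, [])
        if stamps and ts <= stamps[-1]:
            raise ValueError("timestamps must increase per key")
        stamps.append(ts)
        self._values.setdefault(key, []).append(value)

    def get(self, key, as_of):
        """Return the newest value written at or before `as_of`, else None."""
        stamps = self._timestamps.get(key, [])
        idx = bisect.bisect_right(stamps, as_of)
        return self._values[key][idx - 1] if idx else None


store = VersionedStore()
store.put("balance:42", 100, "2026-01-01")
store.put("balance:42", 250, "2026-01-20")

print(store.get("balance:42", "2026-01-15"))
print(store.get("balance:42", "2026-02-01"))
```

Object versioning gives the same per-key history, but a snapshot read applies one timestamp consistently across every key, which is what makes joins between two points in time meaningful.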

MinIO: Linear Object Versions

# List versions of a single object
versions = client.list_objects('bucket', 'key', include_version=True)
for v in versions:
    print(v.version_id, v.last_modified)

# Retrieve a specific version
obj = client.get_object('bucket', 'key', version_id='abc-123')

HeliosDB: Time-Travel + Branching

-- Time-travel: query any table at any point in time
SELECT * FROM users AT TIME '2026-02-01 00:00:00';

-- Compare data between two timestamps
SELECT
    a.user_id,
    a.balance AS balance_before,
    b.balance AS balance_after,
    b.balance - a.balance AS change
FROM users AT TIME '2026-01-01' a
JOIN users AT TIME '2026-02-01' b ON a.user_id = b.user_id
WHERE b.balance != a.balance;

-- Git-like branching: create isolated data environment
CREATE BRANCH experiment FROM main;
CHECKOUT experiment;

-- Make changes without affecting production
UPDATE pricing SET price = price * 1.1 WHERE category = 'premium';
INSERT INTO promotions VALUES ('SPRING26', 0.15, '2026-03-01', '2026-04-01');

-- Analyze impact
SELECT category, AVG(price) AS avg_price FROM pricing GROUP BY category;

-- Happy with results? Merge back
CHECKOUT main;
MERGE experiment INTO main;

-- Not happy? Just drop the branch -- production is untouched
DROP BRANCH experiment;
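
The branch workflow above behaves like a copy-on-write overlay: reads fall through to the parent until the branch writes its own value, and a merge copies the branch's changes back. A toy sketch of that behaviour follows. The `Branch` class is hypothetical, merges are last-writer-wins with no conflict detection, and nothing here reflects how HeliosDB implements branching internally.

```python
class Branch:
    """Toy copy-on-write branch: reads fall through to the parent, writes stay local."""

    def __init__(self, parent=None):
        self.parent = parent
        self.changes = {}

    def get(self, key):
        if key in self.changes:
            return self.changes[key]
        return self.parent.get(key) if self.parent else None

    def set(self, key, value):
        self.changes[key] = value

    def merge_into_parent(self):
        """Apply this branch's changes to its parent (last writer wins)."""
        if self.parent is None:
            raise ValueError("the root branch has no parent to merge into")
        self.parent.changes.update(self.changes)
        self.changes.clear()


main = Branch()
main.set("premium_price", 100.0)

experiment = Branch(parent=main)
experiment.set("premium_price", round(main.get("premium_price") * 1.1, 2))

before_merge = main.get("premium_price")  # main is untouched while the branch is open
experiment.merge_into_parent()
after_merge = main.get("premium_price")
print(before_merge, after_merge)
```

Dropping the branch instead of merging amounts to discarding `experiment.changes`, which is why the parent is never affected.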

Better Together: HeliosDB + MinIO

HeliosDB-Full can use MinIO as an external storage tier via its S3 Foreign Data Wrapper. This gives you the best of both worlds: MinIO for cost-effective petabyte blob storage, HeliosDB for intelligent querying.

+---------------------------------------------+
|              HeliosDB-Full                   |
|                                             |
|  SQL Engine --> S3 FDW --> MinIO Cluster    |
|      |                        |             |
|      |  Predicate pushdown    |  Parquet    |
|      |  Projection pruning    |  CSV        |
|      |  Cost-based planning   |  JSON       |
|      |                        |             |
|  Local tables <-- JOIN --> Remote data      |
|  (ACID, fast)              (bulk, cheap)    |
+---------------------------------------------+
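
Predicate pushdown and projection pruning both aim to move less data from the remote store. The sketch below simulates a remote scan and counts how many values cross the wire with and without pushdown. The `remote_scan` function and the sample rows are hypothetical stand-ins for Parquet data on MinIO; the sketch illustrates the principle and does not describe the FDW's actual behaviour.

```python
# Hypothetical "remote" rows, standing in for records stored in Parquet on MinIO.
REMOTE_ROWS = [
    {"order_id": 1, "customer_id": 7, "total": 120.0, "status": "shipped"},
    {"order_id": 2, "customer_id": 8, "total": 40.0, "status": "cancelled"},
    {"order_id": 3, "customer_id": 7, "total": 310.0, "status": "shipped"},
]


def remote_scan(columns=None, predicate=None):
    """Simulate a remote scan; returns (rows, number_of_values_transferred)."""
    rows = [r for r in REMOTE_ROWS if predicate is None or predicate(r)]
    if columns is not None:
        rows = [{c: r[c] for c in columns} for r in rows]
    return rows, sum(len(r) for r in rows)


# Without pushdown: fetch everything, then filter and project locally.
all_rows, cost_naive = remote_scan()
local = [{"total": r["total"]} for r in all_rows if r["status"] == "shipped"]

# With pushdown: the filter and the column list travel to the scan.
pushed, cost_pushed = remote_scan(
    columns=["total"], predicate=lambda r: r["status"] == "shipped"
)

print(local == pushed, cost_naive, cost_pushed)
```

Both paths return the same rows; the difference is how much data the scan had to ship to produce them.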

-- Define MinIO as a foreign server
CREATE SERVER minio_archive
TYPE 's3'
OPTIONS (
    endpoint   'http://minio-cluster:9000',
    access_key 'helios-service',
    secret_key 'secret',
    region     'us-east-1'
);

-- Map Parquet files as a foreign table
CREATE FOREIGN TABLE archived_orders (
    order_id    INTEGER,
    customer_id INTEGER,
    total       NUMERIC,
    order_date  DATE,
    status      TEXT
) SERVER minio_archive
OPTIONS (
    bucket 'data-warehouse',
    key    'orders/year=2025/*.parquet',
    format 'parquet'
);

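-- Query across local (hot) and MinIO (cold) data seamlessly.
-- Each side is aggregated before the join so the two one-to-many
-- joins cannot multiply each other's rows and inflate the sums.
WITH archived AS (
    SELECT customer_id, SUM(total) AS total_2025
    FROM archived_orders
    GROUP BY customer_id
),
recent AS (
    SELECT customer_id, SUM(total) AS total_2026
    FROM orders
    WHERE order_date >= '2026-01-01'
    GROUP BY customer_id
)
SELECT
    c.name,
    c.tier,
    COALESCE(a.total_2025, 0) AS total_2025,
    COALESCE(r.total_2026, 0) AS total_2026
FROM customers c
LEFT JOIN archived a ON a.customer_id = c.id
LEFT JOIN recent r ON r.customer_id = c.id
ORDER BY COALESCE(a.total_2025, 0) + COALESCE(r.total_2026, 0) DESC
LIMIT 50;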

When to Choose MinIO

MinIO is the right choice when:

Petabyte-scale blob storage -- You need to store and serve terabytes to petabytes of files (images, videos, backups, logs) at low cost
S3 API compatibility -- Your existing tools, CI/CD pipelines, and applications already speak S3
Write-once, read-many -- Your workload is primarily storing immutable objects and retrieving them by key
Data lake foundation -- You are building a data lake where Spark, Trino, or Presto handle the query layer
Large binary files -- Video processing, ML model artifacts, container images, and backup archives
Erasure-coded durability -- You want drive-failure tolerance without full replication overhead

When to Choose HeliosDB

HeliosDB is the right choice when:

You need to query your data -- SQL with JOINs, CTEs, window functions, aggregations, and subqueries
ACID transactions matter -- Multi-statement transactions with rollback, savepoints, and isolation
Search is a core feature -- Full-text search and/or vector similarity search built into the database
AI/RAG applications -- Store documents, embeddings, and metadata together with hybrid search
Embedded or edge deployment -- A single 60MB binary that runs anywhere, including in-process
Time-travel and branching -- Query historical data, create isolated branches for testing or experimentation
Schema enforcement -- Typed columns, constraints, foreign keys, triggers, and row-level security
PostgreSQL compatibility -- Connect with psql, pgAdmin, DBeaver, any PG driver -- zero application changes
Encryption requirements -- TDE encrypts everything at rest; ZKE ensures even admins cannot read data

Migration Path

From MinIO to HeliosDB

If you are currently storing queryable data (JSON, CSV, metadata) in MinIO and querying it with external tools:

# Step 1: Export from MinIO
import json

import boto3
import psycopg2

# Credentials are read from the environment (AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY)
s3 = boto3.client('s3', endpoint_url='http://minio:9000')
# list_objects_v2 returns at most 1,000 keys per call, so paginate
pages = s3.get_paginator('list_objects_v2').paginate(Bucket='documents')

# Step 2: Load into HeliosDB via PostgreSQL protocol
conn = psycopg2.connect("host=localhost port=5432 dbname=helios")
cur = conn.cursor()

for page in pages:
    for obj in page.get('Contents', []):
        body = s3.get_object(Bucket='documents', Key=obj['Key'])
        data = json.loads(body['Body'].read())  # fails fast on invalid JSON

        cur.execute("""
            INSERT INTO documents (key, data, created_at)
            VALUES (%s, %s, %s)
        """, (obj['Key'], json.dumps(data), obj['LastModified']))

conn.commit()
conn.close()

Keep Both: MinIO for Blobs, HeliosDB for Intelligence

-- HeliosDB-Full: query MinIO data without moving it
CREATE FOREIGN TABLE minio_logs
    SERVER minio_server
    OPTIONS (bucket 'logs', key '2026/*.parquet', format 'parquet');

-- Local metadata + remote blobs in one query
SELECT m.filename, m.upload_date, l.error_count
FROM file_metadata m
JOIN minio_logs l ON l.filename = m.filename
WHERE l.error_count > 0
ORDER BY l.error_count DESC;

Summary

Dimension	MinIO	HeliosDB
Best at	Storing petabytes of blobs cheaply	Querying, searching, and analyzing data intelligently
Data model	Opaque objects in buckets	Tables + JSONB + vectors + time-travel
Query power	S3 Select (single-file filter)	Full SQL optimizer (69+ plan nodes)
Search	None (external services)	FTS + vector + hybrid -- all native
Transactions	Single-object	Multi-statement ACID with MVCC
Encryption	SSE (object-level)	TDE + ZKE (database-wide, zero-knowledge)
AI/RAG	Storage layer for AI pipelines	Integrated AI data platform
Deployment	4+ nodes minimum for HA	Single binary, embeddable, or clustered
Together	HeliosDB-Full queries MinIO via S3 FDW	Best of both worlds

MinIO excels at what it was built for: high-performance, S3-compatible object storage at massive scale. HeliosDB excels at what it was built for: making data queryable, searchable, and intelligent. The ideal architecture often uses both -- MinIO as the cost-effective bulk storage tier, and HeliosDB as the intelligent query layer that makes that data useful.