MinIO is the gold standard for self-hosted S3-compatible object storage -- fast, reliable, and battle-tested at petabyte scale. But storing objects and understanding them are two very different problems. HeliosDB is a unified database engine that combines SQL, full-text search, vector search, ACID transactions, and AI-native capabilities in a single binary. When your application needs to query, search, and reason about the data inside those objects, MinIO alone falls short -- and HeliosDB fills that gap. Better yet, HeliosDB-Full can query MinIO directly via its S3 Foreign Data Wrapper, giving you the best of both worlds.


Quick Comparison

Feature | MinIO | HeliosDB
Primary purpose | S3-compatible object/blob storage | Unified SQL database + documents + vectors
Query language | S3 Select (CSV/JSON/Parquet filtering) | Full SQL (JOINs, CTEs, window functions, subqueries)
Full-text search | Not built-in (requires OpenSearch/MeiliSearch) | Native @@ operator with boolean, phrase, fuzzy
Vector search | Not supported (requires Milvus/Weaviate) | Native HNSW indexes with SIMD + Product Quantization
ACID transactions | Single-object atomicity only | Full multi-statement ACID with MVCC
Schema enforcement | None (opaque blobs) | Typed columns, constraints, foreign keys
Encryption | SSE-S3, SSE-KMS, SSE-C (object-level) | TDE (AES-256-GCM) + Zero-Knowledge Encryption
Data branching | Not supported | Git-like branching with merge
Time-travel queries | Object versioning (linear) | MVCC snapshots at any timestamp
Replication | Erasure coding + site replication | WAL streaming, multi-primary, sharding
Deployment size | 60MB binary, 4+ nodes for HA | 60MB binary, single-node to sharded cluster
Wire protocol | S3 API (REST) | PostgreSQL wire protocol + REST + gRPC

The Architecture Problem

With MinIO: The Object Storage Stack

When you need to search, query, or analyze data stored in MinIO, you end up building a pipeline of external services:

+--------------+     +--------------+     +--------------+
|   MinIO      |---->|  ETL / Spark |---->|  Data        |
|  (storage)   |     |  (transform) |     |  Warehouse   |
+--------------+     +--------------+     +--------------+
       |                                         |
       |              +--------------+           |
       +----------->  |  OpenSearch  |<----------+
                      |  (FTS)       |
                      +--------------+
       |              +--------------+
       +----------->  |  Milvus      |
                      |  (vectors)   |
                      +--------------+

  • 5 services to deploy, monitor, and keep in sync
  • No ACID guarantees across the pipeline
  • Data freshness measured in minutes to hours

With HeliosDB: One Engine, Everything Built In

+---------------------------------------------+
|                  HeliosDB                   |
|                                             |
|  +---------+  +-------+  +---------------+  |
|  |   SQL   |  |  FTS  |  | Vector Search |  |
|  |  Engine |  |  @@   |  |    HNSW+PQ    |  |
|  +----+----+  +---+---+  +-------+-------+  |
|       |           |              |          |
|  +----+-----------+--------------+-------+  |
|  |          MVCC Storage Engine          |  |
|  |    (ACID + TDE + WAL + Branching)     |  |
|  +---------------------------------------+  |
|                                             |
|  Optional: S3 FDW --> MinIO (bulk storage)  |
+---------------------------------------------+

  • 1 binary, everything ACID-compliant
  • Query, search, and analyze in one SQL statement
  • Real-time: data available instantly on INSERT

S3 Select vs Full SQL Optimizer

MinIO supports S3 Select for basic filtering of CSV, JSON, and Parquet objects. HeliosDB provides a complete SQL engine with a cost-based optimizer, parallel execution, and 69+ plan node types.

MinIO: S3 Select

import boto3

s3 = boto3.client('s3', endpoint_url='http://minio:9000',
                  aws_access_key_id='minioadmin',
                  aws_secret_access_key='minioadmin')

# S3 Select: filter a single CSV object
result = s3.select_object_content(
    Bucket='sales-data',
    Key='2026/01/transactions.csv',
    Expression="SELECT s.amount FROM s3object s WHERE s.region = 'EU'",
    ExpressionType='SQL',
    InputSerialization={'CSV': {'FileHeaderInfo': 'USE'}},
    OutputSerialization={'CSV': {}}
)

# Limitations:
#  - Cannot JOIN across objects
#  - Aggregates (SUM/AVG/COUNT) cover one object only; no GROUP BY or ORDER BY
#  - Cannot use subqueries or CTEs
#  - One file at a time
#  - No window functions
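
Because each S3 Select call is scoped to a single object, any grouping across objects has to be stitched together by the client. The following is a minimal sketch of that merge step, assuming each object has already been fetched as CSV text with `region` and `amount` columns. The function name and sample data are hypothetical and not part of MinIO or boto3.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal


def sum_amount_by_region(csv_objects):
    """Merge rows from several CSV objects into one region -> total mapping."""
    totals = defaultdict(Decimal)  # Decimal() starts each region at 0
    for text in csv_objects:
        for row in csv.DictReader(io.StringIO(text)):
            totals[row["region"]] += Decimal(row["amount"])
    return dict(totals)


# Two hypothetical objects, standing in for two monthly files in a bucket.
january = "region,amount\nEU,10.50\nUS,4.00\n"
february = "region,amount\nEU,1.25\n"
totals = sum_amount_by_region([january, february])
```

In SQL terms this is the work a `GROUP BY region` over both files would do in one statement.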

HeliosDB: Full SQL with Optimizer

-- Connect via any PostgreSQL client (psql, pgAdmin, DBeaver, etc.)

-- Complex analytics query: JOINs, CTEs, window functions, aggregation
WITH monthly_sales AS (
    SELECT
        region,
        date_trunc('month', sale_date) AS month,
        SUM(amount) AS total,
        COUNT(*) AS txn_count
    FROM transactions
    WHERE sale_date >= '2026-01-01'
    GROUP BY region, date_trunc('month', sale_date)
),
ranked AS (
    SELECT *,
        RANK() OVER (PARTITION BY month ORDER BY total DESC) AS region_rank,
        LAG(total) OVER (PARTITION BY region ORDER BY month) AS prev_month
    FROM monthly_sales
)
SELECT
    region,
    month,
    total,
    txn_count,
    region_rank,
    ROUND((total - prev_month) / NULLIF(prev_month, 0) * 100, 2) AS growth_pct
FROM ranked
WHERE region_rank <= 3
ORDER BY month, region_rank;
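
To make the `growth_pct` arithmetic concrete, here is a small sketch of the same month-over-month calculation in plain Python. It mirrors what `LAG()` does for one region: the first month has no predecessor, so its growth is undefined. The function name and the sample totals are illustrative assumptions.

```python
def month_over_month_growth(totals):
    """Return growth percentages for a list of monthly totals in date order.

    The first entry has no previous month, so its growth is None.
    A previous total of zero also yields None instead of dividing by zero.
    """
    growth = []
    prev = None
    for total in totals:
        if prev is None or prev == 0:
            growth.append(None)
        else:
            growth.append(round((total - prev) / prev * 100, 2))
        prev = total
    return growth


# Hypothetical monthly totals for one region, oldest first.
eu_totals = [1000.0, 1150.0, 920.0]
eu_growth = month_over_month_growth(eu_totals)
```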

-- HeliosDB-Full: Query Parquet files on MinIO via S3 Foreign Data Wrapper
CREATE FOREIGN TABLE minio_logs (
    timestamp TIMESTAMPTZ,
    user_id   INTEGER,
    action    TEXT,
    payload   JSONB
) SERVER s3_server
OPTIONS (
    bucket   'analytics',
    key      'logs/2026/*.parquet',
    format   'parquet'
);

-- JOIN MinIO data with local tables in one query
SELECT u.name, COUNT(*) AS actions, MAX(l.timestamp) AS last_seen
FROM minio_logs l
JOIN users u ON u.id = l.user_id
WHERE l.timestamp > NOW() - INTERVAL '7 days'
GROUP BY u.name
ORDER BY actions DESC
LIMIT 20;

Encryption: SSE vs TDE + Zero-Knowledge

MinIO: Server-Side Encryption

MinIO offers three encryption modes, all operating at the object level:

Mode | Key Management | Scope
SSE-S3 | MinIO-managed keys | Per-object
SSE-KMS | External KMS (Vault, AWS KMS) | Per-object
SSE-C | Client-provided keys per request | Per-object
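
# MinIO SSE-C: client must send key with every request
import io
import os

from minio import Minio
from minio.sse import SseCustomerKey

key = os.urandom(32)  # 256-bit key held only by the client
sse = SseCustomerKey(key)

# SSE-C requests are only accepted over TLS, so secure must be True
client = Minio('minio:9000', access_key='admin', secret_key='password', secure=True)

with open('doc.pdf', 'rb') as f:
    payload = f.read()
client.put_object('secure-bucket', 'doc.pdf', io.BytesIO(payload), len(payload), sse=sse)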

# Problem: lose the key, lose the data
# Problem: key transmitted over the wire on every request
# Problem: no encryption of indexes or metadata queries
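
To show why the key travels with every request, the sketch below builds the three SSE-C headers defined by the S3 API: the algorithm, the base64-encoded key, and the base64-encoded MD5 digest of the key. The helper name `ssec_headers` is hypothetical; SDKs such as minio-py assemble these headers internally.

```python
import base64
import hashlib
import os


def ssec_headers(key: bytes) -> dict:
    """Build the SSE-C request headers for a 256-bit customer-provided key."""
    if len(key) != 32:
        raise ValueError("SSE-C requires a 32-byte (256-bit) key")
    return {
        "x-amz-server-side-encryption-customer-algorithm": "AES256",
        # The raw key itself is sent, base64-encoded, on each request.
        "x-amz-server-side-encryption-customer-key":
            base64.b64encode(key).decode(),
        # The MD5 digest lets the server detect a corrupted key in transit.
        "x-amz-server-side-encryption-customer-key-MD5":
            base64.b64encode(hashlib.md5(key).digest()).decode(),
    }


headers = ssec_headers(os.urandom(32))
```

Since the key is only base64-encoded, not encrypted, TLS is what protects it on the wire.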

HeliosDB: Transparent Data Encryption + Zero-Knowledge Encryption

-- TDE: everything encrypted at rest automatically (AES-256-GCM)
-- WAL entries, data pages, indexes -- all encrypted transparently
-- No application code changes needed

INSERT INTO medical_records (patient_id, diagnosis, notes)
VALUES (42, 'confidential', 'encrypted at rest automatically');

-- Queries work normally -- decryption is transparent
SELECT * FROM medical_records WHERE patient_id = 42;

-- Zero-Knowledge Encryption: even the DB admin cannot read data
-- Application holds the key; HeliosDB never sees plaintext
-- Ring + AWS-LC FIPS-validated cryptographic providers

Aspect | MinIO SSE | HeliosDB TDE + ZKE
Granularity | Per-object | Entire database (pages, WAL, indexes)
Transparency | Application must specify per request | Fully transparent to queries
Index encryption | No (metadata in plaintext) | Yes (encrypted indexes)
Key rotation | Manual per-object re-encryption | Online key rotation
Zero-knowledge option | No | Yes (application-held keys)
FIPS compliance | Depends on KMS | Ring + AWS-LC FIPS providers
Algorithm | AES-256 (SSE-S3) or client choice | AES-256-GCM

Document Intelligence: Opaque Blobs vs Queryable Data

MinIO: Store and Retrieve

# Upload a JSON document to MinIO
import io
import json

payload = json.dumps(invoice).encode()
client.put_object('documents', 'invoice-1234.json',
    io.BytesIO(payload), len(payload),  # put_object requires the byte length
    content_type='application/json')

# To search this document, you need:
# 1. Download it
# 2. Parse it
# 3. Index it in OpenSearch/MeiliSearch
# 4. Query the search engine
# 5. Fetch the matching objects back from MinIO
# Total: 5 steps, 3 services, no ACID guarantees

HeliosDB: Store, Query, Search, Analyze -- All in SQL

-- Store the document as JSONB with full queryability
INSERT INTO invoices (id, data, embedding)
VALUES (
    1234,
    '{"vendor": "Acme Corp", "total": 4500.00,
      "items": [{"desc": "Widget", "qty": 100, "price": 45.00}],
      "tags": ["hardware", "bulk"]}',
    '[0.12, -0.45, 0.89, ...]'::vector(384)
);

-- Query deeply nested JSON fields
SELECT
    data->>'vendor' AS vendor,
    (data->>'total')::numeric AS total,
    jsonb_array_length(data->'items') AS line_items
FROM invoices
WHERE data @> '{"tags": ["hardware"]}'
  AND (data->>'total')::numeric > 1000
ORDER BY total DESC;

-- Full-text search across all invoice data
SELECT id, data->>'vendor' AS vendor
FROM invoices
WHERE data::text @@ 'widget AND bulk';

-- Vector similarity search (find semantically similar invoices)
SELECT id, data->>'vendor' AS vendor,
       embedding <-> '[0.11, -0.44, 0.88, ...]'::vector(384) AS distance
FROM invoices
ORDER BY embedding <-> '[0.11, -0.44, 0.88, ...]'::vector(384)
LIMIT 5;

-- Hybrid search: combine text + vector + SQL filters
SELECT id, data->>'vendor' AS vendor
FROM invoices
WHERE data::text @@ 'hardware'
  AND embedding <-> '[0.11, -0.44, 0.88, ...]'::vector(384) < 0.3
  AND (data->>'total')::numeric > 500
ORDER BY embedding <-> '[0.11, -0.44, 0.88, ...]'::vector(384);
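
The hybrid query above combines three filters and one ordering. As a rough illustration of those semantics, the sketch below applies the same steps to in-memory rows. It assumes `<->` means Euclidean (L2) distance, as it does in pgvector; the source does not state the metric, and a substring check stands in for the full-text `@@` match. All names and sample rows are hypothetical.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hybrid_search(rows, term, query_vec, max_distance, min_total):
    """Filter by text, total, and vector distance, then order by distance."""
    hits = []
    for row in rows:
        if term.lower() not in row["text"].lower():  # stand-in for @@
            continue
        if row["total"] <= min_total:                # SQL filter
            continue
        distance = l2_distance(row["embedding"], query_vec)
        if distance < max_distance:                  # vector threshold
            hits.append((distance, row["id"]))
    return [row_id for _, row_id in sorted(hits)]


rows = [
    {"id": 1, "text": "hardware widget", "total": 4500, "embedding": [0.0, 0.0]},
    {"id": 2, "text": "hardware bolt", "total": 900, "embedding": [0.1, 0.0]},
    {"id": 3, "text": "software license", "total": 2000, "embedding": [0.0, 0.0]},
    {"id": 4, "text": "hardware rack", "total": 800, "embedding": [1.0, 1.0]},
    {"id": 5, "text": "hardware nut", "total": 100, "embedding": [0.0, 0.0]},
]
result = hybrid_search(rows, "hardware", [0.1, 0.0], max_distance=0.3, min_total=500)
```

A database would use the FTS and HNSW indexes rather than scanning every row; the sketch only shows which rows qualify and in what order.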

Replication and High Availability

MinIO: Erasure Coding + Site Replication

MinIO Cluster (4-32 nodes)
+-----+  +-----+  +-----+  +-----+
| N1  |  | N2  |  | N3  |  | N4  |
|EC 4 |  |EC 4 |  |EC 4 |  |EC 4 |
+--+--+  +--+--+  +--+--+  +--+--+
   +----+----+----+----+----+--+
        |  Erasure Set (16 drives)
        |  Survives N/2 drive failures

Site Replication:
  Site A <--------> Site B
  (async bucket/object/IAM sync)
  • Erasure coding: tolerates the loss of up to N/2 drives per set at maximum parity (fewer at lower parity settings)
  • Site replication: async, bucket-level, eventual consistency
  • No cross-object transactions during replication
  • Minimum 4 nodes for production HA
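
The trade-off behind erasure coding is simple arithmetic: in a Reed-Solomon set of N drives with M parity shards, objects can be reconstructed after losing any M drives, and the usable capacity is (N - M) / N. The sketch below works this out for the 16-drive set in the diagram, assuming "EC 4" denotes a parity of 4; the function name is hypothetical.

```python
def erasure_set_profile(drives: int, parity: int) -> dict:
    """Describe capacity and fault tolerance for one erasure set."""
    if not 0 < parity <= drives // 2:
        raise ValueError("parity must be between 1 and drives/2")
    data = drives - parity
    return {
        "data_shards": data,
        "parity_shards": parity,
        "usable_fraction": data / drives,    # share of raw capacity for data
        "drive_failures_tolerated": parity,  # for reads; writes need a quorum
    }


default_profile = erasure_set_profile(drives=16, parity=4)  # 75% usable
max_profile = erasure_set_profile(drives=16, parity=8)      # 50% usable
```

Raising parity toward N/2 buys more failure tolerance at the cost of usable capacity.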

HeliosDB: 3-Tier Replication Architecture

Tier 1: WAL Streaming (Primary -> Replica)
+----------+    WAL Stream    +----------+
| Primary  | ---------------> | Replica  |
| (R/W)    |                  | (R/O)    |
+----------+                  +----------+
  Automatic failover, zero data loss

Tier 2: Multi-Primary (All nodes accept writes)
+----------+ <-----------> +----------+
| Primary1 |              | Primary2 |
| (R/W)    |              | (R/W)    |
+----------+              +----------+
  Conflict-free replication, write anywhere

Tier 3: Sharding (Horizontal partitioning)
+----------+  +----------+  +----------+
| Shard 1  |  | Shard 2  |  | Shard 3  |
| keys A-M |  | keys N-S |  | keys T-Z |
+----------+  +----------+  +----------+
  Linear write scaling by shard key

Aspect | MinIO | HeliosDB
Data protection | Erasure coding (N/2 failures) | WAL + replicas + fsync
Multi-site | Site replication (async) | WAL streaming or multi-primary
Consistency | Eventual (site replication) | Strong (WAL streaming) or causal (multi-primary)
Write scaling | All nodes accept writes | Multi-primary or sharding
Minimum HA nodes | 4 | 2 (primary + replica)
Failover | Manual or load-balancer | Automatic with proxy (13 features)
Transaction safety | Single-object | Full ACID across replicas
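
The Tier 3 diagram shows range sharding on the first letter of the key (A-M, N-S, T-Z). A minimal sketch of how a router could map a key to a shard under that layout follows. The shard names and the `route` function are hypothetical, and how HeliosDB actually assigns keys to shards is not described here.

```python
import bisect

# Inclusive upper bound of each shard's key range, in ascending order.
SHARD_UPPER_BOUNDS = ["M", "S", "Z"]
SHARD_NAMES = ["shard-1", "shard-2", "shard-3"]


def route(key: str) -> str:
    """Pick the shard whose range contains the key's first letter."""
    first = key[:1].upper()
    if not ("A" <= first <= "Z"):
        raise ValueError("key must start with a letter A-Z")
    # bisect_left finds the first bound that is >= the letter.
    return SHARD_NAMES[bisect.bisect_left(SHARD_UPPER_BOUNDS, first)]


shard_for_alice = route("alice")
```

Range sharding keeps neighbouring keys together, which helps range scans but can create hot shards when keys are unevenly distributed.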

AI and RAG Pipelines

MinIO: Storage Layer for AI

# Typical MinIO + AI pipeline (5 services required)

# 1. Store documents in MinIO
client.put_object('docs', 'manual.pdf', pdf_data, len(pdf_data))

# 2. Extract text (external service: Tika, Textract, etc.)
text = tika_client.extract('s3://docs/manual.pdf')

# 3. Chunk text (external: LangChain, LlamaIndex)
chunks = text_splitter.split(text)

# 4. Generate embeddings (external: OpenAI API)
embeddings = openai.embeddings.create(input=chunks, model='text-embedding-3-small')

# 5. Store vectors (external: Milvus/Weaviate using MinIO as backend)
milvus_client.insert(collection, embeddings)

# 6. Search requires querying Milvus, then fetching from MinIO
results = milvus_client.search(query_embedding, limit=5)
docs = [client.get_object('docs', r.key) for r in results]
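
Step 3 of the pipeline above relies on an external text splitter. For readers unfamiliar with chunking, this is a minimal sketch of a fixed-size splitter with overlap, written in plain Python. It is an illustration under simple assumptions (character-based sizes, hypothetical function name), not the algorithm LangChain or LlamaIndex use.

```python
def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list:
    """Split text into chunks of `size` characters sharing `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    # Stop once the remaining tail is already covered by the previous chunk.
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + size] for i in range(0, last_start, step)]


chunks = chunk_text("abcdefghij", size=4, overlap=1)
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk.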

HeliosDB: Integrated RAG in SQL

-- Everything in one database, one transaction

-- 1. Store documents with embeddings
INSERT INTO knowledge_base (title, content, embedding, metadata)
VALUES (
    'Product Manual v3',
    'HeliosDB supports ACID transactions with MVCC...',
    '[0.12, -0.33, 0.78, ...]'::vector(384),
    '{"category": "docs", "version": 3}'
);

-- 2. RAG retrieval: hybrid search (FTS + vector + metadata filter)
SELECT title, content,
       embedding <-> $1::vector(384) AS semantic_distance
FROM knowledge_base
WHERE content @@ 'ACID transactions'
  AND metadata @> '{"category": "docs"}'
ORDER BY embedding <-> $1::vector(384)
LIMIT 5;

-- 3. All results are ACID-consistent, encrypted at rest,
--    and available for time-travel queries
SELECT title, content
FROM knowledge_base AT TIME '2026-01-15 00:00:00'
WHERE content @@ 'transactions';

# Python client (psycopg2): simple RAG with HeliosDB
import psycopg2

conn = psycopg2.connect("host=localhost port=5432 dbname=helios")
cur = conn.cursor()

# Single query combining full-text search and semantic (vector) search
cur.execute("""
    SELECT title, content
    FROM knowledge_base
    WHERE content @@ %s
      AND embedding <-> %s::vector(384) < 0.5
    ORDER BY embedding <-> %s::vector(384)
    LIMIT 5
""", (user_query, query_embedding, query_embedding))

context_docs = cur.fetchall()
# Feed directly to LLM -- no intermediate services
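
One practical detail when passing `query_embedding` as a parameter: psycopg2 adapts a Python list to an SQL ARRAY, while the SQL examples in this article write vectors as bracketed text literals. A small helper that serializes a list into that text form sidesteps the question of whether an array-to-vector cast is available. The helper below is a hypothetical sketch, not part of any HeliosDB SDK.

```python
def to_vector_literal(values) -> str:
    """Serialize a sequence of numbers as a '[v1,v2,...]' vector literal."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


# Pass the resulting string as the parameter that is cast with ::vector(384).
query_embedding = to_vector_literal([0.12, -0.33, 1])
```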

Time-Travel and Data Branching

MinIO provides linear object versioning -- you can retrieve a previous version of an object by its version ID. HeliosDB provides MVCC time-travel queries and git-like data branching.

MinIO: Linear Object Versions

# List versions of a single object
versions = client.list_objects('bucket', 'key', include_version=True)
for v in versions:
    print(v.version_id, v.last_modified)

# Retrieve a specific version
obj = client.get_object('bucket', 'key', version_id='abc-123')

HeliosDB: Time-Travel + Branching

-- Time-travel: query any table at any point in time
SELECT * FROM users AT TIME '2026-02-01 00:00:00';

-- Compare data between two timestamps
SELECT
    a.user_id,
    a.balance AS balance_before,
    b.balance AS balance_after,
    b.balance - a.balance AS change
FROM users AT TIME '2026-01-01' a
JOIN users AT TIME '2026-02-01' b ON a.user_id = b.user_id
WHERE b.balance != a.balance;

-- Git-like branching: create isolated data environment
CREATE BRANCH experiment FROM main;
CHECKOUT experiment;

-- Make changes without affecting production
UPDATE pricing SET price = price * 1.1 WHERE category = 'premium';
INSERT INTO promotions VALUES ('SPRING26', 0.15, '2026-03-01', '2026-04-01');

-- Analyze impact
SELECT category, AVG(price) AS avg_price FROM pricing GROUP BY category;

-- Happy with results? Merge back
CHECKOUT main;
MERGE experiment INTO main;

-- Not happy? Just drop the branch -- production is untouched
DROP BRANCH experiment;
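
The idea behind time-travel reads can be shown with a toy version chain: every write is kept with its timestamp, and a read "as of" a moment returns the latest version written at or before it. This is a simplified sketch of the general MVCC concept with hypothetical names, not HeliosDB's storage engine.

```python
import bisect


class VersionedValue:
    """Keep every version of a value so reads can target a past timestamp."""

    def __init__(self):
        self._timestamps = []  # ascending write times
        self._values = []      # value written at the matching timestamp

    def write(self, ts, value):
        if self._timestamps and ts < self._timestamps[-1]:
            raise ValueError("writes must arrive in timestamp order")
        self._timestamps.append(ts)
        self._values.append(value)

    def read_at(self, ts):
        """Return the newest value written at or before ts, else None."""
        i = bisect.bisect_right(self._timestamps, ts)
        return self._values[i - 1] if i else None


balance = VersionedValue()
balance.write("2026-01-01", 100)  # ISO dates compare correctly as strings
balance.write("2026-01-20", 250)
balance_mid_january = balance.read_at("2026-01-15")
```

Object versioning in MinIO gives a similar history per object; the difference the article draws is that a database snapshot is consistent across all rows and tables at that timestamp.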

Better Together: HeliosDB + MinIO

HeliosDB-Full can use MinIO as an external storage tier via its S3 Foreign Data Wrapper. This gives you the best of both worlds: MinIO for cost-effective petabyte blob storage, HeliosDB for intelligent querying.

+---------------------------------------------+
|                HeliosDB-Full                |
|                                             |
|  SQL Engine --> S3 FDW --> MinIO Cluster    |
|      |                        |             |
|      |  Predicate pushdown    |  Parquet    |
|      |  Projection pruning    |  CSV        |
|      |  Cost-based planning   |  JSON       |
|      |                        |             |
|  Local tables <-- JOIN --> Remote data      |
|  (ACID, fast)              (bulk, cheap)    |
+---------------------------------------------+

-- Define MinIO as a foreign server
CREATE SERVER minio_archive
TYPE 's3'
OPTIONS (
    endpoint   'http://minio-cluster:9000',
    access_key 'helios-service',
    secret_key 'secret',
    region     'us-east-1'
);

-- Map Parquet files as a foreign table
CREATE FOREIGN TABLE archived_orders (
    order_id    INTEGER,
    customer_id INTEGER,
    total       NUMERIC,
    order_date  DATE,
    status      TEXT
) SERVER minio_archive
OPTIONS (
    bucket 'data-warehouse',
    key    'orders/year=2025/*.parquet',
    format 'parquet'
);

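-- Query across local (hot) and MinIO (cold) data seamlessly.
-- Each source is aggregated per customer first, so joining both
-- to customers does not multiply rows and inflate the sums.
WITH y2025 AS (
    SELECT customer_id, SUM(total) AS total_2025
    FROM archived_orders
    GROUP BY customer_id
),
y2026 AS (
    SELECT customer_id, SUM(total) AS total_2026
    FROM orders
    WHERE order_date >= '2026-01-01'
    GROUP BY customer_id
)
SELECT
    c.name,
    c.tier,
    COALESCE(o.total_2025, 0) AS total_2025,
    COALESCE(r.total_2026, 0) AS total_2026
FROM customers c
LEFT JOIN y2025 o ON o.customer_id = c.id
LEFT JOIN y2026 r ON r.customer_id = c.id
ORDER BY COALESCE(o.total_2025, 0) + COALESCE(r.total_2026, 0) DESC
LIMIT 50;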
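
The `key` option in the foreign table definition above is a glob pattern, so only objects under `year=2025` need to be read. As a rough illustration of that pruning step, the sketch below filters a bucket listing with Python's `fnmatch`. How the S3 FDW actually expands the pattern is not documented here, so treat the matching rules as an assumption; the object keys are made up.

```python
from fnmatch import fnmatchcase


def matching_keys(keys, pattern):
    """Return the object keys selected by a glob-style key pattern.

    Note: fnmatch's '*' also matches '/', so nested prefixes would match too.
    """
    return [key for key in keys if fnmatchcase(key, pattern)]


listing = [
    "orders/year=2024/part-0.parquet",
    "orders/year=2025/part-0.parquet",
    "orders/year=2025/_SUCCESS",
]
pruned = matching_keys(listing, "orders/year=2025/*.parquet")
```

Skipping whole objects this way is cheaper than any predicate pushdown inside a file, which is why partitioned key layouts such as `year=2025/` are common.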

When to Choose MinIO

MinIO is the right choice when:

  • Petabyte-scale blob storage -- You need to store and serve terabytes to petabytes of files (images, videos, backups, logs) at low cost
  • S3 API compatibility -- Your existing tools, CI/CD pipelines, and applications already speak S3
  • Write-once, read-many -- Your workload is primarily storing immutable objects and retrieving them by key
  • Data lake foundation -- You are building a data lake where Spark, Trino, or Presto handle the query layer
  • Large binary files -- Video processing, ML model artifacts, container images, and backup archives
  • Erasure-coded durability -- You want drive-failure tolerance without full replication overhead

When to Choose HeliosDB

HeliosDB is the right choice when:

  • You need to query your data -- SQL with JOINs, CTEs, window functions, aggregations, and subqueries
  • ACID transactions matter -- Multi-statement transactions with rollback, savepoints, and isolation
  • Search is a core feature -- Full-text search and/or vector similarity search built into the database
  • AI/RAG applications -- Store documents, embeddings, and metadata together with hybrid search
  • Embedded or edge deployment -- A single 60MB binary that runs anywhere, including in-process
  • Time-travel and branching -- Query historical data, create isolated branches for testing or experimentation
  • Schema enforcement -- Typed columns, constraints, foreign keys, triggers, and row-level security
  • PostgreSQL compatibility -- Connect with psql, pgAdmin, DBeaver, any PG driver -- zero application changes
  • Encryption requirements -- TDE encrypts everything at rest; ZKE ensures even admins cannot read data

Migration Path

From MinIO to HeliosDB

If you are currently storing queryable data (JSON, CSV, metadata) in MinIO and querying it with external tools:

# Step 1: Export from MinIO
import json

import boto3
import psycopg2

# Credentials are read from the environment or ~/.aws/credentials
s3 = boto3.client('s3', endpoint_url='http://minio:9000')
# list_objects_v2 returns at most 1,000 keys per call, so paginate
pages = s3.get_paginator('list_objects_v2').paginate(Bucket='documents')

# Step 2: Load into HeliosDB via PostgreSQL protocol
conn = psycopg2.connect("host=localhost port=5432 dbname=helios")
cur = conn.cursor()

for page in pages:
    for obj in page.get('Contents', []):
        body = s3.get_object(Bucket='documents', Key=obj['Key'])
        data = json.loads(body['Body'].read())

        cur.execute("""
            INSERT INTO documents (key, data, created_at)
            VALUES (%s, %s, %s)
        """, (obj['Key'], json.dumps(data), obj['LastModified']))

conn.commit()

Keep Both: MinIO for Blobs, HeliosDB for Intelligence

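-- HeliosDB-Full: query MinIO data without moving it
-- (columns match the query below; their types are illustrative)
CREATE FOREIGN TABLE minio_logs (
    filename    TEXT,
    error_count INTEGER
) SERVER minio_server
OPTIONS (bucket 'logs', key '2026/*.parquet', format 'parquet');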

-- Local metadata + remote blobs in one query
SELECT m.filename, m.upload_date, l.error_count
FROM file_metadata m
JOIN minio_logs l ON l.filename = m.filename
WHERE l.error_count > 0
ORDER BY l.error_count DESC;

Summary

Dimension | MinIO | HeliosDB
Best at | Storing petabytes of blobs cheaply | Querying, searching, and analyzing data intelligently
Data model | Opaque objects in buckets | Tables + JSONB + vectors + time-travel
Query power | S3 Select (single-file filter) | Full SQL optimizer (69+ plan nodes)
Search | None (external services) | FTS + vector + hybrid -- all native
Transactions | Single-object | Multi-statement ACID with MVCC
Encryption | SSE (object-level) | TDE + ZKE (database-wide, zero-knowledge)
AI/RAG | Storage layer for AI pipelines | Integrated AI data platform
Deployment | 4+ nodes minimum for HA | Single binary, embeddable, or clustered
Together | HeliosDB-Full queries MinIO via S3 FDW | Best of both worlds

MinIO excels at what it was built for: high-performance, S3-compatible object storage at massive scale. HeliosDB excels at what it was built for: making data queryable, searchable, and intelligent. The ideal architecture often uses both -- MinIO as the cost-effective bulk storage tier, and HeliosDB as the intelligent query layer that makes that data useful.

Ready to try HeliosDB?

Get started with HeliosDB in minutes. Open source, free to use.
