Object Storage Meets Intelligent Database: Why Storing Data Isn't Enough
MinIO is the gold standard for self-hosted S3-compatible object storage -- fast, reliable, and battle-tested at petabyte scale. But storing objects and understanding them are two very different problems. HeliosDB is a unified database engine that combines SQL, full-text search, vector search, ACID transactions, and AI-native capabilities in a single binary. When your application needs to query, search, and reason about the data inside those objects, MinIO alone falls short -- and HeliosDB fills that gap. Better yet, HeliosDB-Full can query MinIO directly via its S3 Foreign Data Wrapper, giving you the best of both worlds.
| Feature | MinIO | HeliosDB |
|---|---|---|
| Primary purpose | S3-compatible object/blob storage | Unified SQL database + documents + vectors |
| Query language | S3 Select (CSV/JSON/Parquet filtering) | Full SQL (JOINs, CTEs, window functions, subqueries) |
| Full-text search | Not built-in (requires OpenSearch/MeiliSearch) | Native @@ operator with boolean, phrase, fuzzy |
| Vector search | Not supported (requires Milvus/Weaviate) | Native HNSW indexes with SIMD + Product Quantization |
| ACID transactions | Single-object atomicity only | Full multi-statement ACID with MVCC |
| Schema enforcement | None (opaque blobs) | Typed columns, constraints, foreign keys |
| Encryption | SSE-S3, SSE-KMS, SSE-C (object-level) | TDE (AES-256-GCM) + Zero-Knowledge Encryption |
| Data branching | Not supported | Git-like branching with merge |
| Time-travel queries | Object versioning (linear) | MVCC snapshots at any timestamp |
| Replication | Erasure coding + site replication | WAL streaming, multi-primary, sharding |
| Deployment size | 60MB binary, 4+ nodes for HA | 60MB binary, single-node to sharded cluster |
| Wire protocol | S3 API (REST) | PostgreSQL wire protocol + REST + gRPC |
When you need to search, query, or analyze data stored in MinIO, you end up building a pipeline of external services:
+--------------+ +--------------+ +--------------+
| MinIO |---->| ETL / Spark |---->| Data |
| (storage) | | (transform) | | Warehouse |
+--------------+ +--------------+ +--------------+
| |
| +--------------+ |
+-----------> | OpenSearch |<----------+
| (FTS) |
+--------------+
| +--------------+
+-----------> | Milvus |
| (vectors) |
+--------------+
5 services to deploy, monitor, and keep in sync
No ACID guarantees across the pipeline
Data freshness measured in minutes to hours
+---------------------------------------------+
| HeliosDB |
| |
| +---------+ +------+ +--------------+ |
| | SQL | | FTS | | Vector Search| |
| | Engine | | @@ | | HNSW+PQ | |
| +----+----+ +--+---+ +------+-------+ |
| | | | |
| +----+----------+--------------+----------+ |
| | MVCC Storage Engine | |
| | (ACID + TDE + WAL + Branching) | |
| +-----------------------------------------+ |
| |
| Optional: S3 FDW --> MinIO (bulk storage) |
+---------------------------------------------+
1 binary, everything ACID-compliant
Query, search, and analyze in one SQL statement
Real-time: data available instantly on INSERT
MinIO supports S3 Select for basic filtering of CSV, JSON, and Parquet objects. HeliosDB provides a complete SQL engine with a cost-based optimizer, parallel execution, and 69+ plan node types.
import boto3
s3 = boto3.client('s3', endpoint_url='http://minio:9000',
aws_access_key_id='minioadmin',
aws_secret_access_key='minioadmin')
# S3 Select: filter a single CSV object
result = s3.select_object_content(
Bucket='sales-data',
Key='2026/01/transactions.csv',
Expression="SELECT s.amount FROM s3object s WHERE s.region = 'EU'",
ExpressionType='SQL',
InputSerialization={'CSV': {'FileHeaderInfo': 'USE'}},
OutputSerialization={'CSV': {}}
)
# Limitations:
# - Cannot JOIN across objects
# - No GROUP BY (SUM/AVG/COUNT aggregate over the whole object only)
# - Cannot use subqueries or CTEs
# - One object per request
# - No window functions
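The `select_object_content` call above does not return rows directly; boto3 hands back an event stream under `Payload`. A minimal sketch of draining that stream follows, assuming the documented event shape (`Records`, `Stats`, `End`). The helper name `collect_select_records` and the stand-in events are illustrative, not part of boto3 or MinIO.

```python
def collect_select_records(payload_events):
    """Join the byte chunks carried by 'Records' events, then decode once.

    A record may be split across two chunks, so the chunks are joined
    before decoding rather than decoded one by one.
    """
    chunks = []
    for event in payload_events:
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    return b"".join(chunks).decode("utf-8")


# Stand-in events shaped like the stream in response["Payload"].
fake_events = [
    {"Records": {"Payload": b"120.50\n99"}},
    {"Records": {"Payload": b".00\n"}},
    {"Stats": {"Details": {"BytesScanned": 1024}}},
    {"End": {}},
]
rows = collect_select_records(fake_events).splitlines()
print(rows)
```

With a real client the call would be `collect_select_records(result['Payload'])`.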
-- Connect via any PostgreSQL client (psql, pgAdmin, DBeaver, etc.)
-- Complex analytics query: JOINs, CTEs, window functions, aggregation
WITH monthly_sales AS (
SELECT
region,
date_trunc('month', sale_date) AS month,
SUM(amount) AS total,
COUNT(*) AS txn_count
FROM transactions
WHERE sale_date >= '2026-01-01'
GROUP BY region, date_trunc('month', sale_date)
),
ranked AS (
SELECT *,
RANK() OVER (PARTITION BY month ORDER BY total DESC) AS region_rank,
LAG(total) OVER (PARTITION BY region ORDER BY month) AS prev_month
FROM monthly_sales
)
SELECT
region,
month,
total,
txn_count,
region_rank,
ROUND((total - prev_month) / NULLIF(prev_month, 0) * 100, 2) AS growth_pct
FROM ranked
WHERE region_rank <= 3
ORDER BY month, region_rank;
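The `growth_pct` column in the query above has two edge cases: `LAG` yields NULL for a region's first month, and a previous total of zero would divide by zero. The same arithmetic can be sketched in Python with both cases handled; the function name and the sample figures are made up for illustration.

```python
def growth_pct(total, prev_total):
    """Month-over-month growth in percent, or None when it is undefined."""
    if prev_total is None or prev_total == 0:
        return None
    return round((total - prev_total) / prev_total * 100, 2)


# Hypothetical monthly totals for one region, already ordered by month.
monthly = [("2026-01", 1000.0), ("2026-02", 1150.0), ("2026-03", 1035.0)]

report = []
prev = None  # plays the role of LAG(total)
for month, total in monthly:
    report.append((month, growth_pct(total, prev)))
    prev = total
print(report)
```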
-- HeliosDB-Full: Query Parquet files on MinIO via S3 Foreign Data Wrapper
CREATE FOREIGN TABLE minio_logs (
timestamp TIMESTAMPTZ,
user_id INTEGER,
action TEXT,
payload JSONB
) SERVER s3_server
OPTIONS (
bucket 'analytics',
key 'logs/2026/*.parquet',
format 'parquet'
);
-- JOIN MinIO data with local tables in one query
SELECT u.name, COUNT(*) AS actions, MAX(l.timestamp) AS last_seen
FROM minio_logs l
JOIN users u ON u.id = l.user_id
WHERE l.timestamp > NOW() - INTERVAL '7 days'
GROUP BY u.name
ORDER BY actions DESC
LIMIT 20;
MinIO offers three encryption modes, all operating at the object level:
| Mode | Key Management | Scope |
|---|---|---|
| SSE-S3 | MinIO-managed keys | Per-object |
| SSE-KMS | External KMS (Vault, AWS KMS) | Per-object |
| SSE-C | Client-provided keys per request | Per-object |
# MinIO SSE-C: client must send key with every request
import io
import os
from minio import Minio
from minio.sse import SseCustomerKey
key = os.urandom(32)  # 256-bit key; the application must store it safely
sse = SseCustomerKey(key)
# SSE-C needs TLS (secure=True): the raw key travels in the request headers
client = Minio('minio:9000', access_key='admin', secret_key='password', secure=True)
with open('doc.pdf', 'rb') as f:
    payload = f.read()
client.put_object('secure-bucket', 'doc.pdf', io.BytesIO(payload), len(payload), sse=sse)
# Problem: lose the key, lose the data
# Problem: key transmitted over the wire on every request
# Problem: no encryption of indexes or metadata queries
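To make the "key on every request" point concrete, the sketch below builds the three SSE-C request headers defined by the S3 API using only the standard library. The helper name `sse_c_headers` is hypothetical; SDKs such as minio-py or boto3 assemble these headers internally.

```python
import base64
import hashlib
import os


def sse_c_headers(key: bytes) -> dict:
    """Build the S3 SSE-C headers for a 256-bit customer-provided key."""
    if len(key) != 32:
        raise ValueError("SSE-C requires a 256-bit (32-byte) key")
    return {
        "x-amz-server-side-encryption-customer-algorithm": "AES256",
        # The key itself is sent, base64-encoded, with each request.
        "x-amz-server-side-encryption-customer-key":
            base64.b64encode(key).decode("ascii"),
        # The MD5 digest lets the server detect a corrupted key in transit.
        "x-amz-server-side-encryption-customer-key-MD5":
            base64.b64encode(hashlib.md5(key).digest()).decode("ascii"),
    }


headers = sse_c_headers(os.urandom(32))
for name in headers:
    print(name)
```

Because the key is only base64-encoded, not encrypted, these headers are safe only over a TLS connection.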
-- TDE: everything encrypted at rest automatically (AES-256-GCM)
-- WAL entries, data pages, indexes -- all encrypted transparently
-- No application code changes needed
INSERT INTO medical_records (patient_id, diagnosis, notes)
VALUES (42, 'confidential', 'encrypted at rest automatically');
-- Queries work normally -- decryption is transparent
SELECT * FROM medical_records WHERE patient_id = 42;
-- Zero-Knowledge Encryption: even the DB admin cannot read data
-- Application holds the key; HeliosDB never sees plaintext
-- Ring + AWS-LC FIPS-validated cryptographic providers
| Aspect | MinIO SSE | HeliosDB TDE + ZKE |
|---|---|---|
| Granularity | Per-object | Entire database (pages, WAL, indexes) |
| Transparency | Application must specify per request | Fully transparent to queries |
| Index encryption | No (metadata in plaintext) | Yes (encrypted indexes) |
| Key rotation | Manual per-object re-encryption | Online key rotation |
| Zero-knowledge option | No | Yes (application-held keys) |
| FIPS compliance | Depends on KMS | Ring + AWS-LC FIPS providers |
| Algorithm | AES-256 (server-managed or client-provided key) | AES-256-GCM |
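# Upload a JSON document to MinIO
import io, json
payload = json.dumps(invoice).encode()
# put_object needs a readable stream plus its length in bytes
client.put_object('documents', 'invoice-1234.json',
io.BytesIO(payload), len(payload),
content_type='application/json')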
# To search this document, you need:
# 1. Download it
# 2. Parse it
# 3. Index it in OpenSearch/MeiliSearch
# 4. Query the search engine
# 5. Fetch the matching objects back from MinIO
# Total: 5 steps, 3 services, no ACID guarantees
-- Store the document as JSONB with full queryability
INSERT INTO invoices (id, data, embedding)
VALUES (
1234,
'{"vendor": "Acme Corp", "total": 4500.00,
"items": [{"desc": "Widget", "qty": 100, "price": 45.00}],
"tags": ["hardware", "bulk"]}',
'[0.12, -0.45, 0.89, ...]'::vector(384)
);
-- Query deeply nested JSON fields
SELECT
data->>'vendor' AS vendor,
(data->>'total')::numeric AS total,
jsonb_array_length(data->'items') AS line_items
FROM invoices
WHERE data @> '{"tags": ["hardware"]}'
AND (data->>'total')::numeric > 1000
ORDER BY total DESC;
-- Full-text search across all invoice data
SELECT id, data->>'vendor' AS vendor
FROM invoices
WHERE data::text @@ 'widget AND bulk';
-- Vector similarity search (find semantically similar invoices)
SELECT id, data->>'vendor' AS vendor,
embedding <-> '[0.11, -0.44, 0.88, ...]'::vector(384) AS distance
FROM invoices
ORDER BY distance
LIMIT 5;
-- Hybrid search: combine text + vector + SQL filters
SELECT id, data->>'vendor' AS vendor
FROM invoices
WHERE data::text @@ 'hardware'
AND embedding <-> '[0.11, -0.44, 0.88, ...]'::vector(384) < 0.3
AND (data->>'total')::numeric > 500
ORDER BY embedding <-> '[0.11, -0.44, 0.88, ...]'::vector(384);
MinIO Cluster (4-32 nodes)
+-----+ +-----+ +-----+ +-----+
| N1 | | N2 | | N3 | | N4 |
|EC 4 | |EC 4 | |EC 4 | |EC 4 |
+--+--+ +--+--+ +--+--+ +--+--+
+----+----+----+----+----+--+
| Erasure Set (16 drives)
| Survives up to N/2 drive failures (at maximum parity)
Site Replication:
Site A <--------> Site B
(async bucket/object/IAM sync)
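The N/2 bound quoted for the erasure set applies only at the highest parity setting. A rough sketch of the arithmetic, assuming the usual Reed-Solomon property that any `drives - parity` shards are enough to reconstruct an object. It ignores MinIO's write-quorum rules, and the function name is illustrative.

```python
def erasure_profile(drives: int, parity: int) -> dict:
    """Capacity and read fault tolerance of one erasure set."""
    if not 0 < parity <= drives // 2:
        raise ValueError("parity must be between 1 and drives // 2")
    data = drives - parity
    return {
        "data_shards": data,
        "parity_shards": parity,
        "usable_fraction": data / drives,
        # Objects stay readable while at most `parity` drives are lost.
        "max_drive_failures_for_reads": parity,
    }


default = erasure_profile(drives=16, parity=4)   # parity written as EC:4
maximum = erasure_profile(drives=16, parity=8)   # parity at N/2
print(default)
print(maximum)
```

On 16 drives, parity 4 keeps 75% of raw capacity usable and tolerates 4 lost drives; parity 8 tolerates 8 but halves usable capacity.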
Tier 1: WAL Streaming (Primary -> Replica)
+----------+ WAL Stream +----------+
| Primary | ---------------> | Replica |
| (R/W) | | (R/O) |
+----------+ +----------+
Automatic failover, zero data loss
Tier 2: Multi-Primary (All nodes accept writes)
+----------+ <-----------> +----------+
| Primary1 | | Primary2 |
| (R/W) | | (R/W) |
+----------+ +----------+
Conflict-free replication, write anywhere
Tier 3: Sharding (Horizontal partitioning)
+----------+ +----------+ +----------+
| Shard 1 | | Shard 2 | | Shard 3 |
| keys A-M | | keys N-S | | keys T-Z |
+----------+ +----------+ +----------+
Linear write scaling by shard key
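Range sharding as drawn in Tier 3 comes down to a lookup from shard key to shard. The sketch below mirrors the A-M / N-S / T-Z split from the diagram. The shard names and the `shard_for` helper are hypothetical and say nothing about how HeliosDB routes keys internally.

```python
import bisect

# Inclusive upper bound of each range, in order: A-M, N-S, T-Z.
SHARD_UPPER_BOUNDS = ["M", "S", "Z"]
SHARD_NAMES = ["shard-1", "shard-2", "shard-3"]


def shard_for(key: str) -> str:
    """Route a key to a shard by the first letter of the key."""
    first = key[:1].upper()
    if not ("A" <= first <= "Z"):
        raise ValueError(f"key {key!r} does not start with a letter A-Z")
    # bisect_left finds the first range whose upper bound is >= first.
    return SHARD_NAMES[bisect.bisect_left(SHARD_UPPER_BOUNDS, first)]


for k in ("alice", "Mike", "Nora", "zed"):
    print(k, "->", shard_for(k))
```

Writes scale roughly linearly only if keys spread evenly across the ranges; a skewed key distribution concentrates load on one shard.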
| Aspect | MinIO | HeliosDB |
|---|---|---|
| Data protection | Erasure coding (up to N/2 drive failures) | WAL + replicas + fsync |
| Multi-site | Site replication (async) | WAL streaming or multi-primary |
| Consistency | Strict read-after-write within a cluster; eventual across sites (site replication) | Strong (WAL streaming) or causal (multi-primary) |
| Write scaling | All nodes accept writes | Multi-primary or sharding |
| Minimum HA nodes | 4 | 2 (primary + replica) |
| Failover | Manual or load-balancer | Automatic with proxy (13 features) |
| Transaction safety | Single-object | Full ACID across replicas |
# Typical MinIO + AI pipeline (5 services required)
# 1. Store documents in MinIO
client.put_object('docs', 'manual.pdf', io.BytesIO(pdf_data), len(pdf_data))
# 2. Extract text (external service: Tika, Textract, etc.)
text = tika_client.extract('s3://docs/manual.pdf')
# 3. Chunk text (external: LangChain, LlamaIndex)
chunks = text_splitter.split(text)
# 4. Generate embeddings (external: OpenAI API)
embeddings = openai.embeddings.create(input=chunks, model='text-embedding-3-small')
# 5. Store vectors (external: Milvus/Weaviate using MinIO as backend)
milvus_client.insert(collection, embeddings)
# 6. Search requires querying Milvus, then fetching from MinIO
results = milvus_client.search(query_embedding, limit=5)
docs = [client.get_object('docs', r.key) for r in results]
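Step 3 of the pipeline above, chunking, is usually delegated to a library, but the core idea fits in a few lines. This is a simplified fixed-size chunker with overlap, not the algorithm LangChain or LlamaIndex uses; the function name and default sizes are illustrative.

```python
def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list:
    """Split text into windows of `size` characters that share `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    # Stop once the remaining tail is already covered by the previous window.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


chunks = chunk_text("abcdefghij", size=4, overlap=1)
print(chunks)
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, at the cost of embedding some text twice.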
-- Everything in one database, one transaction
-- 1. Store documents with embeddings
INSERT INTO knowledge_base (title, content, embedding, metadata)
VALUES (
'Product Manual v3',
'HeliosDB supports ACID transactions with MVCC...',
'[0.12, -0.33, 0.78, ...]'::vector(384),
'{"category": "docs", "version": 3}'
);
-- 2. RAG retrieval: hybrid search (FTS + vector + metadata filter)
SELECT title, content,
embedding <-> $1::vector(384) AS semantic_distance
FROM knowledge_base
WHERE content @@ 'ACID transactions'
AND metadata @> '{"category": "docs"}'
ORDER BY embedding <-> $1::vector(384)
LIMIT 5;
-- 3. All results are ACID-consistent, encrypted at rest,
-- and available for time-travel queries
SELECT title, content
FROM knowledge_base AT TIME '2026-01-15 00:00:00'
WHERE content @@ 'transactions';
# Python client (psycopg2 over the PostgreSQL wire protocol): simple RAG with HeliosDB
import psycopg2
conn = psycopg2.connect("host=localhost port=5432 dbname=helios")
cur = conn.cursor()
# Single query for semantic search + FTS
cur.execute("""
SELECT title, content
FROM knowledge_base
WHERE content @@ %s
AND embedding <-> %s::vector(384) < 0.5
ORDER BY embedding <-> %s::vector(384)
LIMIT 5
""", (user_query, query_embedding, query_embedding))
context_docs = cur.fetchall()
# Feed directly to LLM -- no intermediate services
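One detail the snippet above glosses over: psycopg2 adapts a Python list to a SQL array, which may not cast cleanly to `vector`. A cautious approach is to pass the embedding as the bracketed text literal used in the SQL examples. The helper below is a sketch under that assumption; `to_vector_literal` is a hypothetical name, not part of any HeliosDB client.

```python
import math


def to_vector_literal(values, dim: int) -> str:
    """Render a sequence of numbers as a '[v1,v2,...]' vector literal."""
    if len(values) != dim:
        raise ValueError(f"expected {dim} dimensions, got {len(values)}")
    if not all(math.isfinite(v) for v in values):
        raise ValueError("embedding contains NaN or infinity")
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


# A 3-dimensional toy vector; the article's examples use 384 dimensions.
query_embedding = to_vector_literal([0.11, -0.44, 0.88], dim=3)
print(query_embedding)
```

The resulting string can be bound to the `%s::vector(384)` placeholders like any other text parameter.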
MinIO provides linear object versioning -- you can retrieve a previous version of an object by its version ID. HeliosDB provides MVCC time-travel queries and git-like data branching.
# List versions of a single object
versions = client.list_objects('bucket', 'key', include_version=True)
for v in versions:
    print(v.version_id, v.last_modified)
# Retrieve a specific version
obj = client.get_object('bucket', 'key', version_id='abc-123')
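With linear versioning, "what did this object look like at time T" has to be answered client-side by scanning the version list. A small sketch of that selection, working on plain `(version_id, last_modified)` pairs such as those printed above; the version IDs and timestamps are made up.

```python
from datetime import datetime, timezone


def version_as_of(versions, moment):
    """Return the ID of the newest version written at or before `moment`."""
    candidates = [v for v in versions if v[1] <= moment]
    if not candidates:
        return None  # the object did not exist yet
    return max(candidates, key=lambda v: v[1])[0]


# Hypothetical version history of one object.
history = [
    ("v1", datetime(2026, 1, 5, tzinfo=timezone.utc)),
    ("v2", datetime(2026, 1, 20, tzinfo=timezone.utc)),
    ("v3", datetime(2026, 2, 10, tzinfo=timezone.utc)),
]
picked = version_as_of(history, datetime(2026, 2, 1, tzinfo=timezone.utc))
print(picked)
```

This works per object only; a consistent snapshot across many objects would need the same scan for each key.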
-- Time-travel: query any table at any point in time
SELECT * FROM users AT TIME '2026-02-01 00:00:00';
-- Compare data between two timestamps
SELECT
a.user_id,
a.balance AS balance_before,
b.balance AS balance_after,
b.balance - a.balance AS change
FROM users AT TIME '2026-01-01' a
JOIN users AT TIME '2026-02-01' b ON a.user_id = b.user_id
WHERE b.balance != a.balance;
-- Git-like branching: create isolated data environment
CREATE BRANCH experiment FROM main;
CHECKOUT experiment;
-- Make changes without affecting production
UPDATE pricing SET price = price * 1.1 WHERE category = 'premium';
INSERT INTO promotions VALUES ('SPRING26', 0.15, '2026-03-01', '2026-04-01');
-- Analyze impact
SELECT category, AVG(price) AS avg_price FROM pricing GROUP BY category;
-- Happy with results? Merge back
CHECKOUT main;
MERGE experiment INTO main;
-- Not happy? Just drop the branch -- production is untouched
DROP BRANCH experiment;
HeliosDB-Full can use MinIO as an external storage tier via its S3 Foreign Data Wrapper. This gives you the best of both worlds: MinIO for cost-effective petabyte blob storage, HeliosDB for intelligent querying.
+---------------------------------------------+
| HeliosDB-Full |
| |
| SQL Engine --> S3 FDW --> MinIO Cluster |
| | | |
| | Predicate pushdown | Parquet |
| | Projection pruning | CSV |
| | Cost-based planning | JSON |
| | | |
| Local tables <-- JOIN --> Remote data |
| (ACID, fast) (bulk, cheap) |
+---------------------------------------------+
-- Define MinIO as a foreign server
CREATE SERVER minio_archive
TYPE 's3'
OPTIONS (
endpoint 'http://minio-cluster:9000',
access_key 'helios-service',
secret_key 'secret',
region 'us-east-1'
);
-- Map Parquet files as a foreign table
CREATE FOREIGN TABLE archived_orders (
order_id INTEGER,
customer_id INTEGER,
total NUMERIC,
order_date DATE,
status TEXT
) SERVER minio_archive
OPTIONS (
bucket 'data-warehouse',
key 'orders/year=2025/*.parquet',
format 'parquet'
);
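-- Query across local (hot) and MinIO (cold) data seamlessly
-- Aggregate each side first so the two joins cannot multiply rows
WITH archived AS (
SELECT customer_id, SUM(total) AS total_2025
FROM archived_orders
GROUP BY customer_id
),
recent AS (
SELECT customer_id, SUM(total) AS total_2026
FROM orders
WHERE order_date >= '2026-01-01'
GROUP BY customer_id
)
SELECT
c.name,
c.tier,
COALESCE(a.total_2025, 0) AS total_2025,
COALESCE(r.total_2026, 0) AS total_2026
FROM customers c
LEFT JOIN archived a ON a.customer_id = c.id
LEFT JOIN recent r ON r.customer_id = c.id
ORDER BY COALESCE(a.total_2025, 0) + COALESCE(r.total_2026, 0) DESC
LIMIT 50;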
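The `key` option in the foreign table definitions above, for example `'orders/year=2025/*.parquet'`, selects objects by pattern. To preview which objects such a pattern would cover, shell-style matching from the standard library gives a rough approximation. How HeliosDB itself expands the pattern is an assumption here; in particular, `fnmatch` lets `*` match across `/`, which the FDW may not.

```python
from fnmatch import fnmatchcase


def matching_keys(keys, pattern):
    """Return the object keys that match a shell-style pattern, case-sensitively."""
    return [k for k in keys if fnmatchcase(k, pattern)]


# Hypothetical bucket listing with Hive-style year partitions.
object_keys = [
    "orders/year=2024/part-0000.parquet",
    "orders/year=2025/part-0000.parquet",
    "orders/year=2025/part-0001.parquet",
    "orders/year=2025/_SUCCESS",
]
selected = matching_keys(object_keys, "orders/year=2025/*.parquet")
print(selected)
```

Partitioning keys by year this way means a query against the 2025 table never has to open the 2024 files.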
MinIO is the right choice when:
HeliosDB is the right choice when:
If you are currently storing queryable data (JSON, CSV, metadata) in MinIO and querying it with external tools:
# Step 1: Export from MinIO
import boto3, json
s3 = boto3.client('s3', endpoint_url='http://minio:9000')
# list_objects_v2 returns at most 1000 keys per call, so paginate
pages = s3.get_paginator('list_objects_v2').paginate(Bucket='documents')
# Step 2: Load into HeliosDB via PostgreSQL protocol
import psycopg2
conn = psycopg2.connect("host=localhost port=5432 dbname=helios")
cur = conn.cursor()
for page in pages:
    for obj in page.get('Contents', []):
        body = s3.get_object(Bucket='documents', Key=obj['Key'])
        data = json.loads(body['Body'].read())
        cur.execute("""
            INSERT INTO documents (key, data, created_at)
            VALUES (%s, %s, %s)
        """, (obj['Key'], json.dumps(data), obj['LastModified']))
conn.commit()
-- HeliosDB-Full: query MinIO data without moving it
CREATE FOREIGN TABLE minio_logs (
filename TEXT,
error_count INTEGER
) SERVER minio_server
OPTIONS (bucket 'logs', key '2026/*.parquet', format 'parquet');
-- Local metadata + remote blobs in one query
SELECT m.filename, m.upload_date, l.error_count
FROM file_metadata m
JOIN minio_logs l ON l.filename = m.filename
WHERE l.error_count > 0
ORDER BY l.error_count DESC;
| Dimension | MinIO | HeliosDB |
|---|---|---|
| Best at | Storing petabytes of blobs cheaply | Querying, searching, and analyzing data intelligently |
| Data model | Opaque objects in buckets | Tables + JSONB + vectors + time-travel |
| Query power | S3 Select (single-file filter) | Full SQL optimizer (69+ plan nodes) |
| Search | None (external services) | FTS + vector + hybrid -- all native |
| Transactions | Single-object | Multi-statement ACID with MVCC |
| Encryption | SSE (object-level) | TDE + ZKE (database-wide, zero-knowledge) |
| AI/RAG | Storage layer for AI pipelines | Integrated AI data platform |
| Deployment | 4+ nodes minimum for HA | Single binary, embeddable, or clustered |
| Together | HeliosDB-Full queries MinIO via S3 FDW | Best of both worlds |
MinIO excels at what it was built for: high-performance, S3-compatible object storage at massive scale. HeliosDB excels at what it was built for: making data queryable, searchable, and intelligent. The ideal architecture often uses both -- MinIO as the cost-effective bulk storage tier, and HeliosDB as the intelligent query layer that makes that data useful.
Get started with HeliosDB in minutes. Open source, free to use.