Hybrid Direct + gRPC Storage Architecture
Hybrid Direct + gRPC Storage Architecture
Overview
HeliosDB uses a hybrid storage access pattern that optimizes for both local and remote operations in distributed deployments. This architecture combines:
- Direct Access: Sub-microsecond latency for local shard operations
- gRPC: ~100μs overhead for remote shard operations
- Parallel Scatter-Gather: Concurrent access to all shards for analytics
Architecture Diagram
┌─────────────────────────────────────────────────────────────────────┐│ Compute Layer ││ ┌───────────────────────────────────────────────────────────────┐ ││ │ Query Router │ ││ │ │ ││ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────┐ │ ││ │ │ Point Lookup│ │ Multi-Get │ │ Full Scan/Agg │ │ ││ │ │ │ │ │ │ │ │ ││ │ │ Local: DIR │ │ HYBRID │ │ PARALLEL SCATTER │ │ ││ │ │ Remote: RPC │ │ Dir + Batch │ │ GATHER │ │ ││ │ └─────────────┘ └─────────────┘ └─────────────────────┘ │ ││ └───────────────────────────────────────────────────────────────┘ │└─────────────────────────────────────────────────────────────────────┘ │ ┌────────────────────────┼────────────────────────┐ │ Direct (local) │ gRPC (remote) │ ▼ ▼ ▼ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ NativeBackend│ │ NativeBackend│ │ NativeBackend│ │ Shard 0 │ │ Shard 1 │ │ Shard 2 │ │ (local) │ │ (remote) │ │ (remote) │ │ │ │ │ │ │ │ ┌────────┐ │ │ ┌────────┐ │ │ ┌────────┐ │ │ │ WAL │ │ │ │ WAL │ │ │ │ WAL │ │ │ │O_DIRECT│ │ │ │O_DIRECT│ │ │ │O_DIRECT│ │ │ └────────┘ │ │ └────────┘ │ │ └────────┘ │ │ ┌────────┐ │ │ ┌────────┐ │ │ ┌────────┐ │ │ │Segments│ │ │ │Segments│ │ │ │Segments│ │ │ │ 64MB │ │ │ │ 64MB │ │ │ │ 64MB │ │ │ └────────┘ │ │ └────────┘ │ │ └────────┘ │ │ ┌────────┐ │ │ ┌────────┐ │ │ ┌────────┐ │ │ │Filters │ │ │ │Filters │ │ │ │Filters │ │ │ │XOR/Bloom│ │ │ │XOR/Bloom│ │ │ │XOR/Bloom│ │ │ └────────┘ │ │ └────────┘ │ │ └────────┘ │ └──────────────┘ └──────────────┘ └──────────────┘Operation Routing Strategy
| Operation | Local Shard | Remote Shard | Rationale |
|---|---|---|---|
| Point Lookup | DIRECT (1-2μs) | gRPC (~100μs) | Avoid network for local |
| Multi-Get | DIRECT | gRPC batch | Batch amortizes RPC overhead |
| Range Scan | DIRECT | gRPC stream | Streaming reduces memory |
| Full Table Scan | DIRECT seq | PARALLEL scatter | Parallelism at scale |
| COUNT/SUM/AVG | DIRECT | PUSHDOWN | Compute at storage tier |
| INSERT/UPDATE | DIRECT | gRPC | Avoid network for local |
| Bulk Import | PARALLEL | PARALLEL | All shards simultaneously |
Benchmark Results
Test Configuration
- Scale: 4.5GB total (1.5GB per shard)
- Rows: 22.5M total (7.5M per shard)
- Shards: 3 NativeBackend instances
- Environment: Localhost (no network latency)
Point Lookup Performance
┌────────────────┬───────────┬──────────┬─────────┐│ Approach │ Time │ Avg/op │ Speedup │├────────────────┼───────────┼──────────┼─────────┤│ All gRPC │ 95.2ms │ 95.2μs │ 1.0x ││ Hybrid D+gRPC │ 68.0ms │ 68.0μs │ 1.4x ││ Direct only │ 0.5ms │ 1.7μs │ 56.0x │└────────────────┴───────────┴──────────┴─────────┘Multi-Get Performance (Batched)
┌─────────────────────┬───────────┬───────────┬─────────┐│ Approach │ Time │ Avg/batch │ Speedup │├─────────────────────┼───────────┼───────────┼─────────┤│ gRPC (individual) │ 92.4ms │ 0.92ms │ 1.0x ││ Hybrid + multi_get │ 21.8ms │ 0.22ms │ 4.2x │└─────────────────────┴───────────┴───────────┴─────────┘Write Throughput
┌────────────────┬───────────┬──────────────┬─────────┐│ Approach │ Time │ Throughput │ Speedup │├────────────────┼───────────┼──────────────┼─────────┤│ All gRPC │ 97.3ms │ 10,275/sec │ 1.0x ││ Hybrid D+gRPC │ 70.2ms │ 14,255/sec │ 1.4x │└────────────────┴───────────┴──────────────┴─────────┘Data Population (Parallel Ingestion)
Scale │ Rows │ Time │ Throughput──────────┼─────────────┼─────────┼────────────4.5GB │ 22,500,000 │ 12.8s │ 1.76M rows/sec15GB │ 75,000,000 │ 45.5s │ 1.65M rows/secWhen to Use Each Approach
Use Direct Access When:
- Data is on the local shard
- Point lookups (single key)
- Low-latency writes
- Single-shard transactions
Use gRPC When:
- Data is on a remote shard
- Cross-shard queries
- Need network transport
Use Parallel Scatter-Gather When:
- Full table scans across shards
- Analytics queries (GROUP BY, aggregations)
- Data size exceeds single-node capacity
- Network latency can be hidden by parallelism
Use Aggregation Pushdown When:
- COUNT, SUM, AVG, MIN, MAX operations
- Reduces data transfer (send results, not raw data)
- Parallel execution across shards
Implementation Example
use heliosdb_lite::storage::backend::{StorageBackend, NativeBackend};use heliosdb_lite::grpc::{StorageNodeServer, StorageClient};
// Setup: 3 shards with gRPC serverslet shard0 = Arc::new(NativeBackend::open_in_memory()?);let shard1 = Arc::new(NativeBackend::open_in_memory()?);let shard2 = Arc::new(NativeBackend::open_in_memory()?);
// Start gRPC servers for remote accessStorageNodeServer::new(shard1.clone()).with_addr(addr1).serve_background();StorageNodeServer::new(shard2.clone()).with_addr(addr2).serve_background();
// Connect clients for remote shardslet mut client1 = StorageClient::connect("127.0.0.1:50201").await?;let mut client2 = StorageClient::connect("127.0.0.1:50202").await?;
// Hybrid point lookupasync fn get_key(key: &[u8], shard_id: usize) -> Result<Option<Vec<u8>>> { let local_shard_id = 0;
if shard_id == local_shard_id { // Direct access for local shard (sub-microsecond) shard0.get(key) } else { // gRPC for remote shards (~100μs) match shard_id { 1 => client1.get("table", key, None).await, 2 => client2.get("table", key, None).await, _ => unreachable!(), } }}
// Parallel scatter-gather for full scanlet (r0, r1, r2) = tokio::join!( async { shard0.scan(b"data:", None) }, client1.scan("data", None, None, None), client2.scan("data", None, None, None));let total_rows = r0?.count() + r1?.len() + r2?.len();NativeBackend Features
The hybrid architecture is built on NativeBackend, which provides:
| Feature | Description |
|---|---|
| O_DIRECT WAL | Bypass OS cache for durability |
| Segment Storage | 64MB segments for efficient I/O |
| MVCC Snapshots | Consistent reads without blocking |
| Bloom/XOR Filters | Fast negative lookups |
| Compression | zstd, lz4 support |
| True In-Memory | No temp files required |
| Range Scans | Efficient prefix iteration |
Migration from RocksDB
RocksDB is deprecated in favor of NativeBackend + gRPC:
# Build without RocksDB (recommended for new projects)cargo build --no-default-features --features encryption,vector-search,ring-crypto
# Build with RocksDB (for backward compatibility)cargo build # includes rocksdb-backend feature by defaultRunning the Benchmark
# Quick test (150MB)cargo run --release --example hybrid_benchmark -- 0.01
# Medium test (1.5GB)cargo run --release --example hybrid_benchmark -- 0.1
# Large test (4.5GB, requires ~8GB RAM)cargo run --release --example hybrid_benchmark -- 0.3
# Full test (15GB, requires ~24GB RAM)cargo run --release --example hybrid_benchmark -- 1.0See Also
examples/hybrid_benchmark.rs- Full benchmark implementationsrc/storage/backend/mod.rs- Backend abstraction layersrc/grpc/- gRPC protocol implementation