Hybrid Direct + gRPC Storage Architecture

Overview

HeliosDB uses a hybrid storage access pattern that optimizes for both local and remote operations in distributed deployments. This architecture combines:

  • Direct Access: In-process calls for local shard operations, roughly 1-2μs per point lookup
  • gRPC: ~100μs overhead per call for remote shard operations
  • Parallel Scatter-Gather: Concurrent access to all shards for analytics

Architecture Diagram

┌─────────────────────────────────────────────────────────────────────┐
│                            Compute Layer                            │
│  ┌───────────────────────────────────────────────────────────────┐  │
│  │                         Query Router                          │  │
│  │                                                               │  │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐    │  │
│  │  │ Point Lookup│  │  Multi-Get  │  │    Full Scan/Agg    │    │  │
│  │  │             │  │             │  │                     │    │  │
│  │  │ Local: DIR  │  │   HYBRID    │  │  PARALLEL SCATTER   │    │  │
│  │  │ Remote: RPC │  │ Dir + Batch │  │       GATHER        │    │  │
│  │  └─────────────┘  └─────────────┘  └─────────────────────┘    │  │
│  └───────────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────┘
          ┌────────────────────────┼────────────────────────┐
          │ Direct (local)         │ gRPC (remote)          │
          ▼                        ▼                        ▼
   ┌──────────────┐         ┌──────────────┐         ┌──────────────┐
   │ NativeBackend│         │ NativeBackend│         │ NativeBackend│
   │   Shard 0    │         │   Shard 1    │         │   Shard 2    │
   │   (local)    │         │   (remote)   │         │   (remote)   │
   │              │         │              │         │              │
   │ ┌──────────┐ │         │ ┌──────────┐ │         │ ┌──────────┐ │
   │ │   WAL    │ │         │ │   WAL    │ │         │ │   WAL    │ │
   │ │ O_DIRECT │ │         │ │ O_DIRECT │ │         │ │ O_DIRECT │ │
   │ └──────────┘ │         │ └──────────┘ │         │ └──────────┘ │
   │ ┌──────────┐ │         │ ┌──────────┐ │         │ ┌──────────┐ │
   │ │ Segments │ │         │ │ Segments │ │         │ │ Segments │ │
   │ │   64MB   │ │         │ │   64MB   │ │         │ │   64MB   │ │
   │ └──────────┘ │         │ └──────────┘ │         │ └──────────┘ │
   │ ┌──────────┐ │         │ ┌──────────┐ │         │ ┌──────────┐ │
   │ │ Filters  │ │         │ │ Filters  │ │         │ │ Filters  │ │
   │ │ XOR/Bloom│ │         │ │ XOR/Bloom│ │         │ │ XOR/Bloom│ │
   │ └──────────┘ │         │ └──────────┘ │         │ └──────────┘ │
   └──────────────┘         └──────────────┘         └──────────────┘

Operation Routing Strategy

Operation       │ Local Shard    │ Remote Shard     │ Rationale
────────────────┼────────────────┼──────────────────┼─────────────────────────────
Point Lookup    │ DIRECT (1-2μs) │ gRPC (~100μs)    │ Avoid network for local
Multi-Get       │ DIRECT         │ gRPC batch       │ Batch amortizes RPC overhead
Range Scan      │ DIRECT         │ gRPC stream      │ Streaming reduces memory
Full Table Scan │ DIRECT seq     │ PARALLEL scatter │ Parallelism at scale
COUNT/SUM/AVG   │ DIRECT         │ PUSHDOWN         │ Compute at storage tier
INSERT/UPDATE   │ DIRECT         │ gRPC             │ Avoid network for local
Bulk Import     │ PARALLEL       │ PARALLEL         │ All shards simultaneously
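
The routing table above can be read as a small decision function. The following is a minimal sketch of that mapping, not HeliosDB's actual router: the `Operation`, `Locality`, and `AccessPath` enums and the `route` function are hypothetical names introduced only for illustration.

```rust
/// Operation classes from the routing table (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Operation {
    PointLookup,
    MultiGet,
    RangeScan,
    FullTableScan,
    Aggregate, // COUNT / SUM / AVG
    Write,     // INSERT / UPDATE
    BulkImport,
}

/// Where the target shard lives relative to the compute node.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Locality {
    Local,
    Remote,
}

/// Access paths named in the routing table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AccessPath {
    Direct,
    DirectSequential,
    Grpc,
    GrpcBatch,
    GrpcStream,
    ParallelScatter,
    Pushdown,
    Parallel,
}

/// Pick an access path for an operation, one table cell per match arm.
pub fn route(op: Operation, loc: Locality) -> AccessPath {
    use AccessPath::*;
    use Locality::*;
    use Operation::*;
    match (op, loc) {
        // Bulk import fans out to all shards regardless of locality.
        (BulkImport, _) => Parallel,
        (FullTableScan, Local) => DirectSequential,
        // Every other local operation skips the network entirely.
        (_, Local) => Direct,
        (PointLookup, Remote) | (Write, Remote) => Grpc,
        (MultiGet, Remote) => GrpcBatch,
        (RangeScan, Remote) => GrpcStream,
        (FullTableScan, Remote) => ParallelScatter,
        (Aggregate, Remote) => Pushdown,
    }
}
```

Because the `match` is exhaustive, adding a new operation class to the enum forces a routing decision at compile time instead of silently falling through to a default path.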

Benchmark Results

Test Configuration

  • Scale: 4.5GB total (1.5GB per shard)
  • Rows: 22.5M total (7.5M per shard)
  • Shards: 3 NativeBackend instances
  • Environment: Localhost (no network latency)

Point Lookup Performance

┌────────────────┬───────────┬──────────┬─────────┐
│ Approach │ Time │ Avg/op │ Speedup │
├────────────────┼───────────┼──────────┼─────────┤
│ All gRPC │ 95.2ms │ 95.2μs │ 1.0x │
│ Hybrid D+gRPC │ 68.0ms │ 68.0μs │ 1.4x │
│ Direct only │ 0.5ms │ 1.7μs │ 56.0x │
└────────────────┴───────────┴──────────┴─────────┘
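
As a rough sanity check on the hybrid figure, the expected per-operation latency can be modeled as a weighted average of the direct and gRPC costs. This sketch assumes that keys are spread uniformly over the three shards, so one third of lookups are local; the source does not state the key distribution, and `expected_hybrid_latency_us` is a hypothetical helper.

```rust
/// Expected average latency in microseconds when `local_fraction` of lookups
/// hit the local shard via direct access and the rest go over gRPC.
pub fn expected_hybrid_latency_us(local_fraction: f64, direct_us: f64, grpc_us: f64) -> f64 {
    local_fraction * direct_us + (1.0 - local_fraction) * grpc_us
}
```

With the measured averages from the table, `expected_hybrid_latency_us(1.0 / 3.0, 1.7, 95.2)` evaluates to about 64μs, in the same range as the measured 68.0μs. The model ignores routing overhead and any skew in the key distribution, which may account for the difference.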

Multi-Get Performance (Batched)

┌─────────────────────┬───────────┬───────────┬─────────┐
│ Approach │ Time │ Avg/batch │ Speedup │
├─────────────────────┼───────────┼───────────┼─────────┤
│ gRPC (individual) │ 92.4ms │ 0.92ms │ 1.0x │
│ Hybrid + multi_get │ 21.8ms │ 0.22ms │ 4.2x │
└─────────────────────┴───────────┴───────────┴─────────┘
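
The batching gain comes from grouping keys by owning shard so that each remote shard receives one RPC per batch instead of one RPC per key. Below is a minimal sketch of that grouping step. The hash-modulo placement in `shard_for_key` is an assumption made for illustration; the source does not describe how HeliosDB assigns keys to shards, and both function names are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Hypothetical placement function: hash the key, then take it modulo the
/// shard count. `DefaultHasher::new()` is deterministic within one build,
/// but its algorithm is not guaranteed to stay the same across Rust releases.
pub fn shard_for_key(key: &[u8], num_shards: usize) -> usize {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    (hasher.finish() % num_shards as u64) as usize
}

/// Group keys by owning shard so that each shard receives a single batch.
pub fn group_by_shard<'a>(
    keys: &[&'a [u8]],
    num_shards: usize,
) -> BTreeMap<usize, Vec<&'a [u8]>> {
    let mut batches: BTreeMap<usize, Vec<&'a [u8]>> = BTreeMap::new();
    for &key in keys {
        batches
            .entry(shard_for_key(key, num_shards))
            .or_default()
            .push(key);
    }
    batches
}
```

The batch for the local shard would then be served by direct calls, and each remaining batch by one batched gRPC request, which is where the RPC overhead is amortized.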

Write Throughput

┌────────────────┬───────────┬──────────────┬─────────┐
│ Approach │ Time │ Throughput │ Speedup │
├────────────────┼───────────┼──────────────┼─────────┤
│ All gRPC │ 97.3ms │ 10,275/sec │ 1.0x │
│ Hybrid D+gRPC │ 70.2ms │ 14,255/sec │ 1.4x │
└────────────────┴───────────┴──────────────┴─────────┘

Data Population (Parallel Ingestion)

Scale │ Rows │ Time │ Throughput
──────────┼─────────────┼─────────┼────────────
4.5GB │ 22,500,000 │ 12.8s │ 1.76M rows/sec
15GB │ 75,000,000 │ 45.5s │ 1.65M rows/sec

When to Use Each Approach

Use Direct Access When:

  • Data is on the local shard
  • Point lookups (single key)
  • Low-latency writes
  • Single-shard transactions

Use gRPC When:

  • Data is on a remote shard
  • Cross-shard queries
  • Need network transport

Use Parallel Scatter-Gather When:

  • Full table scans across shards
  • Analytics queries (GROUP BY, aggregations)
  • Data size exceeds single-node capacity
  • Network latency can be hidden by parallelism
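
The scatter-gather pattern can be sketched without any HeliosDB types. The example below uses scoped OS threads from the standard library rather than async gRPC calls, and a `BTreeMap` stands in for each shard; `Shard` and `scatter_gather_scan` are hypothetical names, so treat this as an illustration of the pattern rather than the real execution path.

```rust
use std::collections::BTreeMap;
use std::thread;

/// Stand-in for a shard: an ordered in-memory key-value map (not NativeBackend).
pub type Shard = BTreeMap<Vec<u8>, Vec<u8>>;

/// Scatter a prefix scan to every shard on its own thread, then gather the
/// per-shard results into one key-ordered list.
pub fn scatter_gather_scan(shards: &[Shard], prefix: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)> {
    let mut merged: Vec<(Vec<u8>, Vec<u8>)> = thread::scope(|scope| {
        // Scatter: spawn one scoped thread per shard. Collecting the handles
        // first ensures all scans are running before any join blocks.
        let handles: Vec<_> = shards
            .iter()
            .map(|shard| {
                scope.spawn(move || {
                    shard
                        .range(prefix.to_vec()..)
                        .take_while(|(k, _)| k.starts_with(prefix))
                        .map(|(k, v)| (k.clone(), v.clone()))
                        .collect::<Vec<_>>()
                })
            })
            .collect();
        // Gather: wait for every shard and concatenate the partial results.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("shard scan panicked"))
            .collect()
    });
    // Restore a global key order across shards.
    merged.sort();
    merged
}
```

Total latency is bounded by the slowest shard rather than the sum of all shards, which is why per-call network overhead matters less for this class of query.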

Use Aggregation Pushdown When:

  • COUNT, SUM, AVG, MIN, MAX operations
  • Reduces data transfer (send results, not raw data)
  • Parallel execution across shards
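
Pushdown works because these aggregates can be computed per shard and then merged. A minimal sketch of the idea follows, assuming each shard returns a small partial-aggregate struct; `Partial` and `merge_all` are hypothetical names, not part of the HeliosDB API.

```rust
/// Partial aggregate computed by one shard (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Partial {
    pub count: u64,
    pub sum: f64,
    pub min: f64,
    pub max: f64,
}

impl Partial {
    /// Runs on the storage node: reduce the shard's values to one small
    /// struct. Returns `None` for an empty shard.
    pub fn from_values(values: &[f64]) -> Option<Partial> {
        let first = *values.first()?;
        let init = Partial { count: 0, sum: 0.0, min: first, max: first };
        Some(values.iter().fold(init, |acc, &v| Partial {
            count: acc.count + 1,
            sum: acc.sum + v,
            min: acc.min.min(v),
            max: acc.max.max(v),
        }))
    }

    /// Runs on the coordinator: combine two partial results.
    pub fn merge(self, other: Partial) -> Partial {
        Partial {
            count: self.count + other.count,
            sum: self.sum + other.sum,
            min: self.min.min(other.min),
            max: self.max.max(other.max),
        }
    }

    /// AVG is derived from the merged sum and count.
    pub fn avg(&self) -> f64 {
        self.sum / self.count as f64
    }
}

/// Merge the partial results from all shards; `None` if every shard was empty.
pub fn merge_all(partials: impl IntoIterator<Item = Partial>) -> Option<Partial> {
    partials.into_iter().reduce(Partial::merge)
}
```

Note that AVG is carried as a (sum, count) pair: averaging the per-shard averages would give the wrong answer whenever shards hold different row counts. Only one small struct per shard crosses the network, independent of how many rows the shard scanned.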

Implementation Example

// Fragment: the statements below are meant to run inside an async function
// that returns a Result.
use std::sync::Arc;

use heliosdb_lite::grpc::{StorageClient, StorageNodeServer};
use heliosdb_lite::storage::backend::{NativeBackend, StorageBackend};

// Shard hosted in this process; every other shard is reached over gRPC.
const LOCAL_SHARD_ID: usize = 0;

// Setup: 3 shards with gRPC servers
let shard0 = Arc::new(NativeBackend::open_in_memory()?);
let shard1 = Arc::new(NativeBackend::open_in_memory()?);
let shard2 = Arc::new(NativeBackend::open_in_memory()?);

// Start gRPC servers for remote access.
// addr1 and addr2 are the listen addresses that the clients below connect to
// (127.0.0.1:50201 and 127.0.0.1:50202).
StorageNodeServer::new(shard1.clone()).with_addr(addr1).serve_background();
StorageNodeServer::new(shard2.clone()).with_addr(addr2).serve_background();

// Connect clients for remote shards
let mut client1 = StorageClient::connect("127.0.0.1:50201").await?;
let mut client2 = StorageClient::connect("127.0.0.1:50202").await?;

// Hybrid point lookup. The local backend and the remote clients are passed in
// explicitly because a nested `fn` cannot capture surrounding variables.
async fn get_key(
    local: &NativeBackend,
    client1: &mut StorageClient,
    client2: &mut StorageClient,
    key: &[u8],
    shard_id: usize,
) -> Result<Option<Vec<u8>>> {
    if shard_id == LOCAL_SHARD_ID {
        // Direct access for the local shard (~1-2μs)
        local.get(key)
    } else {
        // gRPC for remote shards (~100μs)
        match shard_id {
            1 => client1.get("table", key, None).await,
            2 => client2.get("table", key, None).await,
            _ => unreachable!("this example has exactly three shards"),
        }
    }
}

// Parallel scatter-gather for full scan: all three scans run concurrently
let (r0, r1, r2) = tokio::join!(
    async { shard0.scan(b"data:", None) },
    client1.scan("data", None, None, None),
    client2.scan("data", None, None, None)
);
let total_rows = r0?.count() + r1?.len() + r2?.len();

NativeBackend Features

The hybrid architecture is built on NativeBackend, which provides:

Feature           │ Description
──────────────────┼──────────────────────────────────
O_DIRECT WAL      │ Bypass OS cache for durability
Segment Storage   │ 64MB segments for efficient I/O
MVCC Snapshots    │ Consistent reads without blocking
Bloom/XOR Filters │ Fast negative lookups
Compression       │ zstd, lz4 support
True In-Memory    │ No temp files required
Range Scans       │ Efficient prefix iteration
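
To illustrate why a filter gives fast negative lookups, here is a minimal Bloom filter sketch. It is not NativeBackend's filter implementation: the `BloomFilter` type, its parameters, and the seeded use of the standard library's `DefaultHasher` are assumptions chosen to keep the example self-contained.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Minimal Bloom filter: a bit array probed by several seeded hashes.
pub struct BloomFilter {
    bits: Vec<bool>,
    num_hashes: u64,
}

impl BloomFilter {
    pub fn new(num_bits: usize, num_hashes: u64) -> Self {
        assert!(num_bits > 0 && num_hashes > 0);
        BloomFilter { bits: vec![false; num_bits], num_hashes }
    }

    /// Bit position for `key` under the hash function identified by `seed`.
    fn index(&self, key: &[u8], seed: u64) -> usize {
        let mut hasher = DefaultHasher::new();
        seed.hash(&mut hasher);
        key.hash(&mut hasher);
        (hasher.finish() % self.bits.len() as u64) as usize
    }

    /// Set every bit position that the key hashes to.
    pub fn insert(&mut self, key: &[u8]) {
        for seed in 0..self.num_hashes {
            let i = self.index(key, seed);
            self.bits[i] = true;
        }
    }

    /// `false` means the key is definitely absent, so the segment read can be
    /// skipped. `true` means "possibly present" and may be a false positive.
    pub fn may_contain(&self, key: &[u8]) -> bool {
        (0..self.num_hashes).all(|seed| self.bits[self.index(key, seed)])
    }
}
```

A lookup for a missing key usually finds at least one unset bit and returns without touching segment data; only a "possibly present" answer requires reading the segment.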

Migration from RocksDB

RocksDB is deprecated in favor of NativeBackend + gRPC:

# Build without RocksDB (recommended for new projects)
cargo build --no-default-features --features encryption,vector-search,ring-crypto
# Build with RocksDB (for backward compatibility)
cargo build # includes rocksdb-backend feature by default

Running the Benchmark

# Quick test (150MB)
cargo run --release --example hybrid_benchmark -- 0.01
# Medium test (1.5GB)
cargo run --release --example hybrid_benchmark -- 0.1
# Large test (4.5GB, requires ~8GB RAM)
cargo run --release --example hybrid_benchmark -- 0.3
# Full test (15GB, requires ~24GB RAM)
cargo run --release --example hybrid_benchmark -- 1.0

See Also

  • examples/hybrid_benchmark.rs - Full benchmark implementation
  • src/storage/backend/mod.rs - Backend abstraction layer
  • src/grpc/ - gRPC protocol implementation