Hybrid Direct + gRPC Storage Architecture

Overview

HeliosDB uses a hybrid storage access pattern that optimizes for both local and remote operations in distributed deployments. This architecture combines:

Direct Access: Sub-microsecond latency for local shard operations
gRPC: ~100μs overhead for remote shard operations
Parallel Scatter-Gather: Concurrent access to all shards for analytics

Architecture Diagram

┌─────────────────────────────────────────────────────────────────────┐
│                         Compute Layer                                │
│  ┌───────────────────────────────────────────────────────────────┐  │
│  │                      Query Router                              │  │
│  │                                                                │  │
│  │   ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐   │  │
│  │   │ Point Lookup│  │  Multi-Get  │  │   Full Scan/Agg     │   │  │
│  │   │             │  │             │  │                     │   │  │
│  │   │ Local: DIR  │  │   HYBRID    │  │ PARALLEL SCATTER    │   │  │
│  │   │ Remote: RPC │  │ Dir + Batch │  │     GATHER          │   │  │
│  │   └─────────────┘  └─────────────┘  └─────────────────────┘   │  │
│  └───────────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────┘
                                   │
          ┌────────────────────────┼────────────────────────┐
          │ Direct (local)         │ gRPC (remote)          │
          ▼                        ▼                        ▼
   ┌──────────────┐        ┌──────────────┐        ┌──────────────┐
   │ NativeBackend│        │ NativeBackend│        │ NativeBackend│
   │   Shard 0    │        │   Shard 1    │        │   Shard 2    │
   │   (local)    │        │   (remote)   │        │   (remote)   │
   │              │        │              │        │              │
   │  ┌────────┐  │        │  ┌────────┐  │        │  ┌────────┐  │
   │  │  WAL   │  │        │  │  WAL   │  │        │  │  WAL   │  │
   │  │O_DIRECT│  │        │  │O_DIRECT│  │        │  │O_DIRECT│  │
   │  └────────┘  │        │  └────────┘  │        │  └────────┘  │
   │  ┌────────┐  │        │  ┌────────┐  │        │  ┌────────┐  │
   │  │Segments│  │        │  │Segments│  │        │  │Segments│  │
   │  │ 64MB   │  │        │  │ 64MB   │  │        │  │ 64MB   │  │
   │  └────────┘  │        │  └────────┘  │        │  └────────┘  │
   │  ┌────────┐  │        │  ┌────────┐  │        │  ┌────────┐  │
   │  │Filters │  │        │  │Filters │  │        │  │Filters │  │
   │  │XOR/Bloom│ │        │  │XOR/Bloom│ │        │  │XOR/Bloom│ │
   │  └────────┘  │        │  └────────┘  │        │  └────────┘  │
   └──────────────┘        └──────────────┘        └──────────────┘

Operation Routing Strategy

Operation	Local Shard	Remote Shard	Rationale
Point Lookup	DIRECT (1-2μs)	gRPC (~100μs)	Avoid network for local
Multi-Get	DIRECT	gRPC batch	Batch amortizes RPC overhead
Range Scan	DIRECT	gRPC stream	Streaming reduces memory
Full Table Scan	DIRECT seq	PARALLEL scatter	Parallelism at scale
COUNT/SUM/AVG	DIRECT	PUSHDOWN	Compute at storage tier
INSERT/UPDATE	DIRECT	gRPC	Avoid network for local
Bulk Import	PARALLEL	PARALLEL	All shards simultaneously

Benchmark Results

Test Configuration

Scale: 4.5GB total (1.5GB per shard)
Rows: 22.5M total (7.5M per shard)
Shards: 3 NativeBackend instances
Environment: Localhost (no network latency)

Point Lookup Performance

┌────────────────┬───────────┬──────────┬─────────┐
│ Approach       │   Time    │  Avg/op  │ Speedup │
├────────────────┼───────────┼──────────┼─────────┤
│ All gRPC       │   95.2ms  │   95.2μs │   1.0x  │
│ Hybrid D+gRPC  │   68.0ms  │   68.0μs │   1.4x  │
│ Direct only    │    0.5ms  │    1.7μs │  56.0x  │
└────────────────┴───────────┴──────────┴─────────┘

Multi-Get Performance (Batched)

┌─────────────────────┬───────────┬───────────┬─────────┐
│ Approach            │   Time    │ Avg/batch │ Speedup │
├─────────────────────┼───────────┼───────────┼─────────┤
│ gRPC (individual)   │   92.4ms  │   0.92ms  │   1.0x  │
│ Hybrid + multi_get  │   21.8ms  │   0.22ms  │   4.2x  │
└─────────────────────┴───────────┴───────────┴─────────┘

Write Throughput

┌────────────────┬───────────┬──────────────┬─────────┐
│ Approach       │   Time    │  Throughput  │ Speedup │
├────────────────┼───────────┼──────────────┼─────────┤
│ All gRPC       │   97.3ms  │   10,275/sec │   1.0x  │
│ Hybrid D+gRPC  │   70.2ms  │   14,255/sec │   1.4x  │
└────────────────┴───────────┴──────────────┴─────────┘

Data Population (Parallel Ingestion)

Scale     │ Rows        │ Time    │ Throughput
──────────┼─────────────┼─────────┼────────────
4.5GB     │ 22,500,000  │ 12.8s   │ 1.76M rows/sec
15GB      │ 75,000,000  │ 45.5s   │ 1.65M rows/sec

When to Use Each Approach

Use Direct Access When:

Data is on the local shard
Point lookups (single key)
Low-latency writes
Single-shard transactions

Use gRPC When:

Data is on a remote shard
Cross-shard queries
Need network transport

Use Parallel Scatter-Gather When:

Full table scans across shards
Analytics queries (GROUP BY, aggregations)
Data size exceeds single-node capacity
Network latency can be hidden by parallelism

Use Aggregation Pushdown When:

COUNT, SUM, AVG, MIN, MAX operations
Reduces data transfer (send results, not raw data)
Parallel execution across shards

Implementation Example

use heliosdb_lite::storage::backend::{StorageBackend, NativeBackend};
use heliosdb_lite::grpc::{StorageNodeServer, StorageClient};

// Setup: 3 shards with gRPC servers
let shard0 = Arc::new(NativeBackend::open_in_memory()?);
let shard1 = Arc::new(NativeBackend::open_in_memory()?);
let shard2 = Arc::new(NativeBackend::open_in_memory()?);

// Start gRPC servers for remote access
StorageNodeServer::new(shard1.clone()).with_addr(addr1).serve_background();
StorageNodeServer::new(shard2.clone()).with_addr(addr2).serve_background();

// Connect clients for remote shards
let mut client1 = StorageClient::connect("127.0.0.1:50201").await?;
let mut client2 = StorageClient::connect("127.0.0.1:50202").await?;

// Hybrid point lookup
async fn get_key(key: &[u8], shard_id: usize) -> Result<Option<Vec<u8>>> {
    let local_shard_id = 0;

    if shard_id == local_shard_id {
        // Direct access for local shard (sub-microsecond)
        shard0.get(key)
    } else {
        // gRPC for remote shards (~100μs)
        match shard_id {
            1 => client1.get("table", key, None).await,
            2 => client2.get("table", key, None).await,
            _ => unreachable!(),
        }
    }
}

// Parallel scatter-gather for full scan
let (r0, r1, r2) = tokio::join!(
    async { shard0.scan(b"data:", None) },
    client1.scan("data", None, None, None),
    client2.scan("data", None, None, None)
);
let total_rows = r0?.count() + r1?.len() + r2?.len();

NativeBackend Features

The hybrid architecture is built on NativeBackend, which provides:

Feature	Description
O_DIRECT WAL	Bypass OS cache for durability
Segment Storage	64MB segments for efficient I/O
MVCC Snapshots	Consistent reads without blocking
Bloom/XOR Filters	Fast negative lookups
Compression	zstd, lz4 support
True In-Memory	No temp files required
Range Scans	Efficient prefix iteration

Migration from RocksDB

RocksDB is deprecated in favor of NativeBackend + gRPC:

# Build without RocksDB (recommended for new projects)
cargo build --no-default-features --features encryption,vector-search,ring-crypto

# Build with RocksDB (for backward compatibility)
cargo build  # includes rocksdb-backend feature by default

Running the Benchmark

# Quick test (150MB)
cargo run --release --example hybrid_benchmark -- 0.01

# Medium test (1.5GB)
cargo run --release --example hybrid_benchmark -- 0.1

# Large test (4.5GB, requires ~8GB RAM)
cargo run --release --example hybrid_benchmark -- 0.3

# Full test (15GB, requires ~24GB RAM)
cargo run --release --example hybrid_benchmark -- 1.0

Hybrid Direct + gRPC Storage Architecture

Hybrid Direct + gRPC Storage Architecture

Overview

Architecture Diagram

Operation Routing Strategy

Benchmark Results

Test Configuration

Point Lookup Performance

Multi-Get Performance (Batched)

Write Throughput

Data Population (Parallel Ingestion)

When to Use Each Approach

Use Direct Access When:

Use gRPC When:

Use Parallel Scatter-Gather When:

Use Aggregation Pushdown When:

Implementation Example

NativeBackend Features

Migration from RocksDB

Running the Benchmark

See Also