HeliosDB Initial Architecture Review


Review ID: 001 | Date: 2025-10-10 | Reviewer: Reviewer Agent (Hive Mind) | Status: INITIAL ASSESSMENT


Executive Summary

This is the initial architecture review of the HeliosDB project, conducted after analyzing the design specifications in Design-Guidelines-1.md and Design-Guidelines-2.md, as well as the protocol compatibility requirements. The project is in its early stages with only basic scaffolding completed.

Overall Assessment: NEEDS ATTENTION - Architecture is well-designed on paper, but implementation has not yet begun in earnest. Several critical architectural decisions require validation before significant code is written.


1. Architecture Compliance Review

1.1 Shared-Nothing Architecture

Status: NOT YET IMPLEMENTED
Specification Requirement: Two-tier, shared-nothing architecture with decoupled compute and storage.

Findings:

  • Workspace structure correctly separates concerns: heliosdb-compute, heliosdb-storage, heliosdb-metadata, heliosdb-network, heliosdb-vector
  • No implementation code exists yet to validate the separation
  • No cross-tier dependencies have been defined

Recommendations:

  1. Define clear trait boundaries between compute and storage tiers BEFORE implementing logic
  2. Create a heliosdb-protocol crate to enforce the contract between tiers
  3. Ensure compute nodes have NO direct file I/O access to storage data
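A minimal sketch of what such a trait boundary could look like, assuming a hypothetical heliosdb-protocol crate; all names (StorageService, StorageError, MockStorage) are illustrative, not part of the existing design:

```rust
use std::collections::HashMap;

/// Errors crossing the tier boundary (simplified placeholder).
#[derive(Debug)]
pub enum StorageError {
    NotFound,
    Io(String),
}

/// The ONLY interface compute may use to reach storage; compute code
/// never opens files or touches RocksDB directly.
pub trait StorageService {
    fn get(&self, shard_id: u64, key: &[u8]) -> Result<Option<Vec<u8>>, StorageError>;
    fn put(&mut self, shard_id: u64, key: Vec<u8>, value: Vec<u8>) -> Result<(), StorageError>;
}

/// In-memory mock, useful for compute-tier unit tests before storage exists.
pub struct MockStorage {
    data: HashMap<(u64, Vec<u8>), Vec<u8>>,
}

impl MockStorage {
    pub fn new() -> Self {
        Self { data: HashMap::new() }
    }
}

impl StorageService for MockStorage {
    fn get(&self, shard_id: u64, key: &[u8]) -> Result<Option<Vec<u8>>, StorageError> {
        Ok(self.data.get(&(shard_id, key.to_vec())).cloned())
    }
    fn put(&mut self, shard_id: u64, key: Vec<u8>, value: Vec<u8>) -> Result<(), StorageError> {
        self.data.insert((shard_id, key), value);
        Ok(())
    }
}

fn main() {
    let mut store = MockStorage::new();
    store.put(0, b"k1".to_vec(), b"v1".to_vec()).unwrap();
    assert_eq!(store.get(0, b"k1").unwrap(), Some(b"v1".to_vec()));
    println!("contract ok");
}
```

Defining the contract as a trait makes the "no direct file I/O from compute" rule mechanically checkable: the compute crate simply never depends on rocksdb or std::fs.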

Priority: CRITICAL


1.2 RDMA/RoCEv2 Network Layer

Status: NOT IMPLEMENTED
Specification Requirement: RDMA over Converged Ethernet (RoCEv2) for inter-node communication.

Findings:

  • Workspace dependency declares rdma = "0.1" in Cargo.toml
  • The rdma crate at version 0.1 is immature and not production-ready
  • No network protocol implementation exists in heliosdb-network
  • HIDB protocol layer not defined

Critical Issues:

  1. RDMA Crate Maturity: The current rdma crate is experimental. Consider alternatives:

    • Direct ibverbs FFI bindings for production use
    • Fallback to TCP/gRPC for initial development phases
    • Use rdma-core system libraries with custom bindings
  2. HIDB Protocol Not Defined: No protobuf definitions exist for:

    • PredicatePushdownRequest
    • FilteredResultSet
    • VectorSearchRequest
    • CacheInvalidationNotice
    • ReplicationDataStream

Recommendations:

  1. Create a phased network implementation:
    • Phase 1: TCP/gRPC for development and testing
    • Phase 2: RDMA integration with fallback to TCP
  2. Define protobuf schemas in heliosdb-network/proto/ directory
  3. Implement protocol auto-negotiation (RDMA vs TCP)
  4. Add latency benchmarking requirements (target: <10μs for RDMA operations)
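The phased approach and auto-negotiation recommendation can be sketched as a transport abstraction; everything here (the Transport trait, LoopbackTcp, negotiate) is an illustrative assumption, with the RDMA path deliberately stubbed out since the rdma crate is not yet usable:

```rust
use std::io;

/// Abstraction over the wire; real impls would wrap ibverbs queue pairs
/// (Phase 2) or a TcpStream (Phase 1).
pub trait Transport {
    fn kind(&self) -> &'static str;
    fn send(&mut self, payload: &[u8]) -> io::Result<usize>;
}

/// Loopback stand-in so negotiation logic is testable without NIC hardware.
pub struct LoopbackTcp {
    pub sent: Vec<u8>,
}

impl Transport for LoopbackTcp {
    fn kind(&self) -> &'static str {
        "tcp"
    }
    fn send(&mut self, payload: &[u8]) -> io::Result<usize> {
        self.sent.extend_from_slice(payload);
        Ok(payload.len())
    }
}

/// Phase-2 policy: probe for RDMA, fall back to TCP on failure.
pub fn negotiate(rdma_available: bool) -> Box<dyn Transport> {
    if rdma_available {
        // Placeholder: a real build would set up a RoCEv2 queue pair here.
        unimplemented!("RDMA transport deferred to Phase 2");
    }
    Box::new(LoopbackTcp { sent: Vec::new() })
}

fn main() {
    let mut t = negotiate(false);
    t.send(b"HIDB hello").unwrap();
    println!("negotiated transport: {}", t.kind());
}
```

Because callers only see `Box<dyn Transport>`, the Phase 1 to Phase 2 migration does not ripple through the rest of the codebase.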

Priority: HIGH


1.3 Metadata Service & Raft Consensus

Status: NOT IMPLEMENTED
Specification Requirement: Raft-based consensus using the etcd/raft library, 3-5 node cluster.

Findings:

  • Workspace declares raft = "0.7" dependency
  • Declares etcd-client = "0.14" but specification calls for etcd-io/raft library, not client
  • No implementation in heliosdb-metadata

Critical Issues:

  1. Dependency Confusion: The spec requires using the etcd-io/raft library (the consensus algorithm), but Cargo.toml includes etcd-client (for connecting to etcd servers). These serve different purposes.

  2. Missing Components:

    • Raft log storage layer (should use RocksDB)
    • gRPC transport for Raft messages
    • Snapshot mechanism
    • Leader election logic

Recommendations:

  1. Correct dependencies:

    raft = "0.7" # Keep this - correct Raft implementation
    # REMOVE etcd-client # Not needed - we're building our own metadata service
    prost = "0.13" # For Raft message serialization
  2. Create initial Raft integration in heliosdb-metadata:

    • Implement Storage trait from raft crate
    • Build network transport layer with gRPC
    • Define metadata state machine operations
  3. Design metadata schema:

    • Shard topology table
    • Schema definitions
    • Node health status
    • Configuration parameters
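The recommended metadata state machine can be sketched as a command enum applied deterministically after Raft commit; all names here (MetaCommand, MetaState) are illustrative assumptions, not a finalized schema:

```rust
use std::collections::HashMap;

/// Operations proposed to the Raft log; only applied once committed.
#[derive(Debug, Clone)]
pub enum MetaCommand {
    AssignShard { shard_id: u64, primary_node: String },
    SetNodeHealth { node: String, healthy: bool },
}

/// The replicated state machine. Every replica applies the same committed
/// commands in the same order, so all replicas converge.
#[derive(Default)]
pub struct MetaState {
    pub shard_owners: HashMap<u64, String>,
    pub node_health: HashMap<String, bool>,
}

impl MetaState {
    pub fn apply(&mut self, cmd: MetaCommand) {
        match cmd {
            MetaCommand::AssignShard { shard_id, primary_node } => {
                self.shard_owners.insert(shard_id, primary_node);
            }
            MetaCommand::SetNodeHealth { node, healthy } => {
                self.node_health.insert(node, healthy);
            }
        }
    }
}

fn main() {
    let mut state = MetaState::default();
    state.apply(MetaCommand::AssignShard { shard_id: 7, primary_node: "node-a".into() });
    state.apply(MetaCommand::SetNodeHealth { node: "node-a".into(), healthy: true });
    assert_eq!(state.shard_owners.get(&7).map(String::as_str), Some("node-a"));
    println!("meta state ok");
}
```

Keeping `apply` free of I/O and nondeterminism (no clocks, no randomness) is what makes the Raft replication correct.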

Priority: CRITICAL


2. Data Organization & Storage Review

2.1 LSM-Tree Storage Engine

Status: NOT IMPLEMENTED
Specification Requirement: Log-Structured Merge-tree with configurable compaction strategies.

Findings:

  • Workspace declares rocksdb = "0.22" dependency
  • No implementation in heliosdb-storage
  • No evidence of custom LSM implementation plans

Architectural Decision Required:

Option A: Use RocksDB Directly

  • Pros: Battle-tested, high performance, rich feature set
  • Cons: Less control over internal behavior, C++ dependency, harder to customize for HeliosDB-specific features

Option B: Custom LSM Implementation

  • Pros: Full control, optimized for HeliosDB workloads, pure Rust
  • Cons: Significant development time, potential for bugs, need to prove performance parity

Recommendation:

  1. Phase 1: Use RocksDB directly with custom column families
  2. Phase 2: Evaluate performance bottlenecks
  3. Phase 3: Consider custom LSM only if RocksDB limitations are proven

Specific Implementation Guidance:

  • Use RocksDB column families to separate:
    • Row data (default CF)
    • Vector data (TOAST CF)
    • Index data (separate CFs per index type)
    • Tombstones (tracked separately for gc_grace_seconds)

Priority: HIGH


2.2 Sharding & Consistent Hashing

Status: NOT IMPLEMENTED
Specification Requirement: System-managed consistent hashing with user-defined sharding key.

Findings:

  • No hash ring implementation
  • No shard assignment logic
  • No data migration framework

Critical Issues:

  1. Hash Function Not Specified: Must choose between:

    • Jump Consistent Hash (Google) - fast, minimal memory
    • Consistent Hashing with Virtual Nodes - better distribution
    • Rendezvous Hashing - stateless, simple
  2. Shard Rebalancing Not Designed: When nodes are added/removed, how is data moved?

Recommendations:

  1. Implement Jump Consistent Hash as default (fastest, proven)
  2. Create heliosdb-common/src/hash.rs with:
    pub fn compute_shard_id(key: &[u8], num_shards: u64) -> u64;
    pub fn get_shard_range(shard_id: u64, total_shards: u64) -> (u64, u64);
  3. Design shard migration protocol in HIDB
  4. Implement backpressure during rebalancing
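A sketch of the recommended Jump Consistent Hash, with a key-hashing helper whose signature mirrors the proposed hash.rs API. The use of the std DefaultHasher for raw key bytes is an illustrative assumption (a stable hash such as xxHash would be needed for on-disk compatibility):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Jump consistent hash: maps a 64-bit key to one of num_shards buckets,
/// moving only ~1/(n+1) of keys when the shard count grows from n to n+1.
pub fn jump_hash(mut key: u64, num_shards: u64) -> u64 {
    assert!(num_shards > 0);
    let mut b: i64 = -1;
    let mut j: i64 = 0;
    while (j as u64) < num_shards {
        b = j;
        key = key.wrapping_mul(2862933555777941757).wrapping_add(1);
        j = (((b + 1) as f64) * ((1u64 << 31) as f64 / (((key >> 33) + 1) as f64))) as i64;
    }
    b as u64
}

/// Proposed API: hash raw sharding-key bytes, then jump-hash into a shard.
pub fn compute_shard_id(key: &[u8], num_shards: u64) -> u64 {
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    jump_hash(h.finish(), num_shards)
}

fn main() {
    // Stability check: growing 10 -> 11 shards should move roughly 1/11 of keys.
    let moved = (0u64..10_000)
        .filter(|k| jump_hash(*k, 10) != jump_hash(*k, 11))
        .count();
    println!("keys moved when growing 10 -> 11 shards: {moved} / 10000");
    assert!(compute_shard_id(b"user:42", 16) < 16);
}
```

The minimal-movement property is exactly what bounds the amount of data migration during rebalancing, which is why jump hash pairs well with the backpressure requirement above.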

Priority: HIGH


2.3 Hybrid Columnar Compression (HCC)

Status: NOT IMPLEMENTED
Specification Requirement: Compression Units with columnar layout, LZ4/ZSTD compression.

Findings:

  • Workspace declares lz4 = "1.28" and zstd = "0.13"
  • No HCC implementation
  • No data format specification

Concerns:

  1. Format Stability: HCC format needs versioning from day one
  2. DML Performance: Spec correctly identifies HCC as problematic for updates - need clear guidance on when to use
  3. Memory Amplification: Decompression buffers could be significant

Recommendations:

  1. DO NOT implement HCC in Phase 1 - focus on core functionality first
  2. When implementing:
    • Create separate storage engine variant (not mixed with LSM)
    • Use separate RocksDB column family
    • Implement background migration from row format to HCC
  3. Define clear table-level storage directives:
    CREATE TABLE logs (...) WITH (
        storage_format = 'HCC',
        compression_mode = 'WAREHOUSE_OPTIMIZED'
    );

Priority: LOW (defer to Phase 2)


3. Vector Database Integration Review

3.1 VECTOR Data Type

Status: NOT IMPLEMENTED
Specification Requirement: VECTOR(n) data type with TOAST-like storage.

Findings:

  • Workspace declares hnsw = "0.11" for vector indexing
  • No type system implementation
  • No TOAST implementation

Critical Issues:

  1. Type System Not Defined: No core type system exists yet to add VECTOR to
  2. HNSW Crate Limitations: Version 0.11 may not support filtered search
  3. No IVF Implementation: Spec requires both HNSW and IVF, but only HNSW dependency exists

Recommendations:

  1. Create type system first in heliosdb-common/src/types.rs:

    pub enum DataType {
        Int64,
        Float64,
        Varchar(u32),
        Vector(u32), // u32 = dimensions
        // ... other types
    }
  2. For vector indexing, consider using more mature libraries:

    • faiss-rs (bindings to Meta’s FAISS) - supports both HNSW and IVF
    • hnswlib-rs (bindings to hnswlib) - more mature than hnsw crate
    • Custom implementation only if necessary
  3. Implement TOAST storage strategy:

    • Store small vectors (<2KB) inline
    • Store large vectors in separate key-value space
    • Use reference counting for deduplication
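The inline-vs-TOAST decision above can be sketched as follows; the f32 element type, the key layout, and all names are illustrative assumptions, not a finalized storage format:

```rust
/// "<2KB inline" rule from the spec.
const INLINE_LIMIT_BYTES: usize = 2048;

#[derive(Debug, PartialEq)]
pub enum VectorStorage {
    /// Little-endian f32 payload stored directly in the row.
    Inline(Vec<u8>),
    /// Row stores only a reference key into the TOAST key-value space.
    Toasted { toast_key: u64 },
}

pub fn plan_vector_storage(row_id: u64, vector: &[f32]) -> VectorStorage {
    let byte_len = vector.len() * std::mem::size_of::<f32>();
    if byte_len < INLINE_LIMIT_BYTES {
        let mut buf = Vec::with_capacity(byte_len);
        for v in vector {
            buf.extend_from_slice(&v.to_le_bytes());
        }
        VectorStorage::Inline(buf)
    } else {
        // Illustrative: derive the TOAST key from the row id. A real design
        // would also maintain reference counts for deduplication.
        VectorStorage::Toasted { toast_key: row_id }
    }
}

fn main() {
    let small = vec![0.5f32; 128]; // 512 bytes -> inline
    let large = vec![0.5f32; 1536]; // 6144 bytes -> TOAST
    assert!(matches!(plan_vector_storage(1, &small), VectorStorage::Inline(_)));
    assert!(matches!(plan_vector_storage(2, &large), VectorStorage::Toasted { .. }));
    println!("toast planning ok");
}
```

Note that common embedding sizes (e.g. 1536 dimensions at 4 bytes each) exceed the 2KB threshold, so most real vectors would land in the TOAST space.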

Priority: MEDIUM


3.2 Filtered ANN Search

Status: NOT IMPLEMENTED
Specification Requirement: Filter-aware HNSW traversal with bitmap allow-lists.

Findings:

  • Extremely complex feature requiring tight integration
  • No prototype or proof-of-concept exists
  • Algorithm described in spec is research-grade

Critical Issues:

  1. Complexity Underestimated: This is a PhD-level implementation challenge
  2. Performance Validation Required: No benchmarks exist to prove this approach works
  3. Alternative Approaches Not Considered

Recommendations:

  1. Phase 1: Implement post-filtering (retrieve top-K, then filter) - simple but works

  2. Phase 2: Implement pre-filtering with HNSW search on subset - moderate complexity

  3. Phase 3: Implement joint-filtering with multi-hop traversal - only if Phase 2 proves insufficient

  4. Prototype filtered search performance BEFORE committing to architecture:

    Test scenarios:
    - 1M vectors, 10% pass filter - measure recall and latency
    - 1M vectors, 1% pass filter - does algorithm still work?
    - 1M vectors, 0.1% pass filter - compare to brute force
  5. Consider using Weaviate’s open-source filtered HNSW as reference
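A minimal sketch of the Phase 1 post-filtering strategy: over-fetch candidates, apply the predicate, and widen the candidate pool if too few survive. The ann_search stand-in is a placeholder assumption for a real HNSW query, and the over-fetch factors are tunable guesses:

```rust
/// Placeholder ANN search: returns candidate ids in "distance" order.
/// A real implementation would query the HNSW index here.
fn ann_search(query_id: u64, k: usize) -> Vec<u64> {
    (0..k as u64).map(|i| query_id.wrapping_add(i)).collect()
}

/// Post-filtering with adaptive over-fetch; max_k caps the retries so a
/// highly selective filter degrades latency gracefully instead of spinning.
pub fn filtered_search(
    query_id: u64,
    want: usize,
    max_k: usize,
    passes: impl Fn(u64) -> bool,
) -> Vec<u64> {
    let mut k = want * 4; // initial over-fetch factor (assumed, tune later)
    loop {
        let hits: Vec<u64> = ann_search(query_id, k)
            .into_iter()
            .filter(|id| passes(*id))
            .take(want)
            .collect();
        if hits.len() == want || k >= max_k {
            return hits;
        }
        k = (k * 2).min(max_k); // selective filter: widen the candidate pool
    }
}

fn main() {
    // 50% selectivity: only even ids pass the predicate.
    let hits = filtered_search(0, 10, 1024, |id| id % 2 == 0);
    assert_eq!(hits.len(), 10);
    assert!(hits.iter().all(|id| id % 2 == 0));
    println!("post-filtered hits: {hits:?}");
}
```

This captures why post-filtering breaks down at 0.1% selectivity (the test scenario above): k must grow near the whole dataset, at which point brute force wins.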

Priority: MEDIUM (but needs early prototyping)


4. Protocol Compatibility Review

4.1 Multi-Protocol Support

Status: NOT IMPLEMENTED
Specification Requirement: Support the PostgreSQL, MySQL, SQL Server, Oracle, DB2, Snowflake, Databricks, and Pinecone protocols.

Findings:

  • No protocol handlers exist
  • No protocol detection logic
  • Extremely ambitious scope for initial release

Critical Concerns:

  1. Scope Creep Risk: Supporting 8 different protocols is a MASSIVE undertaking
  2. Resource Allocation: This could consume 80% of development effort
  3. Compliance Testing: Each protocol needs comprehensive test suite

Security Issues:

  1. Authentication Surface: 8 different auth mechanisms to secure
  2. SQL Injection: 8 different parameter binding styles to validate
  3. TLS Configuration: Different TLS requirements per protocol

Recommendations:

STRONGLY RECOMMEND PHASED APPROACH:

Phase 1 (MVP): PostgreSQL protocol ONLY

  • Rationale:
    • Most Python libraries support it (psycopg2, asyncpg, SQLAlchemy)
    • Clean wire protocol specification
    • Good tooling for testing
  • Target: Gold-level compliance

Phase 1.5: Add MySQL protocol

  • Rationale: Second most common in Python ecosystem
  • Target: Gold-level compliance

Phase 2: Add HTTP-based protocols (Snowflake, Databricks, Pinecone)

  • Rationale: Simpler than binary protocols, REST-based
  • Target: Silver-level compliance

Phase 3: Add enterprise protocols (SQL Server, DB2, Oracle)

  • Rationale: Only if customer demand exists
  • Target: Bronze-level compliance

Specific Implementation Plan:

  1. Create heliosdb-compute/src/protocol/ module structure:

    protocol/
    ├── mod.rs # Protocol detection & routing
    ├── postgres/ # PostgreSQL handler
    │ ├── mod.rs
    │ ├── auth.rs # SCRAM-SHA-256
    │ ├── codec.rs # Wire format
    │ └── handler.rs # Query execution
    ├── mysql/ # Phase 1.5
    └── http/ # Phase 2
  2. Implement protocol router from 02_PROTOCOL_ROUTER_PSEUDOCODE.md:

    • Use bytes::Buf for efficient peeking
    • Implement TLS negotiation first
    • Use ALPN for fast-path detection
  3. Create normalized internal query representation:

    pub struct QueryContext {
        pub user: String,
        pub database: String,
        pub query: LogicalPlan,
        pub parameters: Vec<Value>,
    }

Priority: CRITICAL (but reduce scope dramatically)


5. Code Quality & Safety Review

5.1 Error Handling

Status: BASIC IMPLEMENTATION EXISTS
Review: /home/claude/DMD/heliosdb-common/src/error.rs

Findings:

  • ✅ Good use of thiserror for ergonomic error types
  • ✅ Proper error variants defined
  • ⚠ Missing context information in errors
  • ❌ No error codes for protocol compatibility

Issues:

  1. Error messages lack context (which shard? which key? which node?)
  2. No error code system for mapping to SQL error codes
  3. No distinction between retryable and non-retryable errors

Recommendations:

#[derive(Error, Debug)]
pub enum HeliosError {
    #[error("Storage error on shard {shard_id}: {message}")]
    Storage { shard_id: u64, message: String },

    #[error("Network error communicating with {node}: {message}")]
    Network { node: String, message: String },

    #[error("Transaction aborted: {0}")]
    Transaction(String),

    // Add error codes for protocol mapping
    #[error("SQL error {code:?}: {message}")]
    Sql { code: SqlErrorCode, message: String },
}

#[derive(Debug)]
pub enum SqlErrorCode {
    ConnectionFailure,
    SyntaxError,
    ConstraintViolation,
    // ... map to PostgreSQL error codes
}

impl HeliosError {
    pub fn is_retryable(&self) -> bool {
        match self {
            HeliosError::Network { .. } => true,
            HeliosError::Transaction(..) => true,
            _ => false,
        }
    }
}

Priority: MEDIUM


5.2 Concurrency & Thread Safety

Status: NOT IMPLEMENTED

Findings:

  • Workspace declares crossbeam = "0.8" and parking_lot = "0.12"
  • No concurrent data structures implemented yet
  • No locking strategy defined

Concerns:

  1. Lock Granularity Not Defined: Will cause contention issues later
  2. Deadlock Prevention: No strategy documented
  3. Async vs Sync: Using Tokio but also crossbeam - need clear boundaries

Recommendations:

  1. Define clear async/sync boundaries:

    • Network I/O: async (Tokio)
    • Storage I/O: sync or async based on RocksDB usage
    • Compute operations: use Rayon for parallelism
  2. Use lock-free data structures where possible:

    • Crossbeam channels for message passing
    • Arc only when necessary
    • Prefer message passing over shared state
  3. Document locking order to prevent deadlocks:

    Lock Order Hierarchy:
    1. Metadata Service state
    2. Shard topology map
    3. Individual shard locks
    4. Transaction locks
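The "prefer message passing over shared state" guidance can be sketched with std channels: a shard worker thread owns its state exclusively, so per-shard operations never enter the lock hierarchy at all. All names here are illustrative:

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

enum ShardMsg {
    Put { key: String, value: String },
    Get { key: String, reply: mpsc::Sender<Option<String>> },
    Shutdown,
}

/// Spawns a worker that exclusively owns one shard's state; callers mutate it
/// only by sending messages, so no locks (and no lock ordering) are needed.
fn spawn_shard_worker() -> mpsc::Sender<ShardMsg> {
    let (tx, rx) = mpsc::channel::<ShardMsg>();
    thread::spawn(move || {
        let mut data: HashMap<String, String> = HashMap::new(); // owned, unshared
        for msg in rx {
            match msg {
                ShardMsg::Put { key, value } => {
                    data.insert(key, value);
                }
                ShardMsg::Get { key, reply } => {
                    let _ = reply.send(data.get(&key).cloned());
                }
                ShardMsg::Shutdown => break,
            }
        }
    });
    tx
}

fn main() {
    let shard = spawn_shard_worker();
    shard.send(ShardMsg::Put { key: "k".into(), value: "v".into() }).unwrap();
    let (reply_tx, reply_rx) = mpsc::channel();
    shard.send(ShardMsg::Get { key: "k".into(), reply: reply_tx }).unwrap();
    assert_eq!(reply_rx.recv().unwrap(), Some("v".into()));
    shard.send(ShardMsg::Shutdown).unwrap();
    println!("shard worker ok");
}
```

In the real system these would be Tokio tasks and channels rather than OS threads, but the ownership discipline is the same.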

Priority: HIGH


5.3 Memory Safety & Rust Patterns

Status: CANNOT ASSESS (no code yet)

Preemptive Guidance:

  1. Avoid unsafe unless absolutely necessary (RDMA operations only)

  2. Use #![forbid(unsafe_code)] in all crates except heliosdb-network

  3. All unsafe code requires:

    • Detailed safety comments
    • Mandatory code review
    • Comprehensive testing
  4. Prefer owned types over references in async code:

    // BAD - lifetime issues
    async fn process_query<'a>(query: &'a Query) -> Result<()>
    // GOOD - owned data
    async fn process_query(query: Query) -> Result<()>

Priority: HIGH (enforce early)


6. Testing & Quality Assurance

6.1 Test Coverage Requirements

Status: NO TESTS EXIST

Specification Gap: Design documents don’t mention testing strategy.

Required Test Categories:

  1. Unit Tests: (target 80% coverage)

    • All public APIs
    • Error handling paths
    • Edge cases (empty data, max values)
  2. Integration Tests:

    • Compute-storage interaction
    • Network protocol compliance
    • Consensus behavior (Raft)
  3. Protocol Compliance Tests: (from 01_PROTOCOL_TEST_MATRIX.md)

    • PostgreSQL: psycopg2, asyncpg, SQLAlchemy
    • MySQL: mysql-connector, PyMySQL
    • Must pass before merge
  4. Performance Tests:

    • Latency benchmarks (P50, P99, P999)
    • Throughput tests (inserts/sec, queries/sec)
    • Scalability tests (1 node, 3 nodes, 10 nodes)
  5. Chaos Tests:

    • Network partitions
    • Node failures
    • Concurrent writes to same shard

Recommendations:

  1. Set up CI/CD with GitHub Actions before writing code
  2. Create tests/ directory with protocol compliance tests first
  3. Use cargo-tarpaulin for coverage reporting
  4. Gate merges on test passage

Priority: CRITICAL


7. Security Review

7.1 Authentication & Authorization

Status: NOT IMPLEMENTED

Specification: Multiple auth mechanisms per protocol (SCRAM, password, OAuth, API keys)

Security Concerns:

  1. Credential Storage: How are passwords hashed? (recommend Argon2)
  2. Session Management: Token generation, expiration, revocation?
  3. TLS Configuration: What cipher suites? Certificate management?
  4. Authorization Model: Role-based access control not specified

Recommendations:

  1. Use industry-standard libraries:

    • argon2 for password hashing
    • rustls for TLS (pure Rust, safer than OpenSSL)
    • jsonwebtoken for JWT handling
  2. Implement principle of least privilege:

    • Default: no permissions
    • Explicit grants required
    • Row-level security for multi-tenant use
  3. Security testing requirements:

    • Penetration testing before production
    • Audit all SQL injection vectors
    • Test TLS configuration with ssllabs

Priority: HIGH


7.2 Data Security

Status: NOT SPECIFIED

Missing from Specification:

  • Encryption at rest
  • Encryption in transit (beyond TLS)
  • Key management
  • Audit logging

Recommendations:

  1. Add to specification:

    • Encryption at rest using RocksDB encryption feature
    • RDMA encryption (if available, or fallback to TLS)
    • Key rotation policy
  2. Implement audit logging:

    • All DDL operations
    • Authentication attempts
    • Data access patterns (for compliance)

Priority: MEDIUM (but required for enterprise use)


8. Observability & Operations

8.1 Logging & Metrics

Status: BASIC SETUP
Workspace declares: tracing = "0.1" and tracing-subscriber = "0.3"

Missing:

  • Metrics collection (Prometheus)
  • Distributed tracing
  • Health check endpoints

Recommendations:

  1. Add dependencies:

    metrics = "0.21"
    metrics-exporter-prometheus = "0.12"
    opentelemetry = "0.20"
  2. Instrument critical paths:

    • Query latency (histogram)
    • Network operation latency (histogram)
    • Error rates (counter)
    • Active connections (gauge)
    • Compaction progress (gauge)
  3. Implement health check endpoints:

    GET /health/liveness -> 200 if process is alive
    GET /health/readiness -> 200 if ready to serve traffic
    GET /metrics -> Prometheus metrics
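As a stopgap before wiring in the metrics and metrics-exporter-prometheus crates, the counters above can be sketched std-only and rendered in Prometheus text exposition format; metric names are illustrative:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Minimal std-only metrics registry; a stand-in for the metrics crate.
pub struct Metrics {
    pub queries_total: AtomicU64,
    pub errors_total: AtomicU64,
    pub active_connections: AtomicU64,
}

impl Metrics {
    pub const fn new() -> Self {
        Self {
            queries_total: AtomicU64::new(0),
            errors_total: AtomicU64::new(0),
            active_connections: AtomicU64::new(0),
        }
    }

    /// Body that GET /metrics would serve, in Prometheus text format.
    pub fn render(&self) -> String {
        format!(
            "# TYPE helios_queries_total counter\nhelios_queries_total {}\n\
             # TYPE helios_errors_total counter\nhelios_errors_total {}\n\
             # TYPE helios_active_connections gauge\nhelios_active_connections {}\n",
            self.queries_total.load(Ordering::Relaxed),
            self.errors_total.load(Ordering::Relaxed),
            self.active_connections.load(Ordering::Relaxed),
        )
    }
}

fn main() {
    let m = Metrics::new();
    m.queries_total.fetch_add(3, Ordering::Relaxed);
    m.active_connections.fetch_add(1, Ordering::Relaxed);
    let body = m.render();
    assert!(body.contains("helios_queries_total 3"));
    println!("{body}");
}
```

Counters and gauges translate directly; histograms (for the latency metrics above) are the piece that genuinely needs a real metrics library.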

Priority: MEDIUM


9. Documentation Gaps

9.1 Missing Documentation

  1. API documentation (rustdoc)
  2. Deployment guide
  3. Performance tuning guide
  4. Troubleshooting guide
  5. Architecture decision records (ADRs)

Recommendations:

  1. Start ADR practice now:

    • docs/adr/001-use-rocksdb-for-storage.md
    • docs/adr/002-postgres-protocol-first.md
  2. Require rustdoc on all public APIs:

    #![warn(missing_docs)]

Priority: MEDIUM


10. Critical Path & Prioritization

Phase 1: Core Foundation (Months 1-3)

  1. Type system and data model (heliosdb-common)
  2. LSM storage engine with RocksDB (heliosdb-storage)
  3. Basic network layer with gRPC/TCP (heliosdb-network)
  4. Metadata service with Raft (heliosdb-metadata)
  5. PostgreSQL protocol handler (heliosdb-compute)
  6. Basic query execution (SELECT, INSERT, UPDATE, DELETE)

Success Criteria: Can run Python app with psycopg2, basic CRUD works


Phase 2: Distribution (Months 4-6)

  1. Sharding with consistent hashing
  2. Shard replication (primary + mirror)
  3. Witness-based failover
  4. Cache invalidation mechanism
  5. Query distribution & aggregation

Success Criteria: 3-node cluster, fault tolerance, distributed queries


Phase 3: Advanced Features (Months 7-12)

  1. Predicate pushdown engine
  2. Vector data type and storage
  3. HNSW indexing
  4. Filtered vector search (post-filtering only)
  5. MySQL protocol handler
  6. HTTP API gateway

Success Criteria: Vector search works, multi-protocol support


Phase 4: Performance & Polish (Months 13+)

  1. RDMA network layer
  2. HCC compression
  3. Online aggregation engine
  4. Additional protocol handlers
  5. Production hardening

11. Risk Assessment

Critical Risks:

  1. Scope Overreach (CRITICAL)

    • Risk: Attempting to build everything at once
    • Impact: Project never reaches production
    • Mitigation: Follow phased approach, MVP first
  2. RDMA Complexity (HIGH)

    • Risk: RDMA is difficult to implement and test
    • Impact: Network layer becomes bottleneck
    • Mitigation: Start with TCP, add RDMA later
  3. Protocol Compatibility Burden (HIGH)

    • Risk: Supporting 8 protocols is too much work
    • Impact: Poor implementation quality across all protocols
    • Mitigation: Focus on PostgreSQL only in Phase 1
  4. Consensus Implementation (HIGH)

    • Risk: Raft is complex, easy to get wrong
    • Impact: Data loss or split-brain scenarios
    • Mitigation: Use etcd/raft library, extensive testing
  5. Vector Search Performance (MEDIUM)

    • Risk: Filtered ANN search may not meet performance targets
    • Impact: Feature doesn’t work as advertised
    • Mitigation: Prototype early, have fallback strategy

12. Compliance Checklist

Architecture Compliance:

  • Shared-nothing compute-storage separation
  • RDMA/RoCEv2 network layer (or documented fallback)
  • LSM-tree storage engine
  • Raft consensus for metadata
  • Consistent hashing for sharding
  • Synchronous replication (primary + mirror)
  • Witness-based quorum
  • HCC compression
  • Predicate pushdown engine
  • Vector data type and TOAST storage
  • HNSW and IVF indexes
  • Filtered ANN search

Protocol Compliance:

  • PostgreSQL protocol (Gold level)
  • MySQL protocol (Gold level)
  • Snowflake HTTP API (Silver level)
  • Databricks SQL API (Silver level)
  • Pinecone API (Silver level)
  • SQL Server TDS (Bronze level)
  • DB2 DRDA (Bronze level)
  • Oracle Net (Bronze level)

Quality Standards:

  • 80% test coverage
  • Protocol compliance tests passing
  • Security audit completed
  • Performance benchmarks met
  • Documentation complete

13. Action Items

Immediate Actions (This Week):

  1. Architect meeting to review and approve this review
  2. Reduce protocol scope to PostgreSQL only for Phase 1
  3. Create project roadmap based on phased approach
  4. Set up CI/CD pipeline with basic tests
  5. Create ADR template and first ADRs

Short-term Actions (This Month):

  1. Implement basic type system
  2. Integrate RocksDB storage engine
  3. Build simple metadata service (single node first)
  4. Create PostgreSQL protocol parser
  5. Write first integration test

Long-term Actions (Next Quarter):

  1. Complete Phase 1 implementation
  2. Deploy test cluster
  3. Run performance benchmarks
  4. Conduct security review
  5. Prepare for Phase 2

14. Conclusion

Summary Assessment:

The HeliosDB design specifications are technically sound and well-researched. The architecture draws from proven systems (Exadata, Snowflake, TiDB, ScyllaDB) and makes reasonable trade-offs.

However, the implementation scope is extremely ambitious for a new project. The specifications describe a system that would take a large team (20+ engineers) several years to build to production quality.

Key Recommendations:

  1. Reduce scope dramatically for MVP - Focus on core HTAP capabilities with PostgreSQL protocol only

  2. Defer advanced features - RDMA, HCC, filtered vector search, multiple protocols can come in later phases

  3. Validate assumptions early - Build prototypes for risky components (Raft integration, vector search) before committing to architecture

  4. Prioritize correctness over performance - Get it working correctly first, optimize later

  5. Invest in testing infrastructure - This is a distributed database; bugs will be catastrophic

Confidence Level: MEDIUM

The architecture is solid, but execution risk is high due to scope and complexity. Success depends on disciplined prioritization and phased delivery.


Next Review: Scheduled after Phase 1 core components are implemented (estimated 3 months)

Reviewer Signature: Reviewer Agent (HeliosDB Hive Mind) Review Date: 2025-10-10