HeliosDB Initial Architecture Review
Review ID: 001
Date: 2025-10-10
Reviewer: Reviewer Agent (Hive Mind)
Status: INITIAL ASSESSMENT
Executive Summary
This is the initial architecture review of the HeliosDB project, conducted after analyzing the design specifications in Design-Guidelines-1.md and Design-Guidelines-2.md, as well as the protocol compatibility requirements. The project is in its early stages with only basic scaffolding completed.
Overall Assessment: NEEDS ATTENTION - Architecture is well-designed on paper, but implementation has not yet begun in earnest. Several critical architectural decisions require validation before significant code is written.
1. Architecture Compliance Review
1.1 Shared-Nothing Architecture
Status: NOT YET IMPLEMENTED
Specification Requirement: Two-tier, shared-nothing architecture with decoupled compute and storage.
Findings:
- Workspace structure correctly separates concerns: `heliosdb-compute`, `heliosdb-storage`, `heliosdb-metadata`, `heliosdb-network`, `heliosdb-vector`
- No implementation code exists yet to validate the separation
- No cross-tier dependencies have been defined
Recommendations:
- Define clear trait boundaries between compute and storage tiers BEFORE implementing logic
- Create a `heliosdb-protocol` crate to enforce the contract between tiers
- Ensure compute nodes have NO direct file I/O access to storage data
Priority: CRITICAL
1.2 RDMA/RoCEv2 Network Layer
Status: NOT IMPLEMENTED
Specification Requirement: RDMA over Converged Ethernet (RoCEv2) for inter-node communication.
Findings:
- Workspace dependency declares `rdma = "0.1"` in Cargo.toml
- The `rdma` crate at version 0.1 is OUTDATED and not production-ready
- No network protocol implementation exists in `heliosdb-network`
- HIDB protocol layer not defined
Critical Issues:
- RDMA Crate Maturity: The current `rdma` crate is experimental. Consider alternatives:
  - Direct `ibverbs` FFI bindings for production use
  - Fallback to TCP/gRPC for initial development phases
  - Use `rdma-core` system libraries with custom bindings
- HIDB Protocol Not Defined: No protobuf definitions exist for:
  - PredicatePushdownRequest
  - FilteredResultSet
  - VectorSearchRequest
  - CacheInvalidationNotice
  - ReplicationDataStream
Recommendations:
- Create a phased network implementation:
- Phase 1: TCP/gRPC for development and testing
- Phase 2: RDMA integration with fallback to TCP
- Define protobuf schemas in a `heliosdb-network/proto/` directory (one message is sketched below)
- Implement protocol auto-negotiation (RDMA vs TCP)
- Add latency benchmarking requirements (target: <10μs for RDMA operations)
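To make the five HIDB messages above concrete, here is a minimal sketch of one of them expressed as a prost-derived Rust type. All field names, types, and tags are illustrative assumptions, not a settled schema:

```rust
// Hypothetical shape of one HIDB message, expressed with prost derives.
// Field names, types, and tags are placeholders pending the real schema.
use prost::Message;

#[derive(Clone, PartialEq, Message)]
pub struct PredicatePushdownRequest {
    /// Shard the storage node should scan.
    #[prost(uint64, tag = "1")]
    pub shard_id: u64,
    /// Table identifier within the shard.
    #[prost(uint64, tag = "2")]
    pub table_id: u64,
    /// Serialized filter expression (encoding to be defined).
    #[prost(bytes = "vec", tag = "3")]
    pub predicate: Vec<u8>,
    /// Columns to project in the FilteredResultSet reply.
    #[prost(uint32, repeated, tag = "4")]
    pub projection: Vec<u32>,
}
```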
Priority: HIGH
1.3 Metadata Service & Raft Consensus
Status: NOT IMPLEMENTED
Specification Requirement: Raft-based consensus using the etcd/raft library, 3-5 node cluster.
Findings:
- Workspace declares a `raft = "0.7"` dependency
- Declares `etcd-client = "0.14"`, but the specification calls for the `etcd-io/raft` library, not a client
- No implementation in `heliosdb-metadata`
Critical Issues:
- Dependency Confusion: The spec requires the `etcd-io/raft` library (the consensus algorithm), but Cargo.toml includes `etcd-client` (for connecting to etcd servers). These serve different purposes.
- Missing Components:
  - Raft log storage layer (should use RocksDB)
  - gRPC transport for Raft messages
  - Snapshot mechanism
  - Leader election logic
Recommendations:
- Correct the dependencies:

```toml
raft = "0.7"   # Keep this - correct Raft implementation
# REMOVE etcd-client - not needed; we're building our own metadata service
prost = "0.13" # For Raft message serialization
```

- Create the initial Raft integration in `heliosdb-metadata`:
  - Implement the `Storage` trait from the raft crate
  - Build the network transport layer with gRPC
  - Define metadata state machine operations (sketched below)
- Design the metadata schema:
  - Shard topology table
  - Schema definitions
  - Node health status
  - Configuration parameters
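To ground the state machine item above, a minimal sketch of what the replicated metadata operations could look like. Variant names and fields are assumptions for illustration only; a real implementation would also persist applied state to RocksDB:

```rust
// Hypothetical metadata state machine operations replicated through Raft.
use std::collections::HashMap;

pub enum MetadataOp {
    AssignShard { shard_id: u64, primary: String, mirror: String },
    UpdateSchema { table: String, ddl: String },
    SetNodeHealth { node: String, healthy: bool },
    SetConfig { key: String, value: String },
}

#[derive(Default)]
pub struct MetadataState {
    shard_topology: HashMap<u64, (String, String)>, // shard -> (primary, mirror)
    schemas: HashMap<String, String>,
    node_health: HashMap<String, bool>,
    config: HashMap<String, String>,
}

impl MetadataState {
    /// Apply a committed Raft log entry to the in-memory state.
    pub fn apply(&mut self, op: MetadataOp) {
        match op {
            MetadataOp::AssignShard { shard_id, primary, mirror } => {
                self.shard_topology.insert(shard_id, (primary, mirror));
            }
            MetadataOp::UpdateSchema { table, ddl } => {
                self.schemas.insert(table, ddl);
            }
            MetadataOp::SetNodeHealth { node, healthy } => {
                self.node_health.insert(node, healthy);
            }
            MetadataOp::SetConfig { key, value } => {
                self.config.insert(key, value);
            }
        }
    }
}
```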
Priority: CRITICAL
2. Data Organization & Storage Review
2.1 LSM-Tree Storage Engine
Status: NOT IMPLEMENTED
Specification Requirement: Log-Structured Merge-tree with configurable compaction strategies.
Findings:
- Workspace declares a `rocksdb = "0.22"` dependency
- No implementation in `heliosdb-storage`
- No evidence of custom LSM implementation plans
Architectural Decision Required:
Option A: Use RocksDB Directly
- Pros: Battle-tested, high performance, rich feature set
- Cons: Less control over internal behavior, C++ dependency, harder to customize for HeliosDB-specific features
Option B: Custom LSM Implementation
- Pros: Full control, optimized for HeliosDB workloads, pure Rust
- Cons: Significant development time, potential for bugs, need to prove performance parity
Recommendation:
- Phase 1: Use RocksDB directly with custom column families
- Phase 2: Evaluate performance bottlenecks
- Phase 3: Consider custom LSM only if RocksDB limitations are proven
Specific Implementation Guidance:
- Use RocksDB column families to separate:
  - Row data (default CF)
  - Vector data (TOAST CF)
  - Index data (separate CFs per index type)
  - Tombstones (tracked separately for gc_grace_seconds)
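A minimal sketch of opening RocksDB with the column-family layout above, using the `rocksdb` crate already declared in the workspace; the CF names are illustrative assumptions:

```rust
use rocksdb::{ColumnFamilyDescriptor, DB, Options};

fn open_storage(path: &str) -> Result<DB, rocksdb::Error> {
    let mut db_opts = Options::default();
    db_opts.create_if_missing(true);
    db_opts.create_missing_column_families(true);

    // One descriptor per logical data class; names are placeholders.
    let cfs = vec![
        ColumnFamilyDescriptor::new("default", Options::default()), // row data
        ColumnFamilyDescriptor::new("vectors", Options::default()), // TOAST'd vectors
        ColumnFamilyDescriptor::new("idx_hnsw", Options::default()), // index data
        ColumnFamilyDescriptor::new("tombstones", Options::default()), // gc_grace tracking
    ];
    DB::open_cf_descriptors(&db_opts, path, cfs)
}
```

Per-CF `Options` is where compaction strategies would later diverge (e.g., universal compaction for the vector CF).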
Priority: HIGH
2.2 Sharding & Consistent Hashing
Status: NOT IMPLEMENTED
Specification Requirement: System-managed consistent hashing with a user-defined sharding key.
Findings:
- No hash ring implementation
- No shard assignment logic
- No data migration framework
Critical Issues:
- Hash Function Not Specified: Must choose between:
  - Jump Consistent Hash (Google) - fast, minimal memory
  - Consistent Hashing with Virtual Nodes - better distribution
  - Rendezvous Hashing - stateless, simple
- Shard Rebalancing Not Designed: When nodes are added/removed, how is data moved?
Recommendations:
- Implement Jump Consistent Hash as default (fastest, proven)
- Create `heliosdb-common/src/hash.rs` with:

```rust
pub fn compute_shard_id(key: &[u8], num_shards: u64) -> u64;
pub fn get_shard_range(shard_id: u64, total_shards: u64) -> (u64, u64);
```

- Design the shard migration protocol in HIDB
- Implement backpressure during rebalancing
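For reference, a sketch of the Jump Consistent Hash algorithm (Lamping & Veach, 2014) that would sit behind the `compute_shard_id` signature above; the sharding key bytes would first be reduced to a u64 with any fast hash:

```rust
/// Jump Consistent Hash: maps a 64-bit key to one of `num_buckets`
/// buckets, moving only ~1/n of keys when a bucket is added.
fn jump_consistent_hash(mut key: u64, num_buckets: u64) -> u64 {
    assert!(num_buckets > 0);
    let mut b: i64 = -1;
    let mut j: i64 = 0;
    while (j as u64) < num_buckets {
        b = j;
        // Linear congruential step from the published algorithm.
        key = key.wrapping_mul(2862933555777941757).wrapping_add(1);
        j = ((b.wrapping_add(1) as f64)
            * ((1u64 << 31) as f64 / ((key >> 33).wrapping_add(1) as f64)))
            as i64;
    }
    b as u64
}
```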
Priority: HIGH
2.3 Hybrid Columnar Compression (HCC)
Status: NOT IMPLEMENTED
Specification Requirement: Compression Units with columnar layout, LZ4/ZSTD compression.
Findings:
- Workspace declares `lz4 = "1.28"` and `zstd = "0.13"`
- No HCC implementation
- No data format specification
Concerns:
- Format Stability: HCC format needs versioning from day one
- DML Performance: Spec correctly identifies HCC as problematic for updates - need clear guidance on when to use
- Memory Amplification: Decompression buffers could be significant
Recommendations:
- DO NOT implement HCC in Phase 1 - focus on core functionality first
- When implementing:
- Create separate storage engine variant (not mixed with LSM)
- Use separate RocksDB column family
- Implement background migration from row format to HCC
- Define clear table-level storage directives:

```sql
CREATE TABLE logs (...) WITH (
    storage_format   = 'HCC',
    compression_mode = 'WAREHOUSE_OPTIMIZED'
);
```
Priority: LOW (defer to Phase 2)
3. Vector Database Integration Review
3.1 VECTOR Data Type
Status: NOT IMPLEMENTED
Specification Requirement: VECTOR(n) data type with TOAST-like storage.
Findings:
- Workspace declares `hnsw = "0.11"` for vector indexing
- No type system implementation
- No TOAST implementation
Critical Issues:
- Type System Not Defined: No core type system exists yet to add VECTOR to
- HNSW Crate Limitations: Version 0.11 may not support filtered search
- No IVF Implementation: Spec requires both HNSW and IVF, but only HNSW dependency exists
Recommendations:
- Create the type system first in `heliosdb-common/src/types.rs`:

```rust
pub enum DataType {
    Int64,
    Float64,
    Varchar(u32),
    Vector(u32), // u32 = dimensions
    // ... other types
}
```

- For vector indexing, consider more mature libraries:
  - `faiss-rs` (bindings to Meta's FAISS) - supports both HNSW and IVF
  - `hnswlib-rs` (bindings to hnswlib) - more mature than the `hnsw` crate
  - Custom implementation only if necessary
- Implement a TOAST storage strategy (see the sketch after this list):
  - Store small vectors (<2KB) inline
  - Store large vectors in a separate key-value space
  - Use reference counting for deduplication
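A minimal sketch of the inline-vs-TOAST decision above. `ToastRef` is a hypothetical handle into the separate key-value space; the 2KB threshold follows the guideline above:

```rust
/// Where a vector's payload actually lives.
pub enum VectorStorage {
    Inline(Vec<f32>),
    Toasted(ToastRef),
}

/// Hypothetical reference into the out-of-line key-value space.
pub struct ToastRef {
    pub toast_key: u64,
    pub byte_len: u32,
}

const TOAST_THRESHOLD_BYTES: usize = 2 * 1024; // store <2KB vectors inline

/// `alloc_toast` stands in for whatever the storage tier exposes to
/// write an out-of-line value and hand back a reference.
pub fn store_vector(
    v: Vec<f32>,
    mut alloc_toast: impl FnMut(&[f32]) -> ToastRef,
) -> VectorStorage {
    if v.len() * std::mem::size_of::<f32>() < TOAST_THRESHOLD_BYTES {
        VectorStorage::Inline(v)
    } else {
        VectorStorage::Toasted(alloc_toast(&v))
    }
}
```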
Priority: MEDIUM
3.2 Filtered ANN Search
Status: NOT IMPLEMENTED
Specification Requirement: Filter-aware HNSW traversal with bitmap allow-lists.
Findings:
- Extremely complex feature requiring tight integration
- No prototype or proof-of-concept exists
- Algorithm described in spec is research-grade
Critical Issues:
- Complexity Underestimated: This is a PhD-level implementation challenge
- Performance Validation Required: No benchmarks exist to prove this approach works
- Alternative Approaches Not Considered
Recommendations:
- Phase 1: Implement post-filtering (retrieve top-K, then filter) - simple but works (sketched below)
- Phase 2: Implement pre-filtering with HNSW search on a subset - moderate complexity
- Phase 3: Implement joint-filtering with multi-hop traversal - only if Phase 2 proves insufficient
- Prototype filtered search performance BEFORE committing to an architecture:

```text
Test scenarios:
- 1M vectors, 10% pass filter  - measure recall and latency
- 1M vectors, 1% pass filter   - does the algorithm still work?
- 1M vectors, 0.1% pass filter - compare to brute force
```

- Consider using Weaviate's open-source filtered HNSW as a reference
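A sketch of the Phase 1 post-filtering loop, with over-fetching to compensate for filtered-out candidates. The `AnnIndex` trait is a hypothetical stand-in for whichever index crate is chosen, and the over-fetch factor is a tunable assumption:

```rust
/// Hypothetical minimal interface over an ANN index.
pub trait AnnIndex {
    /// Returns (row_id, distance) pairs, nearest first.
    fn search(&self, query: &[f32], k: usize) -> Vec<(u64, f32)>;
}

/// Phase 1 post-filtering: over-fetch from the index, then apply the
/// predicate. Recall degrades as selectivity drops - hence the prototypes.
pub fn filtered_topk(
    index: &dyn AnnIndex,
    query: &[f32],
    k: usize,
    passes_filter: impl Fn(u64) -> bool,
) -> Vec<(u64, f32)> {
    let mut fetch = k * 4; // over-fetch factor; tune against selectivity
    loop {
        let candidates = index.search(query, fetch);
        let exhausted = candidates.len() < fetch; // index has nothing more
        let hits: Vec<_> = candidates
            .into_iter()
            .filter(|(id, _)| passes_filter(*id))
            .take(k)
            .collect();
        if hits.len() == k || exhausted {
            return hits;
        }
        fetch *= 2; // widen the net and retry
    }
}
```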
Priority: MEDIUM (but needs early prototyping)
4. Protocol Compatibility Review
4.1 Multi-Protocol Support
Status: NOT IMPLEMENTED
Specification Requirement: Support PostgreSQL, MySQL, SQL Server, Oracle, Snowflake, Databricks, Pinecone protocols.
Findings:
- No protocol handlers exist
- No protocol detection logic
- Extremely ambitious scope for initial release
Critical Concerns:
- Scope Creep Risk: Supporting 8 different protocols is a MASSIVE undertaking
- Resource Allocation: This could consume 80% of development effort
- Compliance Testing: Each protocol needs comprehensive test suite
Security Issues:
- Authentication Surface: 8 different auth mechanisms to secure
- SQL Injection: 8 different parameter binding styles to validate
- TLS Configuration: Different TLS requirements per protocol
Recommendations:
STRONGLY RECOMMEND PHASED APPROACH:
Phase 1 (MVP): PostgreSQL protocol ONLY
- Rationale:
- Most Python libraries support it (psycopg2, asyncpg, SQLAlchemy)
- Clean wire protocol specification
- Good tooling for testing
- Target: Gold-level compliance
Phase 1.5: Add MySQL protocol
- Rationale: Second most common in Python ecosystem
- Target: Gold-level compliance
Phase 2: Add HTTP-based protocols (Snowflake, Databricks, Pinecone)
- Rationale: Simpler than binary protocols, REST-based
- Target: Silver-level compliance
Phase 3: Add enterprise protocols (SQL Server, DB2, Oracle)
- Rationale: Only if customer demand exists
- Target: Bronze-level compliance
Specific Implementation Plan:
- Create a `heliosdb-compute/src/protocol/` module structure:

```text
protocol/
├── mod.rs           # Protocol detection & routing
├── postgres/        # PostgreSQL handler
│   ├── mod.rs
│   ├── auth.rs      # SCRAM-SHA-256
│   ├── codec.rs     # Wire format
│   └── handler.rs   # Query execution
├── mysql/           # Phase 1.5
└── http/            # Phase 2
```
Implement protocol router from
02_PROTOCOL_ROUTER_PSEUDOCODE.md:- Use
bytes::Buffor efficient peeking - Implement TLS negotiation first
- Use ALPN for fast-path detection
- Use
- Create a normalized internal query representation:

```rust
pub struct QueryContext {
    pub user: String,
    pub database: String,
    pub query: LogicalPlan,
    pub parameters: Vec<Value>,
}
```
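To illustrate the peek-based detection, a sketch that classifies PostgreSQL by its startup packet. The magic numbers come from the PostgreSQL wire protocol (protocol 3.0 = 196608, SSLRequest = 80877103); the enum and fallback behavior are assumptions:

```rust
/// Hypothetical classification of an incoming connection's first bytes.
pub enum DetectedProtocol {
    Postgres,
    PostgresSsl,
    Unknown,
}

/// Inspect the first 8 bytes of a client's opening message without
/// consuming them. PostgreSQL clients send a 4-byte length followed by
/// a 4-byte code: 196608 (protocol 3.0) or 80877103 (SSLRequest).
/// Note MySQL cannot be detected this way - the *server* speaks first.
pub fn peek_protocol(first_bytes: &[u8]) -> DetectedProtocol {
    if first_bytes.len() < 8 {
        return DetectedProtocol::Unknown;
    }
    let code = u32::from_be_bytes([
        first_bytes[4], first_bytes[5], first_bytes[6], first_bytes[7],
    ]);
    match code {
        196608 => DetectedProtocol::Postgres,      // 0x0003_0000
        80877103 => DetectedProtocol::PostgresSsl, // SSLRequest
        _ => DetectedProtocol::Unknown,
    }
}
```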
Priority: CRITICAL (but reduce scope dramatically)
5. Code Quality & Safety Review
5.1 Error Handling
Status: BASIC IMPLEMENTATION EXISTS
Review: /home/claude/DMD/heliosdb-common/src/error.rs
Findings:
✓ Good use of thiserror for ergonomic error types
✓ Proper error variants defined
⚠ Missing context information in errors
❌ No error codes for protocol compatibility
Issues:
- Error messages lack context (which shard? which key? which node?)
- No error code system for mapping to SQL error codes
- No distinction between retryable and non-retryable errors
Recommendations:
```rust
use thiserror::Error;

#[derive(Error, Debug)]
pub enum HeliosError {
    #[error("Storage error on shard {shard_id}: {message}")]
    Storage { shard_id: u64, message: String },

    #[error("Network error communicating with {node}: {message}")]
    Network { node: String, message: String },

    // Referenced by is_retryable below.
    #[error("Transaction error: {0}")]
    Transaction(String),

    // Error codes for protocol mapping.
    #[error("SQL error {code:?}: {message}")]
    Sql { code: SqlErrorCode, message: String },
}

#[derive(Debug)]
pub enum SqlErrorCode {
    ConnectionFailure,
    SyntaxError,
    ConstraintViolation,
    // ... map to PostgreSQL error codes
}

impl HeliosError {
    pub fn is_retryable(&self) -> bool {
        matches!(
            self,
            HeliosError::Network { .. } | HeliosError::Transaction(..)
        )
    }
}
```

Priority: MEDIUM
5.2 Concurrency & Thread Safety
Status: NOT IMPLEMENTED
Findings:
- Workspace declares `crossbeam = "0.8"` and `parking_lot = "0.12"`
- No concurrent data structures implemented yet
- No locking strategy defined
Concerns:
- Lock Granularity Not Defined: Will cause contention issues later
- Deadlock Prevention: No strategy documented
- Async vs Sync: Using Tokio but also crossbeam - need clear boundaries
Recommendations:
- Define clear async/sync boundaries (see the sketch after this list):
  - Network I/O: async (Tokio)
  - Storage I/O: sync or async based on RocksDB usage
  - Compute operations: use Rayon for parallelism
- Use lock-free data structures where possible:
  - Crossbeam channels for message passing
  - `Arc<Mutex<T>>` only when necessary
  - Prefer message passing over shared state
- Document the locking order to prevent deadlocks:

```text
Lock Order Hierarchy:
1. Metadata Service state
2. Shard topology map
3. Individual shard locks
4. Transaction locks
```
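As one way to draw the storage-I/O boundary above, a sketch that keeps RocksDB's synchronous reads off the Tokio reactor via `spawn_blocking`; the `Db` wrapper is a hypothetical type:

```rust
use std::sync::Arc;

/// Hypothetical wrapper the storage tier might expose around rocksdb::DB.
pub struct Db {
    inner: Arc<rocksdb::DB>,
}

impl Db {
    /// Async facade over RocksDB's synchronous point lookup. spawn_blocking
    /// moves the blocking read onto Tokio's blocking thread pool so it
    /// cannot stall the async runtime.
    pub async fn get(&self, key: Vec<u8>) -> Result<Option<Vec<u8>>, rocksdb::Error> {
        let db = Arc::clone(&self.inner);
        tokio::task::spawn_blocking(move || db.get(&key))
            .await
            .expect("blocking task panicked")
    }
}
```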
Priority: HIGH
5.3 Memory Safety & Rust Patterns
Status: CANNOT ASSESS (no code yet)
Preemptive Guidance:
- Avoid `unsafe` unless absolutely necessary (RDMA operations only)
- Use `#![forbid(unsafe_code)]` in all crates except `heliosdb-network`
- All unsafe code requires:
  - Detailed safety comments
  - Mandatory code review
  - Comprehensive testing
- Prefer owned types over references in async code:

```rust
// BAD - lifetime issues
async fn process_query<'a>(query: &'a Query) -> Result<()>

// GOOD - owned data
async fn process_query(query: Query) -> Result<()>
```
Priority: HIGH (enforce early)
6. Testing & Quality Assurance
6.1 Test Coverage Requirements
Status: NO TESTS EXIST
Specification Gap: Design documents don’t mention testing strategy.
Required Test Categories:
- Unit Tests (target 80% coverage):
  - All public APIs
  - Error handling paths
  - Edge cases (empty data, max values)
- Integration Tests:
  - Compute-storage interaction
  - Network protocol compliance
  - Consensus behavior (Raft)
- Protocol Compliance Tests (from 01_PROTOCOL_TEST_MATRIX.md):
  - PostgreSQL: psycopg2, asyncpg, SQLAlchemy
  - MySQL: mysql-connector, PyMySQL
  - Must pass before merge
- Performance Tests:
  - Latency benchmarks (P50, P99, P999)
  - Throughput tests (inserts/sec, queries/sec)
  - Scalability tests (1 node, 3 nodes, 10 nodes)
- Chaos Tests:
  - Network partitions
  - Node failures
  - Concurrent writes to the same shard
Recommendations:
- Set up CI/CD with GitHub Actions before writing code
- Create a `tests/` directory with protocol compliance tests first (see the sketch below)
- Use `cargo-tarpaulin` for coverage reporting
- Gate merges on test passage
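The compliance matrix targets Python drivers, but a Rust-side smoke test can gate merges from day one. A sketch assuming `tokio-postgres` as a dev-dependency and a HeliosDB node listening on port 5433 (both assumptions):

```rust
// tests/postgres_smoke.rs
use tokio_postgres::NoTls;

#[tokio::test]
async fn postgres_wire_smoke_test() -> Result<(), tokio_postgres::Error> {
    let (client, connection) =
        tokio_postgres::connect("host=127.0.0.1 port=5433 user=helios", NoTls).await?;

    // The connection object drives the socket; run it in the background.
    tokio::spawn(async move {
        if let Err(e) = connection.await {
            eprintln!("connection error: {e}");
        }
    });

    // Simple-query and extended-query (parameter binding) round trips.
    let rows = client.query("SELECT 1::INT8", &[]).await?;
    assert_eq!(rows[0].get::<_, i64>(0), 1);

    let rows = client.query("SELECT $1::INT8 + 1", &[&41i64]).await?;
    assert_eq!(rows[0].get::<_, i64>(0), 42);
    Ok(())
}
```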
Priority: CRITICAL
7. Security Review
7.1 Authentication & Authorization
Status: NOT IMPLEMENTED
Specification: Multiple auth mechanisms per protocol (SCRAM, password, OAuth, API keys)
Security Concerns:
- Credential Storage: How are passwords hashed? (recommend Argon2)
- Session Management: Token generation, expiration, revocation?
- TLS Configuration: What cipher suites? Certificate management?
- Authorization Model: Role-based access control not specified
Recommendations:
- Use industry-standard libraries (hashing sketch below):
  - `argon2` for password hashing
  - `rustls` for TLS (pure Rust, safer than OpenSSL)
  - `jsonwebtoken` for JWT handling
- Implement the principle of least privilege:
  - Default: no permissions
  - Explicit grants required
  - Row-level security for multi-tenant use
- Security testing requirements:
  - Penetration testing before production
  - Audit all SQL injection vectors
  - Test the TLS configuration with SSL Labs
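A minimal sketch of Argon2 password hashing using the `argon2` crate's password-hash API (assuming roughly the 0.5 crate series; verify against the crate docs before relying on it):

```rust
use argon2::{
    password_hash::{
        rand_core::OsRng, PasswordHash, PasswordHasher, PasswordVerifier, SaltString,
    },
    Argon2,
};

/// Hash a password with a fresh random salt; store the PHC-format string.
fn hash_password(password: &str) -> Result<String, argon2::password_hash::Error> {
    let salt = SaltString::generate(&mut OsRng);
    Ok(Argon2::default()
        .hash_password(password.as_bytes(), &salt)?
        .to_string())
}

/// Verify a login attempt against the stored PHC-format hash.
fn verify_password(password: &str, stored: &str) -> bool {
    PasswordHash::new(stored)
        .map(|parsed| {
            Argon2::default()
                .verify_password(password.as_bytes(), &parsed)
                .is_ok()
        })
        .unwrap_or(false)
}
```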
Priority: HIGH
7.2 Data Security
Status: NOT SPECIFIED
Missing from Specification:
- Encryption at rest
- Encryption in transit (beyond TLS)
- Key management
- Audit logging
Recommendations:
- Add to the specification:
  - Encryption at rest using RocksDB's encryption feature
  - RDMA encryption (if available, or fall back to TLS)
  - Key rotation policy
- Implement audit logging:
  - All DDL operations
  - Authentication attempts
  - Data access patterns (for compliance)
Priority: MEDIUM (but required for enterprise use)
8. Observability & Operations
8.1 Logging & Metrics
Status: BASIC SETUP
Workspace declares `tracing = "0.1"` and `tracing-subscriber = "0.3"`.
Missing:
- Metrics collection (Prometheus)
- Distributed tracing
- Health check endpoints
Recommendations:
- Add dependencies:

```toml
metrics = "0.21"
metrics-exporter-prometheus = "0.12"
opentelemetry = "0.20"
```

- Instrument critical paths (see the sketch after this list):
  - Query latency (histogram)
  - Network operation latency (histogram)
  - Error rates (counter)
  - Active connections (gauge)
  - Compaction progress (gauge)
- Implement health check endpoints:

```text
GET /health/liveness  -> 200 if process is alive
GET /health/readiness -> 200 if ready to serve traffic
GET /metrics          -> Prometheus metrics
```
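A sketch of instrumenting a query path in the `metrics` 0.21 macro style pinned above; metric names and the wrapper itself are illustrative:

```rust
use std::time::Instant;

use metrics::{counter, gauge, histogram};

/// Hypothetical wrapper that records the signals listed above around
/// query execution. Metric names are placeholders.
async fn execute_instrumented<F, T, E>(active_connections: usize, run: F) -> Result<T, E>
where
    F: std::future::Future<Output = Result<T, E>>,
{
    gauge!("helios_active_connections", active_connections as f64);
    let start = Instant::now();
    let result = run.await;
    histogram!("helios_query_latency_seconds", start.elapsed().as_secs_f64());
    if result.is_err() {
        counter!("helios_query_errors_total", 1);
    }
    result
}
```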
Priority: MEDIUM
9. Documentation Gaps
9.1 Missing Documentation
- API documentation (rustdoc)
- Deployment guide
- Performance tuning guide
- Troubleshooting guide
- Architecture decision records (ADRs)
Recommendations:
- Start the ADR practice now:
  - docs/adr/001-use-rocksdb-for-storage.md
  - docs/adr/002-postgres-protocol-first.md
- Require rustdoc on all public APIs:

```rust
#![warn(missing_docs)]
```
Priority: MEDIUM
10. Critical Path & Prioritization
Recommended Implementation Order:
Phase 1: Core Foundation (Months 1-3)
- Type system and data model (`heliosdb-common`)
- LSM storage engine with RocksDB (`heliosdb-storage`)
- Basic network layer with gRPC/TCP (`heliosdb-network`)
- Metadata service with Raft (`heliosdb-metadata`)
- PostgreSQL protocol handler (`heliosdb-compute`)
- Basic query execution (SELECT, INSERT, UPDATE, DELETE)
Success Criteria: Can run a Python app with psycopg2; basic CRUD works
Phase 2: Distribution (Months 4-6)
- Sharding with consistent hashing
- Shard replication (primary + mirror)
- Witness-based failover
- Cache invalidation mechanism
- Query distribution & aggregation
Success Criteria: 3-node cluster, fault tolerance, distributed queries
Phase 3: Advanced Features (Months 7-12)
- Predicate pushdown engine
- Vector data type and storage
- HNSW indexing
- Filtered vector search (post-filtering only)
- MySQL protocol handler
- HTTP API gateway
Success Criteria: Vector search works, multi-protocol support
Phase 4: Performance & Polish (Months 13+)
- RDMA network layer
- HCC compression
- Online aggregation engine
- Additional protocol handlers
- Production hardening
11. Risk Assessment
Critical Risks:
- Scope Overreach (CRITICAL)
  - Risk: Attempting to build everything at once
  - Impact: Project never reaches production
  - Mitigation: Follow the phased approach, MVP first
- RDMA Complexity (HIGH)
  - Risk: RDMA is difficult to implement and test
  - Impact: Network layer becomes a bottleneck
  - Mitigation: Start with TCP, add RDMA later
- Protocol Compatibility Burden (HIGH)
  - Risk: Supporting 8 protocols is too much work
  - Impact: Poor implementation quality across all protocols
  - Mitigation: Focus on PostgreSQL only in Phase 1
- Consensus Implementation (HIGH)
  - Risk: Raft is complex and easy to get wrong
  - Impact: Data loss or split-brain scenarios
  - Mitigation: Use the etcd/raft library, extensive testing
- Vector Search Performance (MEDIUM)
  - Risk: Filtered ANN search may not meet performance targets
  - Impact: Feature doesn't work as advertised
  - Mitigation: Prototype early, have a fallback strategy
12. Compliance Checklist
Architecture Compliance:
- Shared-nothing compute-storage separation
- RDMA/RoCEv2 network layer (or documented fallback)
- LSM-tree storage engine
- Raft consensus for metadata
- Consistent hashing for sharding
- Synchronous replication (primary + mirror)
- Witness-based quorum
- HCC compression
- Predicate pushdown engine
- Vector data type and TOAST storage
- HNSW and IVF indexes
- Filtered ANN search
Protocol Compliance:
- PostgreSQL protocol (Gold level)
- MySQL protocol (Gold level)
- Snowflake HTTP API (Silver level)
- Databricks SQL API (Silver level)
- Pinecone API (Silver level)
- SQL Server TDS (Bronze level)
- DB2 DRDA (Bronze level)
- Oracle Net (Bronze level)
Quality Standards:
- 80% test coverage
- Protocol compliance tests passing
- Security audit completed
- Performance benchmarks met
- Documentation complete
13. Action Items
Immediate Actions (This Week):
- Architect meeting to review and approve this review
- Reduce protocol scope to PostgreSQL only for Phase 1
- Create project roadmap based on phased approach
- Set up CI/CD pipeline with basic tests
- Create ADR template and first ADRs
Short-term Actions (This Month):
- Implement basic type system
- Integrate RocksDB storage engine
- Build simple metadata service (single node first)
- Create PostgreSQL protocol parser
- Write first integration test
Long-term Actions (Next Quarter):
- Complete Phase 1 implementation
- Deploy test cluster
- Run performance benchmarks
- Conduct security review
- Prepare for Phase 2
14. Conclusion
Summary Assessment:
The HeliosDB design specifications are technically sound and well-researched. The architecture draws from proven systems (Exadata, Snowflake, TiDB, ScyllaDB) and makes reasonable trade-offs.
However, the implementation scope is extremely ambitious for a new project. The specifications describe a system that would take a large team (20+ engineers) several years to build to production quality.
Key Recommendations:
- Reduce scope dramatically for the MVP - focus on core HTAP capabilities with the PostgreSQL protocol only
- Defer advanced features - RDMA, HCC, filtered vector search, and additional protocols can come in later phases
- Validate assumptions early - build prototypes for risky components (Raft integration, vector search) before committing to an architecture
- Prioritize correctness over performance - get it working correctly first, optimize later
- Invest in testing infrastructure - this is a distributed database; bugs will be catastrophic
Confidence Level: MEDIUM
The architecture is solid, but execution risk is high due to scope and complexity. Success depends on disciplined prioritization and phased delivery.
Next Review: Scheduled after Phase 1 core components are implemented (estimated 3 months)
Reviewer Signature: Reviewer Agent (HeliosDB Hive Mind)
Review Date: 2025-10-10