HeliosDB Performance Analysis - Index

Overview

This directory contains comprehensive performance analysis reports for HeliosDB architecture, covering critical trade-offs, bottlenecks, and optimization strategies across five key areas:

LSM-Tree RUM Conjecture Trade-offs
Hybrid Columnar Compression (HCC) Performance
Vector Storage TOAST Strategy
Network Performance (RDMA vs TCP)
Synchronous Replication Overhead

Analyst: HeliosDB Hive Mind - Analyst Agent Date: 2025-10-10 Status: Initial analysis complete

Quick Reference Summary

Critical Performance Findings

Area	Key Metric	Baseline	Optimized	Improvement	Recommendation
LSM Compaction	Read latency (p99)	8-15ms (STCS)	1-3ms (LCS)	5-10x	Hybrid strategy
HCC Compression	Scan throughput	1.5 GB/sec (uncompressed)	15 GB/sec (effective)	10x	WAREHOUSE mode for warm data
Vector Storage	ANN search latency	100ms (out-of-line)	20ms (in-line)	5x	PLAIN storage for ≤1536 dims
Network	Data transfer rate	1.2 GB/sec (TCP)	12.2 GB/sec (RDMA)	10x	100G RoCEv2 required
Replication	Write latency	450μs (async)	960μs (sync RDMA)	+113%	Sync with RDMA for production

Resource Requirements (30-Node Cluster)

Component	Configuration	Cost	Rationale
Network NICs	100 Gbps RoCEv2	$96K	10x throughput, 90% CPU savings
Bloom Filters	10 bits/row	1.25 GB per billion rows	99% false positive filtering
Block Cache	16-32 GB per compute node	~$500/node	60-80% cache hit rate target
Vector Memory	PLAIN storage, 6KB/vector	6 GB per million vectors	5x faster ANN search
Replication	Sync mirroring	2x storage	RPO=0, RTO=1-3 seconds

Document Summaries

01: LSM-Tree RUM Tradeoffs Analysis

File: 01-lsm-tree-rum-tradeoffs.md

Focus: Read/Update/Memory amplification trade-offs for different compaction strategies

Key Findings:

STCS (Size-Tiered): 5-10x write amplification, 10-20 SSTable reads, best for write-heavy OLTP
LCS (Leveled): 10-30x write amplification, 1-3 SSTable reads, best for read-heavy OLTP
Hybrid Strategy: Recommended for HTAP - STCS for hot data (0-48hrs), LCS for warm data (48hrs+), HCC for cold data (30d+)

Performance Estimates:

Write-heavy workload (STCS):
- Write throughput: 300K-500K ops/sec per node
- Point read latency: 8-15ms (p99)
- Space efficiency: 55-65%

Read-heavy workload (LCS):
- Write throughput: 100K-200K ops/sec per node
- Point read latency: 1-3ms (p99)
- Space efficiency: 80-90%

HTAP workload (Hybrid):
- Write throughput: 200K-350K ops/sec per node
- Point read latency: 2-6ms (hot), 5-15ms (cold)
- Space efficiency: 70-75% (hot), 90%+ (cold with HCC)

Critical Insights:

Bloom filters with 10 bits/row provide 99% false positive filtering
Cache hierarchy (memtable + compute cache) reduces effective read amplification by 70%
RDMA reduces synchronous replication overhead from 1ms (TCP) to 5μs (negligible)

02: HCC Performance Analysis

File: 02-hcc-performance-analysis.md

Focus: Compression ratio vs query performance for WAREHOUSE_OPTIMIZED vs ARCHIVE_OPTIMIZED modes

Key Findings:

WAREHOUSE_OPTIMIZED: 6-10x compression (realistic: 4-7x), LZ4 decompression at 3 GB/sec per core
ARCHIVE_OPTIMIZED: 10-15x compression (realistic: 9-13x), ZSTD-15 decompression at 800 MB/sec per core
HCC enables I/O-bound to CPU-bound shift: Scans become 3-5x faster despite decompression overhead

Performance Impact:

Full table scan (100 GB uncompressed):
- Row format: 1,550ms (100 MB/sec I/O)
- HCC Warehouse: 400ms (6x compression, 150ms decompress)
- HCC Archive: 720ms (12x compression, 600ms decompress)

Improvement: 3.9x faster (Warehouse), 2.2x faster (Archive)

DML Penalty:

UPDATE single row:
- Row format: 0.3ms
- HCC Warehouse: 2.75ms (9x slower, must decompress/recompress entire 8K-row CU)

Recommendation: HCC only for data with <1% update frequency per day

Three-Tier Strategy:

Hot tier (0-7 days): Row format, no compression
Warm tier (7-90 days): HCC WAREHOUSE_OPTIMIZED (6x compression)
Cold tier (90+ days): HCC ARCHIVE_OPTIMIZED (12x compression)

Storage savings: 83% (Warehouse), 92% (Archive)
Query performance: Warm data 3-5x faster, cold data 2-3x faster

03: Vector Storage TOAST Analysis

File: 03-vector-storage-toast-performance.md

Focus: In-line vs out-of-line storage performance for different vector dimensionalities

Key Findings:

Out-of-line storage penalty: 5-10x slower for vector similarity search (double I/O overhead)
Dimensionality threshold: 2KB TOAST threshold = ~384 dimensions max in-line (default)
PLAIN storage recommendation: Force in-line for ≤1,536 dimensions to avoid performance penalty

Performance Matrix:

Vector Search (top-10 ANN):

VECTOR(256) - PLAIN (in-line):
- I/O: 100 pages
- Latency: 10ms
- Throughput: 100 QPS

VECTOR(768) - MAIN (out-of-line by default):
- I/O: 1,000 pages (10x worse)
- Latency: 90ms
- Throughput: 11 QPS

VECTOR(1536) - PLAIN (forced in-line):
- I/O: 200 pages (2x worse than 256, but 5x better than out-of-line)
- Latency: 30ms
- Throughput: 33 QPS

Cache Efficiency:

1 GB cache with VECTOR(1536):

In-line storage:
- Vectors cached: 153,846
- Cache covers full vector data

Out-of-line storage:
- Main table rows cached: 1.6M
- TOAST vectors cached: 81,037
- Effective coverage: 81K vectors (2x worse)

Conclusion: Out-of-line wastes cache on main table rows without vectors

Recommended Configuration:

-- Performance-critical vector search
CREATE TABLE documents (
    doc_id BIGINT PRIMARY KEY,
    embedding VECTOR(1536)
);
ALTER TABLE documents ALTER COLUMN embedding SET STORAGE PLAIN;

-- Large vectors (>1536 dims) - accept out-of-line penalty
CREATE TABLE images (
    image_id BIGINT PRIMARY KEY,
    embedding VECTOR(4096)
);
ALTER TABLE images ALTER COLUMN embedding SET STORAGE EXTERNAL;

-- Optimize: Use vector quantization to reduce dimensionality
-- PQ compression: 1536 dims → 192 bytes (32x smaller, 95-98% recall)

Adaptive TOAST Threshold:

[storage.toast]
threshold_bytes = 8192  # Increase from 2KB to 8KB (page size)
# Allows VECTOR(1920) to stay in-line

04: Network RDMA vs TCP Analysis

File: 04-network-rdma-vs-tcp-analysis.md

Focus: RDMA benefits vs traditional TCP for HeliosDB data transfer patterns

Key Findings:

Latency improvement: 5-10x faster for small messages (26μs → 8μs RTT)
Throughput improvement: 10x higher bandwidth (1.2 GB/sec → 12.2 GB/sec)
CPU efficiency: 95x reduction in CPU cycles per byte (950K → 10K cycles/MB)

Performance by Operation Type:

Small message (RPC, 256 bytes):
- TCP: 26μs RTT
- RDMA: 8μs RTT
- Improvement: 3.2x

Medium message (predicate pushdown, 8 KB):
- TCP: 27μs one-way
- RDMA: 9μs one-way
- Improvement: 3x

Large transfer (result set, 1 MB):
- TCP: 667μs (1.5 GB/sec, CPU-limited)
- RDMA: 91μs (11 GB/sec, wire-limited)
- Improvement: 7.3x

Multi-Shard Query Impact:

Query scans 10 shards, each returns 10 MB:

TCP (10 Gbps):
- Transfer time: 6.67ms per shard (parallel)
- CPU usage: 3 full cores (30% × 10 cores)
- Total: ~6.7ms

RDMA (100 Gbps):
- Transfer time: 0.91ms per shard (parallel)
- CPU usage: 0.3 cores (3% × 10 cores)
- Total: ~0.9ms

Improvement: 7.3x faster, 90% CPU savings

Synchronous Replication Impact:

Write with mirroring:

TCP:
- Primary write: 500μs
- Replication RTT: 500μs (datacenter)
- Mirror write: 500μs
- Total: 1,540μs

RDMA:
- Primary write: 500μs
- Replication RTT: 5μs (RDMA ultra-low latency)
- Mirror write: 500μs
- Total: 1,018μs

Improvement: 1.5x faster writes, critical for OLTP

Cost-Benefit Analysis:

30-node cluster:
- TCP (10 Gbps): $24K (NICs + switches)
- RDMA (100 Gbps): $96K (4x more expensive)

BUT:
- CPU savings: 30 cores freed (90% reduction)
- Compute savings: 30% fewer nodes needed for same throughput
- TCO break-even: Immediate (lower total cost with fewer nodes)

Additional benefit:
- 50% more throughput on same nodes
- Revenue opportunity: 50% more customers without adding nodes

Scalability:

Network bottleneck:
- TCP: 8-10 nodes before saturation at 10 Gbps
- RDMA: 80+ nodes before saturation at 100 Gbps

Scalability improvement: 8-10x larger clusters possible

05: Replication Overhead Analysis

File: 05-replication-overhead-analysis.md

Focus: Synchronous mirroring latency impact on write throughput

Key Findings:

Latency overhead: +106% with RDMA (450μs → 960μs), +156-222% with TCP (450μs → 1,200-1,500μs)
Throughput reduction: 50-60% vs async (222K → 104K writes/sec with RDMA)
Durability benefit: RPO=0 (zero data loss), RTO=1-3 seconds (witness-based failover)

Network Impact Comparison:

Write latency by network type:

Async (no replication blocking):
- Latency: 450μs
- Throughput: 222K writes/sec per node
- RPO: 10-100ms (data loss risk)

Sync + RDMA (same datacenter):
- Latency: 960μs (~1ms)
- Throughput: 104K writes/sec per node
- RPO: 0 (no data loss)

Sync + 10G Ethernet (same datacenter):
- Latency: 1,200-1,500μs (~1.5ms)
- Throughput: 83K writes/sec per node
- RPO: 0

Sync + TCP (cross-region):
- Latency: 6,000-16,000μs (6-16ms) - PROHIBITIVE
- Not suitable for synchronous replication

Workload-Specific Recommendations:

OLTP (50K writes/sec target):
- Async: 1 node needed, <0.5ms latency, RPO>0 ❌
- Sync + RDMA: 1 node needed, <1ms latency, RPO=0
- Sync + TCP: 1 node needed, ~1.5ms latency, RPO=0 ⚠

HTAP (10K writes/sec, 100K reads/sec):
- Sync + RDMA recommended
- Mirror acts as zero-lag read replica
- 2x read capacity (primary + mirror)
- <1ms write latency maintained

Bulk load (500K writes/sec):
- Temporarily disable sync replication
- Load at async speed (444K writes/sec per node)
- Re-enable sync after load complete
- Best of both worlds

Optimization Strategies:

1. Batching:
   - Group 50-100 writes per batch
   - Latency: 600μs avg (vs 960μs non-batched)
   - Throughput: 167K writes/sec (+60% improvement)

2. Parallel streams:
   - Use 4 parallel replication connections
   - Throughput: 4 × 104K = 416K writes/sec per node
   - Improvement: 4x

3. Hybrid mode (per-table configuration):
   - Critical tables (transactions): Sync + RDMA
   - Non-critical (logs): Async
   - Cost savings: 45% (fewer nodes needed)

Failure Recovery:

Primary node failure (synchronous replication):
- Data loss: 0 transactions (RPO=0)
- Failover time: 1-3 seconds (witness quorum)
- In-flight writes: Client retries (idempotent)

Primary node failure (asynchronous replication):
- Data loss: 10-100ms worth of transactions (~10,000 writes)
- Failover time: 1-3 seconds (same)
- Impact: Financial loss + reputational damage ❌

Split-brain prevention:
- Witness-based quorum (2 out of 3 votes)
- Primary + Witness: Primary remains active
- Mirror + Witness: Mirror promoted to primary
- No scenario where both are active simultaneously

Cost Analysis:

3M writes/sec cluster:

Async replication:
- Nodes: 28 (14 primary + 14 mirror)
- Network: $22,400
- Servers: $140,000
- Total: $162,400
- RPO: 50ms (data loss risk)

Sync + RDMA:
- Nodes: 58 (29 primary + 29 mirror, 2x due to lower throughput)
- Network: $185,600
- Servers: $290,000
- Total: $475,600
- RPO: 0 (no data loss)

Cost difference: +$313,200 (1.93x more expensive)

BUT for financial/critical data:
- Data loss risk value: $millions (potential)
- Compliance requirement: RPO=0 mandatory (PCI-DSS, SOX)
- Decision: 1.93x cost is justified

Hybrid optimization (80% async, 20% sync for critical tables):
- Total cost: $261,040
- Savings: 45% vs full sync
- Critical data: RPO=0

Synthesis: Recommended Production Configuration

Based on all five analyses, here is the recommended configuration for a production HeliosDB deployment targeting HTAP workloads:

Hardware Configuration (30-Node Cluster Example)

[cluster]
compute_nodes = 10
storage_nodes = 20  # 10 primary + 10 mirror pairs

[compute_node]
cpu_cores = 64
memory_gb = 256
network = "100G_RoCEv2"

[storage_node]
cpu_cores = 32
memory_gb = 128
nvme_storage_tb = 10
network = "100G_RoCEv2"

[metadata_service]
nodes = 3  # Raft quorum
cpu_cores = 16
memory_gb = 64

Storage Engine Configuration

[storage.lsm]
# Hybrid compaction strategy
compaction_strategy = "Hybrid"
hot_data_strategy = "STCS"      # 0-48 hours
cold_data_strategy = "LCS"      # 48+ hours
transition_age_hours = 48

[storage.lsm.memtable]
size_mb = 128
flush_threshold_mb = 96
num_memtables = 2

[storage.lsm.bloom_filter]
bits_per_row = 10  # 1% false positive rate
# 1.25 GB per billion rows

[storage.lsm.cache]
block_cache_mb = 16384  # 16 GB per compute node
# Target 60-80% hit rate

HCC Configuration

[storage.hcc]
enabled = true

# Hot tier: No compression
[storage.hcc.hot]
age_days = 7
format = "ROW"

# Warm tier: Fast compression
[storage.hcc.warm]
age_days = 90
format = "WAREHOUSE_OPTIMIZED"
algorithm = "LZ4"
cu_size_rows = 8192
compression_level = 1

# Cold tier: Maximum compression
[storage.hcc.cold]
format = "ARCHIVE_OPTIMIZED"
algorithm = "ZSTD"
compression_level = 15
cu_size_rows = 32768

# Automatic lifecycle transitions
[storage.hcc.lifecycle]
transition_schedule = "0 2 * * *"  # 2 AM daily

Vector Storage Configuration

[storage.vector]
# Force in-line storage for performance
default_storage = "PLAIN"
max_inline_dimensions = 1536

# Relax TOAST threshold to allow larger in-line vectors
toast_threshold_kb = 6  # 6KB vs default 2KB

# Fallback for very large vectors
large_vector_threshold_dims = 2048
large_vector_storage = "EXTERNAL"

# Monitoring
track_toast_usage = true
warn_on_excessive_toast_io = true
toast_io_threshold = 100  # Fetches per query

Network Configuration

[network]
protocol = "RoCEv2"  # RDMA over Converged Ethernet
bandwidth_gbps = 100

[network.rdma]
enable_ecn = true  # Explicit Congestion Notification
priority = 3       # High priority for database traffic

# Buffer sizes
send_buffer_mb = 256
recv_buffer_mb = 256

# Connection pooling
max_connections_per_node = 16

[network.fallback]
# Graceful degradation to TCP if RDMA unavailable
enable_tcp_fallback = true
tcp_port = 5432

Replication Configuration

[replication]
default_mode = "synchronous"  # RPO=0 by default
network = "rdma"              # <1ms write latency
witness_quorum = true         # Split-brain protection

[replication.batching]
enabled = true
max_batch_size = 50
max_wait_us = 300  # 300μs batching window

[replication.parallelism]
num_streams = 4  # 4x throughput via parallel replication

# Per-table overrides for non-critical data
[replication.table_overrides]
"access_logs" = "asynchronous"
"user_sessions" = "asynchronous"
"metrics" = "asynchronous"

# Critical tables (default: synchronous)
# - transactions
# - account_balances
# - user_profiles

Expected Performance

Write Performance:

Per storage node (RDMA + sync replication + batching):
- Latency: 600μs average (p50), 1.2ms (p99)
- Throughput: 167K writes/sec per node
- 20 storage nodes: 3.34M writes/sec cluster-wide

RPO: 0 (zero data loss)
RTO: 1-3 seconds (automatic failover)

Read Performance:

Point queries:
- Hot data (row format): 0.2-0.5ms
- Warm data (HCC Warehouse): 0.5-2ms
- Cold data (HCC Archive): 1-5ms

Analytical scans (1 GB data):
- Uncompressed: 1,550ms (baseline)
- Warm (HCC Warehouse): 400ms (3.9x faster)
- Cold (HCC Archive): 720ms (2.2x faster)

Vector similarity search (VECTOR(1536) with PLAIN storage):
- Top-10 ANN: 20-40ms
- Throughput: 25-50 QPS per node
- 10 compute nodes: 250-500 QPS cluster-wide

Resource Utilization:

CPU:
- Network overhead: <5% per core (RDMA efficiency)
- Compression/decompression: 10-20% (LZ4 for warm tier)
- Query execution: 70-80% (primary workload)

Memory:
- Bloom filters: 1.25 GB per billion rows
- Block cache: 16 GB per compute node (60-80% hit rate)
- Memtables: 384 MB per table
- Vector storage: 6 KB per vector (in-line with PLAIN)

Storage:
- Warm data: 6x compression (HCC Warehouse)
- Cold data: 12x compression (HCC Archive)
- Overall: 70-75% space savings

Network:
- Peak utilization: 60-80% of 100 Gbps (room for bursts)
- RDMA CPU overhead: 3% per core
- Latency: 5-15μs RTT (same datacenter)

Cost Estimate (30-Node Cluster):

Compute nodes (10):
- Servers: 10 × $8,000 = $80,000
- 100G RDMA NICs: 10 × $1,200 = $12,000

Storage nodes (20):
- Servers: 20 × $6,000 = $120,000
- NVMe storage: 20 × 10TB × $200/TB = $40,000
- 100G RDMA NICs: 20 × $1,200 = $24,000

Metadata nodes (3):
- Servers: 3 × $4,000 = $12,000

Network switches:
- 100G RoCEv2 switches: $60,000

Total capital: $348,000

Annual operating cost:
- Power (30kW @ $0.10/kWh): $26,280/year
- Cooling: $10,000/year
- Maintenance: $15,000/year
Total opex: $51,280/year

TCO (3 years): $348K + (3 × $51.3K) = $501,900

Workload Classification and Recommendations

Workload Type 1: OLTP-Heavy (>80% writes)

Characteristics:

High write throughput (>100K writes/sec)
Low read latency requirements (<1ms p99)
Small transactions (KB-range)

Recommended Configuration:

[storage.lsm]
compaction_strategy = "STCS"  # Write-optimized

[storage.hcc]
enabled = false  # No compression for hot OLTP data

[replication]
mode = "synchronous"
batching.enabled = true
batching.max_batch_size = 100  # Amortize replication overhead

Expected Performance:

Write latency: 600μs (p50), 1.2ms (p99)
Write throughput: 300K-500K writes/sec per node
Read latency: 2-8ms (acceptable for OLTP)

Workload Type 2: OLAP-Heavy (>80% reads, complex queries)

Characteristics:

Large scans (GB-TB range)
Complex joins and aggregations
Infrequent writes

Recommended Configuration:

[storage.lsm]
compaction_strategy = "LCS"  # Read-optimized

[storage.hcc]
enabled = true
warm_tier_days = 7
cold_tier_days = 30

[storage.cache]
block_cache_mb = 32768  # 32 GB cache for hot data

[replication]
mode = "asynchronous"  # Writes are infrequent, async acceptable

Expected Performance:

Scan throughput: 10-15 GB/sec (effective with HCC)
Query latency: 100ms-10s (depending on data size)
Cache hit rate: 80-90% (analytical workloads have hot working set)

Workload Type 3: HTAP (Balanced Read/Write)

Characteristics:

Mixed transactional and analytical queries
Real-time analytics on recent data
Variable query complexity

Recommended Configuration:

[storage.lsm]
compaction_strategy = "Hybrid"
hot_data_strategy = "STCS"
cold_data_strategy = "LCS"
transition_age_hours = 48

[storage.hcc]
enabled = true
warm_tier_days = 90
cold_tier_days = 365

[replication]
mode = "synchronous"  # For zero-lag analytics on mirror
batching.enabled = true

[network]
protocol = "RoCEv2"  # Critical for both OLTP and OLAP performance

Expected Performance:

Write latency: 600μs (p50), 1.5ms (p99)
Write throughput: 200K-350K writes/sec per node
Read latency: 2-6ms (hot), 5-15ms (cold)
Scan throughput: 8-12 GB/sec

This is the recommended default configuration for HeliosDB.

Workload Type 4: Vector Search Dominant

Characteristics:

High-dimensional embeddings (768-1536 dims)
Frequent ANN similarity searches
Hybrid queries (vector + scalar filters)

Recommended Configuration:

[storage.vector]
default_storage = "PLAIN"  # Force in-line for performance
toast_threshold_kb = 8     # Maximize in-line capacity

[storage.lsm]
compaction_strategy = "LCS"  # Low read latency for scalar filters

[replication]
mode = "synchronous"  # Real-time updates to vector index

# Optimize for vector workload
[query.vector]
hnsw_ef_search = 100  # Higher recall, acceptable latency
enable_filtered_search = true

Expected Performance:

ANN search (top-10): 20-40ms
Hybrid search (vector + filters): 30-60ms
Throughput: 25-50 QPS per node

Critical: Use PLAIN storage for VECTOR columns to avoid 5-10x performance penalty.

Bottleneck Identification Matrix

Symptom	Likely Bottleneck	Diagnostic Query	Mitigation
High write latency (>2ms)	Network or replication	Check `replication.mirror_ack_latency_us`	Upgrade to RDMA, enable batching
High read latency (>10ms)	Read amplification	Check `lsm.read_amplification` (SSTables scanned)	Switch to LCS, increase cache size
Low scan throughput (<5 GB/sec)	I/O or CPU	Check `cpu.decompress_percent`	Enable HCC (if CPU <50%), add more storage nodes
Slow vector search (>100ms)	TOAST out-of-line	Check `toast.fetch_count` per query	Set STORAGE PLAIN on vector columns
Replication lag (>100ms)	Network or mirror overload	Check `replication.mirror_lag_seconds`	Add parallel replication streams, upgrade network
High CPU on networking (>30%)	TCP overhead	Check `cpu.network_overhead_percent`	Migrate to RDMA
Low cache hit rate (<50%)	Insufficient cache	Check `cache.hit_rate.l2`	Increase `block_cache_mb`, add more compute memory
Compaction backlog	Write-heavy + insufficient compaction threads	Check `lsm.compaction.pending_bytes`	Increase compaction threads, switch to STCS

Monitoring Dashboard Recommendations

Real-Time Performance Metrics

Write Path:

- lsm.memtable.flush_rate (flushes/hour): Target 10-50
- replication.total_write_latency_us: Target <1000 (p99)
- replication.writes_per_sec: Monitor vs capacity
- replication.pending_writes: Alert if >10,000

Read Path:

- lsm.read_amplification: Target <5 (SSTables per query)
- cache.hit_rate.l1 (memtable): Target >10%
- cache.hit_rate.l2 (block cache): Target >60%
- hcc.decompress_time_ms: Monitor CPU bottleneck

Network:

- rdma.throughput_gbps: Monitor utilization vs 100 Gbps
- rdma.completion_latency_us: Target <10
- network.rdma_speedup: Validate vs TCP baseline (should be >5x)

Vector Search:

- vector.ann_search_latency_ms: Target <50 (p99)
- toast.fetch_count: Alert if >10 per query (out-of-line penalty)
- vector.inline_pct: Target 100% for critical tables

Resource Usage:

- cpu.decompress_percent: Monitor HCC overhead, target <30%
- memory.bloom_filters_mb: Should be <5% of total memory
- disk.compaction_io_mb_per_sec: Shouldn't saturate I/O

Alert Thresholds

Critical Alerts (P0 - Immediate Action):

- replication.mirror_failures_total increasing
- replication.total_write_latency_us > 5000 (>5ms)
- lsm.compaction.pending_bytes > 50 GB
- cache.hit_rate.l2 < 30% (severe cache miss)
- rdma.completion_latency_us > 100 (network issue)

Warning Alerts (P1 - Investigate Soon):

- lsm.read_amplification > 10
- hcc.compression_ratio < expected × 0.8
- toast.fetch_count > 100 per query
- cpu.decompress_percent > 50%
- replication.mirror_lag_seconds > 1.0 (async mode)

Future Optimization Opportunities

1. Auto-Tuning Compaction Strategy

Current: Static configuration (STCS vs LCS vs Hybrid) Proposed: Dynamic strategy selection based on workload pattern detection

if write_heavy_ratio > 0.8:
    switch_to_compaction_strategy(STCS)
elif read_heavy_ratio > 0.8:
    switch_to_compaction_strategy(LCS)
else:
    switch_to_compaction_strategy(Hybrid)

2. Adaptive TOAST Threshold

Current: Fixed 2KB threshold Proposed: Per-table threshold based on column sizes and access patterns

-- Automatically increase threshold for vector-heavy tables
ALTER TABLE documents SET TOAST THRESHOLD = 8192;

3. Predictive Cache Warming

Current: LRU eviction policy Proposed: ML-based prediction of hot data blocks, pre-fetch before query arrival

4. Vector Quantization Support

Current: Full-precision vectors only Proposed: Product Quantization (PQ) for 32x compression with 95-98% recall

CREATE INDEX ON documents USING hnsw_pq (embedding)
WITH (subvector_count = 192, bits_per_code = 8);

5. Automatic Replication Mode Selection

Current: Static sync/async configuration Proposed: Adaptive mode based on network latency and table criticality

if table.is_critical() && network_rtt_us < 100:
    replication_mode = Synchronous
else:
    replication_mode = Asynchronous

Conclusion

This comprehensive analysis provides quantitative estimates for the five critical performance areas in HeliosDB:

LSM-Tree RUM Trade-offs: Hybrid compaction strategy balances write throughput and read latency
HCC Performance: 3-10x faster scans with compression, 6-12x space savings
Vector TOAST: 5-10x performance gain with in-line storage for ≤1536 dimensions
RDMA vs TCP: 10x throughput improvement, 90% CPU savings, critical for production
Replication Overhead: +106% write latency with sync, but RPO=0 for mission-critical data

Key Takeaway: HeliosDB’s architecture achieves industry-leading HTAP performance through careful balance of trade-offs:

Write throughput: 200K-350K writes/sec per node (with sync replication + RDMA)
Read latency: 2-6ms for hot data (hybrid compaction + caching)
Scan throughput: 10-15 GB/sec effective (with HCC compression)
Vector search: 20-40ms top-10 ANN (with in-line storage)
Durability: RPO=0, RTO=1-3 seconds (synchronous mirroring + witness quorum)

Recommended for Production:

Network: 100 Gbps RoCEv2 (RDMA)
Compaction: Hybrid (STCS hot → LCS warm → HCC cold)
Replication: Synchronous with batching (50-100 writes per batch)
Vector Storage: PLAIN (force in-line) for ≤1536 dimensions
Cache: 16-32 GB block cache per compute node

These recommendations are based on quantitative analysis and provide a strong foundation for deployment and optimization of HeliosDB in production environments.

Generated: 2025-10-10 Agent: HeliosDB Analyst Version: 1.0 Status: Ready for architect and optimizer review