HeliosDB Performance Tuning Guide v7.0
Version: 7.0 (Phase 3B)
Date: December 9, 2025
Audience: Database Administrators, DevOps Engineers, Performance Engineers
Table of Contents
- Introduction
- Performance Baseline
- System Configuration
- Storage Optimization
- Cache Tuning
- Query Optimization
- Multi-Tenancy Performance
- Network Optimization
- Monitoring & Profiling
- Troubleshooting Common Issues
- Production Best Practices
Introduction
This guide provides comprehensive performance tuning recommendations for HeliosDB v7.0 (Phase 3B). Following these guidelines can typically:
- Reduce query latency by 40-60%
- Increase throughput by 50-100%
- Reduce resource utilization by 30-50%
- Improve system stability and predictability
Quick Wins Checklist
Before deep tuning, apply these quick optimizations (a consolidated config sketch follows the list):
- Enable compression (zstd level 3)
- Configure appropriate cache sizes (4-16 GB)
- Set write buffer size (256-512 MB)
- Enable batching (10ms window)
- Create proper indexes
- Configure connection pooling
- Enable query result caching
- Set appropriate resource quotas
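Most of these quick wins map directly onto config.toml keys documented later in this guide. A minimal starting point is sketched below; the sizes are illustrative and should be tuned to your hardware, and the remaining items (indexes, connection pooling, quotas) are covered in their own sections:

[compression]
enabled = true
algorithm = "zstd"
level = 3

[storage]
write_buffer_size_mb = 256

[cache]
size_mb = 8192  # 4-16 GB depending on available RAM

[performance]
enable_batching = true
batch_window_ms = 10
enable_query_cache = true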
Performance Baseline
Benchmark Your System
1. Single-Node Performance
# Run comprehensive benchmark
heliosdb benchmark --workload mixed --duration 300s

# Expected results (reference hardware):
# - Read latency (p50): 2-5ms
# - Read latency (p99): 10-25ms
# - Write latency (p50): 3-8ms
# - Write latency (p99): 15-35ms
# - Throughput: 10,000-50,000 QPS
# - Cache hit rate: 85-95%
2. Multi-Node Performance
# Test cluster scalability
heliosdb benchmark --workload distributed --nodes 4 --duration 300s

# Expected linear scalability:
# - 2 nodes: 95-98% of theoretical (2x)
# - 4 nodes: 93-97% of theoretical (4x)
# - 8 nodes: 90-95% of theoretical (8x)
Reference Hardware Specifications
Development/Testing:
- CPU: 4 cores @ 2.5 GHz
- RAM: 16 GB
- Storage: 500 GB SSD (NVMe)
- Network: 1 Gbps
- Expected QPS: 5,000-10,000
Production (Small):
- CPU: 8 cores @ 3.0 GHz
- RAM: 32 GB
- Storage: 1 TB NVMe SSD
- Network: 10 Gbps
- Expected QPS: 20,000-40,000
Production (Medium):
- CPU: 16 cores @ 3.5 GHz
- RAM: 64 GB
- Storage: 2 TB NVMe SSD
- Network: 10 Gbps
- Expected QPS: 50,000-100,000
Production (Large):
- CPU: 32 cores @ 3.5 GHz
- RAM: 128 GB
- Storage: 4 TB NVMe SSD
- Network: 25 Gbps
- Expected QPS: 100,000-200,000
System Configuration
Operating System Tuning
1. Linux Kernel Parameters
# Increase max open files
fs.file-max = 1000000

# TCP tuning for high throughput
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_max_syn_backlog = 8192
net.core.netdev_max_backlog = 5000

# Virtual memory tuning
vm.swappiness = 10
vm.dirty_ratio = 15
vm.dirty_background_ratio = 5
vm.dirty_writeback_centisecs = 100

# Hugepages (optional, for large deployments)
vm.nr_hugepages = 1024

# Apply changes
sudo sysctl -p
2. Filesystem Tuning
# Use XFS or ext4 with optimal mount options
# XFS (recommended); note: nobarrier was removed in kernel 4.19+, omit it on modern kernels
/dev/sdb1 /data xfs noatime,nodiratime,nobarrier 0 0

# ext4 (alternative)
/dev/sdb1 /data ext4 noatime,nodiratime,data=writeback 0 0

# Remount
sudo mount -o remount /data
3. I/O Scheduler
# Use deadline or noop for SSDs (none/mq-deadline on newer blk-mq kernels)
echo noop | sudo tee /sys/block/sdb/queue/scheduler

# Verify
cat /sys/block/sdb/queue/scheduler
# Output: [noop] deadline cfq

# Make persistent
# /etc/udev/rules.d/60-scheduler.rules
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/scheduler}="noop"
4. CPU Governor
# Use performance governor for predictable latency
sudo cpupower frequency-set -g performance

# Verify
cpupower frequency-info
HeliosDB Configuration
config.toml
[server]
# Bind address
listen_addr = "0.0.0.0:8080"

# Number of worker threads (2x CPU cores recommended)
worker_threads = 16

# Connection limits
max_connections = 1000
connection_timeout_ms = 30000

[storage]
# Data directory (use fast SSD/NVMe)
data_dir = "/data/heliosdb"

# Write-ahead log (WAL) settings
wal_sync_mode = "normal"  # Options: none, normal, full
wal_buffer_size_mb = 64

# Write buffer (MemTable) size
write_buffer_size_mb = 256

# Number of write buffers
max_write_buffer_number = 4

# Background compaction threads
max_background_compactions = 4
max_background_flushes = 2

# SSTable block size
block_size_kb = 16

# Bloom filter bits per key
bloom_filter_bits_per_key = 10

[compression]
# Enable compression
enabled = true

# Algorithm: zstd, lz4, snappy, none
algorithm = "zstd"

# Compression level (1-22 for zstd; higher = better compression, slower)
level = 3

# Minimum size to compress (bytes)
min_size_bytes = 1024

[cache]
# Unified cache size (20-40% of RAM recommended)
size_mb = 16384  # 16 GB

# Block cache size
block_cache_mb = 12288  # 12 GB

# Row cache size
row_cache_mb = 4096  # 4 GB

# Cache policy: lru, lfu, arc
policy = "lru"

# Cache TTL (seconds, 0 = no TTL)
ttl_seconds = 3600

[performance]
# Enable batching
enable_batching = true

# Batch window (milliseconds)
batch_window_ms = 10

# Max batch size (operations)
max_batch_size = 1000

# Enable query result caching
enable_query_cache = true

# Query cache size (MB)
query_cache_mb = 2048

# Query cache TTL (seconds)
query_cache_ttl = 600

# Parallel query execution
max_parallel_queries = 8

# SIMD acceleration
enable_simd = true

[multitenancy]
# Enable multi-tenancy
enabled = true

# Default isolation mode
default_isolation_mode = "logical"

# Tenant metadata cache size
tenant_cache_mb = 256

# RLS policy cache size
rls_cache_mb = 128

[monitoring]
# Enable metrics
metrics_enabled = true

# Metrics endpoint
metrics_addr = "0.0.0.0:9090"

# Enable detailed statistics
enable_statistics = true

# Statistics collection interval (seconds)
stats_interval_seconds = 60
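The batching settings in [performance] above work as follows: incoming writes are buffered and flushed as a single request when either the batch window elapses or the batch fills. A minimal standalone sketch of that technique in Rust (the Op type and the flush body are hypothetical stand-ins, not HeliosDB internals; a real implementation would also flush on a timer rather than only on submit):

use std::time::{Duration, Instant};

/// Hypothetical write operation; stands in for a real client request.
struct Op(String);

/// Accumulates ops and flushes when the window elapses or the batch fills.
struct Batcher {
    buf: Vec<Op>,
    window: Duration, // batch_window_ms
    max_batch: usize, // max_batch_size
    opened_at: Instant,
}

impl Batcher {
    fn new(window_ms: u64, max_batch: usize) -> Self {
        Batcher {
            buf: Vec::new(),
            window: Duration::from_millis(window_ms),
            max_batch,
            opened_at: Instant::now(),
        }
    }

    fn submit(&mut self, op: Op) {
        if self.buf.is_empty() {
            self.opened_at = Instant::now(); // window starts at the first op
        }
        self.buf.push(op);
        if self.buf.len() >= self.max_batch || self.opened_at.elapsed() >= self.window {
            self.flush();
        }
    }

    fn flush(&mut self) {
        if self.buf.is_empty() {
            return;
        }
        // One round trip amortizes per-request overhead across the whole batch.
        println!("flushing {} ops in one request", self.buf.len());
        self.buf.clear();
    }
}

fn main() {
    let mut b = Batcher::new(10, 1000); // batch_window_ms = 10, max_batch_size = 1000
    for i in 0..2500 {
        b.submit(Op(format!("write {i}")));
    }
    b.flush(); // drain any tail
}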
Storage Optimization
LSM-Tree Tuning
1. Write Amplification Reduction
[storage]
# Reduce write amplification
max_write_buffer_number = 6  # More buffers = less frequent flushes
write_buffer_size_mb = 512   # Larger buffers = fewer SSTables

# Compaction strategy
compaction_style = "leveled"  # Options: leveled, universal, fifo

# Level 0 file limits
level0_file_num_compaction_trigger = 4
level0_slowdown_writes_trigger = 20
level0_stop_writes_trigger = 36

# Target file size
target_file_size_base_mb = 64
target_file_size_multiplier = 2
2. Read Amplification Reduction
[storage]
# Bloom filters reduce read amplification
bloom_filter_bits_per_key = 10  # ~1% false positive rate

# Block cache for frequently accessed data
block_cache_mb = 16384

# Enable index and filter caching
cache_index_and_filter_blocks = true
pin_l0_filter_and_index_blocks_in_cache = true
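Why 10 bits per key gives roughly a 1% false positive rate: with $m/n$ bits per key and the optimal number of hash functions $k = (m/n)\ln 2$, the standard Bloom filter approximation is

$$p \approx \left(1 - e^{-kn/m}\right)^{k} \approx 0.6185^{\,m/n}, \qquad \frac{m}{n} = 10 \;\Rightarrow\; p \approx 0.6185^{10} \approx 0.008 \approx 1\%.$$

Raising bloom_filter_bits_per_key lowers the false positive rate at the cost of memory; 10 is a common sweet spot.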
3. Space Amplification Reduction
[compression]
# Aggressive compression
enabled = true
algorithm = "zstd"
level = 5  # Higher level for better compression

[storage]
# More aggressive compaction
max_bytes_for_level_base_mb = 512
max_bytes_for_level_multiplier = 8
HCC (Hierarchical Columnar Compression)
Configuration:
[hcc]
# Enable HCC
enabled = true

# Compression ratio (4-15x typical)
target_ratio = 8.0

# Chunk size (KB)
chunk_size_kb = 64

# Enable adaptive compression
adaptive = true

# Cold data threshold (days)
cold_data_threshold_days = 30
Benefits (a worked example follows the list):
- Storage reduction: 60-80%
- Cost savings: 60-80% for cloud storage
- Performance impact: <5% for reads, <10% for writes
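The storage-reduction range follows directly from the compression ratio $r$:

$$\text{reduction} = 1 - \frac{1}{r}, \qquad r = 4 \Rightarrow 75\%, \qquad r = 8 \;(\text{the target\_ratio above}) \Rightarrow 87.5\%.$$

Actual end-to-end savings depend on how much of the dataset compresses at the target ratio, hence the more conservative 60-80% range quoted above.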
Cloud Tiering
Configuration:
[cloud_tiering]
# Enable tiering
enabled = true

# Cloud provider: s3, gcs, azure
provider = "s3"

# S3 configuration
[cloud_tiering.s3]
bucket = "heliosdb-cold-storage"
region = "us-west-2"
endpoint = "https://s3.us-west-2.amazonaws.com"

# Tiering policies
[cloud_tiering.policy]
# Move to cloud after N days
cold_data_threshold_days = 90

# Access frequency threshold
access_frequency_threshold = 0.01  # <1% access rate

# Storage class: standard, glacier, deep_archive
storage_class = "glacier"
Performance Impact:
- Hot data (local SSD): 2-5ms latency
- Warm data (local with prefetch): 5-15ms latency
- Cold data (S3 standard): 50-200ms latency
- Cold data (Glacier): Minutes to hours (restore first)
Cache Tuning
Multi-Level Cache Architecture
Level 1: In-Memory Metadata Cache
[cache.l1]
# Tenant metadata
tenant_metadata_mb = 256

# RLS policy cache
rls_policy_mb = 128

# Configuration cache
config_mb = 64

# TTL (seconds, 0 = infinite)
ttl_seconds = 0  # Metadata rarely changes
Level 2: Query Result Cache (Redis)
[cache.l2]
# Enable distributed caching
enabled = true

# Redis configuration
redis_url = "redis://localhost:6379"

# Cache size
size_mb = 4096

# TTL (seconds)
ttl_seconds = 600  # 10 minutes

# Eviction policy: lru, lfu, random
eviction_policy = "lru"
Level 3: Block Cache
[cache.l3]
# Block cache for storage
size_mb = 16384  # 20-40% of RAM

# Index and filter block caching
cache_index_and_filter_blocks = true

# Pin frequently accessed blocks
pin_l0_filter_and_index_blocks_in_cache = true
Cache Sizing Guidelines
Total RAM: 32 GB
OS and System: 4 GB (12%)
Application: 4 GB (12%)
Block Cache: 12 GB (38%)
Row Cache: 4 GB (12%)
Write Buffers: 2 GB (6%)
Query Cache: 2 GB (6%)
Tenant Metadata: 512 MB (2%)
Remaining: 4 GB (12%)

Total RAM: 64 GB
OS and System: 8 GB (12%)
Application: 8 GB (12%)
Block Cache: 24 GB (38%)
Row Cache: 8 GB (12%)
Write Buffers: 4 GB (6%)
Query Cache: 4 GB (6%)
Tenant Metadata: 1 GB (2%)
Remaining: 7 GB (11%)
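The split scales linearly with RAM, so a budget for any node size can be derived from the same percentages. A small planning sketch (not a HeliosDB tool; the percentages mirror the tables above):

/// Derive a cache budget (in MB) from total RAM, using the split above.
fn cache_budget(total_ram_gb: u64) -> Vec<(&'static str, u64)> {
    let mb = total_ram_gb * 1024;
    vec![
        ("os_and_system", mb * 12 / 100),
        ("application", mb * 12 / 100),
        ("block_cache", mb * 38 / 100),
        ("row_cache", mb * 12 / 100),
        ("write_buffers", mb * 6 / 100),
        ("query_cache", mb * 6 / 100),
        ("tenant_metadata", mb * 2 / 100),
    ]
}

fn main() {
    for (name, size_mb) in cache_budget(32) {
        println!("{name}: {size_mb} MB");
    }
    // 32 GB -> block_cache = 12451 MB, i.e. the ~12 GB figure in the table above.
}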
Cache Monitoring
# Get cache statistics
curl http://localhost:9090/metrics | grep cache

# Key metrics:
# - heliosdb_cache_hit_rate (target: >90%)
# - heliosdb_cache_evictions_total (monitor for sizing)
# - heliosdb_cache_memory_usage_bytes (utilization)
Optimization Guidelines (a small helper sketch follows this list):
- Hit rate < 80%: Increase cache size
- Hit rate > 95%: Cache may be oversized
- High evictions: Increase cache or reduce TTL
- Low utilization: Reduce cache size
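These rules are mechanical enough to encode. A sketch that turns raw counter values (scraped from the hits, misses, and evictions metrics above) into a recommendation; the eviction threshold is an assumed example value, not a HeliosDB default:

/// Map cache counters to a tuning recommendation per the guidelines above.
fn recommend(hits: u64, misses: u64, evictions_per_min: u64) -> &'static str {
    let total = hits + misses;
    if total == 0 {
        return "no traffic yet; collect more data";
    }
    let hit_rate = hits as f64 / total as f64;
    if hit_rate < 0.80 {
        "hit rate < 80%: increase cache size"
    } else if evictions_per_min > 10_000 {
        // Assumed threshold for "high evictions"; calibrate to your workload.
        "high evictions: increase cache or reduce TTL"
    } else if hit_rate > 0.95 {
        "hit rate > 95%: cache may be oversized"
    } else {
        "cache sizing looks healthy"
    }
}

fn main() {
    println!("{}", recommend(900_000, 300_000, 50)); // 75% hit rate -> increase cache size
}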
Query Optimization
Index Strategy
1. B-Tree Indexes (Range Queries)
-- Primary key index (automatic)
CREATE TABLE orders (
    order_id BIGINT PRIMARY KEY,
    ...
);

-- Secondary indexes for common filters
CREATE INDEX idx_user_id ON orders(user_id);
CREATE INDEX idx_created_at ON orders(created_at);
CREATE INDEX idx_status ON orders(status);

-- Composite indexes for multi-column filters
CREATE INDEX idx_user_status ON orders(user_id, status);
CREATE INDEX idx_user_created ON orders(user_id, created_at DESC);
2. Vector Indexes (Similarity Search)
-- HNSW index for high-recall searches
CREATE VECTOR INDEX idx_product_embedding ON products
USING hnsw (embedding)
WITH (
    dimension = 768,
    metric = 'cosine',
    m = 16,                 -- Connectivity (higher = better recall, slower build)
    ef_construction = 200,  -- Build-time parameter
    ef_search = 100         -- Query-time parameter
);

-- IVF index for large-scale searches
CREATE VECTOR INDEX idx_large_embedding ON large_table
USING ivf (embedding)
WITH (
    dimension = 768,
    metric = 'l2',
    nlist = 1024,  -- Number of clusters
    nprobe = 32    -- Clusters to search
);
3. Full-Text Indexes
CREATE FULLTEXT INDEX idx_content ON documents(content)
WITH (
    language = 'english',
    stemming = true,
    min_word_length = 3,
    stop_words = true
);
Query Patterns
1. Use EXPLAIN ANALYZE
EXPLAIN ANALYZE
SELECT *
FROM orders
WHERE user_id = 12345
  AND created_at >= '2025-01-01'
  AND status = 'completed';

-- Output shows:
-- - Execution plan
-- - Index usage
-- - Row counts
-- - Actual timing
2. Optimize WHERE Clauses
-- Bad: Function on indexed column
SELECT * FROM orders
WHERE DATE(created_at) = '2025-12-09';

-- Good: Range on indexed column
SELECT * FROM orders
WHERE created_at >= '2025-12-09 00:00:00'
  AND created_at < '2025-12-10 00:00:00';
3. Avoid SELECT *
-- Bad: Retrieves unnecessary data
SELECT * FROM orders WHERE user_id = 12345;

-- Good: Select only needed columns
SELECT order_id, amount, status, created_at
FROM orders
WHERE user_id = 12345;
4. Use LIMIT for Large Results
-- Always use LIMIT for pagination
SELECT order_id, amount, created_at
FROM orders
WHERE user_id = 12345
ORDER BY created_at DESC
LIMIT 100 OFFSET 0;
5. Batch Inserts
-- Bad: Individual inserts
INSERT INTO orders (order_id, user_id, amount) VALUES (1, 100, 99.99);
INSERT INTO orders (order_id, user_id, amount) VALUES (2, 100, 149.99);

-- Good: Batch insert
INSERT INTO orders (order_id, user_id, amount) VALUES
    (1, 100, 99.99),
    (2, 100, 149.99),
    (3, 101, 199.99);  -- Up to 1000 rows per batch
Query Result Caching
[performance]
enable_query_cache = true
query_cache_mb = 2048
query_cache_ttl = 600  # 10 minutes

# Cache key includes:
# - Query SQL (normalized)
# - Query parameters
# - Tenant context
# - User context (for RLS)
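A sketch of how those four components might combine into a key (illustrative only; the actual key derivation is internal to HeliosDB). Including tenant and user context prevents cross-tenant and RLS leakage through the cache:

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Build a query-cache key from the components listed above.
fn query_cache_key(sql: &str, params: &[&str], tenant_id: u64, user_id: u64) -> u64 {
    let mut h = DefaultHasher::new();
    // Normalize whitespace and case so formatting differences do not create
    // distinct entries. (A production normalizer must not lowercase string
    // literals; this is a sketch.)
    let normalized = sql.split_whitespace().collect::<Vec<_>>().join(" ").to_lowercase();
    normalized.hash(&mut h);
    params.hash(&mut h);
    tenant_id.hash(&mut h);
    user_id.hash(&mut h);
    h.finish()
}

fn main() {
    let k1 = query_cache_key("SELECT * FROM orders WHERE user_id = $1", &["12345"], 7, 42);
    let k2 = query_cache_key("select *   from orders where user_id = $1", &["12345"], 7, 42);
    assert_eq!(k1, k2); // same query after normalization -> same cache entry
    let k3 = query_cache_key("SELECT * FROM orders WHERE user_id = $1", &["12345"], 8, 42);
    assert_ne!(k1, k3); // different tenant -> different entry
}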
Benefits:
- Identical query latency: <1ms (cache hit)
- Reduces database load: 40-60%
- Especially effective for dashboards and analytics
Multi-Tenancy Performance
Tenant Isolation Modes
1. Logical Isolation (Cost-Effective)
[multitenancy]
default_isolation_mode = "logical"

# Shared resources with quotas
[multitenancy.quotas]
default_storage_quota_gb = 100
default_qps_limit = 1000
default_max_connections = 50
Performance Characteristics:
- Latency overhead: <5%
- Throughput impact: <10%
- Resource efficiency: 95%+
- Tenant density: 1,000+ per node
2. Physical Isolation (High-Performance)
[multitenancy]
default_isolation_mode = "physical"

# Dedicated resources per tenant
[multitenancy.physical]
dedicated_cpu_cores = 4
dedicated_ram_gb = 16
dedicated_storage_gb = 500
Performance Characteristics:
- Latency overhead: <1%
- Throughput impact: <2%
- Resource efficiency: 60-70%
- Tenant density: 10-50 per node
3. Hybrid Isolation (Balanced)
[multitenancy]
default_isolation_mode = "hybrid"

# Shared storage, dedicated compute
[multitenancy.hybrid]
dedicated_cpu_cores = 2
shared_storage = true
dedicated_ram_gb = 8
Row-Level Security (RLS) Performance
Optimization:
-- Create indexes for RLS predicates
CREATE INDEX idx_tenant_user ON orders(tenant_id, user_id);

-- RLS policy automatically uses index
CREATE POLICY user_own_orders ON orders
FOR SELECT
USING (tenant_id = current_tenant_id() AND user_id = current_user_id());
Performance Impact:
- Policy lookup: <0.5ms (cached)
- Query rewrite: <1ms
- Execution overhead: <5% (with proper indexes)
Quota Enforcement Performance
[multitenancy.quota_enforcement]
# Check frequency
check_interval_ms = 100

# Soft limit warnings
soft_limit_threshold = 0.9  # 90% of quota

# Hard limit enforcement
hard_limit_action = "throttle"  # Options: throttle, reject

# Quota cache TTL
quota_cache_ttl_ms = 1000
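The pattern behind these settings can be sketched as a per-tenant token bucket consulted on every request, with quota values refreshed from tenant metadata on the quota_cache_ttl_ms interval. An illustrative standalone sketch (names and structure are hypothetical, not HeliosDB internals):

use std::time::Instant;

/// Token-bucket QPS limiter for one tenant (hypothetical sketch).
struct QpsQuota {
    limit_per_sec: f64, // e.g. default_qps_limit = 1000
    tokens: f64,
    last_refill: Instant,
}

enum Decision {
    Allow,
    SoftWarn, // past soft_limit_threshold: emit a warning, still serve
    Throttle, // hard_limit_action = "throttle" (or reject)
}

impl QpsQuota {
    fn new(limit_per_sec: f64) -> Self {
        QpsQuota { limit_per_sec, tokens: limit_per_sec, last_refill: Instant::now() }
    }

    /// Cheap per-request check; the quota value itself would come from the
    /// tenant metadata cache and be refreshed every quota_cache_ttl_ms.
    fn check(&mut self) -> Decision {
        let elapsed = self.last_refill.elapsed().as_secs_f64();
        self.last_refill = Instant::now();
        self.tokens = (self.tokens + elapsed * self.limit_per_sec).min(self.limit_per_sec);
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            // Warn once fewer than 10% of the bucket's tokens remain.
            if self.tokens < self.limit_per_sec * 0.1 {
                Decision::SoftWarn
            } else {
                Decision::Allow
            }
        } else {
            Decision::Throttle
        }
    }
}

fn main() {
    let mut q = QpsQuota::new(1000.0);
    for _ in 0..1100 {
        let _ = q.check(); // the last ~100 calls in the same instant get throttled
    }
}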
Performance:
- Quota check latency: <0.1ms (cached)
- Storage quota overhead: <1%
- QPS quota overhead: <2%
Network Optimization
Connection Pooling
Client-Side Configuration:
use heliosdb_client::{ConnectionPool, PoolConfig};
let pool = ConnectionPool::new(PoolConfig {
    // Pool size (2-4x application threads)
    max_size: 20,
    min_idle: 5,

    // Timeouts
    connection_timeout_ms: 5000,
    idle_timeout_ms: 300000,  // 5 minutes
    max_lifetime_ms: 1800000, // 30 minutes

    // Validation
    test_on_checkout: false, // Disable for performance
    test_on_return: false,
}).await?;
Server-Side Configuration:
[server]
max_connections = 1000
connection_timeout_ms = 30000
keepalive_timeout_ms = 60000
HTTP/2 and Compression
[server.http]
# Enable HTTP/2
enable_http2 = true

# Response compression
enable_compression = true
compression_algorithm = "br"  # Brotli (better than gzip)
compression_level = 4
min_compression_size_bytes = 1024

# Request decompression
enable_decompression = true
Performance:
- HTTP/2 multiplexing: 40% fewer connections
- Brotli compression: 20-30% smaller responses
- Network bandwidth reduction: 60-70%
TCP Tuning
# Server-side TCP tuning
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_intvl = 60
net.ipv4.tcp_keepalive_probes = 3

# Enable TCP Fast Open
net.ipv4.tcp_fastopen = 3

# BBR congestion control (Linux 4.9+)
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr
Monitoring & Profiling
Key Performance Metrics
1. Latency Metrics
# Query latency percentiles
heliosdb_query_latency_seconds{quantile="0.5"}    # p50
heliosdb_query_latency_seconds{quantile="0.95"}   # p95
heliosdb_query_latency_seconds{quantile="0.99"}   # p99
heliosdb_query_latency_seconds{quantile="0.999"}  # p999

# Target thresholds:
# p50 < 10ms
# p95 < 50ms
# p99 < 100ms
# p999 < 500ms
2. Throughput Metrics
# Queries per second
rate(heliosdb_query_total[1m])

# Throughput by operation
rate(heliosdb_query_total{operation="read"}[1m])
rate(heliosdb_query_total{operation="write"}[1m])

# Target: >10,000 QPS per node
3. Resource Metrics
# CPU utilization (target: 60-80%)
heliosdb_cpu_usage_percent

# Memory utilization (target: 70-85%)
heliosdb_memory_usage_bytes / heliosdb_memory_total_bytes

# Disk I/O (IOPS)
rate(heliosdb_disk_reads_total[1m])
rate(heliosdb_disk_writes_total[1m])

# Network I/O (bytes/sec)
rate(heliosdb_network_received_bytes[1m])
rate(heliosdb_network_sent_bytes[1m])
4. Cache Metrics
# Cache hit rate (target: >90%)
rate(heliosdb_cache_hits_total[1m]) /
  (rate(heliosdb_cache_hits_total[1m]) + rate(heliosdb_cache_misses_total[1m]))

# Cache evictions (monitor for capacity issues)
rate(heliosdb_cache_evictions_total[1m])
Profiling Tools
1. CPU Profiling
# Enable profiling endpoint (quote the URL so the shell does not expand "?")
curl "http://localhost:9090/debug/pprof/profile?seconds=30" > cpu.prof

# Analyze with pprof
go tool pprof cpu.prof
2. Memory Profiling
# Heap profile
curl http://localhost:9090/debug/pprof/heap > heap.prof

# Analyze
go tool pprof heap.prof
3. Query Profiling
-- Enable query logging
SET log_queries = true;
SET log_min_duration_ms = 100;  -- Log queries >100ms

-- View slow queries
SELECT * FROM heliosdb_slow_queries
ORDER BY duration_ms DESC
LIMIT 10;
Troubleshooting Common Issues
High Latency
Symptoms:
- p99 latency > 200ms
- Slow dashboard loads
- Timeout errors
Diagnosis:
# Check cache hit rate
curl http://localhost:9090/metrics | grep cache_hit_rate
# If <80%, increase cache size

# Check compaction
curl http://localhost:9090/metrics | grep compaction
# If high, adjust compaction threads

# Check disk I/O wait
iostat -x 1
# If >10%, use faster storage

# Check slow queries
SELECT * FROM heliosdb_slow_queries WHERE duration_ms > 100;
Solutions:
- Increase cache size
- Add missing indexes
- Optimize slow queries
- Upgrade to faster storage (NVMe)
- Enable query result caching
Low Throughput
Symptoms:
- QPS < 5,000 per node
- CPU utilization < 40%
- High request queuing
Diagnosis:
# Check connection pool
curl http://localhost:9090/metrics | grep connections
# If maxed out, increase pool size

# Check batch settings
# If disabled, enable batching

# Check parallelism
curl http://localhost:9090/metrics | grep parallel_queries
Solutions:
- Increase worker threads
- Enable batching
- Increase connection pool
- Enable parallel query execution
- Scale horizontally (add nodes)
High Memory Usage
Symptoms:
- Memory usage > 90%
- OOM errors
- Frequent cache evictions
Diagnosis:
# Check memory breakdown
curl http://localhost:9090/metrics | grep memory

# Check cache sizes
# block_cache + row_cache + write_buffers + query_cache

# Check for leaks
pmap -x $(pidof heliosdb) | tail -20
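Using the reference config.toml values from earlier in this guide, that checklist sums to an expected steady-state footprint of roughly:

$$12\ \text{GB (block cache)} + 4\ \text{GB (row cache)} + 4 \times 256\ \text{MB (write buffers)} + 2\ \text{GB (query cache)} = 19\ \text{GB}$$

If resident memory sits far above the expected sum for your configuration, suspect a leak.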
Solutions:
- Reduce cache sizes
- Reduce write buffer size
- Reduce max_write_buffer_number
- Enable compression
- Add more RAM or scale horizontally
Production Best Practices
1. Configuration Template
config.prod.toml (64 GB RAM, 16 CPU node)
[server]
listen_addr = "0.0.0.0:8080"
worker_threads = 32  # 2x CPUs
max_connections = 2000

[storage]
data_dir = "/data/heliosdb"
write_buffer_size_mb = 512
max_write_buffer_number = 6
max_background_compactions = 8
max_background_flushes = 4

[compression]
enabled = true
algorithm = "zstd"
level = 3

[cache]
size_mb = 32768  # 50% of RAM
block_cache_mb = 24576
row_cache_mb = 8192
policy = "lru"

[performance]
enable_batching = true
batch_window_ms = 10
max_batch_size = 1000
enable_query_cache = true
query_cache_mb = 4096
max_parallel_queries = 16
enable_simd = true

[multitenancy]
enabled = true
default_isolation_mode = "logical"

[monitoring]
metrics_enabled = true
metrics_addr = "0.0.0.0:9090"
enable_statistics = true
stats_interval_seconds = 60
2. Deployment Checklist
- Hardware meets minimum requirements
- OS tuning parameters applied
- Filesystem optimized (XFS/ext4, noatime)
- I/O scheduler set (noop/deadline for SSD)
- Network tuning applied (BBR, TCP Fast Open)
- HeliosDB configuration optimized
- Monitoring and alerting configured
- Backups scheduled
- High availability setup (if required)
- Load testing completed
- Runbook prepared
3. Monitoring Alerts
Critical Alerts:
- p99 latency > 500ms (5 minutes)
- Error rate > 1% (5 minutes)
- CPU usage > 90% (10 minutes)
- Memory usage > 95% (5 minutes)
- Disk usage > 90% (immediate)
- Replication lag > 10s (5 minutes)
Warning Alerts:
- p99 latency > 200ms (10 minutes)
- Cache hit rate < 80% (15 minutes)
- Compaction lag > 100 files (15 minutes)
- Connection pool exhausted (5 minutes)
- Disk I/O wait > 20% (10 minutes)
Conclusion
Performance tuning is an iterative process. Start with the quick wins, establish baselines, monitor metrics, and continuously optimize based on workload patterns.
Key Takeaways:
- Cache is critical: 90%+ hit rate is the goal
- Indexes matter: Create indexes for all query patterns
- Batch operations: up to 10x throughput improvement
- Monitor everything: You can’t optimize what you don’t measure
- Test under load: Production workloads differ from synthetic benchmarks
Document Information:
- Version: 7.0
- Last Updated: December 9, 2025
- Status: Production Ready