HeliosDB Resource Limits Reference
A comprehensive guide to understanding, configuring, and managing resource limits in HeliosDB production deployments. This reference covers system-level limits, database-level limits, capacity planning, and optimization strategies for different workloads.
Table of Contents
- Quick Reference Tables
- System-Level Resource Limits
- Database-Level Resource Limits
- Configuration Examples
- Capacity Planning Guide
- Monitoring Resource Usage
- Workload-Specific Tuning
- Best Practices
Quick Reference Tables
System-Level Limits Summary
| Resource | Default | Small Deployment | Medium Deployment | Large Deployment | Impact |
|---|---|---|---|---|---|
| Memory | 8GB | 4GB | 16GB | 64GB+ | Query performance, Cache hit rate |
| Max Connections | 10,000 | 500 | 5,000 | 20,000+ | Concurrent users, Connection overhead |
| Connections per User | 100 | 10 | 50 | 200 | Application concurrency |
| File Descriptors | 10,000 | 2,500 | 25,000 | 65,000+ | Parallel I/O operations |
| CPU Cores | Auto-detect | 2-4 | 8-16 | 32+ | Parallelism, Query throughput |
| Disk Space | 500GB | 100GB | 1TB | 10TB+ | Data retention, Compaction overhead |
Database-Level Limits Summary
| Resource | Default | Max | Configurable | Impact |
|---|---|---|---|---|
| Query Timeout | 30 seconds | 5 minutes | Per-query | Query response time |
| Query Memory Limit | 1GB | Configurable | Yes | Large joins, aggregations |
| Connection Memory | 512MB | Configurable | Yes | Connection overhead |
| Total Query Memory | 8GB | Configurable | Yes | Concurrent query capacity |
| Max Result Rows | 1,000,000 | Configurable | Yes | Memory usage, Client buffering |
| Statement Complexity Limit | 10,000 | Configurable | Yes | Optimizer performance |
| Transaction Size Limit | 2GB | Configurable | Yes | Memory, Log files |
| Index Size Limit | Unlimited | Configurable | Yes | Disk space, Build time |
System-Level Resource Limits
Memory Limits
Buffer Pool Configuration
The buffer pool manages in-memory storage of data blocks and caches query results.
Default Configuration:
- Buffer Pool Size: 50% of total system memory (4GB on 8GB system)
- Page Size: 16KB
- Max Pages: Dynamic allocation
- Eviction Policy: LRU (Least Recently Used)
Recommended Values:
```toml
# Small Deployment (4GB RAM total)
[storage]
buffer_pool_mb = 2048  # 2GB
page_size_kb = 16
eviction_policy = "lru"
```

```toml
# Medium Deployment (16GB RAM total)
[storage]
buffer_pool_mb = 8192  # 8GB
page_size_kb = 16
eviction_policy = "lru"
```

```toml
# Large Deployment (64GB+ RAM total)
[storage]
buffer_pool_mb = 32768  # 32GB
page_size_kb = 16
eviction_policy = "arc"  # Adaptive Replacement Cache
```

Performance Impact:
- Too Small (<2GB): High disk I/O, reduced cache hit rates, slower queries
- Too Large (>80% RAM): OOM errors, system swapping, unstable performance
- Optimal: 60-75% of available RAM, cache hit rate >90%
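The sizing guidance above can be scripted; a minimal sketch (hypothetical helper, not a HeliosDB API) that derives a buffer pool target from total system RAM:

```rust
/// Hypothetical helper: derive a buffer pool target from total system RAM (MB),
/// following the 60-75% guidance above. Not a HeliosDB API.
fn buffer_pool_target_mb(total_ram_mb: u64) -> u64 {
    let target = total_ram_mb * 60 / 100;  // start at 60% of RAM
    let ceiling = total_ram_mb * 75 / 100; // never exceed 75%
    target.max(2048).min(ceiling)          // keep at least ~2GB where the ceiling allows
}

fn main() {
    for ram_mb in [4096_u64, 16_384, 65_536] {
        println!("{} MB RAM -> buffer_pool_mb = {}", ram_mb, buffer_pool_target_mb(ram_mb));
    }
}
```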
Warning Signs:
- Cache hit rate drops below 70%
- Disk I/O utilization >80%
- Page eviction rate increases significantly
- Query latency increases unexpectedly
Working Memory
Memory allocated for query execution (sorting, hashing, temporary tables).
Default Configuration:
- Per Query: 1GB
- Per Connection: 512MB
- Total Limit: 8GB
Recommended Values:
| Workload | Per Query | Per Connection | Recommended Total |
|---|---|---|---|
| OLTP | 256MB | 128MB | 2GB |
| OLAP | 2-4GB | 512MB | 16GB |
| Mixed | 1GB | 256MB | 8GB |
| Edge/IoT | 64MB | 32MB | 512MB |
Configuration:
```rust
// Rust API
use heliosdb_common::MemoryLimits;

let limits = MemoryLimits {
    per_query_mb: 1024,
    per_connection_mb: 512,
    total_mb: 8192,
    pressure_threshold: 0.80,
};
```

Connection Limits
Maximum Connections
Total number of simultaneous database connections.
Default: 10,000 connections
Considerations:
- Each connection consumes ~1-2MB RAM for connection context
- File descriptor (FD) limit must be higher than connection limit
- Connection pooling can reduce actual connection count
Recommended Values:
```toml
# Small Deployment - 500 Users
max_connections = 500
max_connections_per_user = 10
idle_timeout_seconds = 600
```

```toml
# Medium Deployment - 5,000 Users
max_connections = 5000
max_connections_per_user = 50
idle_timeout_seconds = 600
```

```toml
# Large Deployment - 50,000 Users
max_connections = 20000  # Use connection pooling
max_connections_per_user = 200
idle_timeout_seconds = 900
```

Connection Memory Calculation:

```
Total Connection Memory = max_connections × connection_context_size
Typical: 10,000 connections × 1.5MB ≈ 15GB overhead
```
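The same arithmetic as a quick estimator; a sketch using the ~1.5MB context size cited above (hypothetical helper, not a HeliosDB API):

```rust
/// Hypothetical estimator: total connection overhead, per the formula above.
fn connection_overhead_gb(max_connections: u64, context_mb: f64) -> f64 {
    max_connections as f64 * context_mb / 1024.0
}

fn main() {
    // ~1.5MB connection context each, per the typical figure above
    let overhead_gb = connection_overhead_gb(10_000, 1.5);
    let total_ram_gb = 64.0; // assumed host size, for illustration only
    println!(
        "10,000 connections ≈ {:.1}GB ({:.0}% of a {}GB host)",
        overhead_gb,
        overhead_gb / total_ram_gb * 100.0,
        total_ram_gb
    );
}
```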
Warning Signs:
- “Too many connections” errors
- Connection acquisition timeout errors
- Memory usage increases unexpectedly during high-load
- Idle connections not being cleaned up
Per-User Connection Limits
Prevents single user/application from consuming all connections.
Default: 100 connections per user
Recommended:
- Development: 5-10 per user
- Testing: 20-50 per user
- Production: 50-200 per user
File Descriptor Limits
System-Level FD Limits
Every database file, socket, and pipe requires a file descriptor.
Important: OS-level limits must exceed HeliosDB limits.
Check current limits:
```bash
# View soft and hard limits
ulimit -Sn  # Soft limit
ulimit -Hn  # Hard limit

# Recommended permanent configuration
echo "* soft nofile 65536" | sudo tee -a /etc/security/limits.conf
echo "* hard nofile 131072" | sudo tee -a /etc/security/limits.conf
```

Database-Level Configuration:
```toml
# Small Deployment
max_open_files = 2500
file_handle_timeout_seconds = 300
```

```toml
# Medium Deployment
max_open_files = 25000
file_handle_timeout_seconds = 600
```

```toml
# Large Deployment
max_open_files = 65536
file_handle_timeout_seconds = 900
```

FD Consumption Breakdown:
| Category | Count | Notes |
|---|---|---|
| Connections | max_connections | 1-2 FDs per connection |
| SSTable Files | memtable_count * 5 | 5-10 FDs per compaction level |
| WAL (Write-Ahead Log) | 10-20 | Rotating log files |
| Temporary Files | Variable | Query temp files, index builds |
| System FDs | 100-200 | Network, pipes, internal |
Calculation Example:
```
10,000 connections × 1.5 FDs = 15,000
1,000 SSTable files × 1 FD   =  1,000
100 temporary files          =    100
System overhead              =    200
─────────────────────────────────────
Typical requirement: 16,300 FDs
Recommended limit:   25,000+ FDs
```
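A sketch of this FD estimate (hypothetical helper; the 1.5 FDs/connection and 200-FD system overhead come from the breakdown above):

```rust
/// Hypothetical FD estimator following the breakdown table above.
fn estimate_fds(connections: u64, sstables: u64, temp_files: u64) -> u64 {
    let conn_fds = connections * 3 / 2; // ~1.5 FDs per connection
    let system_overhead = 200;          // network, pipes, internal
    conn_fds + sstables + temp_files + system_overhead
}

fn main() {
    let needed = estimate_fds(10_000, 1_000, 100);
    // Size the OS limit with headroom rather than at the estimate itself.
    println!("estimated: {} FDs, recommended limit: {}+", needed, needed * 3 / 2);
}
```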
Warning Signs:
- “Too many open files” errors
- Query execution fails with file access errors
- Compaction operations stall
- Unable to open new database files
Disk Space Considerations
Data Storage Requirements
Includes table data, indexes, WAL, and compaction overhead.
Estimation Formula:
```
Total Disk = (Raw Data Size ÷ Compression Factor)
           + (Indexes × Index Overhead)
           + (WAL × Log Retention)
           + Temp Space (≈25% of data)
```

Compression Factors (HeliosDB defaults):
- Structured data: 5-10x compression
- Numeric data: 10-100x compression (with delta/dictionary encoding)
- Text data: 2-3x compression
- No compression: 1x
Index Overhead:
- B-Tree indexes: 10-20% of table size
- HNSW vector indexes: 5-15% per index
- Multiple indexes on same table: 50-100% total overhead
Example Calculation (Medium Deployment):
```
Raw Data Size:       1TB
Compression Factor:  7x
Compressed Size:     143GB

Primary Table:                143GB
B-Tree Indexes (3 indexes):    42GB (30% of table)
Vector Index (HNSW):           24GB (20% of table)
WAL (30-day retention):        15GB
Temporary Space (25%):         51GB
───────────────────────────────────
Total Disk Requirement:       275GB
Recommended Allocation:       400GB (45% headroom)
```
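For capacity scripts, the estimation formula translates directly; a sketch with hypothetical inputs (the 25% temp-space and 45% headroom figures come from the example above):

```rust
/// Hypothetical disk estimator mirroring the formula above (all sizes in GB).
fn estimate_disk_gb(raw_gb: f64, compression: f64, index_overhead: f64, wal_gb: f64) -> f64 {
    let data = raw_gb / compression;     // compressed table data
    let indexes = data * index_overhead; // e.g. 0.30 + 0.20 for B-Tree + HNSW
    let temp = (data + indexes) * 0.25;  // ~25% temp space
    data + indexes + wal_gb + temp
}

fn main() {
    // 1TB raw, 7x compression, 50% combined index overhead, 15GB WAL
    let total = estimate_disk_gb(1024.0, 7.0, 0.50, 15.0);
    println!("required ≈ {:.0}GB, allocate ≈ {:.0}GB", total, total * 1.45);
}
```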
Recommended Disk Space Headroom:
- Small deployments: 30-50% free space
- Medium deployments: 25-30% free space
- Large deployments: 20-25% free space
- Production critical: Never go below 20% free
Warning Signs:
- Disk usage >80%
- Compaction operations slow down
- Index creation fails
- WAL archival delays
- Query performance degrades
CPU Core Utilization
Parallelism Configuration
HeliosDB automatically scales to available CPU cores.
Default: Auto-detect all available cores
Configuration Options:
```toml
# Small Deployment - 2-4 cores
[compute]
max_parallel_tasks = 4
worker_threads = 4
io_threads = 2
```

```toml
# Medium Deployment - 8-16 cores
[compute]
max_parallel_tasks = 16
worker_threads = 16
io_threads = 4
```

```toml
# Large Deployment - 32+ cores
[compute]
max_parallel_tasks = 64
worker_threads = 32
io_threads = 8
```

Core Allocation Strategy:
| System Cores | Query Workers | I/O Workers | Compaction Workers |
|---|---|---|---|
| 2-4 | 2 | 1 | 1 |
| 8-16 | 8 | 2 | 2 |
| 32-64 | 28 | 4 | 4 |
| 64+ | N-4 | 4 | 4 |
Consideration: Hyper-Threading
- Logical cores = Physical cores × 2 (with HT)
- For database workloads, use logical core count
- Monitor CPU utilization; >90% sustained indicates bottleneck
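Worker pools can be derived from the detected logical core count at startup; a sketch following the allocation table above (the handling of ranges between the table's rows is an assumption):

```rust
use std::thread;

/// Map logical core count to (query, io, compaction) workers per the table above.
/// A sketch: gaps between the table's rows are interpolated conservatively.
fn allocate_workers(cores: usize) -> (usize, usize, usize) {
    match cores {
        0..=4 => (2, 1, 1),
        5..=16 => (8, 2, 2),
        17..=64 => (cores.saturating_sub(8).min(28), 4, 4),
        _ => (cores - 4, 4, 4), // the table's N-4 rule for very large hosts
    }
}

fn main() {
    // Logical (hyper-threaded) core count, per the guidance above
    let cores = thread::available_parallelism().map(|n| n.get()).unwrap_or(4);
    let (query, io, compaction) = allocate_workers(cores);
    println!("{cores} cores -> query={query} io={io} compaction={compaction}");
}
```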
Warning Signs:
- CPU utilization consistently <30% (underutilization)
- CPU utilization consistently >95% (bottleneck)
- Query latency increases with load (insufficient parallelism)
- Context switches too high (oversubscription)
Database-Level Resource Limits
Query Execution Limits
Query Timeout
Maximum time a query can execute before automatic cancellation.
Default: 30 seconds
Configuration:
```rust
// Global configuration
let config = QueryConfig {
    timeout_ms: 30_000,      // 30 seconds default
    max_timeout_ms: 300_000, // 5 minute absolute max
    auto_cancel: true,
};

// Per-query override
let context = QueryContext::new(user)
    .with_timeout(120_000); // 2 minutes for this query
```

```sql
-- Via SQL
SET SESSION query_timeout = 60000; -- 60 seconds
```

Recommended Values:
| Query Type | Timeout | Max Timeout | Notes |
|---|---|---|---|
| Simple SELECT | 5s | 30s | Primary key lookup |
| Analytical JOIN | 30s | 300s | Requires more I/O |
| Complex Aggregation | 60s | 600s | Full scan, heavy computation |
| Batch ETL | 300s | 3600s | Large data movement |
| Streaming | 0 | 0 | Timeout not applicable |
Configuration Impact:
- Too Short: Valid queries abort, user frustration, incomplete results
- Too Long: Resource exhaustion, cascading failures, poor user experience
- Optimal: Balanced for workload, 90th percentile query time + 50% buffer
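One way to apply the percentile-plus-buffer rule from observed latencies; a sketch (hypothetical helper, not a HeliosDB API):

```rust
/// Hypothetical helper: derive a timeout from observed latencies (ms)
/// using the "90th percentile + 50% buffer" rule above.
fn suggest_timeout_ms(mut latencies: Vec<u64>) -> u64 {
    if latencies.is_empty() {
        return 30_000; // fall back to the 30-second default
    }
    latencies.sort_unstable();
    let idx = (latencies.len() * 90 / 100).saturating_sub(1);
    let p90 = latencies[idx];
    p90 + p90 / 2 // +50% buffer
}

fn main() {
    let observed = vec![12, 18, 25, 40, 55, 70, 90, 120, 400, 2_000];
    println!("suggested timeout: {} ms", suggest_timeout_ms(observed));
}
```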
Query Memory Limit
Maximum memory a single query can allocate.
Default: 1GB per query
Configuration:
```toml
[query]
max_memory_mb = 1024  # 1GB default
max_memory_per_connection_mb = 512
```

```sql
-- Override per connection
SET SESSION work_mem = '2GB';
```

Memory Usage Scenarios:
| Scenario | Typical Usage | Recommended Limit |
|---|---|---|
| Simple SELECT | 10-50MB | 256MB |
| Join 2-3 tables | 100-500MB | 1GB |
| Complex JOIN 5+ tables | 500MB-2GB | 4GB |
| Aggregation on 1B rows | 1-4GB | 8GB |
| Window functions | 500MB-2GB | 4GB |
Out-of-Memory Behavior:
```rust
// When a query exceeds its memory limit:
// 1. Attempt disk spillover (temp files)
// 2. If temp space is exhausted, abort the query
// 3. Return a MemoryExceeded error
```

```toml
# Configuration: Enable spillover
[query]
enable_temp_spill = true
max_temp_spill_mb = 10240  # 10GB temp space
```

Warning Signs:
- Queries consistently hit memory limit
- Disk I/O spikes during large queries
- OOM errors in logs
- Query performance degrades with data size growth
Result Set Size Limits
Maximum rows in query result set.
Default: 1,000,000 rows
Configuration:
```toml
[query]
max_result_rows = 1000000
```

```sql
-- SQL override
SET SESSION result_limit = 5000000;
```

Impact on Different Layers:
| Component | Impact | Calculation |
|---|---|---|
| Network Transfer | Bandwidth usage | 1M rows × 100 bytes/row = 100MB |
| Client Buffering | Client memory | 1M rows × 100 bytes = 100MB |
| Server Memory | Result storage | Varies by serialization |
| Wire Protocol | Encoding overhead | ~20% additional |
Recommended Values:
- REST/HTTP API: 10,000-100,000 rows (minimize network traffic)
- Direct connection: 100,000-1,000,000 rows
- Batch exports: 1,000,000+ rows (implement pagination)
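For batch exports beyond the row limit, keyset pagination keeps every page bounded; a sketch using sqlx (as in the pooling example under Best Practices), with hypothetical table and column names:

```rust
use sqlx::{PgPool, Row};

/// Hypothetical keyset-paginated export: fetch `page_size` rows at a time,
/// resuming from the last id seen, so no single result set hits the limit.
async fn export_all(pool: &PgPool, page_size: i64) -> Result<u64, sqlx::Error> {
    let mut last_id: i64 = 0;
    let mut total = 0u64;
    loop {
        let rows = sqlx::query("SELECT id, payload FROM events WHERE id > $1 ORDER BY id LIMIT $2")
            .bind(last_id)
            .bind(page_size)
            .fetch_all(pool)
            .await?;
        if rows.is_empty() {
            return Ok(total);
        }
        last_id = rows.last().unwrap().get("id");
        total += rows.len() as u64;
        // ... write the page to the export sink here ...
    }
}
```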
Statement Complexity Limits
Query Optimization Limits
Prevents optimizer from spending excessive time on complex queries.
Default: 10,000 optimization operations
Configuration:
```toml
[optimizer]
max_optimization_operations = 10000
max_join_order_permutations = 1000
max_plan_cache_entries = 10000
```

Complexity Indicators:
```
Operation Count = JOINs × CTE_count × UNION_count × subqueries
```

Examples:
- Simple SELECT: 1-10 operations
- 5-table JOIN: 100-500 operations
- 10-table JOIN: 1000-5000 operations
- Query with 10+ CTEs: >10,000 operations
Performance Impact:
- Exceeded: Falls back to simple plan, suboptimal execution
- Well-tuned: Query optimized before 500ms, good performance
- Warning: Optimization taking >2 seconds on a query that touches <500MB of data
Transaction Size Limits
Maximum Transaction Size
Total data modification per transaction.
Default: 2GB per transaction
Configuration:
```toml
[transactions]
max_transaction_size_mb = 2048
```

```sql
-- SQL level
BEGIN TRANSACTION;
-- INSERT/UPDATE/DELETE operations
-- Total writes must be < 2GB
COMMIT;
```

Memory Impact Calculation:
```
Transaction Memory = (INSERT rows × avg_row_size)
                   + (UPDATE rows × avg_row_size)
                   + Index overhead
```

Example:

```
Inserting 10M rows × 500 bytes    = 5GB data
Index overhead (3 indexes × 15%)  = 2.25GB
Lock info                         = 1GB
──────────────────────────────────────────
Total: ~8.25GB memory required
```
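The same calculation as a sketch; all names are hypothetical and the 15%-per-index factor comes from the example above:

```rust
/// Hypothetical transaction-memory estimator mirroring the example above.
fn txn_memory_bytes(rows: u64, avg_row_bytes: u64, indexes: u64, lock_bytes: u64) -> u64 {
    let data = rows * avg_row_bytes;                // modified row data
    let index_overhead = data * indexes * 15 / 100; // ~15% per index, per the example
    data + index_overhead + lock_bytes
}

fn main() {
    let gb: u64 = 1_000_000_000; // decimal GB, matching the example
    let total = txn_memory_bytes(10_000_000, 500, 3, gb);
    println!("~{:.2}GB memory required", total as f64 / gb as f64);
}
```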
Recommendations:
| Workload | Max Size | Batch Size | Impact |
|---|---|---|---|
| OLTP | 100MB | 1M rows | Fast commits |
| OLAP | 2GB+ | Larger batches | Higher resource use |
| Streaming | 500MB | Continuous | Balanced |
| ETL | 1GB | Job-specific | Memory + I/O |
WAL (Write-Ahead Log) Impact:
```
WAL Size ≈ Transaction Size × 1.2
30-day retention: 5GB/day transaction rate = 150GB WAL
```

Index Size Limits
Per-Index Limits
HeliosDB supports unlimited index size by default, but consider:
Memory Impact During Index Builds:
```
Index Build Memory = Data Size × 2-3x
Build Time ≈ Data Size / (Cores × 100MB/s)
```

Examples:
| Index Type | Data Size | Build Memory | Build Time (8 cores) |
|---|---|---|---|
| B-Tree | 100GB | 200-300GB | 15-25 minutes |
| HNSW Vector | 10GB (1M vectors) | 30-50GB | 30-60 minutes |
| Inverted Text | 50GB | 100-150GB | 10-20 minutes |
| BRIN (Block Range) | 100GB | 5-10GB | 2-5 minutes |
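A sketch of the build-time formula above; real builds add sort/merge passes and I/O contention (presumably why the table's figures run higher), so treat the result as a lower bound:

```rust
/// Hypothetical lower-bound estimate for index build time, per the formula above.
fn build_time_secs(data_gb: f64, cores: u32, mb_per_sec_per_core: f64) -> f64 {
    let throughput_mb = cores as f64 * mb_per_sec_per_core;
    data_gb * 1024.0 / throughput_mb
}

fn main() {
    // 100GB B-Tree build on 8 cores at ~100MB/s per core
    let secs = build_time_secs(100.0, 8, 100.0);
    println!("lower bound ≈ {:.0}s ({:.1} min)", secs, secs / 60.0);
}
```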
Recommendation: Set advisory limits
```toml
[storage.index_limits]
max_btree_index_size_gb = 500
max_vector_index_size_gb = 100
max_text_index_size_gb = 200

# Triggers a warning but doesn't block
warning_threshold_percent = 80
```

Warning Signs:
- Index build runs out of memory
- Index build takes >1 hour
- Queries not using indexes due to size
- Random-access performance degradation
Configuration Examples
Example 1: Small Deployment (Development/Testing)
```toml
# Small HeliosDB Configuration (4GB RAM, 4 cores)

[storage]
data_dir = "/var/lib/heliosdb/data"
memtable_size_mb = 64
sstable_size_mb = 128
buffer_pool_mb = 2048
compaction_strategy = "size_tiered"

[network]
listen_addr = "0.0.0.0:5432"
max_message_size_mb = 64
max_connections = 500

[compute]
worker_threads = 4
io_threads = 2
max_parallel_tasks = 4

[query]
timeout_ms = 30000
max_memory_mb = 512
max_result_rows = 100000

[resources]
max_connections = 500
max_connections_per_user = 10
max_open_files = 2500
idle_timeout_seconds = 300

[memory]
total_limit_mb = 3000
query_limit_mb = 512
connection_limit_mb = 256
```

Expected Performance:
- Throughput: 100-500 ops/sec
- Latency: 10-100ms p95
- Concurrent users: 10-20
- Max data: 10-50GB
Example 2: Medium Production Deployment
```toml
# Medium HeliosDB Configuration (16GB RAM, 16 cores)

[storage]
data_dir = "/var/lib/heliosdb/data"
memtable_size_mb = 256
sstable_size_mb = 512
buffer_pool_mb = 8192
compaction_strategy = "leveled"
compaction_level_multiplier = 10

[network]
listen_addr = "0.0.0.0:5432"
rdma_enabled = false
max_message_size_mb = 128
max_connections = 5000

[compute]
worker_threads = 16
io_threads = 4
max_parallel_tasks = 16

[query]
timeout_ms = 60000
max_memory_mb = 2048
max_result_rows = 1000000
enable_temp_spill = true
max_temp_spill_mb = 5120

[resources]
max_connections = 5000
max_connections_per_user = 50
max_open_files = 25000
idle_timeout_seconds = 600
connection_memory_mb = 512

[memory]
total_limit_mb = 12000
query_limit_mb = 2048
connection_limit_mb = 512

[autoscale]
enabled = true
min_cu = 10.0
max_cu = 100.0
target_cpu_percent = 70.0
scale_up_threshold = 80.0
scale_down_threshold = 30.0
```

Expected Performance:
- Throughput: 10,000-50,000 ops/sec
- Latency: 5-20ms p95
- Concurrent users: 100-500
- Max data: 500GB-2TB
Example 3: Large OLAP Deployment
```toml
# Large HeliosDB OLAP Configuration (64GB RAM, 32 cores)

[storage]
data_dir = "/var/lib/heliosdb/data"
memtable_size_mb = 1024
sstable_size_mb = 2048
buffer_pool_mb = 32768
compaction_strategy = "leveled"
compaction_level_multiplier = 20
bloom_filter_bits_per_key = 15

[network]
listen_addr = "0.0.0.0:5432"
rdma_enabled = true
max_message_size_mb = 256
max_connections = 20000

[compute]
worker_threads = 32
io_threads = 8
max_parallel_tasks = 64
simd_enabled = true

[query]
timeout_ms = 600000  # 10 minutes for OLAP
max_memory_mb = 8192
max_result_rows = 10000000
enable_temp_spill = true
max_temp_spill_mb = 51200

[resources]
max_connections = 20000
max_connections_per_user = 200
max_open_files = 65536
idle_timeout_seconds = 900
connection_memory_mb = 512

[memory]
total_limit_mb = 48000
query_limit_mb = 8192
connection_limit_mb = 512

[vector]
index_type = "hnsw"
hnsw_m = 32
hnsw_ef_construction = 400

[gpu]
enabled = true
memory_limit = 16106127360  # 15GB
device_id = 0
enable_aggregations = true
enable_joins = true
```

Expected Performance:
- Throughput: 100,000+ ops/sec for analytical queries
- Latency: 50-500ms p95 (OLAP workload)
- Concurrent users: 1000+
- Max data: 10TB+
Example 4: Edge/IoT Constrained Environment
```toml
# Edge/IoT HeliosDB Configuration (512MB RAM, 2 cores)

[storage]
data_dir = "/var/lib/heliosdb/data"
memtable_size_mb = 16
sstable_size_mb = 32
buffer_pool_mb = 256
compaction_strategy = "size_tiered"
gc_grace_seconds = 86400  # 1 day

[network]
listen_addr = "127.0.0.1:5432"
max_connections = 100
max_message_size_mb = 16

[compute]
worker_threads = 2
io_threads = 1
max_parallel_tasks = 2

[query]
timeout_ms = 5000  # 5 seconds
max_memory_mb = 64
max_result_rows = 10000
enable_temp_spill = false

[resources]
max_connections = 100
max_connections_per_user = 5
max_open_files = 1024
idle_timeout_seconds = 60
connection_memory_mb = 32

[memory]
total_limit_mb = 400
query_limit_mb = 64
connection_limit_mb = 32
```

Expected Performance:
- Throughput: 10-100 ops/sec
- Latency: 50-500ms p95
- Concurrent users: 1-5
- Max data: 1-10GB
Capacity Planning Guide
Estimating Total Resource Needs
Formula:
```
Required_Resources = Baseline + Workload + Safety_Margin
```

Step 1: Baseline Overhead

```
CPU:  1-2 cores (OS, background tasks)
RAM:  2-4GB (OS, system services)
Disk: 10% of total data capacity
```

Step 2: Workload Calculation
For concurrent queries:
```
RAM needed = (Concurrent_Queries × Avg_Query_Memory)
           + (Connections × Connection_Overhead)
           + Buffer_Pool_Size
           + Baseline
```

For disk requirements:

```
Disk needed = (Raw_Data ÷ Compression_Factor)
            + (Indexes × Index_Overhead)
            + (WAL × Retention_Days / 365)
            + Baseline
```

Step 3: Safety Margin
Recommended headroom:
- Small deployments: +50% (budget for growth, peaks)
- Medium deployments: +30% (account for spikes)
- Large deployments: +20% (economies of scale)
- Production critical: +40% minimum
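Putting the three steps together, here is a sketch of the RAM side (hypothetical helper, not a HeliosDB API; the disk side mirrors the earlier disk estimator):

```rust
/// Hypothetical RAM estimator combining steps 1-3 above (values in GB).
fn ram_needed_gb(
    concurrent_queries: u64,
    avg_query_gb: f64,
    connections: u64,
    conn_overhead_gb: f64,
    buffer_pool_gb: f64,
    safety_margin: f64, // e.g. 0.30 for a medium deployment
) -> f64 {
    let baseline = 3.0; // OS + system services (2-4GB)
    let workload = concurrent_queries as f64 * avg_query_gb
        + connections as f64 * conn_overhead_gb
        + buffer_pool_gb;
    (baseline + workload) * (1.0 + safety_margin)
}

fn main() {
    // 200 concurrent queries × 0.5GB, 500 pooled connections × 2MB, 16GB buffer pool
    let gb = ram_needed_gb(200, 0.5, 500, 0.002, 16.0, 0.30);
    println!("RAM needed ≈ {:.0}GB", gb);
}
```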
Capacity Planning Example
Scenario: 500GB data, 1000 concurrent users, mixed OLTP/OLAP
Step 1: Baseline
- CPU: 2 cores
- RAM: 3GB
- Disk: 50GB
Step 2: Workload
Data storage:
```
500GB ÷ 7x compression       = 71GB compressed
Indexes (4 per table × 15%)  = 20GB
WAL 30-day retention         = 5GB
Temp space (25% of data)     = 18GB
Baseline disk (Step 1)       = 50GB
─────────────────────────────────
Total disk: 164GB
```

Concurrent query memory:

```
1000 users × 50MB avg connection          = 50GB
Concurrent queries (20%): 200 × 500MB avg = 100GB
Buffer pool                               = 16GB
─────────────────────────────────────────────
Total RAM: ~166GB (impractical)
```

Solution: Connection pooling

```
1000 users share a connection pool
500 actual DB connections × 2MB  = 1GB overhead
200 concurrent queries × 500MB   = 100GB
Buffer pool                      = 16GB
─────────────────────────────────────
Total RAM: 117GB (still high)
```

Alternative: Query optimization

```
Reduce concurrent OLAP to 10 max (queue the excess)
10 OLAP queries × 2GB (larger)   = 20GB
500 users × 20MB (smaller)       = 10GB
Buffer pool                      = 16GB
─────────────────────────────────────
Total RAM: 46GB
```

Final Capacity Recommendation:

```
RAM:     64GB (headroom for future growth)
CPU:     32 cores (supports parallelism)
Disk:    250GB (networked RAID for redundancy)
Network: 10Gbps (supports concurrent users)
```

Growth Projections
Plan for growth using this formula:
```
Year_N = Current_Size × (1 + Annual_Growth_Rate)^N
```

Example: 500GB current, 30% annual growth
| Year | Data Size | Disk Allocation | RAM Needed | CPU Cores |
|---|---|---|---|---|
| 0 | 500GB | 250GB | 32GB | 16 |
| 1 | 650GB | 325GB | 40GB | 20 |
| 2 | 845GB | 425GB | 52GB | 24 |
| 3 | 1.1TB | 550GB | 68GB | 32 |
| 4 | 1.4TB | 700GB | 88GB | 40 |
| 5 | 1.8TB | 900GB | 114GB | 48 |
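The projection table can be regenerated from the compound-growth formula above; a sketch, where the ~0.5× disk-to-data ratio is read off the table and is an assumption:

```rust
/// Project data size and disk allocation per the compound-growth formula above.
fn main() {
    let current_gb = 500.0;
    let growth = 0.30; // 30% annual growth
    for year in 0..=5 {
        let data = current_gb * (1.0_f64 + growth).powi(year);
        // Disk allocation ≈ 0.5× raw data, mirroring the table's ratio
        println!("Year {}: data ≈ {:.0}GB, disk ≈ {:.0}GB", year, data, data * 0.5);
    }
}
```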
Recommendation: Plan major upgrades every 18-24 months to stay ahead of curve.
Monitoring Resource Usage
Key Metrics to Track
Memory Metrics
```sql
-- Query memory usage
SELECT
    query_id,
    user_id,
    memory_used_mb,
    memory_limit_mb,
    (memory_used_mb::float / memory_limit_mb) AS usage_percent,
    CASE
        WHEN memory_used_mb::float / memory_limit_mb > 0.90 THEN 'CRITICAL'
        WHEN memory_used_mb::float / memory_limit_mb > 0.75 THEN 'HIGH'
        WHEN memory_used_mb::float / memory_limit_mb > 0.50 THEN 'MEDIUM'
        ELSE 'LOW'
    END AS pressure_level
FROM system.active_queries
ORDER BY memory_used_mb DESC;

-- System memory pressure
SELECT
    buffer_pool_usage_percent,
    temp_spill_mb,
    eviction_rate,
    cache_hit_rate,
    gc_pause_ms
FROM system.memory_stats;
```

Connection Metrics
```sql
-- Connection usage
SELECT
    COUNT(*) AS total_connections,
    COUNT(DISTINCT user_id) AS unique_users,
    MAX(EXTRACT(EPOCH FROM (NOW() - last_activity))) AS max_idle_seconds,
    COUNT(*) FILTER (WHERE state = 'idle') AS idle_connections,
    COUNT(*) FILTER (WHERE state = 'active') AS active_connections
FROM system.connections;

-- Per-user connection limits
SELECT
    user_id,
    COUNT(*) AS connection_count,
    max_connections_per_user,
    CASE
        WHEN COUNT(*) >= max_connections_per_user THEN 'LIMIT_EXCEEDED'
        WHEN COUNT(*) > max_connections_per_user * 0.9 THEN 'WARNING'
        ELSE 'OK'
    END AS status
FROM system.connections
GROUP BY user_id, max_connections_per_user
HAVING COUNT(*) > max_connections_per_user * 0.7
ORDER BY connection_count DESC;
```

Disk I/O Metrics
```sql
-- File descriptor usage
SELECT
    open_files,
    max_open_files,
    (open_files::float / max_open_files) * 100 AS usage_percent,
    CASE
        WHEN open_files::float / max_open_files > 0.90 THEN 'CRITICAL'
        WHEN open_files::float / max_open_files > 0.75 THEN 'HIGH'
        ELSE 'OK'
    END AS status
FROM system.resource_limits;

-- I/O statistics
SELECT
    reads_per_second,
    writes_per_second,
    read_latency_ms,
    write_latency_ms,
    cache_miss_rate
FROM system.io_stats;
```

CPU Metrics
```sql
-- Query execution distribution
SELECT
    query_type,
    COUNT(*) AS query_count,
    AVG(execution_time_ms) AS avg_time_ms,
    MAX(execution_time_ms) AS max_time_ms,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY execution_time_ms) AS p95_time_ms
FROM system.query_log
WHERE timestamp > NOW() - INTERVAL '1 hour'
GROUP BY query_type
ORDER BY query_count DESC;

-- CPU utilization
SELECT
    worker_id,
    cpu_percent,
    active_queries,
    queued_queries,
    context_switches_per_sec
FROM system.worker_stats;
```

Alerting Thresholds
| Metric | Warning | Critical | Action |
|---|---|---|---|
| Memory Usage | >80% | >95% | Kill idle queries, reduce cache |
| Connections | >80% limit | At limit | Close idle connections |
| File Descriptors | >80% limit | >95% limit | Reduce open files |
| Query Memory | >1GB | >Limit | Spill to disk or reject |
| CPU Usage | >85% sustained | >98% | Throttle clients, add resources |
| Disk Free | <20% | <10% | Archive data, add storage |
| Compaction Lag | >1 hour | >6 hours | Increase compaction threads |
Monitoring Setup Example
```rust
// HeliosDB monitoring example
use heliosdb_common::ResourceLimitsManager;
use std::time::Duration;

async fn monitor_resources(manager: &ResourceLimitsManager) {
    let mut interval = tokio::time::interval(Duration::from_secs(60));

    loop {
        interval.tick().await;

        let stats = manager.get_statistics();
        let health = manager.health_check().await;

        // Log metrics
        info!("Memory: {}%", stats.memory_usage_percent);
        info!("Connections: {}/{}", stats.active_connections, stats.max_connections);
        info!("Health: {}", health.summary());

        // Alert on thresholds
        if stats.memory_usage_percent > 90.0 {
            warn!("Memory pressure critical!");
        }

        if stats.active_connections as f64 / stats.max_connections as f64 > 0.9 {
            warn!("Connection limit near threshold");
        }

        if !health.is_healthy {
            error!("System health degraded: {}", health.summary());
        }
    }
}
```

Workload-Specific Tuning
OLTP Optimization
Characteristics:
- Small, frequent queries
- Low latency requirement (<100ms p95)
- High throughput (1000s ops/sec)
- Multi-user contention
Configuration:
```toml
[query]
timeout_ms = 10000   # 10 second max
max_memory_mb = 256  # Small queries
max_result_rows = 10000

[resources]
max_connections = 5000  # Many concurrent users
max_connections_per_user = 50

[compute]
worker_threads = <num_cores>  # Use all cores
io_threads = <num_cores / 4>

[cache]
buffer_pool_ratio = 0.60  # 60% of RAM for cache
lru_eviction_enabled = true
```

Query Optimization:
```sql
-- Create covering indexes for common queries
CREATE INDEX idx_orders_user_created
ON orders(user_id, created_at)
INCLUDE (order_total);

-- Use prepared statements
PREPARE get_user_orders (INT) AS
SELECT * FROM orders WHERE user_id = $1;

-- Enable query caching for read-heavy workloads
SET SESSION query_cache_enabled = true;
```

Monitoring:

```sql
-- Monitor slow queries
SELECT
    query,
    COUNT(*) AS executions,
    AVG(execution_time_ms) AS avg_ms,
    MAX(execution_time_ms) AS max_ms
FROM system.query_log
WHERE execution_time_ms > 100
GROUP BY query
ORDER BY max_ms DESC
LIMIT 20;
```

OLAP Optimization
Characteristics:
- Large, complex queries
- Higher latency acceptable (seconds to minutes)
- Moderate throughput requirement
- Batch processing
Configuration:
```toml
[query]
timeout_ms = 600000         # 10 minutes for large scans
max_memory_mb = 8192        # Large aggregate allocations
max_result_rows = 10000000  # Bulk exports

[resources]
max_connections = 100  # Fewer concurrent users
max_connections_per_user = 5

[compute]
worker_threads = <num_cores>
io_threads = <num_cores / 2>

[cache]
buffer_pool_ratio = 0.70  # More RAM for scanning
columnar_cache_enabled = true

[vector]
enable_vectorization = true
hnsw_ef_construction = 400  # Better search quality
```

Query Optimization:
```sql
-- Partitioning for faster scans
CREATE TABLE events (
    id INT,
    timestamp TIMESTAMP,
    data TEXT
) PARTITION BY DATE (timestamp);

-- Materialized views for common aggregations
CREATE MATERIALIZED VIEW daily_stats AS
SELECT
    DATE(timestamp) AS date,
    event_type,
    COUNT(*) AS count,
    AVG(value) AS avg_value
FROM events
GROUP BY DATE(timestamp), event_type;

-- Use column-oriented storage for wide tables
CREATE TABLE analytics_data (
    id INT,
    timestamp TIMESTAMP,
    event_type TEXT,
    user_id INT
    -- 50+ columns
) WITH (storage_format = 'columnar');
```

Monitoring:

```sql
-- Monitor long-running queries
SELECT
    query_id,
    query,
    EXTRACT(EPOCH FROM (NOW() - start_time)) AS elapsed_seconds,
    memory_used_mb,
    rows_scanned,
    rows_returned
FROM system.active_queries
WHERE EXTRACT(EPOCH FROM (NOW() - start_time)) > 30
ORDER BY elapsed_seconds DESC;
```

Mixed Workload Balancing
For systems handling both OLTP and OLAP:
Resource Isolation:
```toml
[workload_isolation]
# Separate thread pools
oltp_threads = <num_cores / 2>
olap_threads = <num_cores / 2>

# Separate memory pools
oltp_memory_mb = 8192
olap_memory_mb = 32768

# Query routing
oltp_max_query_time_ms = 5000
olap_max_query_time_ms = 600000
```

Query Classification:
```sql
-- Automatic routing based on characteristics
SELECT
    CASE
        WHEN execution_time_ms < 100 THEN 'OLTP'
        WHEN execution_time_ms < 5000 THEN 'OLTP_BOUNDARY'
        WHEN execution_time_ms < 60000 THEN 'MIXED'
        ELSE 'OLAP'
    END AS workload_type,
    COUNT(*) AS frequency
FROM system.query_log
GROUP BY workload_type;
```

Concurrency Control:
```rust
// Implement query queuing for balance
if is_olap_query(&query) && olap_queue.len() > 10 {
    // Throttle aggressive OLAP clients
    apply_backpressure();
} else if is_oltp_query(&query) {
    // Prioritize OLTP
    prioritize_execution();
}
```

Best Practices
1. Right-Sizing for Your Workload
- Measure before deploying: Run load tests with representative data
- Use historical data: Analyze past usage patterns
- Plan for growth: Add 25-50% capacity headroom
- Regular reviews: Adjust configuration quarterly based on actual usage
2. Connection Management
```rust
// Use connection pooling
use sqlx::postgres::PgPoolOptions;
use std::time::Duration;

let pool = PgPoolOptions::new()
    .max_connections(100)
    .min_connections(10)
    .acquire_timeout(Duration::from_secs(5))
    .connect(&database_url)
    .await?;

// Reuse connections
let connection = pool.acquire().await?;
```

3. Query Optimization
- Use EXPLAIN ANALYZE to understand query plans
- Create appropriate indexes based on query patterns
- Denormalize when necessary for read-heavy workloads
- Monitor query performance continuously
4. Memory Management
- Set realistic memory limits based on available resources
- Enable spill to disk for temporary large datasets
- Monitor cache hit rates (target >85%)
- Regularly clear temporary files
5. Disk Space Management
```bash
# Monitor disk usage
df -h /var/lib/heliosdb

# Archive old data
heliosdb-cli archive --table events --before 2024-01-01

# Clean up temporary files
rm -rf /var/lib/heliosdb/temp/*
```

6. Production Hardening
- Set OS-level limits for file descriptors
- Enable resource accounting for audit trails
- Implement monitoring with alerting
- Test failover and recovery procedures
- Keep headroom above all limits
7. Performance Baseline
Establish baseline metrics for comparison:
```sql
-- Create baseline metrics
CREATE TABLE baseline_metrics AS
SELECT
    'p95_query_latency' AS metric,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY execution_time_ms) AS value,
    NOW() AS recorded_at
FROM system.query_log
WHERE timestamp > NOW() - INTERVAL '7 days';
```

8. Graceful Degradation
Design for when limits are approached:
```rust
// Implement pressure-aware logic
if memory_manager.get_pressure_level() == PressureLevel::High {
    // Reduce cache sizes
    cache.evict_cold_entries();

    // Increase query timeouts
    adjust_timeouts_for_load();

    // Apply backpressure to clients
    apply_rate_limiting();
}
```

Troubleshooting Common Issues
Issue: “Too Many Connections”
Root Cause: Connection limit exceeded
Solution:
- Check current connection count: `SELECT COUNT(*) FROM system.connections;`
- Close idle connections: `ALTER SYSTEM SET idle_timeout = 300;`
- Implement connection pooling on the application side
- Increase the limit if justified: `ALTER SYSTEM SET max_connections = 10000;`
Issue: Out of Memory Errors
Root Cause: Query memory or system memory exhausted
Solution:
- Check query size: monitor `system.active_queries` for memory usage
- Enable temp spill: `ALTER SYSTEM SET enable_temp_spill = true;`
- Reduce the buffer pool if needed
- Add more RAM or upgrade instance
Issue: Slow Query Performance
Root Cause: Insufficient resources or poor index design
Solution:
- Check CPU utilization: sustained >70% indicates a CPU bottleneck
- Review buffer pool hit rate: Target >90%
- Analyze slow queries: `EXPLAIN ANALYZE SELECT ...`
- Create missing indexes
- Consider query rewrite
Issue: File Descriptor Exhaustion
Root Cause: Too many open files
Solution:
- Check FD usage: `lsof | wc -l`
- Increase the OS limit: `ulimit -n 65536`
- Close unused connections
- Reduce concurrent compactions
For more information:
- Configuration: `/home/app/HeliosDB/heliosdb-common/src/config.rs`
- Resource Management: `/home/app/HeliosDB/docs/quick-starts/performance/RESOURCE_LIMITS_QUICK_REFERENCE.md`
- Resource Leak Prevention: `/home/app/HeliosDB/docs/quick-starts/performance/RESOURCE_LEAK_PREVENTION_QUICK_REFERENCE.md`
Last Updated: 2025-12-30
Document Type: Performance Quick Start Reference
Audience: DevOps, Database Administrators, Application Architects