HeliosDB Resource Limits Reference

A comprehensive guide to understanding, configuring, and managing resource limits in HeliosDB production deployments. This reference covers system-level limits, database-level limits, capacity planning, and optimization strategies for different workloads.

Table of Contents

  1. Quick Reference Tables
  2. System-Level Resource Limits
  3. Database-Level Resource Limits
  4. Configuration Examples
  5. Capacity Planning Guide
  6. Monitoring Resource Usage
  7. Workload-Specific Tuning
  8. Best Practices
  9. Troubleshooting Common Issues

Quick Reference Tables

System-Level Limits Summary

| Resource | Default | Small Deployment | Medium Deployment | Large Deployment | Impact |
|---|---|---|---|---|---|
| Memory | 8GB | 4GB | 16GB | 64GB+ | Query performance, cache hit rate |
| Max Connections | 10,000 | 500 | 5,000 | 20,000+ | Concurrent users, connection overhead |
| Connections per User | 100 | 10 | 50 | 200 | Application concurrency |
| File Descriptors | 10,000 | 2,500 | 25,000 | 65,000+ | Parallel I/O operations |
| CPU Cores | Auto-detect | 2-4 | 8-16 | 32+ | Parallelism, query throughput |
| Disk Space | 500GB | 100GB | 1TB | 10TB+ | Data retention, compaction overhead |

Database-Level Limits Summary

| Resource | Default | Max | Configurable | Impact |
|---|---|---|---|---|
| Query Timeout | 30 seconds | 5 minutes | Per-query | Query response time |
| Query Memory Limit | 1GB | Configurable | Yes | Large joins, aggregations |
| Connection Memory | 512MB | Configurable | Yes | Connection overhead |
| Total Query Memory | 8GB | Configurable | Yes | Concurrent query capacity |
| Max Result Rows | 1,000,000 | Configurable | Yes | Memory usage, client buffering |
| Statement Complexity Limit | 10,000 | Configurable | Yes | Optimizer performance |
| Transaction Size Limit | 2GB | Configurable | Yes | Memory, log files |
| Index Size Limit | Unlimited | Configurable | Yes | Disk space, build time |

System-Level Resource Limits

Memory Limits

Buffer Pool Configuration

The buffer pool manages in-memory storage of data blocks and caches query results.

Default Configuration:

  • Buffer Pool Size: 50% of total system memory (4GB on 8GB system)
  • Page Size: 16KB
  • Max Pages: Dynamic allocation
  • Eviction Policy: LRU (Least Recently Used)

Recommended Values:

```toml
# Small Deployment (4GB RAM total)
[storage]
buffer_pool_mb = 2048 # 2GB
page_size_kb = 16
eviction_policy = "lru"

# Medium Deployment (16GB RAM total)
[storage]
buffer_pool_mb = 8192 # 8GB
page_size_kb = 16
eviction_policy = "lru"

# Large Deployment (64GB+ RAM total)
[storage]
buffer_pool_mb = 32768 # 32GB
page_size_kb = 16
eviction_policy = "arc" # Adaptive Replacement Cache
```

Performance Impact:

  • Too Small (<2GB): High disk I/O, reduced cache hit rates, slower queries
  • Too Large (>80% RAM): OOM errors, system swapping, unstable performance
  • Optimal: 60-75% of available RAM, cache hit rate >90%

Warning Signs:

  • Cache hit rate drops below 70%
  • Disk I/O utilization >80%
  • Page eviction rate increases significantly
  • Query latency increases unexpectedly

Working Memory

Memory allocated for query execution (sorting, hashing, temporary tables).

Default Configuration:

  • Per Query: 1GB
  • Per Connection: 512MB
  • Total Limit: 8GB

Recommended Values:

| Workload | Per Query | Per Connection | Recommended Total |
|---|---|---|---|
| OLTP | 256MB | 128MB | 2GB |
| OLAP | 2-4GB | 512MB | 16GB |
| Mixed | 1GB | 256MB | 8GB |
| Edge/IoT | 64MB | 32MB | 512MB |

Configuration:

```rust
// Rust API
use heliosdb_common::MemoryLimits;

let limits = MemoryLimits {
    per_query_mb: 1024,
    per_connection_mb: 512,
    total_mb: 8192,
    pressure_threshold: 0.80,
};
```

Connection Limits

Maximum Connections

Total number of simultaneous database connections.

Default: 10,000 connections

Considerations:

  • Each connection consumes ~1-2MB RAM for connection context
  • File descriptor (FD) limit must be higher than connection limit
  • Connection pooling can reduce actual connection count

Recommended Values:

```toml
# Small Deployment - 500 Users
max_connections = 500
max_connections_per_user = 10
idle_timeout_seconds = 600

# Medium Deployment - 5,000 Users
max_connections = 5000
max_connections_per_user = 50
idle_timeout_seconds = 600

# Large Deployment - 50,000 Users
max_connections = 20000 # Use connection pooling
max_connections_per_user = 200
idle_timeout_seconds = 900
```

Connection Memory Calculation:

Total Connection Memory = max_connections * connection_context_size
Typical: 10,000 connections * 1.5MB = ~15GB overhead
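As a quick sanity check, the calculation above can be scripted. This is a minimal sketch: the function name and the 1.5MB context size are illustrative, not part of the HeliosDB API.

```rust
/// Rough total connection memory overhead in MB.
/// `context_mb` is the estimated per-connection context size (~1-2MB).
fn connection_memory_mb(max_connections: u32, context_mb: f64) -> f64 {
    max_connections as f64 * context_mb
}

fn main() {
    // 10,000 connections at ~1.5MB each -> ~15GB of overhead.
    let total_mb = connection_memory_mb(10_000, 1.5);
    println!("connection overhead: {} MB (~{:.1} GB)", total_mb, total_mb / 1024.0);
}
```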

Warning Signs:

  • “Too many connections” errors
  • Connection acquisition timeout errors
  • Memory usage increases unexpectedly during high-load
  • Idle connections not being cleaned up

Per-User Connection Limits

Prevents single user/application from consuming all connections.

Default: 100 connections per user

Recommended:

  • Development: 5-10 per user
  • Testing: 20-50 per user
  • Production: 50-200 per user

File Descriptor Limits

System-Level FD Limits

Every database file, socket, and pipe requires a file descriptor.

Important: OS-level limits must exceed HeliosDB limits.

Check current limits:

```sh
# View soft and hard limits
ulimit -Sn # Soft limit
ulimit -Hn # Hard limit

# Recommended permanent configuration
echo "* soft nofile 65536" | sudo tee -a /etc/security/limits.conf
echo "* hard nofile 131072" | sudo tee -a /etc/security/limits.conf
```

Database-Level Configuration:

```toml
# Small Deployment
max_open_files = 2500
file_handle_timeout_seconds = 300

# Medium Deployment
max_open_files = 25000
file_handle_timeout_seconds = 600

# Large Deployment
max_open_files = 65536
file_handle_timeout_seconds = 900
```

FD Consumption Breakdown:

| Category | Count | Notes |
|---|---|---|
| Connections | max_connections | 1-2 FDs per connection |
| SSTable Files | memtable_count × 5 | 5-10 FDs per compaction level |
| WAL (Write-Ahead Log) | 10-20 | Rotating log files |
| Temporary Files | Variable | Query temp files, index builds |
| System FDs | 100-200 | Network, pipes, internal |

Calculation Example:

10,000 connections × 1.5 FDs = 15,000
1,000 SSTable files × 1 FD = 1,000
100 temporary files = 100
System overhead = 200
─────────────────────────────
Typical requirement: 16,300 FDs
Recommended limit: 25,000+ FDs
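The worked example translates directly into a small estimator. This is a sketch; the 1.5 FDs-per-connection factor is the rule of thumb from the table above, and the function name is illustrative.

```rust
/// Rough file-descriptor requirement, mirroring the worked example above.
fn fd_requirement(connections: u32, sstable_files: u32, temp_files: u32, system_fds: u32) -> u32 {
    // ~1.5 FDs per connection (sockets plus auxiliary handles).
    let connection_fds = (connections as f64 * 1.5) as u32;
    connection_fds + sstable_files + temp_files + system_fds
}

fn main() {
    // 10,000 connections, 1,000 SSTables, 100 temp files, 200 system FDs.
    println!("typical requirement: {} FDs", fd_requirement(10_000, 1_000, 100, 200));
}
```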

Warning Signs:

  • “Too many open files” errors
  • Query execution fails with file access errors
  • Compaction operations stall
  • Unable to open new database files

Disk Space Considerations

Data Storage Requirements

Includes table data, indexes, WAL, and compaction overhead.

Estimation Formula:

Total Disk = (Raw Data Size ÷ Compression Factor)
           + (Indexes × Index Overhead)
           + (WAL × Log Retention)
           + (Temp Space × 25%)

Compression Factors (HeliosDB defaults):

  • Structured data: 5-10x compression
  • Numeric data: 10-100x compression (with delta/dictionary encoding)
  • Text data: 2-3x compression
  • No compression: 1x

Index Overhead:

  • B-Tree indexes: 10-20% of table size
  • HNSW vector indexes: 5-15% per index
  • Multiple indexes on same table: 50-100% total overhead

Example Calculation (Medium Deployment):

Raw Data Size: 1TB
Compression Factor: 7x
Compressed Size: 143GB
Primary Table: 143GB
B-Tree Indexes (3 indexes): 42GB (30% of table)
Vector Index (HNSW): 24GB (20% of table)
WAL (30-day retention): 15GB
Temporary Space (25%): 51GB
─────────────────────────
Total Disk Requirement: 275GB
Recommended Allocation: 400GB (with 45% headroom)
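The estimation formula can also be expressed in code. This is a sketch under the same assumptions as the example (50% combined index overhead, temp space at 25% of table plus indexes); the function name is illustrative.

```rust
/// Disk estimate: compressed table, index overhead, WAL, and 25% temp space.
fn disk_requirement_gb(raw_gb: f64, compression: f64, index_overhead: f64, wal_gb: f64) -> f64 {
    let table_gb = raw_gb / compression;           // compressed primary table
    let index_gb = table_gb * index_overhead;      // all indexes combined
    let temp_gb = (table_gb + index_gb) * 0.25;    // temporary space headroom
    table_gb + index_gb + wal_gb + temp_gb
}

fn main() {
    // 1TB raw, 7x compression, 50% combined index overhead, 15GB WAL
    // lands near the ~275GB figure in the example above.
    let total = disk_requirement_gb(1000.0, 7.0, 0.50, 15.0);
    println!("total disk requirement: ~{:.0} GB", total);
}
```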

Recommended Disk Space Headroom:

  • Small deployments: 30-50% free space
  • Medium deployments: 25-30% free space
  • Large deployments: 20-25% free space
  • Production critical: Never go below 20% free

Warning Signs:

  • Disk usage >80%
  • Compaction operations slow down
  • Index creation fails
  • WAL archival delays
  • Query performance degrades

CPU Core Utilization

Parallelism Configuration

HeliosDB automatically scales to available CPU cores.

Default: Auto-detect all available cores

Configuration Options:

```toml
# Small Deployment - 2-4 cores
[compute]
max_parallel_tasks = 4
worker_threads = 4
io_threads = 2

# Medium Deployment - 8-16 cores
[compute]
max_parallel_tasks = 16
worker_threads = 16
io_threads = 4

# Large Deployment - 32+ cores
[compute]
max_parallel_tasks = 64
worker_threads = 32
io_threads = 8
```

Core Allocation Strategy:

| System Cores | Query Workers | I/O Workers | Compaction Workers |
|---|---|---|---|
| 2-4 | 2 | 1 | 1 |
| 8-16 | 8 | 2 | 2 |
| 32-64 | 28 | 4 | 4 |
| 64+ | N-4 | 4 | 4 |

Consideration: Hyper-Threading

  • Logical cores = Physical cores × 2 (with HT)
  • For database workloads, use logical core count
  • Monitor CPU utilization; >90% sustained indicates bottleneck

Warning Signs:

  • CPU utilization consistently <30% (underutilization)
  • CPU utilization consistently >95% (bottleneck)
  • Query latency increases with load (insufficient parallelism)
  • Context switches too high (oversubscription)

Database-Level Resource Limits

Query Execution Limits

Query Timeout

Maximum time a query can execute before automatic cancellation.

Default: 30 seconds

Configuration:

```rust
// Global configuration
let config = QueryConfig {
    timeout_ms: 30_000,      // 30 seconds default
    max_timeout_ms: 300_000, // 5 minute absolute max
    auto_cancel: true,
};

// Per-query override
let context = QueryContext::new(user)
    .with_timeout(120_000); // 2 minutes for this query
```

```sql
-- Via SQL
SET SESSION query_timeout = 60000; -- 60 seconds
```

Recommended Values:

| Query Type | Timeout | Max Timeout | Notes |
|---|---|---|---|
| Simple SELECT | 5s | 30s | Primary key lookup |
| Analytical JOIN | 30s | 300s | Requires more I/O |
| Complex Aggregation | 60s | 600s | Full scan, heavy computation |
| Batch ETL | 300s | 3600s | Large data movement |
| Streaming | 0 | 0 | Timeout not applicable |

Configuration Impact:

  • Too Short: Valid queries abort, user frustration, incomplete results
  • Too Long: Resource exhaustion, cascading failures, poor user experience
  • Optimal: Balanced for workload, 90th percentile query time + 50% buffer
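The "90th percentile plus 50% buffer" guideline can be sketched as below. The nearest-rank percentile calculation and function name are illustrative, not a production implementation.

```rust
/// Recommended timeout: 90th-percentile observed latency plus a 50% buffer.
/// Uses a simple nearest-rank percentile over a sample of latencies.
fn recommended_timeout_ms(mut latencies_ms: Vec<u64>) -> u64 {
    latencies_ms.sort_unstable();
    let rank = ((latencies_ms.len() as f64) * 0.90).ceil() as usize;
    let p90 = latencies_ms[rank.saturating_sub(1)];
    p90 + p90 / 2 // +50% buffer
}

fn main() {
    // 100 samples, 10ms..1000ms: p90 = 900ms, so recommended timeout = 1350ms.
    let samples: Vec<u64> = (1..=100).map(|i| i * 10).collect();
    println!("recommended timeout: {} ms", recommended_timeout_ms(samples));
}
```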

Query Memory Limit

Maximum memory a single query can allocate.

Default: 1GB per query

Configuration:

```toml
[query]
max_memory_mb = 1024 # 1GB default
max_memory_per_connection_mb = 512
```

```sql
-- Override per connection
SET SESSION work_mem = '2GB';
```

Memory Usage Scenarios:

| Scenario | Typical Usage | Recommended Limit |
|---|---|---|
| Simple SELECT | 10-50MB | 256MB |
| Join 2-3 tables | 100-500MB | 1GB |
| Complex JOIN 5+ tables | 500MB-2GB | 4GB |
| Aggregation on 1B rows | 1-4GB | 8GB |
| Window functions | 500MB-2GB | 4GB |

Out-of-Memory Behavior:

When a query exceeds its memory limit, HeliosDB:

  1. Attempts disk spillover (temp files)
  2. Aborts the query if temp space is exhausted
  3. Returns a MemoryExceeded error

To enable spillover:

```toml
[query]
enable_temp_spill = true
max_temp_spill_mb = 10240 # 10GB temp space
```

Warning Signs:

  • Queries consistently hit memory limit
  • Disk I/O spikes during large queries
  • OOM errors in logs
  • Query performance degrades with data size growth

Result Set Size Limits

Maximum rows in query result set.

Default: 1,000,000 rows

Configuration:

```toml
[query]
max_result_rows = 1000000
```

```sql
-- SQL override
SET SESSION result_limit = 5000000;
```

Impact on Different Layers:

| Component | Impact | Calculation |
|---|---|---|
| Network Transfer | Bandwidth usage | 1M rows × 100 bytes/row = 100MB |
| Client Buffering | Client memory | 1M rows × 100 bytes = 100MB |
| Server Memory | Result storage | Varies by serialization |
| Wire Protocol | Encoding overhead | ~20% additional |
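The network figures in the table can be reproduced with a small helper. The 20% wire-protocol overhead is the estimate given above; the function name is illustrative.

```rust
/// On-the-wire size of a result set, including protocol encoding overhead.
fn result_transfer_mb(rows: u64, bytes_per_row: u64, wire_overhead: f64) -> f64 {
    let payload_mb = (rows * bytes_per_row) as f64 / 1_000_000.0;
    payload_mb * (1.0 + wire_overhead)
}

fn main() {
    // 1M rows × 100 bytes = 100MB payload, ~120MB with 20% encoding overhead.
    println!("transfer size: {:.0} MB", result_transfer_mb(1_000_000, 100, 0.20));
}
```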

Recommended Values:

  • REST/HTTP API: 10,000-100,000 rows (minimize network traffic)
  • Direct connection: 100,000-1,000,000 rows
  • Batch exports: 1,000,000+ rows (implement pagination)

Statement Complexity Limits

Query Optimization Limits

Prevents optimizer from spending excessive time on complex queries.

Default: 10,000 optimization operations

Configuration:

```toml
[optimizer]
max_optimization_operations = 10000
max_join_order_permutations = 1000
max_plan_cache_entries = 10000
```

Complexity Indicators:

Operation Count = JOINs × CTE_count × UNION_count × subqueries

Examples:

  • Simple SELECT: 1-10 operations
  • 5-table JOIN: 100-500 operations
  • 10-table JOIN: 1000-5000 operations
  • Query with 10+ CTEs: >10,000 operations

Performance Impact:

  • Exceeded: Falls back to a simple plan, suboptimal execution
  • Well-tuned: Query optimized in under 500ms, good performance
  • Warning sign: Optimization takes >2 seconds for a query touching <500MB of data

Transaction Size Limits

Maximum Transaction Size

Total data modification per transaction.

Default: 2GB per transaction

Configuration:

```toml
[transactions]
max_transaction_size_mb = 2048
```

```sql
-- SQL level
BEGIN TRANSACTION;
-- INSERT/UPDATE/DELETE operations
-- Total writes must be < 2GB
COMMIT;
```

Memory Impact Calculation:

Transaction Memory = INSERT rows × avg_row_size
+ UPDATE rows × avg_row_size
+ Index overhead

Example:

Inserting 10M rows × 500 bytes = 5GB data
Index overhead (3 indexes × 15%) = 2.25GB
Lock info = 1GB
─────────────────────────────────
Total: ~8.25GB memory required
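The same arithmetic as a function; names are illustrative, and the 15% per-index overhead matches the example above.

```rust
/// Memory a large write transaction needs: row data, index maintenance,
/// and lock bookkeeping.
fn transaction_memory_gb(rows: u64, avg_row_bytes: u64,
                         index_count: u32, index_overhead: f64,
                         lock_gb: f64) -> f64 {
    let data_gb = (rows * avg_row_bytes) as f64 / 1e9;
    let index_gb = data_gb * index_count as f64 * index_overhead;
    data_gb + index_gb + lock_gb
}

fn main() {
    // 10M rows × 500 bytes, 3 indexes at 15% each, 1GB lock info -> ~8.25GB.
    let total = transaction_memory_gb(10_000_000, 500, 3, 0.15, 1.0);
    println!("transaction memory: {:.2} GB", total);
}
```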

Recommendations:

| Workload | Max Size | Batch Size | Impact |
|---|---|---|---|
| OLTP | 100MB | 1M rows | Fast commits |
| OLAP | 2GB+ | Larger batches | Higher resource use |
| Streaming | 500MB | Continuous | Balanced |
| ETL | 1GB | Job-specific | Memory + I/O |

WAL (Write-Ahead Log) Impact:

WAL Size ≈ Transaction Size × 1.2x
30-day retention: 5GB/day transaction rate = 150GB WAL

Index Size Limits

Per-Index Limits

HeliosDB supports unlimited index size by default, but consider:

Memory Impact During Index Builds:

Index Build Memory = Data Size × 2-3x
Build Time ≈ (Data Size / (Cores × 100MB/s))

Examples:

| Index Type | Data Size | Build Memory | Build Time (8 cores) |
|---|---|---|---|
| B-Tree | 100GB | 200-300GB | 15-25 minutes |
| HNSW Vector | 10GB (1M vectors) | 30-50GB | 30-60 minutes |
| Inverted Text | 50GB | 100-150GB | 10-20 minutes |
| BRIN (Block Range) | 100GB | 5-10GB | 2-5 minutes |

Recommendation: Set advisory limits

```toml
[storage.index_limits]
max_btree_index_size_gb = 500
max_vector_index_size_gb = 100
max_text_index_size_gb = 200
# Triggers a warning but doesn't block
warning_threshold_percent = 80
```

Warning Signs:

  • Index build runs out of memory
  • Index build takes >1 hour
  • Queries not using indexes due to size
  • Random-access performance degradation

Configuration Examples

Example 1: Small Deployment (Development/Testing)

```toml
# Small HeliosDB Configuration (4GB RAM, 4 cores)
[storage]
data_dir = "/var/lib/heliosdb/data"
memtable_size_mb = 64
sstable_size_mb = 128
buffer_pool_mb = 2048
compaction_strategy = "size_tiered"

[network]
listen_addr = "0.0.0.0:5432"
max_message_size_mb = 64
max_connections = 500

[compute]
worker_threads = 4
io_threads = 2
max_parallel_tasks = 4

[query]
timeout_ms = 30000
max_memory_mb = 512
max_result_rows = 100000

[resources]
max_connections = 500
max_connections_per_user = 10
max_open_files = 2500
idle_timeout_seconds = 300

[memory]
total_limit_mb = 3000
query_limit_mb = 512
connection_limit_mb = 256
```

Expected Performance:

  • Throughput: 100-500 ops/sec
  • Latency: 10-100ms p95
  • Concurrent users: 10-20
  • Max data: 10-50GB

Example 2: Medium Production Deployment

```toml
# Medium HeliosDB Configuration (16GB RAM, 16 cores)
[storage]
data_dir = "/var/lib/heliosdb/data"
memtable_size_mb = 256
sstable_size_mb = 512
buffer_pool_mb = 8192
compaction_strategy = "leveled"
compaction_level_multiplier = 10

[network]
listen_addr = "0.0.0.0:5432"
rdma_enabled = false
max_message_size_mb = 128
max_connections = 5000

[compute]
worker_threads = 16
io_threads = 4
max_parallel_tasks = 16

[query]
timeout_ms = 60000
max_memory_mb = 2048
max_result_rows = 1000000
enable_temp_spill = true
max_temp_spill_mb = 5120

[resources]
max_connections = 5000
max_connections_per_user = 50
max_open_files = 25000
idle_timeout_seconds = 600
connection_memory_mb = 512

[memory]
total_limit_mb = 12000
query_limit_mb = 2048
connection_limit_mb = 512

[autoscale]
enabled = true
min_cu = 10.0
max_cu = 100.0
target_cpu_percent = 70.0
scale_up_threshold = 80.0
scale_down_threshold = 30.0
```

Expected Performance:

  • Throughput: 10,000-50,000 ops/sec
  • Latency: 5-20ms p95
  • Concurrent users: 100-500
  • Max data: 500GB-2TB

Example 3: Large OLAP Deployment

```toml
# Large HeliosDB OLAP Configuration (64GB RAM, 32 cores)
[storage]
data_dir = "/var/lib/heliosdb/data"
memtable_size_mb = 1024
sstable_size_mb = 2048
buffer_pool_mb = 32768
compaction_strategy = "leveled"
compaction_level_multiplier = 20
bloom_filter_bits_per_key = 15

[network]
listen_addr = "0.0.0.0:5432"
rdma_enabled = true
max_message_size_mb = 256
max_connections = 20000

[compute]
worker_threads = 32
io_threads = 8
max_parallel_tasks = 64
simd_enabled = true

[query]
timeout_ms = 600000 # 10 minutes for OLAP
max_memory_mb = 8192
max_result_rows = 10000000
enable_temp_spill = true
max_temp_spill_mb = 51200

[resources]
max_connections = 20000
max_connections_per_user = 200
max_open_files = 65536
idle_timeout_seconds = 900
connection_memory_mb = 512

[memory]
total_limit_mb = 48000
query_limit_mb = 8192
connection_limit_mb = 512

[vector]
index_type = "hnsw"
hnsw_m = 32
hnsw_ef_construction = 400

[gpu]
enabled = true
memory_limit = 16106127360 # 15GB
device_id = 0
enable_aggregations = true
enable_joins = true
```

Expected Performance:

  • Throughput: 100,000+ ops/sec for analytical queries
  • Latency: 50-500ms p95 (OLAP workload)
  • Concurrent users: 1000+
  • Max data: 10TB+

Example 4: Edge/IoT Constrained Environment

```toml
# Edge/IoT HeliosDB Configuration (512MB RAM, 2 cores)
[storage]
data_dir = "/var/lib/heliosdb/data"
memtable_size_mb = 16
sstable_size_mb = 32
buffer_pool_mb = 256
compaction_strategy = "size_tiered"
gc_grace_seconds = 86400 # 1 day

[network]
listen_addr = "127.0.0.1:5432"
max_connections = 100
max_message_size_mb = 16

[compute]
worker_threads = 2
io_threads = 1
max_parallel_tasks = 2

[query]
timeout_ms = 5000 # 5 seconds
max_memory_mb = 64
max_result_rows = 10000
enable_temp_spill = false

[resources]
max_connections = 100
max_connections_per_user = 5
max_open_files = 1024
idle_timeout_seconds = 60
connection_memory_mb = 32

[memory]
total_limit_mb = 400
query_limit_mb = 64
connection_limit_mb = 32
```

Expected Performance:

  • Throughput: 10-100 ops/sec
  • Latency: 50-500ms p95
  • Concurrent users: 1-5
  • Max data: 1-10GB

Capacity Planning Guide

Estimating Total Resource Needs

Formula:

Required_Resources = Baseline + Workload + Safety_Margin

Step 1: Baseline Overhead

CPU: 1-2 cores (OS, background tasks)
RAM: 2-4GB (OS, system services)
Disk: 10% of total data capacity

Step 2: Workload Calculation

For concurrent queries:

RAM needed = (Concurrent_Queries × Avg_Query_Memory)
+ (Connections × Connection_Overhead)
+ (Buffer_Pool_Size)
+ Baseline
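For concreteness, the RAM formula above as a function. The names and sample figures are illustrative (200 concurrent queries at 500MB, 1000 connections at 50MB, 16GB buffer pool, 3GB baseline).

```rust
/// RAM estimate: concurrent query memory + connection overhead
/// + buffer pool + baseline, per the formula above.
fn ram_needed_gb(concurrent_queries: u32, per_query_gb: f64,
                 connections: u32, per_conn_gb: f64,
                 buffer_pool_gb: f64, baseline_gb: f64) -> f64 {
    concurrent_queries as f64 * per_query_gb
        + connections as f64 * per_conn_gb
        + buffer_pool_gb
        + baseline_gb
}

fn main() {
    // 100 + 50 + 16 + 3 = 169GB for this workload mix.
    println!("RAM needed: {:.0} GB", ram_needed_gb(200, 0.5, 1000, 0.05, 16.0, 3.0));
}
```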

For disk requirements:

Disk needed = (Raw_Data × Compression_Factor)
+ (Indexes × Index_Overhead)
+ (WAL × Retention_Days / 365)
+ Baseline

Step 3: Safety Margin

Recommended headroom:

  • Small deployments: +50% (budget for growth, peaks)
  • Medium deployments: +30% (account for spikes)
  • Large deployments: +20% (economies of scale)
  • Production critical: +40% minimum

Capacity Planning Example

Scenario: 500GB data, 1000 concurrent users, mixed OLTP/OLAP

Step 1: Baseline

  • CPU: 2 cores
  • RAM: 3GB
  • Disk: 50GB

Step 2: Workload

Data storage:

500GB ÷ 7 (7x compression) ≈ 71GB compressed
Indexes (4 per table × 15%) = 20GB
WAL 30-day retention = 5GB
Temp space = 18GB (25% of data)
─────────────────────
Total disk: 164GB

Concurrent query memory:

1000 users × 50MB avg connection = 50GB
Concurrent queries (20% of users):
200 queries × 500MB avg = 100GB
Buffer pool = 16GB
─────────────────────
Total RAM: ~166GB (impractical)

Solution: Connection pooling

1000 users share application-side connection pools
Use 500 actual DB connections
500 × 2MB = 1GB overhead
200 concurrent queries × 500MB = 100GB
Buffer pool = 16GB
────────────────────
Total RAM: 117GB (still high)

Alternative: Query optimization

Reduce concurrent OLAP to 10 max (queue excess)
10 × 2GB (larger) = 20GB
500 users × 20MB (smaller) = 10GB
Buffer pool = 16GB
────────────────────
Total RAM: 46GB

Final Capacity Recommendation:

RAM: 64GB (headroom for future growth)
CPU: 32 cores (support parallelism)
Disk: 250GB (network RAID for redundancy)
Network: 10Gbps (support concurrent users)

Growth Projections

Plan for growth using this formula:

Year N = Current Size × (1 + Annual Growth Rate)^N

Example: 500GB current, 30% annual growth

| Year | Data Size | Disk Allocation | RAM Needed | CPU Cores |
|---|---|---|---|---|
| 0 | 500GB | 250GB | 32GB | 16 |
| 1 | 650GB | 325GB | 40GB | 20 |
| 2 | 845GB | 425GB | 52GB | 24 |
| 3 | 1.1TB | 550GB | 68GB | 32 |
| 4 | 1.4TB | 700GB | 88GB | 40 |
| 5 | 1.8TB | 900GB | 114GB | 48 |

Recommendation: Plan major upgrades every 18-24 months to stay ahead of curve.
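The projection table follows directly from compound growth; a quick sketch (function name illustrative):

```rust
/// Projected data size after `years` of compound annual growth.
fn projected_size_gb(current_gb: f64, annual_growth: f64, years: u32) -> f64 {
    current_gb * (1.0 + annual_growth).powi(years as i32)
}

fn main() {
    // 500GB at 30% annual growth: year 2 ~ 845GB, year 5 ~ 1.8TB.
    for year in 0..=5 {
        println!("year {}: {:.0} GB", year, projected_size_gb(500.0, 0.30, year));
    }
}
```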

Monitoring Resource Usage

Key Metrics to Track

Memory Metrics

```sql
-- Query memory usage
SELECT
    query_id,
    user_id,
    memory_used_mb,
    memory_limit_mb,
    (memory_used_mb::float / memory_limit_mb) AS usage_percent,
    CASE
        WHEN memory_used_mb::float / memory_limit_mb > 0.90 THEN 'CRITICAL'
        WHEN memory_used_mb::float / memory_limit_mb > 0.75 THEN 'HIGH'
        WHEN memory_used_mb::float / memory_limit_mb > 0.50 THEN 'MEDIUM'
        ELSE 'LOW'
    END AS pressure_level
FROM system.active_queries
ORDER BY memory_used_mb DESC;

-- System memory pressure
SELECT
    buffer_pool_usage_percent,
    temp_spill_mb,
    eviction_rate,
    cache_hit_rate,
    gc_pause_ms
FROM system.memory_stats;
```

Connection Metrics

```sql
-- Connection usage
SELECT
    COUNT(*) AS total_connections,
    COUNT(DISTINCT user_id) AS unique_users,
    MAX(EXTRACT(EPOCH FROM (NOW() - last_activity))) AS max_idle_seconds,
    COUNT(*) FILTER (WHERE state = 'idle') AS idle_connections,
    COUNT(*) FILTER (WHERE state = 'active') AS active_connections
FROM system.connections;

-- Per-user connection limits
SELECT
    user_id,
    COUNT(*) AS connection_count,
    max_connections_per_user,
    CASE
        -- Check the hard limit first so LIMIT_EXCEEDED is reachable
        WHEN COUNT(*) >= max_connections_per_user THEN 'LIMIT_EXCEEDED'
        WHEN COUNT(*) > max_connections_per_user * 0.9 THEN 'WARNING'
        ELSE 'OK'
    END AS status
FROM system.connections
GROUP BY user_id, max_connections_per_user
HAVING COUNT(*) > max_connections_per_user * 0.7
ORDER BY connection_count DESC;
```

Disk I/O Metrics

```sql
-- File descriptor usage
SELECT
    open_files,
    max_open_files,
    (open_files::float / max_open_files) * 100 AS usage_percent,
    CASE
        WHEN open_files::float / max_open_files > 0.90 THEN 'CRITICAL'
        WHEN open_files::float / max_open_files > 0.75 THEN 'HIGH'
        ELSE 'OK'
    END AS status
FROM system.resource_limits;

-- I/O statistics
SELECT
    reads_per_second,
    writes_per_second,
    read_latency_ms,
    write_latency_ms,
    cache_miss_rate
FROM system.io_stats;
```

CPU Metrics

```sql
-- Query execution distribution
SELECT
    query_type,
    COUNT(*) AS query_count,
    AVG(execution_time_ms) AS avg_time_ms,
    MAX(execution_time_ms) AS max_time_ms,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY execution_time_ms) AS p95_time_ms
FROM system.query_log
WHERE timestamp > NOW() - INTERVAL '1 hour'
GROUP BY query_type
ORDER BY query_count DESC;

-- CPU utilization
SELECT
    worker_id,
    cpu_percent,
    active_queries,
    queued_queries,
    context_switches_per_sec
FROM system.worker_stats;
```

Alerting Thresholds

| Metric | Warning | Critical | Action |
|---|---|---|---|
| Memory Usage | >80% | >95% | Kill idle queries, reduce cache |
| Connections | >80% limit | At limit | Close idle connections |
| File Descriptors | >80% limit | >95% limit | Reduce open files |
| Query Memory | >1GB | >Limit | Spill to disk or reject |
| CPU Usage | >85% sustained | >98% | Throttle clients, add resources |
| Disk Free | <20% | <10% | Archive data, add storage |
| Compaction Lag | >1 hour | >6 hours | Increase compaction threads |

Monitoring Setup Example

```rust
// HeliosDB monitoring example
use heliosdb_common::ResourceLimitsManager;
use std::time::Duration;

async fn monitor_resources(manager: &ResourceLimitsManager) {
    let mut interval = tokio::time::interval(Duration::from_secs(60));
    loop {
        interval.tick().await;
        let stats = manager.get_statistics();
        let health = manager.health_check().await;

        // Log metrics
        info!("Memory: {}%", stats.memory_usage_percent);
        info!("Connections: {}/{}",
            stats.active_connections,
            stats.max_connections);
        info!("Health: {}", health.summary());

        // Alert on thresholds
        if stats.memory_usage_percent > 90.0 {
            warn!("Memory pressure critical!");
        }
        if stats.active_connections as f64 / stats.max_connections as f64 > 0.9 {
            warn!("Connection limit near threshold");
        }
        if !health.is_healthy {
            error!("System health degraded: {}", health.summary());
        }
    }
}
```

Workload-Specific Tuning

OLTP Optimization

Characteristics:

  • Small, frequent queries
  • Low latency requirement (<100ms p95)
  • High throughput (1000s ops/sec)
  • Multi-user contention

Configuration:

```toml
[query]
timeout_ms = 10000 # 10 second max
max_memory_mb = 256 # Small queries
max_result_rows = 10000

[resources]
max_connections = 5000 # Many concurrent users
max_connections_per_user = 50

[compute]
worker_threads = <num_cores> # Use all cores
io_threads = <num_cores / 4>

[cache]
buffer_pool_ratio = 0.60 # 60% of RAM for cache
lru_eviction_enabled = true
```

Query Optimization:

```sql
-- Create covering indexes for common queries
CREATE INDEX idx_orders_user_created
    ON orders(user_id, created_at)
    INCLUDE (order_total);

-- Use prepared statements
PREPARE get_user_orders (INT) AS
    SELECT * FROM orders WHERE user_id = $1;

-- Enable query caching for read-heavy workloads
SET SESSION query_cache_enabled = true;
```

Monitoring:

```sql
-- Monitor slow queries
SELECT
    query,
    COUNT(*) AS executions,
    AVG(execution_time_ms) AS avg_ms,
    MAX(execution_time_ms) AS max_ms
FROM system.query_log
WHERE execution_time_ms > 100
GROUP BY query
ORDER BY max_ms DESC
LIMIT 20;
```

OLAP Optimization

Characteristics:

  • Large, complex queries
  • Higher latency acceptable (seconds to minutes)
  • Moderate throughput requirement
  • Batch processing

Configuration:

```toml
[query]
timeout_ms = 600000 # 10 minutes for large scans
max_memory_mb = 8192 # Large aggregate allocations
max_result_rows = 10000000 # Bulk exports

[resources]
max_connections = 100 # Fewer concurrent users
max_connections_per_user = 5

[compute]
worker_threads = <num_cores>
io_threads = <num_cores / 2>

[cache]
buffer_pool_ratio = 0.70 # More RAM for scanning
columnar_cache_enabled = true

[vector]
enable_vectorization = true
hnsw_ef_construction = 400 # Better search quality
```

Query Optimization:

```sql
-- Partitioning for faster scans
CREATE TABLE events (
    id INT,
    timestamp TIMESTAMP,
    data TEXT
) PARTITION BY DATE (timestamp);

-- Materialized views for common aggregations
CREATE MATERIALIZED VIEW daily_stats AS
SELECT
    DATE(timestamp) AS date,
    event_type,
    COUNT(*) AS count,
    AVG(value) AS avg_value
FROM events
GROUP BY DATE(timestamp), event_type;

-- Use column-oriented storage for wide tables
CREATE TABLE analytics_data (
    id INT,
    timestamp TIMESTAMP,
    event_type TEXT,
    user_id INT
    -- 50+ columns
) WITH (storage_format = 'columnar');
```

Monitoring:

```sql
-- Monitor long-running queries
SELECT
    query_id,
    query,
    EXTRACT(EPOCH FROM (NOW() - start_time)) AS elapsed_seconds,
    memory_used_mb,
    rows_scanned,
    rows_returned
FROM system.active_queries
WHERE EXTRACT(EPOCH FROM (NOW() - start_time)) > 30
ORDER BY elapsed_seconds DESC;
```

Mixed Workload Balancing

For systems handling both OLTP and OLAP:

Resource Isolation:

```toml
[workload_isolation]
# Separate thread pools
oltp_threads = <num_cores / 2>
olap_threads = <num_cores / 2>

# Separate memory pools
oltp_memory_mb = 8192
olap_memory_mb = 32768

# Query routing
oltp_max_query_time_ms = 5000
olap_max_query_time_ms = 600000
```

Query Classification:

```sql
-- Automatic routing based on characteristics
SELECT
    CASE
        WHEN execution_time_ms < 100 THEN 'OLTP'
        WHEN execution_time_ms < 5000 THEN 'OLTP_BOUNDARY'
        WHEN execution_time_ms < 60000 THEN 'MIXED'
        ELSE 'OLAP'
    END AS workload_type,
    COUNT(*) AS frequency
FROM system.query_log
GROUP BY workload_type;
```

Concurrency Control:

```rust
// Implement query queuing for balance (sketch; helper functions elided)
if is_olap_query(&query) && olap_queue.len() > 10 {
    // Throttle aggressive OLAP clients
    apply_backpressure();
} else if is_oltp_query(&query) {
    // Prioritize OLTP
    prioritize_execution();
}
```

Best Practices

1. Right-Sizing for Your Workload

  • Measure before deploying: Run load tests with representative data
  • Use historical data: Analyze past usage patterns
  • Plan for growth: Add 25-50% capacity headroom
  • Regular reviews: Adjust configuration quarterly based on actual usage

2. Connection Management

```rust
// Use connection pooling
use sqlx::postgres::PgPoolOptions;

let pool = PgPoolOptions::new()
    .max_connections(100)
    .min_connections(10)
    .acquire_timeout(Duration::from_secs(5))
    .connect(&database_url)
    .await?;

// Reuse connections
let connection = pool.acquire().await?;
```

3. Query Optimization

  • Use EXPLAIN ANALYZE to understand query plans
  • Create appropriate indexes based on query patterns
  • Denormalize when necessary for read-heavy workloads
  • Monitor query performance continuously

4. Memory Management

  • Set realistic memory limits based on available resources
  • Enable spill to disk for temporary large datasets
  • Monitor cache hit rates (target >85%)
  • Regularly clear temporary files

5. Disk Space Management

```sh
# Monitor disk usage
df -h /var/lib/heliosdb

# Archive old data
heliosdb-cli archive --table events --before 2024-01-01

# Cleanup temporary files
rm -rf /var/lib/heliosdb/temp/*
```

6. Production Hardening

  • Set OS-level limits for file descriptors
  • Enable resource accounting for audit trails
  • Implement monitoring with alerting
  • Test failover and recovery procedures
  • Keep headroom above all limits

7. Performance Baseline

Establish baseline metrics for comparison:

```sql
-- Create baseline query
CREATE TABLE baseline_metrics AS
SELECT
    'p95_query_latency' AS metric,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY execution_time_ms) AS value,
    NOW() AS recorded_at
FROM system.query_log
WHERE timestamp > NOW() - INTERVAL '7 days';
```

8. Graceful Degradation

Design for when limits are approached:

```rust
// Implement pressure-aware logic
if memory_manager.get_pressure_level() == PressureLevel::High {
    // Reduce cache sizes
    cache.evict_cold_entries();
    // Increase query timeouts
    adjust_timeouts_for_load();
    // Apply backpressure to clients
    apply_rate_limiting();
}
```

Troubleshooting Common Issues

Issue: “Too Many Connections”

Root Cause: Connection limit exceeded

Solution:

  1. Check current connection count: SELECT COUNT(*) FROM system.connections;
  2. Close idle connections: ALTER SYSTEM SET idle_timeout = 300;
  3. Implement connection pooling on application side
  4. Increase limit if justified: ALTER SYSTEM SET max_connections = 10000;

Issue: Out of Memory Errors

Root Cause: Query memory or system memory exhausted

Solution:

  1. Check query size: Monitor system.active_queries for memory usage
  2. Enable temp spill: ALTER SYSTEM SET enable_temp_spill = true;
  3. Reduce buffer pool if needed
  4. Add more RAM or upgrade instance

Issue: Slow Query Performance

Root Cause: Insufficient resources or poor index design

Solution:

  1. Check CPU utilization: Should be >70% if bottlenecked
  2. Review buffer pool hit rate: Target >90%
  3. Analyze slow queries: EXPLAIN ANALYZE SELECT ...
  4. Create missing indexes
  5. Consider query rewrite

Issue: File Descriptor Exhaustion

Root Cause: Too many open files

Solution:

  1. Check FD usage: lsof | wc -l
  2. Increase OS limit: ulimit -n 65536
  3. Close unused connections
  4. Reduce concurrent compactions

For more information:

  • Configuration: /home/app/HeliosDB/heliosdb-common/src/config.rs
  • Resource Management: /home/app/HeliosDB/docs/quick-starts/performance/RESOURCE_LIMITS_QUICK_REFERENCE.md
  • Resource Leak Prevention: /home/app/HeliosDB/docs/quick-starts/performance/RESOURCE_LEAK_PREVENTION_QUICK_REFERENCE.md

Last Updated: 2025-12-30
Document Type: Performance Quick Start Reference
Audience: DevOps, Database Administrators, Application Architects