HeliosDB Performance Tuning Guide

Document Version: 1.0
Created: November 24, 2025
Last Updated: November 24, 2025
Target Audience: Database Administrators, Performance Engineers, DevOps
Applies To: HeliosDB v6.0+, v7.0 (in development)


Table of Contents

  1. Introduction
  2. Quick Start Performance Checklist
  3. GPU Acceleration Tuning
  4. Query Optimization
  5. Connection Pool Configuration
  6. Memory Management
  7. Storage Layer Tuning
  8. Protocol-Specific Optimization
  9. Caching Strategies
  10. Monitoring & Diagnostics
  11. Capacity Planning
  12. Troubleshooting Performance Issues

1. Introduction

1.1 Purpose

This guide provides comprehensive instructions for optimizing HeliosDB performance across various workload types: OLTP, OLAP (analytical), and HTAP (hybrid transactional/analytical processing).

1.2 Performance Goals

Workload Type          | Target Metrics
-----------------------|------------------------------------------------------
OLTP                   | <1ms P50 latency, <5ms P99 latency, 50K-200K ops/sec
Analytical (with GPU)  | 10-30x speedup vs CPU, <100ms for 1M row aggregation
Analytical (CPU only)  | <1s for 1M row aggregation, <10s for complex joins
Mixed (HTAP)           | OLTP metrics + 5-20x analytical speedup

1.3 Hardware Assumptions

Minimum Production:

  • 16 CPU cores
  • 64 GB RAM
  • 1 TB NVMe SSD (50K+ IOPS)
  • 10 Gbps network
  • Optional: 1x GPU (16+ GB VRAM)

Recommended Production:

  • 32-64 CPU cores
  • 128-256 GB RAM
  • 2-10 TB NVMe SSD (100K+ IOPS)
  • 25 Gbps network
  • 1-2x GPU (24-48 GB VRAM)

2. Quick Start Performance Checklist

2.1 Essential Configuration (First 5 Minutes)

config/heliosdb.toml
[server]
max_connections = 1000
worker_threads = 32 # Equal to CPU core count
[memory]
total_memory_limit = "128GB"
query_memory_limit = "4GB"
cache_memory_limit = "32GB"
[storage]
data_dir = "/mnt/nvme/heliosdb"
wal_dir = "/mnt/nvme/heliosdb/wal"
compaction_threads = 4
[gpu]
enabled = true
device_ids = [0] # Use GPU 0
min_rows_for_gpu = 100000 # 100K row threshold

Apply configuration:

Terminal window
sudo systemctl restart heliosdb
# or
./heliosdb-server --config config/heliosdb.toml

2.2 Verify Performance Baseline

Terminal window
# Run built-in benchmark
heliosdb-cli benchmark --quick
# Expected results:
# - OLTP read: 30K-50K ops/sec
# - OLTP write: 10K-20K ops/sec
# - Analytical (GPU): 5-15x speedup

2.3 Common Quick Wins

Action                     | Expected Improvement                  | Effort
---------------------------|---------------------------------------|---------------
Enable GPU acceleration    | 5-30x for analytical queries          | 5 minutes
Increase block cache size  | 2-5x random read performance          | 5 minutes
Tune connection pool       | 20-50% connection overhead reduction  | 10 minutes
Enable query caching       | 10-100x for repeated queries          | 5 minutes
Add appropriate indexes    | 10-1000x for filtered queries         | 15-60 minutes

3. GPU Acceleration Tuning

3.1 When to Use GPU

✅ Use GPU for:

  • Aggregations (SUM, AVG, COUNT, GROUP BY)
  • Large joins (>100K rows)
  • Table scans with filters
  • Window functions
  • Vector similarity search
  • Large datasets (>100K rows)

❌ Avoid GPU for:

  • Point lookups (single row)
  • Small queries (<10K rows)
  • Sequential operations (ORDER BY without aggregation)
  • Transactions (OLTP workload)
  • Short queries (<10ms total time)

Reason: GPU has transfer overhead (~1-10ms). Only worthwhile if computation savings exceed transfer cost.
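
For borderline queries near these thresholds, compare both paths explicitly. A minimal check, using the EXPLAIN option and query hint covered later in this section:

-- Ask the cost model for its routing estimate
EXPLAIN (GPU_COST)
SELECT SUM(amount) FROM orders WHERE date > '2024-01-01';
-- If the estimated speedup is marginal, pin the query to the CPU and compare wall time
SELECT /*+ NO_GPU */ SUM(amount) FROM orders WHERE date > '2024-01-01';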

3.2 GPU Configuration

[gpu]
# Enable GPU acceleration
enabled = true
# GPU device IDs (0-indexed)
device_ids = [0, 1] # Use GPUs 0 and 1
# Minimum rows to use GPU (avoid overhead for small queries)
min_rows_for_gpu = 100000 # 100K rows (default: 50K)
# GPU memory allocation
pool_size = "4GB" # Per-query GPU memory limit
cache_size = "2GB" # Intermediate result cache
max_transfer_size = "1GB" # Max single transfer
# Cost model tuning
speedup_threshold = 3.0 # Use GPU if estimated >3x faster
confidence_threshold = 0.7 # Confidence in cost estimate (0-1)
# Advanced settings
enable_async_transfers = true # Overlap CPU/GPU operations
enable_kernel_fusion = true # Combine multiple operations
enable_auto_tuning = true # Adaptive parameter tuning
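
A worked reading of the two cost-model knobs above (our interpretation of the thresholds, not a formal specification): with speedup_threshold = 3.0 and confidence_threshold = 0.7, a query estimated at a 12x speedup with 0.85 confidence is routed to the GPU, while a query estimated at 2.5x speedup, or at 5x speedup with only 0.5 confidence, stays on the CPU.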

3.3 Query Hints for GPU

-- Force GPU execution
SELECT /*+ USE_GPU */ SUM(price) FROM orders WHERE date > '2024-01-01';
-- Force CPU execution
SELECT /*+ NO_GPU */ * FROM small_table;
-- Specify GPU device
SELECT /*+ GPU_DEVICE(1) */ AVG(value) FROM large_table;

3.4 GPU Performance Monitoring

-- Check GPU usage
SELECT * FROM system.gpu_stats;
-- Expected output:
-- device_id | utilization | memory_used | memory_total | queries_executed
-- 0 | 78% | 12GB | 24GB | 15234
-- Query-level GPU metrics
EXPLAIN (ANALYZE, GPU_STATS)
SELECT SUM(value) FROM large_table;
-- Sample output:
-- GPU Aggregation: 83ms (12.1x speedup vs CPU)
-- Transfer time: 15ms (H2D: 8ms, D2H: 7ms)
-- Kernel time: 68ms
-- CPU baseline: 1000ms

3.5 GPU Troubleshooting

Problem: GPU not being used despite large dataset

Solutions:

  1. Check min_rows_for_gpu threshold
  2. Verify cost model prediction: EXPLAIN (GPU_COST) SELECT ...
  3. Check GPU memory availability: SELECT * FROM system.gpu_stats
  4. Use query hint to force GPU: /*+ USE_GPU */

Problem: Slower with GPU than CPU

Causes & Solutions:

  1. Small dataset: Increase min_rows_for_gpu
  2. Transfer overhead: Enable async_transfers, increase batch size
  3. Low GPU utilization: Check for sequential operations, consider CPU
  4. Memory thrashing: Reduce pool_size, reduce concurrent GPU queries

4. Query Optimization

4.1 Query Best Practices

4.1.1 Column Pruning

-- ❌ Bad: Select all columns
SELECT * FROM large_table WHERE condition;
-- Good: Select only needed columns
SELECT id, name, price FROM large_table WHERE condition;
-- Impact: 3-10x less data transfer, faster parsing, lower memory

4.1.2 Predicate Pushdown

-- ❌ Bad: Filter applied only after the join
SELECT * FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.date > '2024-01-01';
-- ✅ Good: Filter before the join (derived table)
SELECT * FROM (
SELECT * FROM orders WHERE date > '2024-01-01'
) o
JOIN customers c ON o.customer_id = c.id;
-- Impact: 10-100x fewer rows in join, significantly faster

4.1.3 Index Usage

-- Create index for frequently filtered columns
CREATE INDEX idx_orders_date ON orders(date);
CREATE INDEX idx_orders_customer ON orders(customer_id);
-- Verify index usage
EXPLAIN SELECT * FROM orders WHERE date > '2024-01-01';
-- Should show: "Index Scan using idx_orders_date"
-- Composite index for multi-column filters
CREATE INDEX idx_orders_date_status ON orders(date, status);

4.1.4 Avoid SELECT *

-- ❌ Bad: Unknown columns, full table scan
SELECT * FROM large_table;
-- ✅ Good: Explicit columns, optimized execution
SELECT id, name, value FROM large_table;

4.1.5 Use LIMIT for Testing

-- Testing queries on large tables
SELECT * FROM orders LIMIT 100;
-- Production query without LIMIT
SELECT * FROM orders WHERE date > '2024-01-01';

4.2 Using EXPLAIN

-- Basic query plan
EXPLAIN SELECT * FROM orders WHERE date > '2024-01-01';
-- With execution statistics
EXPLAIN ANALYZE SELECT * FROM orders WHERE date > '2024-01-01';
-- With GPU routing decision
EXPLAIN (ANALYZE, GPU_COST)
SELECT SUM(amount) FROM orders WHERE date > '2024-01-01';
-- With detailed timing
EXPLAIN (ANALYZE, VERBOSE, BUFFERS)
SELECT o.*, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.date > '2024-01-01';

Interpreting EXPLAIN output:

Query Plan:
-> Aggregate (cost=1000..1100 rows=1)
     GPU: Enabled (estimated 10.5x speedup)
     -> Seq Scan on orders (cost=0..900 rows=100000)
          Filter: (date > '2024-01-01')
          GPU Transfer: 150MB (estimated 15ms)
GPU Kernel: 68ms
CPU Baseline: 1000ms
Execution Time: 83ms

Key metrics:

  • Cost: Optimizer’s estimate (lower is better)
  • Rows: Estimated row count at each stage
  • GPU: GPU routing decision and estimates
  • Execution Time: Actual query runtime

4.3 Query Optimization Patterns

Pattern 1: Aggregation Optimization

-- ❌ Inefficient: Multiple passes
SELECT
SUM(amount) as total,
AVG(amount) as average,
COUNT(*) as count
FROM orders
WHERE date > '2024-01-01';
SELECT MAX(amount) FROM orders WHERE date > '2024-01-01';
SELECT MIN(amount) FROM orders WHERE date > '2024-01-01';
-- ✅ Efficient: Single pass
SELECT
SUM(amount) as total,
AVG(amount) as average,
COUNT(*) as count,
MAX(amount) as max_amount,
MIN(amount) as min_amount
FROM orders
WHERE date > '2024-01-01';

Pattern 2: Join Optimization

-- ❌ Inefficient: Cartesian product
SELECT * FROM orders o, customers c
WHERE o.customer_id = c.id;
-- ✅ Efficient: Explicit join
SELECT * FROM orders o
INNER JOIN customers c ON o.customer_id = c.id;
-- Optimal: Join on indexed columns
CREATE INDEX idx_orders_customer ON orders(customer_id);
CREATE INDEX idx_customers_id ON customers(id);
SELECT * FROM orders o
INNER JOIN customers c ON o.customer_id = c.id;

Pattern 3: Subquery Optimization

-- ❌ Inefficient: Correlated subquery
SELECT * FROM orders o
WHERE amount > (SELECT AVG(amount) FROM orders WHERE customer_id = o.customer_id);
-- ✅ Efficient: Join with aggregation
SELECT o.* FROM orders o
JOIN (
SELECT customer_id, AVG(amount) as avg_amount
FROM orders
GROUP BY customer_id
) avg_orders ON o.customer_id = avg_orders.customer_id
WHERE o.amount > avg_orders.avg_amount;
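
Because window functions are GPU-eligible (see Section 3.1), the same result can also be computed in a single scan. A sketch, assuming standard window-function support:

-- ✅ Alternative: single pass with a window function
SELECT * FROM (
SELECT o.*, AVG(amount) OVER (PARTITION BY customer_id) AS avg_amount
FROM orders o
) t
WHERE t.amount > t.avg_amount;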

4.4 Query Hints

HeliosDB supports query hints to control optimizer behavior:

-- Force GPU execution
SELECT /*+ USE_GPU */ SUM(amount) FROM orders;
-- Force CPU execution
SELECT /*+ NO_GPU */ * FROM small_table;
-- Force specific join algorithm
SELECT /*+ HASH_JOIN(orders, customers) */
o.*, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id;
-- Force index usage
SELECT /*+ INDEX(orders, idx_orders_date) */
* FROM orders WHERE date > '2024-01-01';
-- Disable query cache
SELECT /*+ NO_CACHE */ * FROM real_time_table;
-- Set query timeout
SELECT /*+ TIMEOUT(30s) */ * FROM slow_query;

5. Connection Pool Configuration

5.1 Connection Pool Sizing

Formula:

connections_per_node = (CPU_cores × 2) to (CPU_cores × 4)

Examples:

  • 16 cores → 32-64 connections
  • 32 cores → 64-128 connections
  • 64 cores → 128-256 connections

5.2 Configuration

[server]
# Maximum total connections
max_connections = 1000
# Connection pool per protocol
[protocols.postgres]
pool_size = 100
min_idle = 10
max_idle = 50
connection_timeout = "30s"
idle_timeout = "600s"
max_lifetime = "1h"
[protocols.mongodb]
pool_size = 200
min_idle = 20
max_idle = 100
connection_timeout = "30s"
idle_timeout = "300s"
max_lifetime = "1h"
[protocols.oracle]
pool_size = 100
min_idle = 10
max_idle = 50
connection_timeout = "30s"
idle_timeout = "600s"
max_lifetime = "1h"

5.3 Connection Pool Monitoring

-- Check connection pool stats
SELECT * FROM system.connection_pools;
-- Expected output:
-- protocol | total | active | idle | waiting | avg_wait_time
-- postgres | 100 | 45 | 55 | 0 | 2ms
-- mongodb | 200 | 120 | 80 | 5 | 15ms
-- oracle | 100 | 30 | 70 | 0 | 1ms
-- Identify connection pool bottlenecks
SELECT protocol, AVG(wait_time_ms) as avg_wait
FROM system.connection_wait_times
WHERE timestamp > NOW() - INTERVAL '1 hour'
GROUP BY protocol
HAVING AVG(wait_time_ms) > 10;

5.4 Tuning Guidelines

Symptoms of under-provisioned pool:

  • High connection wait times (>10ms)
  • “Max connections reached” errors
  • High connection queue depth

Solutions:

  1. Increase pool_size
  2. Increase max_connections
  3. Reduce idle_timeout to free idle connections faster
  4. Add more nodes (scale horizontally)

Symptoms of over-provisioned pool:

  • High idle connection count (>80%)
  • Memory pressure from idle connections
  • Context switching overhead

Solutions:

  1. Reduce pool_size
  2. Reduce max_idle
  3. Increase idle_timeout to keep useful connections
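
In configuration terms, both sets of adjustments touch the same handful of keys from Section 5.2. A sketch with illustrative values only:

[protocols.postgres]
# Under-provisioned: raise capacity and recycle idle connections sooner
pool_size = 200 # was 100
idle_timeout = "300s" # was 600s
# Over-provisioned: shrink the pool and cap idle connections instead
# pool_size = 50
# max_idle = 20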

6. Memory Management

6.1 Memory Configuration

[memory]
# Total memory limit for HeliosDB
total_memory_limit = "128GB"
# Per-query memory limit
query_memory_limit = "4GB"
# Result set memory limit (before streaming)
result_memory_limit = "2GB"
# Enable streaming for large results
enable_streaming = true
# Memory pools
[memory.pools]
query_cache = "8GB"
parser_cache = "1GB"
block_cache = "32GB"
connection_pool = "4GB"
gpu_pool = "4GB"
# Memory pressure handling
[memory.pressure]
enable_monitoring = true
warning_threshold = 0.8 # 80% memory usage
critical_threshold = 0.9 # 90% memory usage
oom_kill_queries = true # Kill queries exceeding limits

6.2 Cache Configuration

[cache]
# Query result cache
[cache.query_results]
enabled = true
size = "8GB"
ttl = "1h"
eviction_policy = "LRU"
# Parser cache (AST caching)
[cache.parser]
enabled = true
size = "1GB"
max_entries = 100000
ttl = "24h"
# Block cache (storage layer)
[cache.blocks]
enabled = true
size = "32GB"
block_size = "16KB"
eviction_policy = "LRU"
# GPU result cache
[cache.gpu_results]
enabled = true
size = "2GB"
ttl = "10m"

6.3 Memory Monitoring

-- Overall memory usage
SELECT * FROM system.memory_stats;
-- Expected output:
-- category | used | limit | percent
-- total | 96GB | 128GB | 75%
-- query_cache | 6GB | 8GB | 75%
-- block_cache | 28GB | 32GB | 87%
-- gpu_pool | 3GB | 4GB | 75%
-- connections | 2GB | 4GB | 50%
-- Per-query memory usage
SELECT
query_id,
memory_used,
execution_time_ms,
status
FROM system.active_queries
ORDER BY memory_used DESC
LIMIT 10;
-- Memory pressure events
SELECT * FROM system.memory_pressure_events
WHERE timestamp > NOW() - INTERVAL '1 hour';

6.4 Memory Optimization Tips

Tip 1: Use streaming for large results

-- Enable streaming for specific query
SELECT /*+ STREAMING */ * FROM large_table;
-- Configure streaming threshold
SET streaming_threshold = '100MB';

Tip 2: Reduce result set size

-- Use pagination
SELECT * FROM orders LIMIT 1000 OFFSET 0;
-- Use column pruning
SELECT id, name FROM large_table; -- Not SELECT *

Tip 3: Increase block cache for read-heavy workloads

[cache.blocks]
size = "64GB" # Increase from default 32GB

Tip 4: Monitor and kill memory-intensive queries

-- Kill query exceeding memory limit
SELECT kill_query('query-id-123');
-- Set memory limit for session
SET query_memory_limit = '2GB';

7. Storage Layer Tuning

7.1 LSM Tree Configuration

[storage]
# LSM tree settings
compaction_style = "leveled" # or "universal", "tiered"
# L0 (MemTable flush target)
l0_file_num_compaction_trigger = 4 # Trigger compaction at 4 L0 files
l0_slowdown_writes_trigger = 8 # Slow writes at 8 L0 files
l0_stop_writes_trigger = 12 # Stop writes at 12 L0 files
# Compaction
max_background_compactions = 4
max_background_flushes = 2
compaction_threads = 4
# SSTable settings
target_file_size_base = "64MB"
max_bytes_for_level_base = "256MB"
target_file_size_multiplier = 10
# Bloom filters
bloom_filter_bits_per_key = 10 # 1% false positive rate
bloom_filter_enabled = true
# Block cache
block_cache_size = "32GB"
block_size = "16KB"
cache_index_and_filter_blocks = true

7.2 Write-Ahead Log (WAL) Tuning

[storage.wal]
# WAL settings
wal_dir = "/mnt/nvme/heliosdb/wal"
wal_size_limit = "4GB"
max_wal_files = 10
# Synchronization
wal_sync_mode = "fsync" # "fsync", "fdatasync", "none"
wal_sync_after_write = true
wal_group_commit_delay = "1ms" # Batch commits
# Recovery
wal_recovery_mode = "point_in_time"
wal_archive_enabled = true
wal_archive_dir = "/backup/wal"
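
As a worked illustration of the group-commit setting above (our arithmetic, not a measured figure): with wal_group_commit_delay = "1ms" and a workload committing 10,000 times per second, roughly 10 commits land in each window and share a single fsync, cutting fsync calls by about 10x at the cost of up to 1ms of added commit latency.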

7.3 Compaction Strategies

Leveled Compaction (Default, recommended for most workloads)

  • Best for read-heavy workloads
  • Low read amplification (1-2 reads per key)
  • Higher write amplification (10-30x)
  • Predictable performance

Universal Compaction (Write-optimized)

  • Best for write-heavy workloads
  • Low write amplification (2-5x)
  • Higher read amplification (5-10 reads per key)
  • Bursty compaction

Tiered Compaction (Time-series optimized)

  • Best for time-series data
  • Efficient for TTL-based data
  • Good for append-only workloads

[storage]
# Example: Write-optimized configuration
compaction_style = "universal"
max_background_compactions = 8
compaction_threads = 8

7.4 Storage Monitoring

-- Storage statistics
SELECT * FROM system.storage_stats;
-- Expected output:
-- metric | value
-- total_size | 2.5TB
-- data_size | 2.0TB
-- index_size | 300GB
-- wal_size | 4GB
-- compaction_pending | 20GB
-- read_amplification | 1.8
-- write_amplification | 15.2
-- Compaction statistics
SELECT * FROM system.compaction_stats;
-- LSM tree structure
SELECT level, file_count, total_size
FROM system.lsm_levels;
-- Expected output:
-- level | file_count | total_size
-- 0 | 3 | 192MB
-- 1 | 12 | 768MB
-- 2 | 45 | 2.8GB
-- 3 | 120 | 12GB

7.5 Storage Optimization Tips

Tip 1: Tune compaction for workload

  • Read-heavy: Leveled compaction, aggressive compaction
  • Write-heavy: Universal compaction, delayed compaction
  • Time-series: Tiered compaction, TTL-based cleanup
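
For the read-heavy case, a sketch mirroring the write-optimized example above and using only the keys introduced in Section 7.1 (values are illustrative):

[storage]
# Example: Read-optimized configuration
compaction_style = "leveled"
l0_file_num_compaction_trigger = 2 # Compact L0 earlier to keep read amplification low
max_background_compactions = 6
bloom_filter_bits_per_key = 15 # Fewer false-positive point reads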

Tip 2: Use appropriate bloom filter settings

# More memory, fewer false positives
bloom_filter_bits_per_key = 15 # 0.1% FPR (vs 1% default)
# Less memory, more false positives
bloom_filter_bits_per_key = 5 # 10% FPR
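
These false-positive figures follow from the standard Bloom filter approximation with an optimal hash count, where b is bits per key: FPR ≈ 0.6185^b, giving roughly 9% at b = 5, 0.8% at b = 10, and 0.07% at b = 15 — matching the comments above to within rounding.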

Tip 3: Optimize for SSD

[storage]
use_direct_io = true
use_mmap_reads = false
allow_os_buffer = false

Tip 4: Separate WAL and data on different disks

[storage]
data_dir = "/mnt/nvme0/heliosdb" # Fast SSD for data
wal_dir = "/mnt/nvme1/heliosdb/wal" # Separate SSD for WAL

8. Protocol-Specific Optimization

8.1 PostgreSQL Wire Protocol

[protocols.postgres]
# Connection settings
max_connections = 1000
pool_size = 100
connection_timeout = "30s"
# Protocol optimization
enable_binary_format = true # Binary encoding (faster)
enable_prepared_statements = true
statement_cache_size = 10000
# Result format
default_format = "binary" # "text" or "binary"
enable_compression = true
compression_threshold = "1MB"
# Performance
enable_pipeline_mode = true # Batch multiple queries
max_batch_size = 100

Best practices:

  1. Use prepared statements for repeated queries
  2. Enable binary format for large result sets
  3. Use connection pooling (PgBouncer, Pgpool)
  4. Batch inserts using COPY or multi-value INSERT

-- ❌ Inefficient: Multiple individual inserts
INSERT INTO orders VALUES (1, 'Alice', 100);
INSERT INTO orders VALUES (2, 'Bob', 200);
-- ... 10,000 more
-- ✅ Efficient: Batch insert
INSERT INTO orders VALUES
(1, 'Alice', 100),
(2, 'Bob', 200),
-- ... up to 1000 rows per batch
(1000, 'Zoe', 150);
-- Optimal: COPY for bulk loading
COPY orders FROM '/data/orders.csv' WITH (FORMAT CSV, HEADER);

8.2 MongoDB Wire Protocol

[protocols.mongodb]
# Connection settings
max_connections = 2000
pool_size = 200
connection_timeout = "30s"
# BSON optimization
enable_bson_cache = true
bson_cache_size = "2GB"
enable_zero_copy_bson = true # Avoid allocations
# Aggregation pipeline
enable_pipeline_optimization = true
max_pipeline_stages = 100
# Performance
enable_cursor_batching = true
default_batch_size = 1000
max_batch_size = 10000

Best practices:

  1. Use projection to limit returned fields
  2. Use indexes for filter conditions
  3. Optimize aggregation pipelines (push filters early)
  4. Use bulk operations for inserts/updates

// ❌ Inefficient: No projection
db.orders.find({status: "shipped"})
// ✅ Efficient: With projection
db.orders.find(
{status: "shipped"},
{_id: 1, customer: 1, total: 1}
)
// ❌ Inefficient: Aggregation without early filtering
db.orders.aggregate([
{$group: {_id: "$customer", total: {$sum: "$amount"}}},
{$match: {total: {$gt: 1000}}}
])
// ✅ Efficient: Filter early
db.orders.aggregate([
{$match: {status: "completed"}}, // Filter first
{$group: {_id: "$customer", total: {$sum: "$amount"}}},
{$match: {total: {$gt: 1000}}}
])

8.3 Oracle TNS Protocol

[protocols.oracle]
# Connection settings
max_connections = 1000
pool_size = 100
connection_timeout = "30s"
# Protocol optimization
enable_array_fetch = true
array_fetch_size = 1000
enable_prefetch = true
prefetch_rows = 100
# Oracle 23ai features
enable_property_graphs = true
enable_vector_search = true
enable_json_table = true
# Performance
enable_statement_cache = true
statement_cache_size = 5000

Best practices:

  1. Use bind variables (not string concatenation)
  2. Enable array fetch for bulk operations
  3. Use Oracle 23ai vector search for similarity queries
  4. Leverage property graphs for graph analytics

-- ❌ Inefficient: No bind variables (SQL injection risk + no cache)
SELECT * FROM orders WHERE customer_id = 'C123';
-- ✅ Efficient: With bind variables
SELECT * FROM orders WHERE customer_id = :customer_id;
-- Oracle 23ai: Vector similarity search
SELECT * FROM products
WHERE VECTOR_DISTANCE(embedding, :query_vector, COSINE) < 0.2
ORDER BY VECTOR_DISTANCE(embedding, :query_vector, COSINE)
FETCH FIRST 10 ROWS ONLY;

9. Caching Strategies

9.1 Multi-Level Cache Architecture

┌─────────────────────────────────────────┐
│ Application Layer                       │
└─────────────────────────────────────────┘
┌─────────────────────────────────────────┐
│ L1: Query Result Cache (8GB)            │ <-- Fastest (LRU, 1h TTL)
│ - Exact match cache                     │
│ - Full query result caching             │
└─────────────────────────────────────────┘
┌─────────────────────────────────────────┐
│ L2: Parser Cache (1GB)                  │ <-- Fast (24h TTL)
│ - AST caching                           │
│ - Parameterized query matching          │
└─────────────────────────────────────────┘
┌─────────────────────────────────────────┐
│ L3: Block Cache (32GB)                  │ <-- Storage layer
│ - SSTable block caching                 │
│ - Index block caching                   │
└─────────────────────────────────────────┘
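
For a repeated query, the lookup order implied by this hierarchy (our reading of the diagram): an exact text match in L1 returns the cached result without planning or execution; on an L1 miss, an L2 hit still skips parsing by reusing the cached AST; only on a full miss does execution proceed and touch the L3 block cache and, beyond it, disk.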

9.2 Query Result Caching

[cache.query_results]
enabled = true
size = "8GB"
ttl = "1h"
eviction_policy = "LRU"
# Cache key strategy
cache_key_mode = "query_text" # or "query_hash", "semantic_hash"
# Cache invalidation
invalidate_on_write = true
invalidate_on_schema_change = true
# Statistics
enable_hit_rate_tracking = true

Using query cache:

-- Automatic caching (default)
SELECT * FROM orders WHERE status = 'shipped';
-- Disable cache for specific query
SELECT /*+ NO_CACHE */ * FROM real_time_orders;
-- Force cache refresh
SELECT /*+ REFRESH_CACHE */ * FROM orders WHERE status = 'shipped';
-- Check cache statistics
SELECT * FROM system.query_cache_stats;
-- Expected output:
-- metric | value
-- size_bytes | 6.2GB
-- entries | 125,432
-- hit_rate | 78%
-- evictions | 10,234

9.3 Parser Cache

[cache.parser]
enabled = true
size = "1GB"
max_entries = 100000
ttl = "24h"
# Parameterized query matching
enable_parameterization = true
parameter_threshold = 3 # Cache if query has 3+ literals
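
With parameter_threshold = 3, queries such as the following pair (three literals each) would be parameterized to a single cached entry. A sketch of the matching behavior as we read it:

-- Both normalize to: SELECT * FROM orders WHERE customer_id = ? AND status = ? AND amount > ?
SELECT * FROM orders WHERE customer_id = 17 AND status = 'shipped' AND amount > 100;
SELECT * FROM orders WHERE customer_id = 42 AND status = 'pending' AND amount > 250;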

Benefits:

  • 50-70% faster parsing for repeated query patterns
  • Reduced GC pressure (fewer AST allocations)
  • Automatic for all queries

9.4 Cache Monitoring

-- Overall cache statistics
SELECT * FROM system.cache_stats;
-- Per-cache hit rates
SELECT
cache_name,
hit_rate,
size_bytes,
entries
FROM system.cache_details;
-- Expected output:
-- cache_name | hit_rate | size_bytes | entries
-- query_results | 78% | 6.2GB | 125,432
-- parser | 85% | 850MB | 95,123
-- blocks | 92% | 29.5GB | 1,894,234
-- Cache evictions (high evictions = cache too small)
SELECT
cache_name,
evictions_per_minute
FROM system.cache_eviction_rate
WHERE timestamp > NOW() - INTERVAL '1 hour';

10. Monitoring & Diagnostics

10.1 Key Performance Metrics

-- Overall system health
SELECT * FROM system.health_check;
-- Query performance metrics
SELECT
AVG(execution_time_ms) as avg_latency,
PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY execution_time_ms) as p50,
PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY execution_time_ms) as p95,
PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY execution_time_ms) as p99
FROM system.query_log
WHERE timestamp > NOW() - INTERVAL '1 hour';
-- Throughput metrics
SELECT
COUNT(*) / 3600.0 as qps
FROM system.query_log
WHERE timestamp > NOW() - INTERVAL '1 hour';
-- Resource utilization
SELECT * FROM system.resource_stats;
-- Expected output:
-- resource | current | average | peak
-- cpu_percent | 45% | 50% | 85%
-- memory_used | 96GB | 90GB | 110GB
-- gpu_util | 78% | 65% | 95%
-- disk_iops | 12,500 | 15,000 | 45,000

10.2 Slow Query Log

[monitoring]
# Slow query logging
slow_query_threshold = "100ms"
slow_query_log_file = "/var/log/heliosdb/slow-queries.log"
enable_slow_query_log = true
# Query sampling
query_sample_rate = 0.1 # Log 10% of queries

-- View slow queries
SELECT
query_text,
execution_time_ms,
rows_examined,
timestamp
FROM system.slow_query_log
ORDER BY execution_time_ms DESC
LIMIT 10;
-- Identify frequently slow queries
SELECT
query_pattern,
COUNT(*) as count,
AVG(execution_time_ms) as avg_time
FROM system.slow_query_log
WHERE timestamp > NOW() - INTERVAL '24 hours'
GROUP BY query_pattern
ORDER BY count DESC
LIMIT 10;

10.3 Monitoring Tools Integration

Prometheus:

[monitoring.prometheus]
enabled = true
port = 9090
metrics_path = "/metrics"
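
To confirm the exporter is serving data, scrape the endpoint directly (individual metric names are not documented here, so this only checks that the endpoint responds):

Terminal window
curl -s http://localhost:9090/metrics | head -20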

Grafana Dashboard:

  • Import HeliosDB dashboard template
  • Visualize QPS, latency, resource usage
  • Alert on performance degradation

OpenTelemetry:

[monitoring.opentelemetry]
enabled = true
endpoint = "http://otel-collector:4317"
sampling_rate = 0.1

11. Capacity Planning

11.1 Resource Estimation

OLTP Workload:

QPS Target: 50,000 queries/sec
Required:
- CPU: 32-64 cores
- Memory: 64-128 GB
- IOPS: 50K-100K
- Network: 10 Gbps
- Nodes: 4-8 (with replication)
Cost: ~$20K-$40K/month (cloud)

Analytical Workload:

Dataset: 10 TB
Query throughput: 100 queries/hour
Required:
- CPU: 64-128 cores
- Memory: 256-512 GB
- GPU: 2-4x (48-96 GB VRAM)
- Storage: 20 TB NVMe
- Network: 25-100 Gbps
- Nodes: 8-16
Cost: ~$60K-$120K/month (cloud)

11.2 Scaling Decision Matrix

Scenario            | Vertical Scale          | Horizontal Scale  | Hybrid
--------------------|-------------------------|-------------------|--------------------
CPU bottleneck      | Up to 64 cores          | Add nodes         | Recommended
Memory pressure     | Up to 512GB             | ❌ Not effective  | Add memory + nodes
I/O bottleneck      | Faster SSD              | Shard data        | Recommended
Network saturation  | ❌ Limited gains        | Add nodes         | Recommended
Query concurrency   | ❌ Diminishing returns  | Linear scaling    | Recommended

11.3 Growth Planning

Capacity headroom:

  • Plan for 50-100% headroom on peak load
  • Monitor weekly growth trends
  • Scale before reaching 70% capacity

Example growth projection:

Current: 10 TB data, 20K QPS
Growth: 20% per quarter
Quarter 1: 12 TB, 24K QPS → No scaling needed
Quarter 2: 14.4 TB, 28.8K QPS → Add 2 nodes
Quarter 3: 17.3 TB, 34.6K QPS → Add 2 nodes
Quarter 4: 20.7 TB, 41.5K QPS → Add 2 nodes + storage

12. Troubleshooting Performance Issues

12.1 High Latency

Symptoms:

  • Queries taking longer than expected
  • Increased P95/P99 latency
  • User complaints about slowness

Diagnosis:

-- Check for slow queries
SELECT * FROM system.slow_query_log
WHERE execution_time_ms > 1000
ORDER BY timestamp DESC LIMIT 20;
-- Check for resource contention
SELECT * FROM system.resource_stats;
-- Check for lock contention
SELECT * FROM system.lock_wait_events
WHERE wait_time_ms > 100;
-- Check cache hit rates
SELECT * FROM system.cache_stats;

Common causes & solutions:

Cause                | Solution
---------------------|---------------------------------------
Missing indexes      | Run EXPLAIN, add appropriate indexes
Cache cold start     | Warm cache with common queries
Disk I/O saturation  | Add faster storage, increase cache
CPU bottleneck       | Scale up cores or scale out nodes
Network congestion   | Upgrade network, reduce result size
Large result sets    | Enable streaming, use pagination

12.2 Low Throughput

Symptoms:

  • QPS below expected
  • High connection wait times
  • Request queue buildup

Diagnosis:

-- Check connection pool status
SELECT * FROM system.connection_pools;
-- Check for bottlenecks
SELECT * FROM system.bottleneck_analysis;
-- Check query distribution
SELECT
query_type,
COUNT(*) as count,
AVG(execution_time_ms) as avg_time
FROM system.query_log
WHERE timestamp > NOW() - INTERVAL '1 hour'
GROUP BY query_type;

Common causes & solutions:

Cause                       | Solution
----------------------------|------------------------------
Connection pool exhausted   | Increase pool size
Single-threaded bottleneck  | Enable parallel execution
Lock contention             | Optimize transaction scope
Compaction storms           | Tune compaction thresholds
GC pauses                   | Tune memory allocation

12.3 High Memory Usage

Symptoms:

  • OOM errors
  • Memory pressure events
  • Killed queries

Diagnosis:

-- Check memory usage breakdown
SELECT * FROM system.memory_stats;
-- Identify memory-intensive queries
SELECT
query_id,
query_text,
memory_used_mb
FROM system.active_queries
ORDER BY memory_used_mb DESC LIMIT 10;
-- Check for memory leaks
SELECT * FROM system.memory_leak_detection;

Solutions:

  1. Increase total_memory_limit
  2. Reduce query_memory_limit
  3. Enable streaming for large results
  4. Kill memory-intensive queries
  5. Reduce cache sizes

12.4 GPU Not Utilized

Symptoms:

  • GPU utilization low (<20%)
  • Queries not using GPU despite large datasets
  • No performance improvement with GPU

Diagnosis:

-- Check GPU status
SELECT * FROM system.gpu_stats;
-- Check GPU routing decisions
EXPLAIN (ANALYZE, GPU_COST) SELECT ...;
-- Check cost model predictions
SELECT * FROM system.gpu_routing_log
WHERE timestamp > NOW() - INTERVAL '1 hour';

Common causes & solutions:

Cause                     | Solution
--------------------------|------------------------------------------------
Queries too small         | Reduce min_rows_for_gpu
GPU memory full           | Increase GPU memory, reduce concurrent queries
Cost model miscalibrated  | Retrain cost model, adjust thresholds
CUDA errors               | Check CUDA installation, driver version
Sequential operations     | Use query hints to force GPU

Summary

This guide covers the essential aspects of HeliosDB performance tuning:

  1. Quick Start: Essential configuration for immediate performance gains
  2. GPU Acceleration: Maximize analytical query performance with GPU
  3. Query Optimization: Best practices for efficient query writing
  4. Connection Pooling: Optimal connection management
  5. Memory Management: Efficient memory usage and caching
  6. Storage Tuning: LSM tree and compaction optimization
  7. Protocol Optimization: Protocol-specific best practices
  8. Caching: Multi-level cache strategies
  9. Monitoring: Track performance metrics and identify issues
  10. Capacity Planning: Resource estimation and scaling strategies
  11. Troubleshooting: Common performance issues and solutions

Key Takeaways:

  • Start with quick wins (GPU, caching, indexes)
  • Monitor continuously and adjust based on workload
  • Profile before optimizing (measure, don’t guess)
  • Scale horizontally for sustained growth
  • Use appropriate tools for different workload types

Next Steps:

  1. Apply quick start configuration
  2. Run baseline benchmarks
  3. Monitor key metrics (latency, throughput, resource usage)
  4. Iterate on optimization based on observed bottlenecks
  5. Plan capacity growth proactively

For additional support, contact the feedback address below.

Document Version: 1.0
Last Updated: November 24, 2025
Feedback: performance@heliosdb.dev