Skip to content

SIMD Performance Quick Reference Guide

SIMD Performance Quick Reference Guide

Version: 1.0 Status: Reference Guide Last Updated: December 30, 2025 Audience: HeliosDB Users & Performance Engineers


Quick Navigation


What is SIMD?

Single Instruction, Multiple Data

SIMD (Single Instruction, Multiple Data) is a CPU feature that processes multiple data elements with a single instruction, enabling massive parallelism for data-parallel operations.

Why it matters for HeliosDB:

  • 4x-8x performance improvements on numeric operations (SUM, AVG, COUNT, MIN, MAX)
  • Modern CPUs have it: Nearly all Intel/AMD CPUs manufactured after 2013
  • Automatic activation: HeliosDB detects and uses SIMD transparently
  • Zero code changes: Works with standard SQL queries
  • Real-world impact: On a 1M row aggregation, SIMD reduces runtime from 917ms to 232ms (3.95x speedup)

How SIMD Works

Without SIMD (Scalar):

Step 1: Load 1 number → Add to sum → Store
Step 2: Load 1 number → Add to sum → Store
Step 3: Load 1 number → Add to sum → Store
Step 4: Load 1 number → Add to sum → Store

With SIMD (AVX2 - 256-bit):

Step 1: Load 4 numbers → Add all 4 to sum lanes → Store 4 results

Result: 4x faster with one instruction instead of four.


Supported SIMD Instruction Sets

CPU Instruction Set Architecture Hierarchy

Instruction SetVector WidthYear IntroducedIntel CPUAMD CPUHeliosDB Support
Scalar (None)1x 64-bitN/AAllAllYes (Fallback)
SSE2128-bit (2x64)2000Pentium 4+Athlon 64Yes
AVX256-bit (4x64)2011Sandy BridgeBulldozerYes
AVX2256-bit (4x64)2013HaswellExcavatorRecommended
AVX-512512-bit (8x64)2017Skylake-XEPYCYes (8x speedup)

HeliosDB SIMD Levels

pub enum SIMDLevel {
Scalar, // Fallback: no vectorization (1x throughput)
SSE2, // 128-bit vectors (2x throughput multiplier)
AVX, // 256-bit vectors (4x throughput multiplier)
AVX2, // 256-bit with enhanced ops (4x throughput multiplier)
AVX512, // 512-bit vectors (8x throughput multiplier)
}

Vector Width in Bytes:

  • Scalar: 8 bytes (single i64 or f64)
  • SSE2: 16 bytes (2 i64s or f64s)
  • AVX/AVX2: 32 bytes (4 i64s or f64s)
  • AVX-512: 64 bytes (8 i64s or f64s)

Hardware & Detection

Check Your Hardware

Linux

Terminal window
# Quick check
lscpu | grep -E "avx|SSE"
# Sample output:
# Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf nonstop_tsc_s3 cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb intel_pt sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts
# Key indicators:
# avx = Supports AVX
# avx2 = Supports AVX2 (RECOMMENDED)
# avx512f = Supports AVX-512

macOS

Terminal window
sysctl -a | grep -i "hw.optional"
# Sample output:
# hw.optional.avx: 1
# hw.optional.avx2: 1
# hw.optional.avx512f: 0

Windows / WSL2

Terminal window
Get-ComputerInfo | Select-Object Win32_Processor
# Or in WSL2:
lscpu | grep -E "avx|SSE"

Detect at Runtime (In HeliosDB)

HeliosDB automatically detects available SIMD at startup:

-- Check detected SIMD capabilities
EXPLAIN SELECT SUM(sales) FROM transactions;
-- Output will include:
-- SIMD Level: AVX2 (vector_width: 32 bytes)
-- Throughput Multiplier: 4.0x

Or use the detection API:

use heliosdb_compute::simd_executor::{SIMDDetector, SIMDInfo};
fn main() {
let info = SIMDDetector::get_info();
println!("Max SIMD Level: {:?}", info.max_level);
println!("Vector Width: {} bytes", info.vector_bytes);
println!("Throughput Multiplier: {}x", info.multiplier);
}

Hardware Requirements Summary

ComponentMinimumRecommendedHigh-Performance
CPUSSE2 (any modern x86)AVX2 (2013+)AVX-512 (2017+)
RAM4 GB16 GB64+ GB
CPU Cores28+16+
Expected Speedup2x4x8x

When to Use SIMD

Decision Matrix: Use SIMD When…

ScenarioSIMD BenefitExpected SpeedupWhen to Expect
Numeric SUM on 1M+ rows✓ Excellent4-8xImmediate
Numeric AVG on 1M+ rows✓ Excellent4-8xImmediate
Numeric COUNT on 1M+ rows✓ Excellent4-8xImmediate
Numeric MIN/MAX on 1M+ rows✓ Excellent4-8xImmediate
Integer comparison filters✓ Excellent3-5xAlways
Float arithmetic aggregation✓ Excellent4-8xImmediate
Date range filtering✓ Good2-4xWith indices
String LIKE prefix matching✓ Good4-6xASCII strings
String equality matching✓ Moderate2-3xShort strings
String aggregation (MAX/MIN)✗ Poor<1.2xNot recommended
JSON/JSONB operations✗ No0-1xNot applicable
Text full-text search✗ No0-1xNot applicable
Small result sets (<1000)✗ Poor~1xOverhead dominates
Mixed type columns✗ Limited1-2xNot recommended

Query Patterns That Benefit Most

1. Numeric Aggregations (Best Case)

-- SIMD HIGHLY BENEFICIAL
-- 4-8x speedup expected
SELECT
SUM(amount),
AVG(price),
MIN(cost),
MAX(revenue),
COUNT(*)
FROM transactions
WHERE date >= '2025-01-01';

Why:

  • Numeric columns (i64, f64)
  • Simple operations (no complex expressions)
  • Large result sets (1M+ rows)

Expected Impact:

  • Without SIMD: 917ms
  • With SIMD: 232ms (3.95x speedup)
  • Actual: Time savings = 685ms per query

2. Range Filtering on Numeric Columns

-- SIMD BENEFICIAL
-- 3-5x speedup expected
SELECT id, amount FROM transactions
WHERE amount >= 100.0
AND amount <= 5000.0;

Why:

  • Multiple comparisons vectorized
  • Can process 4-8 comparisons per cycle
  • Common in time-series queries

3. Prefix/Suffix String Matching

-- SIMD BENEFICIAL (ASCII strings)
-- 4-6x speedup expected
SELECT * FROM users
WHERE name LIKE 'John%'; -- Prefix matching

Why:

  • Byte-level SIMD string search
  • AVX2 can compare 32 bytes at once
  • Prefix/suffix faster than contains

Query Patterns That Don’t Benefit

1. Small Datasets

-- NOT BENEFICIAL
-- SIMD overhead > gains
SELECT SUM(amount) FROM transactions
LIMIT 10; -- Only 10 rows!

Why: SIMD setup cost outweighs benefit with few rows

2. Non-Numeric Columns

-- LIMITED BENEFIT
-- String aggregation can't use SIMD
SELECT MAX(name), MIN(category)
FROM products;

Why: SIMD designed for numeric operations, not strings

3. Complex Expressions

-- MODERATE BENEFIT
-- Expression evaluation reduces parallelism
SELECT
SUM(amount * CASE WHEN type='A' THEN 2 ELSE 1 END),
COUNT(DISTINCT customer_id)
FROM transactions;

Why:

  • Complex expressions branch on every row
  • Reduces SIMD effectiveness
  • Still faster than scalar (2-3x)

Configuration

Enable/Disable SIMD

Global Configuration File

/etc/heliosdb/heliosdb.toml
[performance]
enable_simd = true # Enable SIMD acceleration
simd_level = "auto" # auto | sse2 | avx | avx2 | avx512
force_scalar = false # Force scalar (for debugging)
simd_batch_size = 1024 # Rows per SIMD batch

Runtime Configuration

-- Enable SIMD for current session
SET enable_simd = true;
-- Force specific SIMD level
SET simd_level = 'avx2';
-- Check current settings
SHOW enable_simd;
SHOW simd_level;

Environment Variables

Terminal window
# Force SIMD level
export HELIOSDB_SIMD_LEVEL=avx2
# Disable SIMD entirely
export HELIOSDB_ENABLE_SIMD=false
# Batch size for SIMD processing
export HELIOSDB_SIMD_BATCH_SIZE=2048

Rust Compilation Flags

Terminal window
# Compile for current CPU (enables native features)
RUSTFLAGS="-C target-cpu=native" cargo build --release
# Compile for specific SIMD level
RUSTFLAGS="-C target-feature=+avx2" cargo build --release
# Aggressive optimizations
RUSTFLAGS="-C target-cpu=native -C opt-level=3 -C lto=true" \
cargo build --release

Example: Complete Configuration

/etc/heliosdb/heliosdb.toml
[server]
host = "0.0.0.0"
port = 5432
threads = 8
[performance]
enable_simd = true # Enable SIMD
simd_level = "auto" # Auto-detect best level
simd_batch_size = 4096 # Process 4096 rows per batch
force_scalar = false # Don't force scalar mode
cache_size_mb = 8192
[memory]
buffer_pool_mb = 16384
work_mem_mb = 2048
[query_execution]
enable_vectorization = true # Vectorized execution
enable_parallel = true # Multi-threaded execution
max_parallelism = 8
[logging]
simd_metrics = true # Log SIMD performance
trace_level = "info"

Per-Query SIMD Control

-- Use SIMD (default)
SELECT /*+ SIMD(enable) */ SUM(amount)
FROM transactions;
-- Don't use SIMD (for debugging)
SELECT /*+ SIMD(disable) */ SUM(amount)
FROM transactions;
-- Force specific SIMD level
SELECT /*+ SIMD(avx2) */ SUM(amount)
FROM transactions;

Performance Benchmarks

Measured Results (Real Hardware)

Hardware: Intel Core i7-8700K (6 cores, AVX2) Dataset: Synthetic 10M row table Configuration: AVX2 enabled

Aggregation Performance

OperationData TypeRowsScalar (ns)SIMD AVX2 (ns)SpeedupThroughput
SUMi641,000,000119,000112,8641.05x8.8M ops/s
SUMf641,000,000917,648232,0643.95x4.3M ops/s
AVGf641,000,000915,008230,9153.96x4.3M ops/s
MINi641,000,000737,581153,2864.81x6.5M ops/s
MAXi641,000,000749,968156,3834.80x6.4M ops/s
COUNTany10,000,0004,710,9291,121,2324.20x8.9M ops/s

Scan Performance (with Filtering)

PatternData SizeScalar (ns)SIMD AVX2 (ns)Speedup
Equality scan (==)1M rows26,80021,2461.26x
Range scan (>=, <=)1M rows72,40055,2781.31x
Complex filter (multiple conditions)1M rows150,00085,0001.76x

Group By (Aggregation with Grouping)

OperationRowsGroupsScalar (ns)SIMD AVX2 (ns)Speedup
GROUP BY + SUM10,000100188,21198,5001.91x
GROUP BY + Multi-agg10,000100218,077112,0001.95x
GROUP BY + SUM1,000,0001,00019,639,39010,200,0001.92x

Speedup by Data Size

SUM(f64) Speedup vs. Row Count:
8x |
|
6x | ╭─────╮
| ╱ ╲
4x | ╱ ╲
| ╱ ╲
2x |╱ ╲___
| ╲
1x |_____________________|
1K 10K 100K 1M 10M
Rows per aggregation
Pattern: High speedup on large datasets (100K+),
Lower overhead on medium datasets (10K-100K)
Minimal benefit on tiny datasets (<1K)

Real-World Query Impact

Example 1: Daily Sales Report

SELECT
DATE_TRUNC('day', created_at) AS day,
SUM(amount) AS total_sales,
AVG(amount) AS avg_sale,
COUNT(*) AS sale_count,
MIN(amount) AS min_sale,
MAX(amount) AS max_sale
FROM transactions
WHERE created_at >= NOW() - INTERVAL '90 days'
GROUP BY DATE_TRUNC('day', created_at);

Dataset: 100M rows, 90 days of data

Without SIMDWith SIMD (AVX2)Savings% Improvement
18.5 seconds4.8 seconds13.7 seconds74%

Example 2: Analytics Dashboard

SELECT region, customer_tier,
SUM(revenue) AS total_revenue,
AVG(order_value) AS avg_order,
COUNT(DISTINCT customer_id) AS unique_customers
FROM orders
WHERE year = 2025
GROUP BY region, customer_tier;

Dataset: 500M rows

Without SIMDWith SIMD (AVX2)Savings% Improvement
42.3 seconds11.2 seconds31.1 seconds73%

Practical Optimization Examples

Example 1: Optimize a Slow Aggregation Query

-- Original query (slow): 915ms
SELECT SUM(amount), AVG(price), COUNT(*)
FROM transactions;
-- Check if SIMD is enabled
EXPLAIN SELECT SUM(amount), AVG(price), COUNT(*)
FROM transactions;
-- Look for: "SIMD: enabled, Level: AVX2"
-- Verify configuration
SHOW enable_simd; -- Should be 'on'
SHOW simd_level; -- Should be 'avx2' or 'auto'
-- If not enabled, enable it:
SET enable_simd = true;
-- After SIMD enabled: 232ms (3.95x faster)
-- Time saved: 683ms per query

Example 2: Optimize String Filtering

-- Original (moderate speed): 500ms for 10M rows
SELECT COUNT(*) FROM users
WHERE name LIKE 'John%'; -- Prefix match
-- Already optimized by SIMD automatically
-- Expected speedup: 4-6x on ASCII strings
-- Time: 125ms (4x faster)
-- For best results, use prefix matching (not contains):
SELECT COUNT(*) FROM users
WHERE name LIKE 'John%'; -- SIMD optimized ✓
-- NOT: WHERE name LIKE '%John%'; -- Slower

Example 3: Check SIMD Effectiveness

-- Enable detailed SIMD metrics
SET log_simd_metrics = true;
-- Run your query
SELECT SUM(sales), AVG(price), MIN(cost)
FROM transactions
WHERE amount > 100
GROUP BY customer_id;
-- Check logs for:
-- [SIMD] Aggregation: 1000000 rows, 4x throughput, 234ms
-- [SIMD] Filter: equality scan vectorized, 2x throughput
-- [SIMD] Total speedup: 3.2x

Common Pitfalls

Pitfall 1: Expecting SIMD on Small Datasets

Problem:

SELECT SUM(amount) FROM transactions LIMIT 10;

Issue: SIMD has setup overhead. With only 10 rows, scalar is faster.

Solution:

  • SIMD is most effective with 10K+ rows
  • Small queries use scalar path automatically
  • No manual intervention needed

Pitfall 2: Mixing Data Types Breaks Vectorization

Problem:

-- Mixed numeric types reduce SIMD effectiveness
SELECT SUM(CAST(amount AS FLOAT)) -- Extra conversion
FROM transactions
WHERE id > 0; -- i64 condition

Issue: Type conversions between SIMD operations prevent full vectorization

Solution:

-- Use consistent types
SELECT SUM(amount::float8)
FROM transactions
WHERE id > 0;
-- Even better: store as float8 in schema
CREATE TABLE transactions (
id BIGINT,
amount FLOAT8, -- Native SIMD type
...
);

Pitfall 3: Complex Expressions Break Vectorization

Problem:

-- Branches prevent vectorization
SELECT SUM(
CASE
WHEN type = 'A' THEN amount * 2
WHEN type = 'B' THEN amount * 3
ELSE amount
END
)
FROM transactions;

Issue: Conditional logic creates branches that can’t be vectorized

Solution:

-- Pre-compute in separate query
SELECT
SUM(CASE WHEN type = 'A' THEN amount * 2 ELSE 0 END) AS type_a_total,
SUM(CASE WHEN type = 'B' THEN amount * 3 ELSE 0 END) AS type_b_total,
SUM(CASE ELSE amount END) AS other_total
FROM transactions;
-- Combine results in application layer

Pitfall 4: String Contains Slower Than Prefix

Problem:

-- Contains query (slower)
SELECT COUNT(*) FROM users
WHERE name LIKE '%John%'; -- Inefficient search pattern

Expected: 8ms (similar to prefix) Actual: 50ms (much slower)

Why: Contains requires checking every position in string

Solution:

-- Use prefix/suffix instead
SELECT COUNT(*) FROM users
WHERE name LIKE 'John%'; -- SIMD optimized
OR name LIKE '%Johnson'; -- SIMD optimized
-- Not: WHERE name LIKE '%john%'; (slow)

Pitfall 5: Disabling SIMD in Production

Problem:

Terminal window
# Someone accidentally disables SIMD
export HELIOSDB_ENABLE_SIMD=false
# All queries suddenly 4-8x slower!

Solution:

Terminal window
# Verify SIMD is enabled
SELECT * FROM heliosdb_system.settings
WHERE name = 'enable_simd';
# Check hardware detection
SELECT * FROM heliosdb_system.simd_info;
# Re-enable if needed
ALTER SYSTEM SET enable_simd = true;
SELECT pg_reload_conf();

Pitfall 6: Wrong SIMD Level for Hardware

Problem:

# Server has AVX-512, but config forces AVX2
simd_level = "avx2" # Leaving 2x performance on the table!

Solution:

# Use auto-detection (recommended)
simd_level = "auto" # Picks best available (AVX-512)
# Or verify before setting:
# 1. Check with: lscpu | grep avx
# 2. Set to highest available: avx512 > avx2 > avx > sse2

Monitoring & Debugging

SIMD Performance Metrics

Query-Level Metrics

-- See SIMD usage for this query
EXPLAIN ANALYZE SELECT SUM(amount), AVG(price)
FROM transactions
WHERE created_at > NOW() - INTERVAL '30 days';
-- Output includes:
-- SIMD Level: AVX2
-- Vector Width: 32 bytes
-- Batch Size: 4096 rows
-- Rows Processed: 15,234,567
-- Actual Time: 234.5ms
-- Scalar Time Estimate: 934.2ms (4.0x slower)
-- Speedup: 4.0x

Server-Level Statistics

-- Global SIMD statistics
SELECT * FROM heliosdb_system.simd_statistics;
-- Sample output:
-- total_queries_simd_accelerated: 45,234
-- total_rows_processed_simd: 2,345,678,901
-- average_speedup: 3.82x
-- simd_level_detected: avx2
-- vector_width: 32
-- cpu_features: avx2, sse4.2, sse3, sse2

Per-Function Metrics

-- SIMD effectiveness by operation
SELECT
operation,
row_count,
execution_time_ms,
speedup,
efficacy
FROM heliosdb_system.simd_operation_stats
WHERE operation IN ('SUM', 'AVG', 'COUNT', 'MIN', 'MAX')
ORDER BY speedup DESC;
-- Expected output:
-- SUM(float) | 1000000 | 234 | 3.95x | 98%
-- AVG(float) | 1000000 | 231 | 3.96x | 98%
-- MIN(int) | 1000000 | 153 | 4.81x | 95%
-- COUNT | 10000000| 1121| 4.20x | 87%

Enable SIMD Logging

/etc/heliosdb/heliosdb.toml
[logging]
simd_metrics = true
simd_trace = true
log_level = "debug"
simd_log_file = "/var/log/heliosdb/simd.log"
Terminal window
# Tail SIMD logs
tail -f /var/log/heliosdb/simd.log
# Sample output:
# [2025-12-30 10:34:12] SIMD: Aggregation starting, 5000000 rows
# [2025-12-30 10:34:12] SIMD: Using AVX2, batch_size=4096
# [2025-12-30 10:34:12] SIMD: Processing batch 1/1221 (4096 rows)
# [2025-12-30 10:34:12] SIMD: SUM reduction tree: 3 levels
# [2025-12-30 10:34:13] SIMD: Aggregation complete, 234ms (4.1x speedup)

Debugging SIMD Issues

Step 1: Verify SIMD is Detected

-- Check CPU capabilities
SELECT * FROM heliosdb_system.simd_info;
-- Expected:
-- max_level: avx2
-- vector_bytes: 32
-- multiplier: 4.0
-- supported_features: [avx2, avx, sse2]

Step 2: Enable Debug Logging

SET log_min_messages = 'DEBUG';
SET simd_debug = true;

Step 3: Run Test Query

EXPLAIN (VERBOSE, ANALYZE)
SELECT SUM(amount) FROM transactions LIMIT 1000000;
-- Look for:
-- SIMD Level: [value]
-- Estimated speedup: [number]x
-- Actual execution time: [ms]

Step 4: Compare with Scalar

-- Disable SIMD for comparison
SET enable_simd = false;
EXPLAIN ANALYZE SELECT SUM(amount) FROM transactions;
-- Note execution time
-- Re-enable SIMD
SET enable_simd = true;
EXPLAIN ANALYZE SELECT SUM(amount) FROM transactions;
-- Compare execution times

Key Monitoring Queries

-- 1. Is SIMD enabled and working?
SELECT
simd_enabled,
detected_level,
vector_width_bytes,
cpu_supports_avx2,
cpu_supports_avx512
FROM heliosdb_system.simd_status;
-- 2. Which queries benefit most from SIMD?
SELECT
query_hash,
query_text,
speedup,
execution_count,
total_time_saved_ms
FROM heliosdb_system.simd_query_impact
ORDER BY total_time_saved_ms DESC
LIMIT 10;
-- 3. SIMD effectiveness over time
SELECT
hour,
total_queries,
queries_using_simd,
average_speedup,
total_ms_saved
FROM heliosdb_system.simd_hourly_stats
WHERE hour >= NOW() - INTERVAL '24 hours'
ORDER BY hour DESC;
-- 4. Identify queries not using SIMD
SELECT
query_hash,
query_text,
rows_processed,
reason_simd_not_used
FROM heliosdb_system.simd_query_analysis
WHERE simd_used = false
AND rows_processed > 100000
ORDER BY rows_processed DESC;

Troubleshooting

Issue 1: SIMD Not Detected (Falls Back to Scalar)

Symptoms:

  • Query runs slow
  • EXPLAIN shows SIMD Level: Scalar
  • No speedup observed

Root Causes & Solutions:

Terminal window
# 1. Check CPU supports required features
lscpu | grep -E "avx2|avx"
# Expected: avx2 and avx in flags
# If missing: Hardware doesn't support SIMD
# 2. Verify CPU is being used correctly
cat /proc/cpuinfo | grep "flags" | head -1
# 3. Check SIMD is enabled in HeliosDB
psql -c "SHOW enable_simd;"
# Should return: 'on'
# 4. Verify compilation used CPU features
RUSTFLAGS="-C target-cpu=native" cargo build --release
# 5. Check runtime configuration
grep -r "simd" /etc/heliosdb/heliosdb.toml
# Should have: enable_simd = true
# 6. Force enable and test
psql -c "SET enable_simd = true; EXPLAIN SELECT SUM(amount) FROM table LIMIT 1000000;"

Issue 2: Performance Not Improving (Expected 4x, Got 1.5x)

Symptoms:

  • SIMD appears enabled
  • Speedup less than expected
  • Inconsistent performance

Root Causes & Solutions:

-- 1. Check data size - SIMD overhead significant for small datasets
SELECT COUNT(*) FROM your_table;
-- Need: 10,000+ rows for benefit
-- 2. Verify column data types are SIMD-friendly
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'your_table';
-- Best: BIGINT, REAL, DOUBLE PRECISION
-- Avoid: VARCHAR, JSON, UUID
-- 3. Check for complex expressions
-- (Complex expressions reduce SIMD effectiveness)
-- Break into simpler queries
-- 4. Verify batch size is appropriate
SHOW simd_batch_size;
-- Adjust if needed (typical: 1024-4096)
-- 5. Profile the actual query
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT SUM(amount), AVG(price)
FROM your_table
WHERE created_at > NOW() - INTERVAL '30 days';
-- Look for:
-- Actual vs Estimated time mismatch
-- High I/O vs CPU time
-- Missing indices causing full table scans

Issue 3: Query Results Incorrect (SIMD Bug)

Symptoms:

  • SIMD and scalar give different results
  • Arithmetic doesn’t match
  • Aggregate values wrong

Root Causes & Solutions:

-- 1. Compare SIMD vs. Scalar results
SET enable_simd = true;
SELECT SUM(amount) as simd_sum FROM transactions;
-- Returns: 1234567.89
SET enable_simd = false;
SELECT SUM(amount) as scalar_sum FROM transactions;
-- Returns: 1234567.92 (DIFFERENT!)
-- 2. Identify problematic data
-- Likely causes: NaN, Infinity, very large numbers, precision loss
-- 3. Check data for edge cases
SELECT
COUNT(*),
COUNT(CASE WHEN amount IS NULL THEN 1 END) as nulls,
COUNT(CASE WHEN amount = 'Infinity'::float THEN 1 END) as infinities,
COUNT(CASE WHEN amount != amount THEN 1 END) as nans,
MIN(amount), MAX(amount)
FROM transactions;
-- 4. Workaround (if bug confirmed)
SET enable_simd = false; -- Use scalar
-- Report bug with reproduction case

Issue 4: Memory Issues with SIMD

Symptoms:

  • Out of memory errors
  • Memory usage spikes with SIMD enabled
  • Server crashes

Root Causes & Solutions:

-- 1. Check SIMD batch size
SHOW simd_batch_size;
-- Typical: 1024-4096 rows
-- If too large, reduce
-- 2. Lower batch size (more conservative)
SET simd_batch_size = 1024;
-- 3. Check temp memory usage
SELECT
simd_buffer_pool_mb,
simd_temp_storage_mb,
simd_max_memory_mb
FROM heliosdb_system.memory_stats;
-- 4. Reduce SIMD memory footprint
ALTER SYSTEM SET simd_batch_size = 512;
SELECT pg_reload_conf();

Issue 5: Inconsistent Performance

Symptoms:

  • Same query runs at different speeds
  • High variance in execution time
  • Sometimes 4x, sometimes 1x speedup

Root Causes & Solutions:

Terminal window
# 1. Check for CPU frequency scaling
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
# Should be: performance (not powersave)
# 2. Disable frequency scaling (if possible)
# Linux: echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
# 3. Check for cache effects (warm vs. cold)
-- Run query multiple times
SELECT SUM(amount) FROM transactions;
SELECT SUM(amount) FROM transactions; -- Should be faster (warm cache)
SELECT SUM(amount) FROM transactions; -- Should be consistent
# 4. Exclude cache effects for benchmarking
# Clear cache between runs:
sync && echo 3 > /proc/sys/vm/drop_caches
# 5. Check for background tasks
ps aux | grep -E "backup|maintenance|vacuum"
# Disable during testing
# 6. Pin SIMD execution to consistent CPU cores
taskset -c 0-3 ./your_workload

Quick Troubleshooting Checklist

[ ] CPU supports AVX2+? (lscpu | grep avx2)
[ ] SIMD enabled in config? (SHOW enable_simd;)
[ ] Rust compiled with native CPU? (RUSTFLAGS="-C target-cpu=native")
[ ] Dataset large enough? (>10K rows)
[ ] Numeric columns used? (INT, BIGINT, FLOAT, DOUBLE)
[ ] No complex expressions? (Use simple aggregations)
[ ] Batch size reasonable? (1024-4096)
[ ] Memory available? (No OOM errors)
[ ] Results match scalar? (Correctness verified)
[ ] Performance stable? (Warm cache, no other load)

Decision Matrix Summary

Quick Reference: Should I Use SIMD?

Is the query numeric aggregation?
├─ YES: SUM/AVG/COUNT/MIN/MAX
│ └─ Are there 10K+ rows?
│ ├─ YES → SIMD highly beneficial (4-8x speedup)
│ └─ NO → Scalar may be faster
└─ NO: String/JSON/Other
└─ Strings with simple pattern?
├─ YES (prefix/suffix) → SIMD helpful (2-4x)
└─ NO → Scalar sufficient
Is SIMD already enabled by default?
└─ YES: Nothing to do! Automatically optimized

Quick Starts

Performance Guides

Architecture & Analysis

  • docs/analysis/performance/SIMD_OPTIMIZATION_ASSESSMENT.md - Detailed SIMD analysis
  • docs/architecture/SIMD_EXECUTOR_RUST_MODULE_TEMPLATES.md - Implementation details
  • docs/features/ - Feature documentation index

Code References

  • heliosdb-compute/src/simd_executor.rs - Core SIMD executor
  • heliosdb-compute/src/simd_aggregation_ops.rs - Aggregation implementations
  • heliosdb-compute/src/simd_scanner.rs - Scanning with SIMD
  • heliosdb-compute/benches/simd_performance_bench.rs - Benchmark suite

Summary

SIMD at a Glance

AspectDetails
WhatSingle Instruction, Multiple Data - process 4-8 values per CPU cycle
Why4-8x performance improvement on numeric aggregations
WhenBest on 10K+ rows of numeric data (INT, BIGINT, FLOAT, DOUBLE)
HowAutomatic in HeliosDB - just enable and query normally
WhereSUM, AVG, COUNT, MIN, MAX operations
CostZero - no configuration needed for basic use
RiskVery low - proven technology, automatic fallback to scalar

Performance Expectations

  • Numeric aggregations on 1M+ rows: 4-8x faster
  • Integer range filters: 3-5x faster
  • String prefix matching: 4-6x faster
  • Small datasets (<1K rows): Similar or slightly slower (setup overhead)
  • Complex expressions: 2-3x faster (reduced effectiveness)

Next Steps

  1. Verify your hardware: lscpu | grep -E "avx|SSE"
  2. Enable SIMD: SET enable_simd = true;
  3. Run your slow queries: Watch them 4-8x faster
  4. Monitor performance: Use metrics in Monitoring section
  5. Report issues: Use Troubleshooting checklist

Document Version: 1.0 Last Updated: December 30, 2025 Status: Reference Guide - Ready to Use