Skip to content

Performance Improvement Results (2026-02-13)

Performance Improvement Results (2026-02-13)

Overview

Eight performance improvements were identified and implemented. Results use In-Memory mode (256MB memtable) for consistency, eliminating disk I/O variance.

Improvements Implemented

#ImprovementTypeImpact
1Group-commit WALConfig (DirectIoWal)Write latency: 500us → 19us (26x faster)
2SELECT overhead reductionCode (handler.rs)SELECT: 530us → 503us (~5% faster)
3Route point lookups to lsm_getN/ANo point lookups in original benchmark workload
4Single lsm_put for UPDATEN/A150 writes for 70 UPDATEs = multi-row WHERE matches, not redundant
5SQL parse cacheCode (sql_parser.rs)Repeated SELECT parse: 43us → 6us (7x faster)
6Pre-compile security regexAlready doneregex_cache module already caches compiled patterns
7Finer PerfTracer spansCode (handler.rs)New diagnostic spans: select_row_processing, select_aggregate, update_row_processing
8Point lookup fast pathCode (handler.rs)Point SELECT: 530us → 30us (17.7x faster)

Point Lookup Optimization (Improvement 8)

Problem

execute_select_with_storage() always performed a full table scan via scan_table_branch_aware(), even for simple SELECT * FROM t WHERE id = N. The LSM engine has an O(1) read(key) method, but internal storage keys ([table_id:8][row_id:8]) differ from user column values.

Solution

Added an in-memory pk_index (HashMap<table_name, HashMap<first_col_value, storage_key>>) that bridges user column values to internal storage keys:

  • INSERT: Index first column value → storage key
  • SELECT: Single equality predicate on first column → O(1) lsm_get instead of full lsm_scan
  • UPDATE/DELETE/DROP: Maintain index consistency

Results (In-Memory, 1000 queries each)

MetricFull Scan (baseline)Point LookupSpeedup
execute_select530us30us17.7x
Storage operationlsm_scan 150uslsm_get 9us16.7x
sql_parse43us4us (cached)10.8x
Total per query~723us~43us~17x

Miss Behavior

When the pk_index has no entry (key doesn’t exist), the fast path exits in ~1us and falls through to the full table scan — zero overhead for non-indexed queries.

Comparison Table — Average Latency (In-Memory Mode)

Execute Phase (end-to-end per statement type)

StatementBaseline+Imp1 WAL+Imp2 SELECTImp3 N/AImp4 N/A+Imp5 CacheImp6 N/A+Imp7 SpansChange
INSERT execute_insert551us63us559us~560us~560us-88% (w/ WAL)
SELECT execute_select530us537us503us478us~480us-10%
UPDATE execute_update1700us715us1800us~1800us~1800us-58% (w/ WAL)
DELETE execute_delete870us633us855us~870us~870us-27% (w/ WAL)

Storage Operations

OperationBaseline+Imp1 WAL+Imp2Imp3Imp4+Imp5Imp6+Imp7Change
lsm_put (INSERT)500us19us505us~505us~505us-96% (w/ WAL)
lsm_put (UPDATE)491us20us509us~510us~510us-96% (w/ WAL)
lsm_delete514us26us502us~510us~510us-95% (w/ WAL)
lsm_scan (SELECT)150us154us156us152us~155us~0%
lsm_scan (UPDATE)180us180us184us177us~180us~0%

SQL Parse Phase

StatementBaseline+Imp1+Imp2Imp3Imp4+Imp5 CacheImp6+Imp7Change
INSERT sql_parse48us46us50us54us~55us~0% (unique queries)
SELECT sql_parse43us43us44us6us~6us-86% (cached)
UPDATE sql_parse47us45us49us52us~52us~0% (unique queries)
DELETE sql_parse37us37us39us44us~44us~0% (unique queries)

New Diagnostic Spans (Improvement 7)

SpanStatementCountAvgDescription
select_row_processingSELECT147~530usDecode + predicate eval + value conversion loop
select_aggregateSELECT84~7usCOUNT/SUM/AVG computation
update_row_processingUPDATE70~530usScan + filter + modify + write loop
select_point_lookupSELECT (point)1000~23usPK index lookup + lsm_get + decode

Summary of Impact

Biggest Wins

  1. Point Lookup Fast Path (Imp 8): Largest read-path improvement. For OLTP primary-key lookups (WHERE id = N), bypasses full table scan entirely. 17.7x faster (530us → 30us) using O(1) lsm_get instead of lsm_scan + decode-all-rows.

  2. Group-commit WAL (Imp 1): Largest write-path improvement. Switching from per-write fsync() to DirectIoWal with group commit batching reduces write latency by 96% (500us → 19us).

  3. SQL Parse Cache (Imp 5): For repeated queries (common in OLTP workloads), caching the ParsedQuery AST avoids redundant sqlparser::Parser::parse_sql() calls. 86% reduction in parse time (43us → 6us).

  4. SELECT Overhead Reduction (Imp 2): Eliminating per-row UTF-8 key allocation (for non-Serializable isolation) and using into_iter() instead of iter().clone() saves ~5% on SELECT execution.

Combined Effect (All Improvements)

MetricBaselineOptimizedSpeedup
INSERT execute_insert551us63us8.7x (w/ WAL)
SELECT full scan530us478us1.1x
SELECT point lookup530us30us17.7x
SELECT sql_parse43us4us10.8x (cached)
UPDATE execute_update1700us715us2.4x (w/ WAL)
DELETE execute_delete870us633us1.4x (w/ WAL)
lsm_put (write)500us19us26x (w/ WAL)
lsm_get (point read)N/A9usN/A (new)

Remaining Bottlenecks

  1. SELECT row processing (530us avg): StoredRow::decode + value conversion dominates full scans. Future: columnar storage, zero-copy deserialization.
  2. UPDATE multi-row overhead: Each matched row requires scan → decode → modify → encode → write. Future: batch write API.
  3. lsm_scan baseline (150us): Memtable + SSTable merge. Future: zone maps, bloom filters, predicate pushdown to storage.
  4. Point lookup miss: Falls through to full scan. Future: bloom filter to detect missing keys without scanning.

Code Changes

FileChange
heliosdb-protocols/src/postgres/handler.rspk_index field; stringify_stored_value() helper; try_point_lookup_key() method; point lookup fast path in execute_select_with_storage(); pk_index maintenance in INSERT/UPDATE/DELETE/DROP; skip UTF-8 key conversion when not Serializable; Vec::with_capacity; into_iter(); new PerfTracer spans
heliosdb-protocols/src/postgres/sql_parser.rsstd::sync::Mutex<HashMap> parse cache (256 entries, clear-on-full eviction)
heliosdb-protocols/src/postgres/predicate_eval.rsMade column_name_to_hash() pub(crate) for point lookup detection
heliosdb-protocols/tests/performance_tracing_benchmark.rsAdded use_group_commit parameter; POINT LOOKUP + POINT LOOKUP MISS benchmark sections; perf_improvement_1_group_commit_wal test
heliosdb-storage/src/direct_io_wal.rsAlready existed — DirectIoWalConfig::default() with group commit (256 batch, 5ms flush)

How to Run

Terminal window
# Full benchmark (standard WAL)
CARGO_HOME=/tmp/cargo-home cargo test -p heliosdb-protocols \
--test performance_tracing_benchmark performance_tracing_benchmark \
-- --nocapture --exact
# With Group-commit WAL
CARGO_HOME=/tmp/cargo-home cargo test -p heliosdb-protocols \
--test performance_tracing_benchmark perf_improvement_1_group_commit_wal \
-- --nocapture --exact