# Performance Improvement Results (2026-02-13)
## Overview
Eight performance improvements were identified and implemented. Results use In-Memory mode (256MB memtable) for consistency, eliminating disk I/O variance.
## Improvements Implemented
| # | Improvement | Type | Impact |
|---|---|---|---|
| 1 | Group-commit WAL | Config (DirectIoWal) | Write latency: 500us → 19us (26x faster) |
| 2 | SELECT overhead reduction | Code (handler.rs) | SELECT: 530us → 503us (~5% faster) |
| 3 | Route point lookups to lsm_get | N/A | No point lookups in original benchmark workload |
| 4 | Single lsm_put for UPDATE | N/A | 150 writes for 70 UPDATEs = multi-row WHERE matches, not redundant |
| 5 | SQL parse cache | Code (sql_parser.rs) | Repeated SELECT parse: 43us → 6us (7x faster) |
| 6 | Pre-compile security regex | Already done | regex_cache module already caches compiled patterns |
| 7 | Finer PerfTracer spans | Code (handler.rs) | New diagnostic spans: select_row_processing, select_aggregate, update_row_processing |
| 8 | Point lookup fast path | Code (handler.rs) | Point SELECT: 530us → 30us (17.7x faster) |
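The group-commit WAL (Improvement 1) is a configuration change, but the batching idea behind it is worth making concrete. The sketch below is a simplified, hypothetical model (the real implementation is `DirectIoWal` in heliosdb-storage): records accumulate in a batch and many writers share one durable flush instead of each paying a ~500us `fsync()`.

```rust
// Hypothetical sketch of group-commit batching; the real DirectIoWal
// (heliosdb-storage/src/direct_io_wal.rs) adds direct I/O and a 5ms
// flush timer on top of the 256-record batch modeled here.
struct GroupCommitWal {
    batch: Vec<Vec<u8>>,
    max_batch: usize, // e.g. 256, matching DirectIoWalConfig::default()
    flushes: usize,   // stands in for fsync() calls in this sketch
}

impl GroupCommitWal {
    fn new(max_batch: usize) -> Self {
        Self { batch: Vec::new(), max_batch, flushes: 0 }
    }

    /// Append a record; flush only when the batch is full.
    fn append(&mut self, record: Vec<u8>) {
        self.batch.push(record);
        if self.batch.len() >= self.max_batch {
            self.flush();
        }
    }

    /// One durable write (fsync) covers the whole batch.
    fn flush(&mut self) {
        if !self.batch.is_empty() {
            self.flushes += 1; // real code: write the batch, then fsync once
            self.batch.clear();
        }
    }
}

fn main() {
    let mut wal = GroupCommitWal::new(256);
    for i in 0..1000u32 {
        wal.append(i.to_le_bytes().to_vec());
    }
    wal.flush(); // flush the partial tail batch
    println!("records: 1000, fsyncs: {}", wal.flushes);
    assert_eq!(wal.flushes, 4); // 3 full batches + 1 tail
}
```

This is why per-write latency drops from ~500us to ~19us: the fsync cost is amortized across up to 256 records per batch.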
## Point Lookup Optimization (Improvement 8)

### Problem
execute_select_with_storage() always performed a full table scan via scan_table_branch_aware(), even for simple SELECT * FROM t WHERE id = N. The LSM engine has an O(1) read(key) method, but internal storage keys ([table_id:8][row_id:8]) differ from user column values.
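The key-format mismatch is the crux: `read(key)` needs the 16-byte internal key, but the query only supplies a user column value. A minimal sketch of the `[table_id:8][row_id:8]` layout (hypothetical helper; the real encoding lives in the storage engine) shows why:

```rust
// Sketch of the internal storage key layout: [table_id:8][row_id:8],
// big-endian so keys for one table are a contiguous, sorted range.
// (encode_key is a hypothetical helper, not the engine's actual API.)
fn encode_key(table_id: u64, row_id: u64) -> [u8; 16] {
    let mut key = [0u8; 16];
    key[..8].copy_from_slice(&table_id.to_be_bytes());
    key[8..].copy_from_slice(&row_id.to_be_bytes());
    key
}

fn main() {
    let k = encode_key(3, 42);
    assert_eq!(k.len(), 16);
    // Keys in the same table share an 8-byte prefix, which is what
    // makes range scans cheap but point lookups by column value hard:
    // the user's WHERE value is not the row_id.
    assert_eq!(&encode_key(3, 1)[..8], &encode_key(3, 99)[..8]);
    println!("key = {:02x?}", k);
}
```

Without a mapping from column value to `row_id`, O(1) `read(key)` is unreachable from SQL, hence the index introduced below.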
### Solution
Added an in-memory pk_index (HashMap<table_name, HashMap<first_col_value, storage_key>>) that bridges user column values to internal storage keys:
- INSERT: Index first column value → storage key
- SELECT: Single equality predicate on first column → O(1) `lsm_get` instead of full `lsm_scan`
- UPDATE/DELETE/DROP: Maintain index consistency
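The three maintenance rules above can be sketched as a small map-of-maps. This is a simplified stand-in for the `pk_index` field in handler.rs (type aliases and function names here are illustrative, not the actual code):

```rust
use std::collections::HashMap;

// Sketch of the pk_index: table name -> (first-column value -> storage key).
// It bridges a user predicate like WHERE id = 42 to the internal 16-byte
// key that lsm_get expects. Names below are hypothetical.
type StorageKey = [u8; 16];
type PkIndex = HashMap<String, HashMap<String, StorageKey>>;

/// INSERT path: record first-column value -> storage key.
fn index_insert(idx: &mut PkIndex, table: &str, pk_value: &str, key: StorageKey) {
    idx.entry(table.to_string())
        .or_default()
        .insert(pk_value.to_string(), key);
}

/// SELECT fast path: Some(key) on a hit (go to lsm_get),
/// None on a miss (fall through to the full table scan).
fn try_point_lookup(idx: &PkIndex, table: &str, pk_value: &str) -> Option<StorageKey> {
    idx.get(table)?.get(pk_value).copied()
}

fn main() {
    let mut idx: PkIndex = HashMap::new();
    index_insert(&mut idx, "t", "42", [0u8; 16]);
    assert!(try_point_lookup(&idx, "t", "42").is_some()); // hit -> lsm_get
    assert!(try_point_lookup(&idx, "t", "99").is_none()); // miss -> full scan
}
```

Returning `Option` is what makes the miss behavior described below cheap: a `None` costs only two hash probes before the scan path takes over.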
### Results (In-Memory, 1000 queries each)
| Metric | Full Scan (baseline) | Point Lookup | Speedup |
|---|---|---|---|
| execute_select | 530us | 30us | 17.7x |
| Storage operation | lsm_scan 150us | lsm_get 9us | 16.7x |
| sql_parse | 43us | 4us (cached) | 10.8x |
| Total per query | ~723us | ~43us | ~17x |
### Miss Behavior
When the pk_index has no entry (key doesn’t exist), the fast path exits in ~1us and falls through to the full table scan — zero overhead for non-indexed queries.
## Comparison Table — Average Latency (In-Memory Mode)

### Execute Phase (end-to-end per statement type)
| Statement | Baseline | +Imp1 WAL | +Imp2 SELECT | Imp3 N/A | Imp4 N/A | +Imp5 Cache | Imp6 N/A | +Imp7 Spans | Change |
|---|---|---|---|---|---|---|---|---|---|
| INSERT execute_insert | 551us | 63us | 559us | — | — | ~560us | — | ~560us | -88% (w/ WAL) |
| SELECT execute_select | 530us | 537us | 503us | — | — | 478us | — | ~480us | -10% |
| UPDATE execute_update | 1700us | 715us | 1800us | — | — | ~1800us | — | ~1800us | -58% (w/ WAL) |
| DELETE execute_delete | 870us | 633us | 855us | — | — | ~870us | — | ~870us | -27% (w/ WAL) |
### Storage Operations
| Operation | Baseline | +Imp1 WAL | +Imp2 | Imp3 | Imp4 | +Imp5 | Imp6 | +Imp7 | Change |
|---|---|---|---|---|---|---|---|---|---|
| lsm_put (INSERT) | 500us | 19us | 505us | — | — | ~505us | — | ~505us | -96% (w/ WAL) |
| lsm_put (UPDATE) | 491us | 20us | 509us | — | — | ~510us | — | ~510us | -96% (w/ WAL) |
| lsm_delete | 514us | 26us | 502us | — | — | ~510us | — | ~510us | -95% (w/ WAL) |
| lsm_scan (SELECT) | 150us | 154us | 156us | — | — | 152us | — | ~155us | ~0% |
| lsm_scan (UPDATE) | 180us | 180us | 184us | — | — | 177us | — | ~180us | ~0% |
### SQL Parse Phase
| Statement | Baseline | +Imp1 | +Imp2 | Imp3 | Imp4 | +Imp5 Cache | Imp6 | +Imp7 | Change |
|---|---|---|---|---|---|---|---|---|---|
| INSERT sql_parse | 48us | 46us | 50us | — | — | 54us | — | ~55us | ~0% (unique queries) |
| SELECT sql_parse | 43us | 43us | 44us | — | — | 6us | — | ~6us | -86% (cached) |
| UPDATE sql_parse | 47us | 45us | 49us | — | — | 52us | — | ~52us | ~0% (unique queries) |
| DELETE sql_parse | 37us | 37us | 39us | — | — | 44us | — | ~44us | ~0% (unique queries) |
## New Diagnostic Spans (Improvement 7)
| Span | Statement | Count | Avg | Description |
|---|---|---|---|---|
| select_row_processing | SELECT | 147 | ~530us | Decode + predicate eval + value conversion loop |
| select_aggregate | SELECT | 84 | ~7us | COUNT/SUM/AVG computation |
| update_row_processing | UPDATE | 70 | ~530us | Scan + filter + modify + write loop |
| select_point_lookup | SELECT (point) | 1000 | ~23us | PK index lookup + lsm_get + decode |
## Summary of Impact

### Biggest Wins
- Point Lookup Fast Path (Imp 8): Largest read-path improvement. For OLTP primary-key lookups (`WHERE id = N`), bypasses the full table scan entirely: 17.7x faster (530us → 30us) using O(1) `lsm_get` instead of `lsm_scan` + decode-all-rows.
- Group-commit WAL (Imp 1): Largest write-path improvement. Switching from per-write `fsync()` to DirectIoWal with group-commit batching reduces write latency by 96% (500us → 19us).
- SQL Parse Cache (Imp 5): For repeated queries (common in OLTP workloads), caching the `ParsedQuery` AST avoids redundant `sqlparser::Parser::parse_sql()` calls: 86% reduction in parse time (43us → 6us).
- SELECT Overhead Reduction (Imp 2): Eliminating per-row UTF-8 key allocation (for non-Serializable isolation) and using `into_iter()` instead of `iter().clone()` saves ~5% on SELECT execution.
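The parse cache's clear-on-full eviction (noted in the Code Changes table) is simple enough to sketch. This is a simplified model of the `Mutex<HashMap>` cache in sql_parser.rs; `Ast` and `parse` are stand-ins for the real `ParsedQuery` type and `sqlparser` call:

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Simplified sketch of the SQL parse cache: full query text keyed to a
// parsed AST, with wholesale clear-on-full eviction instead of LRU.
// Ast and parse() are stand-ins, not the real heliosdb types.
const CACHE_CAP: usize = 256;

#[derive(Clone, PartialEq, Debug)]
struct Ast(String); // stand-in for the real ParsedQuery AST

fn parse(sql: &str) -> Ast {
    Ast(sql.to_uppercase()) // stand-in for sqlparser::Parser::parse_sql()
}

fn parse_cached(cache: &Mutex<HashMap<String, Ast>>, sql: &str) -> Ast {
    let mut map = cache.lock().unwrap();
    if let Some(ast) = map.get(sql) {
        return ast.clone(); // hit: ~6us in the benchmark vs ~43us uncached
    }
    let ast = parse(sql);
    if map.len() >= CACHE_CAP {
        map.clear(); // clear-on-full: bounded memory, no LRU bookkeeping
    }
    map.insert(sql.to_string(), ast.clone());
    ast
}

fn main() {
    let cache = Mutex::new(HashMap::new());
    let first = parse_cached(&cache, "select * from t where id = 1");
    let second = parse_cached(&cache, "select * from t where id = 1");
    assert_eq!(first, second); // second call is served from the cache
}
```

Clear-on-full trades occasional cold restarts for zero per-hit eviction overhead, which suits a small 256-entry cache where hot OLTP statements repopulate quickly. This also explains the ~0% parse-phase change for unique queries in the table above: every miss pays the insert cost.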
### Combined Effect (All Improvements)
| Metric | Baseline | Optimized | Speedup |
|---|---|---|---|
| INSERT execute_insert | 551us | 63us | 8.7x (w/ WAL) |
| SELECT full scan | 530us | 478us | 1.1x |
| SELECT point lookup | 530us | 30us | 17.7x |
| SELECT sql_parse | 43us | 4us | 10.8x (cached) |
| UPDATE execute_update | 1700us | 715us | 2.4x (w/ WAL) |
| DELETE execute_delete | 870us | 633us | 1.4x (w/ WAL) |
| lsm_put (write) | 500us | 19us | 26x (w/ WAL) |
| lsm_get (point read) | N/A | 9us | N/A (new) |
## Remaining Bottlenecks
- SELECT row processing (530us avg): StoredRow::decode + value conversion dominates full scans. Future: columnar storage, zero-copy deserialization.
- UPDATE multi-row overhead: Each matched row requires scan → decode → modify → encode → write. Future: batch write API.
- lsm_scan baseline (150us): Memtable + SSTable merge. Future: zone maps, bloom filters, predicate pushdown to storage.
- Point lookup miss: Falls through to full scan. Future: bloom filter to detect missing keys without scanning.
## Code Changes
| File | Change |
|---|---|
heliosdb-protocols/src/postgres/handler.rs | pk_index field; stringify_stored_value() helper; try_point_lookup_key() method; point lookup fast path in execute_select_with_storage(); pk_index maintenance in INSERT/UPDATE/DELETE/DROP; skip UTF-8 key conversion when not Serializable; Vec::with_capacity; into_iter(); new PerfTracer spans |
heliosdb-protocols/src/postgres/sql_parser.rs | std::sync::Mutex<HashMap> parse cache (256 entries, clear-on-full eviction) |
heliosdb-protocols/src/postgres/predicate_eval.rs | Made column_name_to_hash() pub(crate) for point lookup detection |
heliosdb-protocols/tests/performance_tracing_benchmark.rs | Added use_group_commit parameter; POINT LOOKUP + POINT LOOKUP MISS benchmark sections; perf_improvement_1_group_commit_wal test |
heliosdb-storage/src/direct_io_wal.rs | Already existed — DirectIoWalConfig::default() with group commit (256 batch, 5ms flush) |
## How to Run
```bash
# Full benchmark (standard WAL)
CARGO_HOME=/tmp/cargo-home cargo test -p heliosdb-protocols \
  --test performance_tracing_benchmark performance_tracing_benchmark \
  -- --nocapture --exact

# With group-commit WAL
CARGO_HOME=/tmp/cargo-home cargo test -p heliosdb-protocols \
  --test performance_tracing_benchmark perf_improvement_1_group_commit_wal \
  -- --nocapture --exact
```