Bulk Loading Performance Analysis
Date: 2026-01-16
Benchmark: 10GB ALP compression demo (~447M rows)
Observed Rate: ~31,000-41,000 rows/sec (declining over time)
Executive Summary
The current INSERT path has several performance bottlenecks that limit bulk loading throughput. This analysis identifies 8 major bottlenecks with concrete improvement recommendations that could increase throughput by 5-10x.
Current INSERT Path Flow
```
SQL String ("INSERT INTO t VALUES (...)")
  ↓ Parser::parse_one()                 ← BOTTLENECK 1: Per-statement parsing
  ↓ Planner::statement_to_plan()
  ↓ execute_in_transaction()
      ├─ Get Schema
      ├─ Suspend SMFI (if bulk)         ← EXISTING OPTIMIZATION
      ├─ For each value_row:
      │   ├─ Evaluate expressions
      │   ├─ Auto-cast values
      │   ├─ Fill defaults/nulls
      │   ├─ FK validation              ← BOTTLENECK 2: Per-row validation
      │   ├─ BEFORE triggers            ← BOTTLENECK 3: Per-row triggers
      │   ├─ storage.insert_tuple()
      │   │   ├─ Get row_id
      │   │   ├─ Serialize for size     ← BOTTLENECK 4: Double serialization
      │   │   ├─ Compress per-col       ← BOTTLENECK 5: No batch compression
      │   │   ├─ Format key string      ← BOTTLENECK 6: String allocation
      │   │   ├─ WAL::append()
      │   │   ├─ MV delta record        ← BOTTLENECK 7: Per-row tracking
      │   │   ├─ SMFI delta update
      │   │   └─ Speculative filter     ← BOTTLENECK 7: Per-row filter update
      │   └─ AFTER triggers
      └─ Return count
  ↓ Transaction.commit()
      ├─ Parse key strings              ← BOTTLENECK 8: Key-string parsing
      ├─ WriteBatch for all entries     ← EXISTING OPTIMIZATION
      └─ db.write(batch)                ← ATOMIC
```

Detailed Bottleneck Analysis
BOTTLENECK 1: Per-Statement SQL Parsing
Location: src/lib.rs:1720-1726
Impact: HIGH
Each INSERT statement is parsed independently:
```rust
let parser = sql::Parser::new();
let statement = parser.parse_one(sql)?;
let planner = sql::Planner::with_catalog(&catalog);
let plan = planner.statement_to_plan(statement)?;
```

Problem: the demo executes 447M individual `INSERT INTO orders VALUES (...)` statements, each parsed and planned from scratch.
Recommendations:
- Prepared Statement Caching: Cache the parsed plan for repeated INSERTs (see the sketch below)
- Multi-row VALUES: Support `INSERT INTO t VALUES (...), (...), (...)` syntax
- COPY Command: Implement a PostgreSQL-style COPY protocol for bulk loading
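A minimal sketch of the plan-caching idea. `PlanCache`, `CachedPlan`, and `PlanError` are hypothetical names, not existing types in the codebase, and the cache only pays off once statements are parameterized (see Priority 2) so the same SQL text actually repeats.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Hypothetical placeholders for the planner's output and error type.
struct CachedPlan;
struct PlanError;

struct PlanCache {
    plans: Mutex<HashMap<String, Arc<CachedPlan>>>,
}

impl PlanCache {
    /// Return the cached plan for this SQL text, or parse/plan it exactly once.
    fn get_or_plan(
        &self,
        sql: &str,
        plan_fn: impl FnOnce(&str) -> Result<CachedPlan, PlanError>,
    ) -> Result<Arc<CachedPlan>, PlanError> {
        let mut plans = self.plans.lock().unwrap();
        if let Some(plan) = plans.get(sql) {
            return Ok(Arc::clone(plan)); // Cache hit: no parsing, no planning
        }
        let plan = Arc::new(plan_fn(sql)?); // Cache miss: parse + plan once
        plans.insert(sql.to_string(), Arc::clone(&plan));
        Ok(plan)
    }
}
```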
BOTTLENECK 2: Per-Row Foreign Key Validation
Location: src/lib.rs:787-819
Impact: MEDIUM (when FK constraints exist)
```rust
// For each row:
for fk in &fk_constraints {
    if fk.deferred == DeferTiming::Immediate {
        // Validate FK exists for each row
    }
}
```

Recommendations:
- Batch FK Validation: Collect all FK values and validate them in a single query (see the sketch after this list)
- Defer by Default for Bulk: Auto-defer FK checks during bulk inserts
- Bloom Filter Pre-check: Use bloom filter to skip unnecessary lookups
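A minimal sketch of the batch-validation recommendation, written independently of the actual catalog and storage types: collect the distinct FK values from the batch, look them up once, and report the first missing parent. `lookup_existing` stands in for a single query against the referenced table.

```rust
use std::collections::HashSet;
use std::fmt::Debug;
use std::hash::Hash;

/// Validate a whole batch of FK values with one lookup pass instead of one
/// probe per row. `lookup_existing` returns the subset of values that do
/// exist in the referenced table (e.g. via a single IN (...) query).
fn validate_fk_batch<V, F>(fk_values: &[V], lookup_existing: F) -> Result<(), String>
where
    V: Clone + Eq + Hash + Debug,
    F: FnOnce(&HashSet<V>) -> HashSet<V>,
{
    let wanted: HashSet<V> = fk_values.iter().cloned().collect();
    let found = lookup_existing(&wanted);
    if let Some(missing) = wanted.difference(&found).next() {
        return Err(format!("foreign key violation: {:?} has no parent row", missing));
    }
    Ok(())
}
```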
BOTTLENECK 3: Per-Row Trigger Execution
Location: src/lib.rs:821-922
Impact: HIGH (when triggers exist)
```rust
// BEFORE INSERT triggers - per row
let trigger_ctx = TriggerRowContext::for_insert(&table_name, &tuple);
execute_triggers(&trigger_ctx, TriggerTiming::Before)?;
```

Recommendations:
- Statement-level Triggers: Support FOR EACH STATEMENT triggers (sketched after this list)
- Batch Trigger Context: Pass batch of rows to trigger function
- Skip Check: Fast-path when no triggers defined (already partial)
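A sketch of what a statement-level (batch) trigger context could look like; `TriggerBatchContext` and `fire_before_insert_batch` are hypothetical and do not exist in the codebase. The no-trigger fast path from the last recommendation falls out naturally.

```rust
/// Hypothetical statement-level trigger context: triggers see the whole
/// batch of new rows once, instead of one TriggerRowContext per row.
struct TriggerBatchContext<'a, Row> {
    table_name: &'a str,
    new_rows: &'a [Row],
}

fn fire_before_insert_batch<Row>(
    ctx: &TriggerBatchContext<Row>,
    triggers: &[fn(&TriggerBatchContext<Row>) -> Result<(), String>],
) -> Result<(), String> {
    // Fast path: bulk loads usually define no triggers, and an empty batch
    // has nothing to fire on.
    if triggers.is_empty() || ctx.new_rows.is_empty() {
        return Ok(());
    }
    for trigger in triggers {
        trigger(ctx)?; // One invocation per trigger per statement, not per row
    }
    Ok(())
}
```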
BOTTLENECK 4: Double Serialization for Size Calculation
Location: src/storage/engine.rs:630-632
Impact: MEDIUM
```rust
// Serialize JUST to calculate size
let original_size = bincode::serialize(&tuple)
    .map_err(|e| ...)?
    .len();
```

The tuple is then serialized a second time for compression.
Recommendations:
- Estimate Size: Use `bincode::serialized_size()` without allocating (see the sketch after this list)
- Calculate During Compression: Get the size as a byproduct of compression
- Skip Metrics in Bulk Mode: Disable per-row metrics during bulk load
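For the size estimate, bincode 1.x already exposes `bincode::serialized_size()`, which reports the encoded length without materializing a byte buffer. A one-line wrapper could replace the serialize-then-`.len()` pattern above:

```rust
use serde::Serialize;

/// Encoded length without allocating the intermediate Vec<u8>.
/// The tuple is still serialized once for real inside the compression path.
fn encoded_len<T: Serialize + ?Sized>(tuple: &T) -> Result<u64, bincode::Error> {
    bincode::serialized_size(tuple)
}
```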
BOTTLENECK 5: Per-Row Column Compression
Location: src/storage/engine.rs:640-657
Impact: HIGH
Each tuple is compressed column-by-column individually:
```rust
let compressed = super::compression::compress_tuple(&tuple, &schema, ...)?;
```

Existing Batch APIs (not used for inserts):
- `compress_f64_batch()` at src/storage/compression/integration.rs:850
- `compress_f32_batch()` at src/storage/compression/integration.rs:919
Recommendations:
- Columnar Batch Compression: Buffer N rows, compress columns together
- Deferred Compression: Compress in background after batch commits
- Mini-batch Strategy: Compress every 1000 rows as a columnar block (see the sketch below)
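A minimal sketch of the mini-batch strategy for one group of f64 columns: the buffer fills row by row and hands complete columnar blocks to a batch compressor. The existing `compress_f64_batch()` is the intended consumer, though its exact signature is assumed here; the same buffering applies to f32 columns via `compress_f32_batch()`.

```rust
/// Buffer values per column and hand whole columns to a batch compressor once
/// `batch_size` rows have accumulated.
struct ColumnBuffer {
    batch_size: usize,
    columns: Vec<Vec<f64>>, // one buffer per f64 column
}

impl ColumnBuffer {
    fn new(num_columns: usize, batch_size: usize) -> Self {
        Self {
            batch_size,
            columns: (0..num_columns).map(|_| Vec::with_capacity(batch_size)).collect(),
        }
    }

    /// Push one row's f64 values; returns full columns when a batch is ready.
    fn push_row(&mut self, row: &[f64]) -> Option<Vec<Vec<f64>>> {
        for (col, v) in self.columns.iter_mut().zip(row) {
            col.push(*v);
        }
        let full = self.columns.first().map_or(false, |c| c.len() >= self.batch_size);
        if full {
            // Drain the buffers; each Vec<f64> is one columnar block to compress,
            // e.g. by passing it to compress_f64_batch().
            Some(self.columns.iter_mut().map(std::mem::take).collect())
        } else {
            None
        }
    }
}
```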
BOTTLENECK 6: Per-Row String Key Allocation
Location: src/storage/engine.rs:690
Impact: MEDIUM
```rust
let key = format!("data:{}:{}", table_name, row_id).into_bytes();
```

This creates a new String allocation for every row.
Recommendations:
- Pre-allocated Buffer: Reuse buffer, only update row_id portion
- Binary Key Format: Use fixed-size binary keys instead of strings (sketched below)
- Batch Key Generation: Generate all keys upfront
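A sketch of a fixed-size binary key layout. It assumes the catalog can map `table_name` to a numeric table id, which is an assumption rather than something the current code is stated to provide.

```rust
/// Fixed-size binary key: 4-byte table id followed by an 8-byte big-endian
/// row id. Big-endian keeps lexicographic key order equal to numeric order,
/// and the stack array avoids the per-row String allocation of
/// format!("data:{}:{}", table_name, row_id).
fn encode_row_key(table_id: u32, row_id: u64) -> [u8; 12] {
    let mut key = [0u8; 12];
    key[..4].copy_from_slice(&table_id.to_be_bytes());
    key[4..].copy_from_slice(&row_id.to_be_bytes());
    key
}
```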
BOTTLENECK 7: Per-Row Metadata Tracking
Location: src/storage/engine.rs:696-709
Impact: MEDIUM
```rust
// Three separate tracking calls per row:
self.mv_delta_tracker.record_insert(table_name, row_id, tuple.clone())?;
self.filter_delta_tracker.on_insert(table_name, row_id, &tuple, &schema);
self.speculative_filter_manager.on_insert(table_name, &col.name, value);
```

Recommendations:
- Batch Delta Recording: Buffer deltas and flush them at transaction commit (see the sketch after this list)
- Disable During Bulk: Skip MV/filter tracking during bulk load
- Async Tracking: Move tracking to background thread
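A minimal sketch of the buffered-delta idea: per-row tracking calls become cheap in-memory appends, and the MV/filter trackers are only updated when the transaction commits. All names here (`Delta`, `DeltaBuffer`) are hypothetical.

```rust
/// Hypothetical per-transaction delta buffer.
enum Delta<Row> {
    Insert { table: String, row_id: u64, row: Row },
}

struct DeltaBuffer<Row> {
    pending: Vec<Delta<Row>>,
}

impl<Row> DeltaBuffer<Row> {
    fn new() -> Self {
        Self { pending: Vec::new() }
    }

    /// Called from insert_tuple(): just an append, no tracker work per row.
    fn record_insert(&mut self, table: &str, row_id: u64, row: Row) {
        self.pending.push(Delta::Insert { table: table.to_string(), row_id, row });
    }

    /// Called from commit: drain the buffer and apply every delta in one pass.
    fn flush(&mut self, mut apply: impl FnMut(Delta<Row>)) {
        for delta in self.pending.drain(..) {
            apply(delta);
        }
    }
}
```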
BOTTLENECK 8: Transaction Commit Key Parsing
Location: src/storage/transaction.rs:371-395
Impact: MEDIUM
During commit, parses each key string to extract table/row info:
```rust
if let Ok(key_str) = std::str::from_utf8(key) {
    if key_str.starts_with("data:") {
        let rest = &key_str[5..];
        if let Some(colon_pos) = rest.find(':') {
            // Parse table_name and row_id
        }
    }
}
```

Recommendations:
- Structured Keys: Store metadata alongside value in write_set
- Binary Key Format: Use a fixed-size binary format for O(1) parsing (sketched below)
- Skip Version Index for Bulk: Defer version index creation
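If the fixed-size key layout sketched under Bottleneck 6 were adopted, the commit path could recover the table and row ids in constant time, with no UTF-8 validation, prefix check, or substring search:

```rust
/// Decode a 12-byte key (4-byte table id + 8-byte big-endian row id)
/// produced by encode_row_key() in two constant-time reads.
fn decode_row_key(key: &[u8; 12]) -> (u32, u64) {
    let table_id = u32::from_be_bytes(key[..4].try_into().unwrap());
    let row_id = u64::from_be_bytes(key[4..].try_into().unwrap());
    (table_id, row_id)
}
```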
Proposed Optimizations
Priority 1: COPY Command Implementation
Estimated Improvement: 5-10x throughput
```
COPY orders FROM STDIN WITH (FORMAT CSV);
1,10.99,5
2,24.95,3
\.
```

Implementation approach:
- Binary protocol support for raw data streaming
- Skip SQL parsing entirely
- Batch insert directly to the storage layer (see the sketch after this list)
- Columnar compression at batch level
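A rough sketch of the ingest loop that could sit behind COPY FROM STDIN: read CSV lines, buffer rows into fixed-size chunks, and hand each chunk to a bulk path such as the `bulk_insert` API proposed under Priority 3. Everything here is hypothetical; real CSV parsing (quoting, escapes) and type conversion are omitted.

```rust
use std::io::{BufRead, BufReader, Read};

/// Stream CSV rows into fixed-size chunks and pass each chunk to a bulk
/// loader, bypassing the SQL parser entirely.
fn copy_from_reader<R: Read>(
    input: R,
    chunk_size: usize,
    mut load_chunk: impl FnMut(Vec<Vec<String>>) -> Result<u64, String>,
) -> Result<u64, String> {
    let mut total = 0;
    let mut chunk: Vec<Vec<String>> = Vec::with_capacity(chunk_size);
    for line in BufReader::new(input).lines() {
        let line = line.map_err(|e| e.to_string())?;
        if line == "\\." {
            break; // End-of-data marker, as in PostgreSQL's COPY
        }
        chunk.push(line.split(',').map(str::to_owned).collect());
        if chunk.len() == chunk_size {
            total += load_chunk(std::mem::take(&mut chunk))?;
        }
    }
    if !chunk.is_empty() {
        total += load_chunk(chunk)?;
    }
    Ok(total)
}
```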
Priority 2: Prepared Statement with Batch Binding
Estimated Improvement: 2-3x throughput
```rust
let stmt = db.prepare("INSERT INTO orders VALUES ($1, $2, $3)")?;
for row in data {
    stmt.bind(&row)?;   // No re-parsing
}
stmt.execute_batch()?;  // Single commit
```

Priority 3: Bulk Insert API
Estimated Improvement: 3-5x throughput
```rust
impl EmbeddedDatabase {
    pub fn bulk_insert(&self, table: &str, rows: Vec<Vec<Value>>) -> Result<u64> {
        // 1. Parse schema once
        // 2. Disable per-row tracking
        // 3. Columnar batch compression
        // 4. Single WAL entry for batch
        // 5. Atomic commit
    }
}
```

Priority 4: Configuration Flags for Bulk Mode
Estimated Improvement: 1.5-2x throughput
```sql
SET bulk_load_mode = true;
-- Disables: per-row metrics, MV tracking, speculative filters
-- Enables: deferred FK validation, batch compression
```

Existing Optimizations (Already Implemented)
- SMFI Bulk Load Suspension: src/lib.rs:666-677
  - Threshold-based (default: 100 rows)
  - Auto-resumes on guard drop
  - Configurable via `SET smfi_bulk_load_threshold`
- DashMap Write Set: src/storage/transaction.rs:77
  - Lock-free concurrent access
  - Reduces contention
- RocksDB WriteBatch: src/storage/transaction.rs:361
  - Atomic multi-entry writes
  - Better than row-by-row puts
- WAL Group Commit: src/storage/wal.rs:274-308
  - Batches multiple operations
  - 10ms batch window
Benchmark Comparison
| Scenario | Current | With COPY | Improvement |
|---|---|---|---|
| 447M rows | ~31k/s | ~200k/s | 6.5x |
| Time to 10GB | ~4 hours | ~37 min | 6.5x |
Estimates are based on similar database implementations.
Implementation Roadmap
Phase 1: Quick Wins (1-2 days)
- Add `bulk_load_mode` configuration flag
- Skip metrics collection during bulk mode
- Pre-allocate key buffer for reuse
Phase 2: Batch Compression (3-5 days)
- Implement mini-batch columnar compression
- Buffer N rows before compression
- Use existing `compress_f64_batch()` APIs
Phase 3: COPY Command (1-2 weeks)
- Implement COPY FROM STDIN parser
- Binary protocol support
- Direct-to-storage path bypassing SQL layer
Phase 4: Prepared Statement Optimization (1 week)
- Cache parsed plans for INSERTs
- Batch binding API
- Execute batch with single commit
Conclusion
The current architecture processes inserts row-by-row through multiple layers, each adding overhead. The key optimizations are:
- Reduce parsing: COPY command or prepared statements
- Batch compression: Columnar compression for better ratios and speed
- Skip tracking: Disable non-essential tracking during bulk loads
- Binary keys: Avoid string allocation/parsing
With these optimizations, bulk loading throughput should improve from ~31k rows/sec to ~200k+ rows/sec.