
Bulk Loading Performance Analysis

Date: 2026-01-16
Benchmark: 10GB ALP compression demo (~447M rows)
Observed rate: ~31,000-41,000 rows/sec (declining over time)

Executive Summary

The current INSERT path has several performance bottlenecks that limit bulk loading throughput. This analysis identifies eight major bottlenecks and pairs each with concrete recommendations that together could increase throughput by 5-10x.


Current INSERT Path Flow

SQL String ("INSERT INTO t VALUES (...)")
Parser::parse_one() ← BOTTLENECK 1: Per-statement parsing
Planner::statement_to_plan()
execute_in_transaction()
├─ Get Schema
├─ Suspend SMFI (if bulk) ← EXISTING OPTIMIZATION
├─ For each value_row:
│ ├─ Evaluate expressions
│ ├─ Auto-cast values
│ ├─ Fill defaults/nulls
│ ├─ FK validation ← BOTTLENECK 2: Per-row validation
│ ├─ BEFORE triggers ← BOTTLENECK 3: Per-row triggers
│ ├─ storage.insert_tuple()
│ │ ├─ Get row_id
│ │ ├─ Serialize for size ← BOTTLENECK 4: Double serialization
│ │ ├─ Compress per-col ← BOTTLENECK 5: No batch compression
│ │ ├─ Format key string ← BOTTLENECK 6: String allocation
│ │ ├─ WAL::append()
│ │ ├─ MV delta record ← BOTTLENECK 7: Per-row tracking
│ │ ├─ SMFI delta update
│ │ └─ Speculative filter
│ └─ AFTER triggers
└─ Return count
Transaction.commit()
├─ Parse key strings ← BOTTLENECK 8: String parsing
├─ WriteBatch for all entries ← EXISTING OPTIMIZATION
└─ db.write(batch) ← ATOMIC

Detailed Bottleneck Analysis

BOTTLENECK 1: Per-Statement SQL Parsing

Location: src/lib.rs:1720-1726 Impact: HIGH

Each INSERT statement is parsed independently:

let parser = sql::Parser::new();
let statement = parser.parse_one(sql)?;
let planner = sql::Planner::with_catalog(&catalog);
let plan = planner.statement_to_plan(statement)?;

Problem: the demo executes ~447M individual INSERT INTO orders VALUES (...) statements, each parsed and planned from scratch.

Recommendations:

  1. Prepared Statement Caching: Cache parsed plan for repeated INSERTs
  2. Multi-row VALUES: Support INSERT INTO t VALUES (...), (...), (...) syntax
  3. COPY Command: Implement PostgreSQL COPY protocol for bulk loading
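A minimal sketch of recommendation 1, assuming a simple text-keyed cache; Plan, parse_and_plan, and PlanCache are illustrative stand-ins rather than the engine's actual types:

use std::collections::HashMap;

// Illustrative stand-in for a parsed and planned INSERT statement.
#[derive(Clone)]
struct Plan {
    sql: String,
}

// Illustrative stand-in for Parser::parse_one + Planner::statement_to_plan.
fn parse_and_plan(sql: &str) -> Plan {
    Plan { sql: sql.to_string() }
}

struct PlanCache {
    plans: HashMap<String, Plan>,
}

impl PlanCache {
    fn new() -> Self {
        Self { plans: HashMap::new() }
    }

    // Parse and plan the statement once; repeated statements reuse the cached plan.
    fn get_or_plan(&mut self, sql: &str) -> Plan {
        self.plans
            .entry(sql.to_string())
            .or_insert_with(|| parse_and_plan(sql))
            .clone()
    }
}

fn main() {
    let mut cache = PlanCache::new();
    let sql = "INSERT INTO orders VALUES ($1, $2, $3)";
    let first = cache.get_or_plan(sql);  // parses and plans
    let second = cache.get_or_plan(sql); // cache hit, no re-parse
    assert_eq!(first.sql, second.sql);
    assert_eq!(cache.plans.len(), 1);
}

Keying on the raw SQL text only helps when statements repeat verbatim; the larger win comes from combining a cached plan with batch binding (Priority 2 below).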

BOTTLENECK 2: Per-Row Foreign Key Validation

Location: src/lib.rs:787-819 Impact: MEDIUM (when FK constraints exist)

// For each row:
for fk in &fk_constraints {
    if fk.deferred == DeferTiming::Immediate {
        // Validate FK exists for each row
    }
}

Recommendations:

  1. Batch FK Validation: Collect all FK values, validate in single query
  2. Defer by Default for Bulk: Auto-defer FK checks during bulk inserts
  3. Bloom Filter Pre-check: Use bloom filter to skip unnecessary lookups
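A minimal sketch of recommendation 1, assuming a hypothetical batch lookup against the parent table's key index (existing_parent_keys stands in for that query):

use std::collections::HashSet;

// Hypothetical batch lookup against the parent table's key index.
fn existing_parent_keys(candidates: &HashSet<i64>) -> HashSet<i64> {
    // Stand-in: pretend every even key exists in the parent table.
    candidates.iter().copied().filter(|k| k % 2 == 0).collect()
}

fn validate_fk_batch(rows: &[(i64, f64)]) -> Result<(), Vec<i64>> {
    // 1. Collect the distinct FK values across the whole batch.
    let candidates: HashSet<i64> = rows.iter().map(|(fk, _)| *fk).collect();
    // 2. One lookup for the batch instead of one lookup per row.
    let existing = existing_parent_keys(&candidates);
    // 3. Report every missing parent key at once.
    let missing: Vec<i64> = candidates.difference(&existing).copied().collect();
    if missing.is_empty() { Ok(()) } else { Err(missing) }
}

fn main() {
    let rows = vec![(2, 10.99), (2, 24.95), (3, 5.00)];
    assert_eq!(validate_fk_batch(&rows), Err(vec![3]));
}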

BOTTLENECK 3: Per-Row Trigger Execution

Location: src/lib.rs:821-922 Impact: HIGH (when triggers exist)

// BEFORE INSERT triggers - per row
let trigger_ctx = TriggerRowContext::for_insert(&table_name, &tuple);
execute_triggers(&trigger_ctx, TriggerTiming::Before)?;

Recommendations:

  1. Statement-level Triggers: Support FOR EACH STATEMENT triggers
  2. Batch Trigger Context: Pass batch of rows to trigger function
  3. Skip Check: Fast-path when no triggers defined (already partial)
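A minimal sketch of recommendations 2 and 3: decide once per statement whether any BEFORE INSERT triggers exist and, if so, hand them the whole batch; Tuple and Trigger are hypothetical stand-ins for the engine's types:

// Hypothetical stand-ins for the engine's row and trigger types.
struct Tuple;
struct Trigger;

impl Trigger {
    // Statement-level context: the trigger sees every row of the statement at once.
    fn fire_for_statement(&self, rows: &[Tuple]) {
        println!("trigger fired once for {} rows", rows.len());
    }
}

fn insert_batch(rows: Vec<Tuple>, before_insert_triggers: &[Trigger]) {
    // Fast path: check once per statement, not once per row.
    if !before_insert_triggers.is_empty() {
        for trigger in before_insert_triggers {
            trigger.fire_for_statement(&rows);
        }
    }
    // ... per-row expression evaluation and storage.insert_tuple() follow ...
}

fn main() {
    let rows: Vec<Tuple> = (0..1000).map(|_| Tuple).collect();
    // One trigger invocation for the whole statement instead of 1000.
    insert_batch(rows, &[Trigger]);
}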

BOTTLENECK 4: Double Serialization for Size Calculation

Location: src/storage/engine.rs:630-632 Impact: MEDIUM

// Serialize JUST to calculate size
let original_size = bincode::serialize(&tuple)
    .map_err(|e| ...)?
    .len();

The tuple is then serialized a second time during compression.

Recommendations:

  1. Estimate Size: Use bincode::serialized_size() without allocating
  2. Calculate During Compression: Get size as byproduct of compression
  3. Skip Metrics in Bulk Mode: Disable per-row metrics during bulk load
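A minimal sketch of recommendation 1: bincode 1.x exposes bincode::serialized_size(), which computes the encoded length without materializing the bytes (the Tuple struct here is illustrative):

use serde::Serialize;

#[derive(Serialize)]
struct Tuple {
    id: i64,
    amount: f64,
    qty: i32,
}

fn main() -> Result<(), Box<bincode::ErrorKind>> {
    let tuple = Tuple { id: 1, amount: 10.99, qty: 5 };

    // Current path: serialize, allocate a Vec, take its length, discard the bytes.
    let allocated_len = bincode::serialize(&tuple)?.len() as u64;

    // Cheaper path: compute the encoded size without allocating the output buffer.
    let estimated_len = bincode::serialized_size(&tuple)?;

    assert_eq!(allocated_len, estimated_len);
    Ok(())
}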

BOTTLENECK 5: Per-Row Column Compression

Location: src/storage/engine.rs:640-657 Impact: HIGH

Each tuple is compressed column-by-column individually:

let compressed = super::compression::compress_tuple(&tuple, &schema, ...)?;

Existing Batch APIs (not used for inserts):

  • compress_f64_batch() at src/storage/compression/integration.rs:850
  • compress_f32_batch() at src/storage/compression/integration.rs:919

Recommendations:

  1. Columnar Batch Compression: Buffer N rows, compress columns together
  2. Deferred Compression: Compress in background after batch commits
  3. Mini-batch Strategy: Compress every 1000 rows as columnar block
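A minimal sketch of recommendation 3 (mini-batch columnar compression): buffer rows, pivot each column into a contiguous vector, and compress per column once the batch is full. The compress_f64_batch shown here is a simplified stand-in, not the exact signature in integration.rs:

// Simplified stand-in for the batch compressor in
// src/storage/compression/integration.rs.
fn compress_f64_batch(values: &[f64]) -> Vec<u8> {
    // Placeholder: a real implementation would emit an ALP-compressed block.
    values.iter().flat_map(|v| v.to_le_bytes()).collect()
}

const MINI_BATCH: usize = 1000;

struct ColumnarBuffer {
    amounts: Vec<f64>, // one buffered column; a real buffer would hold every column
}

impl ColumnarBuffer {
    fn new() -> Self {
        Self { amounts: Vec::with_capacity(MINI_BATCH) }
    }

    // Buffer a row; emit a compressed block once MINI_BATCH rows are pending.
    fn push(&mut self, amount: f64) -> Option<Vec<u8>> {
        self.amounts.push(amount);
        if self.amounts.len() == MINI_BATCH {
            let block = compress_f64_batch(&self.amounts);
            self.amounts.clear();
            Some(block)
        } else {
            None
        }
    }

    // Flush the partial tail of the batch at commit time.
    fn flush(&mut self) -> Option<Vec<u8>> {
        if self.amounts.is_empty() {
            None
        } else {
            let block = compress_f64_batch(&self.amounts);
            self.amounts.clear();
            Some(block)
        }
    }
}

fn main() {
    let mut buf = ColumnarBuffer::new();
    let mut blocks = 0;
    for i in 0..2500 {
        if buf.push(i as f64 * 0.01).is_some() {
            blocks += 1;
        }
    }
    if buf.flush().is_some() {
        blocks += 1;
    }
    assert_eq!(blocks, 3); // two full mini-batches plus the 500-row tail
}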

BOTTLENECK 6: Per-Row String Key Allocation

Location: src/storage/engine.rs:690 Impact: MEDIUM

let key = format!("data:{}:{}", table_name, row_id).into_bytes();

This allocates a fresh String for every inserted row.

Recommendations:

  1. Pre-allocated Buffer: Reuse buffer, only update row_id portion
  2. Binary Key Format: Use fixed-size binary keys instead of strings
  3. Batch Key Generation: Generate all keys upfront
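A minimal sketch of recommendations 1 and 2: reuse one key buffer per statement, or switch to a fixed-width binary key; the 12-byte layout shown is illustrative, not the engine's current format:

// Recommendation 1: reuse a single buffer instead of format!() per row.
fn write_string_key(buf: &mut Vec<u8>, table: &str, row_id: u64) {
    buf.clear();
    buf.extend_from_slice(b"data:");
    buf.extend_from_slice(table.as_bytes());
    buf.push(b':');
    buf.extend_from_slice(row_id.to_string().as_bytes());
}

// Recommendation 2 (illustrative layout): fixed-width binary key,
// table id (4 bytes) + big-endian row id (8 bytes) so keys sort by row id.
fn binary_key(table_id: u32, row_id: u64) -> [u8; 12] {
    let mut key = [0u8; 12];
    key[..4].copy_from_slice(&table_id.to_be_bytes());
    key[4..].copy_from_slice(&row_id.to_be_bytes());
    key
}

fn main() {
    let mut buf = Vec::with_capacity(64);
    write_string_key(&mut buf, "orders", 42);
    assert_eq!(buf, b"data:orders:42".to_vec());

    let key = binary_key(7, 42);
    assert_eq!(u64::from_be_bytes(key[4..].try_into().unwrap()), 42);
}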

BOTTLENECK 7: Per-Row Metadata Tracking

Location: src/storage/engine.rs:696-709 Impact: MEDIUM

// Three separate tracking calls per row:
self.mv_delta_tracker.record_insert(table_name, row_id, tuple.clone())?;
self.filter_delta_tracker.on_insert(table_name, row_id, &tuple, &schema);
self.speculative_filter_manager.on_insert(table_name, &col.name, value);

Recommendations:

  1. Batch Delta Recording: Buffer deltas, flush at transaction commit
  2. Disable During Bulk: Skip MV/filter tracking during bulk load
  3. Async Tracking: Move tracking to background thread
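A minimal sketch of recommendation 1: collect per-row deltas in a plain buffer during the statement and hand them to the trackers once at commit; the Delta type and the batch flush hooks are hypothetical:

// Hypothetical per-row delta captured during the statement.
struct Delta {
    table: String,
    row_id: u64,
}

#[derive(Default)]
struct DeltaBuffer {
    pending: Vec<Delta>,
}

impl DeltaBuffer {
    // Cheap per-row path: just push into a Vec, no tracker calls yet.
    fn record_insert(&mut self, table: &str, row_id: u64) {
        self.pending.push(Delta { table: table.to_string(), row_id });
    }

    // At commit, hand the whole batch to the MV / filter trackers in one call each.
    fn flush_at_commit(&mut self) -> usize {
        let n = self.pending.len();
        // e.g. mv_delta_tracker.record_insert_batch(&self.pending);
        //      filter_delta_tracker.on_insert_batch(&self.pending);
        self.pending.clear();
        n
    }
}

fn main() {
    let mut buf = DeltaBuffer::default();
    for row_id in 0..10_000u64 {
        buf.record_insert("orders", row_id);
    }
    assert_eq!(buf.flush_at_commit(), 10_000);
}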

BOTTLENECK 8: Transaction Commit Key Parsing

Location: src/storage/transaction.rs:371-395 Impact: MEDIUM

During commit, each key string is parsed to extract the table name and row id:

if let Ok(key_str) = std::str::from_utf8(key) {
    if key_str.starts_with("data:") {
        let rest = &key_str[5..];
        if let Some(colon_pos) = rest.find(':') {
            // Parse table_name and row_id
        }
    }
}

Recommendations:

  1. Structured Keys: Store metadata alongside value in write_set
  2. Binary Key Format: Use fixed-size binary format for O(1) parsing
  3. Skip Version Index for Bulk: Defer version index creation
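A minimal sketch of recommendation 2, reusing the illustrative 12-byte binary key layout from Bottleneck 6: commit-time parsing becomes two fixed-offset reads instead of UTF-8 validation and substring search:

// Parse the illustrative 12-byte key: table id (4 bytes BE) + row id (8 bytes BE).
fn parse_binary_key(key: &[u8]) -> Option<(u32, u64)> {
    if key.len() != 12 {
        return None;
    }
    let table_id = u32::from_be_bytes(key[..4].try_into().ok()?);
    let row_id = u64::from_be_bytes(key[4..].try_into().ok()?);
    Some((table_id, row_id))
}

fn main() {
    let mut key = [0u8; 12];
    key[..4].copy_from_slice(&7u32.to_be_bytes());
    key[4..].copy_from_slice(&42u64.to_be_bytes());
    assert_eq!(parse_binary_key(&key), Some((7, 42)));
}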

Proposed Optimizations

Priority 1: COPY Command Implementation

Estimated Improvement: 5-10x throughput

COPY orders FROM STDIN WITH (FORMAT CSV);
1,10.99,5
2,24.95,3
\.

Implementation approach:

  • Binary protocol support for raw data streaming
  • Skip SQL parsing entirely
  • Batch insert directly to storage layer
  • Columnar compression at batch level
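A minimal sketch of the ingestion loop such a COPY path could use, assuming a hypothetical bulk_insert_batch hook into the storage layer; the CSV handling is deliberately naive (no quoting or escape support):

use std::io::{BufRead, BufReader, Read};

// Hypothetical hook into the storage layer's batch insert path.
fn bulk_insert_batch(rows: &[Vec<String>]) {
    println!("flushing {} rows to storage", rows.len());
}

// Minimal COPY ... FROM STDIN (FORMAT CSV) reader: no SQL parsing per row,
// rows are buffered and flushed to storage in batches.
fn copy_from<R: Read>(input: R, batch_size: usize) -> usize {
    let reader = BufReader::new(input);
    let mut batch: Vec<Vec<String>> = Vec::with_capacity(batch_size);
    let mut total = 0usize;
    for line in reader.lines() {
        let line = line.expect("read error");
        if line == r"\." {
            break; // end-of-data marker
        }
        let row: Vec<String> = line.split(',').map(str::to_string).collect();
        batch.push(row);
        if batch.len() == batch_size {
            bulk_insert_batch(&batch);
            total += batch.len();
            batch.clear();
        }
    }
    if !batch.is_empty() {
        bulk_insert_batch(&batch);
        total += batch.len();
    }
    total
}

fn main() {
    let data = "1,10.99,5\n2,24.95,3\n\\.\n";
    assert_eq!(copy_from(data.as_bytes(), 1000), 2);
}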

Priority 2: Prepared Statement with Batch Binding

Estimated Improvement: 2-3x throughput

let stmt = db.prepare("INSERT INTO orders VALUES ($1, $2, $3)")?;
for row in data {
    stmt.bind(&row)?; // No re-parsing
}
stmt.execute_batch()?; // Single commit

Priority 3: Bulk Insert API

Estimated Improvement: 3-5x throughput

impl EmbeddedDatabase {
    pub fn bulk_insert(&self, table: &str, rows: Vec<Vec<Value>>) -> Result<u64> {
        // 1. Parse schema once
        // 2. Disable per-row tracking
        // 3. Columnar batch compression
        // 4. Single WAL entry for batch
        // 5. Atomic commit
    }
}

Priority 4: Configuration Flags for Bulk Mode

Estimated Improvement: 1.5-2x throughput

SET bulk_load_mode = true;
-- Disables: per-row metrics, MV tracking, speculative filters
-- Enables: deferred FK validation, batch compression

Existing Optimizations (Already Implemented)

  1. SMFI Bulk Load Suspension: src/lib.rs:666-677

    • Threshold-based (default: 100 rows)
    • Auto-resumes on guard drop (see the guard sketch after this list)
    • Configurable via SET smfi_bulk_load_threshold
  2. DashMap Write Set: src/storage/transaction.rs:77

    • Lock-free concurrent access
    • Reduces contention
  3. RocksDB WriteBatch: src/storage/transaction.rs:361

    • Atomic multi-entry writes
    • Better than row-by-row puts
  4. WAL Group Commit: src/storage/wal.rs:274-308

    • Batches multiple operations
    • 10ms batch window
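The guard-drop behaviour noted under item 1 is the usual RAII pattern; a minimal sketch with hypothetical names (the engine's actual guard lives in src/lib.rs):

use std::sync::atomic::{AtomicBool, Ordering};

// Hypothetical flag standing in for the SMFI maintenance switch.
static SMFI_SUSPENDED: AtomicBool = AtomicBool::new(false);

struct SmfiBulkLoadGuard;

impl SmfiBulkLoadGuard {
    fn suspend() -> Self {
        SMFI_SUSPENDED.store(true, Ordering::SeqCst);
        SmfiBulkLoadGuard
    }
}

impl Drop for SmfiBulkLoadGuard {
    // Auto-resume: dropping the guard re-enables SMFI maintenance,
    // even if the bulk insert returns early with an error.
    fn drop(&mut self) {
        SMFI_SUSPENDED.store(false, Ordering::SeqCst);
    }
}

fn main() {
    {
        let _guard = SmfiBulkLoadGuard::suspend();
        assert!(SMFI_SUSPENDED.load(Ordering::SeqCst));
        // ... bulk insert happens here ...
    }
    assert!(!SMFI_SUSPENDED.load(Ordering::SeqCst));
}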

Benchmark Comparison

Scenario         Current        With COPY       Improvement
447M rows        ~31k rows/s    ~200k rows/s    6.5x
Time to 10GB     ~4 hours       ~37 min         6.5x

Estimates based on similar database implementations


Implementation Roadmap

Phase 1: Quick Wins (1-2 days)

  • Add bulk_load_mode configuration flag
  • Skip metrics collection during bulk mode
  • Pre-allocate key buffer for reuse

Phase 2: Batch Compression (3-5 days)

  • Implement mini-batch columnar compression
  • Buffer N rows before compression
  • Use existing compress_f64_batch() APIs

Phase 3: COPY Command (1-2 weeks)

  • Implement COPY FROM STDIN parser
  • Binary protocol support
  • Direct-to-storage path bypassing SQL layer

Phase 4: Prepared Statements Optimization (1 week)

  • Cache parsed plans for INSERTs
  • Batch binding API
  • Execute batch with single commit

Conclusion

The current architecture processes inserts row-by-row through multiple layers, each adding overhead. The key optimizations are:

  1. Reduce parsing: COPY command or prepared statements
  2. Batch compression: Columnar compression for better ratios and speed
  3. Skip tracking: Disable non-essential tracking during bulk loads
  4. Binary keys: Avoid string allocation/parsing

With these optimizations, bulk loading throughput should improve from ~31k rows/sec to ~200k+ rows/sec.