Database Sink Benchmark Implementation Plan

Version: 1.0
Date: 2025-10-29
Owner: Performance Benchmarker Agent
Status: Ready for Implementation

Executive Summary

This document outlines the comprehensive benchmarking strategy for the F1.3 Database Sink connector to meet aggressive Phase 2 performance targets:

  • Throughput: >100K events/sec
  • Latency P99: <100ms
  • Checkpoint overhead: <5%
  • Connection utilization: 50-80%
  • Memory per sink: <100MB

Benchmark Architecture

┌─────────────────────────────────────────────────────────────┐
│                Database Sink Benchmark Suite                │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌──────────────┐  ┌──────────────┐  ┌───────────────┐      │
│  │  Throughput  │  │   Latency    │  │    Memory     │      │
│  │  Benchmarks  │  │  Benchmarks  │  │   Profiling   │      │
│  └──────────────┘  └──────────────┘  └───────────────┘      │
│                                                             │
│  ┌──────────────┐  ┌──────────────┐  ┌───────────────┐      │
│  │  Connection  │  │ Transaction  │  │   Batching    │      │
│  │ Pool Benches │  │ Manager 2PC  │  │   Strategy    │      │
│  └──────────────┘  └──────────────┘  └───────────────┘      │
│                                                             │
│  ┌──────────────┐  ┌──────────────┐                         │
│  │ Concurrency  │  │  Regression  │                         │
│  │   Testing    │  │    Tests     │                         │
│  └──────────────┘  └──────────────┘                         │
│                                                             │
└─────────────────────────────────────────────────────────────┘

1. Throughput Benchmarks

1.1 Single-Threaded Throughput

Goal: Measure maximum sustained throughput with single sink instance

Test Cases:

  • Small Batches: 10, 50, 100 rows per batch
  • Medium Batches: 500, 1000, 2000 rows per batch
  • Large Batches: 5000, 10000 rows per batch

Metrics:

  • Events/second sustained rate
  • Batch efficiency ratio
  • Memory allocation rate
  • CPU utilization per event

Implementation:

use criterion::{BenchmarkId, Criterion, Throughput};
use tokio::runtime::Runtime;

// Criterion benchmark function (registered via criterion_group!);
// note that the libtest #[bench] attribute does not apply here.
fn bench_throughput_single_thread(c: &mut Criterion) {
    let mut group = c.benchmark_group("throughput_single");
    for batch_size in [100u64, 1000, 10000] {
        group.throughput(Throughput::Elements(batch_size));
        group.bench_with_input(
            BenchmarkId::from_parameter(batch_size),
            &batch_size,
            |b, &size| {
                b.to_async(Runtime::new().unwrap())
                    .iter(|| write_batch(size))
            },
        );
    }
    group.finish();
}

1.2 Multi-Threaded Throughput

Goal: Measure throughput under concurrent load

Test Cases:

  • 2, 4, 8, 16, 32 concurrent sinks
  • Mixed batch sizes
  • Shared connection pool

Metrics:

  • Aggregate throughput
  • Per-sink throughput
  • Lock contention metrics
  • Connection pool saturation

1.3 Throughput Under Different Write Modes

Goal: Compare INSERT vs UPSERT vs REPLACE performance

Test Cases:

  • INSERT only
  • UPSERT with 50% updates
  • UPSERT with 90% updates
  • REPLACE mode

Expected Results:

  • INSERT: ~100K events/sec
  • UPSERT (50% update): ~80K events/sec
  • UPSERT (90% update): ~60K events/sec
  • REPLACE: ~50K events/sec

2. Latency Benchmarks

2.1 End-to-End Latency

Goal: Measure end-to-end latency from write() through flush() completion

Test Cases:

  • Single row writes
  • Small batch (10 rows)
  • Medium batch (1000 rows)
  • Large batch (10000 rows)

Metrics Distribution:

  • P50, P90, P95, P99, P99.9, Max
  • Mean and standard deviation
  • Outlier detection (> 3σ)

Target:

  • P50: <10ms
  • P95: <50ms
  • P99: <100ms
  • P99.9: <500ms
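Reporting the targets above requires a consistent percentile definition across all benchmarks; a minimal nearest-rank implementation over recorded samples might look like the following sketch (the `percentile` helper is illustrative, not from the codebase):

```rust
// Nearest-rank percentile over recorded latency samples.
fn percentile(samples: &mut Vec<u64>, p: f64) -> u64 {
    assert!(!samples.is_empty() && (0.0..=100.0).contains(&p));
    samples.sort_unstable();
    // Nearest-rank: index = ceil(p/100 * n) - 1, clamped to the last sample.
    let rank = ((p / 100.0) * samples.len() as f64).ceil() as usize;
    samples[rank.saturating_sub(1).min(samples.len() - 1)]
}

fn main() {
    let mut latencies_ms: Vec<u64> = (1..=100).collect();
    assert_eq!(percentile(&mut latencies_ms, 50.0), 50);
    assert_eq!(percentile(&mut latencies_ms, 99.0), 99);
    println!("P99 = {} ms", percentile(&mut latencies_ms, 99.0));
}
```

In practice Criterion and HDR-histogram crates provide this; the sketch just pins down which definition the targets use.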

2.2 Component-Level Latency

Goal: Break down latency by component

Components:

  1. Buffer Add: Time to add row to buffer
  2. Serialization: Row to DatabaseRow conversion
  3. Connection Acquire: Pool acquisition time
  4. Transaction Begin: Begin transaction overhead
  5. Execute Write: Database write execution
  6. Transaction Prepare: Phase 1 (2PC) overhead
  7. Transaction Commit: Phase 2 (2PC) overhead

Implementation:

struct LatencyBreakdown {
    buffer_add_ns: u64,
    serialization_ns: u64,
    conn_acquire_ns: u64,
    txn_begin_ns: u64,
    write_execute_ns: u64,
    txn_prepare_ns: u64,
    txn_commit_ns: u64,
    total_ns: u64,
}
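One way such a breakdown could be populated is with a simple stage timer around each phase. This sketch uses a trimmed copy of the struct for self-containment, and the stage bodies are simulated:

```rust
use std::time::Instant;

#[derive(Default, Debug)]
struct LatencyBreakdown {
    buffer_add_ns: u64,
    serialization_ns: u64,
    write_execute_ns: u64,
    total_ns: u64,
}

// Runs a closure and returns its result plus elapsed nanoseconds.
fn timed<T>(f: impl FnOnce() -> T) -> (T, u64) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed().as_nanos() as u64)
}

fn main() {
    let flush_start = Instant::now();
    let mut lb = LatencyBreakdown::default();

    let ((), ns) = timed(|| { /* buffer.add(row) */ });
    lb.buffer_add_ns = ns;
    let ((), ns) = timed(|| { /* convert_row(row) */ });
    lb.serialization_ns = ns;
    let ((), ns) = timed(|| { /* execute batched INSERT */ });
    lb.write_execute_ns = ns;

    lb.total_ns = flush_start.elapsed().as_nanos() as u64;
    // Sanity check: stage times never exceed the total.
    assert!(lb.total_ns >= lb.buffer_add_ns + lb.serialization_ns + lb.write_execute_ns);
    println!("{:?}", lb);
}
```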

2.3 Latency Under Load

Goal: Measure latency degradation under high throughput

Test Scenarios:

  • 10K events/sec load → measure P99
  • 50K events/sec load → measure P99
  • 80K events/sec load → measure P99
  • 100K events/sec load → measure P99

Expected Degradation:

  • Should remain <100ms P99 up to 80K events/sec
  • May degrade at 100K+ events/sec

3. Connection Pool Benchmarks

3.1 Connection Acquisition Latency

Goal: Measure pool.acquire() performance

Test Cases:

  • Warm pool (all connections idle)
  • Cold pool (create new connections)
  • Mixed (some idle, some in use)
  • Exhausted pool (wait for release)

Metrics:

  • Acquisition time P50, P99
  • Connection creation time
  • Wait time when exhausted

Target:

  • Warm acquire: <1ms P99
  • Cold acquire: <50ms P99 (network + handshake)
  • Exhausted wait: <100ms P99

3.2 Connection Pool Under Concurrency

Goal: Test pool behavior under high concurrent load

Test Cases:

  • 10, 50, 100, 200 concurrent acquires
  • Pool sizes: 2, 5, 10, 20 connections
  • Measure contention and queuing

Metrics:

  • Average wait time
  • Queue depth distribution
  • Connection utilization rate
  • Semaphore acquisition time
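The exhausted-pool queuing being measured can be sketched with a stdlib stand-in for the pool's semaphore (Mutex plus Condvar). `Pool`, `acquire`, and `release` are illustrative names, not the connector's API; real code would hand out pooled connections rather than bare permits:

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::{Duration, Instant};

// Counting semaphore built from Mutex + Condvar; acquire() reports
// how long the caller waited for a free permit.
struct Pool {
    permits: Mutex<usize>,
    cv: Condvar,
}

impl Pool {
    fn new(size: usize) -> Self {
        Pool { permits: Mutex::new(size), cv: Condvar::new() }
    }
    fn acquire(&self) -> Duration {
        let start = Instant::now();
        let mut n = self.permits.lock().unwrap();
        while *n == 0 {
            n = self.cv.wait(n).unwrap(); // blocked: pool exhausted
        }
        *n -= 1;
        start.elapsed()
    }
    fn release(&self) {
        *self.permits.lock().unwrap() += 1;
        self.cv.notify_one();
    }
}

fn main() {
    let pool = Arc::new(Pool::new(2)); // pool of 2, 8 concurrent acquirers
    let handles: Vec<_> = (0..8).map(|_| {
        let p = Arc::clone(&pool);
        thread::spawn(move || {
            let waited = p.acquire();
            thread::sleep(Duration::from_millis(5)); // simulated query
            p.release();
            waited
        })
    }).collect();
    let waits: Vec<Duration> = handles.into_iter().map(|h| h.join().unwrap()).collect();
    println!("max wait under contention: {:?}", waits.iter().max().unwrap());
}
```

The real pool uses an async semaphore; the shape of the wait-time measurement is the same.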

3.3 Health Check Overhead

Goal: Measure cost of connection health checks

Test Cases:

  • Health check every 10s, 30s, 60s
  • Idle connection timeout: 300s, 600s
  • Max lifetime: 1800s, 3600s

Metrics:

  • Health check execution time
  • Impact on throughput
  • False positive rate (good connections marked bad)

Target: <1% throughput impact

4. Transaction Manager (2PC) Benchmarks

4.1 Two-Phase Commit Overhead

Goal: Measure 2PC protocol overhead vs simple commit

Comparison:

  • Simple commit (enable_2pc = false)
  • 2PC commit (enable_2pc = true)
  • State backend write overhead

Test Cases:

  • 1, 10, 100, 1000 operations per transaction
  • Measure prepare phase time
  • Measure commit phase time

Expected Overhead:

  • Prepare: +5-10ms
  • Commit: +2-5ms
  • State backend write: +1-3ms
  • Total overhead: ~10-20ms per transaction
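The prepare and commit phases being timed can be sketched as a small state machine; the enum and transition functions below are illustrative, not the TransactionManager's actual types:

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum TxnState { Active, Prepared, Committed, Aborted }

// Phase 1: persist the intent to commit (the state-backend write
// measured above happens here).
fn prepare(s: TxnState) -> Result<TxnState, &'static str> {
    match s {
        TxnState::Active => Ok(TxnState::Prepared),
        _ => Err("prepare requires an active transaction"),
    }
}

// Phase 2: finalize; only legal from Prepared.
fn commit(s: TxnState) -> Result<TxnState, &'static str> {
    match s {
        TxnState::Prepared => Ok(TxnState::Committed),
        _ => Err("commit requires a prepared transaction"),
    }
}

fn main() {
    let s = TxnState::Active;
    let s = prepare(s).unwrap();
    let s = commit(s).unwrap();
    assert_eq!(s, TxnState::Committed);
    // Committing without preparing is rejected:
    assert!(commit(TxnState::Active).is_err());
}
```

Benchmarking 2PC means timing each transition separately, which is exactly the prepare/commit split listed above.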

4.2 Recovery Performance

Goal: Measure recovery time after crash

Test Scenarios:

  • 1, 10, 100, 1000 prepared transactions
  • Measure recovery.recover() time
  • Measure commit of prepared transactions

Metrics:

  • Recovery initialization time
  • Per-transaction recovery time
  • Total recovery duration

Target: <1 second for 100 prepared transactions

4.3 Transaction Retry Performance

Goal: Measure retry mechanism overhead

Test Cases:

  • Transient failures (simulated)
  • Retry backoff: 100ms, 500ms, 2000ms
  • Max retries: 1, 3, 5

Metrics:

  • Time to successful commit after retries
  • Retry overhead per attempt
  • Success rate after N retries
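The retry schedule under test can be sketched as a generic loop; `retry` and the simulated transient failure are hypothetical, and the real sink would use async sleeps:

```rust
use std::time::Duration;

// Retries `op` up to `max_retries` extra times, sleeping per the
// backoff schedule (last entry repeats if attempts exceed it).
fn retry<T, E>(
    max_retries: usize,
    backoffs: &[Duration],
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(v) => return Ok(v),
            Err(e) if attempt >= max_retries => return Err(e),
            Err(_) => {
                let delay = backoffs.get(attempt).copied()
                    .unwrap_or_else(|| *backoffs.last().unwrap());
                std::thread::sleep(delay);
                attempt += 1;
            }
        }
    }
}

fn main() {
    let mut calls = 0;
    let backoffs = [Duration::from_millis(1), Duration::from_millis(2)];
    // Fails twice, then succeeds on the third call.
    let out: Result<u32, &str> = retry(3, &backoffs, || {
        calls += 1;
        if calls < 3 { Err("transient") } else { Ok(42) }
    });
    assert_eq!(out, Ok(42));
    assert_eq!(calls, 3);
}
```

The "time to successful commit" metric is then just the wall-clock time of one `retry` call.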

5. Batching Strategy Benchmarks

5.1 Batch Size Optimization

Goal: Find optimal batch size for throughput

Test Range: 10, 50, 100, 500, 1000, 2000, 5000, 10000 rows

Metrics:

  • Throughput (events/sec)
  • Latency (P99)
  • Memory usage
  • Batch efficiency

Expected Optimal: 1000-2000 rows per batch

5.2 Flush Interval Impact

Goal: Measure time-based flushing impact

Test Cases:

  • Flush intervals: 1s, 5s, 10s, 30s
  • Low throughput scenario (100 events/sec)
  • High throughput scenario (50K events/sec)

Metrics:

  • Average buffering time
  • Buffer utilization
  • Flush trigger ratio (size vs time)

Target: 80%+ size-based flushes, <20% time-based
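The size-versus-time trigger being measured might look like the following sketch; the type and field names are illustrative, not the sink's actual config:

```rust
use std::time::Duration;

struct FlushPolicy {
    max_batch: usize,
    max_interval: Duration,
}

#[derive(Debug, PartialEq)]
enum FlushTrigger { Size, Time, None }

impl FlushPolicy {
    fn check(&self, buffered: usize, since_last_flush: Duration) -> FlushTrigger {
        if buffered >= self.max_batch {
            FlushTrigger::Size // preferred: full batches maximize throughput
        } else if buffered > 0 && since_last_flush >= self.max_interval {
            FlushTrigger::Time // bounds buffering latency at low event rates
        } else {
            FlushTrigger::None
        }
    }
}

fn main() {
    let p = FlushPolicy { max_batch: 1000, max_interval: Duration::from_secs(5) };
    assert_eq!(p.check(1000, Duration::from_secs(1)), FlushTrigger::Size);
    assert_eq!(p.check(100, Duration::from_secs(6)), FlushTrigger::Time);
    assert_eq!(p.check(100, Duration::from_secs(1)), FlushTrigger::None);
}
```

The flush trigger ratio metric is then simply the count of `Size` decisions over all flushes.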

5.3 Batch Efficiency Under Load

Goal: Measure batching efficiency at different rates

Test Scenarios:

  • Bursty traffic (10K events, then idle)
  • Steady traffic (constant 50K events/sec)
  • Variable traffic (sine wave pattern)

Metrics:

  • Batch utilization (actual / max batch size)
  • Flush trigger distribution
  • Wasted capacity

6. Memory Profiling

6.1 Memory Usage Per Sink

Goal: Ensure <100MB per sink instance

Components:

  • WriteBuffer memory
  • Transaction state
  • Connection pool overhead
  • Metrics storage

Test Cases:

  • Idle state (no writes)
  • Active writing (sustained load)
  • Peak load (bursts)

Breakdown Target:

WriteBuffer:        10-20 MB   (10K rows × 1-2KB)
Connection Pool:     5-10 MB   (10 connections)
Transaction State:   5-10 MB   (active transactions)
Metrics:              1-5 MB   (counters, histograms)
Overhead:           10-20 MB   (Arc, RwLock, etc.)
──────────────────────────────────────────────────
Total:              31-65 MB   (well under 100MB)

6.2 Allocation Rate

Goal: Minimize allocations in hot paths

Critical Paths:

  1. WriteBuffer::add() - Should reuse Vec capacity
  2. convert_row() - Minimize temporary allocations
  3. serialize_row() - Use fixed-size buffers
  4. TransactionManager::add_operation() - Pool operations

Target: <1000 allocations per 1000 events
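One way to measure allocations per batch of events is a counting wrapper around the system allocator. This is a generic Rust technique, not code from the sink, and counts are approximate when other threads allocate concurrently:

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

struct Counting;
static ALLOCS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for Counting {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCS.fetch_add(1, Ordering::Relaxed); // count every heap allocation
        System.alloc(layout)
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        System.dealloc(ptr, layout)
    }
}

#[global_allocator]
static A: Counting = Counting;

// Runs `f` and returns how many heap allocations it performed.
fn count_allocs(f: impl FnOnce()) -> usize {
    let before = ALLOCS.load(Ordering::Relaxed);
    f();
    ALLOCS.load(Ordering::Relaxed) - before
}

fn main() {
    // Simulate 1000 events; format! allocates per event, the Vec once.
    let n = count_allocs(|| {
        let mut rows = Vec::with_capacity(1000);
        for i in 0..1000 {
            rows.push(format!("row-{i}"));
        }
    });
    println!("allocations per 1000 events: {n}");
}
```

Profilers like dhat or heaptrack give richer data; this sketch is enough for the per-1000-events target above.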

6.3 Memory Leak Detection

Goal: Ensure no memory leaks over time

Test:

  • Run for 1 hour continuous writes
  • Monitor memory growth
  • Sample heap at intervals
  • Check for leaked Arc/RwLock references

Target: <1% memory growth per hour

7. Concurrency & Contention

7.1 Lock Contention Analysis

Goal: Identify lock bottlenecks

Hot Locks:

  • write_buffer: RwLock<WriteBuffer>
  • active_connections: RwLock<Vec<PooledConnection>>
  • active_transactions: RwLock<HashMap<...>>

Metrics:

  • Lock acquisition time
  • Hold duration
  • Contention rate (waits / acquires)

Optimization Strategy:

  • Use parking_lot for faster RwLock
  • Consider lock-free structures (DashMap)
  • Reduce critical section size

7.2 Async Task Scheduling

Goal: Optimize tokio task distribution

Test Cases:

  • Spawn task per write vs batched execution
  • Work-stealing vs dedicated threads
  • Task spawn overhead

Metrics:

  • Task queue depth
  • Worker thread utilization
  • Context switch rate

8. Checkpoint Overhead

8.1 Checkpoint Latency

Goal: Measure checkpoint() execution time

Test Cases:

  • Empty buffer (no flush needed)
  • Partial buffer (requires flush)
  • During active writes

Target: <50ms P99 checkpoint time

8.2 Checkpoint Impact on Throughput

Goal: Measure throughput impact during checkpoint

Test:

  • Baseline: sustained 80K events/sec
  • Trigger checkpoint every 10s
  • Measure throughput dip

Target: <5% throughput reduction during checkpoint

8.3 Checkpoint Frequency Optimization

Goal: Find optimal checkpoint interval

Test Range: 1s, 5s, 10s, 30s, 60s

Trade-off:

  • Frequent checkpoints: Lower recovery time, higher overhead
  • Infrequent checkpoints: Higher recovery time, lower overhead

Recommendation: 10-30s for production

9. Regression Test Suite

9.1 Performance Regression Detection

Goal: Detect performance regressions in CI/CD

Baseline Metrics (to be established):

throughput_single_thread_1000_batch: 50000 events/sec ±5%
latency_p99_medium_batch: 80ms ±10%
connection_acquire_warm: 0.5ms ±20%
checkpoint_overhead: 3% ±1%
memory_per_sink: 50MB ±20MB

CI Integration:

# Run benchmarks on every commit to main
cargo bench --bench database_sink_bench -- --save-baseline main
# Compare against baseline
cargo bench --bench database_sink_bench -- --baseline main

Alert Thresholds:

  • 10% throughput decrease → Warning

  • 20% throughput decrease → Error

  • 15% latency increase → Warning

  • 30% latency increase → Error
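The alert thresholds above translate directly into a classification helper for CI; the function and type names are illustrative:

```rust
#[derive(Debug, PartialEq)]
enum Alert { Ok, Warning, Error }

// Throughput: 10% decrease → Warning, 20% → Error.
fn classify_throughput(baseline: f64, current: f64) -> Alert {
    let decrease_pct = (baseline - current) / baseline * 100.0;
    match decrease_pct {
        d if d >= 20.0 => Alert::Error,
        d if d >= 10.0 => Alert::Warning,
        _ => Alert::Ok,
    }
}

// Latency: 15% increase → Warning, 30% → Error.
fn classify_latency(baseline: f64, current: f64) -> Alert {
    let increase_pct = (current - baseline) / baseline * 100.0;
    match increase_pct {
        i if i >= 30.0 => Alert::Error,
        i if i >= 15.0 => Alert::Warning,
        _ => Alert::Ok,
    }
}

fn main() {
    assert_eq!(classify_throughput(50_000.0, 46_000.0), Alert::Ok);      // -8%
    assert_eq!(classify_throughput(50_000.0, 44_000.0), Alert::Warning); // -12%
    assert_eq!(classify_throughput(50_000.0, 39_000.0), Alert::Error);   // -22%
    assert_eq!(classify_latency(80.0, 108.0), Alert::Error);             // +35%
}
```

A CI step would feed Criterion's baseline comparison output into these checks and fail the build on `Error`.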

9.2 Continuous Benchmark Dashboard

Goal: Track performance trends over time

Metrics to Track:

  • Throughput trend (weekly)
  • Latency percentiles (daily)
  • Memory usage (daily)
  • Regression count (per release)

Tools:

  • Criterion.rs for benchmarking
  • gnuplot for visualization
  • Prometheus/Grafana for monitoring

10. Optimization Recommendations

10.1 Hot Path Optimizations

Optimization 1: Batch Serialization

Current: Serialize rows one-by-one

// CURRENT (slow)
for row in rows {
    let db_row = self.convert_row(row)?;
    db_rows.push(db_row);
}

Optimized: Batch serialize with pre-allocation

// OPTIMIZED
let mut db_rows = Vec::with_capacity(rows.len());
for row in rows {
    db_rows.push(self.convert_row_unchecked(row));
}

Expected Gain: +10-15% throughput

Optimization 2: Connection Pool Warm-Up

Current: Connections created on-demand
Optimized: Pre-warm min_connections at startup
Expected Gain: -50% initial latency spike

Optimization 3: Zero-Copy Serialization

Current: Placeholder serialize_row() returns empty Vec
Optimized: Use bincode or serde_json with buffer reuse
Expected Gain: -20% allocation rate

Optimization 4: Lock-Free Write Buffer

Current: RwLock<WriteBuffer> on every write
Optimized: Use crossbeam::queue::SegQueue or channel
Expected Gain: -30% lock contention
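The channel-based alternative can be sketched with std::sync::mpsc standing in for a crossbeam queue: producers send rows, a single flusher thread drains and batches them, so the write path holds no lock. `run_pipeline` and the batch handling are illustrative:

```rust
use std::sync::mpsc;
use std::thread;

// Returns the total number of rows "flushed" by the single consumer.
fn run_pipeline(producers: usize, rows_each: usize, batch_size: usize) -> usize {
    let (tx, rx) = mpsc::channel::<String>();

    let flusher = thread::spawn(move || {
        let mut batch = Vec::with_capacity(batch_size);
        let mut flushed = 0usize;
        for row in rx {                 // ends when all senders are dropped
            batch.push(row);
            if batch.len() == batch_size {
                flushed += batch.len(); // execute_batch(&batch) in real code
                batch.clear();
            }
        }
        flushed + batch.len()           // final partial flush
    });

    let handles: Vec<_> = (0..producers).map(|p| {
        let tx = tx.clone();
        thread::spawn(move || {
            for i in 0..rows_each {
                tx.send(format!("row-{p}-{i}")).unwrap(); // lock-free hot path
            }
        })
    }).collect();
    drop(tx); // close the channel once the clones finish
    for h in handles { h.join().unwrap(); }
    flusher.join().unwrap()
}

fn main() {
    assert_eq!(run_pipeline(4, 500, 1000), 2000);
}
```

The single-consumer design also serializes flush order for free, which matters for checkpoint consistency.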

10.2 Async Optimization

Optimization 5: Batch Connection Acquire

Current: Acquire connection per flush
Optimized: Hold connection across multiple batches (with health check)
Expected Gain: -10ms average latency

Optimization 6: Parallel Transaction Prepare

Current: Sequential prepare/commit
Optimized: Parallel prepare for independent transactions
Expected Gain: +50% throughput for multi-sink

10.3 Memory Optimization

Optimization 7: Object Pooling

Current: Allocate Row/DatabaseRow per write
Optimized: Pool and reuse Row objects
Expected Gain: -50% allocation rate

Optimization 8: Compact Row Representation

Current: HashMap for fields (heavy)
Optimized: Columnar or array-based representation
Expected Gain: -30% memory per row

11. Benchmark Execution Plan

Phase 1: Baseline (Day 1)

  • Implement core benchmark suite
  • Establish baseline metrics
  • Identify top 3 bottlenecks

Phase 2: Optimization (Day 2)

  • Implement hot path optimizations
  • Re-run benchmarks
  • Measure improvement

Phase 3: Stress Testing (Day 3)

  • Long-running stability tests
  • Memory leak detection
  • Concurrency edge cases

Phase 4: Regression Tests (Day 4)

  • Integrate into CI/CD
  • Set up monitoring
  • Document final results

12. Deliverables

12.1 Code Deliverables

  1. benches/database_sink_bench.rs - Comprehensive benchmark suite
  2. benches/connection_pool_bench.rs - Pool-specific benchmarks
  3. benches/transaction_bench.rs - 2PC benchmarks
  4. scripts/benchmark_runner.sh - Automation script

12.2 Documentation Deliverables

  1. Benchmark Implementation Plan (this document)
  2. ⏳ Performance Analysis Report
  3. ⏳ Optimization Guide
  4. ⏳ Regression Test Setup

12.3 Data Deliverables

  1. ⏳ Baseline benchmark results
  2. ⏳ Flamegraph profiles
  3. ⏳ Memory allocation reports
  4. ⏳ Comparison matrices

13. Success Criteria

Must-Have Targets

  • Throughput >100K events/sec (single sink)
  • Latency P99 <100ms
  • Memory <100MB per sink
  • Checkpoint overhead <5%
  • Connection utilization 50-80%

Nice-to-Have Targets

  • Throughput >200K events/sec (aggressive optimization)
  • Latency P99 <50ms
  • Memory <50MB per sink
  • Zero-downtime checkpoint

Stretch Goals

  • Throughput >500K events/sec (multi-sink aggregated)
  • Latency P50 <5ms
  • Sub-millisecond connection acquire (warm)

14. Risk & Mitigation

Risk                          Impact   Probability   Mitigation
2PC overhead too high         High     Medium        Implement async prepare, optimize state backend
Connection pool exhaustion    High     Medium        Dynamic pool sizing, better monitoring
Memory leaks in Arc/RwLock    High     Low           Rigorous leak testing, use weak refs
Lock contention bottleneck    Medium   High          Switch to lock-free structures
Benchmark instability         Low      Medium        Multiple iterations, outlier removal


Status: Plan Complete - Ready for Implementation
Next Step: Implement database_sink_bench.rs
Owner: Performance Benchmarker Agent
Review Date: 2025-10-30