Database Sink Benchmark Implementation Plan

Version: 1.0
Date: 2025-10-29
Owner: Performance Benchmarker Agent
Status: Ready for Implementation

Executive Summary

This document outlines the comprehensive benchmarking strategy for the F1.3 Database Sink connector to meet aggressive Phase 2 performance targets:

  • Throughput: >100K events/sec
  • Latency P99: <100ms
  • Checkpoint overhead: <5%
  • Connection utilization: 50-80%
  • Memory per sink: <100MB

Benchmark Architecture

┌─────────────────────────────────────────────────────────────┐
│                Database Sink Benchmark Suite                │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌──────────────┐  ┌──────────────┐  ┌───────────────┐      │
│  │  Throughput  │  │   Latency    │  │    Memory     │      │
│  │  Benchmarks  │  │  Benchmarks  │  │   Profiling   │      │
│  └──────────────┘  └──────────────┘  └───────────────┘      │
│                                                             │
│  ┌──────────────┐  ┌──────────────┐  ┌───────────────┐      │
│  │  Connection  │  │ Transaction  │  │   Batching    │      │
│  │ Pool Benches │  │ Manager 2PC  │  │   Strategy    │      │
│  └──────────────┘  └──────────────┘  └───────────────┘      │
│                                                             │
│  ┌──────────────┐  ┌──────────────┐                         │
│  │ Concurrency  │  │  Regression  │                         │
│  │   Testing    │  │    Tests     │                         │
│  └──────────────┘  └──────────────┘                         │
│                                                             │
└─────────────────────────────────────────────────────────────┘

1. Throughput Benchmarks

1.1 Single-Threaded Throughput

Goal: Measure maximum sustained throughput with single sink instance

Test Cases:

  • Small Batches: 10, 50, 100 rows per batch
  • Medium Batches: 500, 1000, 2000 rows per batch
  • Large Batches: 5000, 10000 rows per batch

Metrics:

  • Events/second sustained rate
  • Batch efficiency ratio
  • Memory allocation rate
  • CPU utilization per event

Implementation:

use criterion::{BenchmarkId, Criterion, Throughput};
use tokio::runtime::Runtime;

// Criterion benchmark function (registered via criterion_group!);
// note that the libtest #[bench] attribute does not apply here.
fn bench_throughput_single_thread(c: &mut Criterion) {
    let mut group = c.benchmark_group("throughput_single");
    for batch_size in [100u64, 1000, 10000] {
        group.throughput(Throughput::Elements(batch_size));
        group.bench_with_input(
            BenchmarkId::from_parameter(batch_size),
            &batch_size,
            |b, &size| {
                b.to_async(Runtime::new().unwrap())
                    .iter(|| write_batch(size))
            },
        );
    }
    group.finish();
}

1.2 Multi-Threaded Throughput

Goal: Measure throughput under concurrent load

Test Cases:

  • 2, 4, 8, 16, 32 concurrent sinks
  • Mixed batch sizes
  • Shared connection pool

Metrics:

  • Aggregate throughput
  • Per-sink throughput
  • Lock contention metrics
  • Connection pool saturation

1.3 Throughput Under Different Write Modes

Goal: Compare INSERT vs UPSERT vs REPLACE performance

Test Cases:

  • INSERT only
  • UPSERT with 50% updates
  • UPSERT with 90% updates
  • REPLACE mode

Expected Results:

  • INSERT: ~100K events/sec
  • UPSERT (50% update): ~80K events/sec
  • UPSERT (90% update): ~60K events/sec
  • REPLACE: ~50K events/sec

2. Latency Benchmarks

2.1 End-to-End Latency

Goal: Measure end-to-end latency from write() through flush() completion

Test Cases:

  • Single row writes
  • Small batch (10 rows)
  • Medium batch (1000 rows)
  • Large batch (10000 rows)

Metrics Distribution:

  • P50, P90, P95, P99, P99.9, Max
  • Mean and standard deviation
  • Outlier detection (> 3σ)

Target:

  • P50: <10ms
  • P95: <50ms
  • P99: <100ms
  • P99.9: <500ms
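Reporting the targets above requires a consistent percentile definition across all benchmarks; a minimal nearest-rank implementation over recorded samples might look like the following sketch (the `percentile` helper is illustrative, not from the codebase):

```rust
// Nearest-rank percentile over recorded latency samples.
fn percentile(samples: &mut Vec<u64>, p: f64) -> u64 {
    assert!(!samples.is_empty() && (0.0..=100.0).contains(&p));
    samples.sort_unstable();
    // Nearest-rank: index = ceil(p/100 * n) - 1, clamped to the last sample.
    let rank = ((p / 100.0) * samples.len() as f64).ceil() as usize;
    samples[rank.saturating_sub(1).min(samples.len() - 1)]
}

fn main() {
    let mut latencies_ms: Vec<u64> = (1..=100).collect();
    assert_eq!(percentile(&mut latencies_ms, 50.0), 50);
    assert_eq!(percentile(&mut latencies_ms, 99.0), 99);
    println!("P99 = {} ms", percentile(&mut latencies_ms, 99.0));
}
```

In practice Criterion and HDR-histogram crates provide this; the sketch just pins down which definition the targets use.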

2.2 Component-Level Latency

Goal: Break down latency by component

Components:

  1. Buffer Add: Time to add row to buffer
  2. Serialization: Row to DatabaseRow conversion
  3. Connection Acquire: Pool acquisition time
  4. Transaction Begin: Begin transaction overhead
  5. Execute Write: Database write execution
  6. Transaction Prepare: Phase 1 (2PC) overhead
  7. Transaction Commit: Phase 2 (2PC) overhead

Implementation:

struct LatencyBreakdown {
    buffer_add_ns: u64,
    serialization_ns: u64,
    conn_acquire_ns: u64,
    txn_begin_ns: u64,
    write_execute_ns: u64,
    txn_prepare_ns: u64,
    txn_commit_ns: u64,
    total_ns: u64,
}
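One way such a breakdown could be populated is with a simple stage timer around each phase. This sketch uses a trimmed copy of the struct for self-containment, and the stage bodies are simulated:

```rust
use std::time::Instant;

#[derive(Default, Debug)]
struct LatencyBreakdown {
    buffer_add_ns: u64,
    serialization_ns: u64,
    write_execute_ns: u64,
    total_ns: u64,
}

// Runs a closure and returns its result plus elapsed nanoseconds.
fn timed<T>(f: impl FnOnce() -> T) -> (T, u64) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed().as_nanos() as u64)
}

fn main() {
    let flush_start = Instant::now();
    let mut lb = LatencyBreakdown::default();

    let ((), ns) = timed(|| { /* buffer.add(row) */ });
    lb.buffer_add_ns = ns;
    let ((), ns) = timed(|| { /* convert_row(row) */ });
    lb.serialization_ns = ns;
    let ((), ns) = timed(|| { /* execute batched INSERT */ });
    lb.write_execute_ns = ns;

    lb.total_ns = flush_start.elapsed().as_nanos() as u64;
    // Sanity check: stage times never exceed the total.
    assert!(lb.total_ns >= lb.buffer_add_ns + lb.serialization_ns + lb.write_execute_ns);
    println!("{:?}", lb);
}
```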

2.3 Latency Under Load

Goal: Measure latency degradation under high throughput

Test Scenarios:

  • 10K events/sec load → measure P99
  • 50K events/sec load → measure P99
  • 80K events/sec load → measure P99
  • 100K events/sec load → measure P99

Expected Degradation:

  • Should remain <100ms P99 up to 80K events/sec
  • May degrade at 100K+ events/sec

3. Connection Pool Benchmarks

3.1 Connection Acquisition Latency

Goal: Measure pool.acquire() performance

Test Cases:

  • Warm pool (all connections idle)
  • Cold pool (create new connections)
  • Mixed (some idle, some in use)
  • Exhausted pool (wait for release)

Metrics:

  • Acquisition time P50, P99
  • Connection creation time
  • Wait time when exhausted

Target:

  • Warm acquire: <1ms P99
  • Cold acquire: <50ms P99 (network + handshake)
  • Exhausted wait: <100ms P99

3.2 Connection Pool Under Concurrency

Goal: Test pool behavior under high concurrent load

Test Cases:

  • 10, 50, 100, 200 concurrent acquires
  • Pool sizes: 2, 5, 10, 20 connections
  • Measure contention and queuing

Metrics:

  • Average wait time
  • Queue depth distribution
  • Connection utilization rate
  • Semaphore acquisition time
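The exhausted-pool queuing being measured can be sketched with a stdlib stand-in for the pool's semaphore (Mutex plus Condvar). `Pool`, `acquire`, and `release` are illustrative names, not the connector's API; real code would hand out pooled connections rather than bare permits:

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::{Duration, Instant};

// Counting semaphore built from Mutex + Condvar; acquire() reports
// how long the caller waited for a free permit.
struct Pool {
    permits: Mutex<usize>,
    cv: Condvar,
}

impl Pool {
    fn new(size: usize) -> Self {
        Pool { permits: Mutex::new(size), cv: Condvar::new() }
    }
    fn acquire(&self) -> Duration {
        let start = Instant::now();
        let mut n = self.permits.lock().unwrap();
        while *n == 0 {
            n = self.cv.wait(n).unwrap(); // blocked: pool exhausted
        }
        *n -= 1;
        start.elapsed()
    }
    fn release(&self) {
        *self.permits.lock().unwrap() += 1;
        self.cv.notify_one();
    }
}

fn main() {
    let pool = Arc::new(Pool::new(2)); // pool of 2, 8 concurrent acquirers
    let handles: Vec<_> = (0..8).map(|_| {
        let p = Arc::clone(&pool);
        thread::spawn(move || {
            let waited = p.acquire();
            thread::sleep(Duration::from_millis(5)); // simulated query
            p.release();
            waited
        })
    }).collect();
    let waits: Vec<Duration> = handles.into_iter().map(|h| h.join().unwrap()).collect();
    println!("max wait under contention: {:?}", waits.iter().max().unwrap());
}
```

The real pool uses an async semaphore; the shape of the wait-time measurement is the same.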

3.3 Health Check Overhead

Goal: Measure cost of connection health checks

Test Cases:

  • Health check every 10s, 30s, 60s
  • Idle connection timeout: 300s, 600s
  • Max lifetime: 1800s, 3600s

Metrics:

  • Health check execution time
  • Impact on throughput
  • False positive rate (good connections marked bad)

Target: <1% throughput impact

4. Transaction Manager (2PC) Benchmarks

4.1 Two-Phase Commit Overhead

Goal: Measure 2PC protocol overhead vs simple commit

Comparison:

  • Simple commit (enable_2pc = false)
  • 2PC commit (enable_2pc = true)
  • State backend write overhead

Test Cases:

  • 1, 10, 100, 1000 operations per transaction
  • Measure prepare phase time
  • Measure commit phase time

Expected Overhead:

  • Prepare: +5-10ms
  • Commit: +2-5ms
  • State backend write: +1-3ms
  • Total overhead: ~10-20ms per transaction
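The prepare and commit phases being timed can be sketched as a small state machine; the enum and transition functions below are illustrative, not the TransactionManager's actual types:

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum TxnState { Active, Prepared, Committed, Aborted }

// Phase 1: persist the intent to commit (the state-backend write
// measured above happens here).
fn prepare(s: TxnState) -> Result<TxnState, &'static str> {
    match s {
        TxnState::Active => Ok(TxnState::Prepared),
        _ => Err("prepare requires an active transaction"),
    }
}

// Phase 2: finalize; only legal from Prepared.
fn commit(s: TxnState) -> Result<TxnState, &'static str> {
    match s {
        TxnState::Prepared => Ok(TxnState::Committed),
        _ => Err("commit requires a prepared transaction"),
    }
}

fn main() {
    let s = TxnState::Active;
    let s = prepare(s).unwrap();
    let s = commit(s).unwrap();
    assert_eq!(s, TxnState::Committed);
    // Committing without preparing is rejected:
    assert!(commit(TxnState::Active).is_err());
}
```

Benchmarking 2PC means timing each transition separately, which is exactly the prepare/commit split listed above.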

4.2 Recovery Performance

Goal: Measure recovery time after crash

Test Scenarios:

  • 1, 10, 100, 1000 prepared transactions
  • Measure recovery.recover() time
  • Measure commit of prepared transactions

Metrics:

  • Recovery initialization time
  • Per-transaction recovery time
  • Total recovery duration

Target: <1 second for 100 prepared transactions

4.3 Transaction Retry Performance

Goal: Measure retry mechanism overhead

Test Cases:

  • Transient failures (simulated)
  • Retry backoff: 100ms, 500ms, 2000ms
  • Max retries: 1, 3, 5

Metrics:

  • Time to successful commit after retries
  • Retry overhead per attempt
  • Success rate after N retries
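The retry schedule under test can be sketched as a generic loop; `retry` and the simulated transient failure are hypothetical, and the real sink would use async sleeps:

```rust
use std::time::Duration;

// Retries `op` up to `max_retries` extra times, sleeping per the
// backoff schedule (last entry repeats if attempts exceed it).
fn retry<T, E>(
    max_retries: usize,
    backoffs: &[Duration],
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(v) => return Ok(v),
            Err(e) if attempt >= max_retries => return Err(e),
            Err(_) => {
                let delay = backoffs.get(attempt).copied()
                    .unwrap_or_else(|| *backoffs.last().unwrap());
                std::thread::sleep(delay);
                attempt += 1;
            }
        }
    }
}

fn main() {
    let mut calls = 0;
    let backoffs = [Duration::from_millis(1), Duration::from_millis(2)];
    // Fails twice, then succeeds on the third call.
    let out: Result<u32, &str> = retry(3, &backoffs, || {
        calls += 1;
        if calls < 3 { Err("transient") } else { Ok(42) }
    });
    assert_eq!(out, Ok(42));
    assert_eq!(calls, 3);
}
```

The "time to successful commit" metric is then just the wall-clock time of one `retry` call.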

5. Batching Strategy Benchmarks

5.1 Batch Size Optimization

Goal: Find optimal batch size for throughput

Test Range: 10, 50, 100, 500, 1000, 2000, 5000, 10000 rows

Metrics:

  • Throughput (events/sec)
  • Latency (P99)
  • Memory usage
  • Batch efficiency

Expected Optimal: 1000-2000 rows per batch

5.2 Flush Interval Impact

Goal: Measure time-based flushing impact

Test Cases:

  • Flush intervals: 1s, 5s, 10s, 30s
  • Low throughput scenario (100 events/sec)
  • High throughput scenario (50K events/sec)

Metrics:

  • Average buffering time
  • Buffer utilization
  • Flush trigger ratio (size vs time)

Target: 80%+ size-based flushes, <20% time-based
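The size-versus-time trigger being measured might look like the following sketch; the type and field names are illustrative, not the sink's actual config:

```rust
use std::time::Duration;

struct FlushPolicy {
    max_batch: usize,
    max_interval: Duration,
}

#[derive(Debug, PartialEq)]
enum FlushTrigger { Size, Time, None }

impl FlushPolicy {
    fn check(&self, buffered: usize, since_last_flush: Duration) -> FlushTrigger {
        if buffered >= self.max_batch {
            FlushTrigger::Size // preferred: full batches maximize throughput
        } else if buffered > 0 && since_last_flush >= self.max_interval {
            FlushTrigger::Time // bounds buffering latency at low event rates
        } else {
            FlushTrigger::None
        }
    }
}

fn main() {
    let p = FlushPolicy { max_batch: 1000, max_interval: Duration::from_secs(5) };
    assert_eq!(p.check(1000, Duration::from_secs(1)), FlushTrigger::Size);
    assert_eq!(p.check(100, Duration::from_secs(6)), FlushTrigger::Time);
    assert_eq!(p.check(100, Duration::from_secs(1)), FlushTrigger::None);
}
```

The flush trigger ratio metric is then simply the count of `Size` decisions over all flushes.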

5.3 Batch Efficiency Under Load

Goal: Measure batching efficiency at different rates

Test Scenarios:

  • Bursty traffic (10K events, then idle)
  • Steady traffic (constant 50K events/sec)
  • Variable traffic (sine wave pattern)

Metrics:

  • Batch utilization (actual / max batch size)
  • Flush trigger distribution
  • Wasted capacity

6. Memory Profiling

6.1 Memory Usage Per Sink

Goal: Ensure <100MB per sink instance

Components:

  • WriteBuffer memory
  • Transaction state
  • Connection pool overhead
  • Metrics storage

Test Cases:

  • Idle state (no writes)
  • Active writing (sustained load)
  • Peak load (bursts)

Breakdown Target:

WriteBuffer:        10-20 MB   (10K rows × 1-2KB)
Connection Pool:     5-10 MB   (10 connections)
Transaction State:   5-10 MB   (active transactions)
Metrics:              1-5 MB   (counters, histograms)
Overhead:           10-20 MB   (Arc, RwLock, etc.)
──────────────────────────────────────────────────
Total:              31-65 MB   (well under 100MB)

6.2 Allocation Rate

Goal: Minimize allocations in hot paths

Critical Paths:

  1. WriteBuffer::add() - Should reuse Vec capacity
  2. convert_row() - Minimize temporary allocations
  3. serialize_row() - Use fixed-size buffers
  4. TransactionManager::add_operation() - Pool operations

Target: <1000 allocations per 1000 events
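One way to measure allocations per batch of events is a counting wrapper around the system allocator. This is a generic Rust technique, not code from the sink, and counts are approximate when other threads allocate concurrently:

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

struct Counting;
static ALLOCS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for Counting {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCS.fetch_add(1, Ordering::Relaxed); // count every heap allocation
        System.alloc(layout)
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        System.dealloc(ptr, layout)
    }
}

#[global_allocator]
static A: Counting = Counting;

// Runs `f` and returns how many heap allocations it performed.
fn count_allocs(f: impl FnOnce()) -> usize {
    let before = ALLOCS.load(Ordering::Relaxed);
    f();
    ALLOCS.load(Ordering::Relaxed) - before
}

fn main() {
    // Simulate 1000 events; format! allocates per event, the Vec once.
    let n = count_allocs(|| {
        let mut rows = Vec::with_capacity(1000);
        for i in 0..1000 {
            rows.push(format!("row-{i}"));
        }
    });
    println!("allocations per 1000 events: {n}");
}
```

Profilers like dhat or heaptrack give richer data; this sketch is enough for the per-1000-events target above.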

6.3 Memory Leak Detection

Goal: Ensure no memory leaks over time

Test:

  • Run for 1 hour continuous writes
  • Monitor memory growth
  • Sample heap at intervals
  • Check for leaked Arc/RwLock references

Target: <1% memory growth per hour

7. Concurrency & Contention

7.1 Lock Contention Analysis

Goal: Identify lock bottlenecks

Hot Locks:

  • write_buffer: RwLock<WriteBuffer>
  • active_connections: RwLock<Vec<PooledConnection>>
  • active_transactions: RwLock<HashMap<...>>

Metrics:

  • Lock acquisition time
  • Hold duration
  • Contention rate (waits / acquires)

Optimization Strategy:

  • Use parking_lot for faster RwLock
  • Consider lock-free structures (DashMap)
  • Reduce critical section size

7.2 Async Task Scheduling

Goal: Optimize tokio task distribution

Test Cases:

  • Spawn task per write vs batched execution
  • Work-stealing vs dedicated threads
  • Task spawn overhead

Metrics:

  • Task queue depth
  • Worker thread utilization
  • Context switch rate

8. Checkpoint Overhead

8.1 Checkpoint Latency

Goal: Measure checkpoint() execution time

Test Cases:

  • Empty buffer (no flush needed)
  • Partial buffer (requires flush)
  • During active writes

Target: <50ms P99 checkpoint time

8.2 Checkpoint Impact on Throughput

Goal: Measure throughput impact during checkpoint

Test:

  • Baseline: sustained 80K events/sec
  • Trigger checkpoint every 10s
  • Measure throughput dip

Target: <5% throughput reduction during checkpoint

8.3 Checkpoint Frequency Optimization

Goal: Find optimal checkpoint interval

Test Range: 1s, 5s, 10s, 30s, 60s

Trade-off:

  • Frequent checkpoints: Lower recovery time, higher overhead
  • Infrequent checkpoints: Higher recovery time, lower overhead

Recommendation: 10-30s for production

9. Regression Test Suite

9.1 Performance Regression Detection

Goal: Detect performance regressions in CI/CD

Baseline Metrics (to be established):

throughput_single_thread_1000_batch: 50000 events/sec ±5%
latency_p99_medium_batch: 80ms ±10%
connection_acquire_warm: 0.5ms ±20%
checkpoint_overhead: 3% ±1%
memory_per_sink: 50MB ±20MB

CI Integration:

# Run benchmarks on every commit to main
cargo bench --bench database_sink_bench -- --save-baseline main
# Compare against baseline
cargo bench --bench database_sink_bench -- --baseline main

Alert Thresholds:

  • 10% throughput decrease → Warning

  • 20% throughput decrease → Error

  • 15% latency increase → Warning

  • 30% latency increase → Error
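The alert thresholds above translate directly into a classification helper for CI; the function and type names are illustrative:

```rust
#[derive(Debug, PartialEq)]
enum Alert { Ok, Warning, Error }

// Throughput: 10% decrease → Warning, 20% → Error.
fn classify_throughput(baseline: f64, current: f64) -> Alert {
    let decrease_pct = (baseline - current) / baseline * 100.0;
    match decrease_pct {
        d if d >= 20.0 => Alert::Error,
        d if d >= 10.0 => Alert::Warning,
        _ => Alert::Ok,
    }
}

// Latency: 15% increase → Warning, 30% → Error.
fn classify_latency(baseline: f64, current: f64) -> Alert {
    let increase_pct = (current - baseline) / baseline * 100.0;
    match increase_pct {
        i if i >= 30.0 => Alert::Error,
        i if i >= 15.0 => Alert::Warning,
        _ => Alert::Ok,
    }
}

fn main() {
    assert_eq!(classify_throughput(50_000.0, 46_000.0), Alert::Ok);      // -8%
    assert_eq!(classify_throughput(50_000.0, 44_000.0), Alert::Warning); // -12%
    assert_eq!(classify_throughput(50_000.0, 39_000.0), Alert::Error);   // -22%
    assert_eq!(classify_latency(80.0, 108.0), Alert::Error);             // +35%
}
```

A CI step would feed Criterion's baseline comparison output into these checks and fail the build on `Error`.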

9.2 Continuous Benchmark Dashboard

Goal: Track performance trends over time

Metrics to Track:

  • Throughput trend (weekly)
  • Latency percentiles (daily)
  • Memory usage (daily)
  • Regression count (per release)

Tools:

  • Criterion.rs for benchmarking
  • gnuplot for visualization
  • Prometheus/Grafana for monitoring

10. Optimization Recommendations

10.1 Hot Path Optimizations

Optimization 1: Batch Serialization

Current: Serialize rows one-by-one

// CURRENT (slow)
for row in rows {
    let db_row = self.convert_row(row)?;
    db_rows.push(db_row);
}

Optimized: Batch serialize with pre-allocation

// OPTIMIZED
let mut db_rows = Vec::with_capacity(rows.len());
for row in rows {
    db_rows.push(self.convert_row_unchecked(row));
}

Expected Gain: +10-15% throughput

Optimization 2: Connection Pool Warm-Up

Current: Connections created on-demand
Optimized: Pre-warm min_connections at startup
Expected Gain: -50% initial latency spike

Optimization 3: Zero-Copy Serialization

Current: Placeholder serialize_row() returns empty Vec
Optimized: Use bincode or serde_json with buffer reuse
Expected Gain: -20% allocation rate

Optimization 4: Lock-Free Write Buffer

Current: RwLock<WriteBuffer> on every write
Optimized: Use crossbeam::queue::SegQueue or channel
Expected Gain: -30% lock contention
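The channel-based alternative can be sketched with std::sync::mpsc standing in for a crossbeam queue: producers send rows, a single flusher thread drains and batches them, so the write path holds no lock. `run_pipeline` and the batch handling are illustrative:

```rust
use std::sync::mpsc;
use std::thread;

// Returns the total number of rows "flushed" by the single consumer.
fn run_pipeline(producers: usize, rows_each: usize, batch_size: usize) -> usize {
    let (tx, rx) = mpsc::channel::<String>();

    let flusher = thread::spawn(move || {
        let mut batch = Vec::with_capacity(batch_size);
        let mut flushed = 0usize;
        for row in rx {                 // ends when all senders are dropped
            batch.push(row);
            if batch.len() == batch_size {
                flushed += batch.len(); // execute_batch(&batch) in real code
                batch.clear();
            }
        }
        flushed + batch.len()           // final partial flush
    });

    let handles: Vec<_> = (0..producers).map(|p| {
        let tx = tx.clone();
        thread::spawn(move || {
            for i in 0..rows_each {
                tx.send(format!("row-{p}-{i}")).unwrap(); // lock-free hot path
            }
        })
    }).collect();
    drop(tx); // close the channel once the clones finish
    for h in handles { h.join().unwrap(); }
    flusher.join().unwrap()
}

fn main() {
    assert_eq!(run_pipeline(4, 500, 1000), 2000);
}
```

The single-consumer design also serializes flush order for free, which matters for checkpoint consistency.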

10.2 Async Optimization

Optimization 5: Batch Connection Acquire

Current: Acquire connection per flush
Optimized: Hold connection across multiple batches (with health check)
Expected Gain: -10ms average latency

Optimization 6: Parallel Transaction Prepare

Current: Sequential prepare/commit
Optimized: Parallel prepare for independent transactions
Expected Gain: +50% throughput for multi-sink

10.3 Memory Optimization

Optimization 7: Object Pooling

Current: Allocate Row/DatabaseRow per write
Optimized: Pool and reuse Row objects
Expected Gain: -50% allocation rate

Optimization 8: Compact Row Representation

Current: HashMap for fields (heavy)
Optimized: Columnar or array-based representation
Expected Gain: -30% memory per row

11. Benchmark Execution Plan

Phase 1: Baseline (Day 1)

  • Implement core benchmark suite
  • Establish baseline metrics
  • Identify top 3 bottlenecks

Phase 2: Optimization (Day 2)

  • Implement hot path optimizations
  • Re-run benchmarks
  • Measure improvement

Phase 3: Stress Testing (Day 3)

  • Long-running stability tests
  • Memory leak detection
  • Concurrency edge cases

Phase 4: Regression Tests (Day 4)

  • Integrate into CI/CD
  • Set up monitoring
  • Document final results

12. Deliverables

12.1 Code Deliverables

  1. benches/database_sink_bench.rs - Comprehensive benchmark suite
  2. benches/connection_pool_bench.rs - Pool-specific benchmarks
  3. benches/transaction_bench.rs - 2PC benchmarks
  4. scripts/benchmark_runner.sh - Automation script

12.2 Documentation Deliverables

  1. Benchmark Implementation Plan (this document)
  2. ⏳ Performance Analysis Report
  3. ⏳ Optimization Guide
  4. ⏳ Regression Test Setup

12.3 Data Deliverables

  1. ⏳ Baseline benchmark results
  2. ⏳ Flamegraph profiles
  3. ⏳ Memory allocation reports
  4. ⏳ Comparison matrices

13. Success Criteria

Must-Have Targets

  • Throughput >100K events/sec (single sink)
  • Latency P99 <100ms
  • Memory <100MB per sink
  • Checkpoint overhead <5%
  • Connection utilization 50-80%

Nice-to-Have Targets

  • Throughput >200K events/sec (aggressive optimization)
  • Latency P99 <50ms
  • Memory <50MB per sink
  • Zero-downtime checkpoint

Stretch Goals

  • Throughput >500K events/sec (multi-sink aggregated)
  • Latency P50 <5ms
  • Sub-millisecond connection acquire (warm)

14. Risk & Mitigation

Risk                          Impact   Probability   Mitigation
2PC overhead too high         High     Medium        Implement async prepare, optimize state backend
Connection pool exhaustion    High     Medium        Dynamic pool sizing, better monitoring
Memory leaks in Arc/RwLock    High     Low           Rigorous leak testing, use weak refs
Lock contention bottleneck    Medium   High          Switch to lock-free structures
Benchmark instability         Low      Medium        Multiple iterations, outlier removal


Status: Plan Complete - Ready for Implementation
Next Step: Implement database_sink_bench.rs
Owner: Performance Benchmarker Agent
Review Date: 2025-10-30