Database Sink Benchmark Implementation Plan
Version: 1.0
Date: 2025-10-29
Owner: Performance Benchmarker Agent
Status: Ready for Implementation
Executive Summary
This document outlines the comprehensive benchmarking strategy for the F1.3 Database Sink connector to meet aggressive Phase 2 performance targets:
- Throughput: >100K events/sec
- Latency P99: <100ms
- Checkpoint overhead: <5%
- Connection utilization: 50-80%
- Memory per sink: <100MB
Benchmark Architecture
```
┌─────────────────────────────────────────────────────────────┐
│                Database Sink Benchmark Suite                │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌──────────────┐  ┌──────────────┐  ┌───────────────┐      │
│  │  Throughput  │  │   Latency    │  │    Memory     │      │
│  │  Benchmarks  │  │  Benchmarks  │  │   Profiling   │      │
│  └──────────────┘  └──────────────┘  └───────────────┘      │
│                                                             │
│  ┌──────────────┐  ┌──────────────┐  ┌───────────────┐      │
│  │  Connection  │  │ Transaction  │  │   Batching    │      │
│  │ Pool Benches │  │ Manager 2PC  │  │   Strategy    │      │
│  └──────────────┘  └──────────────┘  └───────────────┘      │
│                                                             │
│  ┌──────────────┐  ┌──────────────┐                         │
│  │ Concurrency  │  │  Regression  │                         │
│  │   Testing    │  │    Tests     │                         │
│  └──────────────┘  └──────────────┘                         │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```

1. Throughput Benchmarks
1.1 Single-Threaded Throughput
Goal: Measure maximum sustained throughput with single sink instance
Test Cases:
- Small Batches: 10, 50, 100 rows per batch
- Medium Batches: 500, 1000, 2000 rows per batch
- Large Batches: 5000, 10000 rows per batch
Metrics:
- Events/second sustained rate
- Batch efficiency ratio
- Memory allocation rate
- CPU utilization per event
Implementation:
```rust
// Registered via criterion_group!/criterion_main!, not the unstable #[bench]
// attribute, which Criterion does not use.
fn bench_throughput_single_thread(c: &mut Criterion) {
    let mut group = c.benchmark_group("throughput_single");
    // Create the runtime once, outside the measurement loop.
    let rt = Runtime::new().unwrap();
    for batch_size in [100u64, 1000, 10000] {
        group.throughput(Throughput::Elements(batch_size));
        group.bench_with_input(
            BenchmarkId::from_parameter(batch_size),
            &batch_size,
            |b, &size| b.to_async(&rt).iter(|| write_batch(size)),
        );
    }
    group.finish();
}
```

1.2 Multi-Threaded Throughput
Goal: Measure throughput under concurrent load
Test Cases:
- 2, 4, 8, 16, 32 concurrent sinks
- Mixed batch sizes
- Shared connection pool
Metrics:
- Aggregate throughput
- Per-sink throughput
- Lock contention metrics
- Connection pool saturation
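A dependency-free sketch of the concurrent harness, using OS threads and an atomic counter as a stand-in for the real sink (write_batch here only counts events; the actual benchmark would drive the DatabaseSink over a shared pool):

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Instant;

/// Stand-in for a sink write: counts events. The real harness would call
/// the DatabaseSink under test here.
fn write_batch(counter: &AtomicU64, batch_size: u64) {
    counter.fetch_add(batch_size, Ordering::Relaxed);
}

/// Run `n_sinks` concurrent writers; return (total events, aggregate events/sec).
fn run_concurrent(n_sinks: usize, batches: u64, batch_size: u64) -> (u64, f64) {
    let counter = Arc::new(AtomicU64::new(0));
    let start = Instant::now();
    let handles: Vec<_> = (0..n_sinks)
        .map(|_| {
            let c = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..batches {
                    write_batch(&c, batch_size);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let total = counter.load(Ordering::Relaxed);
    (total, total as f64 / start.elapsed().as_secs_f64())
}
```

Dividing aggregate throughput by `n_sinks` gives the per-sink figure, which exposes scaling loss as concurrency grows.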
1.3 Throughput Under Different Write Modes
Goal: Compare INSERT vs UPSERT vs REPLACE performance
Test Cases:
- INSERT only
- UPSERT with 50% updates
- UPSERT with 90% updates
- REPLACE mode
Expected Results:
- INSERT: ~100K events/sec
- UPSERT (50% update): ~80K events/sec
- UPSERT (90% update): ~60K events/sec
- REPLACE: ~50K events/sec
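The three write modes differ only in the statement the sink emits. A sketch of the statement builder, with illustrative table/column names (the real sink derives them from its schema); the UPSERT form shown is PostgreSQL's ON CONFLICT, while REPLACE is shown in MySQL's form:

```rust
/// Write modes under benchmark. Statement text is dialect-dependent:
/// MySQL would use ON DUPLICATE KEY UPDATE instead of ON CONFLICT.
enum WriteMode {
    Insert,
    Upsert,
    Replace,
}

fn build_sql(mode: &WriteMode) -> String {
    match mode {
        WriteMode::Insert => "INSERT INTO events (id, payload) VALUES ($1, $2)".into(),
        WriteMode::Upsert => "INSERT INTO events (id, payload) VALUES ($1, $2) \
             ON CONFLICT (id) DO UPDATE SET payload = EXCLUDED.payload"
            .into(),
        WriteMode::Replace => "REPLACE INTO events (id, payload) VALUES (?, ?)".into(),
    }
}
```

The benchmark would vary the update ratio by controlling how many generated ids collide with existing rows.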
2. Latency Benchmarks
2.1 End-to-End Latency
Goal: Measure end-to-end latency from write() to completed flush()
Test Cases:
- Single row writes
- Small batch (10 rows)
- Medium batch (1000 rows)
- Large batch (10000 rows)
Metrics Distribution:
- P50, P90, P95, P99, P99.9, Max
- Mean and standard deviation
- Outlier detection (> 3σ)
Target:
- P50: <10ms
- P95: <50ms
- P99: <100ms
- P99.9: <500ms
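Criterion reports its own statistics, but the latency-under-load harness collects raw samples itself. A minimal nearest-rank percentile helper for those samples:

```rust
/// Nearest-rank percentile over latency samples (e.g. milliseconds).
/// Sorts in place; panics on an empty slice or out-of-range p.
fn percentile(samples: &mut Vec<f64>, p: f64) -> f64 {
    assert!(!samples.is_empty() && (0.0..=100.0).contains(&p));
    samples.sort_by(|a, b| a.partial_cmp(b).unwrap());
    // Nearest-rank: ceil(p/100 * N), clamped to a valid index.
    let rank = ((p / 100.0) * samples.len() as f64).ceil() as usize;
    samples[rank.saturating_sub(1).min(samples.len() - 1)]
}
```

For production-grade percentile tracking (P99.9 over millions of samples), an HDR-histogram crate would be the better fit; this is enough for the benchmark report.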
2.2 Component-Level Latency
Goal: Break down latency by component
Components:
- Buffer Add: Time to add row to buffer
- Serialization: Row to DatabaseRow conversion
- Connection Acquire: Pool acquisition time
- Transaction Begin: Begin transaction overhead
- Execute Write: Database write execution
- Transaction Prepare: Phase 1 (2PC) overhead
- Transaction Commit: Phase 2 (2PC) overhead
Implementation:
```rust
struct LatencyBreakdown {
    buffer_add_ns: u64,
    serialization_ns: u64,
    conn_acquire_ns: u64,
    txn_begin_ns: u64,
    write_execute_ns: u64,
    txn_prepare_ns: u64,
    txn_commit_ns: u64,
    total_ns: u64,
}
```

2.3 Latency Under Load
Goal: Measure latency degradation under high throughput
Test Scenarios:
- 10K events/sec load → measure P99
- 50K events/sec load → measure P99
- 80K events/sec load → measure P99
- 100K events/sec load → measure P99
Expected Degradation:
- Should remain <100ms P99 up to 80K events/sec
- May degrade at 100K+ events/sec
3. Connection Pool Benchmarks
3.1 Connection Acquisition Latency
Goal: Measure pool.acquire() performance
Test Cases:
- Warm pool (all connections idle)
- Cold pool (create new connections)
- Mixed (some idle, some in use)
- Exhausted pool (wait for release)
Metrics:
- Acquisition time P50, P99
- Connection creation time
- Wait time when exhausted
Target:
- Warm acquire: <1ms P99
- Cold acquire: <50ms P99 (network + handshake)
- Exhausted wait: <100ms P99
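A minimal sketch distinguishing the warm path (idle connection available) from the cold path (must create). Conn and the simulated handshake delay are stand-ins; the real pool wraps async database connections behind a semaphore:

```rust
use std::collections::VecDeque;
use std::sync::Mutex;
use std::time::{Duration, Instant};

struct Conn; // placeholder for a real database connection

struct Pool {
    idle: Mutex<VecDeque<Conn>>,
}

impl Pool {
    fn new() -> Self {
        Pool { idle: Mutex::new(VecDeque::new()) }
    }

    /// Returns the connection, the acquisition latency, and whether the
    /// warm path was taken.
    fn acquire(&self) -> (Conn, Duration, bool) {
        let start = Instant::now();
        if let Some(c) = self.idle.lock().unwrap().pop_front() {
            return (c, start.elapsed(), true); // warm: reuse idle connection
        }
        // Cold: simulate TCP connect + auth handshake.
        std::thread::sleep(Duration::from_millis(5));
        (Conn, start.elapsed(), false)
    }

    fn release(&self, c: Conn) {
        self.idle.lock().unwrap().push_back(c);
    }
}
```

The benchmark pre-populates (or drains) the idle queue to force each scenario, then records acquire latency per path.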
3.2 Connection Pool Under Concurrency
Goal: Test pool behavior under high concurrent load
Test Cases:
- 10, 50, 100, 200 concurrent acquires
- Pool sizes: 2, 5, 10, 20 connections
- Measure contention and queuing
Metrics:
- Average wait time
- Queue depth distribution
- Connection utilization rate
- Semaphore acquisition time
3.3 Health Check Overhead
Goal: Measure cost of connection health checks
Test Cases:
- Health check every 10s, 30s, 60s
- Idle connection timeout: 300s, 600s
- Max lifetime: 1800s, 3600s
Metrics:
- Health check execution time
- Impact on throughput
- False positive rate (good connections marked bad)
Target: <1% throughput impact
4. Transaction Manager (2PC) Benchmarks
4.1 Two-Phase Commit Overhead
Goal: Measure 2PC protocol overhead vs simple commit
Comparison:
- Simple commit (enable_2pc = false)
- 2PC commit (enable_2pc = true)
- State backend write overhead
Test Cases:
- 1, 10, 100, 1000 operations per transaction
- Measure prepare phase time
- Measure commit phase time
Expected Overhead:
- Prepare: +5-10ms
- Commit: +2-5ms
- State backend write: +1-3ms
- Total overhead: ~10-20ms per transaction
4.2 Recovery Performance
Goal: Measure recovery time after crash
Test Scenarios:
- 1, 10, 100, 1000 prepared transactions
- Measure recovery.recover() time
- Measure commit of prepared transactions
Metrics:
- Recovery initialization time
- Per-transaction recovery time
- Total recovery duration
Target: <1 second for 100 prepared transactions
4.3 Transaction Retry Performance
Goal: Measure retry mechanism overhead
Test Cases:
- Transient failures (simulated)
- Retry backoff: 100ms, 500ms, 2000ms
- Max retries: 1, 3, 5
Metrics:
- Time to successful commit after retries
- Retry overhead per attempt
- Success rate after N retries
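Assuming the retry mechanism uses capped exponential backoff (base delay doubled per attempt), the delay schedule the benchmark exercises can be precomputed:

```rust
use std::time::Duration;

/// Capped exponential backoff: base * 2^attempt, clamped to `cap`.
/// The base/cap values mirror the test matrix above; the sink's actual
/// config keys are an assumption.
fn backoff_schedule(base: Duration, cap: Duration, max_retries: u32) -> Vec<Duration> {
    (0..max_retries)
        .map(|attempt| cap.min(base * 2u32.pow(attempt)))
        .collect()
}
```

Summing the schedule gives the worst-case time-to-commit for a transient failure that persists through N-1 attempts.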
5. Batching Strategy Benchmarks
5.1 Batch Size Optimization
Goal: Find optimal batch size for throughput
Test Range: 10, 50, 100, 500, 1000, 2000, 5000, 10000 rows
Metrics:
- Throughput (events/sec)
- Latency (P99)
- Memory usage
- Batch efficiency
Expected Optimal: 1000-2000 rows per batch
5.2 Flush Interval Impact
Goal: Measure time-based flushing impact
Test Cases:
- Flush intervals: 1s, 5s, 10s, 30s
- Low throughput scenario (100 events/sec)
- High throughput scenario (50K events/sec)
Metrics:
- Average buffering time
- Buffer utilization
- Flush trigger ratio (size vs time)
Target: 80%+ size-based flushes, <20% time-based
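The size-vs-time flush ratio falls out of tagging each flush with its trigger. A sketch of the decision, with illustrative names (size threshold checked first, elapsed time as the fallback):

```rust
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq)]
enum FlushTrigger {
    Size,
    Time,
    None,
}

/// Size-based flush wins when the buffer is full; time-based flush fires
/// only when something is buffered and the interval has elapsed.
fn should_flush(
    buffered: usize,
    max_batch: usize,
    last_flush: Instant,
    interval: Duration,
) -> FlushTrigger {
    if buffered >= max_batch {
        FlushTrigger::Size
    } else if buffered > 0 && last_flush.elapsed() >= interval {
        FlushTrigger::Time
    } else {
        FlushTrigger::None
    }
}
```

Counting Size vs Time outcomes over a run yields the 80/20 ratio targeted above.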
5.3 Batch Efficiency Under Load
Goal: Measure batching efficiency at different rates
Test Scenarios:
- Bursty traffic (10K events, then idle)
- Steady traffic (constant 50K events/sec)
- Variable traffic (sine wave pattern)
Metrics:
- Batch utilization (actual / max batch size)
- Flush trigger distribution
- Wasted capacity
6. Memory Profiling
6.1 Memory Usage Per Sink
Goal: Ensure <100MB per sink instance
Components:
- WriteBuffer memory
- Transaction state
- Connection pool overhead
- Metrics storage
Test Cases:
- Idle state (no writes)
- Active writing (sustained load)
- Peak load (bursts)
Breakdown Target:
```
WriteBuffer:        10-20 MB  (10K rows × 1-2KB)
Connection Pool:     5-10 MB  (10 connections)
Transaction State:   5-10 MB  (active transactions)
Metrics:              1-5 MB  (counters, histograms)
Overhead:           10-20 MB  (Arc, RwLock, etc.)
─────────────────────────────
Total:              30-65 MB  (well under 100MB)
```

6.2 Allocation Rate
Goal: Minimize allocations in hot paths
Critical Paths:
- WriteBuffer::add() - should reuse Vec capacity
- convert_row() - minimize temporary allocations
- serialize_row() - use fixed-size buffers
- TransactionManager::add_operation() - pool operations
Target: <1000 allocations per 1000 events
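Allocations-per-N-events can be measured without external tooling by wrapping the system allocator in a counter (tools like dhat or heaptrack give richer data; this is a dependency-free sketch):

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Wraps the system allocator and counts every allocation.
struct Counting;

static ALLOCS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for Counting {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCS.fetch_add(1, Ordering::Relaxed);
        System.alloc(layout)
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        System.dealloc(ptr, layout)
    }
}

#[global_allocator]
static A: Counting = Counting;

/// Count allocations performed while `f` runs, e.g. while writing 1000 events.
fn allocations_during<F: FnOnce()>(f: F) -> usize {
    let before = ALLOCS.load(Ordering::Relaxed);
    f();
    ALLOCS.load(Ordering::Relaxed) - before
}
```

Running the hot path (e.g. 1000 writes) inside `allocations_during` and asserting the count stays under the target turns this into a regression check.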
6.3 Memory Leak Detection
Goal: Ensure no memory leaks over time
Test:
- Run for 1 hour continuous writes
- Monitor memory growth
- Sample heap at intervals
- Check for leaked Arc/RwLock references
Target: <1% memory growth per hour
7. Concurrency & Contention
7.1 Lock Contention Analysis
Goal: Identify lock bottlenecks
Hot Locks:
- write_buffer: RwLock<WriteBuffer>
- active_connections: RwLock<Vec<PooledConnection>>
- active_transactions: RwLock<HashMap<...>>
Metrics:
- Lock acquisition time
- Hold duration
- Contention rate (waits / acquires)
Optimization Strategy:
- Use parking_lot for faster RwLock
- Consider lock-free structures (DashMap)
- Reduce critical section size
7.2 Async Task Scheduling
Goal: Optimize tokio task distribution
Test Cases:
- Spawn task per write vs batched execution
- Work-stealing vs dedicated threads
- Task spawn overhead
Metrics:
- Task queue depth
- Worker thread utilization
- Context switch rate
8. Checkpoint Overhead
8.1 Checkpoint Latency
Goal: Measure checkpoint() execution time
Test Cases:
- Empty buffer (no flush needed)
- Partial buffer (requires flush)
- During active writes
Target: <50ms P99 checkpoint time
8.2 Checkpoint Impact on Throughput
Goal: Measure throughput impact during checkpoint
Test:
- Baseline: sustained 80K events/sec
- Trigger checkpoint every 10s
- Measure throughput dip
Target: <5% throughput reduction during checkpoint
8.3 Checkpoint Frequency Optimization
Goal: Find optimal checkpoint interval
Test Range: 1s, 5s, 10s, 30s, 60s
Trade-off:
- Frequent checkpoints: Lower recovery time, higher overhead
- Infrequent checkpoints: Higher recovery time, lower overhead
Recommendation: 10-30s for production
9. Regression Test Suite
9.1 Performance Regression Detection
Goal: Detect performance regressions in CI/CD
Baseline Metrics (to be established):
- throughput_single_thread_1000_batch: 50000 events/sec ±5%
- latency_p99_medium_batch: 80ms ±10%
- connection_acquire_warm: 0.5ms ±20%
- checkpoint_overhead: 3% ±1%
- memory_per_sink: 50MB ±20MB

CI Integration:
```sh
# Run benchmarks on every commit to main
cargo bench --bench database_sink_bench -- --save-baseline main

# Compare against baseline
cargo bench --bench database_sink_bench -- --baseline main
```

Alert Thresholds:
- 10% throughput decrease → Warning
- 20% throughput decrease → Error
- 15% latency increase → Warning
- 30% latency increase → Error
9.2 Continuous Benchmark Dashboard
Goal: Track performance trends over time
Metrics to Track:
- Throughput trend (weekly)
- Latency percentiles (daily)
- Memory usage (daily)
- Regression count (per release)
Tools:
- Criterion.rs for benchmarking
- gnuplot for visualization
- Prometheus/Grafana for monitoring
10. Optimization Recommendations
10.1 Hot Path Optimizations
Optimization 1: Batch Serialization
Current: Serialize rows one-by-one
```rust
// CURRENT (slow)
for row in rows {
    let db_row = self.convert_row(row)?;
    db_rows.push(db_row);
}
```

Optimized: Batch serialize with pre-allocation

```rust
// OPTIMIZED
let mut db_rows = Vec::with_capacity(rows.len());
for row in rows {
    db_rows.push(self.convert_row_unchecked(row));
}
```

Expected Gain: +10-15% throughput
Optimization 2: Connection Pool Warm-Up
Current: Connections created on-demand
Optimized: Pre-warm min_connections at startup
Expected Gain: -50% initial latency spike
Optimization 3: Zero-Copy Serialization
Current: Placeholder serialize_row() returns empty Vec
Optimized: Use bincode or serde_json with buffer reuse
Expected Gain: -20% allocation rate
Optimization 4: Lock-Free Write Buffer
Current: RwLock<WriteBuffer> on every write
Optimized: Use crossbeam::queue::SegQueue or channel
Expected Gain: -30% lock contention
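A sketch of the lock-free direction using a std mpsc channel as a stand-in for crossbeam's SegQueue: writers push rows without holding the RwLock, and the flusher drains everything available into one batch. Row is a placeholder type:

```rust
use std::sync::mpsc;

struct Row(u64); // placeholder for the sink's row type

struct ChannelBuffer {
    tx: mpsc::Sender<Row>,
    rx: mpsc::Receiver<Row>,
}

impl ChannelBuffer {
    fn new() -> Self {
        let (tx, rx) = mpsc::channel();
        ChannelBuffer { tx, rx }
    }

    /// Writers clone `tx` and push without taking any lock.
    fn add(&self, row: Row) {
        self.tx.send(row).expect("receiver alive");
    }

    /// The single flusher drains everything currently buffered into a batch.
    fn drain(&self) -> Vec<Row> {
        self.rx.try_iter().collect()
    }
}
```

The trade-off is that the buffered length is no longer cheaply observable, so the size-based flush trigger needs a separate atomic counter.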
10.2 Async Optimization
Optimization 5: Batch Connection Acquire
Current: Acquire connection per flush
Optimized: Hold connection across multiple batches (with health check)
Expected Gain: -10ms average latency
Optimization 6: Parallel Transaction Prepare
Current: Sequential prepare/commit
Optimized: Parallel prepare for independent transactions
Expected Gain: +50% throughput for multi-sink
10.3 Memory Optimization
Optimization 7: Object Pooling
Current: Allocate Row/DatabaseRow per write
Optimized: Pool and reuse Row objects
Expected Gain: -50% allocation rate
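A minimal pooling sketch using byte buffers as the pooled object (the real sink would pool its DatabaseRow type the same way): returned buffers are cleared but keep their capacity, so steady-state writes stop allocating.

```rust
use std::sync::Mutex;

/// Pool of reusable byte buffers. `get` prefers a recycled buffer;
/// `put` clears contents but retains capacity for the next writer.
struct BufferPool {
    free: Mutex<Vec<Vec<u8>>>,
}

impl BufferPool {
    fn new() -> Self {
        BufferPool { free: Mutex::new(Vec::new()) }
    }

    fn get(&self) -> Vec<u8> {
        self.free.lock().unwrap().pop().unwrap_or_default()
    }

    fn put(&self, mut buf: Vec<u8>) {
        buf.clear(); // drop contents, keep capacity
        self.free.lock().unwrap().push(buf);
    }
}
```

A capacity bound on the free list (drop buffers beyond N) would keep the pool itself within the per-sink memory budget.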
Optimization 8: Compact Row Representation
Current: HashMap for fields (heavy)
Optimized: Columnar or array-based representation
Expected Gain: -30% memory per row
11. Benchmark Execution Plan
Phase 1: Baseline (Day 1)
- Implement core benchmark suite
- Establish baseline metrics
- Identify top 3 bottlenecks
Phase 2: Optimization (Day 2)
- Implement hot path optimizations
- Re-run benchmarks
- Measure improvement
Phase 3: Stress Testing (Day 3)
- Long-running stability tests
- Memory leak detection
- Concurrency edge cases
Phase 4: Regression Tests (Day 4)
- Integrate into CI/CD
- Set up monitoring
- Document final results
12. Deliverables
12.1 Code Deliverables
- benches/database_sink_bench.rs - Comprehensive benchmark suite - ⏳
- benches/connection_pool_bench.rs - Pool-specific benchmarks - ⏳
- benches/transaction_bench.rs - 2PC benchmarks - ⏳
- scripts/benchmark_runner.sh - Automation script
12.2 Documentation Deliverables
- Benchmark Implementation Plan (this document)
- ⏳ Performance Analysis Report
- ⏳ Optimization Guide
- ⏳ Regression Test Setup
12.3 Data Deliverables
- ⏳ Baseline benchmark results
- ⏳ Flamegraph profiles
- ⏳ Memory allocation reports
- ⏳ Comparison matrices
13. Success Criteria
Must-Have Targets
- Throughput >100K events/sec (single sink)
- Latency P99 <100ms
- Memory <100MB per sink
- Checkpoint overhead <5%
- Connection utilization 50-80%
Nice-to-Have Targets
- Throughput >200K events/sec (aggressive optimization)
- Latency P99 <50ms
- Memory <50MB per sink
- Zero-downtime checkpoint
Stretch Goals
- Throughput >500K events/sec (multi-sink aggregated)
- Latency P50 <5ms
- Sub-millisecond connection acquire (warm)
14. Risk & Mitigation
| Risk | Impact | Probability | Mitigation |
|---|---|---|---|
| 2PC overhead too high | High | Medium | Implement async prepare, optimize state backend |
| Connection pool exhaustion | High | Medium | Dynamic pool sizing, better monitoring |
| Memory leaks in Arc/RwLock | High | Low | Rigorous leak testing, use weak refs |
| Lock contention bottleneck | Medium | High | Switch to lock-free structures |
| Benchmark instability | Low | Medium | Multiple iterations, outlier removal |
15. References
- Criterion.rs Documentation
- Apache Flink Performance Tuning
- Database Connection Pool Best Practices
- Tokio Performance Tuning
- Rust Performance Book
Status: Plan Complete - Ready for Implementation
Next Step: Implement database_sink_bench.rs
Owner: Performance Benchmarker Agent
Review Date: 2025-10-30