Synchronous Replication Overhead Analysis
Executive Summary
This analysis quantifies the latency impact and throughput effects of synchronous mirroring (single mirror per shard) on HeliosDB’s write performance, examining the trade-offs between data durability (RPO=0) and write throughput across different network configurations and workload patterns.
1. Synchronous Replication Architecture
1.1 Write Path with Mirroring
```
Client Write Request
        ↓
┌────────────────────────────────────┐
│ Primary Storage Node               │
├────────────────────────────────────┤
│ 1. Validate write                  │ ← 50 μs
│ 2. Append to commit log (local)    │ ← 400 μs (NVMe)
│ 3. Insert to memtable (local)      │ ← 50 μs
│ 4. Replicate to mirror → → → → → → │─┐
└────────────────────────────────────┘ │
                                       │ Network RTT
┌────────────────────────────────────┐ │
│ Mirror Storage Node                │←┘
├────────────────────────────────────┤
│ 5. Receive replication data        │
│ 6. Append to commit log (local)    │ ← 400 μs
│ 7. Insert to memtable (local)      │ ← 50 μs
│ 8. Send ACK → → → → → → → → → → →  │─┐
└────────────────────────────────────┘ │
        ↑                              │
        └──────────────────────────────┘ Network RTT
        ↓
┌────────────────────────────────────┐
│ Primary: Acknowledge to client     │
└────────────────────────────────────┘
```

Critical Path Components:
- Primary processing: 500 μs (commit log + memtable)
- Network to mirror: RTT/2
- Mirror processing: 450 μs (commit log + memtable)
- Network ACK: RTT/2
- Total network: 1 RTT
Total latency = Primary (500 μs) + Mirror (450 μs) + Network (1 RTT)
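This decomposition can be checked with a small model. A minimal sketch, using the nominal component timings from the diagram above (not measured values):

```rust
/// Nominal critical-path timings from the write-path diagram, in microseconds.
const PRIMARY_PROCESSING_US: f64 = 500.0; // validate + commit log + memtable
const MIRROR_PROCESSING_US: f64 = 450.0;  // commit log + memtable on the mirror

/// Client-visible latency of one synchronous write: both nodes' local work
/// plus one full network round trip (request half + ACK half).
fn sync_write_latency_us(network_rtt_us: f64) -> f64 {
    PRIMARY_PROCESSING_US + MIRROR_PROCESSING_US + network_rtt_us
}

fn main() {
    // Illustrative RTT values; see the table in section 2.1.
    for (name, rtt_us) in [("RDMA same DC", 10.0), ("10G Eth same DC", 300.0)] {
        println!("{name}: {:.0} μs", sync_write_latency_us(rtt_us));
    }
}
```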
1.2 Comparison: Asynchronous vs Synchronous
Asynchronous Replication (Not Default):
Write latency seen by client:
- Local commit log: 400 μs
- Local memtable: 50 μs
- Client ACK: 450 μs

Background replication (not blocking):
- Network to mirror: asynchronous
- Mirror write: happens in parallel

Client latency: 450 μs
RPO: >0 (potential data loss if primary fails before replication)

Synchronous Replication (HeliosDB Default):

Write latency seen by client:
- Primary write: 500 μs
- Network RTT: variable (see below)
- Mirror write: 450 μs

Client latency: 500 + RTT + 450 μs
RPO: 0 (no data loss on primary failure)

2. Network RTT Impact Analysis
2.1 RTT by Network Technology
| Network Type | Location | RTT (Round-Trip Time) | Write Latency | Overhead vs Async |
|---|---|---|---|---|
| RDMA (same rack) | <5 meters | 2-5 μs | 952-955 μs | +502-505 μs (+112%) |
| RDMA (same datacenter) | <100 meters | 5-15 μs | 955-965 μs | +505-515 μs (+112-114%) |
| 10 Gbps Ethernet (same rack) | <5 meters | 50-100 μs | 1,000-1,050 μs | +550-600 μs (+122-133%) |
| 10 Gbps Ethernet (same datacenter) | <100 meters | 200-500 μs | 1,150-1,450 μs | +700-1,000 μs (+156-222%) |
| 10 Gbps Ethernet (same region) | <500 km | 5-15 ms | 5,950-15,950 μs | +5,500-15,500 μs (+1,222-3,444%) |
Key Observations:
- RDMA (Same Datacenter):
  - RTT: 5-15 μs (negligible)
  - Total write latency: ~960 μs
  - Overhead: +113% vs async (but still <1 ms)
  - Recommended configuration for production
- 10 Gbps Ethernet (Same Datacenter):
  - RTT: 200-500 μs (significant)
  - Total write latency: 1.2-1.5 ms
  - Overhead: +156-222% vs async
  - Acceptable for moderate write workloads
- Cross-Region Replication:
  - RTT: 5-15 ms (prohibitive for synchronous)
  - Total write latency: 6-16 ms
  - Not suitable for synchronous replication
  - Use asynchronous replication for geo-distributed setups
2.2 Quantitative Impact on Write Throughput
Single-Threaded Write Performance:
| Network | Write Latency | Max Throughput (1/latency) | Writes/sec |
|---|---|---|---|
| Async (baseline) | 450 μs | 1 ÷ 0.00045 s | 2,222 |
| RDMA (same DC) | 960 μs | 1 ÷ 0.00096 s | 1,042 |
| 10G Eth (same DC) | 1,200 μs | 1 ÷ 0.0012 s | 833 |
| 10G Eth (same region) | 6,000 μs | 1 ÷ 0.006 s | 167 |
Throughput reduction:
- RDMA: 53% reduction (2,222 → 1,042 writes/sec)
- 10G Ethernet: 62% reduction (2,222 → 833 writes/sec)
Multi-Threaded Write Performance (100 Concurrent Writes):
Pipelined writes allow overlapping network RTT
RDMA Configuration:
- Per-write latency: 960 μs
- Concurrency: 100 parallel writes
- Pipeline depth: ~100 in-flight writes
- Effective throughput: 100 × (1 ÷ 0.00096 s) = 104,167 writes/sec

vs Async:
- Throughput: 100 × (1 ÷ 0.00045 s) = 222,222 writes/sec
- Reduction: 53% (same as single-threaded)
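A sketch of the pipelining arithmetic, showing why concurrency scales absolute throughput but leaves the sync-vs-async ratio unchanged:

```rust
/// Effective throughput with `concurrency` pipelined writes, each taking
/// `latency_us`: the pipeline completes `concurrency` writes per latency window.
fn pipelined_throughput(latency_us: f64, concurrency: f64) -> f64 {
    concurrency * 1_000_000.0 / latency_us
}

fn main() {
    let async_tp = pipelined_throughput(450.0, 100.0); // ≈ 222,222 writes/sec
    let sync_tp = pipelined_throughput(960.0, 100.0);  // ≈ 104,167 writes/sec
    // Concurrency cancels out of the ratio, so the reduction is latency-determined.
    println!("reduction: {:.0}%", 100.0 * (1.0 - sync_tp / async_tp)); // 53%
}
```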
Conclusion: Synchronous mirroring reduces throughput by 50-60% regardless of concurrency.

3. Workload-Specific Impact Analysis
3.1 OLTP Workload (High Write Frequency)
Profile:
- Write rate: 50,000 transactions/sec (target)
- Average write size: 2 KB
- Read:Write ratio: 70:30
Async Replication:
Per-node write capacity:
- Single-threaded: 2,222 writes/sec
- With 100 threads: 222,222 writes/sec
- Nodes needed for 50K writes/sec: 50,000 ÷ 222,222 = 1 node

Total nodes: 1 storage node (plus replicas for reads)

Sync Replication (RDMA):

Per-node write capacity:
- Single-threaded: 1,042 writes/sec
- With 100 threads: 104,167 writes/sec
- Nodes needed for 50K writes/sec: 50,000 ÷ 104,167 = 1 node (still sufficient)

Total nodes: 1 storage node
Latency penalty: +113% (960 μs vs 450 μs)

Analysis:
- For moderate OLTP workloads (<100K writes/sec), synchronous replication with RDMA is viable
- Latency increases by ~500 μs, but stays under 1 ms
- Trade-off: 2x write latency for zero data loss (RPO=0)
Sync Replication (10G Ethernet):
Per-node write capacity:
- With 100 threads: 83,333 writes/sec
- Nodes needed: 50,000 ÷ 83,333 = 1 node (marginal)

At peak load (100K writes/sec):
- Nodes needed: 100,000 ÷ 83,333 = 2 nodes
- With RDMA: 100,000 ÷ 104,167 = 1 node

Cost: 2x nodes needed vs RDMA

Recommendation for OLTP: Use RDMA for synchronous replication to maintain <1 ms write latency.
3.2 HTAP Workload (Mixed Read/Write)
Profile:
- Write rate: 10,000 transactions/sec
- Read rate: 100,000 queries/sec
- Analytical queries: 1,000/sec (complex, multi-shard)
Impact of Synchronous Replication:
Write latency impact:
- Async: 450 μs
- Sync (RDMA): 960 μs
- Difference: +510 μs

For transactional inserts (user-facing):
- Async: user sees 450 μs insert latency
- Sync: user sees 960 μs insert latency
- Perceptible but acceptable (<1 ms)

For analytical queries:
- Write latency doesn't affect read performance
- Reads can hit either primary or mirror (load balancing)
- **Synchronous replication enables zero-lag mirror reads**

Benefit: Real-Time Analytics on Mirror

Async replication lag: 10-100 ms (typical)
- Analytics queries may see stale data
- Not suitable for real-time dashboards

Sync replication lag: 0 ms
- Mirror is always up to date
- Analytics queries can safely read from mirror
- Load balancing: primary handles writes, mirror handles reads

Read throughput improvement:
- Single node: 100K reads/sec
- With mirror (read replica): 200K reads/sec
- **2x read capacity with zero lag**

Recommendation for HTAP: Synchronous replication is ideal: it enables real-time analytics on the mirror while maintaining durability.
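Because a synchronous mirror is never behind the primary, reads can be spread across both nodes without any staleness check. A minimal load-balancing sketch (the node addresses and `ReadRouter` type are illustrative, not HeliosDB API):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Round-robin read routing across primary and mirror. Safe only under
/// synchronous replication, where the mirror is guaranteed up to date.
struct ReadRouter {
    nodes: Vec<String>, // e.g. primary and mirror endpoints
    next: AtomicUsize,
}

impl ReadRouter {
    fn pick(&self) -> &str {
        let i = self.next.fetch_add(1, Ordering::Relaxed) % self.nodes.len();
        &self.nodes[i]
    }
}

fn main() {
    let router = ReadRouter {
        nodes: vec!["primary:9000".into(), "mirror:9000".into()],
        next: AtomicUsize::new(0),
    };
    // Alternates between the two nodes: 2x aggregate read capacity.
    for _ in 0..4 {
        println!("route read to {}", router.pick());
    }
}
```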
3.3 Bulk Load Workload (Write-Heavy)
Profile:
- Initial data load: 1 TB of data
- Sustained write rate: 500,000 rows/sec
- Temporary (batch processing)
Async Replication:
Write throughput per node:
- With 200 concurrent threads: 444,444 writes/sec
- Nodes needed for 500K writes/sec: 500,000 ÷ 444,444 = 2 nodes

Load time for 1 TB:
- Row size: 2 KB
- Total rows: 1 TB ÷ 2 KB = 500M rows
- Time: 500M ÷ 500K writes/sec = 1,000 seconds ≈ 16.7 minutes

Sync Replication (RDMA):

Write throughput per node:
- With 200 concurrent threads: 208,333 writes/sec
- Nodes needed for 500K writes/sec: 500,000 ÷ 208,333 = 3 nodes

Load time for 1 TB:
- Time: 500M ÷ 500K writes/sec = 1,000 seconds ≈ 16.7 minutes (same)

Additional resource cost:
- 3 nodes vs 2 nodes = +50% compute
- But mirrors provide redundancy

Optimization: Disable Synchronous Replication for Bulk Loads
```sql
-- HeliosDB DDL extension
ALTER TABLE staging_data SET REPLICATION MODE = 'ASYNC';

-- Bulk load with asynchronous replication
COPY staging_data FROM '/data/bulk.csv';

-- Re-enable synchronous replication
ALTER TABLE staging_data SET REPLICATION MODE = 'SYNC';

-- Force mirror catch-up (blocking)
ALTER TABLE staging_data SYNC REPLICAS;
```

Result:
- Bulk load at full async speed (444K writes/sec per node)
- Post-load sync: 100-500 seconds (depending on mirror lag)
- Best of both worlds: Fast loads + eventual strong consistency
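The 100-500 second catch-up figure follows from the accumulated lag divided by replication bandwidth. A sketch, with both inputs as labeled assumptions:

```rust
/// Rough duration of the blocking SYNC REPLICAS step: bytes the mirror
/// is behind, divided by sustainable replication bandwidth.
fn catchup_seconds(lag_bytes: f64, bandwidth_bytes_per_sec: f64) -> f64 {
    lag_bytes / bandwidth_bytes_per_sec
}

fn main() {
    // Assumption: ~1 GB/s sustained replication bandwidth, and the async
    // bulk load left the mirror 100-500 GB behind.
    for lag_gb in [100.0, 500.0] {
        println!("{lag_gb} GB behind → {:.0} s", catchup_seconds(lag_gb * 1e9, 1e9));
    }
}
```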
Recommendation for Bulk Loads: Temporarily disable synchronous replication, re-enable after load completion.
4. Failure Scenarios and Recovery Time
4.1 Primary Node Failure (Synchronous Replication)
Scenario: Primary node crashes mid-transaction
State at failure:
- Primary commit log: last write at LSN 1,234,567
- Mirror commit log: last write at LSN 1,234,567 (guaranteed)
- No data loss

Failover process:
1. Witness detects primary failure: 1-3 seconds (heartbeat timeout)
2. Witness grants failover to mirror: 100 ms (quorum decision)
3. Mirror promoted to primary: 50 ms (metadata update)
4. New primary ready for writes: 50 ms (state transition)

Total: 1.2-3.2 seconds

Recovery Point Objective (RPO): 0 (no data loss)
Recovery Time Objective (RTO): 1-3 seconds

Client Impact:

In-flight writes at failure time:
- Writes acknowledged to client: 0 lost (committed to mirror)
- Writes not yet acknowledged: failed; client retries (idempotent writes)

Downtime: 1-3 seconds
User experience: brief unavailability, automatic recovery

4.2 Primary Node Failure (Asynchronous Replication)
Scenario: Primary node crashes mid-transaction
State at failure:
- Primary commit log: last write at LSN 1,234,567
- Mirror commit log: last write at LSN 1,234,500 (lagging)
- Data loss: 67 transactions

Replication lag at failure:
- Typical async lag: 10-100 ms
- Writes in flight: 100 ms × 100K writes/sec = 10,000 writes
- Data loss: up to 10,000 transactions
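The data-loss bound is simply lag multiplied by write rate; a one-function sketch:

```rust
/// Upper bound on acknowledged-but-unreplicated writes lost on failover:
/// replication lag multiplied by the write rate.
fn writes_at_risk(lag_seconds: f64, writes_per_sec: f64) -> f64 {
    lag_seconds * writes_per_sec
}

fn main() {
    // 100 ms of async lag at 100K writes/sec → up to 10,000 lost writes.
    println!("{:.0} writes at risk", writes_at_risk(0.100, 100_000.0));
}
```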
Recovery Point Objective (RPO): 10-100 ms (data loss window)
Recovery Time Objective (RTO): 1-3 seconds (same as sync)

Client Impact:

Data loss consequences:
- 10,000 acknowledged writes lost (committed to primary, not mirror)
- Client believes writes succeeded, but data is gone
- Application-level inconsistency

Mitigation:
- Application-level write-ahead logging
- Client-side retry with idempotency keys
- **Not suitable for financial transactions or critical data**

4.3 Network Partition (Split-Brain Prevention)
Scenario: Network partition between primary and mirror
Witness-Based Quorum:
Partition Scenario 1: Primary isolated
```
┌─────────┐     ┌────────┐
│ Primary │  X  │ Mirror │───┐
└─────────┘     └────────┘   │
                             │
              ┌──────────┐   │
              │ Witness  │───┘
              └──────────┘
```

Quorum decision:
- Mirror + Witness = 2 votes (majority)
- Primary = 1 vote (minority)
- Mirror promoted to primary ✓
- Old primary demoted (cannot accept writes) ✓

Partition Scenario 2: Mirror isolated
```
┌─────────┐       ┌────────┐
│ Primary │───┐ X │ Mirror │
└─────────┘   │   └────────┘
              │
┌──────────┐  │
│ Witness  │──┘
└──────────┘
```

Quorum decision:
- Primary + Witness = 2 votes (majority)
- Mirror = 1 vote (minority)
- Primary remains primary ✓
- Mirror cannot self-promote ✓
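Both scenarios reduce to the same rule: a partition side may host the primary only if it reaches a strict majority of the three voters. A minimal sketch (types are illustrative):

```rust
#[derive(Clone, Copy)]
enum Voter {
    Primary,
    Mirror,
    Witness,
}

/// A partition side may keep (or take) the primary role only if it can
/// reach a strict majority of the three voters: 2 of 3.
fn side_has_quorum(reachable: &[Voter]) -> bool {
    reachable.len() * 2 > 3
}

fn main() {
    // Scenario 1: primary isolated → mirror + witness win quorum.
    assert!(side_has_quorum(&[Voter::Mirror, Voter::Witness]));
    assert!(!side_has_quorum(&[Voter::Primary]));
    // Scenario 2: mirror isolated → primary + witness retain quorum.
    assert!(side_has_quorum(&[Voter::Primary, Voter::Witness]));
    println!("no partition side with fewer than 2 voters can host the primary");
}
```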
Result: No split-brain; a single active primary is guaranteed.

Synchronous Replication Advantage:

With sync replication:
- Writes block until mirror acknowledges
- If mirror is partitioned, writes fail (but no data loss)
- Client sees write errors, can retry

With async replication:
- Writes succeed on primary (no blocking)
- If mirror is partitioned, replication lag grows
- On failover, large data loss window

Recommendation: Synchronous replication is critical for split-brain safety in the presence of network partitions.
5. Tuning and Optimization Strategies
5.1 Adaptive Replication Mode
Dynamic Mode Selection:
```rust
pub enum ReplicationMode {
    Synchronous,  // RPO = 0, slower writes
    Asynchronous, // RPO > 0, faster writes
    SemiSync,     // Hybrid: sync for critical tables, async for others
}

pub struct TableReplicationPolicy {
    mode: ReplicationMode,
    timeout_ms: u64, // Max time to wait for sync ACK
}

impl TableReplicationPolicy {
    pub fn new_critical() -> Self {
        Self {
            mode: ReplicationMode::Synchronous,
            timeout_ms: 5000, // 5 sec timeout before async fallback
        }
    }

    pub fn new_best_effort() -> Self {
        Self {
            mode: ReplicationMode::Asynchronous,
            timeout_ms: 0,
        }
    }
}
```

Per-Table Configuration:
```sql
-- Critical data: financial transactions
CREATE TABLE transactions (
    txn_id BIGINT PRIMARY KEY,
    amount DECIMAL(20,2),
    ...
) WITH (replication_mode = 'SYNCHRONOUS');

-- Non-critical data: user sessions
CREATE TABLE sessions (
    session_id UUID PRIMARY KEY,
    user_id BIGINT,
    ...
) WITH (replication_mode = 'ASYNCHRONOUS');

-- Analytics: logs (can tolerate loss)
CREATE TABLE access_logs (
    timestamp TIMESTAMP,
    user_id BIGINT,
    ...
) WITH (replication_mode = 'ASYNCHRONOUS');
```

Benefit:
- Critical tables: RPO = 0, slower writes (acceptable for low-volume txns)
- Non-critical tables: Fast writes, some data loss risk (acceptable for high-volume logs)
- Cluster-wide throughput optimized
5.2 Batched Replication
Problem: Synchronous replication adds 1 RTT per write
Solution: Group multiple writes into single replication batch
Traditional (1 write per RTT):
```
Write 1 → Mirror ACK → 960 μs
Write 2 → Mirror ACK → 960 μs
Write 3 → Mirror ACK → 960 μs
Total: 2,880 μs for 3 writes
```

Batched (N writes per RTT):
```
Write 1 ┐
Write 2 ├→ Batch → Mirror ACK → 980 μs
Write 3 ┘
Total: 980 μs for 3 writes
```

Amortized latency per write: 980 ÷ 3 ≈ 327 μs
Throughput: 3x improvement

Implementation:
```rust
pub struct ReplicationBatcher {
    pending_writes: Vec<WriteOp>,
    batch_size: usize,
    max_wait_us: u64,
}

impl ReplicationBatcher {
    pub async fn add_write(&mut self, write: WriteOp) -> Result<()> {
        self.pending_writes.push(write);

        if self.pending_writes.len() >= self.batch_size {
            self.flush().await?;
        }

        Ok(())
    }

    async fn flush(&mut self) -> Result<()> {
        // Send the entire batch to the mirror in one message
        let batch = std::mem::take(&mut self.pending_writes);
        self.send_batch_to_mirror(batch).await?;

        // Single RTT for the entire batch
        self.wait_for_mirror_ack().await?;
        Ok(())
    }
}
```

Configuration:
```toml
[replication.batching]
enabled = true
max_batch_size = 100   # Up to 100 writes per batch
max_wait_us = 500      # Flush batch after 500 μs even if not full
```

Trade-offs:
| Metric | Non-Batched | Batched (size=100) |
|---|---|---|
| Latency (per write) | 960 μs | 600 μs (avg) |
| Throughput | 104K writes/sec | 167K writes/sec |
| Max delay | 960 μs | 1,460 μs (worst case: wait 500μs + 960μs RTT) |
Recommendation: Enable batching for high-throughput workloads, with max_wait tuned to latency SLA.
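One detail the batcher sketch above leaves out is the max_wait_us trigger: a partially filled batch must still flush once the wait bound expires. A simplified way to add it, assuming the code lives in the same module as `ReplicationBatcher` and approximating the bound with a fixed tick:

```rust
use std::sync::Arc;
use tokio::sync::Mutex;
use tokio::time::{interval, Duration};

// Background task that complements the size-based flush in add_write by
// flushing any partially filled batch on a fixed tick of max_wait_us.
async fn run_flush_timer(batcher: Arc<Mutex<ReplicationBatcher>>, max_wait_us: u64) {
    let mut tick = interval(Duration::from_micros(max_wait_us));
    loop {
        tick.tick().await;
        let mut b = batcher.lock().await;
        if !b.pending_writes.is_empty() {
            // Error handling elided; production code would retry or alert.
            let _ = b.flush().await;
        }
    }
}
```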
5.3 Parallel Replication Streams
Problem: Single replication stream serializes writes
Solution: Partition replication by shard/table, use multiple parallel streams
Single Stream:
```
Primary → [Queue: W1, W2, W3, ...] → Mirror
```
Bottleneck: network RTT serializes all writes

Parallel Streams (4 streams):
```
Primary → [Queue 1: W1, W5, ...] → Mirror
Primary → [Queue 2: W2, W6, ...] → Mirror
Primary → [Queue 3: W3, W7, ...] → Mirror
Primary → [Queue 4: W4, W8, ...] → Mirror
```

Throughput: 4x (each stream pipelined independently)

Configuration:
```toml
[replication.parallelism]
num_streams = 4              # 4 parallel TCP/RDMA connections per primary-mirror pair
stream_assignment = "hash"   # Hash shard key to assign stream
```

Performance Impact:
With 4 parallel streams:
- Each stream: 104K writes/sec (RDMA)
- Total: 4 × 104K = 416K writes/sec per node

Improvement: 4x vs single stream

Caveat: within a single shard, write order must be preserved (use the same stream for the same shard).
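The ordering caveat is exactly what the hash assignment in the config above enforces; a sketch of the idea (the function name is illustrative):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Deterministically assign a shard to one of `num_streams` replication
/// streams. All writes for a shard share a stream, preserving per-shard order.
fn stream_for_shard(shard_key: &str, num_streams: u64) -> u64 {
    let mut h = DefaultHasher::new();
    shard_key.hash(&mut h);
    h.finish() % num_streams
}

fn main() {
    for shard in ["shard-01", "shard-02", "shard-03"] {
        println!("{shard} → stream {}", stream_for_shard(shard, 4));
    }
}
```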
6. Cost-Benefit Analysis
6.1 Performance vs Durability Trade-off Matrix
| Replication Mode | Write Latency | Throughput | RPO | RTO | Use Case |
|---|---|---|---|---|---|
| None | 450 μs | 222K/sec | ∞ (all data lost on failure) | N/A | Development only |
| Async | 450 μs | 222K/sec | 10-100 ms | 1-3 sec | Logs, caches, non-critical |
| Sync (10G Eth) | 1,200 μs | 83K/sec | 0 | 1-3 sec | Moderate criticality |
| Sync (RDMA) | 960 μs | 104K/sec | 0 | 1-3 sec | Production default |
| Sync (RDMA) + Batching | 600 μs (avg) | 167K/sec | 0 | 1-3 sec | High-throughput OLTP |
6.2 Financial Impact of Data Loss (RPO > 0)
Scenario: E-commerce database with async replication
Assumptions:
- Average transaction value: $100
- Write rate: 1,000 transactions/sec
- Replication lag: 50 ms (typical async)
- Primary failure probability: 0.1% per year (one failure per 1,000 node-years)

Data loss on failure:
- Transactions in lag window: 1,000 writes/sec × 0.05 sec = 50 transactions
- Value at risk: 50 × $100 = $5,000 per failure

Expected annual loss:
- Failure rate: 0.001/year
- Expected loss: $5,000 × 0.001 = $5/year
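The same expected-loss arithmetic as a tiny sketch:

```rust
/// Expected annual monetary loss from async replication lag:
/// (writes in the lag window × value per write) × annual failure probability.
fn expected_annual_loss(write_rate: f64, lag_s: f64, value_per_txn: f64, p_fail_per_year: f64) -> f64 {
    write_rate * lag_s * value_per_txn * p_fail_per_year
}

fn main() {
    // 1,000 txn/s, 50 ms lag, $100/txn, 0.1%/year failure probability.
    println!("${:.2}/year", expected_annual_loss(1_000.0, 0.05, 100.0, 0.001)); // $5.00
}
```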
Seems low, but:
- Reputational damage: unquantified but significant
- Regulatory compliance: financial transactions require RPO=0 (PCI-DSS, SOX)
- Legal liability: class-action lawsuits for data loss
Conclusion: For financial data, synchronous replication is mandatory regardless of cost.

6.3 Infrastructure Cost Comparison
Scenario: HeliosDB cluster sized for a 3 million writes/sec target
Async Replication:
Throughput target: 3 million writes/sec
Nodes needed (primary + mirror):
- Per-node capacity: 222K writes/sec
- Primary nodes: 3M ÷ 222K = 14 nodes
- Mirror nodes: 14 nodes (same capacity)
- Total: 28 nodes

Network: 10 Gbps Ethernet
- Cost per node: $800
- Total network cost: 28 × $800 = $22,400

Node cost (compute + storage):
- Cost per node: $5,000
- Total: 28 × $5,000 = $140,000

Total infrastructure: $162,400

Sync Replication (RDMA):
Throughput target: 3 million writes/sec
Nodes needed:
- Per-node capacity: 104K writes/sec (RDMA sync)
- Primary nodes: 3M ÷ 104K = 29 nodes
- Mirror nodes: 29 nodes
- Total: 58 nodes

Network: 100 Gbps RDMA
- Cost per node: $3,200
- Total network cost: 58 × $3,200 = $185,600

Node cost:
- Cost per node: $5,000
- Total: 58 × $5,000 = $290,000
Total infrastructure: $475,600
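Both sizing calculations follow the same formula; a compact sketch using the illustrative prices above:

```rust
/// Nodes needed (primary + mirror) and total cost for a throughput target.
/// Prices are the illustrative per-node figures used in this comparison.
fn sizing(target_wps: f64, per_node_wps: f64, net_cost: f64, node_cost: f64) -> (u64, f64) {
    let primaries = (target_wps / per_node_wps).ceil() as u64;
    let total_nodes = primaries * 2; // one mirror per primary
    (total_nodes, total_nodes as f64 * (net_cost + node_cost))
}

fn main() {
    let (async_nodes, async_cost) = sizing(3e6, 222_000.0, 800.0, 5_000.0);
    let (sync_nodes, sync_cost) = sizing(3e6, 104_000.0, 3_200.0, 5_000.0);
    println!("async: {async_nodes} nodes, ${async_cost}");     // 28 nodes, $162,400
    println!("sync (RDMA): {sync_nodes} nodes, ${sync_cost}"); // 58 nodes, $475,600
}
```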
Additional cost: $313,200 (total ≈ 2.9x the async cost)

BUT: Durability Benefit
Async: RPO = 50 ms
- Data loss on failure: 50,000 transactions
- Financial risk: potentially millions of dollars in transaction value

Sync: RPO = 0
- Data loss on failure: 0 transactions
- Financial risk: $0
For mission-critical systems, the ~2.9x infrastructure cost is justified.

Optimization: Hybrid Strategy
Configure 80% of tables with async (non-critical logs, caches):
- Nodes needed: 14 primary + 14 mirror = 28 nodes (async)

Configure 20% of tables with sync (financial transactions):
- Nodes needed: 6 primary + 6 mirror = 12 nodes (sync, RDMA)

Total nodes: 40
Network cost: 28 × $800 + 12 × $3,200 = $60,800
Node cost: 40 × $5,000 = $200,000
Total: $260,800

Savings: $214,800 (45% cheaper than full sync)
Durability: RPO=0 for critical data, RPO>0 for non-critical
**Best of both worlds.**

7. Monitoring and Alerting
7.1 Key Replication Metrics
Latency Metrics:
- replication.primary_write_latency_us: Time for local write on primary
- replication.mirror_ack_latency_us: Time waiting for mirror ACK
- replication.total_write_latency_us: End-to-end write latency
- replication.network_rtt_us: Measured network RTT to mirror

Alert thresholds:
- total_write_latency_us > 2000: Warning (>2 ms)
- total_write_latency_us > 5000: Critical (>5 ms, likely network issue)

Throughput Metrics:

- replication.writes_per_sec: Actual write throughput
- replication.bytes_replicated_per_sec: Data replication bandwidth
- replication.pending_writes: Queue depth on primary

Alert thresholds:
- pending_writes > 10,000: Replication is lagging
- writes_per_sec < 50,000: Underutilized (cost optimization opportunity)

Failure Metrics:

- replication.mirror_failures_total: Count of failed replications
- replication.mirror_lag_seconds: How far the mirror is behind (async mode)
- replication.failovers_total: Count of primary → mirror promotions

Alert thresholds:
- mirror_failures_total increasing: Network or mirror node issues
- mirror_lag_seconds > 1.0: Async lag too high (risk of data loss)

7.2 Automated Remediation
Auto-Fallback to Async:
```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::time::Duration;
use tokio::time::timeout;

pub struct ReplicationManager {
    sync_timeout: Duration,
    failure_count: AtomicU64,
    degraded: AtomicBool, // true once we have fallen back to async
}

impl ReplicationManager {
    pub async fn replicate_write(&self, write: WriteOp) -> Result<()> {
        if self.degraded.load(Ordering::Relaxed) {
            return self.async_replicate(write).await;
        }
        match timeout(self.sync_timeout, self.sync_replicate(&write)).await {
            Ok(Ok(())) => {
                // Success: reset the failure counter
                self.failure_count.store(0, Ordering::Relaxed);
                Ok(())
            }
            Ok(Err(_)) | Err(_) => {
                // Sync replication failed or timed out
                let failures = self.failure_count.fetch_add(1, Ordering::Relaxed) + 1;
                if failures > 10 {
                    // Too many consecutive failures: fall back to async
                    warn!("sync replication failing, falling back to async");
                    self.degraded.store(true, Ordering::Relaxed);
                }
                // The write already succeeded locally; replicate asynchronously
                self.async_replicate(write).await
            }
        }
    }
}
```

Benefit: Graceful degradation under network issues (temporarily favoring availability over durability)
8. Conclusion
Key Findings:
- Latency Impact:
  - RDMA: +510 μs (113% increase, but <1 ms total)
  - 10G Ethernet: +700-1,000 μs (156-222% increase, ~1.5 ms total)
  - RDMA critical for sub-millisecond write latency
- Throughput Impact:
  - Reduction: 50-60% vs async (104K vs 222K writes/sec per node)
  - Mitigation: batching recovers over half of the loss (167K writes/sec)
  - 2x more nodes needed for the same throughput
- Durability Benefit:
  - RPO: 0 (zero data loss on primary failure)
  - RTO: 1-3 seconds (automatic failover with witness quorum)
  - Essential for financial transactions, user data, compliance
- Cost-Benefit:
  - Infrastructure: ~2.9x more expensive (more nodes, RDMA networking)
  - Data loss risk: eliminated ($0 vs potentially millions)
  - Justified for mission-critical production workloads
- Optimization Strategies:
  - Batching: recovers over half of the lost throughput with <1 ms added latency
  - Parallel streams: 4x throughput with 4 streams
  - Hybrid mode: 45% cost savings (async for non-critical, sync for critical)
Recommended Configuration:
```toml
[replication]
default_mode = "synchronous"   # RPO=0 by default
network = "rdma"               # <1ms write latency
witness_quorum = true          # Split-brain protection

[replication.batching]
enabled = true
max_batch_size = 50
max_wait_us = 300              # 300μs batching window

[replication.parallelism]
num_streams = 4                # 4x throughput

# Per-table overrides
[replication.table_overrides]
"access_logs" = "asynchronous"      # Non-critical
"user_sessions" = "asynchronous"    # Non-critical
"transactions" = "synchronous"      # Critical
"account_balances" = "synchronous"  # Critical
```

Decision Matrix:
| Workload Type | Recommended Mode | Network | Expected Write Latency | Expected Throughput |
|---|---|---|---|---|
| Financial transactions | Sync | RDMA | <1ms | 100K/sec per node |
| E-commerce orders | Sync | RDMA | <1ms | 100K/sec per node |
| User profile updates | Sync | 10G Eth | 1-2ms | 80K/sec per node |
| Analytics/logs | Async | 10G Eth | <0.5ms | 200K/sec per node |
| Development/testing | Async | 10G Eth | <0.5ms | 200K/sec per node |
Next Steps:
- Implement per-table replication mode configuration
- Add batching support for high-throughput workloads
- Develop automatic fallback mechanism (sync → async on network issues)
- Create monitoring dashboard for replication health metrics
- Benchmark failover times under various failure scenarios