Multi-Region Active-Active Writes — Sub-100ms Global Sync

Crate: heliosdb-cluster/crates/active-active (50+ tests, production); heliosdb-cluster/crates/multiregion (architecture-complete, example stubbed pending API)
Status: Active-active engine is production; the full multi-region setup-script example is marked “requires API methods not yet implemented” in source — see Section 10.
ARR: $60M-$100M (per audit)


UVP

Postgres replication is single-leader. Spanner is closed-source. CockroachDB and YugabyteDB are leader-per-range. HeliosDB Full ships the first PostgreSQL-compatible database with true active-active writes — write to any region, read from any region, with strong, causal, eventual, or session consistency you pick per workload. Cross-region latency lands at ~100ms strong / ~10ms eventual on a 3-region setup, with last-write-wins / first-write-wins / region-priority conflict resolution baked in. The same SQL, three regions, no leader pinning.


Prerequisites

  • A working 3-node Raft cluster — see raft-setup.md.
  • Three regions (anything tagged RegionId(0..n) — typically AWS regions, Azure zones, or GCP regions).
  • HeliosDB Full v7.x+ on every node.
  • About 30 minutes.

1. Two Layers, One Database

The Full edition has two cooperating multi-region subsystems, both inside heliosdb-cluster/crates:

| Layer | Crate | Job |
| --- | --- | --- |
| Active-active write engine | active-active | per-row writes, version vectors, conflict resolution, partition tolerance |
| Multi-region cluster manager | multiregion | region topology, WAL streaming, global 2PC, query routing, health monitor |

The active-active engine is what makes “write anywhere” actually work. The multiregion cluster manager is what wires regions together. They are designed to be used together.


2. Pick a Consistency Model

Per the active-active README, four are supported:

| Model | Latency (3 regions) | Use case |
| --- | --- | --- |
| Strong (2PC linearizable) | ~100ms | financial transactions, inventory |
| Eventual (async replication) | ~10ms | social feeds, analytics, logs |
| Causal (preserves causality) | ~50ms | comment threads, workflows |
| Session (read-your-writes) | ~40ms | user sessions, shopping carts |

You pick this per ActiveActiveManager, not per query — choose based on the dominant workload.
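The mapping from workload to model can be sketched in plain Rust. This is illustrative only: the Workload enum and pick_model helper are hypothetical names invented for this example, and only ConsistencyModel mirrors the crate's variants.

```rust
// Illustrative sketch only — `Workload` and `pick_model` are hypothetical,
// not part of the heliosdb_active_active API.
#[derive(Debug, PartialEq)]
enum ConsistencyModel { Strong, Eventual, Causal, Session }

#[derive(Debug)]
enum Workload { Financial, SocialFeed, CommentThread, ShoppingCart }

fn pick_model(w: &Workload) -> ConsistencyModel {
    match w {
        Workload::Financial => ConsistencyModel::Strong,     // ~100ms, linearizable
        Workload::SocialFeed => ConsistencyModel::Eventual,  // ~10ms, async
        Workload::CommentThread => ConsistencyModel::Causal, // ~50ms, order preserved
        Workload::ShoppingCart => ConsistencyModel::Session, // ~40ms, read-your-writes
    }
}

fn main() {
    // Dominant workload drives the manager-wide choice.
    println!("{:?}", pick_model(&Workload::Financial)); // prints "Strong"
}
```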


3. Set Up the Active-Active Manager

use heliosdb_active_active::*;
use chrono::Utc;

#[tokio::main]
async fn main() -> Result<()> {
    let config = ActiveActiveConfig {
        consistency_model: ConsistencyModel::Strong,
        num_regions: 3,
        cross_region_timeout_ms: 100,
        ..Default::default()
    };
    let manager = ActiveActiveManager::new(config).await?;

    let request = WriteRequest {
        key: "user:1234".to_string(),
        value: b"user_data".to_vec(),
        version: VersionVector::new(),
        region: RegionId(0),
        timestamp: Utc::now(),
        causal_deps: None,
        session_id: None,
    };
    let response = manager.write(request).await?;
    println!("Write OK: {:?}", response);

    let metrics = manager.get_metrics().await;
    println!("Max latency: {}ms", metrics.max_latency_ms);
    Ok(())
}

Notes:

  • VersionVector::new() starts a fresh causality vector. The manager fills it in during commit.
  • region: RegionId(0) means “this write originated in region 0”.
  • cross_region_timeout_ms: 100 is the per-region wait. Strong consistency uses 2-phase commit and waits for all participants up to this timeout.
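To make the version-vector bullet concrete, here is a minimal, self-contained sketch of the mechanics. It is illustrative only, not the crate's internal representation: each region bumps its own counter on write, and two versions conflict exactly when neither dominates the other.

```rust
use std::collections::HashMap;

// Illustrative sketch — not the crate's internal VersionVector.
#[derive(Debug, Clone, Default, PartialEq)]
struct VersionVector(HashMap<u32, u64>); // region id -> per-region write counter

impl VersionVector {
    // A region increments its own slot on every local write.
    fn bump(&mut self, region: u32) {
        *self.0.entry(region).or_insert(0) += 1;
    }
    // `self` dominates `other` if every counter is at least as large.
    fn dominates(&self, other: &VersionVector) -> bool {
        other.0.iter().all(|(r, c)| self.0.get(r).copied().unwrap_or(0) >= *c)
    }
    // Neither side dominates: concurrent writes, resolver must decide.
    fn concurrent(&self, other: &VersionVector) -> bool {
        !self.dominates(other) && !other.dominates(self)
    }
}

fn main() {
    let mut a = VersionVector::default();
    let mut b = VersionVector::default();
    a.bump(0); // write in region 0
    b.bump(1); // independent write in region 1
    println!("concurrent: {}", a.concurrent(&b)); // prints "concurrent: true"
}
```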

4. Configure Conflict Resolution

When two regions write the same key in the same instant, somebody wins. Pick how:

// Last-write-wins — uses Hybrid Logical Clock
let config = ActiveActiveConfig {
    resolution_strategy: ResolutionStrategy::LastWriteWins,
    ..Default::default()
};

// First-write-wins
let config = ActiveActiveConfig {
    resolution_strategy: ResolutionStrategy::FirstWriteWins,
    ..Default::default()
};

// Region priority (e.g. always prefer the primary region)
let mut resolver = ConflictResolver::new(ResolutionStrategy::RegionPriority);
resolver.set_region_priority(RegionId(0), 100); // Primary
resolver.set_region_priority(RegionId(1), 50);  // Secondary
resolver.set_region_priority(RegionId(2), 25);

LWW is the default. Region priority is ideal when you want one region authoritative for tie-breaks (e.g. a regulatory home region) without giving up active-active for everyone else.
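To see why a Hybrid Logical Clock makes LWW deterministic, here is a standalone sketch. HlcTimestamp and last_write_wins are illustrative names, not crate types: ordering falls through physical time, then a logical counter, then region id, so two regions can never truly tie.

```rust
// Illustrative HLC sketch — not the crate's timestamp type.
// Derived Ord compares fields in declaration order (lexicographic),
// which is exactly the tie-break chain we want.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct HlcTimestamp {
    physical_ms: u64, // wall-clock milliseconds
    logical: u32,     // counter for same-millisecond events
    region: u32,      // deterministic final tie-break
}

// Last-write-wins: the value with the greater timestamp survives.
fn last_write_wins<'a>(a_val: &'a str, a_ts: HlcTimestamp,
                       b_val: &'a str, b_ts: HlcTimestamp) -> &'a str {
    if a_ts >= b_ts { a_val } else { b_val }
}

fn main() {
    let t1 = HlcTimestamp { physical_ms: 1000, logical: 0, region: 0 };
    let t2 = HlcTimestamp { physical_ms: 1000, logical: 0, region: 1 };
    // Same millisecond, same counter: region id breaks the tie deterministically.
    println!("{}", last_write_wins("from-r0", t1, "from-r1", t2)); // prints "from-r1"
}
```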


5. Wire Up the Region Topology (multiregion crate)

This is the cluster-level config that tells the system which regions exist. From the multiregion README:

use heliosdb_multiregion::*;

let regions = vec![
    RegionConfig {
        region_id: "us-east-1".to_string(),
        datacenter: "Virginia".to_string(),
        nodes: vec!["10.0.1.1:5432".to_string(), "10.0.1.2:5432".to_string()],
        is_primary: true,
    },
    RegionConfig {
        region_id: "eu-west-1".to_string(),
        datacenter: "Ireland".to_string(),
        nodes: vec!["10.0.2.1:5432".to_string(), "10.0.2.2:5432".to_string()],
        is_primary: false,
    },
    RegionConfig {
        region_id: "ap-south-1".to_string(),
        datacenter: "Mumbai".to_string(),
        nodes: vec!["10.0.3.1:5432".to_string(), "10.0.3.2:5432".to_string()],
        is_primary: false,
    },
];

let config = ReplicationConfig {
    mode: ReplicationMode::ActiveActive,
    conflict_resolution: ConflictStrategy::LastWriteWins,
    consistency_level: ConsistencyLevel::Quorum,
    compression: true,
    encryption: true,
    max_lag_ms: 5000,
};

let cluster = MultiRegionCluster::new_with_config(regions, config).await?;

compression: true uses LZ4 on the WAL stream — typically 3-5x bandwidth reduction. encryption: true encrypts cross-region traffic in transit (independent of TLS at the listener).


6. Issue a Global Transaction

Multi-region 2PC across three datacenters:

let mut txn = cluster.begin_global_transaction().await?;
txn.add_operation(Operation::Write {
    key: "order:42".to_string(),
    value: b"{\"status\":\"paid\"}".to_vec(),
});
txn.add_operation(Operation::Delete {
    key: "cart:user:99".to_string(),
});
cluster.commit_global(txn).await?;

commit_global runs the prepare phase on every region. If any region votes No, all regions abort. Default txn timeout is 30 seconds; raise it with txn.timeout_secs = 60 if your cross-region RTT is unforgiving.
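The prepare-phase decision rule above can be sketched in a few lines of plain Rust. Outcome and decide are illustrative names, not the crate's types; a timed-out region is modeled as a missing vote and counts as a No.

```rust
// Illustrative 2PC vote aggregation — not the multiregion crate's internals.
#[derive(Debug, PartialEq)]
enum Outcome { Commit, Abort }

// Commit only if every region voted Yes; any No or timeout (None) aborts all.
fn decide(votes: &[Option<bool>]) -> Outcome {
    if votes.iter().all(|v| *v == Some(true)) {
        Outcome::Commit
    } else {
        Outcome::Abort
    }
}

fn main() {
    println!("{:?}", decide(&[Some(true), Some(true), Some(true)])); // prints "Commit"
    println!("{:?}", decide(&[Some(true), None, Some(true)]));       // prints "Abort"
}
```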


7. Region-Aware Query Routing

Reads can be routed to the user’s nearest region:

// Pin to user's region
let target = cluster
    .route_query("SELECT * FROM users WHERE id = $1", Some("eu-west-1"))
    .await?;

// Or let the router pick the lowest-latency region
let target = cluster
    .route_query("SELECT * FROM users WHERE id = $1", None)
    .await?;

route_query(_, None) picks based on the configured router (latency / load / policy).


8. Watch Replication Lag

for status in cluster.get_all_region_status().await? {
    if status.lag_ms > 5000 {
        eprintln!("⚠ High lag in {}: {}ms", status.region_id, status.lag_ms);
    }
}

Wire this into your alerting. max_lag_ms: 5000 in the config above is the trigger — above it, the region is considered “lagging” but still serving reads. Above 2 * max_lag_ms (configurable) reads can be auto-routed away.
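The two thresholds work out to a three-state classification. The names in this standalone sketch are hypothetical, chosen to mirror the behavior described above:

```rust
// Illustrative lag classification — names are hypothetical.
#[derive(Debug, PartialEq)]
enum RegionReadState { Healthy, Lagging, RoutedAway }

fn classify(lag_ms: u64, max_lag_ms: u64) -> RegionReadState {
    if lag_ms <= max_lag_ms {
        RegionReadState::Healthy       // within budget
    } else if lag_ms <= 2 * max_lag_ms {
        RegionReadState::Lagging       // alert fires, reads still served
    } else {
        RegionReadState::RoutedAway    // reads auto-routed to other regions
    }
}

fn main() {
    println!("{:?}", classify(7_000, 5_000)); // prints "Lagging"
}
```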


9. Network Partitions

Active-active is designed to survive partitions:

let affected = vec![RegionId(2)];
manager.handle_partition(affected.clone()).await?;
// The system keeps serving from regions 0 and 1 (still a quorum).
// When the network heals:
manager.recover_from_partition(affected).await?;
// Divergent writes are reconciled automatically via the configured resolver.

Quorum-based decision making prevents split-brain — if you can’t reach a majority, you can’t accept writes (unless you’re in Eventual mode, where stale-but-available is the deal).
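The majority rule behind the split-brain guard is simple arithmetic. A standalone sketch, assuming a strict majority of regions is required for strong writes:

```rust
// Illustrative quorum check: a partition side may accept strong writes
// only if it can reach a strict majority of regions.
fn has_quorum(reachable: usize, total: usize) -> bool {
    reachable >= total / 2 + 1
}

fn main() {
    // 3 regions, partition isolates one: the 2-region side keeps writing,
    // the 1-region side refuses strong writes (Eventual mode may still serve).
    println!("{}", has_quorum(2, 3)); // prints "true"
    println!("{}", has_quorum(1, 3)); // prints "false"
}
```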


10. Honest Status: What’s Production, What’s Architecture-Complete

Per the source-of-truth check:

  • active-active crate: 50+ tests, production-ready per its README (“first Postgres-compatible database to offer genuine active-active capabilities”).
  • multiregion crate: Library and types are production code, but the example file examples/multiregion_setup.rs is currently a stub: “This example requires the ‘multiregion-example’ feature… requires API methods and types not yet fully implemented.” Production deployments today run via the MultiRegionCluster API directly — no end-to-end CLI driver yet.
  • Per the audit, this is Tier 3 coverage (architecture exists, no full walkthrough). Treat the snippets above as a real API surface but not a copy-paste deploy script.

11. Performance Reference

From the active-active crate’s published benches:

| Scenario | Latency | Throughput |
| --- | --- | --- |
| Strong, 1 region | 50ms | 50K ops/sec |
| Strong, 3 regions | 100ms | 30K ops/sec |
| Eventual, 3 regions | 10ms | 100K ops/sec |
| Concurrent (100 threads) | 80ms | 50K ops/sec |

Rerun on your own topology with cargo bench inside the crate.


Where Next