Multi-Region Active-Active Writes — Sub-100ms Global Sync

Crate: heliosdb-cluster/crates/active-active (50+ tests, production); heliosdb-cluster/crates/multiregion (architecture-complete, example stubbed pending API)
Status: Active-active engine is production; the full multi-region setup-script example is marked “requires API methods not yet implemented” in source — see Section 10.
ARR: $60M-$100M (per audit)


UVP

Postgres replication is single-leader. Spanner is closed-source. CockroachDB and YugabyteDB are leader-per-range. HeliosDB Full ships the first PostgreSQL-compatible database with true active-active writes — write to any region, read from any region, with strong, causal, eventual, or session consistency you pick per workload. Cross-region latency lands at ~100ms strong / ~10ms eventual on a 3-region setup, with last-write-wins / first-write-wins / region-priority conflict resolution baked in. The same SQL, three regions, no leader pinning.


Prerequisites

  • A working 3-node Raft cluster — see raft-setup.md.
  • Three regions (anything tagged RegionId(0..n) — typically AWS regions, Azure zones, or GCP regions).
  • HeliosDB Full v7.x+ on every node.
  • About 30 minutes.

1. Two Layers, One Database

The Full edition has two cooperating multi-region subsystems, both inside heliosdb-cluster/crates:

| Layer | Crate | Job |
| --- | --- | --- |
| Active-active write engine | active-active | per-row writes, version vectors, conflict resolution, partition tolerance |
| Multi-region cluster manager | multiregion | region topology, WAL streaming, global 2PC, query routing, health monitor |

The active-active engine is what makes “write anywhere” actually work. The multiregion cluster manager is what wires regions together. They are designed to be used together.


2. Pick a Consistency Model

Per the active-active README, four are supported:

| Model | Latency (3 regions) | Use case |
| --- | --- | --- |
| Strong (2PC linearizable) | ~100ms | financial transactions, inventory |
| Eventual (async replication) | ~10ms | social feeds, analytics, logs |
| Causal (preserves causality) | ~50ms | comment threads, workflows |
| Session (read-your-writes) | ~40ms | user sessions, shopping carts |

You pick this per ActiveActiveManager, not per query — choose based on the dominant workload.
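The mapping from workload to model can be sketched in plain Rust. This is illustrative only: the Workload enum and pick_model helper are hypothetical names invented for this example, and only ConsistencyModel mirrors the crate's variants.

```rust
// Illustrative sketch only — `Workload` and `pick_model` are hypothetical,
// not part of the heliosdb_active_active API.
#[derive(Debug, PartialEq)]
enum ConsistencyModel { Strong, Eventual, Causal, Session }

#[derive(Debug)]
enum Workload { Financial, SocialFeed, CommentThread, ShoppingCart }

fn pick_model(w: &Workload) -> ConsistencyModel {
    match w {
        Workload::Financial => ConsistencyModel::Strong,     // ~100ms, linearizable
        Workload::SocialFeed => ConsistencyModel::Eventual,  // ~10ms, async
        Workload::CommentThread => ConsistencyModel::Causal, // ~50ms, order preserved
        Workload::ShoppingCart => ConsistencyModel::Session, // ~40ms, read-your-writes
    }
}

fn main() {
    // Dominant workload drives the manager-wide choice.
    println!("{:?}", pick_model(&Workload::Financial)); // prints "Strong"
}
```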


3. Set Up the Active-Active Manager

use heliosdb_active_active::*;
use chrono::Utc;

#[tokio::main]
async fn main() -> Result<()> {
    let config = ActiveActiveConfig {
        consistency_model: ConsistencyModel::Strong,
        num_regions: 3,
        cross_region_timeout_ms: 100,
        ..Default::default()
    };
    let manager = ActiveActiveManager::new(config).await?;

    let request = WriteRequest {
        key: "user:1234".to_string(),
        value: b"user_data".to_vec(),
        version: VersionVector::new(),
        region: RegionId(0),
        timestamp: Utc::now(),
        causal_deps: None,
        session_id: None,
    };
    let response = manager.write(request).await?;
    println!("Write OK: {:?}", response);

    let metrics = manager.get_metrics().await;
    println!("Max latency: {}ms", metrics.max_latency_ms);
    Ok(())
}

Notes:

  • VersionVector::new() starts a fresh causality vector. The manager fills it in during commit.
  • region: RegionId(0) means “this write originated in region 0”.
  • cross_region_timeout_ms: 100 is the per-region wait. Strong consistency uses 2-phase commit and waits for all participants up to this timeout.
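To make the version-vector bullet concrete, here is a minimal, self-contained sketch of the mechanics. It is illustrative only, not the crate's internal representation: each region bumps its own counter on write, and two versions conflict exactly when neither dominates the other.

```rust
use std::collections::HashMap;

// Illustrative sketch — not the crate's internal VersionVector.
#[derive(Debug, Clone, Default, PartialEq)]
struct VersionVector(HashMap<u32, u64>); // region id -> per-region write counter

impl VersionVector {
    // A region increments its own slot on every local write.
    fn bump(&mut self, region: u32) {
        *self.0.entry(region).or_insert(0) += 1;
    }
    // `self` dominates `other` if every counter is at least as large.
    fn dominates(&self, other: &VersionVector) -> bool {
        other.0.iter().all(|(r, c)| self.0.get(r).copied().unwrap_or(0) >= *c)
    }
    // Neither side dominates: concurrent writes, resolver must decide.
    fn concurrent(&self, other: &VersionVector) -> bool {
        !self.dominates(other) && !other.dominates(self)
    }
}

fn main() {
    let mut a = VersionVector::default();
    let mut b = VersionVector::default();
    a.bump(0); // write in region 0
    b.bump(1); // independent write in region 1
    println!("concurrent: {}", a.concurrent(&b)); // prints "concurrent: true"
}
```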

4. Configure Conflict Resolution

When two regions write the same key in the same instant, somebody wins. Pick how:

// Last-write-wins — uses Hybrid Logical Clock
let config = ActiveActiveConfig {
    resolution_strategy: ResolutionStrategy::LastWriteWins,
    ..Default::default()
};

// First-write-wins
let config = ActiveActiveConfig {
    resolution_strategy: ResolutionStrategy::FirstWriteWins,
    ..Default::default()
};

// Region priority (e.g. always prefer the primary region)
let mut resolver = ConflictResolver::new(ResolutionStrategy::RegionPriority);
resolver.set_region_priority(RegionId(0), 100); // Primary
resolver.set_region_priority(RegionId(1), 50);  // Secondary
resolver.set_region_priority(RegionId(2), 25);

LWW is the default. Region priority is ideal when you want one region authoritative for tie-breaks (e.g. a regulatory home region) without giving up active-active for everyone else.
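To see why a Hybrid Logical Clock makes LWW deterministic, here is a standalone sketch. HlcTimestamp and last_write_wins are illustrative names, not crate types: ordering falls through physical time, then a logical counter, then region id, so two regions can never truly tie.

```rust
// Illustrative HLC sketch — not the crate's timestamp type.
// Derived Ord compares fields in declaration order (lexicographic),
// which is exactly the tie-break chain we want.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct HlcTimestamp {
    physical_ms: u64, // wall-clock milliseconds
    logical: u32,     // counter for same-millisecond events
    region: u32,      // deterministic final tie-break
}

// Last-write-wins: the value with the greater timestamp survives.
fn last_write_wins<'a>(a_val: &'a str, a_ts: HlcTimestamp,
                       b_val: &'a str, b_ts: HlcTimestamp) -> &'a str {
    if a_ts >= b_ts { a_val } else { b_val }
}

fn main() {
    let t1 = HlcTimestamp { physical_ms: 1000, logical: 0, region: 0 };
    let t2 = HlcTimestamp { physical_ms: 1000, logical: 0, region: 1 };
    // Same millisecond, same counter: region id breaks the tie deterministically.
    println!("{}", last_write_wins("from-r0", t1, "from-r1", t2)); // prints "from-r1"
}
```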


5. Wire Up the Region Topology (multiregion crate)

This is the cluster-level config that tells the system which regions exist. From the multiregion README:

use heliosdb_multiregion::*;

let regions = vec![
    RegionConfig {
        region_id: "us-east-1".to_string(),
        datacenter: "Virginia".to_string(),
        nodes: vec!["10.0.1.1:5432".to_string(), "10.0.1.2:5432".to_string()],
        is_primary: true,
    },
    RegionConfig {
        region_id: "eu-west-1".to_string(),
        datacenter: "Ireland".to_string(),
        nodes: vec!["10.0.2.1:5432".to_string(), "10.0.2.2:5432".to_string()],
        is_primary: false,
    },
    RegionConfig {
        region_id: "ap-south-1".to_string(),
        datacenter: "Mumbai".to_string(),
        nodes: vec!["10.0.3.1:5432".to_string(), "10.0.3.2:5432".to_string()],
        is_primary: false,
    },
];

let config = ReplicationConfig {
    mode: ReplicationMode::ActiveActive,
    conflict_resolution: ConflictStrategy::LastWriteWins,
    consistency_level: ConsistencyLevel::Quorum,
    compression: true,
    encryption: true,
    max_lag_ms: 5000,
};

let cluster = MultiRegionCluster::new_with_config(regions, config).await?;

compression: true uses LZ4 on the WAL stream — typically 3-5x bandwidth reduction. encryption: true encrypts cross-region traffic in transit (independent of TLS at the listener).


6. Issue a Global Transaction

Multi-region 2PC across three datacenters:

let mut txn = cluster.begin_global_transaction().await?;
txn.add_operation(Operation::Write {
    key: "order:42".to_string(),
    value: b"{\"status\":\"paid\"}".to_vec(),
});
txn.add_operation(Operation::Delete {
    key: "cart:user:99".to_string(),
});
cluster.commit_global(txn).await?;

commit_global runs the prepare phase on every region. If any region votes No, all regions abort. Default txn timeout is 30 seconds; raise it with txn.timeout_secs = 60 if your cross-region RTT is unforgiving.
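The prepare-phase decision rule above can be sketched in a few lines of plain Rust. Outcome and decide are illustrative names, not the crate's types; a timed-out region is modeled as a missing vote and counts as a No.

```rust
// Illustrative 2PC vote aggregation — not the multiregion crate's internals.
#[derive(Debug, PartialEq)]
enum Outcome { Commit, Abort }

// Commit only if every region voted Yes; any No or timeout (None) aborts all.
fn decide(votes: &[Option<bool>]) -> Outcome {
    if votes.iter().all(|v| *v == Some(true)) {
        Outcome::Commit
    } else {
        Outcome::Abort
    }
}

fn main() {
    println!("{:?}", decide(&[Some(true), Some(true), Some(true)])); // prints "Commit"
    println!("{:?}", decide(&[Some(true), None, Some(true)]));       // prints "Abort"
}
```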


7. Region-Aware Query Routing

Reads can be routed to the user’s nearest region:

// Pin to user's region
let target = cluster
    .route_query("SELECT * FROM users WHERE id = $1", Some("eu-west-1"))
    .await?;

// Or let the router pick the lowest-latency region
let target = cluster
    .route_query("SELECT * FROM users WHERE id = $1", None)
    .await?;

route_query(_, None) picks based on the configured router (latency / load / policy).


8. Watch Replication Lag

for status in cluster.get_all_region_status().await? {
    if status.lag_ms > 5000 {
        eprintln!("⚠ High lag in {}: {}ms", status.region_id, status.lag_ms);
    }
}

Wire this into your alerting. max_lag_ms: 5000 in the config above is the trigger — above it, the region is considered “lagging” but still serving reads. Above 2 * max_lag_ms (configurable) reads can be auto-routed away.
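The two thresholds work out to a three-state classification. The names in this standalone sketch are hypothetical, chosen to mirror the behavior described above:

```rust
// Illustrative lag classification — names are hypothetical.
#[derive(Debug, PartialEq)]
enum RegionReadState { Healthy, Lagging, RoutedAway }

fn classify(lag_ms: u64, max_lag_ms: u64) -> RegionReadState {
    if lag_ms <= max_lag_ms {
        RegionReadState::Healthy       // within budget
    } else if lag_ms <= 2 * max_lag_ms {
        RegionReadState::Lagging       // alert fires, reads still served
    } else {
        RegionReadState::RoutedAway    // reads auto-routed to other regions
    }
}

fn main() {
    println!("{:?}", classify(7_000, 5_000)); // prints "Lagging"
}
```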


9. Network Partitions

Active-active is designed to survive partitions:

let affected = vec![RegionId(2)];
manager.handle_partition(affected.clone()).await?;
// The system keeps serving from regions 0 and 1 (still a quorum).
// When the network heals:
manager.recover_from_partition(affected).await?;
// Divergent writes are reconciled automatically via the configured resolver.

Quorum-based decision making prevents split-brain — if you can’t reach a majority, you can’t accept writes (unless you’re in Eventual mode, where stale-but-available is the deal).
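The majority rule behind the split-brain guard is simple arithmetic. A standalone sketch, assuming a strict majority of regions is required for strong writes:

```rust
// Illustrative quorum check: a partition side may accept strong writes
// only if it can reach a strict majority of regions.
fn has_quorum(reachable: usize, total: usize) -> bool {
    reachable >= total / 2 + 1
}

fn main() {
    // 3 regions, partition isolates one: the 2-region side keeps writing,
    // the 1-region side refuses strong writes (Eventual mode may still serve).
    println!("{}", has_quorum(2, 3)); // prints "true"
    println!("{}", has_quorum(1, 3)); // prints "false"
}
```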


10. Honest Status: What’s Production, What’s Architecture-Complete

Per the source-of-truth check:

  • active-active crate: 50+ tests, production-ready per its README (“first Postgres-compatible database to offer genuine active-active capabilities”).
  • multiregion crate: Library and types are production code, but the example file examples/multiregion_setup.rs is currently a stub: “This example requires the ‘multiregion-example’ feature… requires API methods and types not yet fully implemented.” Production deployments today run via the MultiRegionCluster API directly — no end-to-end CLI driver yet.
  • Per the audit, this is Tier 3 coverage (architecture exists, no full walkthrough). Treat the snippets above as a real API surface but not a copy-paste deploy script.

11. Performance Reference

From the active-active crate’s published benches:

| Scenario | Latency | Throughput |
| --- | --- | --- |
| Strong, 1 region | 50ms | 50K ops/sec |
| Strong, 3 regions | 100ms | 30K ops/sec |
| Eventual, 3 regions | 10ms | 100K ops/sec |
| Concurrent (100 threads) | 80ms | 50K ops/sec |

Rerun on your own topology with cargo bench inside the crate.


Where Next