Demo 5: Lag-Aware Routing
An educational demo showing how HeliosProxy detects replication lag and automatically reroutes reads to healthy standbys, and how it provides read-your-writes (RYW) consistency.
What This Teaches
Why Replication Lag Matters
PostgreSQL streaming replication is asynchronous by default. A standby can fall seconds (or minutes) behind the primary under heavy write load, network issues, or resource contention. If your application reads from a lagging standby, it gets stale data — a user might create a record, refresh the page, and not see it.
How Lag-Aware Routing Protects Against Stale Reads
HeliosProxy continuously monitors each standby’s replication lag via health checks (every 2s in this demo). When a standby’s lag exceeds max_replica_lag_ms (100ms here), the proxy stops routing reads to it until it catches up. Reads automatically shift to healthy standbys or the primary.
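The exclusion rule described above can be sketched in a few lines of Python. This is a minimal illustration, not HeliosProxy's actual implementation: the function name `eligible_read_targets`, the lag samples, and the fallback to the primary when no standby qualifies are assumptions drawn from the description, and the node names are borrowed from the Ports table.

```python
from typing import Dict, List

MAX_REPLICA_LAG_MS = 100  # mirrors max_replica_lag_ms in this demo


def eligible_read_targets(
    standby_lag_ms: Dict[str, float],
    max_lag_ms: float = MAX_REPLICA_LAG_MS,
    primary: str = "pg-primary",
) -> List[str]:
    """Return the nodes that may serve reads given the latest lag samples.

    Hypothetical helper: standbys whose measured lag exceeds the threshold
    are excluded; if none remain, reads fall back to the primary.
    """
    healthy = [
        name for name, lag in sorted(standby_lag_ms.items()) if lag <= max_lag_ms
    ]
    return healthy or [primary]


# Lag near zero: both standbys may serve reads.
print(eligible_read_targets({"pg-standby-sync": 3, "pg-standby-async": 8}))
# Async standby delayed by roughly 500 ms: only the sync standby remains.
print(eligible_read_targets({"pg-standby-sync": 3, "pg-standby-async": 520}))
# Every standby over the limit: reads go to the primary.
print(eligible_read_targets({"pg-standby-sync": 150, "pg-standby-async": 520}))
```

A standby sitting exactly at the threshold stays eligible in this sketch, since the text only excludes standbys whose lag exceeds the limit.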
How Read-Your-Writes (RYW) Works
- A client writes (INSERT/UPDATE/DELETE) through the proxy.
- The proxy tags the session with the write’s WAL LSN.
- Subsequent reads in that session are routed only to nodes that have replayed past that LSN — typically the primary or the synchronous standby.
- Once the async standby catches up past the tagged LSN, it becomes eligible for reads again.
This guarantees a user always sees their own changes, even with read/write splitting.
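The LSN comparison behind these steps can be illustrated as follows. PostgreSQL prints an LSN as two hexadecimal numbers separated by a slash (for example `0/3000148`), which map to the high and low 32 bits of a 64-bit WAL position. Everything else in this sketch is an assumption for illustration: the `Session` class, the `ryw_read_targets` function, and the sample positions are hypothetical and do not reflect HeliosProxy's internals.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


def parse_lsn(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '0/3000148' to a 64-bit integer."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


@dataclass
class Session:
    """Hypothetical per-client state kept by the proxy."""

    last_write_lsn: Optional[int] = None

    def record_write(self, commit_lsn: str) -> None:
        # Tag the session with the WAL position of its most recent write.
        lsn = parse_lsn(commit_lsn)
        if self.last_write_lsn is None or lsn > self.last_write_lsn:
            self.last_write_lsn = lsn


def ryw_read_targets(session: Session, node_lsn: Dict[str, str]) -> List[str]:
    """Return nodes whose WAL position has reached the session's last write."""
    if session.last_write_lsn is None:
        return sorted(node_lsn)  # no write yet: every node qualifies
    return sorted(
        node
        for node, lsn in node_lsn.items()
        if parse_lsn(lsn) >= session.last_write_lsn
    )


session = Session()
session.record_write("0/3000148")

# Assumed positions: the async standby has not yet replayed the write.
positions = {
    "pg-primary": "0/3000148",
    "pg-standby-sync": "0/3000148",
    "pg-standby-async": "0/2FFF000",
}
print(ryw_read_targets(session, positions))

# Once the async standby replays past the tagged LSN it qualifies again.
positions["pg-standby-async"] = "0/3000200"
print(ryw_read_targets(session, positions))
```

Comparing integers rather than the raw strings matters because the hexadecimal halves are not zero-padded, so a plain string comparison can order LSNs incorrectly.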
Architecture
               ┌──────────────────┐
               │   HeliosProxy    │
               │  lag routing +   │
               │   RYW enabled    │
               └────┬───┬───┬─────┘
                    │   │   │
         ┌──────────┘   │   └──────────┐
         ▼              ▼              ▼
    ┌──────────┐  ┌──────────┐  ┌──────────┐
    │ Primary  │  │  Sync    │  │  Async   │
    │ pg:65432 │─▶│ Standby  │  │ Standby  │
    │ (writes) │  │ pg:65442 │  │ pg:65462 │
    └──────────┘  │ (low lag)│  │(has lag) │
                  └──────────┘  └──────────┘

Step-by-Step
1. Start the cluster
    docker compose up -d
    docker compose ps   # Wait until all 4 services are healthy

2. Watch routing decisions
    ./observe.sh

You will see reads round-robin across both standbys (sync and async) since lag is near zero.
3. Induce lag on the async standby
In a second terminal:
    ./induce-lag.sh on   # Adds 500ms network delay via tc/netem

4. Watch routing shift
Back in the observe.sh terminal, within a few seconds you will see:
- The async standby’s lag climbs above 100ms
- Reads stop going to the async standby
- All reads shift to the sync standby (or primary)
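The shift and later recovery can be reproduced offline with a small simulation. This is a rough sketch under stated assumptions: the `LagAwareRoundRobin` class, its method names, and the lag samples are invented for illustration, and the real proxy may select targets differently. It shows that routing only changes when a new health-check sample arrives, which is why the shift takes up to one check interval (2 s in this demo) to appear.

```python
from typing import Dict, List


class LagAwareRoundRobin:
    """Hypothetical router: round-robin over standbys within the lag limit."""

    def __init__(self, standbys: List[str], max_lag_ms: float, primary: str) -> None:
        self.standbys = list(standbys)
        self.max_lag_ms = max_lag_ms
        self.primary = primary
        self.eligible = list(standbys)
        self._counter = 0

    def on_health_check(self, lag_ms: Dict[str, float]) -> None:
        # Called once per check interval; a standby with no sample is
        # treated as infinitely lagged and therefore excluded.
        self.eligible = [
            s for s in self.standbys
            if lag_ms.get(s, float("inf")) <= self.max_lag_ms
        ]

    def route_read(self) -> str:
        # Fall back to the primary when no standby is eligible.
        if not self.eligible:
            return self.primary
        target = self.eligible[self._counter % len(self.eligible)]
        self._counter += 1
        return target


router = LagAwareRoundRobin(
    ["pg-standby-sync", "pg-standby-async"], max_lag_ms=100, primary="pg-primary"
)

# Assumed lag samples (ms) for three consecutive phases of the demo.
timeline = [
    ("baseline", {"pg-standby-sync": 2, "pg-standby-async": 5}),
    ("lag induced", {"pg-standby-sync": 2, "pg-standby-async": 510}),
    ("lag removed", {"pg-standby-sync": 2, "pg-standby-async": 6}),
]
for label, sample in timeline:
    router.on_health_check(sample)
    print(label, [router.route_read() for _ in range(4)])
```

In the middle phase every read lands on the sync standby; once the sample drops back under the threshold, reads alternate between both standbys again.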
5. Test RYW consistency
The observer automatically tests RYW each iteration:
- It INSERTs a uniquely-tagged row through the proxy
- It immediately SELECTs that row in the same session
- Even with the async standby lagging, the read finds the row because RYW pins the read to the primary or sync standby
6. Remove lag, watch recovery
    ./induce-lag.sh off   # Removes network delay

Within seconds, the async standby catches up and becomes eligible for reads again. The observer shows reads returning to round-robin across both standbys.
7. Check lag status anytime
    ./induce-lag.sh status   # Shows per-node lag from proxy admin API

8. Tear down
    docker compose down -v

Ports
| Service | Port | Purpose |
|---|---|---|
| pg-primary | 65432 | Direct primary access |
| pg-standby-sync | 65442 | Direct sync standby |
| pg-standby-async | 65462 | Direct async standby |
| heliosproxy | 66432 | Proxy client connections |
| heliosproxy admin | 69090 | Admin/metrics API |
Key Configuration
| Setting | Value | Effect |
|---|---|---|
| lag_routing_enabled | true | Enable lag-based read routing |
| max_replica_lag_ms | 100 | Standbys above 100ms are excluded from reads |
| ryw_enabled | true | After a write, reads pin to consistent nodes |
| read_strategy | round_robin | Distribute reads across eligible standbys |
| check_interval_secs | 2 | Health/lag checks every 2 seconds |