Change Data Capture (CDC) Tutorial
Change Data Capture (CDC) Tutorial
Available since: v3.2.0
Build: default for the in-memory CDC log; sync-experimental for the cross-process sync server
Modules: heliosdb_lite::sync::change_log, heliosdb_lite::sync::conflict, heliosdb_lite::tenant::CDCLog
UVP
CDC turns “what changed” into a stream you can replicate, audit, or feed into an event bus. Lite ships an in-memory change log per tenant, an enhanced conflict-detection system with vector clocks (Update-Update, Update-Delete, Delete-Update, Insert-Insert), and resolution strategies including Last-Write-Wins and vector-clock causal ordering. CDC is the foundation under sync, branch replication, and audit; you can read it directly from your application code today, while the persistence layer lands in v3.6+. Pair with DATABASE_BRANCHING_TUTORIAL for per-branch event streams.
Prerequisites
- HeliosDB Lite v3.2+
- Rust 1.75+
- 15 minutes
1. What CDC Captures
A ChangeEntry (the v2.3 enhanced shape, in sync::conflict):
pub struct ChangeEntry { pub data: Vec<u8>, // serialized row payload pub timestamp: DateTime<Utc>, // when the change happened pub node_id: Uuid, // origin node pub vector_clock: VectorClock, // causality pub operation: ChangeOperation, // Insert | Update | Delete}Three operations, six conflict shapes when merging two replicas:
pub enum ConflictType { UpdateUpdate, UpdateDelete, DeleteUpdate, InsertInsert,}2. The Change Log
sync::change_log::ChangeLog (re-exported as ChangeLogImpl) is an append-only structure keyed by table + row id. Use it directly when you want a per-process audit feed:
use heliosdb_lite::sync::{ChangeLogImpl, ChangeLogEntry, ChangeType, QueryOptions};
let log = ChangeLogImpl::new();
// Application code records a change after a successful write.log.append(ChangeLogEntry { table: "orders".to_string(), row_id: 42, change: ChangeType::Update, payload: serialised_row, timestamp: chrono::Utc::now(),});
// Consumer reads entries since a checkpoint.let opts = QueryOptions { table: Some("orders".to_string()), since: Some(last_seen_timestamp), limit: 100, ..Default::default()};for entry in log.query(opts) { handle(entry);}The exact field names follow the in-tree definitions; see src/sync/change_log.rs for the concrete signatures and the ChangeLogStats struct used by the system view.
3. Vector Clocks for Causality
A VectorClock is a HashMap<Uuid, u64> per node. It answers two questions per pair of clocks:
- Did A happen before B? (
a.happens_before(&b)) - Are A and B concurrent? (
a.conflicts_with(&b))
use heliosdb_lite::sync::VectorClock;use uuid::Uuid;
let node_a = Uuid::new_v4();let node_b = Uuid::new_v4();
let mut clock_a = VectorClock::new();let mut clock_b = VectorClock::new();
// A makes one local edit.clock_a.increment(node_a);
// B observes A's edit and makes its own.clock_b.merge(&clock_a);clock_b.increment(node_b);
assert!(clock_a.happens_before(&clock_b));assert!(!clock_a.conflicts_with(&clock_b));When two clocks satisfy neither happens_before direction, the underlying writes are concurrent — that’s the conflict signal.
4. Conflict Detection
sync::conflict::ConflictDetector consumes ChangeEntry pairs and emits Conflict records with full causal context:
pub struct Conflict { pub id: Uuid, pub conflict_type: ConflictType, pub table: String, pub row_id: RowId, pub local_entry: ChangeEntry, pub remote_entry: ChangeEntry, pub local_vector_clock: VectorClock, pub remote_vector_clock: VectorClock, pub detected_at: DateTime<Utc>,}Resolution strategies live in ConflictResolution:
pub enum ConflictResolution { LastWriteWins, // timestamp + node-id tie-break VectorClockCausal, // happens-before wins; concurrent escalates // ... custom strategies in src/sync/conflict.rs}A ConflictReport records the chosen resolution, the winning side, and a human-readable reason — useful when you have to surface “who won” in an admin UI.
5. Per-Tenant CDC Log
The multi-tenant layer (tenant::CDCLog) wraps the same model with tenant scoping and replication targets:
use heliosdb_lite::tenant::TenantManager;
let manager = TenantManager::new();let tenant = manager.register_tenant( "acme".to_string(), heliosdb_lite::tenant::IsolationMode::SharedSchema,);
// Tenant-scoped CDC log is initialised on registration; emit events through// the tenant API so they automatically respect quota tracking and the// tenant's plan-level CDC feature flag.CDC is gated by the plan’s PlanFeatures::cdc_enabled. The default free plan has it off; starter and above turn it on.
6. Use Cases
Replication to a downstream consumer
fn ship_to_kafka(log: &ChangeLogImpl, mut last_seen: chrono::DateTime<chrono::Utc>) { let opts = QueryOptions { since: Some(last_seen), limit: 1000, ..Default::default() }; for entry in log.query(opts) { kafka_producer.send(entry.table.clone(), entry.payload.clone()); last_seen = entry.timestamp; } checkpoint_save(last_seen);}Audit trail
The change log is already an audit log: every Insert/Update/Delete with origin node, time, and vector clock. Combine with the time-travel snapshots (see TIME_TRAVEL_TUTORIAL) to reconstruct row history end-to-end.
Event sourcing
Treat the log as the source of truth. Periodically materialise current state from the log into a regular table, or use MATERIALIZED VIEW (auto-refresh) over the log to build read models.
Multi-replica sync
Pair the CDC log with sync::SyncClient and sync::SyncServer to run an embedded → cloud sync loop. The protocol is documented in src/sync/protocol.rs (SyncProtocol trait, SyncMessage enum) and is wired up behind the sync-experimental feature flag while it stabilises.
7. Status
The README marks v3.2 CDC as core complete with persistence layer in progress. Today:
- In-memory change log and conflict detection are production-grade.
- Vector clocks, causal ordering, and tie-break resolution are fully tested.
- The disk-backed CDC persistence layer, automatic change-capture hooks, and
RETURNING-tuple capture are tracked for v3.6+ (see roadmap in the README).
That means: feel free to drive replication and audit pipelines off the in-memory log today; expect zero behavioural change once persistence ships. The public types in sync::conflict are stable.
Where Next
- DATABASE_BRANCHING_TUTORIAL — per-branch CDC for branch-to-server replication.
- TIME_TRAVEL_TUTORIAL — the historical-state companion to the change feed.
- RLS_POLICY_ADVANCED — make sure CDC consumers respect tenant boundaries.
- HTTP_SYNC_QUICKSTART — wire the change log to the HTTP sync server.