Skip to content

Change Data Capture (CDC) Tutorial

Change Data Capture (CDC) Tutorial

Available since: v3.2.0 Build: default for the in-memory CDC log; sync-experimental for the cross-process sync server Modules: heliosdb_lite::sync::change_log, heliosdb_lite::sync::conflict, heliosdb_lite::tenant::CDCLog


UVP

CDC turns “what changed” into a stream you can replicate, audit, or feed into an event bus. Lite ships an in-memory change log per tenant, an enhanced conflict-detection system with vector clocks (Update-Update, Update-Delete, Delete-Update, Insert-Insert), and resolution strategies including Last-Write-Wins and vector-clock causal ordering. CDC is the foundation under sync, branch replication, and audit; you can read it directly from your application code today, while the persistence layer lands in v3.6+. Pair with DATABASE_BRANCHING_TUTORIAL for per-branch event streams.


Prerequisites

  • HeliosDB Lite v3.2+
  • Rust 1.75+
  • 15 minutes

1. What CDC Captures

A ChangeEntry (the v2.3 enhanced shape, in sync::conflict):

pub struct ChangeEntry {
pub data: Vec<u8>, // serialized row payload
pub timestamp: DateTime<Utc>, // when the change happened
pub node_id: Uuid, // origin node
pub vector_clock: VectorClock, // causality
pub operation: ChangeOperation, // Insert | Update | Delete
}

Three operations, six conflict shapes when merging two replicas:

pub enum ConflictType {
UpdateUpdate,
UpdateDelete,
DeleteUpdate,
InsertInsert,
}

2. The Change Log

sync::change_log::ChangeLog (re-exported as ChangeLogImpl) is an append-only structure keyed by table + row id. Use it directly when you want a per-process audit feed:

use heliosdb_lite::sync::{ChangeLogImpl, ChangeLogEntry, ChangeType, QueryOptions};
let log = ChangeLogImpl::new();
// Application code records a change after a successful write.
log.append(ChangeLogEntry {
table: "orders".to_string(),
row_id: 42,
change: ChangeType::Update,
payload: serialised_row,
timestamp: chrono::Utc::now(),
});
// Consumer reads entries since a checkpoint.
let opts = QueryOptions {
table: Some("orders".to_string()),
since: Some(last_seen_timestamp),
limit: 100,
..Default::default()
};
for entry in log.query(opts) {
handle(entry);
}

The exact field names follow the in-tree definitions; see src/sync/change_log.rs for the concrete signatures and the ChangeLogStats struct used by the system view.


3. Vector Clocks for Causality

A VectorClock is a HashMap<Uuid, u64> per node. It answers two questions per pair of clocks:

  • Did A happen before B? (a.happens_before(&b))
  • Are A and B concurrent? (a.conflicts_with(&b))
use heliosdb_lite::sync::VectorClock;
use uuid::Uuid;
let node_a = Uuid::new_v4();
let node_b = Uuid::new_v4();
let mut clock_a = VectorClock::new();
let mut clock_b = VectorClock::new();
// A makes one local edit.
clock_a.increment(node_a);
// B observes A's edit and makes its own.
clock_b.merge(&clock_a);
clock_b.increment(node_b);
assert!(clock_a.happens_before(&clock_b));
assert!(!clock_a.conflicts_with(&clock_b));

When two clocks satisfy neither happens_before direction, the underlying writes are concurrent — that’s the conflict signal.


4. Conflict Detection

sync::conflict::ConflictDetector consumes ChangeEntry pairs and emits Conflict records with full causal context:

pub struct Conflict {
pub id: Uuid,
pub conflict_type: ConflictType,
pub table: String,
pub row_id: RowId,
pub local_entry: ChangeEntry,
pub remote_entry: ChangeEntry,
pub local_vector_clock: VectorClock,
pub remote_vector_clock: VectorClock,
pub detected_at: DateTime<Utc>,
}

Resolution strategies live in ConflictResolution:

pub enum ConflictResolution {
LastWriteWins, // timestamp + node-id tie-break
VectorClockCausal, // happens-before wins; concurrent escalates
// ... custom strategies in src/sync/conflict.rs
}

A ConflictReport records the chosen resolution, the winning side, and a human-readable reason — useful when you have to surface “who won” in an admin UI.


5. Per-Tenant CDC Log

The multi-tenant layer (tenant::CDCLog) wraps the same model with tenant scoping and replication targets:

use heliosdb_lite::tenant::TenantManager;
let manager = TenantManager::new();
let tenant = manager.register_tenant(
"acme".to_string(),
heliosdb_lite::tenant::IsolationMode::SharedSchema,
);
// Tenant-scoped CDC log is initialised on registration; emit events through
// the tenant API so they automatically respect quota tracking and the
// tenant's plan-level CDC feature flag.

CDC is gated by the plan’s PlanFeatures::cdc_enabled. The default free plan has it off; starter and above turn it on.


6. Use Cases

Replication to a downstream consumer

fn ship_to_kafka(log: &ChangeLogImpl, mut last_seen: chrono::DateTime<chrono::Utc>) {
let opts = QueryOptions {
since: Some(last_seen),
limit: 1000,
..Default::default()
};
for entry in log.query(opts) {
kafka_producer.send(entry.table.clone(), entry.payload.clone());
last_seen = entry.timestamp;
}
checkpoint_save(last_seen);
}

Audit trail

The change log is already an audit log: every Insert/Update/Delete with origin node, time, and vector clock. Combine with the time-travel snapshots (see TIME_TRAVEL_TUTORIAL) to reconstruct row history end-to-end.

Event sourcing

Treat the log as the source of truth. Periodically materialise current state from the log into a regular table, or use MATERIALIZED VIEW (auto-refresh) over the log to build read models.

Multi-replica sync

Pair the CDC log with sync::SyncClient and sync::SyncServer to run an embedded → cloud sync loop. The protocol is documented in src/sync/protocol.rs (SyncProtocol trait, SyncMessage enum) and is wired up behind the sync-experimental feature flag while it stabilises.


7. Status

The README marks v3.2 CDC as core complete with persistence layer in progress. Today:

  • In-memory change log and conflict detection are production-grade.
  • Vector clocks, causal ordering, and tie-break resolution are fully tested.
  • The disk-backed CDC persistence layer, automatic change-capture hooks, and RETURNING-tuple capture are tracked for v3.6+ (see roadmap in the README).

That means: feel free to drive replication and audit pipelines off the in-memory log today; expect zero behavioural change once persistence ships. The public types in sync::conflict are stable.


Where Next