Branch Storage Architecture - Part 1 of 3
Branch Storage Architecture - Part 1 of 3
Architecture Design
Navigation: Index | Part 1 | Part 2 →
HeliosDB Nano - Branch Storage Architecture
Copy-on-Write Branch Storage Backend Design
Version: 2.2 Created: November 18, 2025 Status: Architecture Design Document Target: HeliosDB Nano v0.2.0 Scope: Database branching storage backend with efficient copy-on-write
Executive Summary
This document defines the storage architecture for HeliosDB Nano’s database branching feature, implementing efficient copy-on-write (CoW) semantics with minimal storage overhead. The design leverages RocksDB’s MVCC capabilities and introduces a hierarchical branch metadata system that enables instant branch creation, efficient storage sharing, and conflict-aware merging.
Key Design Principles:
- Zero-Copy Branch Creation - Branches are created instantly via metadata only
- Shared Storage - Unchanged data is shared between branches (CoW)
- MVCC Integration - Branches integrate seamlessly with existing snapshot isolation
- Efficient Garbage Collection - Automatic cleanup of unreferenced versions
- Conflict Detection - Three-way merge with configurable resolution strategies
Performance Targets:
- Branch Creation: <10ms regardless of database size
- Branch Read Overhead: <5% vs. non-branched reads
- Storage Overhead: <2% per branch for metadata
- Merge Performance: Linear in changed data (not total data)
Table of Contents
- Architecture Overview
- Branch Metadata Schema
- Copy-on-Write Mechanism
- Key Space Design
- Data Structures
- API Design
- Branch Isolation
- Merge Strategies
- Garbage Collection
- MVCC Integration
- Performance Considerations
- Testing Strategy
1. Architecture Overview
1.1 High-Level Design
┌─────────────────────────────────────────────────────────────────┐│ Branch Storage Architecture │├─────────────────────────────────────────────────────────────────┤│ ││ ┌──────────────────────────────────────────────────────────┐ ││ │ Branch Manager │ ││ │ • Branch metadata CRUD │ ││ │ • Branch lineage tracking │ ││ │ • Merge coordination │ ││ └──────────────────────┬───────────────────────────────────┘ ││ │ ││ ┌──────────────────────▼───────────────────────────────────┐ ││ │ Copy-on-Write Layer │ ││ │ • Versioned key resolution │ ││ │ • Branch-aware reads/writes │ ││ │ • Version visibility rules │ ││ └──────────────────────┬───────────────────────────────────┘ ││ │ ││ ┌──────────────────────▼───────────────────────────────────┐ ││ │ MVCC Storage Layer │ ││ │ • Snapshot isolation (existing) │ ││ │ • Multi-version tuples │ ││ │ • Timestamp-based visibility │ ││ └──────────────────────┬───────────────────────────────────┘ ││ │ ││ ┌──────────────────────▼───────────────────────────────────┐ ││ │ RocksDB Backend │ ││ │ • Key-value storage │ ││ │ • SST files & memtables │ ││ │ • Compaction & compression │ ││ └───────────────────────────────────────────────────────────┘ ││ │└─────────────────────────────────────────────────────────────────┘1.2 Branch Hierarchy Model
Branches form a directed acyclic graph (DAG) where:
- Root Branch (
main): The default branch, created automatically - Child Branches: Created from a parent branch at a specific point in time
- Orphan Branches: Branches whose parent has been dropped (still valid)
Example Branch DAG:
main (root) │ ├─────┐ │ │ dev staging@T100 │ │ │ └─── feature-x@T150 │ └─── hotfix@T2001.3 Copy-on-Write Principles
- Branch Creation: Copy metadata only, no data duplication
- First Write: Data is copied on first modification in a branch
- Read Priority: Reads check branch-specific versions first, then parent
- Storage Sharing: Unchanged data remains shared across branches
- Isolation: Each branch maintains independent version history
2. Branch Metadata Schema
2.1 Branch Metadata Structure
Each branch has comprehensive metadata stored in RocksDB:
Key: branch:meta:<branch_name>Value: BranchMetadata (bincode serialized)
BranchMetadata { name: String, branch_id: BranchId, // Unique identifier parent_id: Option<BranchId>, // None for root branch created_at: Timestamp, // Creation timestamp created_from_snapshot: SnapshotId, // Parent snapshot at branch point state: BranchState, // Active, Dropped, Merged merge_base: Option<SnapshotId>, // Last merge point (for merge tracking) metadata: BranchMetadataExtras, // Options, stats, etc.}2.2 Branch State Transitions
┌─────────┐ │ Create │ └────┬────┘ │ ▼ ┌──────────┐ Merge ┌────────┐ │ Active ├─────────────────▶│ Merged │ └─────┬────┘ └────────┘ │ │ Drop ▼ ┌─────────┐ │ Dropped │ └─────────┘
States:- Active: Branch is active, can be read/written- Merged: Branch has been merged into target (read-only)- Dropped: Branch has been dropped (GC eligible)2.3 Branch Registry
Global branch registry for efficient lookup:
Key: branch:registryValue: BranchRegistry (bincode serialized)
BranchRegistry { branches: HashMap<BranchId, BranchName>, main_branch: BranchId, next_branch_id: u64,}2.4 Branch Lineage Index
Parent-child relationship index for GC and merge:
Key: branch:children:<parent_branch_id>Value: Vec<BranchId> (bincode serialized)
Enables:- Finding all children of a branch- Preventing parent drop while children exist- Efficient lineage traversal3. Copy-on-Write Mechanism
3.1 Key Versioning Strategy
Each data key is versioned with branch and timestamp information:
Physical Key Format:┌──────────┬─────────┬────────────┬───────────┬───────────┐│ Prefix │ BranchID│ UserKey │ Timestamp │ Version ││ (data:) │ (u64) │ (variable) │ (u64) │ (u32) │└──────────┴─────────┴────────────┴───────────┴───────────┘
Example:data:0000000001:users:123:0000000100:00000001 │ │ │ │ │ │ │ └─ Version counter │ │ └─────────── Timestamp │ └───────────────────── User key (table:row_id) └─────────────────────────────── Branch ID (1 = main)
Benefits:- Efficient range scans per branch- Natural clustering of branch data- Simple visibility checks3.2 Write Path - Copy-on-Write
fn write_to_branch( branch_id: BranchId, user_key: &str, value: &[u8], timestamp: Timestamp,) -> Result<()> { // 1. Check if key exists in current branch let existing = get_latest_version_in_branch(branch_id, user_key)?;
if existing.is_some() { // Key already modified in this branch - overwrite let physical_key = encode_key(branch_id, user_key, timestamp); db.put(physical_key, value)?; } else { // 2. Check parent branches for existing version let parent_version = get_from_parent_chain(branch_id, user_key)?;
// 3. Copy-on-write: Create new version in current branch let physical_key = encode_key(branch_id, user_key, timestamp); db.put(physical_key, value)?;
// Parent version remains unchanged (shared storage) }
Ok(())}3.3 Read Path - Branch Chain Resolution
fn read_from_branch( branch_id: BranchId, user_key: &str, snapshot: SnapshotId,) -> Result<Option<Value>> { // 1. Try current branch first if let Some(value) = get_visible_version_in_branch( branch_id, user_key, snapshot )? { return Ok(Some(value)); }
// 2. Walk up parent chain let mut current_branch = branch_id; while let Some(parent) = get_parent_branch(current_branch)? { // Get parent snapshot at branch point let branch_meta = get_branch_metadata(current_branch)?; let parent_snapshot = branch_meta.created_from_snapshot;
// Read from parent at its snapshot if let Some(value) = get_visible_version_in_branch( parent.branch_id, user_key, parent_snapshot )? { return Ok(Some(value)); }
current_branch = parent.branch_id; }
// 3. Not found in any branch Ok(None)}3.4 Storage Deduplication
Unchanged data is automatically shared:
Before Branch:main: key1 -> v1, key2 -> v2, key3 -> v3
After CREATE BRANCH dev FROM main:main: key1 -> v1, key2 -> v2, key3 -> v3dev: (empty - all reads delegate to main)
After UPDATE key2 in dev:main: key1 -> v1, key2 -> v2, key3 -> v3dev: key2 -> v2' (only modified data stored) key1, key3 still read from main
Storage Overhead: 1 value + metadata (~100 bytes)4. Key Space Design
4.1 Key Prefixes
All keys use prefixed namespaces for organization:
// Branch metadataconst BRANCH_META_PREFIX: &[u8] = b"branch:meta:";const BRANCH_REGISTRY: &[u8] = b"branch:registry";const BRANCH_CHILDREN_PREFIX: &[u8] = b"branch:children:";
// Branch data (versioned)const BRANCH_DATA_PREFIX: &[u8] = b"data:";
// Branch catalog (table schemas per branch)const BRANCH_CATALOG_PREFIX: &[u8] = b"branch:catalog:";
// Branch counters (row IDs per branch)const BRANCH_COUNTER_PREFIX: &[u8] = b"branch:counter:";
// GC trackingconst GC_REFS_PREFIX: &[u8] = b"gc:refs:";const GC_PENDING_PREFIX: &[u8] = b"gc:pending:";4.2 Key Encoding Functions
/// Encode a branch data keyfn encode_branch_data_key( branch_id: BranchId, user_key: &str, timestamp: Timestamp,) -> Vec<u8> { let mut key = Vec::new(); key.extend_from_slice(BRANCH_DATA_PREFIX); key.extend_from_slice(&branch_id.to_be_bytes()); key.push(b':'); key.extend_from_slice(user_key.as_bytes()); key.push(b':'); key.extend_from_slice(×tamp.to_be_bytes()); key}
/// Decode a branch data keyfn decode_branch_data_key(key: &[u8]) -> (BranchId, String, Timestamp) { // Parse components from key // Returns (branch_id, user_key, timestamp)}
/// Encode a branch metadata keyfn encode_branch_meta_key(branch_name: &str) -> Vec<u8> { let mut key = Vec::new(); key.extend_from_slice(BRANCH_META_PREFIX); key.extend_from_slice(branch_name.as_bytes()); key}4.3 Key Ordering Properties
RocksDB’s lexicographic key ordering provides natural clustering:
Sorted Key Space:┌─────────────────────────────────────────────────────────┐│ branch:children:1 -> [2, 3] ││ branch:children:2 -> [4] ││ branch:meta:dev -> BranchMetadata { ... } ││ branch:meta:main -> BranchMetadata { ... } ││ branch:registry -> BranchRegistry { ... } ││ data:0000000001:table1:row1:0000000100 -> value │ ← main branch│ data:0000000001:table1:row2:0000000105 -> value ││ data:0000000002:table1:row1:0000000120 -> value' │ ← dev branch│ data:0000000002:table1:row3:0000000125 -> value │└─────────────────────────────────────────────────────────┘
Benefits:- Sequential scans within a branch- Efficient prefix searches- Natural compaction grouping5. Data Structures
5.1 Core Rust Structures
/// Branch identifier (globally unique)pub type BranchId = u64;
/// Branch metadata#[derive(Debug, Clone, Serialize, Deserialize)]pub struct BranchMetadata { /// Branch name (user-visible) pub name: String,
/// Unique branch ID pub branch_id: BranchId,
/// Parent branch ID (None for root) pub parent_id: Option<BranchId>,
/// Creation timestamp pub created_at: Timestamp,
/// Snapshot at branch point pub created_from_snapshot: SnapshotId,
/// Current state pub state: BranchState,
/// Last merge base (for three-way merge) pub merge_base: Option<SnapshotId>,
/// Additional metadata pub options: BranchOptions,
/// Statistics pub stats: BranchStats,}
/// Branch state#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]pub enum BranchState { /// Branch is active Active,
/// Branch has been merged Merged { /// Target branch into_branch: BranchId, /// Merge timestamp at_timestamp: Timestamp, },
/// Branch has been dropped Dropped { /// Drop timestamp at_timestamp: Timestamp, },}
/// Branch creation/merge options#[derive(Debug, Clone, Default, Serialize, Deserialize)]pub struct BranchOptions { /// Replication factor (for distributed mode) pub replication_factor: Option<usize>,
/// Region hint (for distributed mode) pub region: Option<String>,
/// Custom metadata pub metadata: HashMap<String, String>,}
/// Branch statistics#[derive(Debug, Clone, Default, Serialize, Deserialize)]pub struct BranchStats { /// Number of modified keys in this branch pub modified_keys: u64,
/// Approximate storage size (bytes) pub storage_bytes: u64,
/// Number of commits in this branch pub commit_count: u64,
/// Last activity timestamp pub last_modified: Timestamp,}
/// Branch registry#[derive(Debug, Clone, Serialize, Deserialize)]pub struct BranchRegistry { /// Map of branch ID -> branch name pub branches: HashMap<BranchId, String>,
/// Main branch ID pub main_branch: BranchId,
/// Next available branch ID pub next_branch_id: u64,}
impl BranchRegistry { /// Get next branch ID (auto-increment) pub fn next_id(&mut self) -> BranchId { let id = self.next_branch_id; self.next_branch_id += 1; id }}5.2 Branch Manager Structure
/// Branch manager - coordinates all branch operationspub struct BranchManager { /// Storage engine reference storage: Arc<StorageEngine>,
/// Branch registry (cached) registry: Arc<RwLock<BranchRegistry>>,
/// Branch metadata cache metadata_cache: Arc<RwLock<HashMap<BranchId, BranchMetadata>>>,}
impl BranchManager { /// Create new branch manager pub fn new(storage: Arc<StorageEngine>) -> Result<Self> { let registry = Self::load_or_create_registry(&storage)?; Ok(Self { storage, registry: Arc::new(RwLock::new(registry)), metadata_cache: Arc::new(RwLock::new(HashMap::new())), }) }
/// Create a new branch pub fn create_branch( &self, name: &str, parent: Option<&str>, as_of: AsOfClause, options: BranchOptions, ) -> Result<BranchId>;
/// Drop a branch pub fn drop_branch(&self, name: &str, if_exists: bool) -> Result<()>;
/// Merge branches pub fn merge_branch( &self, source: &str, target: &str, options: MergeOptions, ) -> Result<MergeResult>;
/// Get branch metadata pub fn get_branch(&self, name: &str) -> Result<BranchMetadata>;
/// List all branches pub fn list_branches(&self) -> Result<Vec<BranchMetadata>>;
/// Get current branch (from context) pub fn current_branch(&self) -> Result<BranchId>;}5.3 Branch Transaction Structure
/// Branch-aware transactionpub struct BranchTransaction { /// Underlying MVCC transaction tx: Transaction,
/// Branch ID branch_id: BranchId,
/// Branch metadata (cached) branch_meta: BranchMetadata,
/// Parent chain (cached for reads) parent_chain: Vec<(BranchId, SnapshotId)>,}
impl BranchTransaction { /// Read from branch with parent fallback pub fn get(&self, key: &Key) -> Result<Option<Vec<u8>>>;
/// Write to branch (copy-on-write) pub fn put(&mut self, key: Key, value: Vec<u8>) -> Result<()>;
/// Delete from branch pub fn delete(&mut self, key: Key) -> Result<()>;
/// Commit transaction pub fn commit(self) -> Result<()>;
/// Rollback transaction pub fn rollback(self) -> Result<()>;}6. API Design
6.1 Branch Management API
/// Branch creationpub fn create_branch( storage: &StorageEngine, branch_name: &str, parent: Option<&str>, as_of: AsOfClause, options: BranchOptions,) -> Result<BranchId> { let branch_mgr = BranchManager::new(Arc::clone(&storage.inner))?;
// 1. Validate branch name if branch_mgr.get_branch(branch_name).is_ok() { return Err(Error::branch_exists(branch_name)); }
// 2. Resolve parent branch let parent_id = if let Some(parent_name) = parent { Some(branch_mgr.get_branch(parent_name)?.branch_id) } else { // None = CURRENT branch from context Some(branch_mgr.current_branch()?) };
// 3. Resolve AS OF snapshot let snapshot_id = resolve_as_of_clause(&as_of, storage)?;
// 4. Allocate branch ID let mut registry = branch_mgr.registry.write(); let branch_id = registry.next_id();
// 5. Create metadata let metadata = BranchMetadata { name: branch_name.to_string(), branch_id, parent_id, created_at: storage.current_timestamp(), created_from_snapshot: snapshot_id, state: BranchState::Active, merge_base: None, options, stats: BranchStats::default(), };
// 6. Persist metadata let key = encode_branch_meta_key(branch_name); let value = bincode::serialize(&metadata)?; storage.put(&key, &value)?;
// 7. Update registry registry.branches.insert(branch_id, branch_name.to_string()); let registry_value = bincode::serialize(&*registry)?; storage.put(BRANCH_REGISTRY, ®istry_value)?;
// 8. Update parent's children list if let Some(parent_id) = parent_id { add_child_branch(storage, parent_id, branch_id)?; }
Ok(branch_id)}
/// Branch deletionpub fn drop_branch( storage: &StorageEngine, branch_name: &str, if_exists: bool,) -> Result<()> { let branch_mgr = BranchManager::new(Arc::clone(&storage.inner))?;
// 1. Get branch metadata let metadata = match branch_mgr.get_branch(branch_name) { Ok(meta) => meta, Err(_) if if_exists => return Ok(()), Err(e) => return Err(e), };
// 2. Prevent dropping main branch if metadata.branch_id == branch_mgr.registry.read().main_branch { return Err(Error::cannot_drop_main_branch()); }
// 3. Check for child branches let children = get_child_branches(storage, metadata.branch_id)?; if !children.is_empty() { return Err(Error::branch_has_children( branch_name, children.len(), )); }
// 4. Mark as dropped (soft delete) let mut updated_meta = metadata.clone(); updated_meta.state = BranchState::Dropped { at_timestamp: storage.current_timestamp(), };
let key = encode_branch_meta_key(branch_name); let value = bincode::serialize(&updated_meta)?; storage.put(&key, &value)?;
// 5. Remove from registry let mut registry = branch_mgr.registry.write(); registry.branches.remove(&metadata.branch_id); let registry_value = bincode::serialize(&*registry)?; storage.put(BRANCH_REGISTRY, ®istry_value)?;
// 6. Schedule GC for branch data schedule_branch_gc(storage, metadata.branch_id)?;
Ok(())}
/// Branch mergingpub fn merge_branch( storage: &StorageEngine, source_name: &str, target_name: &str, options: MergeOptions,) -> Result<MergeResult> { let branch_mgr = BranchManager::new(Arc::clone(&storage.inner))?;
// 1. Get source and target metadata let source = branch_mgr.get_branch(source_name)?; let target = branch_mgr.get_branch(target_name)?;