HeliosDB Nano - Branch Storage User Guide

Overview

HeliosDB Nano’s branch storage provides Git-like database branching with copy-on-write semantics. Create instant branches for development, testing, or staging environments without duplicating data.

Key Features

  • Instant Branch Creation: Create branches in <10ms regardless of database size
  • Copy-on-Write: Modified data is copied only when written, not at branch creation time
  • Storage Efficiency: <2% overhead per branch for metadata, shared storage for unchanged data
  • MVCC Integration: Branch-aware transactions with snapshot isolation guarantees
  • Hierarchical Branches: Support for multi-level branch hierarchies
  • Minimal Read Overhead: <5% read performance overhead for current branch

Architecture

Branch Hierarchy

Branches form a directed acyclic graph (DAG):

main (root)
├── dev
│   └── feature-x
└── staging
    └── hotfix-1

Copy-on-Write Mechanism

  1. Branch Creation: Only metadata is created, no data is copied
  2. Reads: A key is looked up in the branch first, then in its parent chain
  3. First Write: Data is copied on first modification (copy-on-write)
  4. Subsequent Operations: The branch reads and writes its own copy of that data
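
The four steps above can be illustrated with a toy in-memory model. This is a minimal sketch, not HeliosDB Nano's implementation: the `BranchStore` type and its methods are hypothetical, it uses only the standard library, and it ignores timestamps, MVCC, and deletes.

```rust
use std::collections::HashMap;

/// Toy model (hypothetical): each branch owns only the keys written to it
/// and falls back to its parent chain on reads.
struct BranchStore {
    /// branch name -> parent name (None for the root)
    parents: HashMap<String, Option<String>>,
    /// branch name -> keys written directly on that branch
    data: HashMap<String, HashMap<Vec<u8>, Vec<u8>>>,
}

impl BranchStore {
    /// Creates a store containing only the root branch "main".
    fn new() -> Self {
        let mut store = BranchStore { parents: HashMap::new(), data: HashMap::new() };
        store.parents.insert("main".to_string(), None);
        store.data.insert("main".to_string(), HashMap::new());
        store
    }

    /// Step 1: creating a branch records metadata only; no data is copied.
    fn create_branch(&mut self, name: &str, parent: &str) -> Result<(), String> {
        if !self.parents.contains_key(parent) {
            return Err(format!("Branch '{}' not found", parent));
        }
        if self.parents.contains_key(name) {
            return Err(format!("Branch '{}' already exists", name));
        }
        self.parents.insert(name.to_string(), Some(parent.to_string()));
        self.data.insert(name.to_string(), HashMap::new());
        Ok(())
    }

    /// Step 2: a read checks the branch itself, then walks up the parent chain.
    fn get(&self, branch: &str, key: &[u8]) -> Option<Vec<u8>> {
        let mut current = Some(branch.to_string());
        while let Some(name) = current {
            if let Some(value) = self.data.get(&name).and_then(|m| m.get(key)) {
                return Some(value.clone());
            }
            current = self.parents.get(&name).cloned().flatten();
        }
        None
    }

    /// Steps 3 and 4: a write lands in the branch's own map, so ancestors
    /// are untouched and later reads on this branch hit its own copy.
    fn put(&mut self, branch: &str, key: &[u8], value: &[u8]) -> Result<(), String> {
        let map = self
            .data
            .get_mut(branch)
            .ok_or_else(|| format!("Branch '{}' not found", branch))?;
        map.insert(key.to_vec(), value.to_vec());
        Ok(())
    }
}
```

In this model, a value written to `main` is visible through a newly created `dev` branch until `dev` writes the same key; after that, `dev` returns its own value while `main` keeps the original.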

Key Format

Physical keys encode branch information:

data:<branch_id>:<user_key>:<timestamp>
Example:
data:0000000002:users:123:0000000100
     │          │         │
     │          │         └─ Timestamp
     │          └─────────── User key (table:row_id)
     └────────────────────── Branch ID (2 = dev)
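
To make the layout concrete, here is a minimal sketch of encoding and decoding such a key. It assumes, based only on the example above, that the branch ID and timestamp are decimal numbers zero-padded to 10 digits and that the user key is text; `encode_key` and `decode_key` are hypothetical helper names, not part of the HeliosDB Nano API.

```rust
/// Builds a physical key in the layout `data:<branch_id>:<user_key>:<timestamp>`,
/// zero-padding both numbers to 10 digits as in the example above (assumed width).
fn encode_key(branch_id: u64, user_key: &str, timestamp: u64) -> String {
    format!("data:{:010}:{}:{:010}", branch_id, user_key, timestamp)
}

/// Splits a physical key back into (branch_id, user_key, timestamp).
/// The user key may itself contain ':' (e.g. "users:123"), so the branch ID
/// is taken from the front and the timestamp from the back.
fn decode_key(physical: &str) -> Option<(u64, String, u64)> {
    let rest = physical.strip_prefix("data:")?;
    let (branch, rest) = rest.split_once(':')?;
    let (user_key, timestamp) = rest.rsplit_once(':')?;
    Some((
        branch.parse::<u64>().ok()?,
        user_key.to_string(),
        timestamp.parse::<u64>().ok()?,
    ))
}
```

With fixed-width zero padding, lexicographic order on the encoded strings matches numeric order on the branch ID and timestamp, which is a common reason for choosing this kind of layout in ordered key-value stores.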

Usage Examples

Basic Branch Operations

use heliosdb_nano::{Config, storage::{StorageEngine, BranchOptions}};
// Open database
let config = Config::in_memory();
let engine = StorageEngine::open_in_memory(&config)?;
// Create a development branch
let branch_id = engine.create_branch(
    "dev",                    // Branch name
    Some("main"),             // Parent branch (None = current)
    BranchOptions::default(), // Options
)?;
// List all branches
let branches = engine.list_branches()?;
for branch in branches {
    println!("{}: {:?}", branch.name, branch.state);
}
// Drop a branch
engine.drop_branch("dev", false)?;

Branch Transactions

// Begin transaction on a specific branch
let mut tx = engine.begin_branch_transaction("dev")?;
// Read (checks current branch, then parent chain)
let value = tx.get(&b"users:123".to_vec())?;
// Write (copy-on-write)
tx.put(b"users:123".to_vec(), b"new_data".to_vec())?;
// Commit
tx.commit()?;

Branch Isolation Example

// Insert in main branch
engine.put(b"key1", b"main_value")?;
// Create dev branch
engine.create_branch("dev", Some("main"), BranchOptions::default())?;
// Read from dev (sees main's value)
let tx = engine.begin_branch_transaction("dev")?;
assert_eq!(tx.get(&b"key1".to_vec())?, Some(b"main_value".to_vec()));
// Modify in dev
let mut tx = engine.begin_branch_transaction("dev")?;
tx.put(b"key1".to_vec(), b"dev_value".to_vec())?;
tx.commit()?;
// Main branch is unchanged
assert_eq!(engine.get(b"key1")?, Some(b"main_value".to_vec()));
// Dev branch has new value
let tx = engine.begin_branch_transaction("dev")?;
assert_eq!(tx.get(&b"key1".to_vec())?, Some(b"dev_value".to_vec()));

Hierarchical Branches

// Create branch hierarchy: main -> dev -> feature
engine.create_branch("dev", Some("main"), BranchOptions::default())?;
engine.create_branch("feature", Some("dev"), BranchOptions::default())?;
// Write to main
engine.put(b"config", b"production")?;
// Read from feature branch (traverses: feature -> dev -> main)
let tx = engine.begin_branch_transaction("feature")?;
assert_eq!(tx.get(&b"config".to_vec())?, Some(b"production".to_vec()));
// Write to dev
let mut tx = engine.begin_branch_transaction("dev")?;
tx.put(b"config".to_vec(), b"development".to_vec())?;
tx.commit()?;
// Feature now sees dev's value
let tx = engine.begin_branch_transaction("feature")?;
assert_eq!(tx.get(&b"config".to_vec())?, Some(b"development".to_vec()));

Branch Options

use std::collections::HashMap;
use heliosdb_nano::storage::BranchOptions;
let mut metadata = HashMap::new();
metadata.insert("owner".to_string(), "alice".to_string());
metadata.insert("purpose".to_string(), "feature-dev".to_string());
let options = BranchOptions {
    replication_factor: Some(3), // For distributed mode
    region: Some("us-west".to_string()),
    metadata,
};
engine.create_branch("feature", Some("dev"), options)?;

Branch Metadata

Each branch has comprehensive metadata:

let branch = engine.get_branch("dev")?;
println!("Name: {}", branch.name);
println!("Branch ID: {}", branch.branch_id);
println!("Parent ID: {:?}", branch.parent_id);
println!("Created at: {}", branch.created_at);
println!("State: {:?}", branch.state);
println!("Modified keys: {}", branch.stats.modified_keys);
println!("Storage bytes: {}", branch.stats.storage_bytes);

Performance Characteristics

Branch Creation

  • Latency: <10ms regardless of database size
  • Throughput: 1000+ branches/second
  • Storage: ~500 bytes of metadata per branch

Read Operations

  • Current Branch Hit: <0.1ms overhead vs. non-branched read
  • Parent Chain Lookup: <0.5ms overhead (proportional to chain depth)
  • Throughput: ~95% of non-branched read throughput

Write Operations

  • Copy-on-Write: <0.2ms overhead for first write to a key
  • Subsequent Writes: Same as non-branched write
  • Throughput: ~95% of non-branched write throughput

Storage Overhead

Example: 1GB database, 5 branches, 10% data modified per branch
Original: 1GB
Branches: 5 × (500 bytes metadata + 100MB data) ≈ 500MB
Total: ≈1.5GB (50% overhead: each of the 5 branches adds the 10% of data it modified)
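
The same arithmetic can be written as a small estimator. This is a sketch under stated assumptions: every branch stores its own copy of the data it modified, sizes use decimal units (1GB = 1,000,000,000 bytes), and the function names are hypothetical.

```rust
/// Estimates total storage in bytes for a base database plus `branches`
/// branches, each of which has modified `modified_percent` of the base data.
fn estimated_total_bytes(
    base_bytes: u64,
    branches: u64,
    modified_percent: u64,
    metadata_bytes: u64,
) -> u64 {
    // Copy-on-write duplicates only the data a branch has modified.
    let copied_per_branch = base_bytes * modified_percent / 100;
    base_bytes + branches * (metadata_bytes + copied_per_branch)
}

/// Overhead relative to the base database, as a percentage.
fn overhead_percent(base_bytes: u64, total_bytes: u64) -> f64 {
    (total_bytes - base_bytes) as f64 / base_bytes as f64 * 100.0
}
```

For the example above, `estimated_total_bytes(1_000_000_000, 5, 10, 500)` gives 1,500,002,500 bytes, so the per-branch metadata is negligible next to the copied data.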

Best Practices

1. Branch Naming

Use descriptive, hierarchical names:

main
├── dev
├── staging
└── production-fixes
    └── hotfix-2024-11-18

2. Branch Lifecycle

// Create for specific purpose
let branch_id = engine.create_branch("feature-auth", Some("dev"), options)?;
// Do work...
let mut tx = engine.begin_branch_transaction("feature-auth")?;
// ... perform operations
tx.commit()?;
// Clean up when done
engine.drop_branch("feature-auth", false)?;

3. Avoid Deep Hierarchies

Keep branch hierarchies shallow (≤5 levels) for optimal read performance:

✓ Good: main -> dev -> feature
✗ Avoid: main -> dev -> team -> user -> feature -> sub-feature

4. Regular Cleanup

Drop merged or abandoned branches:

// Get all branches
let branches = engine.list_branches()?;
// Drop inactive branches
for branch in branches {
    if should_cleanup(&branch) {
        engine.drop_branch(&branch.name, false)?;
    }
}
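
The `should_cleanup` helper in the loop above is left to the application. One possible policy is sketched below; the `BranchInfo` struct is a hypothetical stand-in for the metadata returned by `list_branches`, and the protected names, the age threshold, and the extra `max_age_days` parameter are all assumptions chosen for illustration.

```rust
/// Hypothetical stand-in for branch metadata; the real type may differ.
struct BranchInfo {
    name: String,
    /// Days since the branch was created (derived by the caller).
    age_days: u64,
    /// Whether any other branch lists this one as its parent.
    has_children: bool,
}

/// Example policy: clean up a branch only if it is not a long-lived branch,
/// has no children (a parent with children cannot be dropped), and is older
/// than `max_age_days`.
fn should_cleanup(branch: &BranchInfo, max_age_days: u64) -> bool {
    const PROTECTED: [&str; 3] = ["main", "dev", "staging"];
    !PROTECTED.contains(&branch.name.as_str())
        && !branch.has_children
        && branch.age_days > max_age_days
}
```

Checking `has_children` in the policy keeps the cleanup loop from attempting drops that the engine would reject.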

Limitations

Current Limitations

  1. No Merge Support: Merge functionality is not yet implemented
  2. No Garbage Collection: Dropped branch data is marked but not yet cleaned up
  3. No Branch Permissions: All branches have the same access level
  4. No Distributed Branching: Branches are local to a single node

Cannot Drop Rules

  • Main Branch: Cannot drop the main (root) branch
  • Parent with Children: Cannot drop a branch that has child branches

// This fails: main cannot be dropped
assert!(engine.drop_branch("main", false).is_err());
// This fails: dev has a child branch, 'feature'
engine.create_branch("dev", Some("main"), BranchOptions::default())?;
engine.create_branch("feature", Some("dev"), BranchOptions::default())?;
assert!(engine.drop_branch("dev", false).is_err()); // Error: has children
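
The two rules can be expressed as a pre-check over the branch hierarchy. The sketch below models the hierarchy as a plain map of branch name to parent name; `check_can_drop` and its error messages are hypothetical and only mirror the rules stated above, not the engine's actual code.

```rust
use std::collections::HashMap;

/// Checks the drop rules against a map of branch name -> parent name
/// (the root maps to None). Returns Ok(()) when the drop would be allowed.
fn check_can_drop(parents: &HashMap<&str, Option<&str>>, name: &str) -> Result<(), String> {
    // Rule 1: the root branch (no parent) can never be dropped.
    match parents.get(name) {
        None => return Err(format!("Branch '{}' not found", name)),
        Some(None) => return Err(format!("Cannot drop root branch '{}'", name)),
        Some(Some(_)) => {}
    }
    // Rule 2: any branch whose parent is `name` blocks the drop.
    let mut children: Vec<&str> = parents
        .iter()
        .filter(|(_, parent)| **parent == Some(name))
        .map(|(child, _)| *child)
        .collect();
    if children.is_empty() {
        return Ok(());
    }
    children.sort_unstable();
    Err(format!("Cannot drop '{}': has children [{}]", name, children.join(", ")))
}
```

For the hierarchy main -> dev -> feature, this check rejects `main` and `dev` and accepts `feature`, so branches have to be dropped leaf-first.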

Troubleshooting

Branch Not Found

match engine.get_branch("unknown") {
    Ok(branch) => println!("Found: {}", branch.name),
    Err(e) => println!("Error: {}", e), // "Branch 'unknown' not found"
}

Cannot Drop Branch

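// Look up the branch before dropping it
let branch = engine.get_branch("dev")?;
// Whether it has children must be checked manually via branch metadata
// (for example, by comparing other branches' parent_id with this branch_id).
// The second argument is the if_exists flag: with true, a missing branch is not an error.
engine.drop_branch("dev", true)?;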

Read Performance Issues

If reads are slow on a branch:

  1. Check hierarchy depth: Deep hierarchies cause multiple lookups
  2. Verify parent chain: Each parent adds ~0.1ms overhead
  3. Consider flattening: Recreate branch from main if too deep
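
To check hierarchy depth programmatically, the parent chain can be walked from a branch up to the root. The sketch below works on a plain map of branch name to parent name rather than on the engine's metadata; `hierarchy_levels` is a hypothetical helper, and counting the root as level 1 is an assumption about how the ≤5 guideline counts levels.

```rust
use std::collections::HashMap;

/// Hierarchy limit suggested in Best Practices (root counted as level 1; assumed).
const MAX_RECOMMENDED_LEVELS: usize = 5;

/// Counts the levels from the root down to `branch`, given a map of
/// branch name -> parent name (the root maps to None).
/// Returns None if the branch or an ancestor is missing, or if the map is cyclic.
fn hierarchy_levels(parents: &HashMap<&str, Option<&str>>, branch: &str) -> Option<usize> {
    let mut levels = 1;
    let mut current = branch;
    // Each iteration corresponds to one extra parent lookup on a read miss.
    while let Some(parent) = *parents.get(current)? {
        levels += 1;
        if levels > parents.len() {
            return None; // more levels than branches: the map has a cycle
        }
        current = parent;
    }
    Some(levels)
}

/// True when the branch sits deeper than the recommended limit.
fn is_too_deep(parents: &HashMap<&str, Option<&str>>, branch: &str) -> bool {
    hierarchy_levels(parents, branch).map_or(false, |n| n > MAX_RECOMMENDED_LEVELS)
}
```

A branch flagged by `is_too_deep` is a candidate for flattening, as described in step 3.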

Implementation Details

Key Components

  1. BranchManager: Manages branch metadata and lifecycle
  2. BranchTransaction: Branch-aware MVCC transactions
  3. BranchRegistry: Global branch ID registry
  4. Parent Chain Cache: Cached parent relationships for fast lookups

Thread Safety

All branch operations are thread-safe:

use std::sync::Arc;
use std::thread;
let engine = Arc::new(StorageEngine::open_in_memory(&config)?);
// Safe concurrent access
let handles: Vec<_> = (0..10).map(|i| {
    let engine = Arc::clone(&engine);
    thread::spawn(move || {
        let mut tx = engine.begin_branch_transaction("dev").unwrap();
        tx.put(format!("key{}", i).into_bytes(), b"value".to_vec()).unwrap();
        tx.commit().unwrap();
    })
}).collect();
for handle in handles {
    handle.join().unwrap();
}

Future Enhancements

Planned Features

  1. Branch Merging: Three-way merge with conflict detection
  2. Garbage Collection: Automatic cleanup of dropped branch data
  3. Branch Snapshots: Create lightweight snapshots within branches
  4. Branch Permissions: Fine-grained access control per branch
  5. Distributed Branching: Cross-region branch replication
  6. Branch Triggers: Execute code on branch events (create, merge, drop)

SQL Integration (Future)

-- Create branch
CREATE DATABASE BRANCH dev FROM main AS OF NOW;
-- Switch to branch
SET branch = dev;
-- Merge branch
MERGE DATABASE BRANCH dev INTO main
WITH (
    conflict_resolution = 'source_wins',
    delete_branch_after = true
);
-- Drop branch
DROP DATABASE BRANCH dev;
-- List branches
SELECT * FROM pg_database_branches();

Conclusion

Branch storage in HeliosDB Nano provides a powerful, efficient way to manage database variants with minimal overhead. The copy-on-write architecture ensures instant branch creation while maintaining strong isolation guarantees and excellent performance.

For questions or issues, please refer to the architecture document or open an issue on GitHub.