HeliosDB Nano - Branch Storage User Guide
Overview
HeliosDB Nano’s branch storage provides Git-like database branching with copy-on-write semantics. Create instant branches for development, testing, or staging environments without duplicating data.
Key Features
- Instant Branch Creation: Create branches in <10ms regardless of database size
- Copy-on-Write: Modified data is copied only when written, not at branch creation time
- Storage Efficiency: <2% overhead per branch for metadata, shared storage for unchanged data
- MVCC Integration: Branch-aware transactions with snapshot isolation guarantees
- Hierarchical Branches: Support for multi-level branch hierarchies
- Minimal Read Overhead: <5% read performance overhead for current branch
Architecture
Branch Hierarchy
Branches form a directed acyclic graph (DAG):
main (root)
│
├── dev
│   └── feature-x
│
└── staging
    └── hotfix-1
Copy-on-Write Mechanism
- Branch Creation: Only metadata is created, no data is copied
- First Read: Data is read from the branch if it holds the key, otherwise from the nearest ancestor in the parent chain
- First Write: Data is copied on first modification (copy-on-write)
- Subsequent Operations: Branch operates on its own data
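The steps above can be illustrated with a small in-memory model. This is a minimal sketch under stated assumptions, not HeliosDB Nano's actual implementation: `BranchStore`, its fields, and the numeric branch IDs are hypothetical names introduced only for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory model of branch storage; not HeliosDB Nano's real types.
#[derive(Default)]
struct BranchStore {
    /// Branch ID -> parent branch ID (the root branch has no entry).
    parents: HashMap<u32, u32>,
    /// (branch ID, user key) -> value written on that branch.
    data: HashMap<(u32, String), String>,
}

impl BranchStore {
    /// Branch creation records metadata only; no data is copied.
    fn create_branch(&mut self, id: u32, parent: u32) {
        self.parents.insert(id, parent);
    }

    /// Reads check the branch itself, then walk up the parent chain.
    fn get(&self, branch: u32, key: &str) -> Option<&String> {
        let mut current = Some(branch);
        while let Some(id) = current {
            if let Some(value) = self.data.get(&(id, key.to_string())) {
                return Some(value);
            }
            current = self.parents.get(&id).copied();
        }
        None
    }

    /// Writes land in the branch's own key space, leaving ancestors untouched.
    fn put(&mut self, branch: u32, key: &str, value: &str) {
        self.data.insert((branch, key.to_string()), value.to_string());
    }
}
```

In this model a child branch sees its parent's value until it writes the same key itself, after which the two branches diverge for that key.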
Key Format
Physical keys encode branch information:
data:<branch_id>:<user_key>:<timestamp>
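As a rough sketch of how such a key could be built and taken apart: the helpers below are hypothetical (`encode_key` and `decode_key` are not part of the documented API), and the 10-digit zero-padded width for the branch ID and timestamp is an assumption read off the example key in this section.

```rust
/// Hypothetical helper: builds a physical key in the documented layout.
/// The 10-digit zero padding is inferred from the guide's example key.
fn encode_key(branch_id: u64, user_key: &str, timestamp: u64) -> String {
    format!("data:{:010}:{}:{:010}", branch_id, user_key, timestamp)
}

/// Hypothetical helper: splits a physical key into (branch_id, user_key, timestamp).
/// The user key may itself contain ':' (e.g. "users:123"), so the branch ID is
/// taken from the front and the timestamp from the back.
fn decode_key(physical: &str) -> Option<(u64, &str, u64)> {
    let rest = physical.strip_prefix("data:")?;
    let (branch, rest) = rest.split_once(':')?;
    let (user_key, timestamp) = rest.rsplit_once(':')?;
    Some((
        branch.parse::<u64>().ok()?,
        user_key,
        timestamp.parse::<u64>().ok()?,
    ))
}
```

Fixed-width, zero-padded numeric fields make keys for the same branch sort together lexicographically, which is one plausible reason for this layout.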
Example:
data:0000000002:users:123:0000000100
     │          │         │
     │          │         └─ Timestamp
     │          └─────────── User key (table:row_id)
     └─────────────────────── Branch ID (2 = dev)
Usage Examples
Basic Branch Operations
use heliosdb_nano::{Config, storage::{StorageEngine, BranchOptions}};
// Open database
let config = Config::in_memory();
let engine = StorageEngine::open_in_memory(&config)?;
// Create a development branch
let branch_id = engine.create_branch(
    "dev",                     // Branch name
    Some("main"),              // Parent branch (None = current)
    BranchOptions::default(),  // Options
)?;
// List all branches
let branches = engine.list_branches()?;
for branch in branches {
    println!("{}: {:?}", branch.name, branch.state);
}
// Drop a branch
engine.drop_branch("dev", false)?;
Branch Transactions
// Begin transaction on a specific branch
let mut tx = engine.begin_branch_transaction("dev")?;
// Read (checks current branch, then parent chain)
let value = tx.get(&b"users:123".to_vec())?;
// Write (copy-on-write)
tx.put(b"users:123".to_vec(), b"new_data".to_vec())?;
// Commit
tx.commit()?;
Branch Isolation Example
// Insert in main branch
engine.put(b"key1", b"main_value")?;
// Create dev branch
engine.create_branch("dev", Some("main"), BranchOptions::default())?;
// Read from dev (sees main's value)
let tx = engine.begin_branch_transaction("dev")?;
assert_eq!(tx.get(&b"key1".to_vec())?, Some(b"main_value".to_vec()));
// Modify in dev
let mut tx = engine.begin_branch_transaction("dev")?;
tx.put(b"key1".to_vec(), b"dev_value".to_vec())?;
tx.commit()?;
// Main branch is unchanged
assert_eq!(engine.get(b"key1")?, Some(b"main_value".to_vec()));
// Dev branch has new value
let tx = engine.begin_branch_transaction("dev")?;
assert_eq!(tx.get(&b"key1".to_vec())?, Some(b"dev_value".to_vec()));
Hierarchical Branches
// Create branch hierarchy: main -> dev -> feature
engine.create_branch("dev", Some("main"), BranchOptions::default())?;
engine.create_branch("feature", Some("dev"), BranchOptions::default())?;
// Write to main
engine.put(b"config", b"production")?;
// Read from feature branch (traverses: feature -> dev -> main)
let tx = engine.begin_branch_transaction("feature")?;
assert_eq!(tx.get(&b"config".to_vec())?, Some(b"production".to_vec()));
// Write to dev
let mut tx = engine.begin_branch_transaction("dev")?;
tx.put(b"config".to_vec(), b"development".to_vec())?;
tx.commit()?;
// Feature now sees dev's value
let tx = engine.begin_branch_transaction("feature")?;
assert_eq!(tx.get(&b"config".to_vec())?, Some(b"development".to_vec()));
Branch Options
use std::collections::HashMap;
use heliosdb_nano::storage::BranchOptions;
let mut metadata = HashMap::new();
metadata.insert("owner".to_string(), "alice".to_string());
metadata.insert("purpose".to_string(), "feature-dev".to_string());
let options = BranchOptions {
    replication_factor: Some(3),          // For distributed mode
    region: Some("us-west".to_string()),
    metadata,
};
engine.create_branch("feature", Some("dev"), options)?;
Branch Metadata
Each branch records its name, ID, parent ID, creation time, state, and usage statistics:
let branch = engine.get_branch("dev")?;
println!("Name: {}", branch.name);
println!("Branch ID: {}", branch.branch_id);
println!("Parent ID: {:?}", branch.parent_id);
println!("Created at: {}", branch.created_at);
println!("State: {:?}", branch.state);
println!("Modified keys: {}", branch.stats.modified_keys);
println!("Storage bytes: {}", branch.stats.storage_bytes);
Performance Characteristics
Branch Creation
- Latency: <10ms regardless of database size
- Throughput: 1000+ branches/second
- Storage: ~500 bytes of metadata per branch
Read Operations
- Current Branch Hit: <0.1ms overhead vs. non-branched read
- Parent Chain Lookup: <0.5ms overhead (proportional to chain depth)
- Throughput: ~95% of non-branched read throughput
Write Operations
- Copy-on-Write: <0.2ms overhead for first write to a key
- Subsequent Writes: Same as non-branched write
- Throughput: ~95% of non-branched write throughput
Storage Overhead
Example: 1GB database, 5 branches, 10% data modified per branch
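The arithmetic for this scenario can be reproduced with a short sketch. It assumes decimal units (1GB = 1,000,000,000 bytes), that each branch keeps its own copy of the data it modifies, and the ~500 bytes of metadata per branch quoted earlier in this guide. `branch_overhead_bytes` is a hypothetical helper, not part of the HeliosDB Nano API.

```rust
/// Hypothetical helper: estimated extra bytes used by `branches` branches,
/// each carrying `metadata_bytes` of metadata plus its own copy of
/// `modified_percent` percent of the base data.
fn branch_overhead_bytes(
    base_bytes: u64,
    branches: u64,
    metadata_bytes: u64,
    modified_percent: u64,
) -> u64 {
    // Data copied by one branch through copy-on-write.
    let copied_per_branch = base_bytes * modified_percent / 100;
    branches * (metadata_bytes + copied_per_branch)
}
```

Calling `branch_overhead_bytes(1_000_000_000, 5, 500, 10)` gives 500,002,500 bytes, so the copied data dominates and the metadata term is negligible.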
Original: 1GB
Branches: 5 × (500 bytes metadata + 100MB data) ≈ 500MB
Total: 1.5GB (50% overhead, i.e. 10% per branch, matching the fraction of data each branch modified)
Best Practices
1. Branch Naming
Use descriptive, hierarchical names:
main
├── dev
├── staging
└── production-fixes
    └── hotfix-2024-11-18
2. Branch Lifecycle
// Create for a specific purpose
let branch_id = engine.create_branch("feature-auth", Some("dev"), options)?;
// Do work...
let mut tx = engine.begin_branch_transaction("feature-auth")?;
// ... perform operations
tx.commit()?;
// Clean up when done
engine.drop_branch("feature-auth", false)?;
3. Avoid Deep Hierarchies
Keep branch hierarchies shallow (≤5 levels) for optimal read performance:
✓ Good: main -> dev -> feature
✗ Avoid: main -> dev -> team -> user -> feature -> sub-feature
4. Regular Cleanup
Drop merged or abandoned branches:
// Get all branches
let branches = engine.list_branches()?;
// Drop inactive branches
// (should_cleanup is an application-defined predicate, not part of the engine API)
for branch in branches {
    if should_cleanup(&branch) {
        engine.drop_branch(&branch.name, false)?;
    }
}
Limitations
Current Limitations
- No Merge Support: Merge functionality is not yet implemented
- No Garbage Collection: Dropped branch data is marked but not yet cleaned up
- No Branch Permissions: All branches have the same access level
- No Distributed Branching: Branches are local to a single node
Cannot Drop Rules
- Main Branch: Cannot drop the main (root) branch
- Parent with Children: Cannot drop a branch that has child branches
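The two rules above amount to a check along the following lines. This is an illustrative sketch only: `BranchInfo`, `DropError`, and `check_droppable` are hypothetical names, and treating the root as the branch whose `parent_id` is `None` is an assumption, not documented engine behavior.

```rust
/// Hypothetical stand-in for branch metadata; field names mirror those
/// printed in the Branch Metadata section.
struct BranchInfo {
    name: String,
    branch_id: u64,
    parent_id: Option<u64>,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum DropError {
    NotFound,
    IsRoot,
    HasChildren(Vec<String>),
}

/// Returns Ok(()) if `name` may be dropped under the two rules.
fn check_droppable(branches: &[BranchInfo], name: &str) -> Result<(), DropError> {
    let target = branches
        .iter()
        .find(|b| b.name == name)
        .ok_or(DropError::NotFound)?;
    // Rule 1: the root branch (assumed to have no parent) cannot be dropped.
    if target.parent_id.is_none() {
        return Err(DropError::IsRoot);
    }
    // Rule 2: a branch with children cannot be dropped.
    let children: Vec<String> = branches
        .iter()
        .filter(|b| b.parent_id == Some(target.branch_id))
        .map(|b| b.name.clone())
        .collect();
    if children.is_empty() {
        Ok(())
    } else {
        Err(DropError::HasChildren(children))
    }
}
```

Returning the child names in the error lets a caller drop the children first and then retry the parent.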
// This will fail - main cannot be dropped
engine.drop_branch("main", false)?; // Error
// This will fail - dev has child 'feature'
engine.create_branch("dev", Some("main"), BranchOptions::default())?;
engine.create_branch("feature", Some("dev"), BranchOptions::default())?;
engine.drop_branch("dev", false)?; // Error: has children
Troubleshooting
Branch Not Found
match engine.get_branch("unknown") {
    Ok(branch) => println!("Found: {}", branch.name),
    Err(e) => println!("Error: {}", e), // "Branch 'unknown' not found"
}
Cannot Drop Branch
// Check if branch has children
let branch = engine.get_branch("dev")?;
// Manual check via metadata would be needed
// Or use the if_exists flag
engine.drop_branch("dev", true)?; // No error if the branch does not exist
Read Performance Issues
If reads are slow on a branch:
- Check hierarchy depth: Deep hierarchies cause multiple lookups
- Verify parent chain: Each parent adds ~0.1ms overhead
- Consider flattening: Recreate branch from main if too deep
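To check hierarchy depth, one option is to walk the parent links collected from branch metadata. The sketch below is hypothetical: it assumes a child-to-parent map built from each branch's `branch_id` and `parent_id`, and it applies the ~0.1ms-per-parent figure from the list above as a rough estimate rather than a measured value.

```rust
use std::collections::HashMap;

/// Hypothetical helper: number of parents between `branch` and the root,
/// given a child ID -> parent ID map (the root has no entry).
fn chain_depth(parents: &HashMap<u64, u64>, branch: u64) -> usize {
    let mut depth = 0;
    let mut current = branch;
    while let Some(&parent) = parents.get(&current) {
        depth += 1;
        current = parent;
    }
    depth
}

/// Rough lookup overhead in milliseconds, using ~0.1ms per parent.
fn estimated_overhead_ms(depth: usize) -> f64 {
    depth as f64 * 0.1
}
```

A branch whose estimate approaches the 0.5ms parent-chain figure is a candidate for being recreated closer to main.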
Implementation Details
Key Components
- BranchManager: Manages branch metadata and lifecycle
- BranchTransaction: Branch-aware MVCC transactions
- BranchRegistry: Global branch ID registry
- Parent Chain Cache: Cached parent relationships for fast lookups
Thread Safety
All branch operations are thread-safe:
use std::sync::Arc;
use std::thread;
let engine = Arc::new(StorageEngine::open_in_memory(&config)?);
// Safe concurrent access
let handles: Vec<_> = (0..10).map(|i| {
    let engine = Arc::clone(&engine);
    thread::spawn(move || {
        let mut tx = engine.begin_branch_transaction("dev").unwrap();
        tx.put(format!("key{}", i).into_bytes(), b"value".to_vec()).unwrap();
        tx.commit().unwrap();
    })
}).collect();
for handle in handles {
    handle.join().unwrap();
}
Future Enhancements
Planned Features
- Branch Merging: Three-way merge with conflict detection
- Garbage Collection: Automatic cleanup of dropped branch data
- Branch Snapshots: Create lightweight snapshots within branches
- Branch Permissions: Fine-grained access control per branch
- Distributed Branching: Cross-region branch replication
- Branch Triggers: Execute code on branch events (create, merge, drop)
SQL Integration (Future)
-- Create branch
CREATE DATABASE BRANCH dev FROM main AS OF NOW;
-- Switch to branch
SET branch = dev;
-- Merge branch
MERGE DATABASE BRANCH dev INTO main
WITH (
    conflict_resolution = 'source_wins',
    delete_branch_after = true
);
-- Drop branch
DROP DATABASE BRANCH dev;
-- List branches
SELECT * FROM pg_database_branches();
See Also
- [Branch Storage Architecture](/home/claude/HeliosDB Nano/docs/architecture/BRANCH_STORAGE_ARCHITECTURE.md)
- MVCC Implementation Guide
- Time Travel Queries
- Storage Engine API
Conclusion
Branch storage in HeliosDB Nano uses copy-on-write to create branches in under 10ms, keeps writes on one branch isolated from its parent and siblings, and holds read and write throughput at roughly 95% of the non-branched baseline. Merging and garbage collection are not yet implemented.
For questions or issues, please refer to the architecture document or open an issue on GitHub.