Transaction Isolation Levels in HeliosDB
Transaction Isolation Levels in HeliosDB
This document describes the transaction isolation levels implemented in HeliosDB and their guarantees.
Overview
HeliosDB implements distributed transactions using the Two-Phase Commit (2PC) protocol with four standard SQL isolation levels. Each level provides different trade-offs between consistency guarantees and performance.
Isolation Levels
1. Read Uncommitted
Level: Lowest isolation, highest performance
Guarantees:
- No read locks acquired
- Can read uncommitted (dirty) data from other transactions
- Minimal overhead
Anomalies Allowed:
- Dirty reads: Reading uncommitted changes from other transactions
- Non-repeatable reads: Same query returns different results within transaction
- Phantom reads: New rows appear in range queries
Use Cases:
- Analytics queries where approximate results are acceptable
- Read-only queries on rapidly changing data
- Reporting where data consistency is not critical
Example:
let tx_id = coordinator .begin_transaction(IsolationLevel::ReadUncommitted, 5000) .await?;2. Read Committed
Level: Default isolation level
Guarantees:
- Only reads committed data
- No dirty reads
- Write locks held until transaction completes
- Read locks released immediately after read
Anomalies Allowed:
- Non-repeatable reads: Reading same row twice may return different values
- Phantom reads: Range queries may return different row sets
Anomalies Prevented:
- Dirty reads: Cannot read uncommitted changes
Use Cases:
- Most general-purpose OLTP workloads
- Web applications where strong consistency isn’t required
- Default for most database operations
Example:
let tx_id = coordinator .begin_transaction(IsolationLevel::ReadCommitted, 5000) .await?;
// Can read committed data, but subsequent reads may see different values3. Repeatable Read
Level: Strong isolation
Guarantees:
- All reads within transaction see consistent snapshot
- Read locks held until transaction completes
- Write locks held until transaction completes
- Same row read multiple times returns same value
Anomalies Allowed:
- Phantom reads: New rows may appear in range queries
Anomalies Prevented:
- Dirty reads
- Non-repeatable reads
Use Cases:
- Financial transactions requiring consistency
- Multi-step operations that read same data multiple times
- Report generation requiring consistent view
Example:
let tx_id = coordinator .begin_transaction(IsolationLevel::RepeatableRead, 5000) .await?;
// Reading same key multiple times returns same valuelet value1 = read_key("key1").await?;// ... other operations ...let value2 = read_key("key1").await?;// value1 == value2 guaranteed4. Serializable
Level: Highest isolation, strictest consistency
Guarantees:
- Full isolation - transactions appear to execute serially
- Predicate locks prevent phantom reads
- Read and write locks held until transaction completes
- No anomalies possible
Anomalies Allowed:
- None
Anomalies Prevented:
- Dirty reads
- Non-repeatable reads
- Phantom reads
Use Cases:
- Critical financial transactions
- Inventory management systems
- Any scenario requiring absolute consistency
Example:
let tx_id = coordinator .begin_transaction(IsolationLevel::Serializable, 5000) .await?;
// Complete isolation - transaction appears atomic to outside observersLock Acquisition by Isolation Level
| Isolation Level | Read Locks | Write Locks | Predicate Locks |
|---|---|---|---|
| Read Uncommitted | No | Yes | No |
| Read Committed | No (immediate release) | Yes | No |
| Repeatable Read | Yes | Yes | No |
| Serializable | Yes | Yes | Yes |
Two-Phase Commit Protocol
HeliosDB uses 2PC for distributed transactions across multiple storage nodes:
Phase 1: Prepare
- Coordinator sends PREPARE request to all participants
- Each participant:
- Validates operations
- Acquires necessary locks
- Creates undo log entries
- Writes prepare record to log
- Votes YES or NO
- Coordinator waits for all votes
Phase 2: Commit/Abort
If all participants vote YES:
- Coordinator writes COMMIT to log
- Coordinator sends COMMIT to all participants
- Participants apply changes and release locks
- Participants write COMMIT to log
If any participant votes NO:
- Coordinator writes ABORT to log
- Coordinator sends ABORT to all participants
- Participants rollback using undo log
- Participants release locks
- Participants write ABORT to log
Deadlock Detection
HeliosDB implements distributed deadlock detection using a wait-for graph:
- Transactions waiting for locks create edges in the graph
- Cycles in the graph indicate deadlocks
- When detected, one transaction is aborted to break the cycle
- Detection runs periodically on the coordinator
Example:
T1 holds lock on A, waits for BT2 holds lock on B, waits for A→ Deadlock detected, T1 or T2 abortedTransaction Timeout
All transactions have configurable timeouts:
- Default: 30 seconds
- Prevents indefinite waits
- Automatically aborts timed-out transactions
- Releases all held locks
Example:
// 5 second timeoutlet tx_id = coordinator .begin_transaction(IsolationLevel::ReadCommitted, 5000) .await?;Crash Recovery
HeliosDB provides crash recovery for transactions:
Coordinator Recovery
On restart, the coordinator:
- Reads transaction log
- Identifies in-doubt transactions (in PREPARE/COMMIT/ABORT state)
- Queries participants for their state
- Completes or aborts uncertain transactions
Participant Recovery
On restart, each participant:
- Reads local transaction log
- Identifies prepared transactions without decision
- Queries coordinator for decision
- Completes or aborts based on coordinator response
- If coordinator unavailable, aborts for safety
Performance Characteristics
Throughput by Isolation Level
Based on benchmarks with 100 concurrent transactions:
| Isolation Level | Throughput | Avg Latency |
|---|---|---|
| Read Uncommitted | ~5000 tx/sec | 0.2 ms |
| Read Committed | ~3000 tx/sec | 0.33 ms |
| Repeatable Read | ~1500 tx/sec | 0.67 ms |
| Serializable | ~800 tx/sec | 1.25 ms |
Note: Performance depends on contention level and workload characteristics
Lock Contention
Lock contention increases with isolation level:
- Read Uncommitted: No read contention
- Read Committed: Minimal contention (short-lived read locks)
- Repeatable Read: Moderate contention (long-lived read locks)
- Serializable: High contention (predicate locks prevent concurrency)
Best Practices
1. Choose Appropriate Isolation Level
- Use Read Committed as default
- Use Serializable only when necessary
- Consider Read Uncommitted for analytics
2. Keep Transactions Short
- Minimize lock hold time
- Reduce deadlock probability
- Improve throughput
3. Order Lock Acquisition
- Always acquire locks in same order
- Reduces deadlock probability
- Consider using table ID + key ordering
4. Handle Conflicts Gracefully
match coordinator.commit(tx_id).await { Ok(_) => println!("Transaction committed"), Err(HeliosError::LockConflict(_)) => { // Retry with backoff tokio::time::sleep(Duration::from_millis(100)).await; retry_transaction().await?; } Err(HeliosError::Deadlock(cycle)) => { // Log deadlock and retry warn!("Deadlock detected: {:?}", cycle); retry_transaction().await?; } Err(e) => return Err(e),}5. Set Appropriate Timeouts
- Short timeouts for interactive workloads (1-5 seconds)
- Longer timeouts for batch operations (30-60 seconds)
- Consider query complexity when setting timeout
API Examples
Basic Transaction
use heliosdb_compute::{TransactionCoordinator, IsolationLevel, TransactionOperation};use bytes::Bytes;
let coordinator = TransactionCoordinator::new();
// Begin transactionlet tx_id = coordinator .begin_transaction(IsolationLevel::ReadCommitted, 5000) .await?;
// Execute operationscoordinator.execute_operation( tx_id, TransactionOperation::Write { table_id: 1, key: Bytes::from("key1"), value: Bytes::from("value1"), }).await?;
// Commitcoordinator.commit(tx_id).await?;With Query Executor
use heliosdb_compute::{TransactionalExecutor, IsolationLevel};use std::sync::Arc;
let coordinator = Arc::new(TransactionCoordinator::new());let executor = TransactionalExecutor::new( "127.0.0.1:5555".to_string(), 10, coordinator,);
let session_id = 1;
// Begin transactionexecutor.begin_transaction( session_id, IsolationLevel::RepeatableRead, 5000).await?;
// Execute querieslet results = executor.execute_in_transaction(session_id, query_plan).await?;
// Commitexecutor.commit_transaction(session_id).await?;Handling Failures
// Prepare for potential rollbacklet tx_id = coordinator.begin_transaction( IsolationLevel::Serializable, 5000).await?;
let result = async { coordinator.execute_operation(tx_id, op1).await?; coordinator.execute_operation(tx_id, op2).await?; Ok(())}.await;
match result { Ok(_) => coordinator.commit(tx_id).await?, Err(e) => { coordinator.abort(tx_id).await?; return Err(e); }}Implementation Details
Locking Protocol
- Two-phase locking (2PL) with strict mode
- Growing phase: Acquire locks as needed
- Shrinking phase: Release all locks at commit/abort
- No lock releases until transaction completes
Write-Ahead Logging
All transaction state changes are logged before execution:
- BEGIN record
- OPERATION records
- PREPARE record
- COMMIT/ABORT record
Logs are persisted to disk for crash recovery.
Lock Management
Each participant maintains:
- Write lock table: resource → transaction ID
- Read lock table: resource → set of transaction IDs
- Lock wait queue: resource → queue of waiting transactions
Undo Logging
For rollback capability, participants maintain undo logs:
- Before images of modified data
- Operation type (INSERT/UPDATE/DELETE)
- Used during abort to restore previous state
Future Enhancements
- Optimistic Concurrency Control: Alternative to locking for read-heavy workloads
- Snapshot Isolation: Alternative isolation level with better concurrency
- Distributed Deadlock Detection: Cross-coordinator deadlock detection
- Transaction Priority: High-priority transactions can preempt low-priority ones
- Adaptive Timeout: Automatically adjust timeouts based on workload