
Transaction Isolation Levels in HeliosDB

This document describes the transaction isolation levels implemented in HeliosDB and their guarantees.

Overview

HeliosDB implements distributed transactions using the Two-Phase Commit (2PC) protocol with four standard SQL isolation levels. Each level provides different trade-offs between consistency guarantees and performance.

Isolation Levels

1. Read Uncommitted

Level: Lowest isolation, highest performance

Guarantees:

  • No read locks acquired
  • Can read uncommitted (dirty) data from other transactions
  • Minimal overhead

Anomalies Allowed:

  • Dirty reads: Reading uncommitted changes from other transactions
  • Non-repeatable reads: The same query can return different results within a transaction
  • Phantom reads: New rows appear in range queries

Use Cases:

  • Analytics queries where approximate results are acceptable
  • Read-only queries on rapidly changing data
  • Reporting where data consistency is not critical

Example:

let tx_id = coordinator
    .begin_transaction(IsolationLevel::ReadUncommitted, 5000)
    .await?;

2. Read Committed

Level: Default isolation level

Guarantees:

  • Only reads committed data
  • No dirty reads
  • Write locks held until transaction completes
  • Read locks released immediately after read

Anomalies Allowed:

  • Non-repeatable reads: Reading same row twice may return different values
  • Phantom reads: Range queries may return different row sets

Anomalies Prevented:

  • Dirty reads: Cannot read uncommitted changes

Use Cases:

  • Most general-purpose OLTP workloads
  • Web applications where strong consistency isn’t required
  • Default for most database operations

Example:

let tx_id = coordinator
    .begin_transaction(IsolationLevel::ReadCommitted, 5000)
    .await?;
// Can read committed data, but subsequent reads may see different values

3. Repeatable Read

Level: Strong isolation

Guarantees:

  • All reads within transaction see consistent snapshot
  • Read locks held until transaction completes
  • Write locks held until transaction completes
  • Same row read multiple times returns same value

Anomalies Allowed:

  • Phantom reads: New rows may appear in range queries

Anomalies Prevented:

  • Dirty reads
  • Non-repeatable reads

Use Cases:

  • Financial transactions requiring consistency
  • Multi-step operations that read same data multiple times
  • Report generation requiring consistent view

Example:

let tx_id = coordinator
    .begin_transaction(IsolationLevel::RepeatableRead, 5000)
    .await?;
// Reading same key multiple times returns same value
let value1 = read_key("key1").await?;
// ... other operations ...
let value2 = read_key("key1").await?;
// value1 == value2 guaranteed

4. Serializable

Level: Highest isolation, strictest consistency

Guarantees:

  • Full isolation - transactions appear to execute serially
  • Predicate locks prevent phantom reads
  • Read and write locks held until transaction completes
  • No anomalies possible

Anomalies Allowed:

  • None

Anomalies Prevented:

  • Dirty reads
  • Non-repeatable reads
  • Phantom reads

Use Cases:

  • Critical financial transactions
  • Inventory management systems
  • Any scenario requiring absolute consistency

Example:

let tx_id = coordinator
    .begin_transaction(IsolationLevel::Serializable, 5000)
    .await?;
// Complete isolation - transaction appears atomic to outside observers

Lock Acquisition by Isolation Level

Isolation Level     Read Locks                 Write Locks   Predicate Locks
Read Uncommitted    No                         Yes           No
Read Committed      Yes (released after read)  Yes           No
Repeatable Read     Yes (held until commit)    Yes           No
Serializable        Yes (held until commit)    Yes           Yes
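
The table can be read as a per-level decision about which locks a read acquires. A minimal sketch of that decision logic, assuming a hypothetical `LockSet` struct and `locks_for_read` helper (the `IsolationLevel` variants mirror the examples above; this is not the actual HeliosDB internals):

```rust
#[derive(Clone, Copy)]
enum IsolationLevel {
    ReadUncommitted,
    ReadCommitted,
    RepeatableRead,
    Serializable,
}

#[derive(Debug, PartialEq)]
struct LockSet {
    read_lock: bool,          // take a shared lock on read?
    hold_until_commit: bool,  // hold the read lock until commit/abort?
    predicate_lock: bool,     // lock the predicate for range queries?
}

fn locks_for_read(level: IsolationLevel) -> LockSet {
    match level {
        IsolationLevel::ReadUncommitted => LockSet {
            read_lock: false,
            hold_until_commit: false,
            predicate_lock: false,
        },
        IsolationLevel::ReadCommitted => LockSet {
            read_lock: true,          // acquired, but...
            hold_until_commit: false, // ...released right after the read
            predicate_lock: false,
        },
        IsolationLevel::RepeatableRead => LockSet {
            read_lock: true,
            hold_until_commit: true,
            predicate_lock: false,
        },
        IsolationLevel::Serializable => LockSet {
            read_lock: true,
            hold_until_commit: true,
            predicate_lock: true,
        },
    }
}
```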

Two-Phase Commit Protocol

HeliosDB uses 2PC for distributed transactions across multiple storage nodes:

Phase 1: Prepare

  1. Coordinator sends PREPARE request to all participants
  2. Each participant:
    • Validates operations
    • Acquires necessary locks
    • Creates undo log entries
    • Writes prepare record to log
    • Votes YES or NO
  3. Coordinator waits for all votes

Phase 2: Commit/Abort

If all participants vote YES:

  1. Coordinator writes COMMIT to log
  2. Coordinator sends COMMIT to all participants
  3. Participants apply changes and release locks
  4. Participants write COMMIT to log

If any participant votes NO:

  1. Coordinator writes ABORT to log
  2. Coordinator sends ABORT to all participants
  3. Participants rollback using undo log
  4. Participants release locks
  5. Participants write ABORT to log
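
The two phases boil down to a simple rule: the coordinator's phase-2 decision is COMMIT only if every participant voted YES in phase 1. A minimal sketch of that decision, with illustrative `Vote` and `Decision` types:

```rust
#[derive(Debug, PartialEq)]
enum Vote {
    Yes,
    No,
}

#[derive(Debug, PartialEq)]
enum Decision {
    Commit,
    Abort,
}

// Phase 1 collects one vote per participant; phase 2 commits only on a
// unanimous YES vote, and aborts otherwise (including when there are no
// participants to vote).
fn decide(votes: &[Vote]) -> Decision {
    if !votes.is_empty() && votes.iter().all(|v| *v == Vote::Yes) {
        Decision::Commit
    } else {
        Decision::Abort
    }
}
```

The decision is written to the coordinator's log before being sent to participants, which is what makes it recoverable after a crash.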

Deadlock Detection

HeliosDB implements distributed deadlock detection using a wait-for graph:

  • Transactions waiting for locks create edges in the graph
  • Cycles in the graph indicate deadlocks
  • When detected, one transaction is aborted to break the cycle
  • Detection runs periodically on the coordinator

Example:

T1 holds lock on A, waits for B
T2 holds lock on B, waits for A
→ Deadlock detected, T1 or T2 aborted
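
The cycle check itself can be sketched as a depth-first search over the wait-for graph. The `WaitForGraph` type and its method names here are illustrative, not the actual HeliosDB API:

```rust
use std::collections::{HashMap, HashSet};

// Wait-for graph: an edge T1 -> T2 means T1 waits for a lock held by T2.
// A cycle in this graph is a deadlock.
struct WaitForGraph {
    edges: HashMap<u64, Vec<u64>>, // tx id -> tx ids it waits for
}

impl WaitForGraph {
    fn new() -> Self {
        Self { edges: HashMap::new() }
    }

    fn add_wait(&mut self, waiter: u64, holder: u64) {
        self.edges.entry(waiter).or_default().push(holder);
    }

    // DFS from every transaction; a back edge to a node still on the
    // current path means a cycle.
    fn has_cycle(&self) -> bool {
        let mut visited = HashSet::new();
        let mut on_stack = HashSet::new();
        self.edges
            .keys()
            .any(|&tx| self.dfs(tx, &mut visited, &mut on_stack))
    }

    fn dfs(&self, tx: u64, visited: &mut HashSet<u64>, on_stack: &mut HashSet<u64>) -> bool {
        if on_stack.contains(&tx) {
            return true; // back edge: cycle found
        }
        if !visited.insert(tx) {
            return false; // already fully explored, no cycle through here
        }
        on_stack.insert(tx);
        for &next in self.edges.get(&tx).into_iter().flatten() {
            if self.dfs(next, visited, on_stack) {
                return true;
            }
        }
        on_stack.remove(&tx);
        false
    }
}
```

In the example above, the edges T1 → T2 and T2 → T1 form a cycle, so the detector would pick one of the two transactions as the victim.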

Transaction Timeout

All transactions have configurable timeouts:

  • Default: 30 seconds
  • Prevents indefinite waits
  • Automatically aborts timed-out transactions
  • Releases all held locks

Example:

// 5 second timeout
let tx_id = coordinator
    .begin_transaction(IsolationLevel::ReadCommitted, 5000)
    .await?;

Crash Recovery

HeliosDB provides crash recovery for transactions:

Coordinator Recovery

On restart, the coordinator:

  1. Reads transaction log
  2. Identifies in-doubt transactions (in PREPARE/COMMIT/ABORT state)
  3. Queries participants for their state
  4. Completes or aborts uncertain transactions

Participant Recovery

On restart, each participant:

  1. Reads local transaction log
  2. Identifies prepared transactions without decision
  3. Queries coordinator for decision
  4. Completes or aborts based on coordinator response
  5. If coordinator unavailable, aborts for safety
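
The participant's restart decision can be summarized as a small state machine over what its local log recorded. The `TxLogState` and `RecoveryAction` names are illustrative:

```rust
#[derive(Debug, PartialEq)]
enum TxLogState {
    Begun,         // BEGIN logged, never prepared
    Prepared,      // PREPARE logged, no decision recorded
    Decided(bool), // COMMIT (true) or ABORT (false) logged
}

#[derive(Debug, PartialEq)]
enum RecoveryAction {
    Abort,          // safe to roll back unilaterally
    AskCoordinator, // in doubt: must query the coordinator
    Redo(bool),     // decision known: finish applying commit/abort
}

fn recover(state: TxLogState, coordinator_reachable: bool) -> RecoveryAction {
    match state {
        // Never voted YES, so the coordinator cannot have committed: abort.
        TxLogState::Begun => RecoveryAction::Abort,
        // Prepared but undecided: ask the coordinator; if it is
        // unavailable, abort for safety (step 5 above).
        TxLogState::Prepared if coordinator_reachable => RecoveryAction::AskCoordinator,
        TxLogState::Prepared => RecoveryAction::Abort,
        // Decision already on disk: just re-apply it.
        TxLogState::Decided(committed) => RecoveryAction::Redo(committed),
    }
}
```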

Performance Characteristics

Throughput by Isolation Level

Based on benchmarks with 100 concurrent transactions:

Isolation Level     Throughput      Avg Latency
Read Uncommitted    ~5000 tx/sec    0.2 ms
Read Committed      ~3000 tx/sec    0.33 ms
Repeatable Read     ~1500 tx/sec    0.67 ms
Serializable        ~800 tx/sec     1.25 ms

Note: Performance depends on contention level and workload characteristics

Lock Contention

Lock contention increases with isolation level:

  • Read Uncommitted: No read contention
  • Read Committed: Minimal contention (short-lived read locks)
  • Repeatable Read: Moderate contention (long-lived read locks)
  • Serializable: High contention (predicate locks prevent concurrency)

Best Practices

1. Choose Appropriate Isolation Level

  • Use Read Committed as default
  • Use Serializable only when necessary
  • Consider Read Uncommitted for analytics

2. Keep Transactions Short

  • Minimize lock hold time
  • Reduce deadlock probability
  • Improve throughput

3. Order Lock Acquisition

  • Always acquire locks in same order
  • Reduces deadlock probability
  • Consider using table ID + key ordering
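
One way to implement the suggested ordering is to sort every resource a transaction will lock by `(table_id, key)` before acquiring anything; the `Resource` type here is illustrative:

```rust
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Resource {
    table_id: u32,
    key: Vec<u8>,
}

// Deriving Ord gives lexicographic (table_id, key) ordering, so every
// transaction that sorts its resources first acquires locks in the same
// global order, which rules out lock-order deadlocks between them.
fn lock_order(mut resources: Vec<Resource>) -> Vec<Resource> {
    resources.sort();
    resources
}
```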

4. Handle Conflicts Gracefully

match coordinator.commit(tx_id).await {
    Ok(_) => println!("Transaction committed"),
    Err(HeliosError::LockConflict(_)) => {
        // Retry with backoff
        tokio::time::sleep(Duration::from_millis(100)).await;
        retry_transaction().await?;
    }
    Err(HeliosError::Deadlock(cycle)) => {
        // Log deadlock and retry
        warn!("Deadlock detected: {:?}", cycle);
        retry_transaction().await?;
    }
    Err(e) => return Err(e),
}

5. Set Appropriate Timeouts

  • Short timeouts for interactive workloads (1-5 seconds)
  • Longer timeouts for batch operations (30-60 seconds)
  • Consider query complexity when setting timeout

API Examples

Basic Transaction

use heliosdb_compute::{TransactionCoordinator, IsolationLevel, TransactionOperation};
use bytes::Bytes;

let coordinator = TransactionCoordinator::new();

// Begin transaction
let tx_id = coordinator
    .begin_transaction(IsolationLevel::ReadCommitted, 5000)
    .await?;

// Execute operations
coordinator.execute_operation(
    tx_id,
    TransactionOperation::Write {
        table_id: 1,
        key: Bytes::from("key1"),
        value: Bytes::from("value1"),
    },
).await?;

// Commit
coordinator.commit(tx_id).await?;

With Query Executor

use heliosdb_compute::{TransactionCoordinator, TransactionalExecutor, IsolationLevel};
use std::sync::Arc;

let coordinator = Arc::new(TransactionCoordinator::new());
let executor = TransactionalExecutor::new(
    "127.0.0.1:5555".to_string(),
    10,
    coordinator,
);
let session_id = 1;

// Begin transaction
executor.begin_transaction(
    session_id,
    IsolationLevel::RepeatableRead,
    5000,
).await?;

// Execute queries
let results = executor.execute_in_transaction(session_id, query_plan).await?;

// Commit
executor.commit_transaction(session_id).await?;

Handling Failures

// Prepare for potential rollback
let tx_id = coordinator.begin_transaction(
    IsolationLevel::Serializable,
    5000,
).await?;

let result = async {
    coordinator.execute_operation(tx_id, op1).await?;
    coordinator.execute_operation(tx_id, op2).await?;
    Ok(())
}.await;

match result {
    Ok(_) => coordinator.commit(tx_id).await?,
    Err(e) => {
        coordinator.abort(tx_id).await?;
        return Err(e);
    }
}

Implementation Details

Locking Protocol

  • Two-phase locking (2PL) with strict mode
  • Growing phase: Acquire locks as needed
  • Shrinking phase: Release all locks at commit/abort
  • No lock releases until transaction completes

Write-Ahead Logging

All transaction state changes are logged before execution:

  • BEGIN record
  • OPERATION records
  • PREPARE record
  • COMMIT/ABORT record

Logs are persisted to disk for crash recovery.
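
The record types listed above might be modeled as an enum like the following (names and fields are illustrative, not the actual HeliosDB log types):

```rust
// One entry in the write-ahead log. Every state change is logged
// before it takes effect, so recovery can replay or undo it.
enum LogRecord {
    Begin { tx_id: u64 },
    Operation { tx_id: u64, table_id: u32, key: Vec<u8>, value: Option<Vec<u8>> },
    Prepare { tx_id: u64 },
    Commit { tx_id: u64 },
    Abort { tx_id: u64 },
}

// A transaction's outcome is settled once a decision record
// (COMMIT or ABORT) has been persisted.
fn is_decision(r: &LogRecord) -> bool {
    matches!(r, LogRecord::Commit { .. } | LogRecord::Abort { .. })
}
```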

Lock Management

Each participant maintains:

  • Write lock table: resource → transaction ID
  • Read lock table: resource → set of transaction IDs
  • Lock wait queue: resource → queue of waiting transactions
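
These three structures can be sketched as plain maps; a real implementation would add synchronization and fairness, and all names here are illustrative:

```rust
use std::collections::{HashMap, HashSet, VecDeque};

type Resource = (u32, Vec<u8>); // (table_id, key)

struct LockManager {
    write_locks: HashMap<Resource, u64>,          // resource -> holding tx
    read_locks: HashMap<Resource, HashSet<u64>>,  // resource -> reader txs
    wait_queue: HashMap<Resource, VecDeque<u64>>, // resource -> waiting txs
}

impl LockManager {
    fn new() -> Self {
        Self {
            write_locks: HashMap::new(),
            read_locks: HashMap::new(),
            wait_queue: HashMap::new(),
        }
    }

    // A write lock is granted only if no *other* transaction holds a
    // read or write lock on the resource; otherwise the requester is
    // queued (these queued waits feed the wait-for graph).
    fn try_write_lock(&mut self, res: Resource, tx: u64) -> bool {
        let write_free = self.write_locks.get(&res).map_or(true, |&h| h == tx);
        let read_free = self
            .read_locks
            .get(&res)
            .map_or(true, |rs| rs.is_empty() || (rs.len() == 1 && rs.contains(&tx)));
        if write_free && read_free {
            self.write_locks.insert(res, tx);
            true
        } else {
            self.wait_queue.entry(res).or_default().push_back(tx);
            false
        }
    }
}
```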

Undo Logging

For rollback capability, participants maintain undo logs:

  • Before images of modified data
  • Operation type (INSERT/UPDATE/DELETE)
  • Used during abort to restore previous state
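
Abort replays the undo log newest-first, restoring each before image. A minimal sketch with an illustrative `UndoEntry` type and an in-memory store standing in for the participant's storage:

```rust
struct UndoEntry {
    table_id: u32,
    key: Vec<u8>,
    before: Option<Vec<u8>>, // None = key did not exist (undoing an INSERT)
}

// Apply undo entries in reverse order so the earliest before-image wins,
// restoring the state from before the transaction started.
fn rollback(
    store: &mut std::collections::HashMap<(u32, Vec<u8>), Vec<u8>>,
    undo: &[UndoEntry],
) {
    for e in undo.iter().rev() {
        match &e.before {
            Some(v) => {
                store.insert((e.table_id, e.key.clone()), v.clone());
            }
            None => {
                store.remove(&(e.table_id, e.key.clone()));
            }
        }
    }
}
```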

Future Enhancements

  1. Optimistic Concurrency Control: Alternative to locking for read-heavy workloads
  2. Snapshot Isolation: Alternative isolation level with better concurrency
  3. Distributed Deadlock Detection: Cross-coordinator deadlock detection
  4. Transaction Priority: High-priority transactions can preempt low-priority ones
  5. Adaptive Timeout: Automatically adjust timeouts based on workload