
Error Handling Best Practices for HeliosDB

Version: 1.0
Date: November 9, 2025
Status: Production Guidelines
Audience: All HeliosDB developers


Purpose

This document establishes error handling best practices for HeliosDB to ensure production-ready code quality, eliminate panics, and provide excellent error diagnostics.

Key Principle: Production code MUST NEVER panic outside the narrow exceptions listed under Special Cases. All error conditions must be handled gracefully with descriptive error messages.


📋 Quick Reference

The Golden Rules

  1. NEVER use .unwrap() in production code
  2. NEVER use .expect() in production code
  3. Always propagate errors with the ? operator
  4. Provide descriptive error messages
  5. Use Result<T, E> for fallible operations
  6. Use Option for optional values
  7. Test code CAN use .unwrap() for simplicity
  8. Document why unsafe is safe (when unavoidable)

🚫 Anti-Patterns (DO NOT DO THIS)

Anti-Pattern 1: unwrap() in Production Code

Problem: Causes panic and crashes the database

// ❌ BAD: Will panic if sorted_entries is empty
fn create_sstable(sorted_entries: Vec<Entry>) -> SSTable {
    let min_key = sorted_entries.first().unwrap().key.clone();
    let max_key = sorted_entries.last().unwrap().key.clone();
    SSTable {
        min_key,
        max_key,
        entries: sorted_entries,
    }
}

Why It’s Bad:

  • Panics on empty input (data loss)
  • No error context (hard to debug)
  • Cannot recover (database crash)
  • Production risk: CRITICAL

Real Impact: SSTable creation failure → Database corruption


Anti-Pattern 2: expect() with Generic Messages

// ❌ BAD: Generic error message, still panics
fn get_latest_timestamp() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("Time went backwards") // Unhelpful in logs
        .as_millis() as u64
}

Why It’s Bad:

  • Still panics (same as unwrap)
  • Generic message doesn’t help debugging
  • No error propagation
  • Panics outright if the system clock reads earlier than UNIX_EPOCH (for example, a reset hardware clock or a misconfigured VM)

Real Impact: XA transaction coordination failure → Data inconsistency


Anti-Pattern 3: Nested unwrap() Chains

// ❌ VERY BAD: Multiple panic points, hard to debug
fn process_data(data: &HashMap<String, Vec<Value>>) -> Value {
    data.get("values")
        .unwrap()
        .first()
        .unwrap()
        .clone()
}

Why It’s Bad:

  • 2 panic points in a single expression
  • Which unwrap failed? Unknown
  • No error context
  • Impossible to recover
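One way to fix the chain is to give each lookup its own error, so that the message says which step failed. The following is a minimal sketch. It assumes a `HeliosError::NotFound(String)` variant (the same name used under Best Practice 6) and uses a generic `V` in place of the hypothetical `Value` type:

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum HeliosError {
    NotFound(String),
}

/// Returns a clone of the first element stored under "values".
/// Each step that can fail reports its own, distinct error.
pub fn process_data<V: Clone>(data: &HashMap<String, Vec<V>>) -> Result<V, HeliosError> {
    let values = data
        .get("values")
        .ok_or_else(|| HeliosError::NotFound("key 'values' missing from input".to_string()))?;
    let first = values
        .first()
        .ok_or_else(|| HeliosError::NotFound("'values' is present but empty".to_string()))?;
    Ok(first.clone())
}
```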

Anti-Pattern 4: Silent Error Swallowing

// ❌ BAD: Errors are hidden, causes silent failures
fn load_config(path: &Path) -> Config {
    match File::open(path) {
        Ok(file) => parse_config(file),
        Err(_) => Config::default(), // Error lost!
    }
}

Why It’s Bad:

  • Permission errors hidden
  • File not found hidden
  • Corrupt file hidden
  • Wrong behavior, no diagnostics
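Here is a sketch of a version that does not swallow errors, using only the standard library. `Config` and `ConfigError` are hypothetical stand-ins. Treating a missing file as "use defaults" is an assumption about the desired policy. Every other IO error (permissions, reading a directory, invalid UTF-8) is returned to the caller with the path attached:

```rust
use std::fmt;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical config type; a real one would be parsed from `raw`.
#[derive(Debug, Default, PartialEq)]
pub struct Config {
    pub raw: String,
}

#[derive(Debug)]
pub enum ConfigError {
    Io { path: PathBuf, source: io::Error },
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConfigError::Io { path, source } => {
                write!(f, "Failed to read config {}: {}", path.display(), source)
            }
        }
    }
}

impl std::error::Error for ConfigError {}

pub fn load_config(path: &Path) -> Result<Config, ConfigError> {
    match fs::read_to_string(path) {
        Ok(raw) => Ok(Config { raw }),
        // Deliberate, visible fallback: only "file does not exist" maps to defaults
        Err(e) if e.kind() == io::ErrorKind::NotFound => {
            eprintln!("config {} not found, using defaults", path.display());
            Ok(Config::default())
        }
        // Everything else (permissions, not-a-file, invalid UTF-8, ...) is surfaced
        Err(source) => Err(ConfigError::Io { path: path.to_path_buf(), source }),
    }
}
```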

Anti-Pattern 5: Generic Error Types

// ❌ BAD: Generic error loses context
fn read_sstable(id: u64) -> Result<SSTable, String> {
    let path = format!("data/{}.sst", id);
    let data = std::fs::read(&path)
        .map_err(|e| e.to_string())?; // Path and io::ErrorKind lost!
    deserialize(&data)
        .map_err(|e| e.to_string()) // IO vs. parse failure now indistinguishable
}

Why It’s Bad:

  • Cannot distinguish IO vs. deserialization errors
  • Cannot choose a retry policy by failure kind (e.g. a transient IO error vs. corrupt data)
  • Poor error reporting
  • Hard to debug

Best Practices (DO THIS INSTEAD)

Best Practice 1: Proper Result Handling

// GOOD: Proper error handling with context
use crate::error::HeliosError;

fn create_sstable(sorted_entries: Vec<Entry>) -> Result<SSTable, HeliosError> {
    if sorted_entries.is_empty() {
        return Err(HeliosError::Storage(
            "Cannot create SSTable from empty entries".to_string()
        ));
    }
    let min_key = sorted_entries
        .first()
        .ok_or_else(|| HeliosError::Storage(
            "Empty sorted entries after validation".to_string()
        ))?
        .key.clone();
    let max_key = sorted_entries
        .last()
        .ok_or_else(|| HeliosError::Storage(
            "Empty sorted entries after validation".to_string()
        ))?
        .key.clone();
    Ok(SSTable {
        min_key,
        max_key,
        entries: sorted_entries,
    })
}

Why It’s Good:

  • No panics possible
  • Descriptive error messages
  • Early validation
  • Errors propagate with ?
  • Caller can handle or propagate
  • Logs show exact error
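The empty check and the two `ok_or_else` calls can be collapsed into a single `let ... else` (stable since Rust 1.65), which binds both ends or returns early. The following is a self-contained sketch. `Entry`, `SSTable`, and this one-variant `HeliosError` are minimal stand-ins for the real types:

```rust
use std::fmt;

#[derive(Debug, Clone, PartialEq)]
pub struct Entry {
    pub key: Vec<u8>,
    pub value: Vec<u8>,
}

#[derive(Debug)]
pub struct SSTable {
    pub min_key: Vec<u8>,
    pub max_key: Vec<u8>,
    pub entries: Vec<Entry>,
}

#[derive(Debug, PartialEq)]
pub enum HeliosError {
    Storage(String),
}

impl fmt::Display for HeliosError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            HeliosError::Storage(msg) => write!(f, "storage error: {}", msg),
        }
    }
}

impl std::error::Error for HeliosError {}

pub fn create_sstable(sorted_entries: Vec<Entry>) -> Result<SSTable, HeliosError> {
    // Binds both ends in one step; the else branch runs only for an empty Vec
    let (Some(first), Some(last)) = (sorted_entries.first(), sorted_entries.last()) else {
        return Err(HeliosError::Storage(
            "Cannot create SSTable from empty entries".to_string(),
        ));
    };
    // Clone the keys before `sorted_entries` is moved into the struct
    let min_key = first.key.clone();
    let max_key = last.key.clone();
    Ok(SSTable { min_key, max_key, entries: sorted_entries })
}
```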

Best Practice 2: Helper Functions for Common Patterns

// GOOD: Helper function encapsulates error handling
/// Get current timestamp in milliseconds since UNIX_EPOCH
///
/// `duration_since(UNIX_EPOCH)` fails only when the system clock reads
/// earlier than 1970-01-01 (e.g. a reset hardware clock or a misconfigured
/// VM). This helper returns 0 in that case instead of panicking.
///
/// It does NOT make timestamps monotonic: NTP corrections, manual clock
/// adjustments, and virtualization time skew can still make a later call
/// return a smaller value.
#[inline]
fn current_timestamp_millis() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .unwrap_or(Duration::from_secs(0)) // Safe fallback
        .as_millis() as u64
}

// Usage is simple and safe
fn create_transaction() -> Transaction {
    Transaction {
        id: generate_id(),
        timestamp: current_timestamp_millis(), // Never panics
        data: vec![],
    }
}

Why It’s Good:

  • Centralizes error handling logic
  • Documented edge cases
  • Safe fallback (0 timestamp is detectable)
  • Reusable across codebase
  • Inline for performance
  • Never panics

When to Use: Common patterns (SystemTime, parsing, conversions)
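`SystemTime` is a wall clock. A fallback for the pre-1970 case does not stop a later call from returning a smaller value after an NTP step or a manual clock change. If a caller needs timestamps that never decrease within one process, one option is to clamp against the largest value handed out so far. The sketch below does this with an atomic. `monotonic_timestamp_millis` and `LAST_TS` are hypothetical names, and this does not order timestamps across processes or nodes:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{SystemTime, UNIX_EPOCH};

/// Largest timestamp returned so far by this process.
static LAST_TS: AtomicU64 = AtomicU64::new(0);

/// Wall-clock milliseconds since UNIX_EPOCH, or 0 if the clock reads before 1970.
pub fn current_timestamp_millis() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_millis() as u64)
        .unwrap_or(0)
}

/// Like `current_timestamp_millis`, but never smaller than a value this
/// function returned earlier, even if the wall clock steps backwards.
pub fn monotonic_timestamp_millis() -> u64 {
    let now = current_timestamp_millis();
    // fetch_max stores max(previous, now) and returns the previous value
    let previous = LAST_TS.fetch_max(now, Ordering::AcqRel);
    previous.max(now)
}
```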


Best Practice 3: Early Validation

// GOOD: Validate inputs early
fn process_batch(items: Vec<Item>) -> Result<Vec<ProcessedItem>, HeliosError> {
    // Validate inputs first
    if items.is_empty() {
        return Err(HeliosError::InvalidInput(
            "Batch cannot be empty".to_string()
        ));
    }
    if items.len() > MAX_BATCH_SIZE {
        return Err(HeliosError::InvalidInput(
            format!("Batch size {} exceeds maximum {}", items.len(), MAX_BATCH_SIZE)
        ));
    }
    // Process with confidence (inputs validated)
    let mut results = Vec::with_capacity(items.len());
    for item in items {
        results.push(process_item(item)?);
    }
    Ok(results)
}

Why It’s Good:

  • Fail fast on invalid input
  • Clear error messages
  • Invalid batches are rejected before any item is processed
  • Easy to test
  • Performance: validate once
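The per-item loop can also be written with `collect::<Result<Vec<_>, _>>()`, which stops at the first `Err` and returns it. Below is a runnable sketch. `MAX_BATCH_SIZE = 1_000` and the doubling `process_item` are hypothetical placeholders for the real limit and work:

```rust
/// Hypothetical limit; the real value belongs in configuration.
pub const MAX_BATCH_SIZE: usize = 1_000;

#[derive(Debug, PartialEq)]
pub enum HeliosError {
    InvalidInput(String),
}

/// Hypothetical per-item work: doubles the value, failing on overflow.
pub fn process_item(item: i64) -> Result<i64, HeliosError> {
    item.checked_mul(2)
        .ok_or_else(|| HeliosError::InvalidInput(format!("Item {} overflows when doubled", item)))
}

pub fn process_batch(items: Vec<i64>) -> Result<Vec<i64>, HeliosError> {
    // Validate the batch as a whole before touching any item
    if items.is_empty() {
        return Err(HeliosError::InvalidInput("Batch cannot be empty".to_string()));
    }
    if items.len() > MAX_BATCH_SIZE {
        return Err(HeliosError::InvalidInput(format!(
            "Batch size {} exceeds maximum {}",
            items.len(),
            MAX_BATCH_SIZE
        )));
    }
    // Collecting an iterator of Results short-circuits on the first Err
    items.into_iter().map(process_item).collect()
}
```

Results produced before a failing item are dropped, not rolled back. If the real `process_item` has side effects, only the up-front validation prevents them for obviously bad batches.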

Best Practice 4: Descriptive Error Types

// GOOD: Structured error types with context
#[derive(Debug, Clone)]
pub enum SSTableError {
    EmptyEntries,
    InvalidRange { min: Vec<u8>, max: Vec<u8> },
    IOError { path: PathBuf, source: String },
    CorruptedData { offset: u64, expected: u32, found: u32 },
}

impl std::fmt::Display for SSTableError {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match self {
            SSTableError::EmptyEntries => {
                write!(f, "Cannot create SSTable from empty entries")
            }
            SSTableError::InvalidRange { min, max } => {
                write!(f, "Invalid key range: min={:?} > max={:?}", min, max)
            }
            SSTableError::IOError { path, source } => {
                write!(f, "IO error reading {}: {}", path.display(), source)
            }
            SSTableError::CorruptedData { offset, expected, found } => {
                write!(f, "Corrupted data at offset {}: expected checksum {}, found {}",
                    offset, expected, found)
            }
        }
    }
}

impl std::error::Error for SSTableError {}

// Usage
fn read_sstable(path: &Path) -> Result<SSTable, SSTableError> {
    let data = std::fs::read(path)
        .map_err(|e| SSTableError::IOError {
            path: path.to_path_buf(),
            source: e.to_string(),
        })?;
    // ... deserialize with proper error handling
    Ok(sstable)
}

Why It’s Good:

  • Type-safe error handling
  • Structured error data
  • Easy to match on error type
  • Excellent error messages
  • Enables retries based on error type
  • Good logging
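Storing the original `std::io::Error` instead of its string form keeps `io::ErrorKind` available to callers. It also lets `Error::source()` expose the root cause to loggers and error reporters. Below is a std-only sketch. The `Io` variant name and the `is_retryable` policy (which kinds count as transient) are assumptions. `Clone` is dropped because `io::Error` does not implement it:

```rust
use std::error::Error;
use std::fmt;
use std::io;
use std::path::{Path, PathBuf};

#[derive(Debug)]
pub enum SSTableError {
    EmptyEntries,
    Io { path: PathBuf, source: io::Error },
    CorruptedData { offset: u64, expected: u32, found: u32 },
}

impl fmt::Display for SSTableError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SSTableError::EmptyEntries => write!(f, "Cannot create SSTable from empty entries"),
            // The io::Error is reported through source(), not repeated here
            SSTableError::Io { path, .. } => write!(f, "IO error reading {}", path.display()),
            SSTableError::CorruptedData { offset, expected, found } => write!(
                f,
                "Corrupted data at offset {}: expected checksum {}, found {}",
                offset, expected, found
            ),
        }
    }
}

impl Error for SSTableError {
    // Expose the underlying io::Error as the cause
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            SSTableError::Io { source, .. } => Some(source),
            _ => None,
        }
    }
}

pub fn read_sstable_bytes(path: &Path) -> Result<Vec<u8>, SSTableError> {
    std::fs::read(path).map_err(|source| SSTableError::Io { path: path.to_path_buf(), source })
}

/// Hypothetical retry policy: only transient IO kinds are worth retrying.
pub fn is_retryable(err: &SSTableError) -> bool {
    match err {
        SSTableError::Io { source, .. } => matches!(
            source.kind(),
            io::ErrorKind::Interrupted | io::ErrorKind::TimedOut | io::ErrorKind::WouldBlock
        ),
        _ => false,
    }
}
```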

Best Practice 5: Error Context Propagation

// GOOD: Errors carry full context up the stack
use anyhow::{Context, Result}; // Or use custom error chaining

fn load_sstable_file(id: u64) -> Result<SSTable> {
    let path = get_sstable_path(id)?;
    // with_context builds the message lazily, only on the error path
    let data = std::fs::read(&path)
        .with_context(|| format!("Failed to read SSTable file: {}", path.display()))?;
    let sstable = deserialize_sstable(&data)
        .with_context(|| format!("Failed to deserialize SSTable {}", id))?;
    validate_sstable(&sstable)
        .with_context(|| format!("SSTable {} failed validation", id))?;
    Ok(sstable)
}

// Error output example, when the caller adds one more layer with
// .with_context(|| format!("Failed to load SSTable {}", id)):
// Error: Failed to load SSTable 12345
//
// Caused by:
//     0: Failed to read SSTable file: /data/sstables/12345.sst
//     1: No such file or directory (os error 2)

Why It’s Good:

  • Full error chain visible
  • Easy to debug
  • Context at each layer
  • Root cause preserved
  • Great for logs
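The comment on the `use anyhow` line mentions custom error chaining as an alternative. Below is a minimal std-only sketch of the same idea. It has three parts: a wrapper that carries a message plus the boxed underlying error, an extension trait that adds the wrapper, and a reporter that walks `Error::source()`. `Context`, `ResultExt`, and `report` are hypothetical names, not part of any crate:

```rust
use std::error::Error;
use std::fmt;

/// An error message layered on top of an underlying cause.
#[derive(Debug)]
pub struct Context {
    msg: String,
    source: Box<dyn Error + 'static>,
}

impl fmt::Display for Context {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.msg)
    }
}

impl Error for Context {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&*self.source)
    }
}

pub trait ResultExt<T> {
    /// Wraps the error with a message built lazily, only on the error path.
    fn with_context<F: FnOnce() -> String>(self, f: F) -> Result<T, Context>;
}

impl<T, E: Error + 'static> ResultExt<T> for Result<T, E> {
    fn with_context<F: FnOnce() -> String>(self, f: F) -> Result<T, Context> {
        self.map_err(|e| Context { msg: f(), source: Box::new(e) })
    }
}

/// Renders the error followed by its numbered chain of causes.
pub fn report(err: &dyn Error) -> String {
    let mut out = format!("Error: {}", err);
    let mut cause = err.source();
    let mut depth = 0;
    while let Some(c) = cause {
        out.push_str(&format!("\n    {}: {}", depth, c));
        cause = c.source();
        depth += 1;
    }
    out
}
```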

Best Practice 6: Option Handling

// GOOD: Proper Option handling
fn get_user_by_id(id: u64) -> Result<User, HeliosError> {
    let users = get_user_cache()?;
    users.get(&id)
        .cloned()
        .ok_or_else(|| HeliosError::NotFound(
            format!("User {} not found in cache", id)
        ))
}

// Alternative: Return Option when "not found" is valid
fn get_cached_value(key: &str) -> Option<Value> {
    let cache = CACHE.lock().unwrap(); // Lock unwrap is OK (poison, see Pattern 7)
    cache.get(key).cloned()
}

// Caller decides how to handle None
let value = match get_cached_value("key") {
    Some(value) => value,
    None => load_from_disk("key")?, // Fallback
};
use_value(value);

Why It’s Good:

  • Clear semantics (Option vs Result)
  • Descriptive errors for Result
  • None is valid for Option
  • Caller flexibility

Common Patterns & Solutions

Pattern 1: Array/Collection Access

// ❌ BAD
let first = collection.first().unwrap();
let last = collection.last().unwrap();

// GOOD: Validate first
if collection.is_empty() {
    return Err(HeliosError::EmptyCollection);
}
let first = &collection[0];
let last = &collection[collection.len() - 1];

// ALSO GOOD: Propagate Option (a unit variant is cheap, so ok() suffices)
let first = collection.first()
    .ok_or(HeliosError::EmptyCollection)?;
let last = collection.last()
    .ok_or(HeliosError::EmptyCollection)?;
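Slice patterns give a third option that needs neither an explicit emptiness check nor indexing. The compiler checks that the match covers every length. Below is a sketch with a hypothetical `first_and_last` helper and a one-variant stand-in for `HeliosError`:

```rust
#[derive(Debug, PartialEq)]
pub enum HeliosError {
    EmptyCollection,
}

/// Returns references to the first and last elements (the same element for length 1).
pub fn first_and_last<T>(collection: &[T]) -> Result<(&T, &T), HeliosError> {
    match collection {
        [] => Err(HeliosError::EmptyCollection),
        [only] => Ok((only, only)),
        [first, .., last] => Ok((first, last)),
    }
}
```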

Pattern 2: SystemTime Operations

// ❌ BAD
let duration = SystemTime::now()
    .duration_since(UNIX_EPOCH)
    .unwrap();

// GOOD: Helper function with safe fallback
fn current_timestamp_millis() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .unwrap_or(Duration::from_secs(0))
        .as_millis() as u64
}

// ALSO GOOD: Return Result if precision matters
fn precise_timestamp() -> Result<u64, HeliosError> {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_millis() as u64)
        .map_err(|e| HeliosError::SystemTime(e.to_string()))
}

Pattern 3: Deque/VecDeque Access

// ❌ BAD
let front = deque.front().unwrap();
let back = deque.back().unwrap();

// GOOD
let front = deque.front()
    .ok_or_else(|| HeliosError::Storage("Deque unexpectedly empty".to_string()))?;
let back = deque.back()
    .ok_or_else(|| HeliosError::Storage("Deque unexpectedly empty".to_string()))?;

Pattern 4: Parsing Strings/Numbers

// ❌ BAD
let port: u16 = port_str.parse().unwrap();

// GOOD
let port: u16 = port_str.parse()
    .map_err(|e| HeliosError::InvalidConfig(
        format!("Invalid port '{}': {}", port_str, e)
    ))?;

// ALSO GOOD, but only where a silent fallback is intended:
// a malformed value is discarded with no diagnostic (compare Anti-Pattern 4)
let port: u16 = port_str.parse().unwrap_or(5432);
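When a default is wanted, it is usually safer to apply it only when the value is absent. A value that is present but malformed should be rejected, so a typo like `543x` is not silently replaced by the default. Below is a sketch. `parse_port`, `port_or_default`, and the `InvalidConfig` variant are hypothetical:

```rust
#[derive(Debug, PartialEq)]
pub enum HeliosError {
    InvalidConfig(String),
}

/// Parses a TCP port, keeping the offending input and the parser's reason in the error.
pub fn parse_port(port_str: &str) -> Result<u16, HeliosError> {
    port_str
        .trim()
        .parse::<u16>()
        .map_err(|e| HeliosError::InvalidConfig(format!("Invalid port '{}': {}", port_str, e)))
}

/// Uses `default` only when no value was configured; a malformed value is an error.
pub fn port_or_default(port_str: Option<&str>, default: u16) -> Result<u16, HeliosError> {
    match port_str {
        None => Ok(default),
        Some(s) => parse_port(s),
    }
}
```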

Pattern 5: HashMap/BTreeMap Get

// ❌ BAD
let value = map.get(&key).unwrap();

// GOOD: When key MUST exist
let value = map.get(&key)
    .ok_or_else(|| HeliosError::InvalidState(
        format!("Required key '{}' not found in map", key)
    ))?;

// ALSO GOOD: When key might not exist
if let Some(value) = map.get(&key) {
    process(value);
} else {
    use_default();
}

Pattern 6: Channel Operations

// ❌ BAD
sender.send(msg).unwrap();
let msg = receiver.recv().unwrap();

// GOOD
sender.send(msg)
    .map_err(|e| HeliosError::ChannelClosed(
        format!("Failed to send message: {}", e)
    ))?;
let msg = receiver.recv()
    .map_err(|e| HeliosError::ChannelClosed(
        format!("Failed to receive message: {}", e)
    ))?;
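Below is a runnable version of the channel pattern using `std::sync::mpsc`. The `send_msg`/`recv_msg` wrappers and the `ChannelClosed` variant are hypothetical. Note that a failed `send` hands the unsent message back inside `SendError` (as `e.0`), so a caller that needs it can recover it instead of dropping it as this sketch does:

```rust
use std::sync::mpsc::{Receiver, Sender};

#[derive(Debug, PartialEq)]
pub enum HeliosError {
    ChannelClosed(String),
}

pub fn send_msg<T>(sender: &Sender<T>, msg: T) -> Result<(), HeliosError> {
    // On failure, SendError(msg) still owns the message (e.0); it is dropped here
    sender
        .send(msg)
        .map_err(|e| HeliosError::ChannelClosed(format!("Failed to send message: {}", e)))
}

pub fn recv_msg<T>(receiver: &Receiver<T>) -> Result<T, HeliosError> {
    // recv() fails only once every Sender is dropped and no messages remain
    receiver
        .recv()
        .map_err(|e| HeliosError::ChannelClosed(format!("Failed to receive message: {}", e)))
}
```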

Pattern 7: Mutex/RwLock Poisoning

// ⚠ SPECIAL CASE: Lock poisoning
// Mutex/RwLock unwrap() is acceptable because:
// 1. Poison means another thread panicked while holding the lock
// 2. The protected data may be inconsistent, so continuing is unsafe
// 3. unwrap() propagates the panic to this thread (fail-stop, correct behavior)

// ACCEPTABLE in most cases
let guard = mutex.lock().unwrap();

// BETTER: Handle poison if recovery possible
let guard = mutex.lock()
    .unwrap_or_else(|poisoned| {
        error!("Mutex poisoned, data may be corrupted");
        poisoned.into_inner() // Use data anyway (risky!)
    });

// BEST: Avoid shared mutable state
// Use message passing (channels) instead of locks

Special Cases

When unwrap() IS Acceptable

1. Test Code

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_sstable_creation() {
        let entries = vec![entry1, entry2]; // test fixtures
        let sstable = create_sstable(entries).unwrap(); // OK in tests
        assert_eq!(sstable.entries.len(), 2);
    }
}

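2. Static Data Built From Constant Literals

// OK: The pattern is a constant literal, so it either always parses or never does.
// Lazy builds it on first access (not at compile time or at startup), so a test
// that touches EMAIL_REGEX catches an invalid pattern before release.
static EMAIL_REGEX: Lazy<Regex> = Lazy::new(|| {
    Regex::new(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")
        .unwrap() // Panics on first use if the pattern is invalid
});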

3. Mutex/RwLock Poison (See Pattern 7)

4. Initialization (OnceCell, lazy_static)

// OK: Initialize once at startup, panic if it fails
static CONFIG: OnceCell<Config> = OnceCell::new();

fn init_config(path: &Path) {
    let config = load_config(path).unwrap(); // Panic on startup if config invalid
    CONFIG.set(config).unwrap(); // Panics if called a second time
}

When to Use expect() vs unwrap()

General Rule: Prefer neither. Use ? or explicit error handling.

If You Must:

// Slightly better: expect() with explanation
let config = load_config(path)
    .expect("Config file must be valid at startup");

// Same panic, just without the explanatory message:
let config = load_config(path).unwrap();

Verdict: expect() is marginally better than unwrap() (message in panic), but both should be avoided in production code.


🧪 Testing Error Handling

Test That Errors Are Returned

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_create_sstable_empty_entries() {
        let result = create_sstable(vec![]);
        assert!(result.is_err());
        let err = result.unwrap_err();
        assert!(matches!(err, HeliosError::Storage(_)));
    }

    #[test]
    fn test_create_sstable_valid() {
        let entries = vec![
            Entry::new(b"key1".to_vec(), b"value1".to_vec()),
            Entry::new(b"key2".to_vec(), b"value2".to_vec()),
        ];
        let result = create_sstable(entries);
        assert!(result.is_ok());
        let sstable = result.unwrap();
        assert_eq!(sstable.entries.len(), 2);
    }

    #[test]
    fn test_error_message_quality() {
        let result = create_sstable(vec![]);
        let err_msg = format!("{}", result.unwrap_err());
        // Error messages should be descriptive
        assert!(err_msg.contains("empty"));
        assert!(err_msg.contains("SSTable") || err_msg.contains("entries"));
    }
}

Refactoring Guidelines

Step 1: Identify unwrap() Calls

# Find all unwrap() calls under src/ (--exclude-dir only skips directories named
# "tests"; inline #[cfg(test)] modules still match, so review each hit)
grep -rn "\.unwrap()" crate-name/src/ --exclude-dir=tests
# Priority order:
# 1. CRITICAL: In storage/transaction/consensus paths
# 2. HIGH: In frequently executed paths
# 3. MEDIUM: In utility functions
# 4. LOW: In rarely executed paths

Step 2: Categorize Each unwrap()

For each unwrap(), ask:

  1. Can this fail? (Yes = must fix, No = consider expect())
  2. How often is it called? (Frequent = higher priority)
  3. What happens if it panics? (Data loss = CRITICAL)
  4. Is there a better pattern? (Helper function? Early validation?)

Step 3: Apply Appropriate Fix

Use the patterns in this document to fix each unwrap().

Step 4: Update Tests

Ensure tests cover both success and error cases.

Step 5: Verify

# Compile
cargo check -p crate-name
# Test
cargo test -p crate-name
# Clippy (optional: enforce no unwrap)
cargo clippy -p crate-name -- -D clippy::unwrap_used

🎓 Learning Resources

Internal Resources

  • Security Audit Report: docs/SECURITY_AUDIT_REPORT.md
  • Security Remediation Plan: docs/SECURITY_REMEDIATION_PLAN.md
  • Day 1 Completion Report: docs/SECURITY_FIX_DAY1_COMPLETE.md

Example Files (Good Error Handling)

  • heliosdb-security/src/ (Grade: 8.5/10)
    • Zero unwrap() in production code
    • Model for other crates

External Resources


Checklist for Code Review

For Authors

Before submitting code, verify:

  • Zero unwrap() in production code (except special cases)
  • Zero expect() in production code
  • All Result types propagated with ?
  • Error messages are descriptive
  • Edge cases validated (empty collections, None, etc.)
  • Tests cover error cases
  • Documentation explains error conditions

For Reviewers

Check for:

  • No unwrap() or expect() in production paths
  • Proper Result<T, E> usage
  • Descriptive error types (not String)
  • Error context preserved
  • Edge cases handled
  • Tests for error paths
  • unsafe blocks documented (if any)

Quick Migration Guide

Before (Unsafe)

fn process(data: &[u8]) -> Vec<u8> {
    let first = data.first().unwrap();
    let last = data.last().unwrap();
    let result = compute(*first, *last).unwrap();
    result.to_vec()
}

After (Safe)

fn process(data: &[u8]) -> Result<Vec<u8>, HeliosError> {
    if data.is_empty() {
        return Err(HeliosError::InvalidInput("Data cannot be empty".to_string()));
    }
    let first = data[0];
    let last = data[data.len() - 1];
    let result = compute(first, last)
        .map_err(|e| HeliosError::Computation(e.to_string()))?;
    Ok(result.to_vec())
}

Changes

  1. Return Result instead of plain type
  2. Early validation (empty check)
  3. Index access (safe after validation)
  4. Error propagation with ?
  5. Descriptive error messages

Tips & Tricks

Tip 1: Use Clippy Lints

Cargo.toml
[lints.clippy]
unwrap_used = "deny"
expect_used = "warn"
panic = "deny"
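Restriction lints such as `unwrap_used` also fire inside `#[cfg(test)]` code by default, which conflicts with Golden Rule 7. Clippy's configuration file can exempt tests. Below is a sketch of a `clippy.toml` placed next to `Cargo.toml`, assuming a Clippy version that supports these keys:

```toml
# clippy.toml: let tests keep using unwrap()/expect() while production code is linted
allow-unwrap-in-tests = true
allow-expect-in-tests = true
```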

Tip 2: Pre-commit Hook

.git/hooks/pre-commit
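#!/bin/bash
# Deny unwrap() in staged Rust files (added, copied, or modified only).
# Note: this also flags unwrap() inside #[cfg(test)] modules, and it greps the
# working-tree copy of each file rather than the staged content.
if git diff --cached --name-only --diff-filter=ACM | grep -E "src/.*\.rs$" | xargs grep -l "\.unwrap()" ; then
    echo "ERROR: unwrap() found in production code!"
    echo "Please use proper error handling."
    exit 1
fi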

Tip 3: IDE Configuration

Configure your IDE to highlight unwrap() calls:

  • VS Code: set rust-analyzer.check.command to "clippy" so Clippy findings appear as diagnostics
  • IntelliJ IDEA (Rust plugin): Settings → Languages & Frameworks → Rust → External Linters → select Clippy

Tip 4: Error Message Template

Use this template for error messages:

"<What failed>: <Why it failed> [<Context>]"
Examples:
"Failed to create SSTable: empty entries"
"Failed to read file /data/sstable.db: Permission denied"
"Invalid port '99999': number too large to fit in target type"

📚 Summary

Key Takeaways

  1. Never unwrap() in production - Use Result and ? operator
  2. Validate early - Check inputs before processing
  3. Descriptive errors - Help debugging with good messages
  4. Helper functions - Centralize common error handling patterns
  5. Test error paths - Ensure errors are handled correctly
  6. Code review - Catch unwrap() before merge

Production-Ready Error Handling

Before: Code with unwrap() = 🔴 Production risk
After: Code with Result + ? = Production ready

Questions?

Contact: security-team@heliosdb.com


Document Version: 1.0
Last Updated: November 9, 2025
Status: Active Guidelines
Next Review: December 9, 2025