Error Handling Best Practices for HeliosDB
Version: 1.0
Date: November 9, 2025
Status: Production Guidelines
Audience: All HeliosDB developers
Purpose
This document establishes error handling best practices for HeliosDB to ensure production-ready code quality, eliminate panics, and provide excellent error diagnostics.
Key Principle: Production code MUST NEVER panic. All error conditions must be handled gracefully with descriptive error messages.
📋 Quick Reference
The Golden Rules
- NEVER use .unwrap() in production code
- NEVER use .expect() in production code
- Always propagate errors with the ? operator
- Provide descriptive error messages
- Use Result<T, E> for fallible operations
- Use Option<T> for optional values
- Test code CAN use .unwrap() for simplicity
- Document why unsafe is safe (when unavoidable)
🚫 Anti-Patterns (DO NOT DO THIS)
Anti-Pattern 1: unwrap() in Production Code
Problem: Causes panic and crashes the database
    // ❌ BAD: Will panic if sorted_entries is empty
    fn create_sstable(sorted_entries: Vec<Entry>) -> SSTable {
        let min_key = sorted_entries.first().unwrap().key.clone();
        let max_key = sorted_entries.last().unwrap().key.clone();

        SSTable {
            min_key,
            max_key,
            entries: sorted_entries,
        }
    }

Why It’s Bad:
- Panics on empty input (data loss)
- No error context (hard to debug)
- Cannot recover (database crash)
- Production risk: CRITICAL
Real Impact: SSTable creation failure → Database corruption
Anti-Pattern 2: expect() with Generic Messages
    // ❌ BAD: Generic error message, still panics
    fn get_latest_timestamp() -> u64 {
        SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .expect("Time went backwards") // Unhelpful in logs
            .as_millis() as u64
    }

Why It’s Bad:
- Still panics (same as unwrap)
- Generic message doesn’t help debugging
- No error propagation
- Cannot handle NTP sync, clock adjustments
Real Impact: XA transaction coordination failure → Data inconsistency
Anti-Pattern 3: Nested unwrap() Chains
    // ❌ VERY BAD: Multiple panic points, hard to debug
    fn process_data(data: &HashMap<String, Vec<Value>>) -> Value {
        data.get("values")
            .unwrap()
            .first()
            .unwrap()
            .clone()
    }

Why It’s Bad:
- 2 panic points in a single expression
- Which unwrap failed? Unknown
- No error context
- Impossible to recover
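One way to untangle a chain like this is to turn each step into its own error, so the log shows which lookup failed. The sketch below is a minimal illustration, not HeliosDB code: `LookupError` and `first_value` are hypothetical names, and `i64` stands in for the `Value` type used above.

```rust
use std::collections::HashMap;

/// Hypothetical error type for this sketch; HeliosDB's real `HeliosError` may differ.
#[derive(Debug, PartialEq)]
enum LookupError {
    MissingKey(&'static str),
    EmptyList(&'static str),
}

/// Returns the first value stored under "values", reporting which step failed.
fn first_value(data: &HashMap<String, Vec<i64>>) -> Result<i64, LookupError> {
    // Step 1: the key may be absent.
    let values = data
        .get("values")
        .ok_or(LookupError::MissingKey("values"))?;
    // Step 2: the list may be present but empty.
    let first = values
        .first()
        .ok_or(LookupError::EmptyList("values"))?;
    Ok(*first)
}
```

Each `?` now maps to a distinct variant, so a caller can match on the failure instead of reading a panic backtrace.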
Anti-Pattern 4: Silent Error Swallowing
    // ❌ BAD: Errors are hidden, causes silent failures
    fn load_config(path: &Path) -> Config {
        match File::open(path) {
            Ok(file) => parse_config(file),
            Err(_) => Config::default(), // Error lost!
        }
    }

Why It’s Bad:
- Permission errors hidden
- File not found hidden
- Corrupt file hidden
- Wrong behavior, no diagnostics
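A possible alternative, sketched under stated assumptions, is to return every failure to the caller and let the caller decide whether a default is acceptable. `Config`, `ConfigError`, and the one-line file format (a bare port number) are hypothetical stand-ins chosen to keep the example self-contained.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical config type for this sketch.
#[derive(Debug, Default, PartialEq)]
struct Config {
    port: u16,
}

/// Hypothetical error type: one variant per failure the caller may care about.
#[derive(Debug)]
enum ConfigError {
    NotFound { path: PathBuf },
    Io { path: PathBuf, source: io::Error },
    Parse { path: PathBuf, reason: String },
}

/// Stand-in parser: expects the file to contain just a port number.
fn parse_config(text: &str, path: &Path) -> Result<Config, ConfigError> {
    let port = text.trim().parse::<u16>().map_err(|e| ConfigError::Parse {
        path: path.to_path_buf(),
        reason: e.to_string(),
    })?;
    Ok(Config { port })
}

/// Reports missing files, other I/O errors, and parse errors separately.
fn load_config(path: &Path) -> Result<Config, ConfigError> {
    let text = fs::read_to_string(path).map_err(|e| match e.kind() {
        io::ErrorKind::NotFound => ConfigError::NotFound { path: path.to_path_buf() },
        _ => ConfigError::Io { path: path.to_path_buf(), source: e },
    })?;
    parse_config(&text, path)
}
```

A caller that really wants defaults can match on `ConfigError::NotFound`, log the decision, and fall back explicitly, while permission and corruption errors still surface.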
Anti-Pattern 5: Generic Error Types
    // ❌ BAD: Generic error loses context
    fn read_sstable(id: u64) -> Result<SSTable, String> {
        let path = format!("data/{}.sst", id);
        let data = std::fs::read(path)
            .map_err(|e| e.to_string())?; // Context lost!

        deserialize(&data)
            .map_err(|e| e.to_string()) // Context lost again
    }

Why It’s Bad:
- Cannot distinguish IO vs. deserialization errors
- Cannot implement retries
- Poor error reporting
- Hard to debug
Best Practices (DO THIS INSTEAD)
Best Practice 1: Proper Result Handling
    // GOOD: Proper error handling with context
    use crate::error::HeliosError;

    fn create_sstable(sorted_entries: Vec<Entry>) -> Result<SSTable, HeliosError> {
        if sorted_entries.is_empty() {
            return Err(HeliosError::Storage(
                "Cannot create SSTable from empty entries".to_string()
            ));
        }

        let min_key = sorted_entries
            .first()
            .ok_or_else(|| HeliosError::Storage(
                "Empty sorted entries after validation".to_string()
            ))?
            .key.clone();

        let max_key = sorted_entries
            .last()
            .ok_or_else(|| HeliosError::Storage(
                "Empty sorted entries after validation".to_string()
            ))?
            .key.clone();

        Ok(SSTable {
            min_key,
            max_key,
            entries: sorted_entries,
        })
    }

Why It’s Good:
- No panics possible
- Descriptive error messages
- Early validation
- Errors propagate with ?
- Caller can handle or propagate
- Logs show exact error
Best Practice 2: Helper Functions for Common Patterns
    // GOOD: Helper function encapsulates error handling

    /// Get current timestamp in milliseconds since UNIX_EPOCH
    ///
    /// Safe timestamp generation that handles edge cases:
    /// - System clock adjustments
    /// - NTP sync
    /// - Clock going backwards
    /// - Virtualization time skew
    ///
    /// Returns 0 if SystemTime fails (extremely rare).
    #[inline]
    fn current_timestamp_millis() -> u64 {
        SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .unwrap_or(Duration::from_secs(0)) // Safe fallback
            .as_millis() as u64
    }

    // Usage is simple and safe
    fn create_transaction() -> Transaction {
        Transaction {
            id: generate_id(),
            timestamp: current_timestamp_millis(), // Never panics
            data: vec![],
        }
    }

Why It’s Good:
- Centralizes error handling logic
- Documented edge cases
- Safe fallback (0 timestamp is detectable)
- Reusable across codebase
- Inline for performance
- Never panics
When to Use: Common patterns (SystemTime, parsing, conversions)
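The same approach can be sketched for parsing. The helper below is an illustration only: `InvalidPort` and `parse_port` are hypothetical names, and rejecting port 0 is an assumption made for the example rather than a documented HeliosDB rule.

```rust
use std::fmt;
use std::num::ParseIntError;

/// Hypothetical error for this sketch; a real crate would likely reuse `HeliosError`.
#[derive(Debug, PartialEq)]
struct InvalidPort {
    input: String,
    reason: String,
}

impl fmt::Display for InvalidPort {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "Invalid port '{}': {}", self.input, self.reason)
    }
}

impl std::error::Error for InvalidPort {}

/// Parses a TCP port from text, keeping the offending input in the error.
fn parse_port(input: &str) -> Result<u16, InvalidPort> {
    let port: u16 = input.trim().parse().map_err(|e: ParseIntError| InvalidPort {
        input: input.to_string(),
        reason: e.to_string(),
    })?;
    // Assumption for this sketch: port 0 is never a valid listen port.
    if port == 0 {
        return Err(InvalidPort {
            input: input.to_string(),
            reason: "port 0 is not allowed".to_string(),
        });
    }
    Ok(port)
}
```

Call sites then shrink to `let port = parse_port(&raw)?;`, and the error message format lives in one place.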
Best Practice 3: Early Validation
    // GOOD: Validate inputs early
    fn process_batch(items: Vec<Item>) -> Result<Vec<ProcessedItem>, HeliosError> {
        // Validate inputs first
        if items.is_empty() {
            return Err(HeliosError::InvalidInput(
                "Batch cannot be empty".to_string()
            ));
        }

        if items.len() > MAX_BATCH_SIZE {
            return Err(HeliosError::InvalidInput(
                format!("Batch size {} exceeds maximum {}", items.len(), MAX_BATCH_SIZE)
            ));
        }

        // Process with confidence (inputs validated)
        let mut results = Vec::with_capacity(items.len());
        for item in items {
            results.push(process_item(item)?);
        }

        Ok(results)
    }

Why It’s Good:
- Fail fast on invalid input
- Clear error messages
- No partial processing
- Easy to test
- Performance: validate once
Best Practice 4: Descriptive Error Types
    // GOOD: Structured error types with context
    #[derive(Debug, Clone)]
    pub enum SSTableError {
        EmptyEntries,
        InvalidRange { min: Vec<u8>, max: Vec<u8> },
        IOError { path: PathBuf, source: String },
        CorruptedData { offset: u64, expected: u32, found: u32 },
    }

    impl std::fmt::Display for SSTableError {
        fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
            match self {
                SSTableError::EmptyEntries => {
                    write!(f, "Cannot create SSTable from empty entries")
                }
                SSTableError::InvalidRange { min, max } => {
                    write!(f, "Invalid key range: min={:?} > max={:?}", min, max)
                }
                SSTableError::IOError { path, source } => {
                    write!(f, "IO error reading {}: {}", path.display(), source)
                }
                SSTableError::CorruptedData { offset, expected, found } => {
                    write!(f, "Corrupted data at offset {}: expected checksum {}, found {}",
                           offset, expected, found)
                }
            }
        }
    }

    impl std::error::Error for SSTableError {}

    // Usage
    fn read_sstable(path: &Path) -> Result<SSTable, SSTableError> {
        let data = std::fs::read(path)
            .map_err(|e| SSTableError::IOError {
                path: path.to_path_buf(),
                source: e.to_string(),
            })?;

        // ... deserialize with proper error handling
        Ok(sstable)
    }

Why It’s Good:
- Type-safe error handling
- Structured error data
- Easy to match on error type
- Excellent error messages
- Enables retries based on error type
- Good logging
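To make the "retries based on error type" point concrete, here is a minimal sketch. It uses a trimmed-down copy of `SSTableError`; `is_retryable` and `with_retries` are hypothetical helpers, and treating I/O errors as transient is an assumption that a real storage engine would need to refine.

```rust
use std::path::PathBuf;

/// Trimmed-down copy of the `SSTableError` enum above, for a self-contained sketch.
#[derive(Debug, Clone, PartialEq)]
enum SSTableError {
    IOError { path: PathBuf, source: String },
    CorruptedData { offset: u64, expected: u32, found: u32 },
}

impl SSTableError {
    /// Assumption: I/O failures may be transient, corruption never is.
    fn is_retryable(&self) -> bool {
        matches!(self, SSTableError::IOError { .. })
    }
}

/// Runs `op` at least once and at most `max_attempts` times,
/// retrying only when the error is classified as retryable.
fn with_retries<T>(
    max_attempts: u32,
    mut op: impl FnMut() -> Result<T, SSTableError>,
) -> Result<T, SSTableError> {
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Retryable and attempts remain: try again.
            Err(e) if e.is_retryable() && attempt < max_attempts => attempt += 1,
            // Non-retryable, or out of attempts: give the error to the caller.
            Err(e) => return Err(e),
        }
    }
}
```

With a `String` error type this classification would require parsing messages; with an enum it is a single `matches!`.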
Best Practice 5: Error Context Propagation
    // GOOD: Errors carry full context up the stack
    use anyhow::{Context, Result}; // Or use custom error chaining

    fn load_sstable_file(id: u64) -> Result<SSTable> {
        let path = get_sstable_path(id)?;

        // with_context builds the message lazily, only on the error path
        let data = std::fs::read(&path)
            .with_context(|| format!("Failed to read SSTable file: {}", path.display()))?;

        let sstable = deserialize_sstable(&data)
            .with_context(|| format!("Failed to deserialize SSTable {}", id))?;

        validate_sstable(&sstable)
            .with_context(|| format!("SSTable {} failed validation", id))?;

        Ok(sstable)
    }

    // Error output example (assuming the caller adds the top-level
    // "Failed to load SSTable 12345" context):
    // Error: Failed to load SSTable 12345
    // Caused by:
    //     0: Failed to read SSTable file: /data/sstables/12345.sst
    //     1: No such file or directory (os error 2)

Why It’s Good:
- Full error chain visible
- Easy to debug
- Context at each layer
- Root cause preserved
- Great for logs
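Where pulling in anyhow is not an option, a similar chain can be kept with the standard library alone by implementing `Error::source`. The sketch below is one possible shape, not the HeliosDB implementation: `LoadError`, `error_chain`, and `read_sstable_bytes` are hypothetical names.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical wrapper error that keeps the underlying cause reachable via `source()`.
#[derive(Debug)]
struct LoadError {
    context: String,
    cause: Box<dyn Error + 'static>,
}

impl fmt::Display for LoadError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.context)
    }
}

impl Error for LoadError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&*self.cause)
    }
}

/// Collects every message in the chain, outermost first.
fn error_chain(err: &(dyn Error + 'static)) -> Vec<String> {
    let mut messages = vec![err.to_string()];
    let mut current = err.source();
    while let Some(cause) = current {
        messages.push(cause.to_string());
        current = cause.source();
    }
    messages
}

/// Reads a file and attaches the path as context while preserving the I/O error.
fn read_sstable_bytes(path: &str) -> Result<Vec<u8>, LoadError> {
    std::fs::read(path).map_err(|e| LoadError {
        context: format!("Failed to read SSTable file: {}", path),
        cause: Box::new(e),
    })
}
```

A logger can walk `error_chain(&err)` to print the context line followed by the root cause, which is roughly what anyhow's "Caused by" output provides.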
Best Practice 6: Option Handling
    // GOOD: Proper Option handling
    fn get_user_by_id(id: u64) -> Result<User, HeliosError> {
        let users = get_user_cache()?;

        users.get(&id)
            .cloned()
            .ok_or_else(|| HeliosError::NotFound(
                format!("User {} not found in cache", id)
            ))
    }

    // Alternative: Return Option when "not found" is valid
    fn get_cached_value(key: &str) -> Option<Value> {
        let cache = CACHE.lock().unwrap(); // Lock unwrap is OK (poison)
        cache.get(key).cloned()
    }

    // Caller decides how to handle None
    match get_cached_value("key") {
        Some(value) => use_value(value),
        None => load_from_disk("key")?, // Fallback
    }

Why It’s Good:
- Clear semantics (Option vs Result)
- Descriptive errors for Result
- None is valid for Option
- Caller flexibility
Common Patterns & Solutions
Pattern 1: Array/Collection Access
    // ❌ BAD
    let first = collection.first().unwrap();
    let last = collection.last().unwrap();

    // GOOD: Validate first
    if collection.is_empty() {
        return Err(HeliosError::EmptyCollection);
    }
    let first = &collection[0];
    let last = &collection[collection.len() - 1];

    // ALSO GOOD: Propagate Option
    let first = collection.first()
        .ok_or(HeliosError::EmptyCollection)?;
    let last = collection.last()
        .ok_or(HeliosError::EmptyCollection)?;

Pattern 2: SystemTime Operations
    // ❌ BAD
    let duration = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .unwrap();

    // GOOD: Helper function with safe fallback
    fn current_timestamp_millis() -> u64 {
        SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .unwrap_or(Duration::from_secs(0))
            .as_millis() as u64
    }

    // ALSO GOOD: Return Result if precision matters
    fn precise_timestamp() -> Result<u64, HeliosError> {
        SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .map(|d| d.as_millis() as u64)
            .map_err(|e| HeliosError::SystemTime(e.to_string()))
    }

Pattern 3: Deque/VecDeque Access
    // ❌ BAD
    let front = deque.front().unwrap();
    let back = deque.back().unwrap();

    // GOOD
    let front = deque.front()
        .ok_or_else(|| HeliosError::Storage("Deque unexpectedly empty".to_string()))?;
    let back = deque.back()
        .ok_or_else(|| HeliosError::Storage("Deque unexpectedly empty".to_string()))?;

Pattern 4: Parsing Strings/Numbers
    // ❌ BAD
    let port: u16 = port_str.parse().unwrap();

    // GOOD
    let port: u16 = port_str.parse()
        .map_err(|e| HeliosError::InvalidConfig(
            format!("Invalid port '{}': {}", port_str, e)
        ))?;

    // ALSO GOOD: With default (only when silently ignoring a bad value is acceptable)
    let port: u16 = port_str.parse().unwrap_or(5432);

Pattern 5: HashMap/BTreeMap Get
    // ❌ BAD
    let value = map.get(&key).unwrap();

    // GOOD: When key MUST exist
    let value = map.get(&key)
        .ok_or_else(|| HeliosError::InvalidState(
            format!("Required key '{}' not found in map", key)
        ))?;

    // ALSO GOOD: When key might not exist
    if let Some(value) = map.get(&key) {
        process(value);
    } else {
        use_default();
    }

Pattern 6: Channel Operations
    // ❌ BAD
    sender.send(msg).unwrap();
    let msg = receiver.recv().unwrap();

    // GOOD
    sender.send(msg)
        .map_err(|e| HeliosError::ChannelClosed(
            format!("Failed to send message: {}", e)
        ))?;

    let msg = receiver.recv()
        .map_err(|e| HeliosError::ChannelClosed(
            format!("Failed to receive message: {}", e)
        ))?;

Pattern 7: Mutex/RwLock Poisoning
    // ⚠ SPECIAL CASE: Lock poisoning
    // Mutex/RwLock unwrap() is acceptable because:
    // 1. Poison means panic happened while locked
    // 2. Data may be corrupted, cannot safely continue
    // 3. Unwrap propagates panic (correct behavior)

    // ACCEPTABLE in most cases
    let guard = mutex.lock().unwrap();

    // BETTER: Handle poison if recovery possible
    let guard = mutex.lock()
        .unwrap_or_else(|poisoned| {
            error!("Mutex poisoned, data may be corrupted");
            poisoned.into_inner() // Use data anyway (risky!)
        });

    // BEST: Avoid shared mutable state
    // Use message passing (channels) instead of locks

Special Cases
When unwrap() IS Acceptable
1. Test Code
    #[cfg(test)]
    mod tests {
        #[test]
        fn test_sstable_creation() {
            let entries = vec![entry1, entry2];
            let sstable = create_sstable(entries).unwrap(); // OK in tests
            assert_eq!(sstable.entries.len(), 2);
        }
    }

2. Static/Compile-Time Validated Data

    // OK: The pattern is a hard-coded literal, never user input
    static EMAIL_REGEX: Lazy<Regex> = Lazy::new(|| {
        Regex::new(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")
            .unwrap() // Panics on first use if the regex is invalid (a programming error)
    });

3. Mutex/RwLock Poison (See Pattern 7)
4. Initialization (Once Cell, Lazy Static)
    // OK: Initialize once, panic if fails
    static CONFIG: OnceCell<Config> = OnceCell::new();

    fn init_config(path: &Path) {
        let config = load_config(path).unwrap(); // Panic on startup if config invalid
        CONFIG.set(config).unwrap();
    }

When to Use expect() vs unwrap()
General Rule: Prefer neither. Use ? or explicit error handling.
If You Must:
    // Slightly better: expect() with explanation
    let config = load_config(path)
        .expect("Config file must be valid at startup");

    // Same as:
    let config = load_config(path).unwrap();

Verdict: expect() is marginally better than unwrap() (message in panic), but both should be avoided in production code.
🧪 Testing Error Handling
Test That Errors Are Returned
    #[cfg(test)]
    mod tests {
        use super::*;

        #[test]
        fn test_create_sstable_empty_entries() {
            let result = create_sstable(vec![]);
            assert!(result.is_err());

            let err = result.unwrap_err();
            assert!(matches!(err, HeliosError::Storage(_)));
        }

        #[test]
        fn test_create_sstable_valid() {
            let entries = vec![
                Entry::new(b"key1".to_vec(), b"value1".to_vec()),
                Entry::new(b"key2".to_vec(), b"value2".to_vec()),
            ];

            let result = create_sstable(entries);
            assert!(result.is_ok());

            let sstable = result.unwrap();
            assert_eq!(sstable.entries.len(), 2);
        }

        #[test]
        fn test_error_message_quality() {
            let result = create_sstable(vec![]);
            let err_msg = format!("{}", result.unwrap_err());

            // Error messages should be descriptive
            assert!(err_msg.contains("empty"));
            assert!(err_msg.contains("SSTable") || err_msg.contains("entries"));
        }
    }

Refactoring Guidelines
Step 1: Identify unwrap() Calls
    # Find all unwrap() in production code (exclude tests)
    grep -r "\.unwrap()" crate-name/src/ --exclude-dir=tests

    # Priority order:
    # 1. CRITICAL: In storage/transaction/consensus paths
    # 2. HIGH: In frequently executed paths
    # 3. MEDIUM: In utility functions
    # 4. LOW: In rarely executed paths

Step 2: Categorize Each unwrap()
For each unwrap(), ask:
- Can this fail? (Yes = must fix, No = consider expect())
- How often is it called? (Frequent = higher priority)
- What happens if it panics? (Data loss = CRITICAL)
- Is there a better pattern? (Helper function? Early validation?)
Step 3: Apply Appropriate Fix
Use the patterns in this document to fix each unwrap().
Step 4: Update Tests
Ensure tests cover both success and error cases.
Step 5: Verify
    # Compile
    cargo check -p crate-name

    # Test
    cargo test -p crate-name

    # Clippy (optional: enforce no unwrap)
    cargo clippy -p crate-name -- -D clippy::unwrap_used

🎓 Learning Resources
Internal Resources
- Security Audit Report: docs/SECURITY_AUDIT_REPORT.md
- Security Remediation Plan: docs/SECURITY_REMEDIATION_PLAN.md
- Day 1 Completion Report: docs/SECURITY_FIX_DAY1_COMPLETE.md
Example Files (Good Error Handling)
- heliosdb-security/src/ (Grade: 8.5/10)
  - Zero unwrap() in production code
  - Model for other crates
External Resources
- Rust Error Handling Survey
- anyhow Crate - Ergonomic error handling
- thiserror Crate - Derive macros for custom errors
Checklist for Code Review
For Authors
Before submitting code, verify:
- Zero unwrap() in production code (except special cases)
- Zero expect() in production code
- All Result types propagated with ?
- Error messages are descriptive
- Edge cases validated (empty collections, None, etc.)
- Tests cover error cases
- Documentation explains error conditions
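For the last item, rustdoc convention is to list failure cases under an `# Errors` heading in the doc comment. A minimal sketch follows; `BatchError` and `validate_batch_len` are hypothetical names used only for illustration.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum BatchError {
    Empty,
    TooLarge { len: usize, max: usize },
}

/// Checks that a batch length is within bounds and returns it unchanged.
///
/// # Errors
///
/// - [`BatchError::Empty`] if `len` is zero.
/// - [`BatchError::TooLarge`] if `len` exceeds `max`.
fn validate_batch_len(len: usize, max: usize) -> Result<usize, BatchError> {
    if len == 0 {
        return Err(BatchError::Empty);
    }
    if len > max {
        return Err(BatchError::TooLarge { len, max });
    }
    Ok(len)
}
```

Clippy's `missing_errors_doc` lint (in the pedantic group) can flag public functions returning `Result` without such a section.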
For Reviewers
Check for:
- No unwrap() or expect() in production paths
- Proper Result<T, E> usage
- Descriptive error types (not String)
- Error context preserved
- Edge cases handled
- Tests for error paths
- unsafe blocks documented (if any)
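As an illustration of a documented unsafe block, the common Rust convention is a `// SAFETY:` comment stating the invariant that makes the block sound. The function below is a hypothetical sketch; in practice the safe `data.first().copied()` would be preferable here.

```rust
/// Returns the first byte of `data`, or `None` if it is empty.
fn first_byte(data: &[u8]) -> Option<u8> {
    if data.is_empty() {
        return None;
    }
    // SAFETY: `data` is non-empty (checked above), so index 0 is in bounds.
    Some(unsafe { *data.get_unchecked(0) })
}
```

A reviewer can then check the stated invariant against the surrounding code instead of reverse-engineering why the block is sound.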
Quick Migration Guide
Before (Unsafe)
    fn process(data: &[u8]) -> Vec<u8> {
        let first = data.first().unwrap();
        let last = data.last().unwrap();
        let result = compute(*first, *last).unwrap();
        result.to_vec()
    }

After (Safe)

    fn process(data: &[u8]) -> Result<Vec<u8>, HeliosError> {
        if data.is_empty() {
            return Err(HeliosError::InvalidInput("Data cannot be empty".to_string()));
        }

        let first = data[0];
        let last = data[data.len() - 1];

        let result = compute(first, last)
            .map_err(|e| HeliosError::Computation(e.to_string()))?;

        Ok(result.to_vec())
    }

Changes
- Return Result instead of plain type
- Early validation (empty check)
- Index access (safe after validation)
- Error propagation with ?
- Descriptive error messages
Tips & Tricks
Tip 1: Use Clippy Lints
    # Cargo.toml
    [lints.clippy]
    unwrap_used = "deny"
    expect_used = "warn"
    panic = "deny"

Tip 2: Pre-commit Hook

    #!/bin/bash
    # Deny unwrap() in production code
    if git diff --cached --name-only | grep -E "src/.*\.rs$" | xargs grep -l "\.unwrap()" ; then
        echo "ERROR: unwrap() found in production code!"
        echo "Please use proper error handling."
        exit 1
    fi

Tip 3: IDE Configuration
Configure your IDE to highlight unwrap() calls:
- VS Code: Rust Analyzer → Diagnostics → Clippy
- IntelliJ IDEA: Rust Plugin → Inspections → Enable clippy
Tip 4: Error Message Template
Use this template for error messages:
"<What failed>: <Why it failed> [<Context>]"
Examples:
"Failed to create SSTable: empty entries"
"Failed to read file /data/sstable.db: Permission denied"
"Invalid port '99999': number too large"

📚 Summary
Key Takeaways
- Never unwrap() in production - Use Result and the ? operator
- Validate early - Check inputs before processing
- Descriptive errors - Help debugging with good messages
- Helper functions - Centralize common error handling patterns
- Test error paths - Ensure errors are handled correctly
- Code review - Catch unwrap() before merge
Production-Ready Error Handling
Before: Code with unwrap() = 🔴 Production risk
After: Code with Result + ? = Production ready
Questions?
Contact: security-team@heliosdb.com
Document Version: 1.0 Last Updated: November 9, 2025 Status: Active Guidelines Next Review: December 9, 2025