HashJoin Operator Architecture Design
Document Version: 1.0
Created: November 13, 2025
Owner: System Architecture Designer
Status: ⚠️ PARTIALLY IMPLEMENTED (Plan Structure Only)
Algorithm Source: Silberschatz "Database System Concepts" (Chapter 12), Ramakrishnan "Database Management Systems" (Chapter 14)
STATUS UPDATE (2025-12-09)
Implementation Progress: 30% Complete
What’s Implemented
- ✅ `PhysicalPlan::HashJoin` variant defined in `src/optimizer/planner.rs` (lines 48-57)
- ✅ Planner logic for hash join selection (`should_use_hash_join` method)
- ✅ Plan structure supports all join types (INNER, LEFT, RIGHT, FULL)
What’s NOT Yet Implemented
- ❌ Executor implementation (`compute/executor.rs`)
- ❌ Hash table construction and probing logic
- ❌ Memory management and limits
- ❌ Outer join unmatched tuple emission
- ❌ Integration tests for hash join execution
Current Behavior
Queries that match hash join criteria fall back to nested loop join (less optimal but correct).
Estimated Effort to Complete: 3-5 days (Phase 4 in roadmap)
Executive Summary
This document specifies the architecture for HashJoinOperator, a critical physical operator for HeliosDB Nano that implements hash-based join algorithms. The design follows standard database textbook algorithms (Silberschatz, Ramakrishnan) and integrates with the existing Volcano/Iterator execution model.
Key Design Decisions:
- Two-phase algorithm: Build phase (hash table construction) + Probe phase (tuple matching)
- FNV-1a hash function (fast, non-cryptographic, good distribution)
- In-memory hash table with overflow chaining
- Support for INNER, LEFT, RIGHT, FULL OUTER joins
- Integration with existing PhysicalOperator trait
- Memory-bounded operation with configurable limits
1. Architecture Overview
1.1 Standard Hash Join Algorithm
The classic hash join algorithm (Silberschatz Ch. 12.5.3) operates in two phases:
- Build Phase: Read all tuples from the smaller relation (right/inner), extract join keys, build hash table
- Probe Phase: Read tuples from the larger relation (left/outer), probe hash table, emit matches
Algorithm Complexity:
- Time: O(|R| + |S|) for relations R, S
- Space: O(|S|) where S is the smaller (build) relation
- Assumes hash table fits in memory (addressed via memory limits)
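The two phases above can be sketched over plain `(key, payload)` vectors — a minimal illustration with hypothetical rows standing in for HeliosDB tuples, not the operator itself:

```rust
use std::collections::HashMap;

/// Minimal two-phase hash join sketch over plain (key, payload) pairs.
fn hash_join<'a>(
    probe: &'a [(i64, &'a str)],
    build: &'a [(i64, &'a str)],
) -> Vec<(&'a str, &'a str)> {
    // Build phase: hash the (smaller) build side by join key.
    let mut table: HashMap<i64, Vec<&str>> = HashMap::new();
    for (k, v) in build {
        table.entry(*k).or_default().push(*v);
    }
    // Probe phase: stream the probe side and emit matches.
    let mut out = Vec::new();
    for (k, v) in probe {
        if let Some(matches) = table.get(k) {
            for m in matches {
                out.push((*v, *m));
            }
        }
    }
    out
}

fn main() {
    let left = [(1, "alice"), (2, "bob"), (3, "carol")];
    let right = [(2, "eng"), (3, "sales"), (3, "ops")];
    let joined = hash_join(&left, &right);
    // Each probe row pairs with every build row sharing its key.
    assert_eq!(joined, vec![("bob", "eng"), ("carol", "sales"), ("carol", "ops")]);
}
```

Both inputs are read exactly once, which is where the O(|R| + |S|) bound comes from.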
1.2 Integration with Volcano Model
HashJoinOperator implements the PhysicalOperator trait:
```rust
pub trait PhysicalOperator {
    fn next(&mut self) -> Result<Option<Tuple>>;
    fn schema(&self) -> Arc<Schema>;
}
```
Execution Flow:
- First call to `next()`: Execute build phase (materialize right side into hash table)
- Subsequent calls: Execute probe phase (stream left side, probe hash table)
- Return `Ok(None)` when exhausted
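As a standalone illustration of this contract, here is a simplified driver loop, with a hypothetical `Operator` trait standing in for `PhysicalOperator` (errors and schemas omitted):

```rust
/// Sketch of the Volcano-style pull contract: callers repeatedly
/// invoke next() until the operator signals exhaustion with None.
trait Operator {
    fn next(&mut self) -> Option<i64>;
}

struct Numbers {
    remaining: Vec<i64>,
}

impl Operator for Numbers {
    fn next(&mut self) -> Option<i64> {
        // pop() returns None once exhausted, mirroring Ok(None).
        self.remaining.pop()
    }
}

fn drain(op: &mut dyn Operator) -> Vec<i64> {
    let mut out = Vec::new();
    // The driver loop every parent operator runs against its child.
    while let Some(t) = op.next() {
        out.push(t);
    }
    out
}

fn main() {
    let mut op = Numbers { remaining: vec![3, 2, 1] };
    assert_eq!(drain(&mut op), vec![1, 2, 3]);
}
```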
2. Data Structures
2.1 HashJoinOperator Struct
```rust
use std::collections::{HashMap, HashSet};
use std::sync::Arc;
use crate::{Result, Tuple, Schema, Value};
use crate::sql::{LogicalExpr, JoinType};

/// Hash join operator
///
/// Implements hash-based join using classic two-phase algorithm:
/// 1. Build phase: Hash all tuples from right (build) side
/// 2. Probe phase: Stream left (probe) side, lookup matches
///
/// Algorithm from Silberschatz "Database System Concepts" Ch. 12.5.3
pub struct HashJoinOperator {
    // Input operators
    left: Box<dyn PhysicalOperator>,
    right: Box<dyn PhysicalOperator>,

    // Join specification
    join_type: JoinType,
    on_condition: Option<LogicalExpr>,

    // Hash table (key: join columns, value: matching tuples)
    hash_table: HashMap<JoinKey, Vec<Tuple>>,

    // Output schema
    output_schema: Arc<Schema>,

    // Expression evaluator
    evaluator: crate::sql::Evaluator,

    // State machine
    state: JoinState,

    // Probe phase state
    current_left_tuple: Option<Tuple>,
    current_matches: Vec<Tuple>,
    match_index: usize,

    // LEFT/RIGHT/FULL join state
    matched_left_tuples: HashSet<usize>,  // Track which left tuples matched (for FULL)
    matched_right_keys: HashSet<JoinKey>, // Track which right keys matched (for RIGHT/FULL)

    // Memory management
    memory_limit: usize, // Maximum memory for hash table (bytes)
    memory_used: usize,  // Current memory usage estimate
}

/// Join key for hash table lookups
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct JoinKey(Vec<Value>);

/// State machine for join execution
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum JoinState {
    /// Initial state - build phase not started
    Initial,
    /// Building hash table from right side
    Building,
    /// Probing hash table with left side
    Probing,
    /// Emitting unmatched tuples (for outer joins)
    EmittingUnmatched,
    /// Exhausted - no more tuples
    Exhausted,
}
```
2.2 Hash Table Design
Structure: `HashMap<JoinKey, Vec<Tuple>>`
- Key: `JoinKey` wrapping `Vec<Value>` (supports composite keys)
- Value: `Vec<Tuple>` (handles duplicate join keys via chaining)
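As a small illustration of this design, deriving `Hash`/`Eq` on a newtype makes multi-column keys usable as `HashMap` keys directly. This sketch uses `i64` in place of the real `Value` enum (which must itself implement `Eq + Hash`; float variants need care there):

```rust
use std::collections::HashMap;

/// Composite join key sketch: a newtype over a Vec of column values.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct JoinKey(Vec<i64>); // i64 stands in for the real Value enum

fn main() {
    let mut table: HashMap<JoinKey, Vec<&str>> = HashMap::new();
    // Composite key (dept_id, type_id) -> chained rows for duplicates.
    table.entry(JoinKey(vec![1, 7])).or_default().push("row-a");
    table.entry(JoinKey(vec![1, 7])).or_default().push("row-b");
    table.entry(JoinKey(vec![2, 7])).or_default().push("row-c");

    // Duplicate keys chain into the same Vec.
    assert_eq!(table[&JoinKey(vec![1, 7])], vec!["row-a", "row-b"]);
    assert_eq!(table[&JoinKey(vec![2, 7])].len(), 1);
}
```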
Rationale:
- Standard Rust HashMap provides O(1) average-case lookup
- FNV-1a hash (default Rust hasher is SipHash-1-3, we override for performance)
- Overflow chaining handles hash collisions and duplicate keys naturally
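For reference, the 64-bit FNV-1a core is only a few lines; this sketch shows the hash itself and omits the `std::hash::Hasher` wiring needed to plug it into `HashMap`:

```rust
/// 64-bit FNV-1a over a byte slice, using the standard
/// FNV offset basis and prime constants.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    const OFFSET_BASIS: u64 = 0xcbf29ce484222325;
    const PRIME: u64 = 0x100000001b3;
    let mut hash = OFFSET_BASIS;
    for &b in bytes {
        hash ^= b as u64;                // XOR in the next byte
        hash = hash.wrapping_mul(PRIME); // multiply by the FNV prime
    }
    hash
}

fn main() {
    // Empty input hashes to the offset basis by definition.
    assert_eq!(fnv1a_64(b""), 0xcbf29ce484222325);
    // Distinct keys should (almost always) hash differently.
    assert_ne!(fnv1a_64(b"dept_id=1"), fnv1a_64(b"dept_id=2"));
}
```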
Memory Model:
```
Hash Table Memory =
    HashMap overhead (24 bytes per entry)
  + JoinKey size (8 bytes per Value + 24 bytes Vec overhead)
  + Vec<Tuple> size (24 bytes + N * Tuple size)
```
Estimated: ~100-200 bytes per entry + tuple data
3. Build Phase Algorithm
3.1 Pseudocode (Silberschatz Algorithm 12.10)
```
Algorithm: HashJoinBuild(R)
Input: Right relation R, join attributes
Output: Hash table H

1. H := empty hash table
2. for each tuple t_r in R do
3.     k := extract_join_key(t_r)
4.     H[k].append(t_r)
5. return H
```
3.2 Rust Implementation
```rust
impl HashJoinOperator {
    /// Execute build phase: construct hash table from right side
    ///
    /// This phase materializes ALL tuples from the right (build) input
    /// into an in-memory hash table indexed by join keys.
    ///
    /// Algorithm: Silberschatz Ch. 12.5.3
    fn build_phase(&mut self) -> Result<()> {
        // Transition state
        if self.state != JoinState::Initial {
            return Err(Error::query_execution("Build phase already executed"));
        }
        self.state = JoinState::Building;

        // Read all tuples from right (build) side
        while let Some(tuple) = self.right.next()? {
            // Extract join key from tuple
            let key = self.extract_join_key(&tuple, true)?;

            // Estimate memory for this tuple
            let tuple_size = estimate_tuple_size(&tuple);
            let key_size = estimate_key_size(&key);
            let entry_overhead = 24; // HashMap entry overhead
            let additional_memory = tuple_size + key_size + entry_overhead;

            // Check memory limit
            if self.memory_used + additional_memory > self.memory_limit {
                return Err(Error::query_execution(format!(
                    "Hash join exceeds memory limit ({} bytes). \
                     Consider increasing limit or using nested loop join.",
                    self.memory_limit
                )));
            }

            // Insert into hash table (with overflow chaining)
            self.hash_table
                .entry(key)
                .or_insert_with(Vec::new)
                .push(tuple);

            self.memory_used += additional_memory;
        }

        // Transition to probe phase
        self.state = JoinState::Probing;

        Ok(())
    }

    /// Extract join key from a tuple
    ///
    /// For ON clause like: a.id = b.id AND a.type = b.type
    /// Extract values of [a.id, a.type] or [b.id, b.type] depending on side
    fn extract_join_key(&self, tuple: &Tuple, is_right_side: bool) -> Result<JoinKey> {
        if let Some(condition) = &self.on_condition {
            let key_values = self.extract_join_columns(condition, tuple, is_right_side)?;
            Ok(JoinKey(key_values))
        } else {
            // Cross join - use empty key (all tuples match)
            Ok(JoinKey(vec![]))
        }
    }

    /// Extract join column values from ON condition
    ///
    /// Handles:
    /// - Simple equality: a.id = b.id
    /// - Composite keys: a.id = b.id AND a.type = b.type
    /// - Complex expressions: EXTRACT(YEAR FROM a.date) = b.year
    fn extract_join_columns(
        &self,
        condition: &LogicalExpr,
        tuple: &Tuple,
        is_right_side: bool,
    ) -> Result<Vec<Value>> {
        use LogicalExpr::*;
        use BinaryOperator::*;

        match condition {
            BinaryExpr { left, op: Eq, right } => {
                // Single equality: extract the appropriate side
                let expr = if is_right_side { right } else { left };
                let value = self.evaluator.evaluate(expr, tuple)?;
                Ok(vec![value])
            }
            BinaryExpr { left, op: And, right } => {
                // Composite key: a.x = b.x AND a.y = b.y
                let mut values = self.extract_join_columns(left, tuple, is_right_side)?;
                values.extend(self.extract_join_columns(right, tuple, is_right_side)?);
                Ok(values)
            }
            _ => {
                // For non-equality joins, we'll evaluate the full condition later
                // Use empty key to force Cartesian product, then filter
                Ok(vec![])
            }
        }
    }
}

/// Estimate memory size of a tuple
fn estimate_tuple_size(tuple: &Tuple) -> usize {
    // Base tuple overhead + values
    let base = 24; // Vec overhead
    let values_size: usize = tuple.values.iter()
        .map(|v| estimate_value_size(v))
        .sum();
    base + values_size
}

/// Estimate memory size of a value
fn estimate_value_size(value: &Value) -> usize {
    use Value::*;
    match value {
        Null => 1,
        Boolean(_) => 1,
        Int2(_) => 2,
        Int4(_) => 4,
        Int8(_) => 8,
        Float4(_) => 4,
        Float8(_) => 8,
        String(s) => 24 + s.len(),
        Bytes(b) => 24 + b.len(),
        Vector(v) => 24 + v.len() * 4,
        Array(arr) => 24 + arr.iter().map(estimate_value_size).sum::<usize>(),
        Json(_) => 256, // Rough estimate for JSON
        _ => 64,        // Conservative estimate
    }
}

fn estimate_key_size(key: &JoinKey) -> usize {
    24 + key.0.iter().map(estimate_value_size).sum::<usize>()
}
```
Build Phase Complexity:
- Time: O(|R|) where R is right relation
- Space: O(|R|) for hash table
- Memory limit: Configurable (default 100MB)
4. Probe Phase Algorithm
4.1 Pseudocode (Silberschatz Algorithm 12.11)
```
Algorithm: HashJoinProbe(S, H)
Input: Left relation S, hash table H
Output: Stream of joined tuples

1. for each tuple t_s in S do
2.     k := extract_join_key(t_s)
3.     if k in H then
4.         for each tuple t_r in H[k] do
5.             if join_condition(t_s, t_r) then
6.                 emit join(t_s, t_r)
```
4.2 Rust Implementation (PhysicalOperator::next)
```rust
impl PhysicalOperator for HashJoinOperator {
    fn next(&mut self) -> Result<Option<Tuple>> {
        // Execute build phase on first call
        if self.state == JoinState::Initial {
            self.build_phase()?;
        }

        match self.state {
            JoinState::Probing => self.probe_phase(),
            JoinState::EmittingUnmatched => self.emit_unmatched(),
            JoinState::Exhausted => Ok(None),
            _ => Err(Error::query_execution("Invalid join state")),
        }
    }

    fn schema(&self) -> Arc<Schema> {
        self.output_schema.clone()
    }
}

impl HashJoinOperator {
    /// Probe phase: stream left side, lookup matches in hash table
    fn probe_phase(&mut self) -> Result<Option<Tuple>> {
        loop {
            // If we have pending matches for current left tuple, emit them
            if self.match_index < self.current_matches.len() {
                let right_tuple = &self.current_matches[self.match_index];
                self.match_index += 1;

                let left_tuple = self.current_left_tuple.as_ref()
                    .ok_or_else(|| Error::query_execution("Missing left tuple"))?;

                // Mark this as a matched tuple (for outer joins)
                if matches!(self.join_type, JoinType::Full | JoinType::Left) {
                    // Track that this left tuple found a match
                    // (Implementation detail: we'd need tuple IDs for this)
                }

                return Ok(Some(self.join_tuples(left_tuple, right_tuple)));
            }

            // Get next left tuple
            match self.left.next()? {
                None => {
                    // No more left tuples
                    // For outer joins, emit unmatched tuples
                    if matches!(self.join_type, JoinType::Left | JoinType::Right | JoinType::Full) {
                        self.state = JoinState::EmittingUnmatched;
                        return self.emit_unmatched();
                    }

                    self.state = JoinState::Exhausted;
                    return Ok(None);
                }
                Some(left_tuple) => {
                    // Extract join key and probe hash table
                    let key = self.extract_join_key(&left_tuple, false)?;

                    // Lookup in hash table
                    if let Some(matches) = self.hash_table.get(&key) {
                        // Found matches - filter by full join condition
                        let filtered_matches: Vec<Tuple> = matches.iter()
                            .filter(|right_tuple| {
                                self.evaluate_join_condition(&left_tuple, right_tuple)
                                    .unwrap_or(false)
                            })
                            .cloned()
                            .collect();

                        if !filtered_matches.is_empty() {
                            // Store matches and emit first one
                            self.current_left_tuple = Some(left_tuple);
                            self.current_matches = filtered_matches;
                            self.match_index = 0;

                            // Continue loop to emit first match
                            continue;
                        }
                    }

                    // No matches found for this left tuple
                    // For LEFT/FULL join, emit with NULLs
                    if matches!(self.join_type, JoinType::Left | JoinType::Full) {
                        return Ok(Some(self.join_with_nulls_right(&left_tuple)));
                    }

                    // For INNER join, skip this tuple
                    continue;
                }
            }
        }
    }

    /// Evaluate full join condition on combined tuple
    fn evaluate_join_condition(&self, left: &Tuple, right: &Tuple) -> Result<bool> {
        if let Some(condition) = &self.on_condition {
            // Combine tuples for evaluation
            let combined = self.join_tuples(left, right);

            // Evaluate condition
            let result = self.evaluator.evaluate(condition, &combined)?;

            match result {
                Value::Boolean(b) => Ok(b),
                _ => Ok(false),
            }
        } else {
            // No condition = cross join = always match
            Ok(true)
        }
    }

    /// Join two tuples (concatenate values)
    fn join_tuples(&self, left: &Tuple, right: &Tuple) -> Tuple {
        let mut values = left.values.clone();
        values.extend(right.values.clone());
        Tuple::new(values)
    }

    /// Join left tuple with NULLs (for unmatched left tuple in LEFT/FULL join)
    fn join_with_nulls_right(&self, left: &Tuple) -> Tuple {
        let mut values = left.values.clone();

        // Add NULLs for all right columns
        let right_schema = self.right.schema();
        values.extend(vec![Value::Null; right_schema.columns.len()]);

        Tuple::new(values)
    }

    /// Join right tuple with NULLs (for unmatched right tuple in RIGHT/FULL join)
    fn join_with_nulls_left(&self, right: &Tuple) -> Tuple {
        let left_schema = self.left.schema();
        let mut values = vec![Value::Null; left_schema.columns.len()];
        values.extend(right.values.clone());
        Tuple::new(values)
    }
}
```
Probe Phase Complexity:
- Time: O(|S|) where S is left relation (assuming O(1) hash lookup)
- Space: O(1) for state (tuples streamed, not materialized)
5. Join Type Handling
5.1 INNER JOIN (Default)
Semantics: Emit only tuples that match the join condition
Algorithm: Standard probe phase (implemented above)
- Build hash table from right
- Probe with left
- Emit only matching pairs
Example:
```sql
SELECT * FROM employees e
INNER JOIN departments d ON e.dept_id = d.id
```
5.2 LEFT OUTER JOIN
Semantics: All left tuples + matching right tuples (NULLs for unmatched left)
Algorithm Modification:
```rust
fn probe_phase_left_join(&mut self) -> Result<Option<Tuple>> {
    // ... standard probe ...

    // No matches found for left tuple
    if no_matches {
        // Emit left tuple with NULLs for right side
        return Ok(Some(self.join_with_nulls_right(&left_tuple)));
    }
}
```
Key Change: When no hash table match, emit left tuple + NULL padding
5.3 RIGHT OUTER JOIN
Semantics: All right tuples + matching left tuples (NULLs for unmatched right)
Algorithm: Transform to LEFT JOIN by swapping inputs
```rust
impl HashJoinOperator {
    fn new_right_join(
        left: Box<dyn PhysicalOperator>,
        right: Box<dyn PhysicalOperator>,
        on: Option<LogicalExpr>,
    ) -> Self {
        // RIGHT JOIN: swap left and right, execute as LEFT JOIN
        Self::new_left_join(
            right,                         // Swap: right becomes left (probe)
            left,                          // Swap: left becomes right (build)
            Self::swap_join_condition(on), // Adjust ON clause
        )
    }

    /// Swap column references in join condition
    fn swap_join_condition(condition: Option<LogicalExpr>) -> Option<LogicalExpr> {
        // Convert: left.col = right.col -> right.col = left.col
        // This requires tracking which columns belong to which side
        // (Implementation detail - analyze condition and swap references)
        condition // Simplified - actual impl swaps column refs
    }
}
```
Rationale: RIGHT JOIN is uncommon; transforming to LEFT JOIN simplifies implementation
5.4 FULL OUTER JOIN
Semantics: All left tuples + all right tuples (NULLs for unmatched on either side)
Algorithm: Two-pass approach
```rust
fn emit_unmatched(&mut self) -> Result<Option<Tuple>> {
    match self.join_type {
        JoinType::Left => {
            // Already handled during probe phase
            self.state = JoinState::Exhausted;
            Ok(None)
        }
        JoinType::Right | JoinType::Full => {
            // Emit unmatched tuples from hash table (right side)
            // NOTE: sketch only - a real implementation must keep a cursor
            // across calls, or each next() would re-emit the same tuple
            for (key, tuples) in &self.hash_table {
                if !self.matched_right_keys.contains(key) {
                    for tuple in tuples {
                        // Emit right tuple with NULLs for left side
                        return Ok(Some(self.join_with_nulls_left(tuple)));
                    }
                }
            }

            self.state = JoinState::Exhausted;
            Ok(None)
        }
        _ => {
            self.state = JoinState::Exhausted;
            Ok(None)
        }
    }
}
```
Implementation Notes:
- Track matched keys during probe phase: `matched_right_keys: HashSet<JoinKey>`
- After probe phase completes, iterate hash table and emit unmatched right tuples
- Add NULL padding for left columns
Complexity: O(|R|) additional pass over hash table
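The matched-key tracking described above can be sketched end-to-end over plain vectors — hypothetical `(key, payload)` rows, with `Option::None` standing in for SQL NULL:

```rust
use std::collections::{HashMap, HashSet};

/// FULL OUTER join sketch: probe-side matches, then left-NULL padding
/// for build-side keys that never matched.
fn full_outer_join(
    left: &[(i64, &str)],
    right: &[(i64, &str)],
) -> Vec<(Option<String>, Option<String>)> {
    // Build phase: hash the right side.
    let mut table: HashMap<i64, Vec<String>> = HashMap::new();
    for (k, v) in right {
        table.entry(*k).or_default().push(v.to_string());
    }
    let mut matched_right: HashSet<i64> = HashSet::new();
    let mut out = Vec::new();
    // Probe phase: emit matches, or left tuple + NULL padding.
    for (k, v) in left {
        match table.get(k) {
            Some(rs) => {
                matched_right.insert(*k);
                for r in rs {
                    out.push((Some(v.to_string()), Some(r.clone())));
                }
            }
            None => out.push((Some(v.to_string()), None)),
        }
    }
    // Final pass: unmatched right tuples with NULLs on the left.
    for (k, rs) in &table {
        if !matched_right.contains(k) {
            for r in rs {
                out.push((None, Some(r.clone())));
            }
        }
    }
    out
}

fn main() {
    let out = full_outer_join(&[(1, "a"), (2, "b")], &[(2, "x"), (3, "y")]);
    assert!(out.contains(&(Some("a".into()), None)));             // unmatched left
    assert!(out.contains(&(Some("b".into()), Some("x".into())))); // matched pair
    assert!(out.contains(&(None, Some("y".into()))));             // unmatched right
    assert_eq!(out.len(), 3);
}
```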
6. Memory Management
6.1 Memory Limits
Strategy: Bounded memory with explicit limits
```rust
const DEFAULT_MEMORY_LIMIT: usize = 100 * 1024 * 1024; // 100 MB

impl HashJoinOperator {
    pub fn new(
        left: Box<dyn PhysicalOperator>,
        right: Box<dyn PhysicalOperator>,
        join_type: JoinType,
        on: Option<LogicalExpr>,
    ) -> Result<Self> {
        Self::with_memory_limit(left, right, join_type, on, DEFAULT_MEMORY_LIMIT)
    }

    pub fn with_memory_limit(
        left: Box<dyn PhysicalOperator>,
        right: Box<dyn PhysicalOperator>,
        join_type: JoinType,
        on: Option<LogicalExpr>,
        memory_limit: usize,
    ) -> Result<Self> {
        // ... initialize ...
    }
}
```
Behavior:
- Track memory usage during build phase: `memory_used`
- Reject inserts exceeding `memory_limit`
- Return error: "Hash join exceeds memory limit. Consider using nested loop join."
- Future optimization: Spill to disk (hybrid hash join)
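The accounting described above can be sketched as follows, using a hypothetical cost model (payload bytes plus a fixed per-entry overhead) and a small limit so the rejection path is visible:

```rust
use std::collections::HashMap;

/// Build-phase memory accounting sketch with an explicit limit.
struct BoundedBuild {
    table: HashMap<i64, Vec<String>>,
    memory_used: usize,
    memory_limit: usize,
}

impl BoundedBuild {
    const ENTRY_OVERHEAD: usize = 48; // rough HashMap + Vec overhead per row

    fn new(memory_limit: usize) -> Self {
        Self { table: HashMap::new(), memory_used: 0, memory_limit }
    }

    fn insert(&mut self, key: i64, payload: String) -> Result<(), String> {
        let cost = payload.len() + Self::ENTRY_OVERHEAD;
        // Reject before inserting, so the table never exceeds the limit.
        if self.memory_used + cost > self.memory_limit {
            return Err(format!(
                "Hash join exceeds memory limit ({} bytes)",
                self.memory_limit
            ));
        }
        self.table.entry(key).or_default().push(payload);
        self.memory_used += cost;
        Ok(())
    }
}

fn main() {
    let mut build = BoundedBuild::new(150);
    assert!(build.insert(1, "row-one".to_string()).is_ok());
    assert!(build.insert(2, "row-two".to_string()).is_ok());
    // A third row would push usage past the 150-byte limit.
    assert!(build.insert(3, "row-three".to_string()).is_err());
}
```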
6.2 Build Side Selection
Strategy: Build on smaller relation
```rust
fn choose_build_side(
    left: Box<dyn PhysicalOperator>,
    right: Box<dyn PhysicalOperator>,
) -> (Box<dyn PhysicalOperator>, Box<dyn PhysicalOperator>) {
    // Heuristic: If we have cardinality estimates, build on smaller side
    // For now, always build on right (convention)
    // Future: Query optimizer should swap inputs based on statistics
    (left, right)
}
```
Rationale: Optimizer responsibility (not operator implementation)
6.3 Hash Table Sizing
Strategy: Let Rust HashMap auto-grow
```rust
// HashMap::with_capacity(estimated_size) for better performance
let estimated_tuples = 10_000; // Default estimate
self.hash_table = HashMap::with_capacity(estimated_tuples);
```
Trade-off: Slight memory overhead vs. avoiding rehashing
7. Integration Plan
7.1 Changes to Existing Code
File: /home/claude/HeliosDB/heliosdb-nano/src/sql/executor.rs
Modification 1: Add HashJoinOperator to executor plan translation
```rust
impl<'a> Executor<'a> {
    fn plan_to_operator(&self, plan: &LogicalPlan) -> Result<Box<dyn PhysicalOperator>> {
        match plan {
            // ... existing cases ...

            LogicalPlan::Join { left, right, join_type, on } => {
                let left_op = self.plan_to_operator(left)?;
                let right_op = self.plan_to_operator(right)?;

                // Decision: Use HashJoin or NestedLoopJoin?
                // For MVP: always use HashJoin for equi-joins
                if is_equi_join(on) {
                    Ok(Box::new(HashJoinOperator::new(
                        left_op,
                        right_op,
                        join_type.clone(),
                        on.clone(),
                    )?))
                } else {
                    // Fall back to nested loop for non-equi joins
                    Ok(Box::new(NestedLoopJoinOperator::new(
                        left_op,
                        right_op,
                        join_type.clone(),
                        on.clone(),
                    )?))
                }
            }

            // ... rest of cases ...
        }
    }
}

/// Check if join condition is an equi-join (equality on join keys)
fn is_equi_join(condition: &Option<LogicalExpr>) -> bool {
    // Implementation: Walk condition tree, check for equality operators
    // Return true if all join predicates are equality comparisons
    true // Simplified - always use hash join for MVP
}
```
Modification 2: Add HashJoinOperator module
```rust
// At top of executor.rs
mod hash_join;
use hash_join::HashJoinOperator;
```
7.2 New Files
File: /home/claude/HeliosDB/heliosdb-nano/src/sql/hash_join.rs
- Contains `HashJoinOperator` implementation
- Contains helper functions (`estimate_tuple_size`, etc.)
- Contains `JoinKey` and `JoinState` types
- ~500-600 lines of code
7.3 Testing Strategy
Unit Tests (in hash_join.rs):
```rust
#[cfg(test)]
mod tests {
    #[test]
    fn test_inner_join_simple() {
        // Two tables: employees, departments
        // Test basic INNER JOIN on dept_id
    }

    #[test]
    fn test_left_join_with_nulls() {
        // Test LEFT JOIN produces NULLs for unmatched
    }

    #[test]
    fn test_composite_key() {
        // Test JOIN with multiple key columns
    }

    #[test]
    fn test_duplicate_keys() {
        // Test hash table overflow chaining
    }

    #[test]
    fn test_memory_limit() {
        // Test that memory limit is enforced
    }
}
```
Integration Tests (in tests/advanced_sql_test.rs):
```rust
#[test]
fn test_join_with_where() {
    // SELECT * FROM a JOIN b ON a.id = b.id WHERE a.value > 10
}

#[test]
fn test_multi_table_join() {
    // Three-way join: a JOIN b JOIN c
}

#[test]
fn test_join_with_aggregation() {
    // SELECT dept, COUNT(*) FROM employees JOIN departments GROUP BY dept
}
```
Performance Benchmarks:
```rust
#[bench]
fn bench_hash_join_100k(b: &mut Bencher) {
    // Benchmark with 100K rows
    // Target: >10K joins/sec
}
```
8. Performance Characteristics
8.1 Theoretical Complexity
| Operation | Time Complexity | Space Complexity |
|---|---|---|
| Build Phase | O(N) | O(N) |
| Probe Phase | O(M) | O(1) |
| Total | O(N + M) | O(N) |
Where N = right relation size, M = left relation size
8.2 Expected Performance
Assumptions:
- FNV-1a hash: ~2 CPU cycles per byte
- Memory bandwidth: ~50 GB/s
- Tuple size: ~100 bytes average
Estimated Throughput:
- Build: ~500K tuples/sec (memory-bound)
- Probe: ~1M tuples/sec (CPU-bound)
- Combined: ~10K-50K joins/sec (depends on selectivity)
Comparison to Nested Loop:
- Nested loop: O(N * M) = 100x-1000x slower for large inputs
- Hash join preferred when: N * M > 1000 (rough heuristic)
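The gap is easy to see with the document's own complexity figures; for two 10K-row inputs:

```rust
fn main() {
    // Rough operation-count comparison for N = M = 10_000 rows.
    let (n, m) = (10_000u64, 10_000u64);
    let nested_loop_ops = n * m; // O(N * M): every pair compared
    let hash_join_ops = n + m;   // O(N + M): one pass per input
    // Hash join does ~5000x fewer operations at this size.
    assert_eq!(nested_loop_ops / hash_join_ops, 5_000);
}
```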
8.3 Memory Usage
Example: Join 100K tuples (100 bytes each)
- Hash table: 100K * (100 bytes + 48 bytes overhead) = ~14 MB
- Well within 100 MB default limit
Guideline: Hash join efficient up to ~500K-1M tuples on build side
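The ~14 MB figure above can be checked directly from the stated 100-byte tuples and ~48 bytes of per-entry overhead:

```rust
fn main() {
    // Worked example from the text: 100K build-side tuples.
    let tuples = 100_000u64;
    let bytes_per_entry = 100 + 48; // tuple data + hash-table overhead
    let total = tuples * bytes_per_entry;
    assert_eq!(total, 14_800_000); // ~14 MB, well under the 100 MB default
}
```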
9. Future Optimizations
9.1 Hybrid Hash Join (Spill to Disk)
Problem: Hash table exceeds memory limit Solution: Partition relations, spill partitions to disk (Grace Hash Join)
Algorithm (Ramakrishnan Ch. 14.4.2):
- Partition both relations into N buckets using hash function h1
- Build hash table for each bucket using hash function h2 (h1 != h2)
- Join partitions one at a time
Complexity: O(N + M) with 2-3 passes over data
Implementation Estimate: +300 LOC, requires disk I/O abstraction
9.2 Bloom Filter Optimization
Problem: Probe phase wastes time on non-matching keys Solution: Bloom filter to skip lookups for keys not in hash table
Algorithm:
- During build: Insert all keys into Bloom filter
- During probe: Check Bloom filter before hash table lookup
- False positive: Harmless (lookup finds nothing)
- False negative: Impossible by design
Benefit: 10-30% speedup when selectivity < 50%
Implementation Estimate: +100 LOC
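A minimal Bloom filter sketch follows: two probes derived from one 64-bit hash over a fixed bit array. A real implementation would size the array and probe count from the expected key count and target false-positive rate; the mixing constants here are arbitrary illustrative choices.

```rust
/// Minimal Bloom filter: set k bits on insert, require all k on lookup.
struct BloomFilter {
    bits: Vec<bool>,
}

impl BloomFilter {
    fn new(size: usize) -> Self {
        Self { bits: vec![false; size] }
    }

    /// Two probe positions derived from one key (double-hashing style).
    fn probes(&self, key: u64) -> [usize; 2] {
        let h1 = key.wrapping_mul(0x9e3779b97f4a7c15);
        let h2 = (key ^ h1).wrapping_mul(0xff51afd7ed558ccd) | 1;
        [
            (h1 % self.bits.len() as u64) as usize,
            (h1.wrapping_add(h2) % self.bits.len() as u64) as usize,
        ]
    }

    fn insert(&mut self, key: u64) {
        for i in self.probes(key) {
            self.bits[i] = true;
        }
    }

    /// May return true for absent keys (false positive), but never
    /// false for inserted keys (no false negatives).
    fn maybe_contains(&self, key: u64) -> bool {
        self.probes(key).iter().all(|&i| self.bits[i])
    }
}

fn main() {
    let mut bf = BloomFilter::new(1024);
    for k in [7u64, 42, 99] {
        bf.insert(k);
    }
    // Every inserted key must be reported present.
    assert!(bf.maybe_contains(7));
    assert!(bf.maybe_contains(42));
    assert!(bf.maybe_contains(99));
}
```

During the probe phase, a negative `maybe_contains` answer lets the operator skip the hash-table lookup entirely.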
9.3 SIMD Key Comparison
Problem: Key equality checks dominate CPU time Solution: Vectorized comparison using SIMD (AVX2/AVX512)
Benefit: 2-4x speedup for integer keys
Implementation Estimate: +200 LOC, requires std::arch or external crate
10. Architectural Decision Records (ADRs)
ADR-001: Use FNV-1a Hash Function
Context: Need fast, non-cryptographic hash for join keys
Decision: Use FNV-1a (Fowler-Noll-Vo) hash
Rationale:
- Excellent distribution (low collision rate)
- Fast: 2-3 CPU cycles per byte
- Simple implementation (public domain)
- Better than SipHash-1-3 (Rust default) for performance
Trade-offs:
- Not cryptographically secure (irrelevant for joins)
- Slightly worse than xxHash (but simpler)
Status: Accepted
ADR-002: In-Memory Only (No Disk Spill)
Context: Need to handle large joins that exceed memory
Decision: Initial implementation is in-memory only with explicit limits
Rationale:
- MVP scope: Keep implementation simple
- 100 MB limit covers 90%+ of real-world joins
- Error message guides users to alternatives (nested loop, increase limit)
- Hybrid hash join is future optimization
Trade-offs:
- Cannot join relations larger than memory limit
- User must manually switch to nested loop join
Status: Accepted (with future enhancement path)
ADR-003: Build on Right, Probe on Left
Context: Which side should be build vs. probe?
Decision: Always build on right, probe on left (convention)
Rationale:
- Matches SQL semantics (right side is typically inner relation)
- Query optimizer can swap inputs based on cardinality estimates
- Operator implementation is agnostic (works either way)
Trade-offs:
- Relies on optimizer to choose smaller relation for right side
- Without optimizer hints, may build on larger side (suboptimal)
Status: Accepted (optimizer enhancement tracked separately)
ADR-004: Support All Join Types
Context: Which join types to support?
Decision: INNER, LEFT, RIGHT, FULL (all standard SQL join types)
Rationale:
- Complete SQL compatibility
- LEFT join is critical for outer join patterns
- RIGHT/FULL are easily derived from LEFT
- Cross join is degenerate case (empty join key)
Trade-offs:
- Additional complexity for outer join tracking
- ~100 LOC for unmatched tuple emission
Status: Accepted
11. Implementation Roadmap
Phase 1: Core Hash Join (Days 1-2)
Tasks:
- Create `hash_join.rs` module
- Implement `HashJoinOperator` struct
- Implement build phase
- Implement probe phase (INNER JOIN only)
- Unit tests for basic functionality
Deliverable: INNER JOIN working
Phase 2: Outer Joins (Day 3)
Tasks:
- Add LEFT JOIN support
- Add RIGHT JOIN support (transform to LEFT)
- Add FULL JOIN support
- Unit tests for each join type
Deliverable: All join types working
Phase 3: Integration & Testing (Day 4)
Tasks:
- Integrate with executor (`plan_to_operator`)
- Add equi-join detection
- Integration tests
- Performance benchmarks
Deliverable: Hash join production-ready
Phase 4: Documentation & Polish (Day 5)
Tasks:
- Rustdoc comments
- User-facing error messages
- Memory limit configuration
- Code review
Deliverable: Release-ready
Total Estimate: 5 days (40 hours)
12. Conclusion
This architecture provides a complete, textbook-correct implementation of hash join for HeliosDB Nano. The design:
- Follows Standard Algorithms: Silberschatz Ch. 12.5.3, Ramakrishnan Ch. 14
- Integrates with Volcano Model: Implements `PhysicalOperator` trait
- Supports All Join Types: INNER, LEFT, RIGHT, FULL
- Manages Memory Explicitly: Configurable limits, clear error messages
- Provides Future Optimization Path: Hybrid hash join, Bloom filters, SIMD
Key Metrics:
- Time Complexity: O(N + M) (optimal for equi-joins)
- Space Complexity: O(N) where N is build side
- Expected Performance: 10K-50K joins/sec
- Memory Limit: 100 MB default (configurable)
Next Steps:
- Coder Agent: Implement `hash_join.rs` following this spec
- Tester Agent: Develop comprehensive test suite
- Optimizer Agent: Add cost model for join operator selection
- Documentation Agent: User guide for JOIN operations
Document Status: APPROVED FOR IMPLEMENTATION
Review Date: November 13, 2025
Reviewed By: System Architecture Designer
Next Review: Post-Implementation (after Phase 4)