EXPLAIN Enhancements - Quick Reference Guide
EXPLAIN Enhancements - Quick Reference Guide
Overview
Enhanced EXPLAIN functionality provides comprehensive query optimization insights including configuration tracking, feature detection, and optimizer decision reasoning.
Quick Start
Basic Usage
use heliosdb_compute::optimizer::EnhancedOptimizer;
let mut optimizer = EnhancedOptimizer::new();let plan = /* your PhysicalPlan */;
// Get optimized plan with explain outputlet (optimized, explain) = optimizer.optimize_with_explain(plan)?;
// Display human-readable formatprintln!("{}", explain.to_text());
// Or export as JSONlet json = explain.to_json()?;Configuration Override
use heliosdb_compute::optimizer::{ConfigSnapshot, ConfigValue};use std::collections::HashMap;
let mut overrides = HashMap::new();overrides.insert("work_mem_mb".to_string(), ConfigValue::Usize(512));overrides.insert("enable_simd".to_string(), ConfigValue::Bool(true));overrides.insert("max_parallel_workers".to_string(), ConfigValue::Usize(16));
let config = ConfigSnapshot::with_overrides(overrides);Features Detected
Implicit Features (Automatically Detected)
-
Partition Pruning
- Trigger: Range predicates on partitioned tables
- Benefit: Skips unnecessary partitions, reduces I/O
- Example:
WHERE date BETWEEN '2024-01-01' AND '2024-01-31'
-
Predicate Pushdown
- Trigger: WHERE clauses in query
- Benefit: Filters applied early, reduces data transfer
- Example:
WHERE age >= 18 AND country = 'US'
-
Projection Pushdown
- Trigger: SELECT specific columns
- Benefit: Reads only needed columns, reduces I/O
- Example:
SELECT id, name FROM users
-
SIMD Vectorization
- Trigger: Numeric aggregations (SUM, AVG, COUNT)
- Benefit: 4x speedup using AVX2/AVX512
- Example:
SELECT SUM(amount), AVG(price) FROM orders
-
Index Usage
- Trigger: High selectivity predicates with available index
- Benefit: 10-100x faster than sequential scan
- Example:
WHERE email = 'user@example.com'
-
Parallel Execution
- Trigger: Large table scans or aggregations
- Benefit: Near-linear speedup with multiple workers
- Example: Scanning table with 8+ shards
-
Join Optimization
- Trigger: Multiple joins in query
- Benefit: Optimal join order reduces intermediate results
- Example:
SELECT * FROM a JOIN b JOIN c
-
Hash Join Strategy
- Trigger: Equi-joins with build side fitting in memory
- Benefit: O(n+m) complexity vs O(n*m) for nested loop
- Example:
JOIN ON a.id = b.user_id
Explicit Features (User-Requested)
Features activated via query hints or configuration settings.
Configuration Parameters
Memory Settings
work_mem_mb: Work memory per operation (default: 256)shared_buffers_mb: Shared buffer cache (default: 1024)effective_cache_size_mb: OS cache estimate (default: 4096)
Join Strategies
enable_hashjoin: Enable hash joins (default: true)enable_mergejoin: Enable merge joins (default: true)enable_nestloop: Enable nested loop joins (default: true)
Scan Strategies
enable_indexscan: Enable index scans (default: true)enable_seqscan: Enable sequential scans (default: true)enable_bitmapscan: Enable bitmap scans (default: true)
Parallelism
max_parallel_workers: System-wide max workers (default: 8)max_parallel_workers_per_gather: Per-query max (default: 4)parallel_tuple_cost: Cost per tuple for parallel (default: 0.1)parallel_setup_cost: Parallel setup overhead (default: 1000.0)
Cost Model
seq_page_cost: Sequential page read cost (default: 1.0)random_page_cost: Random page read cost (default: 4.0)cpu_tuple_cost: Per-tuple CPU cost (default: 0.01)cpu_index_tuple_cost: Per-index-tuple CPU cost (default: 0.005)cpu_operator_cost: Per-operator CPU cost (default: 0.0025)
Advanced Features
enable_partition_pruning: Partition pruning (default: true)enable_simd: SIMD vectorization (default: true)enable_jit: JIT compilation (default: false)
Optimizer Decisions
Decision Types
-
Join Strategy Selection
- Options: Hash Join, Merge Join, Nested Loop Join
- Factors: Build size, sort order, index availability
-
Scan Strategy Selection
- Options: Index Scan, Sequential Scan, Bitmap Scan
- Factors: Selectivity, index availability, table size
-
Join Order Optimization
- Options: All permutations of join order
- Factors: Cardinality estimates, join selectivity
-
Aggregation Strategy
- Options: Hash Aggregation, Sort Aggregation
- Factors: Number of groups, memory availability
-
Parallelism Activation
- Options: Serial vs Parallel execution
- Factors: Table size, operation cost, worker availability
-
Pushdown Optimizations
- Options: Apply at storage vs compute layer
- Factors: Predicate complexity, data transfer cost
Output Formats
Text Format
EXPLAIN PLAN (Enhanced)Estimated Cost: 1523.45Estimated Rows: 12,500Planning Time: 2.34ms
Configuration at Plan Time: work_mem: 256MB, shared_buffers: 1024MB max_parallel_workers: 8, enable_simd: true
Active Features (3): ✓ SIMD Vectorization (IMPLICIT) Trigger: Numeric aggregation operations Benefit: 4x speedup Savings: 75.0% (1000.00 -> 250.00) ...
Optimizer Decisions (2): 1. Join Strategy Selection CHOSEN: Hash Join (cost: 1000.00, rows: 50000) Reasoning: Build side fits in memory REJECTED: Merge Join (cost: 3000.00, 3.0x more expensive) ...
Execution Plan: -> HashAggregate(...) -> HashJoin(...) -> TableScan(...)JSON Format
{ "plan": { ... }, "estimated_cost": 1523.45, "estimated_rows": 12500, "config_params": { "work_mem_mb": 256, "enable_simd": true, ... }, "active_features": { "implicit": [ { "name": "SIMD Vectorization", "category": "Vectorization", "trigger": "Numeric aggregation operations", "benefit": "4x speedup", "savings": { "cost_without": 1000.0, "cost_with": 250.0, "percent_reduction": 75.0 }, "confidence": 0.9 } ], "explicit": [] }, "decisions": [ ... ], "metadata": { "planning_time_ms": 2.34, "optimizer_version": "7.0.0-enhanced", "warnings": [] }}Common Patterns
Detecting Performance Issues
let explain = optimizer.explain_plan(&plan);
// Check for warningsfor warning in explain.metadata.warnings { eprintln!("WARNING: {}", warning);}
// Check if parallelism is available but not usedlet has_parallel = explain.active_features.implicit .iter() .any(|f| f.name == "Parallel Execution");
if !has_parallel && explain.estimated_rows > 1_000_000 { println!("Consider enabling parallelism for large table");}
// Check SIMD usagelet has_simd = explain.active_features.implicit .iter() .any(|f| f.name == "SIMD Vectorization");
if !has_simd && !explain.config_params.enable_simd { println!("SIMD is disabled, enable for better performance");}Comparing Plans
// Plan A with current configlet config_a = ConfigSnapshot::capture();let features_a = FeatureDetector::detect_features(&plan, &config_a);
// Plan B with SIMD disabledlet mut overrides = HashMap::new();overrides.insert("enable_simd".to_string(), ConfigValue::Bool(false));let config_b = ConfigSnapshot::with_overrides(overrides);let features_b = FeatureDetector::detect_features(&plan, &config_b);
println!("Features with SIMD: {}", features_a.total_count());println!("Features without SIMD: {}", features_b.total_count());Extracting Cost Savings
let total_savings: f64 = explain.active_features.implicit .iter() .filter_map(|f| f.savings.as_ref()) .map(|s| s.percent_reduction) .sum();
println!("Total optimizations saved {:.1}% cost", total_savings);Troubleshooting
”No features detected”
Cause: Simple query or missing optimization opportunities Solution: Check if query has:
- WHERE clauses (for predicate pushdown)
- Aggregations (for SIMD)
- Joins (for join optimization)
- Large tables (for parallelism)
“Low confidence scores”
Cause: Placeholder cost estimates Solution: Integrate with actual statistics:
// Update table statisticsoptimizer.update_table_stats(table_id, row_count, size_bytes);“Missing optimizer decisions”
Cause: Decisions not recorded during optimization
Solution: Use optimize_with_explain() instead of optimize():
// Records decisions ✓let (plan, explain) = optimizer.optimize_with_explain(plan)?;
// Doesn't record decisions ✗let plan = optimizer.optimize(plan)?;Best Practices
-
Always Use explain_with_optimize() for Analysis
- Captures full context
- Records all decisions
- Generates warnings
-
Check Warnings First
- Identify quick wins
- Find missing indexes
- Detect configuration issues
-
Compare Before/After
- Test configuration changes
- Validate optimizations
- Track performance improvements
-
Monitor Confidence Scores
- High confidence (>0.8): Reliable estimates
- Medium confidence (0.5-0.8): Approximate estimates
- Low confidence (<0.5): Placeholder estimates
-
Export JSON for Tools
- Integration with dashboards
- Automated analysis
- Historical tracking
Performance Impact
- Planning Time: +2-5ms for explain generation
- Memory: +5-10KB per explain output
- Runtime: Zero impact (explain is optional)
Related Features
- Query Hints: Force specific strategies (future)
- Statistics Manager: Provide accurate cardinality estimates
- Cost Model: Refine cost estimates based on hardware
- JIT Compilation: Additional vectorization (future)
Examples
Example 1: Detect Missing Index
let explain = optimizer.explain_plan(&plan);
// Check if sequential scan is usedlet has_seq_scan = explain.metadata.warnings .iter() .any(|w| w.contains("Sequential scan"));
if has_seq_scan { println!("Consider adding an index to improve performance");}Example 2: Validate SIMD Usage
let plan = build_aggregation_plan();let explain = optimizer.explain_plan(&plan);
let simd_feature = explain.active_features.implicit .iter() .find(|f| f.name == "SIMD Vectorization");
if let Some(simd) = simd_feature { if let Some(savings) = &simd.savings { println!("SIMD provides {:.1}% speedup", savings.percent_reduction); }}Example 3: Join Order Analysis
let join_decisions = explain.decisions .iter() .filter(|d| d.decision_point.contains("Join"));
for decision in join_decisions { println!("Chosen: {} (cost: {:.2})", decision.chosen.name, decision.chosen.cost);
for rejected in &decision.rejected { println!(" Rejected: {} ({:.1}x more expensive)", rejected.name, rejected.relative_cost); }}Version
- Version: 7.0.0-enhanced
- Release Date: November 14, 2025
- Agent: Agent 8 - Week 2 Days 1-4