EXPLAIN Enhancements - Quick Reference Guide

Overview

Enhanced EXPLAIN functionality provides comprehensive query optimization insights including configuration tracking, feature detection, and optimizer decision reasoning.

Quick Start

Basic Usage

use heliosdb_compute::optimizer::EnhancedOptimizer;

let mut optimizer = EnhancedOptimizer::new();
let plan = /* your PhysicalPlan */;

// Get optimized plan with explain output
let (optimized, explain) = optimizer.optimize_with_explain(plan)?;

// Display human-readable format
println!("{}", explain.to_text());

// Or export as JSON
let json = explain.to_json()?;

Configuration Override

use heliosdb_compute::optimizer::{ConfigSnapshot, ConfigValue};
use std::collections::HashMap;

let mut overrides = HashMap::new();
overrides.insert("work_mem_mb".to_string(), ConfigValue::Usize(512));
overrides.insert("enable_simd".to_string(), ConfigValue::Bool(true));
overrides.insert("max_parallel_workers".to_string(), ConfigValue::Usize(16));

let config = ConfigSnapshot::with_overrides(overrides);

Features Detected

Implicit Features (Automatically Detected)

Partition Pruning
- Trigger: Range predicates on partitioned tables
- Benefit: Skips unnecessary partitions, reduces I/O
- Example: WHERE date BETWEEN '2024-01-01' AND '2024-01-31'
Predicate Pushdown
- Trigger: WHERE clauses in query
- Benefit: Filters applied early, reduces data transfer
- Example: WHERE age >= 18 AND country = 'US'
Projection Pushdown
- Trigger: SELECT specific columns
- Benefit: Reads only needed columns, reduces I/O
- Example: SELECT id, name FROM users
SIMD Vectorization
- Trigger: Numeric aggregations (SUM, AVG, COUNT)
- Benefit: 4x speedup using AVX2/AVX512
- Example: SELECT SUM(amount), AVG(price) FROM orders
Index Usage
- Trigger: High selectivity predicates with available index
- Benefit: 10-100x faster than sequential scan
- Example: WHERE email = 'user@example.com'
Parallel Execution
- Trigger: Large table scans or aggregations
- Benefit: Near-linear speedup with multiple workers
- Example: Scanning table with 8+ shards
Join Optimization
- Trigger: Multiple joins in query
- Benefit: Optimal join order reduces intermediate results
- Example: SELECT * FROM a JOIN b JOIN c
Hash Join Strategy
- Trigger: Equi-joins with build side fitting in memory
- Benefit: O(n+m) complexity vs O(n*m) for nested loop
- Example: JOIN ON a.id = b.user_id

Explicit Features (User-Requested)

Features activated via query hints or configuration settings.

Configuration Parameters

Memory Settings

work_mem_mb: Work memory per operation (default: 256)
shared_buffers_mb: Shared buffer cache (default: 1024)
effective_cache_size_mb: OS cache estimate (default: 4096)

Join Strategies

enable_hashjoin: Enable hash joins (default: true)
enable_mergejoin: Enable merge joins (default: true)
enable_nestloop: Enable nested loop joins (default: true)

Scan Strategies

enable_indexscan: Enable index scans (default: true)
enable_seqscan: Enable sequential scans (default: true)
enable_bitmapscan: Enable bitmap scans (default: true)

Parallelism

max_parallel_workers: System-wide max workers (default: 8)
max_parallel_workers_per_gather: Per-query max (default: 4)
parallel_tuple_cost: Cost per tuple for parallel (default: 0.1)
parallel_setup_cost: Parallel setup overhead (default: 1000.0)

Cost Model

seq_page_cost: Sequential page read cost (default: 1.0)
random_page_cost: Random page read cost (default: 4.0)
cpu_tuple_cost: Per-tuple CPU cost (default: 0.01)
cpu_index_tuple_cost: Per-index-tuple CPU cost (default: 0.005)
cpu_operator_cost: Per-operator CPU cost (default: 0.0025)

Advanced Features

enable_partition_pruning: Partition pruning (default: true)
enable_simd: SIMD vectorization (default: true)
enable_jit: JIT compilation (default: false)

Optimizer Decisions

Decision Types

Join Strategy Selection
- Options: Hash Join, Merge Join, Nested Loop Join
- Factors: Build size, sort order, index availability
Scan Strategy Selection
- Options: Index Scan, Sequential Scan, Bitmap Scan
- Factors: Selectivity, index availability, table size
Join Order Optimization
- Options: All permutations of join order
- Factors: Cardinality estimates, join selectivity
Aggregation Strategy
- Options: Hash Aggregation, Sort Aggregation
- Factors: Number of groups, memory availability
Parallelism Activation
- Options: Serial vs Parallel execution
- Factors: Table size, operation cost, worker availability
Pushdown Optimizations
- Options: Apply at storage vs compute layer
- Factors: Predicate complexity, data transfer cost

Output Formats

Text Format

EXPLAIN PLAN (Enhanced)
Estimated Cost: 1523.45
Estimated Rows: 12,500
Planning Time: 2.34ms

Configuration at Plan Time:
  work_mem: 256MB, shared_buffers: 1024MB
  max_parallel_workers: 8, enable_simd: true

Active Features (3):
  ✓ SIMD Vectorization (IMPLICIT)
    Trigger: Numeric aggregation operations
    Benefit: 4x speedup
    Savings: 75.0% (1000.00 -> 250.00)
  ...

Optimizer Decisions (2):
  1. Join Strategy Selection
     CHOSEN: Hash Join (cost: 1000.00, rows: 50000)
     Reasoning: Build side fits in memory
     REJECTED: Merge Join (cost: 3000.00, 3.0x more expensive)
  ...

Execution Plan:
  -> HashAggregate(...)
    -> HashJoin(...)
      -> TableScan(...)

JSON Format

{
  "plan": { ... },
  "estimated_cost": 1523.45,
  "estimated_rows": 12500,
  "config_params": {
    "work_mem_mb": 256,
    "enable_simd": true,
    ...
  },
  "active_features": {
    "implicit": [
      {
        "name": "SIMD Vectorization",
        "category": "Vectorization",
        "trigger": "Numeric aggregation operations",
        "benefit": "4x speedup",
        "savings": {
          "cost_without": 1000.0,
          "cost_with": 250.0,
          "percent_reduction": 75.0
        },
        "confidence": 0.9
      }
    ],
    "explicit": []
  },
  "decisions": [ ... ],
  "metadata": {
    "planning_time_ms": 2.34,
    "optimizer_version": "7.0.0-enhanced",
    "warnings": []
  }
}

Common Patterns

Detecting Performance Issues

let explain = optimizer.explain_plan(&plan);

// Check for warnings
for warning in explain.metadata.warnings {
    eprintln!("WARNING: {}", warning);
}

// Check if parallelism is available but not used
let has_parallel = explain.active_features.implicit
    .iter()
    .any(|f| f.name == "Parallel Execution");

if !has_parallel && explain.estimated_rows > 1_000_000 {
    println!("Consider enabling parallelism for large table");
}

// Check SIMD usage
let has_simd = explain.active_features.implicit
    .iter()
    .any(|f| f.name == "SIMD Vectorization");

if !has_simd && !explain.config_params.enable_simd {
    println!("SIMD is disabled, enable for better performance");
}

Comparing Plans

// Plan A with current config
let config_a = ConfigSnapshot::capture();
let features_a = FeatureDetector::detect_features(&plan, &config_a);

// Plan B with SIMD disabled
let mut overrides = HashMap::new();
overrides.insert("enable_simd".to_string(), ConfigValue::Bool(false));
let config_b = ConfigSnapshot::with_overrides(overrides);
let features_b = FeatureDetector::detect_features(&plan, &config_b);

println!("Features with SIMD: {}", features_a.total_count());
println!("Features without SIMD: {}", features_b.total_count());

Extracting Cost Savings

let total_savings: f64 = explain.active_features.implicit
    .iter()
    .filter_map(|f| f.savings.as_ref())
    .map(|s| s.percent_reduction)
    .sum();

println!("Total optimizations saved {:.1}% cost", total_savings);

Troubleshooting

”No features detected”

Cause: Simple query or missing optimization opportunities Solution: Check if query has:

WHERE clauses (for predicate pushdown)
Aggregations (for SIMD)
Joins (for join optimization)
Large tables (for parallelism)

“Low confidence scores”

Cause: Placeholder cost estimates Solution: Integrate with actual statistics:

// Update table statistics
optimizer.update_table_stats(table_id, row_count, size_bytes);

“Missing optimizer decisions”

Cause: Decisions not recorded during optimization Solution: Use optimize_with_explain() instead of optimize():

// Records decisions ✓
let (plan, explain) = optimizer.optimize_with_explain(plan)?;

// Doesn't record decisions ✗
let plan = optimizer.optimize(plan)?;

Best Practices

Always Use explain_with_optimize() for Analysis
- Captures full context
- Records all decisions
- Generates warnings
Check Warnings First
- Identify quick wins
- Find missing indexes
- Detect configuration issues
Compare Before/After
- Test configuration changes
- Validate optimizations
- Track performance improvements
Monitor Confidence Scores
- High confidence (>0.8): Reliable estimates
- Medium confidence (0.5-0.8): Approximate estimates
- Low confidence (<0.5): Placeholder estimates
Export JSON for Tools
- Integration with dashboards
- Automated analysis
- Historical tracking

Performance Impact

Planning Time: +2-5ms for explain generation
Memory: +5-10KB per explain output
Runtime: Zero impact (explain is optional)

Query Hints: Force specific strategies (future)
Statistics Manager: Provide accurate cardinality estimates
Cost Model: Refine cost estimates based on hardware
JIT Compilation: Additional vectorization (future)

Examples

Example 1: Detect Missing Index

let explain = optimizer.explain_plan(&plan);

// Check if sequential scan is used
let has_seq_scan = explain.metadata.warnings
    .iter()
    .any(|w| w.contains("Sequential scan"));

if has_seq_scan {
    println!("Consider adding an index to improve performance");
}

Example 2: Validate SIMD Usage

let plan = build_aggregation_plan();
let explain = optimizer.explain_plan(&plan);

let simd_feature = explain.active_features.implicit
    .iter()
    .find(|f| f.name == "SIMD Vectorization");

if let Some(simd) = simd_feature {
    if let Some(savings) = &simd.savings {
        println!("SIMD provides {:.1}% speedup", savings.percent_reduction);
    }
}

Example 3: Join Order Analysis

let join_decisions = explain.decisions
    .iter()
    .filter(|d| d.decision_point.contains("Join"));

for decision in join_decisions {
    println!("Chosen: {} (cost: {:.2})",
        decision.chosen.name,
        decision.chosen.cost);

    for rejected in &decision.rejected {
        println!("  Rejected: {} ({:.1}x more expensive)",
            rejected.name,
            rejected.relative_cost);
    }
}

Version

Version: 7.0.0-enhanced

EXPLAIN Enhancements - Quick Reference Guide

EXPLAIN Enhancements - Quick Reference Guide

Overview

Quick Start

Basic Usage

Configuration Override

Features Detected

Implicit Features (Automatically Detected)

Explicit Features (User-Requested)

Configuration Parameters

Memory Settings

Join Strategies

Scan Strategies

Parallelism

Cost Model

Advanced Features

Optimizer Decisions

Decision Types

Output Formats

Text Format

JSON Format

Common Patterns

Detecting Performance Issues

Comparing Plans

Extracting Cost Savings

Troubleshooting

”No features detected”

“Low confidence scores”

“Missing optimizer decisions”

Best Practices

Performance Impact

Related Features

Examples

Example 1: Detect Missing Index

Example 2: Validate SIMD Usage

Example 3: Join Order Analysis

Version

See Also