Why Not Analysis - Quick Reference Guide

Feature: Query Optimizer “Why Not” Analysis Version: 7.0.0 Module: heliosdb-compute::optimizer::why_not

Overview

The “Why Not” analyzer explains why the query optimizer didn’t apply certain optimizations, helping you understand and improve query performance.

Quick Start

use heliosdb_compute::optimizer::{WhyNotAnalyzer, WhyNotTableStatistics};
use heliosdb_compute::planner::{IndexMetadata, IndexType, PhysicalPlan};

// 1. Create analyzer
let mut analyzer = WhyNotAnalyzer::new();

// 2. Register indexes
analyzer.register_index(IndexMetadata {
    name: "idx_users_email".to_string(),
    table_id: 1,
    index_type: IndexType::BTree,
    columns: vec![0],
    included_columns: vec![],
    is_covering: false,
});

// 3. Register statistics
analyzer.register_statistics(
    "users".to_string(),
    WhyNotTableStatistics {
        last_analyzed: "2024-10-01".to_string(),
        days_since_analysis: 45,
        row_count: 1000000,
        percent_modified: 38.5,
    },
);

// 4. Analyze plan
let analysis = analyzer.analyze(&plan, &predicates);

// 5. View results
println!("{}", analysis.to_text());

Analysis Categories

1. Index Usage Analysis

Detects: Why available indexes weren’t used

Common Reasons:

❌ No WHERE clause predicates
❌ LIKE with leading wildcard (%search)
❌ Hash index used for range queries
❌ Low selectivity (seq scan cheaper)

Example Output:

INDEX USAGE ANALYSIS (1 indexes)
────────────────────────────────────────────────────────────

1. Index: idx_users_email
   Table: users | Type: BTree | Severity: High
   Reason: LIKE with leading wildcard '%search'
    Suggestion: Use full-text search or trigram index
   💰 Cost Impact: 40.0x slower (50.00 vs 2000.00)

2. Statistics Freshness

Detects: Stale table statistics affecting plans

Severity Levels:

🟢 Low: 7-14 days old
🟡 Medium: 14-30 days old
🔴 High: > 30 days old

Example Output:

STATISTICS WARNINGS (1 tables)
────────────────────────────────────────────────────────────

1. Table: orders
   Last Updated: 2024-10-01 | Severity: High
   Freshness: 45 days old (38.5% modified)
    Action: ANALYZE orders;

3. Cardinality Mismatches

Detects: Large differences between estimated and actual row counts

Severity Thresholds:

🟢 Low: < 100% error
🟡 Medium: 100-500% error
🟠 High: 500-1000% error
🔴 Critical: > 1000% error

Example Output:

CARDINALITY MISMATCHES (1 detected)
────────────────────────────────────────────────────────────

1. Operation: Hash Join (customers JOIN orders)
   Estimated: 1000 rows | Actual: 150000 rows
   Error Rate: 14900.0% (150.0x underestimate)
   Likely Cause: Correlated predicates not in statistics
    Fix: Create extended statistics

4. Optimization Suggestions

Provides: Actionable recommendations with implementation steps

Categories:

Indexing: Create, modify, or drop indexes
Statistics: Update or create statistics
Schema: Schema modifications
Configuration: Parameter tuning
QueryRewrite: Query restructuring

Priority Levels: Critical > High > Medium > Low

Effort Estimates:

Trivial: < 5 minutes
🔨 Low: 5-30 minutes
Medium: 30 min - 2 hours
🏗 High: 2-8 hours
🏭 VeryHigh: > 8 hours

Example Output:

OPTIMIZATION SUGGESTIONS (1 recommendations)
────────────────────────────────────────────────────────────

1. Create covering index for frequent query
   Category: Indexing | Priority: High | Effort: Low
   Description: Query executed 50K times/day
    Expected Benefit: 90.0% improvement (1000.00 -> 100.00)
   Implementation:
     • CREATE INDEX idx_users_status ON users(status) INCLUDE (name, email);
     • ANALYZE users;
     • Test with EXPLAIN ANALYZE

Integration with EXPLAIN

use heliosdb_compute::optimizer::ExplainOutput;

// Create EXPLAIN output
let mut explain = ExplainOutput::new(plan, cost, rows);

// Run why_not analysis
let analysis = analyzer.analyze(&plan, &predicates);

// Attach to EXPLAIN
explain.set_why_not_analysis(analysis);

// Print complete output
println!("{}", explain.to_text());

Output:

EXPLAIN PLAN (Enhanced)
Estimated Cost: 1000.00
Estimated Rows: 50000

[... plan details ...]

════════════════════════════════════════════════════════════
WHY NOT ANALYSIS
Confidence: 100.0%
Total Issues: 3

[... detailed analysis ...]

Common Patterns

Pattern 1: Missing Index

// Detected: TableScan with predicates, no indexes
// Suggestion: CREATE INDEX idx_table_col ON table(column);
// Benefit: 10-100x improvement

Pattern 2: Sort Elimination

// Detected: Sort -> TableScan
// Suggestion: CREATE INDEX ON table(order_by_column);
// Benefit: Eliminates expensive sort operation

Pattern 3: Covering Index

// Detected: HashAggregate -> TableScan
// Suggestion: CREATE INDEX ON table(group_by) INCLUDE (agg_columns);
// Benefit: Index-only scan, reduced I/O

Pattern 4: Hash Index Limitation

// Detected: Hash index with range predicates
// Suggestion: ALTER INDEX USING BTREE;
// Benefit: Enables range query support

API Reference

WhyNotAnalyzer

impl WhyNotAnalyzer {
    pub fn new() -> Self;
    pub fn register_index(&mut self, index: IndexMetadata);
    pub fn register_statistics(&mut self, table: String, stats: TableStatistics);
    pub fn analyze(&self, plan: &PhysicalPlan, predicates: &[Predicate]) -> WhyNotAnalysis;
}

WhyNotAnalysis

impl WhyNotAnalysis {
    pub fn new() -> Self;
    pub fn total_issues(&self) -> usize;
    pub fn has_critical_issues(&self) -> bool;
    pub fn to_text(&self) -> String;
    pub fn add_index_reason(&mut self, reason: IndexNotUsedReason);
    pub fn add_statistics_warning(&mut self, warning: StatisticsWarning);
    pub fn add_cardinality_mismatch(&mut self, mismatch: CardinalityMismatch);
    pub fn add_suggestion(&mut self, suggestion: OptimizationSuggestion);
}

Key Types

pub enum Severity {
    Low,
    Medium,
    High,
    Critical,
}

pub enum SuggestionCategory {
    Indexing,
    Statistics,
    Schema,
    Configuration,
    QueryRewrite,
}

pub enum Priority {
    Low,
    Medium,
    High,
    Critical,
}

pub enum Effort {
    Trivial,  // < 5 min
    Low,      // 5-30 min
    Medium,   // 30 min - 2 hours
    High,     // 2-8 hours
    VeryHigh, // > 8 hours
}

Checking Results

let analysis = analyzer.analyze(&plan, &predicates);

// Check for critical issues
if analysis.has_critical_issues() {
    eprintln!("⚠  Critical issues detected!");
}

// Count issues
println!("Total issues found: {}", analysis.total_issues());

// Iterate over findings
for reason in &analysis.index_analysis {
    if reason.severity >= Severity::High {
        println!("High priority: {}", reason.reason);
    }
}

// Get suggestions by priority
let high_priority: Vec<_> = analysis.suggestions
    .iter()
    .filter(|s| s.priority >= Priority::High)
    .collect();

Best Practices

1. Register All Available Indexes

for index in database.get_indexes() {
    analyzer.register_index(index);
}

2. Keep Statistics Current

analyzer.register_statistics(
    table_name,
    WhyNotTableStatistics {
        last_analyzed: get_last_analyze_time(),
        days_since_analysis: calculate_days(),
        row_count: get_row_count(),
        percent_modified: calculate_modifications(),
    },
);

3. Act on High-Priority Suggestions

for suggestion in analysis.suggestions {
    if suggestion.priority >= Priority::High && suggestion.effort <= Effort::Low {
        // Quick wins - implement immediately
        execute_suggestion(suggestion);
    }
}

4. Monitor Statistics Freshness

for warning in analysis.statistics_warnings {
    if warning.days_since_analysis > 30 {
        // Schedule ANALYZE job
        schedule_analyze(warning.table_name);
    }
}

Troubleshooting

Issue: No suggestions generated

Cause: Plan is already optimal or analyzer lacks metadata Fix: Register indexes and statistics

Issue: Too many false positives

Cause: Statistics not registered Fix: Provide accurate statistics metadata

Issue: Missing cost impact

Cause: Cost model not calibrated Fix: Update cost model parameters

Performance Tips

Analysis Overhead: < 1ms for most queries
Memory Usage: ~1KB per finding
Scalability: Handles 100+ node plans efficiently

Example Workflow

// 1. Setup analyzer
let mut analyzer = WhyNotAnalyzer::new();

// 2. Load metadata
for index in load_indexes() {
    analyzer.register_index(index);
}
for (table, stats) in load_statistics() {
    analyzer.register_statistics(table, stats);
}

// 3. Analyze query
let plan = optimizer.create_plan(query)?;
let analysis = analyzer.analyze(&plan, &predicates);

// 4. Review findings
if analysis.has_critical_issues() {
    log::error!("Critical optimization issues detected");
}

// 5. Generate report
let report = analysis.to_text();
send_to_dba(report);

// 6. Apply suggestions
for suggestion in analysis.suggestions {
    if suggestion.priority == Priority::Critical {
        apply_suggestion_immediately(suggestion);
    }
}

Testing

# Run all tests
cargo test --package heliosdb-compute why_not_analysis_tests

# Run with output
cargo test --package heliosdb-compute why_not_analysis_tests -- --nocapture

# Run specific test
cargo test --package heliosdb-compute test_why_not_analysis_basic

Why Not Analysis - Quick Reference Guide

Why Not Analysis - Quick Reference Guide

Overview

Quick Start

Analysis Categories

1. Index Usage Analysis

2. Statistics Freshness

3. Cardinality Mismatches

4. Optimization Suggestions

Integration with EXPLAIN

Common Patterns

Pattern 1: Missing Index

Pattern 2: Sort Elimination

Pattern 3: Covering Index

Pattern 4: Hash Index Limitation

API Reference

WhyNotAnalyzer

WhyNotAnalysis

Key Types

Checking Results

Best Practices

1. Register All Available Indexes

2. Keep Statistics Current

3. Act on High-Priority Suggestions

4. Monitor Statistics Freshness

Troubleshooting

Issue: No suggestions generated

Issue: Too many false positives

Issue: Missing cost impact

Performance Tips

Example Workflow

Testing

Further Reading