Skip to content

Workload Optimizer Guide

Workload Optimizer Guide

Status: Production Ready
Version: v6.0 Phase 2 M1
Feature: F5.1.4 Workload-Aware Query Optimization

Overview

Complete guide to using HeliosDB’s workload-aware query optimization with pattern recognition and similarity matching.

Quick Start

use heliosdb_workload::{PatternAnalyzer, PatternAnalyzerConfig};
// Initialize analyzer with default config
let config = PatternAnalyzerConfig::default();
let mut analyzer = PatternAnalyzer::new(config);
// Analyze a query
let pattern = analyzer.analyze_query(
"SELECT name, email FROM users WHERE age > 25 AND country = 'US'"
)?;
// Find similar historical queries
let similar = analyzer.find_similar_patterns(&pattern);
// Estimate query cost based on historical data
let cost = analyzer.estimate_cost(&pattern);
println!("Estimated cost: {:.2}", cost);

Pattern Types

1. Select Pattern

Simple SELECT queries with filters:

SELECT * FROM users WHERE age > 25

2. Join Pattern

Multi-table JOIN operations:

SELECT u.*, o.total
FROM users u
JOIN orders o ON u.id = o.user_id

3. Aggregate Pattern

GROUP BY with aggregations:

SELECT country, COUNT(*), AVG(age)
FROM users
GROUP BY country

4. Mutation Pattern

INSERT/UPDATE/DELETE operations:

UPDATE users SET status = 'active' WHERE id = 123

5. Complex Pattern

Subqueries, CTEs, window functions:

WITH active_users AS (
SELECT * FROM users WHERE status = 'active'
)
SELECT *, ROW_NUMBER() OVER (PARTITION BY country ORDER BY created_at)
FROM active_users

Configuration

Basic Configuration

let config = PatternAnalyzerConfig {
max_patterns: 10_000, // Maximum patterns to cache
similarity_threshold: 0.8, // Similarity matching threshold
enable_cost_estimation: true, // Enable cost estimation
enable_statistics: true, // Track execution statistics
};

Advanced Configuration

let config = PatternAnalyzerConfig {
max_patterns: 50_000, // Large cache for high-traffic systems
similarity_threshold: 0.85, // Higher threshold = more selective matching
enable_cost_estimation: true,
enable_statistics: true,
eviction_policy: EvictionPolicy::LRU,
update_interval_ms: 1000, // Update statistics every second
};

Similarity Matching

The analyzer uses a configurable similarity threshold (0.0 - 1.0):

  • 0.8 (default): Moderate similarity required
  • 0.9: High similarity (stricter matching)
  • 0.7: Lower similarity (more matches)
// Find patterns similar to current query
let similar_patterns = analyzer.find_similar_patterns(&pattern);
for similar in similar_patterns {
println!("Similar query: {} (similarity: {:.2})",
similar.sql, similar.similarity_score);
}

Cost Estimation

Estimate query cost based on historical execution:

// Get cost estimate
let cost = analyzer.estimate_cost(&pattern);
// Cost includes:
// - Average execution time (ms)
// - Average rows scanned
// - Average memory usage (bytes)
println!("Expected execution time: {:.2}ms", cost.avg_time_ms);
println!("Expected rows scanned: {}", cost.avg_rows_scanned);
println!("Expected memory: {} bytes", cost.avg_memory_bytes);

Integration with Query Optimizer

use heliosdb_compute::QueryOptimizer;
use heliosdb_workload::PatternAnalyzer;
// Initialize optimizer with pattern analyzer
let mut optimizer = QueryOptimizer::new();
let mut analyzer = PatternAnalyzer::new(config);
// Analyze query pattern
let pattern = analyzer.analyze_query(&sql)?;
// Check for similar historical queries
if let Some(cached_plan) = analyzer.find_cached_plan(&pattern) {
// Reuse cached query plan
return Ok(cached_plan);
}
// Generate new plan with cost estimation
let estimated_cost = analyzer.estimate_cost(&pattern);
let plan = optimizer.generate_plan(&sql, estimated_cost)?;
// Cache the plan for future use
analyzer.cache_plan(&pattern, &plan)?;

Performance Monitoring

Track pattern analyzer performance:

// Get analyzer statistics
let stats = analyzer.get_statistics();
println!("Total patterns: {}", stats.total_patterns);
println!("Cache hit rate: {:.2}%", stats.cache_hit_rate * 100.0);
println!("Average similarity score: {:.2}", stats.avg_similarity);
println!("Average cost estimation accuracy: {:.2}%",
stats.cost_estimation_accuracy * 100.0);

Best Practices

1. Cache Size Tuning

Choose max_patterns based on workload:

  • Low traffic (<100 QPS): 1,000-5,000 patterns
  • Medium traffic (100-1,000 QPS): 10,000-20,000 patterns
  • High traffic (>1,000 QPS): 50,000+ patterns

2. Similarity Threshold

Adjust based on query diversity:

  • High diversity (many unique queries): Lower threshold (0.7-0.75)
  • Medium diversity: Default threshold (0.8)
  • Low diversity (repetitive queries): Higher threshold (0.85-0.9)

3. Statistics Update Frequency

Balance accuracy vs performance:

  • Real-time systems: 100-500ms updates
  • Standard systems: 1,000ms updates (default)
  • Batch systems: 5,000-10,000ms updates

TPC-H Validation

The pattern analyzer has been validated with 16 TPC-H queries:

Terminal window
# Run TPC-H validation tests
cargo test tpch_validation --features tpch

All 16 TPC-H queries are correctly classified and optimized.

SQL Parser Integration

Enhanced SQL parser with AST-based analysis:

// Parse SQL and extract tables
let parsed = analyzer.parse_sql(&sql)?;
println!("Query type: {:?}", parsed.query_type);
println!("Tables: {:?}", parsed.tables);
println!("Predicates: {:?}", parsed.predicates);

Troubleshooting

Low Cache Hit Rate

If cache hit rate is <50%:

  1. Increase max_patterns
  2. Lower similarity_threshold
  3. Verify query parameterization

Inaccurate Cost Estimation

If cost estimates are off by >20%:

  1. Wait for more historical data (100+ executions per pattern)
  2. Verify statistics are being updated
  3. Check for outliers in execution times

High Memory Usage

If analyzer uses too much memory:

  1. Reduce max_patterns
  2. Decrease update_interval_ms
  3. Enable more aggressive LRU eviction

Support

For issues or questions:


Last Updated: November 1, 2025
Version: 1.0
Status: Production Ready