Workload Optimizer Guide
Workload Optimizer Guide
Status: Production Ready
Version: v6.0 Phase 2 M1
Feature: F5.1.4 Workload-Aware Query Optimization
Overview
Complete guide to using HeliosDB’s workload-aware query optimization with pattern recognition and similarity matching.
Quick Start
use heliosdb_workload::{PatternAnalyzer, PatternAnalyzerConfig};
// Initialize analyzer with default configlet config = PatternAnalyzerConfig::default();let mut analyzer = PatternAnalyzer::new(config);
// Analyze a querylet pattern = analyzer.analyze_query( "SELECT name, email FROM users WHERE age > 25 AND country = 'US'")?;
// Find similar historical querieslet similar = analyzer.find_similar_patterns(&pattern);
// Estimate query cost based on historical datalet cost = analyzer.estimate_cost(&pattern);println!("Estimated cost: {:.2}", cost);Pattern Types
1. Select Pattern
Simple SELECT queries with filters:
SELECT * FROM users WHERE age > 252. Join Pattern
Multi-table JOIN operations:
SELECT u.*, o.totalFROM users uJOIN orders o ON u.id = o.user_id3. Aggregate Pattern
GROUP BY with aggregations:
SELECT country, COUNT(*), AVG(age)FROM usersGROUP BY country4. Mutation Pattern
INSERT/UPDATE/DELETE operations:
UPDATE users SET status = 'active' WHERE id = 1235. Complex Pattern
Subqueries, CTEs, window functions:
WITH active_users AS ( SELECT * FROM users WHERE status = 'active')SELECT *, ROW_NUMBER() OVER (PARTITION BY country ORDER BY created_at)FROM active_usersConfiguration
Basic Configuration
let config = PatternAnalyzerConfig { max_patterns: 10_000, // Maximum patterns to cache similarity_threshold: 0.8, // Similarity matching threshold enable_cost_estimation: true, // Enable cost estimation enable_statistics: true, // Track execution statistics};Advanced Configuration
let config = PatternAnalyzerConfig { max_patterns: 50_000, // Large cache for high-traffic systems similarity_threshold: 0.85, // Higher threshold = more selective matching enable_cost_estimation: true, enable_statistics: true, eviction_policy: EvictionPolicy::LRU, update_interval_ms: 1000, // Update statistics every second};Similarity Matching
The analyzer uses a configurable similarity threshold (0.0 - 1.0):
- 0.8 (default): Moderate similarity required
- 0.9: High similarity (stricter matching)
- 0.7: Lower similarity (more matches)
// Find patterns similar to current querylet similar_patterns = analyzer.find_similar_patterns(&pattern);
for similar in similar_patterns { println!("Similar query: {} (similarity: {:.2})", similar.sql, similar.similarity_score);}Cost Estimation
Estimate query cost based on historical execution:
// Get cost estimatelet cost = analyzer.estimate_cost(&pattern);
// Cost includes:// - Average execution time (ms)// - Average rows scanned// - Average memory usage (bytes)
println!("Expected execution time: {:.2}ms", cost.avg_time_ms);println!("Expected rows scanned: {}", cost.avg_rows_scanned);println!("Expected memory: {} bytes", cost.avg_memory_bytes);Integration with Query Optimizer
use heliosdb_compute::QueryOptimizer;use heliosdb_workload::PatternAnalyzer;
// Initialize optimizer with pattern analyzerlet mut optimizer = QueryOptimizer::new();let mut analyzer = PatternAnalyzer::new(config);
// Analyze query patternlet pattern = analyzer.analyze_query(&sql)?;
// Check for similar historical queriesif let Some(cached_plan) = analyzer.find_cached_plan(&pattern) { // Reuse cached query plan return Ok(cached_plan);}
// Generate new plan with cost estimationlet estimated_cost = analyzer.estimate_cost(&pattern);let plan = optimizer.generate_plan(&sql, estimated_cost)?;
// Cache the plan for future useanalyzer.cache_plan(&pattern, &plan)?;Performance Monitoring
Track pattern analyzer performance:
// Get analyzer statisticslet stats = analyzer.get_statistics();
println!("Total patterns: {}", stats.total_patterns);println!("Cache hit rate: {:.2}%", stats.cache_hit_rate * 100.0);println!("Average similarity score: {:.2}", stats.avg_similarity);println!("Average cost estimation accuracy: {:.2}%", stats.cost_estimation_accuracy * 100.0);Best Practices
1. Cache Size Tuning
Choose max_patterns based on workload:
- Low traffic (<100 QPS): 1,000-5,000 patterns
- Medium traffic (100-1,000 QPS): 10,000-20,000 patterns
- High traffic (>1,000 QPS): 50,000+ patterns
2. Similarity Threshold
Adjust based on query diversity:
- High diversity (many unique queries): Lower threshold (0.7-0.75)
- Medium diversity: Default threshold (0.8)
- Low diversity (repetitive queries): Higher threshold (0.85-0.9)
3. Statistics Update Frequency
Balance accuracy vs performance:
- Real-time systems: 100-500ms updates
- Standard systems: 1,000ms updates (default)
- Batch systems: 5,000-10,000ms updates
TPC-H Validation
The pattern analyzer has been validated with 16 TPC-H queries:
# Run TPC-H validation testscargo test tpch_validation --features tpchAll 16 TPC-H queries are correctly classified and optimized.
SQL Parser Integration
Enhanced SQL parser with AST-based analysis:
// Parse SQL and extract tableslet parsed = analyzer.parse_sql(&sql)?;
println!("Query type: {:?}", parsed.query_type);println!("Tables: {:?}", parsed.tables);println!("Predicates: {:?}", parsed.predicates);Related Documentation
Troubleshooting
Low Cache Hit Rate
If cache hit rate is <50%:
- Increase
max_patterns - Lower
similarity_threshold - Verify query parameterization
Inaccurate Cost Estimation
If cost estimates are off by >20%:
- Wait for more historical data (100+ executions per pattern)
- Verify statistics are being updated
- Check for outliers in execution times
High Memory Usage
If analyzer uses too much memory:
- Reduce
max_patterns - Decrease
update_interval_ms - Enable more aggressive LRU eviction
Support
For issues or questions:
- GitHub: https://github.com/heliosdb/heliosdb
- Documentation: USER_GUIDE_INDEX.md
- Examples: heliosdb-workload/examples/
Last Updated: November 1, 2025
Version: 1.0
Status: Production Ready