HeliosDB Query Optimization Troubleshooting Guide
Version: 1.0 | Last Updated: 2025-11-30 | Status: Complete
Table of Contents
- Overview
- Diagnosing Slow Queries
- Query Execution Plans
- Optimization Techniques
- Index Strategies
- Join Optimization
- Aggregation Optimization
- Subquery Optimization
- Advanced Troubleshooting
- Performance Monitoring
Overview
Query optimization is critical for database performance. HeliosDB includes 15+ optimization crates with neural network, quantum-inspired, and machine learning-based query planning.
Query Optimization Strategy
1. Measure (identify slow queries)
2. Analyze (understand the execution plan)
3. Diagnose (find bottlenecks)
4. Optimize (apply techniques)
5. Verify (confirm improvements)
6. Monitor (track performance)

Diagnosing Slow Queries
Step 1: Identify Slow Queries
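The `query_statistics` view queried in this step is, in effect, a per-statement rollup of individual execution timings. The Python sketch below reproduces that rollup over a hypothetical in-memory log of `(query_text, elapsed_ms, rows)` records. The log format, `QueryStats`, and `top_queries` are illustrative assumptions. They do not describe how HeliosDB collects its statistics.

```python
from dataclasses import dataclass


@dataclass
class QueryStats:
    query_text: str
    execution_count: int = 0
    total_time_ms: float = 0.0
    max_time_ms: float = 0.0
    rows_returned: int = 0

    @property
    def avg_time_ms(self) -> float:
        return self.total_time_ms / self.execution_count if self.execution_count else 0.0


def top_queries(log, limit=20):
    """Aggregate (query_text, elapsed_ms, rows) records and rank them by total time."""
    stats = {}
    for query_text, elapsed_ms, rows in log:
        s = stats.setdefault(query_text, QueryStats(query_text))
        s.execution_count += 1
        s.total_time_ms += elapsed_ms
        s.max_time_ms = max(s.max_time_ms, elapsed_ms)
        s.rows_returned += rows
    # Same ordering as the view query: ORDER BY total_time_ms DESC LIMIT n
    return sorted(stats.values(), key=lambda s: s.total_time_ms, reverse=True)[:limit]


# Hypothetical log entries; parameter placeholders let repeated calls group together
log = [
    ("SELECT * FROM orders WHERE customer_id = ?", 2543.5, 12),
    ("SELECT * FROM orders WHERE customer_id = ?", 2400.0, 9),
    ("SELECT name FROM customers WHERE id = ?", 1.2, 1),
    ("SELECT name FROM customers WHERE id = ?", 0.8, 1),
]
for s in top_queries(log):
    print(f"{s.total_time_ms:9.1f} ms  x{s.execution_count}  avg {s.avg_time_ms:.1f}  {s.query_text}")
```

Ranking by total time rather than average time also surfaces cheap statements that run very often. A per-call view would hide those.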
```sql
-- Query statistics view
SELECT query_id, query_text, execution_count,
       total_time_ms, avg_time_ms, max_time_ms, rows_returned
FROM query_statistics
ORDER BY total_time_ms DESC
LIMIT 20;
```
Step 2: Measure Query Performance
```sql
-- Run the query and get timing (psql-style meta-command)
\timing on

SELECT * FROM orders
WHERE customer_id = 123
  AND order_date > '2025-01-01';

-- Result: Execution time: 2543.456 ms
```
Step 3: Enable Query Profiling
```sql
-- Get the execution plan and run-time statistics
EXPLAIN ANALYZE
SELECT o.order_id, o.amount, c.customer_name
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE c.id = 123
  AND o.order_date > '2025-01-01';

/* Output (abridged; actual-time columns omitted):
Nested Loop (cost=0.29..35.50 rows=1 width=40)
  → Seq Scan on customers c (cost=0.00..10.00 rows=1)
      Filter: (id = 123)
  → Index Scan on orders o (cost=0.29..25.50 rows=1)
      Index Cond: (customer_id = 123)
      Filter: (order_date > '2025-01-01')
*/
```
Query Execution Plans
Understanding Plan Output
```
Operation (cost=startup..total rows=estimated width=bytes)
├─ Seq Scan: full table scan (slowest for large tables)
├─ Index Scan: reads rows through an index (faster when the filter is selective)
├─ Nested Loop: simple join (slow when both inputs are large)
├─ Hash Join: joins through an in-memory hash table (faster for large joins)
├─ Merge Join: joins pre-sorted inputs (good for ordered data)
└─ Aggregate: GROUP BY, SUM, COUNT (may be expensive)
```
Common Performance Issues in Plans
| Issue | Sign | Solution |
|---|---|---|
| Missing Index | Seq Scan on large table | Create index |
| Wrong Join Type | Nested Loop over large row counts | Use a Hash or Merge Join |
| Filter After Join | Filter appears after join | Move to WHERE clause |
| Cartesian Product | Row count far higher than expected | Check JOIN conditions |
| N+1 Problem | Multiple Seq Scans in loop | Use single join query |
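The Nested Loop and Hash Join operators differ mainly in how they find matching rows. That difference is why a Nested Loop over large inputs counts as a warning sign. Below is a minimal Python sketch of both algorithms over lists of dicts. It shows the general textbook techniques, not HeliosDB's executor:

- The nested loop compares every pair of rows.
- The hash join builds a hash table on one input, then probes it once per row of the other input.

```python
from collections import defaultdict


def nested_loop_join(outer, inner, outer_key, inner_key):
    """Compare every outer row with every inner row: O(len(outer) * len(inner))."""
    comparisons = 0
    result = []
    for o in outer:
        for i in inner:
            comparisons += 1
            if o[outer_key] == i[inner_key]:
                result.append({**o, **i})
    return result, comparisons


def hash_join(build, probe, build_key, probe_key):
    """Build a hash table on one input, then probe it once per row of the other."""
    table = defaultdict(list)
    for b in build:                  # build phase: one pass over the (smaller) input
        table[b[build_key]].append(b)
    result = []
    lookups = 0
    for p in probe:                  # probe phase: one hash lookup per probe row
        lookups += 1
        for b in table.get(p[probe_key], []):
            result.append({**b, **p})
    return result, lookups


customers = [{"id": i, "name": f"customer{i}"} for i in range(100)]
orders = [{"order_id": n, "customer_id": n % 100, "amount": n * 1.5} for n in range(1000)]

nl_rows, nl_work = nested_loop_join(orders, customers, "customer_id", "id")
hj_rows, hj_work = hash_join(customers, orders, "id", "customer_id")
print(len(nl_rows), nl_work)  # 1000 100000
print(len(hj_rows), hj_work)  # 1000 1000
```

Both joins return the same 1,000 rows. The nested loop needs 100,000 comparisons; the hash join needs 100 inserts plus 1,000 lookups. The hash table must fit in memory for this to stay cheap, which is why hash joins are sensitive to memory settings such as `work_mem`.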
Optimization Techniques
Technique 1: Create Appropriate Indexes
```sql
-- Single-column index
CREATE INDEX idx_orders_customer ON orders(customer_id);

-- Multi-column index (composite)
CREATE INDEX idx_orders_customer_date ON orders(customer_id, order_date);

-- Partial index (only certain rows)
CREATE INDEX idx_active_orders ON orders(customer_id)
WHERE status = 'ACTIVE';
```
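A partial index can only serve queries whose WHERE clause implies the index predicate. The sketch below uses Python's built-in `sqlite3` module as a stand-in engine with made-up rows. SQLite supports partial indexes, but the plan text is SQLite's `EXPLAIN QUERY PLAN` output, not HeliosDB's. The sketch compares a query that repeats `status = 'ACTIVE'` with one that omits it.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT)")
con.executemany(
    "INSERT INTO orders (customer_id, status) VALUES (?, ?)",
    [(n % 50, "ACTIVE" if n % 10 == 0 else "CLOSED") for n in range(1000)],
)
con.execute("CREATE INDEX idx_active_orders ON orders(customer_id) WHERE status = 'ACTIVE'")


def plan(sql):
    """Join the detail strings (last column) of SQLite's EXPLAIN QUERY PLAN rows."""
    return " | ".join(row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql))


# The query repeats the index predicate, so the partial index is usable
with_predicate = plan("SELECT order_id FROM orders WHERE customer_id = 7 AND status = 'ACTIVE'")
# Without status = 'ACTIVE' the index could miss rows, so the table must be scanned
without_predicate = plan("SELECT order_id FROM orders WHERE customer_id = 7")
print(with_predicate)
print(without_predicate)
```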
```sql
-- Covering index (includes all columns needed)
CREATE INDEX idx_orders_covering ON orders(customer_id, order_date)
INCLUDE (amount, status);
```
Technique 2: Rewrite Queries
```sql
-- ❌ BAD: Correlated subquery (N+1 problem: can run once per customer)
SELECT c.name,
       (SELECT COUNT(*) FROM orders WHERE customer_id = c.id) AS order_count
FROM customers c;

-- GOOD: Use JOIN with GROUP BY
-- (group by c.id too, so customers who share a name are not merged)
SELECT c.name, COUNT(o.order_id) AS order_count
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.name;
```
Technique 3: Use Query Hints
```sql
-- Suggest an index to use
SELECT /*+ INDEX(orders idx_orders_customer_date) */ order_id, amount
FROM orders
WHERE customer_id = 123
  AND order_date > '2025-01-01';

-- Suggest a join order
SELECT /*+ LEADING(o c) */ o.order_id, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id;

-- Parallel execution hint
SELECT /*+ PARALLEL(4) */ SUM(amount)
FROM orders;
```
Technique 4: Limit Result Sets
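```sql
-- ❌ BAD: Retrieve all rows, then filter in the application
SELECT * FROM orders;  -- Returns 1 million rows

-- GOOD: Filter in the database and fetch only the needed columns
-- (pair LIMIT with ORDER BY; otherwise which 100 rows come back is undefined)
SELECT order_id, amount
FROM orders
WHERE order_date > '2025-01-01'
ORDER BY order_date DESC
LIMIT 100;
```
Technique 5: Use Appropriate Data Types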
```sql
-- ❌ BAD: Store numbers as strings
CREATE TABLE orders (order_id VARCHAR(10), amount VARCHAR(20));
```
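Numbers stored as text also compare as text, so ordering, MIN/MAX, and range filters quietly go wrong. This short sketch uses Python's `sqlite3` as a stand-in engine with made-up rows. SQLite gives `VARCHAR` columns text affinity, so the values are stored as strings. The sketch contrasts that with a numeric column.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders_text (order_id VARCHAR(10), amount VARCHAR(20))")
con.execute("CREATE TABLE orders_num (order_id INTEGER, amount DECIMAL(10, 2))")
rows = [(1, 9.5), (2, 20.0), (3, 100.25)]
con.executemany("INSERT INTO orders_text VALUES (?, ?)", rows)
con.executemany("INSERT INTO orders_num VALUES (?, ?)", rows)

# Text compares character by character, so '9.5' ranks above '100.25'
max_text = con.execute("SELECT MAX(amount) FROM orders_text").fetchone()[0]
max_num = con.execute("SELECT MAX(amount) FROM orders_num").fetchone()[0]

# "amount > 15" looks numeric, but against a text column it becomes a text comparison
over_15_text = con.execute(
    "SELECT order_id FROM orders_text WHERE amount > 15 ORDER BY order_id").fetchall()
over_15_num = con.execute(
    "SELECT order_id FROM orders_num WHERE amount > 15 ORDER BY order_id").fetchall()
print(max_text, max_num)          # 9.5 100.25
print(over_15_text, over_15_num)  # [('1',), ('2',)] [(2,), (3,)]
```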
```sql
-- GOOD: Use numeric types
CREATE TABLE orders (order_id INTEGER, amount DECIMAL(10, 2));
```
Index Strategies
When to Create Indexes
```sql
-- Index on:
-- 1. Columns in WHERE clauses
CREATE INDEX idx_status ON orders(status);

-- 2. Columns in JOIN conditions
CREATE INDEX idx_customer_id ON orders(customer_id);

-- 3. Columns in ORDER BY
CREATE INDEX idx_created_date ON orders(created_date DESC);

-- 4. Low cardinality + frequent filters (partial index)
CREATE INDEX idx_active ON orders(status) WHERE status = 'ACTIVE';
```
When NOT to Create Indexes
```sql
-- ❌ Don't index:
-- 1. Columns that queries never filter, join, or sort on
--    (a unique column such as a UUID is only worth indexing if rows are looked up by it)
CREATE INDEX idx_uuid ON orders(unique_id);  -- Wasted if nothing searches by unique_id

-- 2. Very small tables
CREATE INDEX idx_config ON config(key);  -- Only 10 rows; a scan is already cheap

-- 3. Frequently updated columns
CREATE INDEX idx_modified ON orders(last_modified);  -- Every update must also maintain the index

-- 4. Boolean columns (use a partial index instead)
CREATE INDEX idx_is_active ON orders(is_active)
WHERE is_active = true;  -- Better than a full index
```
Composite Index Ordering
```sql
-- Optimal: equality columns first, then range columns
-- GOOD: Supports WHERE customer_id = ? AND order_date > ?
CREATE INDEX idx_good ON orders(customer_id, order_date);
```
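The effect of column order is easy to observe in any B-tree engine. This sketch uses Python's `sqlite3` as a stand-in, so the plan text is SQLite's. It builds a hypothetical `idx_test` with each column order in a fresh in-memory database. It then asks how `WHERE customer_id = 123 AND order_date > '2025-01-01'` would run. The table layout is an assumption.

```python
import sqlite3

QUERY = ("SELECT order_id FROM orders "
         "WHERE customer_id = 123 AND order_date > '2025-01-01'")


def plan_with_index(columns):
    """Create orders with one composite index on `columns` and return the plan detail."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, "
                "customer_id INTEGER, order_date TEXT, amount REAL)")
    con.execute(f"CREATE INDEX idx_test ON orders({columns})")
    detail = " | ".join(row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + QUERY))
    con.close()
    return detail


good = plan_with_index("customer_id, order_date")
bad = plan_with_index("order_date, customer_id")
print("equality first:", good)  # both columns bound the index search
print("range first:   ", bad)   # customer_id cannot bound the search
```

With `customer_id` first, the search goes straight to one customer's range of dates. With `order_date` first, every customer's recent rows fall inside the scanned range, and `customer_id` can only be checked row by row.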
```sql
-- ❌ SUBOPTIMAL: With the range column first, only order_date bounds the scan;
-- customer_id is checked row by row instead
CREATE INDEX idx_bad ON orders(order_date, customer_id);
```
Maintenance
```sql
-- Check index usage (least-used indexes first)
SELECT schemaname, relname AS tablename, indexrelname AS indexname,
       idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;

-- Remove unused indexes
DROP INDEX idx_unused;

-- Rebuild a fragmented index
REINDEX INDEX idx_orders_customer;

-- Update statistics
ANALYZE orders;
```
Join Optimization
Join Type Selection
```sql
-- INNER JOIN: returns only matching rows; the planner is free to reorder it
SELECT o.order_id, c.name
FROM orders o
INNER JOIN customers c ON o.customer_id = c.id;

-- LEFT JOIN: keeps all left-table rows, with NULLs where nothing matches
SELECT o.order_id, c.name
FROM orders o
LEFT JOIN customers c ON o.customer_id = c.id;
```
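Choosing between these two joins is first a question of results, then of speed. An INNER JOIN silently drops orders whose customer row is missing, while a LEFT JOIN keeps them with NULL customer columns. The sketch below uses Python's `sqlite3` as a stand-in engine with made-up rows, one of which is an orphaned order.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    -- Order 12 references customer 99, which does not exist
    INSERT INTO orders VALUES (10, 1), (11, 2), (12, 99);
""")

inner = con.execute(
    "SELECT o.order_id, c.name FROM orders o "
    "INNER JOIN customers c ON o.customer_id = c.id ORDER BY o.order_id").fetchall()
left = con.execute(
    "SELECT o.order_id, c.name FROM orders o "
    "LEFT JOIN customers c ON o.customer_id = c.id ORDER BY o.order_id").fetchall()
print(inner)  # [(10, 'Ada'), (11, 'Grace')]
print(left)   # [(10, 'Ada'), (11, 'Grace'), (12, None)]
```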
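```sql
-- NATURAL JOIN: joins on every column name the tables share (avoid in complex queries;
-- a same-named column such as status would silently become a join condition)
SELECT *
FROM orders NATURAL JOIN customers;
```
Multi-Table Joins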
```sql
-- Optimize with proper index placement on the join columns
CREATE INDEX idx_orders_customer ON orders(customer_id);
CREATE INDEX idx_order_items_order ON order_items(order_id);
CREATE INDEX idx_products_id ON products(id);  -- Unnecessary if id is already the primary key

-- Good query: filters on each table let the planner shrink inputs early
SELECT c.name, COUNT(oi.id) AS item_count
FROM customers c
JOIN orders o ON c.id = o.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
WHERE c.status = 'ACTIVE'
  AND o.order_date > '2025-01-01'
GROUP BY c.id, c.name;
```
Join Hints for Complex Queries
```sql
-- Specify the join order
SELECT /*+ LEADING(c o oi) */ c.name, COUNT(*) AS item_count
FROM customers c
JOIN orders o ON c.id = o.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
GROUP BY c.id, c.name;

-- Specify the join algorithm (hash join for o and c)
SELECT /*+ USE_HASH(o c) */ o.order_id, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id;
```
Aggregation Optimization
GROUP BY Optimization
```sql
-- ❌ BAD: GROUP BY over the full result of a large join
SELECT c.name, COUNT(oi.id)
FROM customers c
JOIN orders o ON c.id = o.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
GROUP BY c.id, c.name;  -- Too much data grouped

-- GOOD: Filter before grouping (when only recent orders are needed)
SELECT c.name, COUNT(oi.id)
FROM customers c
JOIN orders o ON c.id = o.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
WHERE o.order_date > '2025-01-01'  -- Filter early
GROUP BY c.id, c.name;
```
Using Approximate Aggregates
```sql
-- Fast approximate count
SELECT approx_count(*) FROM orders;  -- Uses HyperLogLog
```
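HyperLogLog estimates how many distinct values a column holds from a small fixed-size array of registers, instead of remembering every value. Below is a minimal Python sketch of the classic algorithm, assuming SHA-1 from `hashlib` as the hash function and 2^10 registers. It illustrates the technique and is not HeliosDB's `approx_count` implementation.

```python
import hashlib
import math


def hll_estimate(values, p=10):
    """Estimate the number of distinct values using 2**p HyperLogLog registers."""
    m = 1 << p
    registers = [0] * m
    for v in values:
        # 64-bit hash taken from SHA-1 (any well-mixed hash works)
        h = int.from_bytes(hashlib.sha1(str(v).encode()).digest()[:8], "big")
        idx = h >> (64 - p)                      # first p bits choose a register
        rest = h & ((1 << (64 - p)) - 1)         # remaining 64 - p bits
        rank = (64 - p) - rest.bit_length() + 1  # position of the leftmost 1-bit
        registers[idx] = max(registers[idx], rank)
    alpha = 0.7213 / (1 + 1.079 / m)             # bias constant for m >= 128
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:                 # small-range correction (linear counting)
        return m * math.log(m / zeros)
    return raw


# 50,000 rows but only 10,000 distinct customer IDs
customer_ids = [n % 10_000 for n in range(50_000)]
estimate = hll_estimate(customer_ids)
print(round(estimate), len(set(customer_ids)))
```

With 1,024 registers, the typical relative error is about 1.04/√1024, roughly 3%. Memory stays fixed no matter how many rows are counted, and duplicate values never change the result.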
```sql
-- Approximate percentiles
SELECT approx_percentile(amount, 0.95)  -- 95th percentile
FROM orders;
```
Pre-Aggregated Tables
```sql
-- Create a materialized view for common aggregations
CREATE MATERIALIZED VIEW daily_sales_summary AS
SELECT DATE(order_date) AS sale_date,
       COUNT(*) AS order_count,
       SUM(amount) AS total_amount,
       AVG(amount) AS avg_amount
FROM orders
GROUP BY DATE(order_date);

-- Refresh periodically
REFRESH MATERIALIZED VIEW daily_sales_summary;
```
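Engines without materialized views can emulate the same pattern with a summary table that is rebuilt inside one transaction. The sketch below uses Python's `sqlite3` with made-up orders; SQLite has no `CREATE MATERIALIZED VIEW`. The `refresh_daily_sales_summary` helper is hypothetical. The sketch also shows the trade-off: the summary stays stale until the next refresh.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_date TEXT, amount REAL);
    INSERT INTO orders (order_date, amount) VALUES
        ('2025-01-01 09:00', 10.0), ('2025-01-01 17:30', 30.0), ('2025-01-02 12:00', 5.0);
    CREATE TABLE daily_sales_summary (
        sale_date TEXT PRIMARY KEY, order_count INTEGER,
        total_amount REAL, avg_amount REAL);
""")

SUMMARY_SQL = """
    SELECT DATE(order_date), COUNT(*), SUM(amount), AVG(amount)
    FROM orders GROUP BY DATE(order_date)
"""


def refresh_daily_sales_summary(con):
    """Hypothetical stand-in for REFRESH MATERIALIZED VIEW: rebuild atomically."""
    with con:  # one transaction: readers never see a half-built summary
        con.execute("DELETE FROM daily_sales_summary")
        con.execute("INSERT INTO daily_sales_summary " + SUMMARY_SQL)


refresh_daily_sales_summary(con)
con.execute("INSERT INTO orders (order_date, amount) VALUES ('2025-01-02 18:00', 15.0)")
con.commit()
stale = con.execute("SELECT * FROM daily_sales_summary ORDER BY sale_date").fetchall()
refresh_daily_sales_summary(con)
fresh = con.execute("SELECT * FROM daily_sales_summary ORDER BY sale_date").fetchall()
print(stale)  # 2025-01-02 still shows 1 order
print(fresh)  # 2025-01-02 now shows 2 orders
```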
```sql
-- Use in queries
SELECT * FROM daily_sales_summary
WHERE sale_date >= '2025-01-01';
```
Subquery Optimization
Rewrite Correlated Subqueries as Joins or Window Functions
```sql
-- ❌ BAD: Correlated subquery
SELECT o.order_id, o.amount
FROM orders o
WHERE o.amount > (
    SELECT AVG(amount) FROM orders WHERE customer_id = o.customer_id
);
```
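Before adopting a rewrite, check that it returns exactly the rows the original query did. The sketch below uses Python's `sqlite3` as a stand-in engine with made-up orders. It needs SQLite 3.25 or later for window functions, which recent Python builds typically ship with. It runs this correlated query and the window-function rewrite side by side.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")
con.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, 1, 10.0), (2, 1, 30.0), (3, 1, 20.0),  # customer 1: average 20
    (4, 2, 50.0), (5, 2, 50.0),                # customer 2: nothing above average
    (6, 3, 5.0), (7, 3, 15.0),                 # customer 3: average 10
])

correlated = """
    SELECT o.order_id, o.amount FROM orders o
    WHERE o.amount > (SELECT AVG(amount) FROM orders WHERE customer_id = o.customer_id)
    ORDER BY o.order_id"""
windowed = """
    SELECT order_id, amount FROM (
        SELECT order_id, amount,
               AVG(amount) OVER (PARTITION BY customer_id) AS avg_amount
        FROM orders) t
    WHERE amount > avg_amount
    ORDER BY order_id"""

a = con.execute(correlated).fetchall()
b = con.execute(windowed).fetchall()
print(a)       # [(2, 30.0), (7, 15.0)]
print(a == b)  # True
```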
```sql
-- GOOD: Use a window function
SELECT order_id, amount
FROM (
    SELECT order_id, amount,
           AVG(amount) OVER (PARTITION BY customer_id) AS avg_amount
    FROM orders
) t
WHERE amount > avg_amount;
```
IN vs EXISTS
```sql
-- Use EXISTS (often more efficient for a large subquery: it can stop at the first match)
SELECT c.name
FROM customers c
WHERE EXISTS (
    SELECT 1 FROM orders o
    WHERE o.customer_id = c.id
      AND o.order_date > '2025-01-01'
);
```
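For the positive forms, `EXISTS` and `IN` return the same customers, so the choice between them is about the plan. The negated forms differ: once the subquery yields a NULL, `NOT IN` matches nothing, while `NOT EXISTS` still works. The sketch below uses Python's `sqlite3` as a stand-in engine with made-up rows, including one order with no customer.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, order_date TEXT);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Edsger');
    INSERT INTO orders VALUES
        (10, 1, '2025-02-01'), (11, 1, '2025-03-01'),  -- Ada: two recent orders
        (12, 2, '2024-12-01'),                         -- Grace: only an old order
        (13, NULL, '2025-02-15');                      -- recent order with no customer
""")

exists_q = """SELECT c.name FROM customers c WHERE EXISTS (
    SELECT 1 FROM orders o WHERE o.customer_id = c.id AND o.order_date > '2025-01-01')
    ORDER BY c.name"""
in_q = """SELECT c.name FROM customers c WHERE c.id IN (
    SELECT customer_id FROM orders WHERE order_date > '2025-01-01')
    ORDER BY c.name"""
print(con.execute(exists_q).fetchall(), con.execute(in_q).fetchall())  # [('Ada',)] [('Ada',)]

# Negated forms: the NULL customer_id makes every NOT IN test unknown, so no rows match
not_exists = con.execute("""SELECT c.name FROM customers c WHERE NOT EXISTS (
    SELECT 1 FROM orders o WHERE o.customer_id = c.id AND o.order_date > '2025-01-01')
    ORDER BY c.name""").fetchall()
not_in = con.execute("""SELECT c.name FROM customers c WHERE c.id NOT IN (
    SELECT customer_id FROM orders WHERE order_date > '2025-01-01')
    ORDER BY c.name""").fetchall()
print(not_exists)  # [('Edsger',), ('Grace',)]
print(not_in)      # []
```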
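```sql
-- Instead of IN (often less efficient; DISTINCT is redundant inside IN)
-- Many planners turn both forms into the same semi-join, so compare plans with EXPLAIN
SELECT c.name
FROM customers c
WHERE c.id IN (
    SELECT DISTINCT customer_id FROM orders
    WHERE order_date > '2025-01-01'
);
```
Advanced Troubleshooting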
Using EXPLAIN ANALYZE Output
```sql
-- Full analysis with actual execution
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT * FROM orders WHERE customer_id = 123;

/* Key metrics to examine:
   - Actual rows vs estimated rows (bad estimates lead to bad plans)
   - Buffers: shared hits (cached) vs reads (disk)
   - Execution time (first row vs total)
   - Filter ratio (rows removed by filter vs rows scanned)
*/
```
Optimizer Statistics Issues
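```sql
-- Update table statistics
ANALYZE orders;

-- Collect finer-grained statistics for a skewed column, then re-analyze
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 1000;
ANALYZE orders;
```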
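Optimizer statistics are ordinary data that the planner reads. The sketch below uses SQLite through Python's `sqlite3` as a stand-in with made-up rows; HeliosDB stores its statistics differently. In SQLite, `ANALYZE` writes one row per index into the `sqlite_stat1` table. Each `stat` value starts with the index's row count, followed by the average number of rows per distinct value of the leading column.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT)")
con.executemany("INSERT INTO orders (customer_id, status) VALUES (?, ?)",
                [(n % 100, "ACTIVE" if n % 4 == 0 else "CLOSED") for n in range(1000)])
con.execute("CREATE INDEX idx_orders_customer ON orders(customer_id)")
con.execute("CREATE INDEX idx_orders_status ON orders(status)")
con.commit()

con.execute("ANALYZE")  # collect statistics for every table and index
for tbl, idx, stat in con.execute("SELECT tbl, idx, stat FROM sqlite_stat1 ORDER BY idx"):
    print(tbl, idx, stat)
# idx_orders_customer: 1000 rows, 10 rows per customer_id on average
# idx_orders_status:   1000 rows, 500 rows per status value on average
```

A low rows-per-value figure (customer_id) marks a selective index. A high one (status) tells the planner that an equality lookup will touch half the table.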
```sql
-- Check statistics
SELECT schemaname, relname AS tablename,
       n_live_tup AS row_count, n_dead_tup AS dead_rows,
       last_vacuum, last_analyze
FROM pg_stat_user_tables
WHERE relname = 'orders';
```
Optimizer Hints System
```sql
-- Choose the planner for the session (pick one)
SET optimizer = 'neural_planner';     -- Use the neural network planner
SET optimizer = 'quantum_optimizer';  -- Use the quantum-inspired optimizer
SET optimizer = 'cost_based';         -- Traditional cost-based planner

-- For a specific query
SELECT /*+ OPTIMIZER(neural_planner) */ ...
```
Parameter Tuning
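```sql
-- Adjust planner settings (SET lasts for the session;
-- use SET LOCAL inside a transaction to limit it to one query)
SET random_page_cost = 1.0;               -- Assume fast (SSD) random I/O
SET cpu_tuple_cost = 0.03;                -- Weight per-row CPU cost more (PostgreSQL default: 0.01)
SET work_mem = '256MB';                   -- More memory per sort/hash operation
SET max_parallel_workers_per_gather = 4;  -- More parallel workers per query
```
Performance Monitoring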
Real-Time Monitoring
```sql
-- Monitor running queries (longest-running first)
SELECT pid, usename, query, query_start,
       EXTRACT(EPOCH FROM (NOW() - query_start)) AS seconds
FROM pg_stat_activity
WHERE state = 'active'
  AND pid <> pg_backend_pid()
ORDER BY query_start ASC;
```
Query Performance History
```sql
-- Track performance over time
-- (measured_at is assumed to be the history table's snapshot timestamp column)
SELECT query_id, query_text, execution_count,
       total_time_ms, avg_time_ms, measured_at
FROM query_performance_history
WHERE query_id = '...'
ORDER BY measured_at DESC;
```
Automated Optimization Advisor
```sql
-- Get optimization recommendations
SELECT * FROM query_optimization_advisor(
    query='SELECT ...',
    sample_size=1000
);

-- Returns suggestions such as:
-- 1. Add an index on customer_id
-- 2. Rewrite a correlated subquery
-- 3. Increase work_mem for better performance
```
Summary
Effective query optimization requires:
- Measurement - Use EXPLAIN ANALYZE to understand execution
- Analysis - Identify bottlenecks (missing indexes, poor joins)
- Optimization - Apply techniques (indexes, rewrites, hints)
- Verification - Confirm improvements with benchmarks
- Monitoring - Track performance over time
HeliosDB backs these techniques with 15+ optimization crates, including neural network planning and quantum-inspired optimization alongside the traditional cost-based planner.
Related Documentation: