
HeliosDB Query Optimization Troubleshooting Guide


Version: 1.0 Last Updated: 2025-11-30 Status: Complete


Table of Contents

  1. Overview
  2. Diagnosing Slow Queries
  3. Query Execution Plans
  4. Optimization Techniques
  5. Index Strategies
  6. Join Optimization
  7. Aggregation Optimization
  8. Subquery Optimization
  9. Advanced Troubleshooting
  10. Performance Monitoring

Overview

Query optimization is critical for database performance. HeliosDB includes 15+ optimization crates, spanning neural-network, quantum-inspired, and machine-learning-based query planning.

Query Optimization Strategy

1. Measure (Identify slow queries)
2. Analyze (Understand execution plan)
3. Diagnose (Find bottlenecks)
4. Optimize (Apply techniques)
5. Verify (Confirm improvements)
6. Monitor (Track performance)

Diagnosing Slow Queries

Step 1: Identify Slow Queries

-- Query statistics view
SELECT
query_id,
query_text,
execution_count,
total_time_ms,
avg_time_ms,
max_time_ms,
rows_returned
FROM query_statistics
ORDER BY total_time_ms DESC
LIMIT 20;

Step 2: Measure Query Performance

-- Run query and get timing
\timing on
SELECT * FROM orders
WHERE customer_id = 123
AND order_date > '2025-01-01';
-- Result: Execution time: 2543.456 ms

Step 3: Enable Query Profiling

-- Get execution plan and statistics
EXPLAIN ANALYZE
SELECT o.order_id, o.amount, c.customer_name
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.order_date > '2025-01-01';
/*
Output:
Hash Join (cost=12.50..445.80 rows=5000 width=40) (actual time=0.82..12.41 rows=4820 loops=1)
  Hash Cond: (o.customer_id = c.id)
  -> Seq Scan on orders o (cost=0.00..380.00 rows=5000) (actual time=0.11..6.20 rows=4820 loops=1)
       Filter: (order_date > '2025-01-01')
  -> Hash (cost=8.00..8.00 rows=200)
       -> Seq Scan on customers c (cost=0.00..8.00 rows=200) (actual time=0.02..0.31 rows=200 loops=1)
*/

Query Execution Plans

Understanding Plan Output

Operation (cost=startup..total rows=estimated width=bytes)
├─ Seq Scan: Full table scan (slowest for large tables)
├─ Index Scan: Using index (faster when available)
├─ Nested Loop: Simple join (slow for large sets)
├─ Hash Join: Memory-based join (faster for large joins)
├─ Merge Join: Sorted join (good for ordered data)
└─ Aggregate: GROUP BY, SUM, COUNT (may be expensive)

Common Performance Issues in Plans

Issue             | Sign                             | Solution
------------------|----------------------------------|---------------------------
Missing Index     | Seq Scan on a large table        | Create an index
Wrong Join Type   | Nested Loop over large inputs    | Use Hash/Merge Join
Filter After Join | Filter applied after the join    | Push the filter into WHERE
Cartesian Product | Far more rows than expected      | Check JOIN conditions
N+1 Problem       | Repeated scans inside a loop     | Use a single join query
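The first row of the table (missing index → sequential scan) can be observed directly. The sketch below uses Python's built-in sqlite3 module as a stand-in engine, since HeliosDB is not assumed available here; the table and index names are made up for illustration, and sqlite's `EXPLAIN QUERY PLAN` plays the role of the plan output shown above.

```python
# Demonstrate a full scan turning into an index search once an index exists.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(i, i % 100, 10.0) for i in range(1000)])

def plan(sql):
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail); keep the detail text
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM orders WHERE customer_id = 42"
before = plan(query)   # full table scan, e.g. "SCAN orders"
conn.execute("CREATE INDEX idx_orders_customer ON orders(customer_id)")
after = plan(query)    # index search, e.g. "SEARCH orders USING INDEX idx_orders_customer (customer_id=?)"
print(before)
print(after)
```

The exact plan wording varies by SQLite version, but the switch from a scan to an index search is the same signal to look for in HeliosDB plans.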

Optimization Techniques

Technique 1: Create Appropriate Indexes

-- Single column index
CREATE INDEX idx_orders_customer ON orders(customer_id);
-- Multi-column index (composite)
CREATE INDEX idx_orders_customer_date ON orders(customer_id, order_date);
-- Partial index (only certain rows)
CREATE INDEX idx_active_orders ON orders(customer_id)
WHERE status = 'ACTIVE';
-- Covering index (includes all columns needed)
CREATE INDEX idx_orders_covering ON orders(customer_id, order_date)
INCLUDE (amount, status);

Technique 2: Rewrite Queries

-- ❌ BAD: Correlated subquery (N+1 problem)
SELECT c.name, (SELECT COUNT(*) FROM orders WHERE customer_id = c.id)
FROM customers c;
-- ✅ GOOD: Use JOIN with GROUP BY
SELECT c.name, COUNT(o.order_id) as order_count
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.name;
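The two queries above should return the same counts. A quick equivalence check, using Python's built-in sqlite3 as a stand-in engine (the sample data is made up; only the table and column names follow the example):

```python
# Verify that the JOIN + GROUP BY rewrite matches the correlated subquery.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
""")

correlated = conn.execute("""
    SELECT c.name, (SELECT COUNT(*) FROM orders WHERE customer_id = c.id)
    FROM customers c ORDER BY c.name
""").fetchall()

joined = conn.execute("""
    SELECT c.name, COUNT(o.order_id)
    FROM customers c LEFT JOIN orders o ON c.id = o.customer_id
    GROUP BY c.name ORDER BY c.name
""").fetchall()

assert correlated == joined
print(joined)  # [('Ada', 2), ('Grace', 1)]
```

One caveat: grouping by `c.name` merges customers who share a name; grouping by `c.id` (and selecting the name alongside it) is the safer rewrite in production.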

Technique 3: Use Query Hints

-- Suggest index to use
SELECT /*+ INDEX(orders idx_orders_customer_date) */
order_id, amount
FROM orders
WHERE customer_id = 123
AND order_date > '2025-01-01';
-- Suggest join order
SELECT /*+ LEADING(o c) */
o.order_id, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id;
-- Parallel execution hint
SELECT /*+ PARALLEL(4) */
SUM(amount)
FROM orders;

Technique 4: Limit Result Sets

-- ❌ BAD: Retrieve all rows then filter in application
SELECT * FROM orders; -- Returns 1 million rows
-- ✅ GOOD: Filter in the database
SELECT order_id, amount
FROM orders
WHERE order_date > '2025-01-01'
LIMIT 100;

Technique 5: Use Appropriate Data Types

-- ❌ BAD: Store numbers as strings
CREATE TABLE orders (order_id VARCHAR(10), amount VARCHAR(20));
-- ✅ GOOD: Use numeric types
CREATE TABLE orders (order_id INTEGER, amount DECIMAL(10, 2));
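Numbers stored as strings compare lexicographically, which silently breaks ordering and range filters. A minimal demonstration, again using Python's built-in sqlite3 as a stand-in (the `bad_orders` table is made up):

```python
# Show that text-typed "numbers" sort lexicographically, not numerically.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bad_orders (amount VARCHAR(20))")  # text affinity
conn.executemany("INSERT INTO bad_orders VALUES (?)", [("9",), ("10",), ("100",)])

# Lexicographic ordering: '9' sorts after both '10' and '100'
top = conn.execute(
    "SELECT amount FROM bad_orders ORDER BY amount DESC LIMIT 1").fetchone()[0]
print(top)  # '9' -- not the numeric maximum, 100
```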

Index Strategies

When to Create Indexes

-- Index on:
-- 1. Column in WHERE clause
CREATE INDEX idx_status ON orders(status);
-- 2. Column in JOIN condition
CREATE INDEX idx_customer_id ON orders(customer_id);
-- 3. Column in ORDER BY
CREATE INDEX idx_created_date ON orders(created_date DESC);
-- 4. Low cardinality + frequent filters (partial index)
CREATE INDEX idx_active ON orders(status) WHERE status = 'ACTIVE';

When NOT to Create Indexes

-- ❌ Don't index:
-- 1. Columns that are rarely queried
CREATE INDEX idx_uuid ON orders(unique_id); -- Maintained on every write, seldom read
-- 2. Very small tables
CREATE INDEX idx_config ON config(key); -- Only 10 rows
-- 3. Frequently updated columns
CREATE INDEX idx_modified ON orders(last_modified); -- Updates change index
-- 4. Boolean columns (use partial index instead)
CREATE INDEX idx_is_active ON orders(is_active)
WHERE is_active = true; -- Better than full index

Composite Index Ordering

-- Optimal: Equality columns first, then range columns
-- ✅ GOOD: Supports WHERE customer_id = ? AND order_date > ?
CREATE INDEX idx_good ON orders(customer_id, order_date);
-- ❌ SUBOPTIMAL: Wrong order reduces effectiveness
CREATE INDEX idx_bad ON orders(order_date, customer_id);
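The effect of column order can be seen in the plan output. The sketch below uses Python's built-in sqlite3 as a stand-in engine; `INDEXED BY` pins each index in turn (a hypothetical forcing mechanism for this demo, not a HeliosDB hint), and the plan detail shows which index columns the planner could actually use.

```python
# Compare how much of each composite index the planner can exploit.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, order_date TEXT)")
conn.execute("CREATE INDEX idx_good ON orders(customer_id, order_date)")
conn.execute("CREATE INDEX idx_bad ON orders(order_date, customer_id)")

def usable_conditions(index_name):
    # The EXPLAIN QUERY PLAN detail column lists the index conditions used
    row = conn.execute(
        f"EXPLAIN QUERY PLAN SELECT * FROM orders INDEXED BY {index_name} "
        "WHERE customer_id = 123 AND order_date > '2025-01-01'").fetchone()
    return row[3]

good = usable_conditions("idx_good")  # equality + range: (customer_id=? AND order_date>?)
bad = usable_conditions("idx_bad")    # only the leading range: (order_date>?)
print(good)
print(bad)
```

With equality first, both predicates become index conditions; with the range column first, the equality predicate degrades to a post-scan filter.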

Maintenance

-- Check index usage
SELECT
schemaname,
tablename,
indexname,
idx_scan,
idx_tup_read,
idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;
-- Remove unused indexes
DROP INDEX idx_unused;
-- Rebuild fragmented index
REINDEX INDEX idx_orders_customer;
-- Update statistics
ANALYZE orders;

Join Optimization

Join Type Selection

-- INNER JOIN: Returns only matching rows (usually the cheapest)
SELECT o.order_id, c.name
FROM orders o
INNER JOIN customers c ON o.customer_id = c.id;
-- LEFT JOIN: Keeps all left table rows
SELECT o.order_id, c.name
FROM orders o
LEFT JOIN customers c ON o.customer_id = c.id;
-- Natural join (avoid in complex queries)
SELECT *
FROM orders NATURAL JOIN customers; -- Uses common columns

Multi-Table Joins

-- Optimize with proper index placement
CREATE INDEX idx_orders_customer ON orders(customer_id);
CREATE INDEX idx_order_items_order ON order_items(order_id);
CREATE INDEX idx_products_id ON products(id);
-- Good query: Filters early
SELECT c.name, COUNT(oi.id) as item_count
FROM customers c
JOIN orders o ON c.id = o.customer_id
JOIN order_items oi ON o.id = oi.order_id
WHERE c.status = 'ACTIVE'
AND o.order_date > '2025-01-01'
GROUP BY c.name;

Join Hints for Complex Queries

-- Specify join order
SELECT /*+ LEADING(c o oi) */
c.name, COUNT(*) as count
FROM customers c
JOIN orders o ON c.id = o.customer_id
JOIN order_items oi ON o.id = oi.order_id;
-- Specify join algorithm
SELECT /*+ USE_HASH(o c) */ -- Hash join for o and c
o.order_id, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id;

Aggregation Optimization

GROUP BY Optimization

-- ❌ BAD: GROUP BY after large join
SELECT c.name, COUNT(oi.id)
FROM customers c
JOIN orders o ON c.id = o.customer_id
JOIN order_items oi ON o.id = oi.order_id
GROUP BY c.name; -- Too much data grouped
-- ✅ GOOD: Filter before grouping
SELECT c.name, COUNT(oi.id)
FROM customers c
JOIN orders o ON c.id = o.customer_id
JOIN order_items oi ON o.id = oi.order_id
WHERE o.order_date > '2025-01-01' -- Filter early
GROUP BY c.name;

Using Approximate Aggregates

-- Fast approximate distinct count
SELECT approx_count_distinct(customer_id) FROM orders; -- Uses HyperLogLog
-- Approximate percentiles
SELECT approx_percentile(amount, 0.95) -- 95th percentile
FROM orders;

Pre-Aggregated Tables

-- Create materialized view for common aggregations
CREATE MATERIALIZED VIEW daily_sales_summary AS
SELECT
DATE(order_date) as sale_date,
COUNT(*) as order_count,
SUM(amount) as total_amount,
AVG(amount) as avg_amount
FROM orders
GROUP BY DATE(order_date);
-- Refresh periodically
REFRESH MATERIALIZED VIEW daily_sales_summary;
-- Use in queries
SELECT * FROM daily_sales_summary
WHERE sale_date >= '2025-01-01';
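The same pattern can be approximated in engines without native materialized views by rebuilding a summary table on demand. A sqlite3 sketch (Python stdlib), with the summary schema matching the view above and the sample data made up:

```python
# Simulate CREATE MATERIALIZED VIEW / REFRESH with a rebuilt summary table.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_date TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("2025-01-01", 10.0), ("2025-01-01", 30.0), ("2025-01-02", 5.0)])

def refresh_summary():
    # Equivalent of REFRESH MATERIALIZED VIEW: drop and rebuild the summary
    conn.executescript("""
        DROP TABLE IF EXISTS daily_sales_summary;
        CREATE TABLE daily_sales_summary AS
        SELECT order_date  AS sale_date,
               COUNT(*)    AS order_count,
               SUM(amount) AS total_amount,
               AVG(amount) AS avg_amount
        FROM orders GROUP BY order_date;
    """)

refresh_summary()
rows = conn.execute(
    "SELECT sale_date, order_count, total_amount FROM daily_sales_summary "
    "ORDER BY sale_date").fetchall()
print(rows)  # [('2025-01-01', 2, 40.0), ('2025-01-02', 1, 5.0)]
```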

Subquery Optimization

Rewrite Subqueries as Joins

-- ❌ BAD: Correlated subquery
SELECT o.order_id, o.amount
FROM orders o
WHERE o.amount > (
SELECT AVG(amount)
FROM orders
WHERE customer_id = o.customer_id
);
-- ✅ GOOD: Use window function
SELECT order_id, amount
FROM (
SELECT
order_id,
amount,
AVG(amount) OVER (PARTITION BY customer_id) as avg_amount
FROM orders
) t
WHERE amount > avg_amount;
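An equivalence check for this rewrite, run on Python's built-in sqlite3 (which supports window functions since SQLite 3.25) as a stand-in engine; the sample rows are made up:

```python
# Verify the window-function form matches the correlated subquery.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, 1, 100.0), (2, 1, 50.0), (3, 2, 10.0), (4, 2, 30.0)])

correlated = conn.execute("""
    SELECT o.order_id, o.amount FROM orders o
    WHERE o.amount > (SELECT AVG(amount) FROM orders
                      WHERE customer_id = o.customer_id)
    ORDER BY o.order_id
""").fetchall()

windowed = conn.execute("""
    SELECT order_id, amount FROM (
        SELECT order_id, amount,
               AVG(amount) OVER (PARTITION BY customer_id) AS avg_amount
        FROM orders) t
    WHERE amount > avg_amount ORDER BY order_id
""").fetchall()

assert correlated == windowed
print(windowed)  # [(1, 100.0), (4, 30.0)] -- above each customer's average
```

The windowed form computes each per-customer average once, instead of re-running the subquery for every outer row.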

IN vs EXISTS

-- Prefer EXISTS (often more efficient when the subquery is large)
SELECT c.name
FROM customers c
WHERE EXISTS (
SELECT 1 FROM orders o
WHERE o.customer_id = c.id
AND o.order_date > '2025-01-01'
);
-- Instead of IN (often less efficient)
SELECT c.name
FROM customers c
WHERE c.id IN (
SELECT DISTINCT customer_id FROM orders
WHERE order_date > '2025-01-01'
);
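Both forms return the same customers, so the choice is purely about cost. A quick check with Python's built-in sqlite3 as a stand-in engine (sample data made up):

```python
# Confirm EXISTS and IN return identical rows for this query shape.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (customer_id INTEGER, order_date TEXT);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Edsger');
    INSERT INTO orders VALUES (1, '2025-02-01'), (3, '2024-12-31');
""")

exists_rows = conn.execute("""
    SELECT c.name FROM customers c
    WHERE EXISTS (SELECT 1 FROM orders o
                  WHERE o.customer_id = c.id AND o.order_date > '2025-01-01')
""").fetchall()

in_rows = conn.execute("""
    SELECT c.name FROM customers c
    WHERE c.id IN (SELECT DISTINCT customer_id FROM orders
                   WHERE order_date > '2025-01-01')
""").fetchall()

assert exists_rows == in_rows
print(exists_rows)  # [('Ada',)] -- only Ada ordered after 2025-01-01
```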

Advanced Troubleshooting

Using EXPLAIN ANALYZE Output

-- Full analysis with actual execution
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT * FROM orders WHERE customer_id = 123;
/*
Key metrics to examine:
- Actual Rows vs Estimated Rows (bad estimates = bad planning)
- Buffers: Shared hits (cached) vs reads (disk)
- Execution time (first row vs total)
- Filter ratio (rows filtered vs total)
*/

Optimizer Statistics Issues

-- Update table statistics
ANALYZE orders;
-- Rebuild statistics after bulk changes
VACUUM ANALYZE orders;
-- Check statistics
SELECT
schemaname,
tablename,
n_live_tup as row_count,
n_dead_tup as dead_rows,
last_vacuum,
last_analyze
FROM pg_stat_user_tables
WHERE tablename = 'orders';

Optimizer Hints System

-- Suggest specific optimizer settings
SET optimizer = 'neural_planner'; -- Use neural network planner
SET optimizer = 'quantum_optimizer'; -- Use quantum-inspired optimizer
SET optimizer = 'cost_based'; -- Traditional cost-based
-- For specific query
SELECT /*+ OPTIMIZER(neural_planner) */ ...

Parameter Tuning

-- Adjust optimizer settings for specific queries
SET random_page_cost = 1.0; -- Assume fast disk I/O
SET cpu_tuple_cost = 0.01; -- Per-tuple CPU cost weight
SET work_mem = '256MB'; -- More memory for sorts/hashes
SET max_parallel_workers = 4; -- More parallel workers

Performance Monitoring

Real-Time Monitoring

-- Monitor running queries
SELECT
pid,
usename,
query,
query_start,
EXTRACT(EPOCH FROM (NOW() - query_start)) as seconds
FROM pg_stat_activity
WHERE query NOT LIKE '%pg_stat_activity%'
ORDER BY query_start DESC;

Query Performance History

-- Track performance over time
SELECT
query_id,
query_text,
execution_count,
total_time_ms,
avg_time_ms,
CURRENT_TIMESTAMP as measured_at
FROM query_performance_history
WHERE query_id = '...'
ORDER BY measured_at DESC;

Automated Optimization Advisor

-- Get optimization recommendations
SELECT * FROM query_optimization_advisor(
query='SELECT ...',
sample_size=1000
);
-- Returns suggestions like:
-- 1. Add index on customer_id
-- 2. Rewrite correlated subquery
-- 3. Increase work_mem for better performance

Summary

Effective query optimization requires:

  1. Measurement - Use EXPLAIN ANALYZE to understand execution
  2. Analysis - Identify bottlenecks (missing indexes, poor joins)
  3. Optimization - Apply techniques (indexes, rewrites, hints)
  4. Verification - Confirm improvements with benchmarks
  5. Monitoring - Track performance over time

HeliosDB provides 15+ optimization techniques including neural network planning and quantum-inspired optimization for maximum performance.


Related Documentation: