HeliosDB Caching Troubleshooting Guide
HeliosDB Caching Troubleshooting Guide
Version: 7.0 Last Updated: 2025-01-04
Diagnostic Overview
This guide helps diagnose and resolve common caching issues in HeliosDB. For each problem, we provide symptoms, diagnostic steps, root causes, and solutions.
Quick Diagnostic Commands
-- Overall cache health summarySELECT * FROM cache_diagnostic_report;
-- Cache statisticsSELECT cache_hits, cache_misses, ROUND(hit_rate * 100, 2) as hit_rate_pct, eviction_count, invalidation_count, memory_used_mb, memory_max_mbFROM cache_statistics;
-- Recent cache eventsSELECT * FROM cache_event_logORDER BY event_time DESCLIMIT 50;
-- Cache entry detailsSELECT query_hash, table_names, entry_size_kb, access_count, last_accessed, ttl_remaining_secondsFROM cache_entriesORDER BY last_accessed DESCLIMIT 20;Problem: Low Cache Hit Rate
Symptoms
- Hit rate below 70-80%
- High query latency despite caching enabled
- Database CPU utilization higher than expected
Diagnostic Steps
-- Check current hit rate trendSELECT time_bucket('5 minutes', recorded_at) as period, SUM(hits) as total_hits, SUM(misses) as total_misses, ROUND(SUM(hits)::numeric / NULLIF(SUM(hits) + SUM(misses), 0) * 100, 2) as hit_rate_pctFROM cache_metricsWHERE recorded_at > NOW() - INTERVAL '1 hour'GROUP BY periodORDER BY period DESC;
-- Check cache size vs working setSELECT (SELECT SUM(entry_size_bytes) FROM cache_entries) / 1024 / 1024 as current_cache_mb, (SELECT setting_value::int FROM database_settings WHERE setting_name = 'cache_max_size_mb') as max_cache_mb, (SELECT COUNT(*) FROM cache_entries) as entry_count, (SELECT setting_value::int FROM database_settings WHERE setting_name = 'cache_max_entries') as max_entries;
-- Identify frequently missed queriesSELECT query_hash, query_text_preview, miss_count, last_miss_timeFROM cache_miss_logGROUP BY query_hash, query_text_previewORDER BY miss_count DESCLIMIT 20;Common Causes and Solutions
Cause 1: Cache Too Small
-- Check if cache is fullSELECT memory_used_mb, memory_max_mb, ROUND(memory_used_mb / memory_max_mb * 100, 2) as utilization_pctFROM cache_statistics;
-- Solution: Increase cache sizeALTER DATABASE SET CACHE_PARAMS ( max_size_mb = 4096 -- Double or triple current size);Cause 2: Working Set Exceeds Cache
-- Analyze query access patternsSELECT COUNT(DISTINCT query_hash) as unique_queries, SUM(avg_result_size_bytes) / 1024 / 1024 as estimated_working_set_mbFROM query_access_patternsWHERE last_accessed > NOW() - INTERVAL '1 hour';
-- Solution: Either increase cache or focus on hot queriesALTER DATABASE SET CACHE_PARAMS ( max_size_mb = 8192, min_access_count_before_cache = 3 -- Only cache after 3 accesses);Cause 3: Query Variations Preventing Hits
-- Check for similar but different queriesSELECT query_template, COUNT(*) as variation_count, SUM(cache_misses) as total_missesFROM query_normalization_analysisGROUP BY query_templateORDER BY variation_count DESCLIMIT 10;
-- Solution: Enable query normalizationALTER DATABASE SET CACHE_PARAMS ( statement_normalization = true);
-- Or use prepared statements consistentlyCause 4: Short TTL
-- Check TTL expirationsSELECT COUNT(*) as expired_count, AVG(original_ttl_seconds) as avg_ttlFROM cache_eviction_logWHERE eviction_reason = 'TTL_EXPIRED' AND eviction_time > NOW() - INTERVAL '1 hour';
-- Solution: Increase TTLALTER DATABASE SET CACHE_PARAMS ( ttl_seconds = 3600 -- Increase from default);Cause 5: High Invalidation Rate
-- Check invalidation frequencySELECT table_name, COUNT(*) as invalidation_count, SUM(affected_entries) as total_affectedFROM cache_invalidation_logWHERE invalidation_time > NOW() - INTERVAL '1 hour'GROUP BY table_nameORDER BY invalidation_count DESC;
-- Solution: For write-heavy tables, consider disabling cacheALTER TABLE high_write_table DISABLE CACHING;Problem: High Memory Usage
Symptoms
- Cache memory utilization > 95%
- Frequent evictions
- System memory pressure warnings
Diagnostic Steps
-- Check memory distributionSELECT cache_type, entries, size_bytes / 1024 / 1024 as size_mb, avg_entry_size_bytes / 1024 as avg_entry_kbFROM cache_memory_breakdown;
-- Find large cache entriesSELECT query_hash, query_text_preview, entry_size_bytes / 1024 / 1024 as size_mb, access_count, last_accessedFROM cache_entriesORDER BY entry_size_bytes DESCLIMIT 20;
-- Check for memory leaksSELECT date_trunc('hour', recorded_at) as hour, MAX(memory_used_mb) as max_memory_mb, AVG(entry_count) as avg_entriesFROM cache_metricsWHERE recorded_at > NOW() - INTERVAL '24 hours'GROUP BY hourORDER BY hour;Solutions
Solution 1: Reduce Cache Size
ALTER DATABASE SET CACHE_PARAMS ( max_size_mb = 2048, -- Reduce total size per_query_max_mb = 128 -- Limit per-query size);Solution 2: Enable Compression
ALTER DATABASE SET CACHE_PARAMS ( compression_enabled = true, compression_algorithm = 'LZ4', -- Fast compression compression_min_size_bytes = 4096);Solution 3: Exclude Large Results
ALTER DATABASE SET CACHE_PARAMS ( max_result_size_bytes = 10485760 -- Don't cache results > 10 MB);Solution 4: More Aggressive Eviction
ALTER DATABASE SET CACHE_PARAMS ( memory_pressure_low_threshold = 0.6, -- Start evicting earlier memory_pressure_high_threshold = 0.8);Problem: Stale Data (Consistency Issues)
Symptoms
- Users seeing outdated information
- Data discrepancies between cache and database
- Inconsistent query results
Diagnostic Steps
-- Check invalidation statusSELECT invalidation_time, table_name, operation_type, affected_queries, propagation_statusFROM cache_invalidation_logWHERE invalidation_time > NOW() - INTERVAL '1 hour'ORDER BY invalidation_time DESC;
-- Verify cache freshnessSELECT query_hash, cached_at, last_validated, age_seconds, ttl_remaining_secondsFROM cache_entriesWHERE age_seconds > 300 -- Older than 5 minutesORDER BY age_seconds DESCLIMIT 20;
-- Check for invalidation failuresSELECT failure_time, table_name, failure_reason, retry_countFROM cache_invalidation_failuresWHERE failure_time > NOW() - INTERVAL '24 hours';Solutions
Solution 1: Enable Stronger Consistency
-- For critical tablesALTER TABLE financial_data SET CACHE_PARAMS ( consistency_level = 'STRONG');
-- For user-facing dataALTER TABLE user_profiles SET CACHE_PARAMS ( consistency_level = 'READ_YOUR_WRITES');Solution 2: Reduce TTL
ALTER TABLE frequently_updated SET CACHE_PARAMS ( ttl_seconds = 30 -- Very short TTL);Solution 3: Force Immediate Invalidation
-- Clear potentially stale cacheINVALIDATE CACHE WHERE table_name = 'problematic_table';
-- Enable synchronous invalidationALTER DATABASE SET CACHE_PARAMS ( invalidation_mode = 'SYNCHRONOUS');Solution 4: Verify Invalidation Pipeline
-- Check invalidation is enabledSELECT setting_valueFROM database_settingsWHERE setting_name = 'semantic_invalidation';
-- Should be 'true'ALTER DATABASE SET CACHE_PARAMS ( semantic_invalidation = true, invalidation_logging = true);Problem: Cache Invalidation Issues
Symptoms
- Cache entries not being invalidated after updates
- Partial invalidation (some entries stale, others fresh)
- Invalidation delays
Diagnostic Steps
-- Check invalidation pipeline healthSELECT * FROM cache_invalidation_health;
-- Monitor invalidation latencySELECT table_name, AVG(propagation_time_ms) as avg_latency_ms, MAX(propagation_time_ms) as max_latency_ms, COUNT(*) as invalidation_countFROM cache_invalidation_logWHERE invalidation_time > NOW() - INTERVAL '1 hour'GROUP BY table_name;
-- Check for backed up invalidationsSELECT queue_depth, oldest_message_age_ms, processing_rate_per_secondFROM cache_invalidation_queue_stats;Solutions
Solution 1: Clear Invalidation Backlog
-- Force process pending invalidationsFLUSH CACHE INVALIDATION QUEUE;
-- Increase processing capacityALTER DATABASE SET CACHE_PARAMS ( invalidation_batch_size = 500, invalidation_workers = 4);Solution 2: Fix Table Tracking
-- Verify tables are tracked for invalidationSELECT table_name, invalidation_enabledFROM table_cache_settings;
-- Enable for missing tablesALTER TABLE missing_table SET CACHE_PARAMS ( invalidation_enabled = true);Solution 3: Rebuild Invalidation Index
-- Rebuild query-to-table mappingREBUILD CACHE INVALIDATION INDEX;Problem: Statement Cache Issues
Symptoms
- Low statement cache hit rate
- Prepared statements not being reused
- Memory growth in statement cache
Diagnostic Steps
// Check statement cache statslet stats = pool.statement_cache_stats();println!("Cache size: {}", stats.current_size);println!("Hit rate: {:.2}%", stats.hit_rate * 100.0);println!("Memory: {} MB", stats.total_memory_bytes / 1024 / 1024);println!("Evictions: {}", stats.total_evictions);-- From SQLSELECT statement_count, hit_rate, memory_used_mb, eviction_countFROM statement_cache_stats;Solutions
Solution 1: Use Consistent SQL Formatting
// BAD: Dynamic SQL with variationslet sql = format!("SELECT * FROM users WHERE id = {}", id);let stmt = pool.prepare(&sql)?; // Different stmt for each id!
// GOOD: Parameterized querieslet stmt = pool.prepare("SELECT * FROM users WHERE id = $1")?;let params = Parameters::Positional(vec![ParameterValue::Int64(id)]);pool.execute(&stmt, params).await?;Solution 2: Prepare Once, Execute Many
// BAD: Prepare inside loopfor id in ids { let stmt = pool.prepare("SELECT * FROM users WHERE id = $1")?; pool.execute(&stmt, params).await?;}
// GOOD: Prepare once outside looplet stmt = pool.prepare("SELECT * FROM users WHERE id = $1")?;for id in ids { let params = Parameters::Positional(vec![ParameterValue::Int64(id)]); pool.execute(&stmt, params).await?;}Solution 3: Increase Statement Cache Size
ALTER DATABASE SET CACHE_PARAMS ( statement_cache_size = 20000, statement_cache_memory_mb = 200);Problem: Distributed Cache Issues
Symptoms
- Inconsistencies between nodes
- High invalidation latency
- Network partition effects
Diagnostic Steps
-- Check cluster healthSELECT node_id, is_alive, is_reachable, last_heartbeat, missed_heartbeatsFROM cache_cluster_status;
-- Check sync statusSELECT source_node, target_node, pending_messages, avg_sync_latency_ms, last_sync_timeFROM cache_sync_status;
-- Check for partition eventsSELECT event_time, event_type, affected_nodes, resolution_statusFROM cache_partition_logORDER BY event_time DESCLIMIT 20;Solutions
Solution 1: Verify Network Connectivity
-- Test inter-node connectivitySELECT * FROM cache_cluster_ping_test;
-- Check for blocked ports-- Ensure ports 5432 (data) and 5433 (cache sync) are openSolution 2: Adjust Failure Detection
ALTER DATABASE SET CACHE_PARAMS ( heartbeat_interval_ms = 500, -- More frequent heartbeats failure_detection_timeout_ms = 5000 -- Faster detection);Solution 3: Force Resync
-- Force full cache resync with clusterRESYNC CACHE WITH CLUSTER;
-- Reconcile version vectorsRECONCILE CACHE VERSION VECTORS;Problem: Performance Degradation
Symptoms
- Cache operations slow
- High cache access latency
- Lock contention warnings
Diagnostic Steps
-- Check cache operation latencySELECT operation_type, AVG(latency_us) as avg_latency_us, PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY latency_us) as p99_latency_us, COUNT(*) as operation_countFROM cache_operation_logWHERE operation_time > NOW() - INTERVAL '10 minutes'GROUP BY operation_type;
-- Check for lock contentionSELECT lock_type, contention_count, avg_wait_time_us, max_wait_time_usFROM cache_lock_stats;Solutions
Solution 1: Increase Concurrency
ALTER DATABASE SET CACHE_PARAMS ( read_concurrency = 128, write_concurrency = 16, lock_stripe_count = 512);Solution 2: Reduce Cache Entry Size
ALTER DATABASE SET CACHE_PARAMS ( compression_enabled = true, max_result_size_bytes = 1048576 -- 1 MB max);Solution 3: Optimize Eviction
ALTER DATABASE SET CACHE_PARAMS ( eviction_batch_size = 50, -- Smaller batches eviction_batch_delay_ms = 5 -- More frequent, smaller evictions);Performance Debugging Tools
Cache Profiler
-- Enable detailed profilingALTER DATABASE SET CACHE_PARAMS ( profiling_enabled = true, profile_sample_rate = 0.01 -- Sample 1% of operations);
-- View profiling resultsSELECT * FROM cache_profile_report;
-- Disable when doneALTER DATABASE SET CACHE_PARAMS ( profiling_enabled = false);Query-Specific Analysis
-- Analyze specific query's cache behaviorSELECT query_hash, cache_hits, cache_misses, avg_execution_time_ms, avg_cached_time_ms, invalidation_count, avg_result_size_kbFROM query_cache_analysisWHERE query_text LIKE '%products%'ORDER BY cache_misses DESC;Real-Time Monitoring
-- Live cache event streamSELECT * FROM cache_event_stream;
-- Cache operation traceALTER DATABASE SET CACHE_PARAMS ( trace_enabled = true, trace_query_pattern = '%products%' -- Only trace matching queries);Emergency Procedures
Clear All Caches
-- Nuclear option: clear everythingFLUSH ALL CACHES;
-- Verify cache is emptySELECT memory_used_mb, entry_count FROM cache_statistics;Disable Caching Temporarily
-- Disable all cachingALTER DATABASE DISABLE ALL_CACHES;
-- Or disable for specific tableALTER TABLE problematic_table DISABLE CACHING;Reset Cache Configuration
-- Reset to defaultsALTER DATABASE RESET CACHE_PARAMS;
-- Restart cache subsystemRESTART CACHE SERVICE;Recovery from Corrupted Cache
-- Validate cache integritySELECT * FROM cache_integrity_check;
-- If corruption detectedREBUILD CACHE;
-- Verify after rebuildSELECT * FROM cache_diagnostic_report;Common Error Messages
”Cache memory exhausted"
-- Increase cache size or enable more aggressive evictionALTER DATABASE SET CACHE_PARAMS ( max_size_mb = 4096, memory_pressure_high_threshold = 0.8);"Statement cache full"
-- Increase statement cache or check for SQL variationsALTER DATABASE SET CACHE_PARAMS ( statement_cache_size = 20000);"Invalidation queue overflow"
-- Process backlog and increase capacityFLUSH CACHE INVALIDATION QUEUE;ALTER DATABASE SET CACHE_PARAMS ( invalidation_queue_size = 100000, invalidation_workers = 8);"Cache sync timeout"
-- Check network and adjust timeoutsALTER DATABASE SET CACHE_PARAMS ( sync_timeout_ms = 10000, consistency_level = 'EVENTUAL' -- Temporarily reduce consistency);"Version vector conflict”
-- Reconcile distributed stateRECONCILE CACHE VERSION VECTORS;Getting Help
If issues persist after trying these solutions:
-
Collect Diagnostics:
SELECT * FROM cache_full_diagnostic_report; -
Export Cache State:
EXPORT CACHE DIAGNOSTICS TO '/tmp/cache_debug.json'; -
Contact Support with:
- Diagnostic report output
- Error messages from logs
- Configuration settings
- Workload description
Related Documentation
- Caching Overview - Architecture and concepts
- Quick Start Guide - Get started with caching
- Configuration Guide - Detailed configuration options