Skip to content

HeliosDB Caching Troubleshooting Guide

HeliosDB Caching Troubleshooting Guide

Version: 7.0 Last Updated: 2025-01-04


Diagnostic Overview

This guide helps diagnose and resolve common caching issues in HeliosDB. For each problem, we provide symptoms, diagnostic steps, root causes, and solutions.


Quick Diagnostic Commands

-- Overall cache health summary
SELECT * FROM cache_diagnostic_report;
-- Cache statistics
SELECT
cache_hits,
cache_misses,
ROUND(hit_rate * 100, 2) as hit_rate_pct,
eviction_count,
invalidation_count,
memory_used_mb,
memory_max_mb
FROM cache_statistics;
-- Recent cache events
SELECT * FROM cache_event_log
ORDER BY event_time DESC
LIMIT 50;
-- Cache entry details
SELECT
query_hash,
table_names,
entry_size_kb,
access_count,
last_accessed,
ttl_remaining_seconds
FROM cache_entries
ORDER BY last_accessed DESC
LIMIT 20;

Problem: Low Cache Hit Rate

Symptoms

  • Hit rate below 70-80%
  • High query latency despite caching enabled
  • Database CPU utilization higher than expected

Diagnostic Steps

-- Check current hit rate trend
SELECT
time_bucket('5 minutes', recorded_at) as period,
SUM(hits) as total_hits,
SUM(misses) as total_misses,
ROUND(SUM(hits)::numeric / NULLIF(SUM(hits) + SUM(misses), 0) * 100, 2) as hit_rate_pct
FROM cache_metrics
WHERE recorded_at > NOW() - INTERVAL '1 hour'
GROUP BY period
ORDER BY period DESC;
-- Check cache size vs working set
SELECT
(SELECT SUM(entry_size_bytes) FROM cache_entries) / 1024 / 1024 as current_cache_mb,
(SELECT setting_value::int FROM database_settings WHERE setting_name = 'cache_max_size_mb') as max_cache_mb,
(SELECT COUNT(*) FROM cache_entries) as entry_count,
(SELECT setting_value::int FROM database_settings WHERE setting_name = 'cache_max_entries') as max_entries;
-- Identify frequently missed queries
SELECT
query_hash,
query_text_preview,
miss_count,
last_miss_time
FROM cache_miss_log
GROUP BY query_hash, query_text_preview
ORDER BY miss_count DESC
LIMIT 20;

Common Causes and Solutions

Cause 1: Cache Too Small

-- Check if cache is full
SELECT
memory_used_mb,
memory_max_mb,
ROUND(memory_used_mb / memory_max_mb * 100, 2) as utilization_pct
FROM cache_statistics;
-- Solution: Increase cache size
ALTER DATABASE SET CACHE_PARAMS (
max_size_mb = 4096 -- Double or triple current size
);

Cause 2: Working Set Exceeds Cache

-- Analyze query access patterns
SELECT
COUNT(DISTINCT query_hash) as unique_queries,
SUM(avg_result_size_bytes) / 1024 / 1024 as estimated_working_set_mb
FROM query_access_patterns
WHERE last_accessed > NOW() - INTERVAL '1 hour';
-- Solution: Either increase cache or focus on hot queries
ALTER DATABASE SET CACHE_PARAMS (
max_size_mb = 8192,
min_access_count_before_cache = 3 -- Only cache after 3 accesses
);

Cause 3: Query Variations Preventing Hits

-- Check for similar but different queries
SELECT
query_template,
COUNT(*) as variation_count,
SUM(cache_misses) as total_misses
FROM query_normalization_analysis
GROUP BY query_template
ORDER BY variation_count DESC
LIMIT 10;
-- Solution: Enable query normalization
ALTER DATABASE SET CACHE_PARAMS (
statement_normalization = true
);
-- Or use prepared statements consistently

Cause 4: Short TTL

-- Check TTL expirations
SELECT
COUNT(*) as expired_count,
AVG(original_ttl_seconds) as avg_ttl
FROM cache_eviction_log
WHERE eviction_reason = 'TTL_EXPIRED'
AND eviction_time > NOW() - INTERVAL '1 hour';
-- Solution: Increase TTL
ALTER DATABASE SET CACHE_PARAMS (
ttl_seconds = 3600 -- Increase from default
);

Cause 5: High Invalidation Rate

-- Check invalidation frequency
SELECT
table_name,
COUNT(*) as invalidation_count,
SUM(affected_entries) as total_affected
FROM cache_invalidation_log
WHERE invalidation_time > NOW() - INTERVAL '1 hour'
GROUP BY table_name
ORDER BY invalidation_count DESC;
-- Solution: For write-heavy tables, consider disabling cache
ALTER TABLE high_write_table DISABLE CACHING;

Problem: High Memory Usage

Symptoms

  • Cache memory utilization > 95%
  • Frequent evictions
  • System memory pressure warnings

Diagnostic Steps

-- Check memory distribution
SELECT
cache_type,
entries,
size_bytes / 1024 / 1024 as size_mb,
avg_entry_size_bytes / 1024 as avg_entry_kb
FROM cache_memory_breakdown;
-- Find large cache entries
SELECT
query_hash,
query_text_preview,
entry_size_bytes / 1024 / 1024 as size_mb,
access_count,
last_accessed
FROM cache_entries
ORDER BY entry_size_bytes DESC
LIMIT 20;
-- Check for memory leaks
SELECT
date_trunc('hour', recorded_at) as hour,
MAX(memory_used_mb) as max_memory_mb,
AVG(entry_count) as avg_entries
FROM cache_metrics
WHERE recorded_at > NOW() - INTERVAL '24 hours'
GROUP BY hour
ORDER BY hour;

Solutions

Solution 1: Reduce Cache Size

ALTER DATABASE SET CACHE_PARAMS (
max_size_mb = 2048, -- Reduce total size
per_query_max_mb = 128 -- Limit per-query size
);

Solution 2: Enable Compression

ALTER DATABASE SET CACHE_PARAMS (
compression_enabled = true,
compression_algorithm = 'LZ4', -- Fast compression
compression_min_size_bytes = 4096
);

Solution 3: Exclude Large Results

ALTER DATABASE SET CACHE_PARAMS (
max_result_size_bytes = 10485760 -- Don't cache results > 10 MB
);

Solution 4: More Aggressive Eviction

ALTER DATABASE SET CACHE_PARAMS (
memory_pressure_low_threshold = 0.6, -- Start evicting earlier
memory_pressure_high_threshold = 0.8
);

Problem: Stale Data (Consistency Issues)

Symptoms

  • Users seeing outdated information
  • Data discrepancies between cache and database
  • Inconsistent query results

Diagnostic Steps

-- Check invalidation status
SELECT
invalidation_time,
table_name,
operation_type,
affected_queries,
propagation_status
FROM cache_invalidation_log
WHERE invalidation_time > NOW() - INTERVAL '1 hour'
ORDER BY invalidation_time DESC;
-- Verify cache freshness
SELECT
query_hash,
cached_at,
last_validated,
age_seconds,
ttl_remaining_seconds
FROM cache_entries
WHERE age_seconds > 300 -- Older than 5 minutes
ORDER BY age_seconds DESC
LIMIT 20;
-- Check for invalidation failures
SELECT
failure_time,
table_name,
failure_reason,
retry_count
FROM cache_invalidation_failures
WHERE failure_time > NOW() - INTERVAL '24 hours';

Solutions

Solution 1: Enable Stronger Consistency

-- For critical tables
ALTER TABLE financial_data SET CACHE_PARAMS (
consistency_level = 'STRONG'
);
-- For user-facing data
ALTER TABLE user_profiles SET CACHE_PARAMS (
consistency_level = 'READ_YOUR_WRITES'
);

Solution 2: Reduce TTL

ALTER TABLE frequently_updated SET CACHE_PARAMS (
ttl_seconds = 30 -- Very short TTL
);

Solution 3: Force Immediate Invalidation

-- Clear potentially stale cache
INVALIDATE CACHE WHERE table_name = 'problematic_table';
-- Enable synchronous invalidation
ALTER DATABASE SET CACHE_PARAMS (
invalidation_mode = 'SYNCHRONOUS'
);

Solution 4: Verify Invalidation Pipeline

-- Check invalidation is enabled
SELECT setting_value
FROM database_settings
WHERE setting_name = 'semantic_invalidation';
-- Should be 'true'
ALTER DATABASE SET CACHE_PARAMS (
semantic_invalidation = true,
invalidation_logging = true
);

Problem: Cache Invalidation Issues

Symptoms

  • Cache entries not being invalidated after updates
  • Partial invalidation (some entries stale, others fresh)
  • Invalidation delays

Diagnostic Steps

-- Check invalidation pipeline health
SELECT * FROM cache_invalidation_health;
-- Monitor invalidation latency
SELECT
table_name,
AVG(propagation_time_ms) as avg_latency_ms,
MAX(propagation_time_ms) as max_latency_ms,
COUNT(*) as invalidation_count
FROM cache_invalidation_log
WHERE invalidation_time > NOW() - INTERVAL '1 hour'
GROUP BY table_name;
-- Check for backed up invalidations
SELECT
queue_depth,
oldest_message_age_ms,
processing_rate_per_second
FROM cache_invalidation_queue_stats;

Solutions

Solution 1: Clear Invalidation Backlog

-- Force process pending invalidations
FLUSH CACHE INVALIDATION QUEUE;
-- Increase processing capacity
ALTER DATABASE SET CACHE_PARAMS (
invalidation_batch_size = 500,
invalidation_workers = 4
);

Solution 2: Fix Table Tracking

-- Verify tables are tracked for invalidation
SELECT table_name, invalidation_enabled
FROM table_cache_settings;
-- Enable for missing tables
ALTER TABLE missing_table SET CACHE_PARAMS (
invalidation_enabled = true
);

Solution 3: Rebuild Invalidation Index

-- Rebuild query-to-table mapping
REBUILD CACHE INVALIDATION INDEX;

Problem: Statement Cache Issues

Symptoms

  • Low statement cache hit rate
  • Prepared statements not being reused
  • Memory growth in statement cache

Diagnostic Steps

// Check statement cache stats
let stats = pool.statement_cache_stats();
println!("Cache size: {}", stats.current_size);
println!("Hit rate: {:.2}%", stats.hit_rate * 100.0);
println!("Memory: {} MB", stats.total_memory_bytes / 1024 / 1024);
println!("Evictions: {}", stats.total_evictions);
-- From SQL
SELECT
statement_count,
hit_rate,
memory_used_mb,
eviction_count
FROM statement_cache_stats;

Solutions

Solution 1: Use Consistent SQL Formatting

// BAD: Dynamic SQL with variations
let sql = format!("SELECT * FROM users WHERE id = {}", id);
let stmt = pool.prepare(&sql)?; // Different stmt for each id!
// GOOD: Parameterized queries
let stmt = pool.prepare("SELECT * FROM users WHERE id = $1")?;
let params = Parameters::Positional(vec![ParameterValue::Int64(id)]);
pool.execute(&stmt, params).await?;

Solution 2: Prepare Once, Execute Many

// BAD: Prepare inside loop
for id in ids {
let stmt = pool.prepare("SELECT * FROM users WHERE id = $1")?;
pool.execute(&stmt, params).await?;
}
// GOOD: Prepare once outside loop
let stmt = pool.prepare("SELECT * FROM users WHERE id = $1")?;
for id in ids {
let params = Parameters::Positional(vec![ParameterValue::Int64(id)]);
pool.execute(&stmt, params).await?;
}

Solution 3: Increase Statement Cache Size

ALTER DATABASE SET CACHE_PARAMS (
statement_cache_size = 20000,
statement_cache_memory_mb = 200
);

Problem: Distributed Cache Issues

Symptoms

  • Inconsistencies between nodes
  • High invalidation latency
  • Network partition effects

Diagnostic Steps

-- Check cluster health
SELECT
node_id,
is_alive,
is_reachable,
last_heartbeat,
missed_heartbeats
FROM cache_cluster_status;
-- Check sync status
SELECT
source_node,
target_node,
pending_messages,
avg_sync_latency_ms,
last_sync_time
FROM cache_sync_status;
-- Check for partition events
SELECT
event_time,
event_type,
affected_nodes,
resolution_status
FROM cache_partition_log
ORDER BY event_time DESC
LIMIT 20;

Solutions

Solution 1: Verify Network Connectivity

-- Test inter-node connectivity
SELECT * FROM cache_cluster_ping_test;
-- Check for blocked ports
-- Ensure ports 5432 (data) and 5433 (cache sync) are open

Solution 2: Adjust Failure Detection

ALTER DATABASE SET CACHE_PARAMS (
heartbeat_interval_ms = 500, -- More frequent heartbeats
failure_detection_timeout_ms = 5000 -- Faster detection
);

Solution 3: Force Resync

-- Force full cache resync with cluster
RESYNC CACHE WITH CLUSTER;
-- Reconcile version vectors
RECONCILE CACHE VERSION VECTORS;

Problem: Performance Degradation

Symptoms

  • Cache operations slow
  • High cache access latency
  • Lock contention warnings

Diagnostic Steps

-- Check cache operation latency
SELECT
operation_type,
AVG(latency_us) as avg_latency_us,
PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY latency_us) as p99_latency_us,
COUNT(*) as operation_count
FROM cache_operation_log
WHERE operation_time > NOW() - INTERVAL '10 minutes'
GROUP BY operation_type;
-- Check for lock contention
SELECT
lock_type,
contention_count,
avg_wait_time_us,
max_wait_time_us
FROM cache_lock_stats;

Solutions

Solution 1: Increase Concurrency

ALTER DATABASE SET CACHE_PARAMS (
read_concurrency = 128,
write_concurrency = 16,
lock_stripe_count = 512
);

Solution 2: Reduce Cache Entry Size

ALTER DATABASE SET CACHE_PARAMS (
compression_enabled = true,
max_result_size_bytes = 1048576 -- 1 MB max
);

Solution 3: Optimize Eviction

ALTER DATABASE SET CACHE_PARAMS (
eviction_batch_size = 50, -- Smaller batches
eviction_batch_delay_ms = 5 -- More frequent, smaller evictions
);

Performance Debugging Tools

Cache Profiler

-- Enable detailed profiling
ALTER DATABASE SET CACHE_PARAMS (
profiling_enabled = true,
profile_sample_rate = 0.01 -- Sample 1% of operations
);
-- View profiling results
SELECT * FROM cache_profile_report;
-- Disable when done
ALTER DATABASE SET CACHE_PARAMS (
profiling_enabled = false
);

Query-Specific Analysis

-- Analyze specific query's cache behavior
SELECT
query_hash,
cache_hits,
cache_misses,
avg_execution_time_ms,
avg_cached_time_ms,
invalidation_count,
avg_result_size_kb
FROM query_cache_analysis
WHERE query_text LIKE '%products%'
ORDER BY cache_misses DESC;

Real-Time Monitoring

-- Live cache event stream
SELECT * FROM cache_event_stream;
-- Cache operation trace
ALTER DATABASE SET CACHE_PARAMS (
trace_enabled = true,
trace_query_pattern = '%products%' -- Only trace matching queries
);

Emergency Procedures

Clear All Caches

-- Nuclear option: clear everything
FLUSH ALL CACHES;
-- Verify cache is empty
SELECT memory_used_mb, entry_count FROM cache_statistics;

Disable Caching Temporarily

-- Disable all caching
ALTER DATABASE DISABLE ALL_CACHES;
-- Or disable for specific table
ALTER TABLE problematic_table DISABLE CACHING;

Reset Cache Configuration

-- Reset to defaults
ALTER DATABASE RESET CACHE_PARAMS;
-- Restart cache subsystem
RESTART CACHE SERVICE;

Recovery from Corrupted Cache

-- Validate cache integrity
SELECT * FROM cache_integrity_check;
-- If corruption detected
REBUILD CACHE;
-- Verify after rebuild
SELECT * FROM cache_diagnostic_report;

Common Error Messages

”Cache memory exhausted"

-- Increase cache size or enable more aggressive eviction
ALTER DATABASE SET CACHE_PARAMS (
max_size_mb = 4096,
memory_pressure_high_threshold = 0.8
);

"Statement cache full"

-- Increase statement cache or check for SQL variations
ALTER DATABASE SET CACHE_PARAMS (
statement_cache_size = 20000
);

"Invalidation queue overflow"

-- Process backlog and increase capacity
FLUSH CACHE INVALIDATION QUEUE;
ALTER DATABASE SET CACHE_PARAMS (
invalidation_queue_size = 100000,
invalidation_workers = 8
);

"Cache sync timeout"

-- Check network and adjust timeouts
ALTER DATABASE SET CACHE_PARAMS (
sync_timeout_ms = 10000,
consistency_level = 'EVENTUAL' -- Temporarily reduce consistency
);

"Version vector conflict”

-- Reconcile distributed state
RECONCILE CACHE VERSION VECTORS;

Getting Help

If issues persist after trying these solutions:

  1. Collect Diagnostics:

    SELECT * FROM cache_full_diagnostic_report;
  2. Export Cache State:

    EXPORT CACHE DIAGNOSTICS TO '/tmp/cache_debug.json';
  3. Contact Support with:

    • Diagnostic report output
    • Error messages from logs
    • Configuration settings
    • Workload description