
Resource Leak Prevention - Quick Reference Guide

For: DevOps, SRE, Database Administrators
Version: 1.0
Date: 2025-11-10

Quick Configuration

Essential Settings (Production)

# Minimal safe configuration for production
[connection_pool]
min_connections = 10
max_connections = 500
acquire_timeout = "30s"
leak_detection_enabled = true
leak_detection_timeout = "10m"
[timeouts]
default_query_timeout = "30s"
transaction_timeout = "10m"
[resource_limits]
max_client_connections = 10000
max_memory_per_query_mb = 1024
max_total_memory_mb = 32768
memory_pressure_threshold = 0.85
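The settings above have a few invariants that are easy to break when editing by hand (for example, a minimum pool size larger than the maximum). The Rust sketch below checks them before a deploy. `Settings`, `parse_duration`, and `validate` are hypothetical names, the parser only handles the `ms`/`s`/`m` suffixes used in this guide, and the rule that a query timeout should not exceed the transaction timeout is an assumption, not a documented HeliosDB constraint.

```rust
use std::time::Duration;

/// Parse the duration strings used in this config ("500ms", "30s", "10m").
/// Assumption: only these three suffixes occur; HeliosDB's own parser may accept more.
fn parse_duration(s: &str) -> Option<Duration> {
    let s = s.trim();
    // Check "ms" first, because "500ms" also ends in 's'.
    if let Some(n) = s.strip_suffix("ms") {
        return n.parse::<u64>().ok().map(Duration::from_millis);
    }
    if let Some(n) = s.strip_suffix('s') {
        return n.parse::<u64>().ok().map(Duration::from_secs);
    }
    if let Some(n) = s.strip_suffix('m') {
        return n.parse::<u64>().ok().map(|m| Duration::from_secs(m * 60));
    }
    None
}

/// Hypothetical in-memory view of the settings above (field names match the TOML keys).
#[derive(Clone, Copy)]
struct Settings {
    min_connections: u32,
    max_connections: u32,
    acquire_timeout: &'static str,
    leak_detection_timeout: &'static str,
    default_query_timeout: &'static str,
    transaction_timeout: &'static str,
    max_memory_per_query_mb: u64,
    max_total_memory_mb: u64,
    memory_pressure_threshold: f64,
}

/// Return human-readable problems; an empty list means the sanity checks passed.
fn validate(s: &Settings) -> Vec<String> {
    let mut problems = Vec::new();
    if s.min_connections > s.max_connections {
        problems.push("min_connections exceeds max_connections".to_string());
    }
    if s.max_memory_per_query_mb > s.max_total_memory_mb {
        problems.push("max_memory_per_query_mb exceeds max_total_memory_mb".to_string());
    }
    if !(s.memory_pressure_threshold > 0.0 && s.memory_pressure_threshold < 1.0) {
        problems.push("memory_pressure_threshold must be between 0 and 1".to_string());
    }
    let durations = [
        ("acquire_timeout", s.acquire_timeout),
        ("leak_detection_timeout", s.leak_detection_timeout),
        ("default_query_timeout", s.default_query_timeout),
        ("transaction_timeout", s.transaction_timeout),
    ];
    for (name, value) in durations {
        if parse_duration(value).is_none() {
            problems.push(format!("cannot parse {name} = {value:?}"));
        }
    }
    // Assumption: a single query should not be allowed to outlive its transaction.
    if let (Some(q), Some(t)) = (
        parse_duration(s.default_query_timeout),
        parse_duration(s.transaction_timeout),
    ) {
        if q > t {
            problems.push("default_query_timeout exceeds transaction_timeout".to_string());
        }
    }
    problems
}
```

Running `validate` in CI against the parsed config turns a silent misconfiguration into a failed build rather than a production incident.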

Common Issues & Solutions

Issue 1: Connection Leak Alert

Symptom: Alert “Connection leaked after 30 minutes”

Immediate Actions:

# 1. Check leaked connections
heliosdb-cli metrics | grep "connections_leaked"
# 2. Review recent queries from user
heliosdb-cli query "SELECT * FROM pg_stat_activity WHERE usename='<user>'"
# 3. Force reclaim if needed
heliosdb-cli admin force-reclaim-connection <connection_id>

Root Cause Analysis:

  • Check application code for missing connection.close() calls
  • Review stack traces in leak alerts
  • Check for transaction rollback failures

Prevention:

  • Hold pooled connections only for one unit of work, then return them to the pool
  • Always use try-with-resources / defer / RAII patterns
  • Enable leak detection in development
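The try-with-resources / defer / RAII advice can be illustrated with a minimal, std-only Rust sketch of an RAII guard: the connection goes back to the pool in `Drop`, so an early `return` or `?` cannot leak it. `Pool`, `PooledConn`, and `acquire` are hypothetical stand-ins, not the HeliosDB client API, and connections are modeled as plain ids.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical pool holding idle connection ids (stand-ins for real connections).
struct Pool {
    idle: Mutex<Vec<u32>>,
}

impl Pool {
    fn new(size: u32) -> Arc<Pool> {
        Arc::new(Pool { idle: Mutex::new((0..size).collect()) })
    }

    fn idle_count(&self) -> usize {
        self.idle.lock().unwrap().len()
    }
}

/// RAII guard: holding it means holding a connection; dropping it returns the
/// connection on normal scope exit, early `return`, `?`, or panic unwinding.
struct PooledConn {
    id: u32,
    pool: Arc<Pool>,
}

/// Take an idle connection, or `None` if the pool is empty.
fn acquire(pool: &Arc<Pool>) -> Option<PooledConn> {
    let id = pool.idle.lock().unwrap().pop()?;
    Some(PooledConn { id, pool: Arc::clone(pool) })
}

impl Drop for PooledConn {
    fn drop(&mut self) {
        // Use the inner value even if the mutex was poisoned by a panic elsewhere,
        // so a panic never turns into a leaked connection.
        let mut idle = self.pool.idle.lock().unwrap_or_else(|e| e.into_inner());
        idle.push(self.id);
    }
}
```

Because release happens in `Drop`, application code never needs an explicit `close()` call that could be skipped on an error path.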

Issue 2: High Memory Pressure

Symptom: Alert “Memory pressure: HIGH (88%)”

Immediate Actions:

# 1. Check memory usage
heliosdb-cli metrics | grep "memory_usage_mb"
# 2. Check top memory-consuming queries
heliosdb-cli query "SELECT query_id, memory_mb FROM active_queries ORDER BY memory_mb DESC LIMIT 10"
# 3. Force garbage collection
heliosdb-cli admin force-gc
# 4. If critical, enable degradation
heliosdb-cli admin enable-degradation --type reduce-memory-limits

Root Cause Analysis:

  • Large result sets being buffered
  • Memory-intensive aggregations
  • Cache size too large
  • Memory leak in application

Prevention:

  • Set appropriate query memory limits
  • Use streaming for large results
  • Monitor cache hit rates
  • Regular memory profiling
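One way to derive the memory pressure level seen in these alerts, assuming pressure is simply used memory divided by the configured limit and reusing the 70/85/95% thresholds from the Alert Severity Levels section. `pressure_level` and `should_degrade` are hypothetical helpers; `should_degrade` mirrors step 4 above (enable degradation only when critical).

```rust
/// Levels as reported by the `memory_pressure_level` metric.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PressureLevel {
    Normal,
    Elevated,
    High,
    Critical,
}

/// Classify usage with this guide's alert thresholds:
/// > 70% Elevated, > 85% High, > 95% Critical.
/// Assumption: pressure = used / limit. Integer math avoids float rounding at boundaries.
fn pressure_level(used_mb: u64, limit_mb: u64) -> PressureLevel {
    if limit_mb == 0 {
        return PressureLevel::Critical; // no memory budget at all: treat as critical
    }
    let over = |pct: u64| used_mb * 100 > limit_mb * pct;
    if over(95) {
        PressureLevel::Critical
    } else if over(85) {
        PressureLevel::High
    } else if over(70) {
        PressureLevel::Elevated
    } else {
        PressureLevel::Normal
    }
}

/// Mirrors "If critical, enable degradation" in the steps above.
fn should_degrade(level: PressureLevel) -> bool {
    level == PressureLevel::Critical
}
```

With `max_total_memory_mb = 32768`, using 28836 MB (about 88%) classifies as High, matching the example alert.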

Issue 3: Connection Pool Exhausted

Symptom: Error “Connection pool exhausted, timeout after 30s”

Immediate Actions:

# 1. Check pool utilization
heliosdb-cli metrics | grep "pool_utilization"
# 2. Check active connections
heliosdb-cli query "SELECT COUNT(*) FROM pg_stat_activity WHERE state='active'"
# 3. Identify long-running queries
heliosdb-cli query "SELECT pid, now() - query_start as duration, query FROM pg_stat_activity WHERE state='active' ORDER BY duration DESC"
# 4. Kill a long-running query, using a pid from step 3
heliosdb-cli admin kill-query <pid>
# 5. Temporarily increase pool size
heliosdb-cli admin resize-pool --size 1000

Root Cause Analysis:

  • Too many concurrent connections
  • Connections not being released
  • Long-running queries holding connections
  • Pool size too small for workload

Prevention:

  • Right-size connection pool
  • Implement connection timeouts
  • Use connection pooling middleware
  • Monitor pool utilization
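Failing fast on exhaustion is the job of `acquire_timeout`. The sketch below shows the idea with std's `Mutex` and `Condvar`: callers wait at most the timeout for a free slot, then get an error instead of queueing indefinitely. `BoundedPool` is hypothetical and only counts free slots.

```rust
use std::sync::{Condvar, Mutex};
use std::time::{Duration, Instant};

/// Hypothetical bounded pool that only tracks how many connection slots are free.
struct BoundedPool {
    available: Mutex<usize>,
    freed: Condvar,
}

impl BoundedPool {
    fn new(size: usize) -> Self {
        BoundedPool { available: Mutex::new(size), freed: Condvar::new() }
    }

    /// Wait at most `timeout` (the role of `acquire_timeout`) for a free slot.
    fn acquire(&self, timeout: Duration) -> Result<(), String> {
        let deadline = Instant::now() + timeout;
        let mut available = self.available.lock().unwrap();
        // Loop: Condvar wake-ups can be spurious, and another waiter may take the slot.
        while *available == 0 {
            let now = Instant::now();
            if now >= deadline {
                return Err(format!("Connection pool exhausted, timeout after {:?}", timeout));
            }
            let (guard, _timed_out) = self.freed.wait_timeout(available, deadline - now).unwrap();
            available = guard;
        }
        *available -= 1;
        Ok(())
    }

    /// Return a slot and wake one waiter.
    fn release(&self) {
        *self.available.lock().unwrap() += 1;
        self.freed.notify_one();
    }
}
```

A bounded wait keeps a burst of requests from piling up behind an exhausted pool; the caller sees a clear error it can retry or surface.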

Issue 4: Circuit Breaker Open

Symptom: Error “Circuit breaker is open for connection pool”

Immediate Actions:

# 1. Check circuit breaker state
heliosdb-cli metrics | grep "circuit_breaker_state"
# 2. Check recent failures
heliosdb-cli metrics | grep "circuit_breaker_failures"
# 3. Check backend health
heliosdb-cli health-check
# 4. Manually reset circuit breaker if safe
heliosdb-cli admin reset-circuit-breaker --name connection_pool

Root Cause Analysis:

  • Backend database failures
  • Network connectivity issues
  • Resource exhaustion on backend
  • Configuration errors

Prevention:

  • Monitor backend health
  • Implement proper error handling
  • Configure appropriate thresholds
  • Have redundant backends
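To reason about the Closed / Open / Half-Open states reported by `circuit_breaker_state`, here is a minimal state-machine sketch. The failure threshold, success threshold, and cooldown are parameters chosen for illustration; HeliosDB's actual breaker policy and defaults are not documented in this guide.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    Closed,
    Open,
    HalfOpen,
}

/// Assumed policy: open after `failure_threshold` consecutive failures, reject calls
/// for `cooldown`, then allow trial calls (Half-Open) and close after
/// `success_threshold` successes.
struct CircuitBreaker {
    state: State,
    failures: u32,
    successes: u32,
    opened_at: Option<Instant>,
    failure_threshold: u32,
    success_threshold: u32,
    cooldown: Duration,
}

impl CircuitBreaker {
    fn new(failure_threshold: u32, success_threshold: u32, cooldown: Duration) -> Self {
        CircuitBreaker {
            state: State::Closed,
            failures: 0,
            successes: 0,
            opened_at: None,
            failure_threshold,
            success_threshold,
            cooldown,
        }
    }

    /// Ask before each call; an Open breaker moves to Half-Open once the cooldown passes.
    fn allow(&mut self, now: Instant) -> bool {
        if self.state == State::Open {
            let cooled = self.opened_at.map_or(true, |t| now.duration_since(t) >= self.cooldown);
            if !cooled {
                return false;
            }
            self.state = State::HalfOpen;
            self.successes = 0;
        }
        true
    }

    fn record_success(&mut self) {
        self.failures = 0; // failures must be consecutive to trip the breaker
        if self.state == State::HalfOpen {
            self.successes += 1;
            if self.successes >= self.success_threshold {
                self.state = State::Closed; // a recovery (circuit_close_events)
                self.opened_at = None;
            }
        }
    }

    fn record_failure(&mut self, now: Instant) {
        self.failures += 1;
        // Any failure during a Half-Open trial re-opens immediately.
        if self.state == State::HalfOpen || self.failures >= self.failure_threshold {
            self.state = State::Open; // an opening (circuit_open_events)
            self.opened_at = Some(now);
            self.successes = 0;
        }
    }
}
```

This also shows why a manual `reset-circuit-breaker` is only safe after the backend is healthy: a reset to Closed with the fault still present simply trips the breaker again.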

Issue 5: Query Timeout

Symptom: Error “Query timeout exceeded: 30000ms”

Immediate Actions:

# 1. Check if query is still running
heliosdb-cli query "SELECT * FROM pg_stat_activity WHERE query_id='<query_id>'"
# 2. Review query plan
heliosdb-cli explain "SELECT ..."
# 3. Check for locks
heliosdb-cli query "SELECT * FROM pg_locks WHERE NOT granted"
# 4. Increase timeout for this query type if appropriate
heliosdb-cli config set timeouts.long_query_timeout 600s

Root Cause Analysis:

  • Inefficient query
  • Missing indexes
  • Lock contention
  • Large data volume
  • Timeout too aggressive

Prevention:

  • Optimize queries
  • Add appropriate indexes
  • Use query hints for complex queries
  • Set per-query-type timeouts
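Per-query-type timeouts on the client side can be sketched as follows. `QueryKind`, `timeout_for`, and `run_with_timeout` are hypothetical; the 30s and 600s values echo `default_query_timeout` and the `long_query_timeout` example above but are assumptions. Note the caveat in the code: giving up on the wait does not stop the query.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Hypothetical query categories, each with its own timeout budget.
#[derive(Debug, Clone, Copy)]
enum QueryKind {
    Interactive,
    Report,
}

fn timeout_for(kind: QueryKind) -> Duration {
    match kind {
        QueryKind::Interactive => Duration::from_secs(30),
        QueryKind::Report => Duration::from_secs(600),
    }
}

/// Run `work` on a worker thread and stop waiting after `timeout`.
/// Caveat: the worker is NOT cancelled; it keeps running (and holding resources),
/// which is why a server-side timeout or `kill-query` is still required.
fn run_with_timeout<T, F>(timeout: Duration, work: F) -> Result<T, String>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // If the caller already gave up, the receiver is gone; ignore the send error.
        let _ = tx.send(work());
    });
    match rx.recv_timeout(timeout) {
        Ok(value) => Ok(value),
        Err(RecvTimeoutError::Timeout) => {
            Err(format!("Query timeout exceeded: {}ms", timeout.as_millis()))
        }
        Err(RecvTimeoutError::Disconnected) => Err("worker panicked before replying".to_string()),
    }
}
```

For example, `run_with_timeout(timeout_for(QueryKind::Report), move || run_report())` would give a report query ten minutes, where `run_report` stands in for your own query call.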

Monitoring Checklist

Daily Checks

  • Connection pool utilization < 80%
  • Memory pressure level: Normal or Elevated
  • No connection leak alerts in last 24h
  • Circuit breaker state: Closed
  • Query timeout rate < 1%
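The daily checks can be automated against a metrics snapshot. This sketch assumes the metrics have already been scraped into a hypothetical `Snapshot` struct whose field names follow the Key Metrics section; the thresholds are the ones listed above.

```rust
/// Hypothetical snapshot of the metrics listed under Key Metrics.
struct Snapshot {
    connections_active: u64,
    connections_total: u64,
    memory_pressure_level: &'static str, // "Normal" | "Elevated" | "High" | "Critical"
    leak_alerts_24h: u64,
    circuit_breaker_state: &'static str, // "Closed" | "Open" | "Half-Open"
    queries_total: u64,
    query_timeouts: u64,
}

/// Return the daily checks that fail; an empty Vec means all pass.
/// Ratios are compared in integers (a * 100 >= b * pct) to avoid float edge cases.
fn failed_daily_checks(s: &Snapshot) -> Vec<&'static str> {
    let mut failed = Vec::new();
    if s.connections_total == 0 {
        failed.push("pool has no connections");
    } else if s.connections_active * 100 >= s.connections_total * 80 {
        failed.push("pool utilization >= 80%");
    }
    if !matches!(s.memory_pressure_level, "Normal" | "Elevated") {
        failed.push("memory pressure above Elevated");
    }
    if s.leak_alerts_24h > 0 {
        failed.push("connection leak alerts in last 24h");
    }
    if s.circuit_breaker_state != "Closed" {
        failed.push("circuit breaker not closed");
    }
    if s.queries_total > 0 && s.query_timeouts * 100 >= s.queries_total {
        failed.push("query timeout rate >= 1%");
    }
    failed
}
```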

Weekly Checks

  • Review resource limit violations
  • Check for degradation activations
  • Analyze slow query log
  • Review memory growth trends
  • Validate backup/restore procedures

Monthly Checks

  • Connection pool sizing review
  • Timeout configuration review
  • Resource limit adjustment
  • Circuit breaker threshold tuning
  • Leak detection effectiveness

Key Metrics

Connection Pool Metrics

connections_total # Total connections in pool
connections_active # Connections currently in use
connections_idle # Connections available
connections_leaked # Connections that leaked
pool_utilization # Active / Total (should be < 0.8)
connection_lifetime_ms # Average connection lifetime
acquire_timeout_rate # % of acquire attempts that time out

Resource Limit Metrics

memory_usage_mb # Current memory usage
memory_limit_mb # Configured memory limit
memory_pressure_level # Normal/Elevated/High/Critical
query_memory_violations # Queries rejected for memory
connection_limit_violations # Connections rejected
file_descriptor_usage # Open file count

Timeout Metrics

operation_timeouts # Total timeout events
timeout_by_type # Breakdown by operation type
average_query_time_ms # Average query execution time
p95_query_time_ms # 95th percentile query time
p99_query_time_ms # 99th percentile query time
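`p95_query_time_ms` and `p99_query_time_ms` are percentiles over recorded query times. A nearest-rank sketch follows; HeliosDB may compute them differently (for example from histogram buckets), so treat this as a way to sanity-check dashboards rather than to reproduce exact values. `percentile` and `average` are hypothetical helpers.

```rust
/// Nearest-rank percentile of query times in ms, for whole-number p in 1..=100.
fn percentile(samples: &[u64], p: u64) -> Option<u64> {
    if samples.is_empty() || p == 0 || p > 100 {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    // rank = ceil(p * n / 100), computed in integers; ranks are 1-based.
    let n = sorted.len() as u64;
    let rank = (p * n + 99) / 100;
    Some(sorted[(rank - 1) as usize])
}

/// Mean of the same samples, comparable to average_query_time_ms.
fn average(samples: &[u64]) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    Some(samples.iter().sum::<u64>() as f64 / samples.len() as f64)
}
```

For query times of 1 through 100 ms, p95 is 95 ms and p99 is 99 ms. A p99 far above the average usually points at a few slow queries rather than general load.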

Circuit Breaker Metrics

circuit_breaker_state # Closed/Open/Half-Open
circuit_breaker_failures # Failure count
circuit_breaker_successes # Success count while Half-Open
circuit_open_events # Times circuit opened
circuit_close_events # Times circuit recovered

Alert Severity Levels

P1 (Critical - Immediate Action)

  • Memory pressure: CRITICAL (> 95%)
  • Massive leak detected (> 50 leaked connections)
  • Circuit breaker open for > 5 minutes
  • Connection pool exhausted for > 1 minute
  • Database unavailable

Response Time: 15 minutes
Escalation: Page on-call immediately

P2 (High - Urgent Action)

  • Memory pressure: HIGH (> 85%)
  • Connection leak detected (> 30 min)
  • Query timeout rate > 5%
  • Resource limit violations increasing
  • Degradation activated

Response Time: 1 hour
Escalation: Alert on-call during business hours

P3 (Medium - Standard Action)

  • Memory pressure: ELEVATED (> 70%)
  • Connection held warning (> 10 min)
  • Pool utilization > 80%
  • Slow query detected
  • Resource usage trending up

Response Time: 4 hours
Escalation: Create ticket for investigation

P4 (Low - Informational)

  • Memory pressure: NORMAL
  • Connection recycled (age limit)
  • Configuration change applied
  • Health check passed

Response Time: Best effort
Escalation: Log for review
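The severity rules above can be encoded as an ordered check, most severe first, so a signal matching several levels is reported at the highest one. `Signals` and `severity` are hypothetical, and only the conditions with numeric thresholds are modeled; "connection leak detected (> 30 min)" and "connection held warning (> 10 min)" are read here as the age of the longest-held connection.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Severity {
    P1,
    P2,
    P3,
    P4,
}

/// Hypothetical inputs; each field maps to a condition in the severity list above.
#[derive(Clone, Copy)]
struct Signals {
    database_available: bool,
    memory_used_pct: u32,
    leaked_connections: u32,
    oldest_checkout_mins: u64, // longest time any connection has been held
    breaker_open_mins: u64,    // 0 while the breaker is closed
    pool_exhausted_secs: u64,  // 0 while connections are available
    timeout_rate_pct: u32,
    pool_utilization_pct: u32,
}

/// Map signals to the highest matching severity, checking P1 first.
fn severity(s: &Signals) -> Severity {
    if !s.database_available
        || s.memory_used_pct > 95
        || s.leaked_connections > 50
        || s.breaker_open_mins > 5
        || s.pool_exhausted_secs > 60
    {
        Severity::P1
    } else if s.memory_used_pct > 85 || s.oldest_checkout_mins > 30 || s.timeout_rate_pct > 5 {
        Severity::P2
    } else if s.memory_used_pct > 70 || s.oldest_checkout_mins > 10 || s.pool_utilization_pct > 80 {
        Severity::P3
    } else {
        Severity::P4
    }
}
```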


Emergency Procedures

Emergency Memory Recovery

# 1. Check current state
heliosdb-cli metrics | grep memory
# 2. Force garbage collection
heliosdb-cli admin force-gc
# 3. Clear caches
heliosdb-cli admin clear-cache --type query
heliosdb-cli admin clear-cache --type result
# 4. Close idle connections
heliosdb-cli admin close-idle-connections --age 1m
# 5. Kill low-priority queries
heliosdb-cli admin kill-queries --priority low
# 6. Enable degradation
heliosdb-cli admin enable-degradation --type reduce-memory-limits
# 7. If still critical, enable read-only mode
heliosdb-cli admin enable-read-only-mode
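The emergency memory procedure is an escalation ladder: each step is more disruptive than the last, and you stop as soon as pressure is back under control. A minimal Rust sketch of that control flow follows; `run_ladder` is hypothetical, and the actions are placeholders (wiring them to `heliosdb-cli` is not shown).

```rust
/// Apply recovery steps in order, re-checking health before each one, so the
/// most disruptive steps run only if the gentler ones were not enough.
/// Returns the names of the steps that were actually applied.
fn run_ladder<'a>(steps: &[(&'a str, &dyn Fn())], recovered: &dyn Fn() -> bool) -> Vec<&'a str> {
    let mut applied = Vec::new();
    for (name, action) in steps {
        if recovered() {
            break; // healthy again: skip the remaining, more disruptive steps
        }
        action();
        applied.push(*name);
    }
    applied
}
```

Re-checking before every step is the important design choice: read-only mode, the last rung, should only ever be reached when everything before it has failed.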

Emergency Connection Recovery

# 1. Check connection state
heliosdb-cli metrics | grep connections
# 2. Close idle connections
heliosdb-cli admin close-idle-connections --age 30s
# 3. Force reclaim leaked connections
heliosdb-cli admin force-reclaim-all-leaked
# 4. Kill long-running queries
heliosdb-cli query "SELECT pid FROM pg_stat_activity WHERE state='active' AND now() - query_start > interval '5 minutes'" | xargs -I {} heliosdb-cli admin kill-query {}
# 5. Reject new connections temporarily
heliosdb-cli admin enable-degradation --type reject-new-connections
# 6. Restart connection pool (last resort)
heliosdb-cli admin restart-pool --graceful

Emergency Shutdown

# 1. Enable read-only mode
heliosdb-cli admin enable-read-only-mode
# 2. Stop accepting new connections
heliosdb-cli admin stop-accepting-connections
# 3. Wait for active queries to complete (30s timeout)
heliosdb-cli admin wait-for-queries --timeout 30s
# 4. Kill remaining queries
heliosdb-cli admin kill-all-queries
# 5. Graceful shutdown (30s timeout)
heliosdb-cli admin shutdown --graceful --timeout 30s
# 6. Force shutdown if needed
heliosdb-cli admin shutdown --force

Configuration Tuning Guide

Small Deployment (< 100 users)

[connection_pool]
min_connections = 5
max_connections = 50
[resource_limits]
max_client_connections = 1000
max_total_memory_mb = 4096

Medium Deployment (100-1000 users)

[connection_pool]
min_connections = 10
max_connections = 200
[resource_limits]
max_client_connections = 5000
max_total_memory_mb = 16384

Large Deployment (> 1000 users)

[connection_pool]
min_connections = 20
max_connections = 500
[resource_limits]
max_client_connections = 10000
max_total_memory_mb = 32768
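The three user-count tiers can be expressed as a lookup for provisioning scripts. `Sizing` and `sizing_for` are hypothetical; the numbers are copied from the Small, Medium, and Large tables above and are starting points to tune, not guarantees.

```rust
/// Values copied from the Small/Medium/Large tables; bands are < 100, 100-1000, > 1000 users.
#[derive(Debug, PartialEq, Eq)]
struct Sizing {
    min_connections: u32,
    max_connections: u32,
    max_client_connections: u32,
    max_total_memory_mb: u64,
}

fn sizing_for(users: u32) -> Sizing {
    let (min_connections, max_connections, max_client_connections, max_total_memory_mb) =
        match users {
            0..=99 => (5, 50, 1_000, 4_096),
            100..=1_000 => (10, 200, 5_000, 16_384),
            _ => (20, 500, 10_000, 32_768),
        };
    Sizing { min_connections, max_connections, max_client_connections, max_total_memory_mb }
}
```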

High-Performance (Low Latency)

[connection_pool]
min_connections = 50
max_connections = 1000
acquire_timeout = "5s"
[timeouts]
default_query_timeout = "10s"
lock_timeout = "500ms"

High-Throughput (Batch Processing)

[connection_pool]
min_connections = 10
max_connections = 100
[timeouts]
default_query_timeout = "5m"
transaction_timeout = "30m"
[resource_limits]
max_memory_per_query_mb = 4096

Troubleshooting Commands

# Connection pool status
heliosdb-cli pool status
# Active connections
heliosdb-cli pool connections --active
# Leaked connections
heliosdb-cli pool connections --leaked
# Resource usage
heliosdb-cli resources usage
# Resource pressure
heliosdb-cli resources pressure
# Circuit breaker status
heliosdb-cli circuit-breaker status --all
# Recent timeouts
heliosdb-cli timeouts recent --limit 10
# Query resource usage
heliosdb-cli query-resources --top 10
# Health check
heliosdb-cli health-check --verbose
# Configuration dump
heliosdb-cli config dump --section resource_limits

Best Practices

Application Development

  1. Always close connections

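    let conn = pool.acquire().await?;
    let result = run_queries(&conn).await; // `run_queries`: hypothetical helper
    pool.release(conn).await; // explicit release: `.await` can't run inside defer!/Drop
    result?; // propagate errors only after the connection is back in the pool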
  2. Set query timeouts

    use std::time::Duration;
    query.with_timeout(Duration::from_secs(30))
  3. Stream large results

    let stream = query.execute_streaming().await?;
  4. Check resource availability

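    // Advisory only: another task can take the last connection between this
    // check and acquire(), so still handle acquire timeouts.
    if !pool.can_acquire() {
        return Err("Pool exhausted");
    }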

Operations

  1. Monitor continuously

    • Set up alerts for all P1/P2 conditions
    • Dashboard with key metrics
    • Regular log review
  2. Test failure scenarios

    • Connection leak injection
    • Memory pressure simulation
    • Circuit breaker testing
  3. Document incidents

    • Root cause analysis
    • Remediation steps
    • Prevention measures
  4. Regular maintenance

    • Weekly metric review
    • Monthly configuration tuning
    • Quarterly load testing

Support Contacts


Last Updated: 2025-11-10
Next Review: 2025-12-10