CDC Webhooks Troubleshooting Guide
Last Updated: January 4, 2026
This guide covers common issues encountered when working with CDC webhooks and their solutions.
Table of Contents
- Events Not Being Captured
- Webhook Delivery Failures
- High Latency
- Circuit Breaker Issues
- Duplicate Events
- Queue Backlog
- Security Issues
- Monitoring and Diagnostics
Events Not Being Captured
Symptom
Database changes are not triggering webhook events.
Diagnosis

    -- Check if CDC stream exists and is active
    SHOW CDC STREAMS;

    -- Check stream status
    SHOW CDC STREAM STATUS your_stream_name;

Solutions
1. Stream is paused

    -- Resume the stream
    ALTER CDC STREAM your_stream_name RESUME;

2. Table not included in stream

    -- Check stream configuration
    SHOW CDC STREAM STATUS your_stream_name;

    -- Create new stream with correct table
    CREATE CHANGE DATA CAPTURE ON your_table
      TO KAFKA 'localhost:9092'
      TOPIC 'your-topic'
      FORMAT JSON;

3. Filter excluding events

    # Check filter configuration
    filters:
      - name: your_filter
        tables:
          - your_table  # Ensure table is included
        operations:
          - INSERT
          - UPDATE
          - DELETE  # Ensure operation is included

4. WAL not being read

    // Check WAL reader status
    let stats = processor.get_stats().await?;
    println!("Last WAL position: {}", stats.last_wal_position);
    println!("Events captured: {}", stats.events_captured);

Webhook Delivery Failures
Symptom
Events are captured but not delivered to webhook endpoints.
Diagnosis

    -- Check stream status for errors
    SHOW CDC STREAM STATUS your_stream_name;

    # Check Prometheus metrics
    rate(heliosdb_webhook_delivery_total{status="failure"}[5m])

Solutions
1. Endpoint unreachable

    # Test endpoint connectivity
    curl -X POST https://your-endpoint.com/webhooks \
      -H "Content-Type: application/json" \
      -d '{"test": true}'

    # Verify endpoint URL in configuration
    webhooks:
      - url: https://your-endpoint.com/webhooks  # Check URL

2. SSL/TLS certificate issues

    // For development/testing, disable certificate verification
    let config = WebhookConfig {
        tls_verify: false, // NOT for production!
        ..Default::default()
    };

For production, ensure certificates are valid:

    # Check certificate
    openssl s_client -connect your-endpoint.com:443

3. Endpoint returning errors
Check your endpoint logs for error details.

    # Ensure endpoint returns 2xx status
    def webhook_handler(request):
        try:
            process_event(request.json)
            return {"status": "success"}, 200  # Return 2xx
        except Exception as e:
            return {"error": str(e)}, 500  # Will trigger retry

4. Timeout issues

    # Increase timeout if endpoint is slow
    delivery:
      timeout_secs: 60  # Increase from default 30

5. Retry exhausted

    # Increase retries
    delivery:
      max_retries: 5  # Increase from default 3

High Latency
Symptom
Events are being delivered but with high latency (>100ms P99).
Diagnosis

    # Check latency metrics
    histogram_quantile(0.99, rate(heliosdb_webhook_delivery_duration_seconds_bucket[5m]))

Solutions
1. Endpoint is slow

    # Measure endpoint response time
    time curl -X POST https://your-endpoint.com/webhooks \
      -H "Content-Type: application/json" \
      -d '{"test": true}'

Consider:
- Optimizing endpoint processing
- Async processing (acknowledge quickly, process later)
- Adding more endpoint instances
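
The acknowledge-quickly pattern from the list above can be sketched as follows. This is a minimal, framework-agnostic sketch under stated assumptions: `handle_webhook`, `process_event`, the `pending` buffer, and the 202/503 status choices are illustrative and not part of HeliosDB. A production receiver would more likely hand events to a durable queue so that acknowledged events survive a restart.

```python
import queue
import threading

# Hypothetical in-process buffer; a durable queue is safer in production.
pending = queue.Queue(maxsize=10_000)
processed = []  # stands in for real side effects


def process_event(event):
    """Placeholder for slow business logic (an assumption for this sketch)."""
    processed.append(event["event_id"])


def worker():
    """Drain the buffer in the background so the handler never waits on processing."""
    while True:
        event = pending.get()
        try:
            process_event(event)
        except Exception as exc:
            # Keep the worker alive; real code would log and possibly re-queue.
            print(f"processing failed: {exc}")
        finally:
            pending.task_done()


threading.Thread(target=worker, daemon=True).start()


def handle_webhook(event):
    """Acknowledge with a 2xx right away; signal overload if the buffer is full."""
    try:
        pending.put_nowait(event)
    except queue.Full:
        return {"error": "busy"}, 503
    return {"status": "accepted"}, 202
```

The handler's response time then depends only on the enqueue step, not on how long `process_event` takes. Because processing happens after the acknowledgement, the receiver should also deduplicate by event ID, since a crash between acknowledgement and processing would otherwise lose or repeat work.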
2. Network latency

    # Check network latency
    ping your-endpoint.com
    traceroute your-endpoint.com

Consider:
- Deploying HeliosDB closer to endpoints
- Using a faster network path
- Adding regional endpoint replicas
3. Worker pool too small

    # Increase worker pool
    worker_pool:
      num_workers: 200  # Increase from default
      max_concurrent_requests: 2000

4. Queue backlog

    # Check queue size
    heliosdb_event_queue_size

See Queue Backlog section.
Circuit Breaker Issues
Symptom
Circuit breaker is open, blocking delivery to healthy endpoints.
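
For orientation, the closed/open/half-open cycle behind this symptom can be sketched as a small state machine. This is a toy model, not HeliosDB's implementation: the class, its method names, and the choice to count consecutive failures are assumptions. Only the parameter names `failure_threshold`, `success_threshold`, and `timeout_secs` mirror the settings discussed in this section.

```python
import time


class CircuitBreaker:
    """Toy closed/open/half-open state machine (illustrative only)."""

    def __init__(self, failure_threshold, success_threshold, timeout_secs,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.timeout_secs = timeout_secs
        self.clock = clock  # injectable so the timeout can be tested
        self.state = "closed"
        self.failures = 0
        self.successes = 0
        self.opened_at = 0.0

    def allow_request(self):
        """Block while open; switch to half-open once the timeout has elapsed."""
        if self.state == "open":
            if self.clock() - self.opened_at < self.timeout_secs:
                return False
            self.state = "half_open"
            self.successes = 0
        return True

    def record_success(self):
        """Close the circuit after enough successful half-open probes."""
        if self.state == "half_open":
            self.successes += 1
            if self.successes >= self.success_threshold:
                self.state = "closed"
                self.failures = 0
        elif self.state == "closed":
            self.failures = 0  # assumption: only consecutive failures count

    def record_failure(self):
        """Open after too many failures, or on any failure while half-open."""
        if self.state == "open":
            return
        if self.state == "closed":
            self.failures += 1
            if self.failures < self.failure_threshold:
                return
        self.state = "open"
        self.opened_at = self.clock()
        self.failures = 0
```

In this model an endpoint that has already recovered stays blocked until `timeout_secs` has passed, which matches the symptom described here: the breaker reacts to past failures, not to the endpoint's current health.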
Diagnosis

    # Check circuit state
    heliosdb_circuit_breaker_state == 2  # 2 = Open

    let state = circuit_breaker.get_state("https://your-endpoint.com").await?;
    println!("Circuit state: {:?}", state);

Solutions
1. Circuit opened due to transient failures
Wait for the timeout period to elapse; the circuit then transitions to half-open:

    circuit_breaker:
      timeout_secs: 60  # Circuit will try again after 60s

2. False positives from timeout

    # Increase timeout before marking as failure
    delivery:
      timeout_secs: 60  # Give endpoint more time

3. Adjust circuit breaker sensitivity

    circuit_breaker:
      failure_threshold: 10  # Increase from 5
      success_threshold: 3   # Require more successes to close

4. Reset circuit breaker manually

    // Force reset (use with caution)
    circuit_breaker.reset("https://your-endpoint.com").await?;

Duplicate Events
Symptom
The same events are being delivered multiple times.
Diagnosis
Check if your endpoint is receiving duplicate event IDs.
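
One way to run this check is to count event IDs across the payloads your endpoint has logged. The sketch below assumes each payload carries an `event_id` field, as in the receiver examples in this guide; the function name and the sample data are hypothetical.

```python
from collections import Counter


def find_duplicate_event_ids(events):
    """Return {event_id: count} for every event_id seen more than once."""
    counts = Counter(event["event_id"] for event in events)
    return {event_id: n for event_id, n in counts.items() if n > 1}


# Hypothetical sample of payloads captured at the receiver.
received = [
    {"event_id": "evt_1"},
    {"event_id": "evt_2"},
    {"event_id": "evt_1"},
]
duplicates = find_duplicate_event_ids(received)
print(duplicates)  # {'evt_1': 2}
```

An empty result suggests the duplicates are being produced inside the receiver (for example, by reprocessing), not by repeated deliveries.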
Solutions
1. Verify exactly-once is enabled

    exactly_once:
      delivery_window_secs: 86400  # 24 hour window

2. Implement idempotency in receiver

    # In-memory set: lost on restart and not shared between receiver instances
    seen_events = set()

    def handle_webhook(event):
        event_id = event['event_id']

        if event_id in seen_events:
            return {"status": "duplicate"}, 200

        seen_events.add(event_id)
        process_event(event)

        return {"status": "success"}, 200

3. Use database for deduplication

    -- Create dedup table
    CREATE TABLE webhook_events (
        event_id VARCHAR(255) PRIMARY KEY,
        processed_at TIMESTAMP DEFAULT NOW()
    );

    -- Check before processing: a returned row means the event is new;
    -- no row means it was already recorded
    INSERT INTO webhook_events (event_id)
    VALUES ('evt_123')
    ON CONFLICT (event_id) DO NOTHING
    RETURNING event_id;

4. Check for retry storm

    # High retry rate indicates issues
    rate(heliosdb_webhook_retry_total[5m]) > 100

Queue Backlog
Symptom
Events are queuing up faster than they can be delivered.
Diagnosis

    # Check queue size
    heliosdb_event_queue_size{queue_type="fast"}
    heliosdb_event_queue_size{queue_type="overflow"}

Solutions
1. Increase worker pool

    worker_pool:
      num_workers: 200
      max_concurrent_requests: 5000
      connection_pool_size: 1000

2. Add more endpoints (load balance)

    webhooks:
      - url: https://endpoint-1.example.com/webhooks
      - url: https://endpoint-2.example.com/webhooks
      - url: https://endpoint-3.example.com/webhooks

3. Reduce event volume with filters

    filters:
      - name: reduce_volume
        tables:
          - important_table  # Only essential tables
        operations:
          - INSERT  # Skip updates if not needed

4. Batch events (if supported by receiver)

    delivery:
      batch_size: 100         # Send 100 events per request
      batch_timeout_ms: 1000  # Or after 1 second

5. Scale horizontally
Deploy multiple CDC processors with table sharding.
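
One possible way to plan such a split is to assign each table to a processor with a stable hash, then create each processor's streams only for its own tables. The sketch below is a planning aid under stated assumptions: the function names and the table list are hypothetical, and how each processor is actually configured depends on your deployment.

```python
import hashlib


def processor_for_table(table_name, num_processors):
    """Map a table to a processor index with a hash that is stable across runs."""
    digest = hashlib.sha256(table_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_processors


def shard_tables(tables, num_processors):
    """Group tables by the processor that should own their CDC stream."""
    shards = {i: [] for i in range(num_processors)}
    for table in tables:
        shards[processor_for_table(table, num_processors)].append(table)
    return shards


# Hypothetical table names.
plan = shard_tables(["orders", "users", "payments", "audit_log"], 3)
for index, tables in plan.items():
    print(f"processor {index}: {tables}")
```

SHA-256 is used instead of Python's built-in `hash()` because string hashes are randomized per process, which would change the assignment on every run. Note that changing `num_processors` remaps most tables; if event volume is very uneven across tables, a hand-written assignment may balance load better than hashing.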
Security Issues
Symptom
Webhook signature verification is failing, or security alerts are being raised.
Diagnosis

    # Check signature header
    signature = request.headers.get('X-HeliosDB-Signature')
    print(f"Received signature: {signature}")

Solutions
1. Signature verification failing

    import hmac
    import hashlib

    def verify_signature(payload, signature, secret):
        # A missing signature header can never match
        if not signature:
            return False

        # Ensure payload is bytes
        if isinstance(payload, str):
            payload = payload.encode('utf-8')

        expected = 'sha256=' + hmac.new(
            secret.encode('utf-8'),
            payload,
            hashlib.sha256
        ).hexdigest()

        # Use constant-time comparison
        return hmac.compare_digest(signature, expected)

2. Wrong secret key

    # Verify secret matches between sender and receiver
    security:
      hmac_secret_env: WEBHOOK_SECRET  # Check this environment variable

3. Payload modification (proxy/load balancer)
Ensure no proxy or load balancer is modifying the request body.
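
The same failure appears when the receiver itself verifies a re-serialized body instead of the raw request bytes. The sketch below illustrates why, assuming the `sha256=<hex>` signature form used by `verify_signature`; the secret, the `op` field, and the `sign` helper are hypothetical.

```python
import hashlib
import hmac
import json


def sign(payload: bytes, secret: str) -> str:
    """Compute a signature in the 'sha256=<hex>' form."""
    digest = hmac.new(secret.encode("utf-8"), payload, hashlib.sha256).hexdigest()
    return "sha256=" + digest


secret = "example-secret"  # hypothetical value for illustration
raw_body = b'{"event_id": "evt_123",  "op": "INSERT"}'  # bytes as sent
signature = sign(raw_body, secret)

# Parsing and re-serializing the JSON changes the bytes (here: whitespace),
# even though the decoded content is identical.
reserialized = json.dumps(json.loads(raw_body)).encode("utf-8")

raw_ok = hmac.compare_digest(signature, sign(raw_body, secret))
reserialized_ok = hmac.compare_digest(signature, sign(reserialized, secret))
print(raw_ok, reserialized_ok)  # True False
```

Always pass the unmodified request body to the verification function, and parse the JSON only after the signature has been checked.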
4. TLS certificate issues

    # Verify certificate
    openssl verify -CAfile ca.pem cert.pem

    # Check certificate chain
    openssl s_client -connect your-endpoint.com:443 -showcerts

Monitoring and Diagnostics
Check Overall Health

    -- Stream overview
    SHOW CDC STREAMS;

    # Overall delivery success rate
    sum(rate(heliosdb_webhook_delivery_total{status="success"}[5m])) /
    sum(rate(heliosdb_webhook_delivery_total[5m])) * 100

View Event Processing Stats

    let stats = processor.get_stats().await?;

    println!("Events captured: {}", stats.events_captured);
    println!("Events delivered: {}", stats.events_delivered);
    println!("Events failed: {}", stats.events_failed);
    println!("Queue size: {}", stats.queue_size);
    println!("Avg latency: {}ms", stats.avg_latency_ms);

Check Specific Endpoint

    # Delivery rate for specific endpoint
    rate(heliosdb_webhook_delivery_total{webhook_url="https://example.com"}[5m])

    # Error rate for specific endpoint
    rate(heliosdb_webhook_delivery_total{webhook_url="https://example.com",status="failure"}[5m])

Debug Logging

    # Enable debug logging
    logging:
      level: debug
      modules:
        heliosdb_cdc: debug
        heliosdb_webhooks: debug

Trace Individual Events

    // Enable tracing for specific events
    processor.enable_event_tracing("evt_123").await?;

    // Later, check trace
    let trace = processor.get_event_trace("evt_123").await?;
    for step in trace.steps {
        println!("{}: {} ({}ms)", step.timestamp, step.action, step.duration_ms);
    }

Common Error Messages
| Error | Cause | Solution |
|---|---|---|
| Connection refused | Endpoint not listening | Check endpoint is running |
| Connection timed out | Network/firewall issue | Check network connectivity |
| SSL certificate error | Invalid/expired cert | Update certificate |
| 429 Too Many Requests | Rate limited | Reduce delivery rate |
| 503 Service Unavailable | Endpoint overloaded | Scale endpoint |
| Circuit breaker open | Too many failures | Wait for recovery |
| Queue overflow | Backlog too large | Scale workers |
Related Documentation
- README.md - Feature overview
- USER_GUIDE.md - Comprehensive guide
- Design Document - Technical architecture