CDC Webhooks Troubleshooting Guide

Last Updated: January 4, 2026


This guide covers common issues encountered when working with CDC webhooks and their solutions.

Table of Contents

  1. Events Not Being Captured
  2. Webhook Delivery Failures
  3. High Latency
  4. Circuit Breaker Issues
  5. Duplicate Events
  6. Queue Backlog
  7. Security Issues
  8. Monitoring and Diagnostics

Events Not Being Captured

Symptom

Database changes are not triggering webhook events.

Diagnosis

-- Check if CDC stream exists and is active
SHOW CDC STREAMS;
-- Check stream status
SHOW CDC STREAM STATUS your_stream_name;

Solutions

1. Stream is paused

-- Resume the stream
ALTER CDC STREAM your_stream_name RESUME;

2. Table not included in stream

-- Check stream configuration
SHOW CDC STREAM STATUS your_stream_name;
-- Create new stream with correct table
CREATE CHANGE DATA CAPTURE ON your_table
TO KAFKA 'localhost:9092' TOPIC 'your-topic'
FORMAT JSON;

3. Filter excluding events

# Check filter configuration
filters:
  - name: your_filter
    tables:
      - your_table  # Ensure table is included
    operations:
      - INSERT
      - UPDATE
      - DELETE  # Ensure operation is included

4. WAL not being read

// Check WAL reader status
let stats = processor.get_stats().await?;
println!("Last WAL position: {}", stats.last_wal_position);
println!("Events captured: {}", stats.events_captured);

Webhook Delivery Failures

Symptom

Events are captured but not delivered to webhook endpoints.

Diagnosis

-- Check stream status for errors
SHOW CDC STREAM STATUS your_stream_name;

# Check Prometheus metrics
rate(heliosdb_webhook_delivery_total{status="failure"}[5m])

Solutions

1. Endpoint unreachable

# Test endpoint connectivity
curl -X POST https://your-endpoint.com/webhooks \
  -H "Content-Type: application/json" \
  -d '{"test": true}'

# Verify endpoint URL in configuration
webhooks:
  - url: https://your-endpoint.com/webhooks  # Check URL

2. SSL/TLS certificate issues

// For development/testing, disable certificate verification
let config = WebhookConfig {
    tls_verify: false, // NOT for production!
    ..Default::default()
};

For production, ensure certificates are valid:

# Check certificate
openssl s_client -connect your-endpoint.com:443

3. Endpoint returning errors

Check your endpoint logs for error details.

# Ensure endpoint returns 2xx status
def webhook_handler(request):
    try:
        process_event(request.json)
        return {"status": "success"}, 200  # Return 2xx
    except Exception as e:
        return {"error": str(e)}, 500  # Will trigger retry

4. Timeout issues

# Increase timeout if endpoint is slow
delivery:
  timeout_secs: 60  # Increase from default 30

5. Retry exhausted

# Increase retries
delivery:
  max_retries: 5  # Increase from default 3

High Latency

Symptom

Events are being delivered but with high latency (>100ms P99).

Diagnosis

# Check latency metrics
histogram_quantile(0.99, rate(heliosdb_webhook_delivery_duration_seconds_bucket[5m]))

Solutions

1. Endpoint is slow

# Measure endpoint response time
time curl -X POST https://your-endpoint.com/webhooks \
  -H "Content-Type: application/json" \
  -d '{"test": true}'

Consider:

  • Optimizing endpoint processing
  • Async processing (acknowledge quickly, process later)
  • Adding more endpoint instances
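
The "acknowledge quickly, process later" option can be sketched as follows. This is a minimal illustration, not a prescribed receiver design: `event_queue`, `worker`, and the 202/503 responses are assumptions made for the example, and an in-process queue loses acknowledged events on restart, so a durable queue is usually the safer choice in production.

```python
import queue
import threading

# Hypothetical in-process buffer between the HTTP handler and the worker.
event_queue = queue.Queue(maxsize=10_000)
processed = []

def process_event(event):
    # Placeholder for the slow business logic.
    processed.append(event["event_id"])

def worker():
    # Drain the queue in the background so the handler never blocks on work.
    while True:
        event = event_queue.get()
        try:
            process_event(event)
        except Exception as exc:
            print(f"processing failed for {event.get('event_id')}: {exc}")
        finally:
            event_queue.task_done()

def handle_webhook(event):
    """Acknowledge immediately and defer the real work to the worker."""
    try:
        event_queue.put_nowait(event)
    except queue.Full:
        # Non-2xx response, so the sender retries later.
        return {"error": "busy"}, 503
    return {"status": "accepted"}, 202

threading.Thread(target=worker, daemon=True).start()
```

Because the handler returns before processing finishes, a processing failure no longer triggers a sender-side retry; the receiver has to handle its own failures.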

2. Network latency

# Check network latency
ping your-endpoint.com
traceroute your-endpoint.com

Consider:

  • Deploying HeliosDB closer to endpoints
  • Using a faster network path
  • Adding regional endpoint replicas

3. Worker pool too small

# Increase worker pool
worker_pool:
  num_workers: 200  # Increase from default
  max_concurrent_requests: 2000
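
As a rough sizing aid, Little's law (in-flight requests = arrival rate × time per request) gives a lower bound on the concurrency needed to keep up. The sketch below is an estimate only: `required_concurrency` and the `headroom` factor are hypothetical, and how HeliosDB maps in-flight deliveries onto `num_workers` and `max_concurrent_requests` is not specified in this guide.

```python
import math

def required_concurrency(events_per_sec, avg_delivery_ms, headroom=1.5):
    """Estimate how many deliveries must be in flight to keep up.

    Little's law: in-flight = arrival rate x time per request.
    `headroom` is an assumed safety factor for bursts and retries.
    """
    return math.ceil(events_per_sec * avg_delivery_ms * headroom / 1000)

# 2,000 events/s at 50 ms per delivery needs about 100 in-flight
# requests, or 150 with the assumed 1.5x headroom.
print(required_concurrency(2000, 50))  # 150
```

If the estimate is well above the configured limits, the pool is likely the bottleneck; if it is well below, look at endpoint or network latency instead.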

4. Queue backlog

# Check queue size
heliosdb_event_queue_size

See the Queue Backlog section.


Circuit Breaker Issues

Symptom

Circuit breaker is open, blocking delivery to healthy endpoints.

Diagnosis

# Check circuit state
heliosdb_circuit_breaker_state == 2  # 2 = Open

// Check circuit state from code
let state = circuit_breaker.get_state("https://your-endpoint.com").await?;
println!("Circuit state: {:?}", state);

Solutions

1. Circuit opened due to transient failures

Wait for the timeout period to elapse; the circuit then transitions to half-open and tries the endpoint again:

circuit_breaker:
  timeout_secs: 60  # Circuit will try again after 60s

2. False positives from timeout

# Increase timeout before marking as failure
delivery:
  timeout_secs: 60  # Give endpoint more time

3. Adjust circuit breaker sensitivity

circuit_breaker:
  failure_threshold: 10  # Increase from 5
  success_threshold: 3  # Require more successes to close
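
To see how `failure_threshold`, `success_threshold`, and `timeout_secs` interact, here is a generic circuit breaker state machine. It is a sketch of the common closed / open / half-open pattern, not HeliosDB's implementation; the class, its method names, and the choice to count consecutive failures are assumptions for illustration.

```python
import time

class CircuitBreaker:
    """Generic closed -> open -> half-open -> closed state machine."""

    def __init__(self, failure_threshold=5, success_threshold=3,
                 timeout_secs=60, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.timeout_secs = timeout_secs
        self.clock = clock
        self.state = "closed"
        self.failures = 0    # consecutive failures while closed
        self.successes = 0   # consecutive successes while half-open
        self.opened_at = None

    def allow_request(self):
        if self.state == "open":
            # After the timeout, let trial requests through.
            if self.clock() - self.opened_at >= self.timeout_secs:
                self.state = "half_open"
                self.successes = 0
                return True
            return False
        return True

    def record_success(self):
        if self.state == "half_open":
            self.successes += 1
            if self.successes >= self.success_threshold:
                self.state = "closed"
                self.failures = 0
        else:
            self.failures = 0

    def record_failure(self):
        if self.state == "half_open":
            self._open()  # a failed trial request reopens the circuit
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self._open()

    def _open(self):
        self.state = "open"
        self.opened_at = self.clock()
        self.failures = 0
```

In this model, raising `failure_threshold` makes the circuit slower to open, while raising `success_threshold` makes it slower to close once half-open.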

4. Reset circuit breaker manually

// Force reset (use with caution)
circuit_breaker.reset("https://your-endpoint.com").await?;

Duplicate Events

Symptom

The same events are being delivered multiple times.

Diagnosis

Check if your endpoint is receiving duplicate event IDs.

Solutions

1. Verify exactly-once is enabled

exactly_once:
  delivery_window_secs: 86400  # 24 hour window

2. Implement idempotency in receiver

seen_events = set()

def handle_webhook(event):
    event_id = event['event_id']
    if event_id in seen_events:
        return {"status": "duplicate"}, 200
    seen_events.add(event_id)
    process_event(event)
    return {"status": "success"}, 200
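
The plain set above grows without bound and is lost on restart. One way to bound it is to expire IDs after a fixed window, for example the same 24 hours as `delivery_window_secs`. The following is a sketch under that assumption; `SeenEvents` and its parameters are hypothetical names, and the cache is still per-process, so it does not deduplicate across multiple receiver instances.

```python
import time
from collections import OrderedDict

class SeenEvents:
    """In-memory dedup cache with a time window and a size cap."""

    def __init__(self, ttl_secs=86400, max_entries=1_000_000,
                 clock=time.monotonic):
        self.ttl_secs = ttl_secs
        self.max_entries = max_entries
        self.clock = clock
        self._seen = OrderedDict()  # event_id -> first-seen time, oldest first

    def check_and_add(self, event_id):
        """Return True if event_id is new, False if it is a duplicate."""
        now = self.clock()
        # Drop entries older than the window.
        while self._seen:
            oldest_id, first_seen = next(iter(self._seen.items()))
            if now - first_seen < self.ttl_secs:
                break
            self._seen.popitem(last=False)
        if event_id in self._seen:
            return False
        self._seen[event_id] = now
        # Enforce the size cap by evicting the oldest entries.
        while len(self._seen) > self.max_entries:
            self._seen.popitem(last=False)
        return True
```

A handler would call `check_and_add(event['event_id'])` and return the duplicate response when it yields `False`.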

3. Use database for deduplication

-- Create dedup table
CREATE TABLE webhook_events (
  event_id VARCHAR(255) PRIMARY KEY,
  processed_at TIMESTAMP DEFAULT NOW()
);

-- Check before processing: a returned row means the event is new;
-- no row means it was already recorded and should be skipped
INSERT INTO webhook_events (event_id)
VALUES ('evt_123')
ON CONFLICT (event_id) DO NOTHING
RETURNING event_id;
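
From application code, the same insert-or-skip check might look like the sketch below. It uses Python's built-in `sqlite3` with `INSERT OR IGNORE` so that it runs without a server; this substitutes SQLite for whatever database the receiver actually uses, and `claim_event` is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE webhook_events ("
    " event_id TEXT PRIMARY KEY,"
    " processed_at TEXT DEFAULT CURRENT_TIMESTAMP)"
)

def claim_event(conn, event_id):
    """Return True if this call recorded the event (first delivery)."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO webhook_events (event_id) VALUES (?)",
            (event_id,),
        )
        # rowcount is 1 when the row was inserted, 0 when it already existed.
        return cur.rowcount == 1

print(claim_event(conn, "evt_123"))  # True
print(claim_event(conn, "evt_123"))  # False
```

If processing can fail after the event is claimed, consider doing the insert and the processing in the same transaction so that a failed event is not marked as handled.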

4. Check for retry storm

# High retry rate indicates issues
rate(heliosdb_webhook_retry_total[5m]) > 100

Queue Backlog

Symptom

Events are queuing up faster than they can be delivered.

Diagnosis

# Check queue size
heliosdb_event_queue_size{queue_type="fast"}
heliosdb_event_queue_size{queue_type="overflow"}

Solutions

1. Increase worker pool

worker_pool:
  num_workers: 200
  max_concurrent_requests: 5000
  connection_pool_size: 1000

2. Add more endpoints (load balance)

webhooks:
  - url: https://endpoint-1.example.com/webhooks
  - url: https://endpoint-2.example.com/webhooks
  - url: https://endpoint-3.example.com/webhooks

3. Reduce event volume with filters

filters:
  - name: reduce_volume
    tables:
      - important_table  # Only essential tables
    operations:
      - INSERT  # Skip updates if not needed

4. Batch events (if supported by receiver)

delivery:
  batch_size: 100  # Send 100 events per request
  batch_timeout_ms: 1000  # Or after 1 second
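
Batching only helps if the receiver can accept more than one event per request. The sketch below shows one way a handler could accept both shapes. The batch payload format is not documented in this guide, so treating a batch as a JSON array of events is an assumption, as are the `handle_batch` name and the response body.

```python
def handle_batch(payload, process_event):
    """Accept a single event (dict) or a batch (list of dicts)."""
    events = payload if isinstance(payload, list) else [payload]
    result = {"processed": 0, "failed": []}
    for event in events:
        try:
            process_event(event)
            result["processed"] += 1
        except Exception:
            result["failed"].append(event.get("event_id"))
    # Any failure returns 500 so the sender retries the request.
    status = 200 if not result["failed"] else 500
    return result, status
```

A retried batch re-delivers events that already succeeded, so this approach depends on the receiver being idempotent (see Duplicate Events).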

5. Scale horizontally

Deploy multiple CDC processors with table sharding.


Security Issues

Symptom

Webhook signature verification failing or security alerts.

Diagnosis

# Check signature header
signature = request.headers.get('X-HeliosDB-Signature')
print(f"Received signature: {signature}")

Solutions

1. Signature verification failing

import hmac
import hashlib

def verify_signature(payload, signature, secret):
    # A missing signature header can never match
    if not signature:
        return False
    # Ensure payload is bytes
    if isinstance(payload, str):
        payload = payload.encode('utf-8')
    expected = 'sha256=' + hmac.new(
        secret.encode('utf-8'),
        payload,
        hashlib.sha256
    ).hexdigest()
    # Use constant-time comparison
    return hmac.compare_digest(signature, expected)

2. Wrong secret key

# Verify secret matches between sender and receiver
security:
hmac_secret_env: WEBHOOK_SECRET # Check this environment variable

3. Payload modification (proxy/load balancer)

Ensure no proxy or load balancer is modifying the request body.
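
The same failure occurs when the receiver itself parses and re-serializes the JSON before verifying, because the HMAC covers the exact bytes that were sent. The short demonstration below assumes the `sha256=` hex scheme used in the verification example above; the secret and payload fields are made up for illustration.

```python
import hashlib
import hmac
import json

def sign(body: bytes, secret: str) -> str:
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    return "sha256=" + digest

secret = "example-secret"  # illustrative value only
raw_body = b'{"event_id": "evt_123", "op": "INSERT"}'
signature = sign(raw_body, secret)  # what the sender would attach

# Verifying the raw request bytes succeeds.
ok_raw = hmac.compare_digest(signature, sign(raw_body, secret))

# Re-serializing changes whitespace, so the bytes (and the HMAC) differ.
reserialized = json.dumps(json.loads(raw_body), separators=(",", ":")).encode("utf-8")
ok_reserialized = hmac.compare_digest(signature, sign(reserialized, secret))

print(ok_raw, ok_reserialized)  # True False
```

Always verify against the raw request body as received, before any JSON parsing or middleware transformation.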

4. TLS certificate issues

# Verify certificate
openssl verify -CAfile ca.pem cert.pem
# Check certificate chain
openssl s_client -connect your-endpoint.com:443 -showcerts

Monitoring and Diagnostics

Check Overall Health

-- Stream overview
SHOW CDC STREAMS;

# Overall delivery success rate (percent)
sum(rate(heliosdb_webhook_delivery_total{status="success"}[5m])) /
  sum(rate(heliosdb_webhook_delivery_total[5m])) * 100

View Event Processing Stats

let stats = processor.get_stats().await?;
println!("Events captured: {}", stats.events_captured);
println!("Events delivered: {}", stats.events_delivered);
println!("Events failed: {}", stats.events_failed);
println!("Queue size: {}", stats.queue_size);
println!("Avg latency: {}ms", stats.avg_latency_ms);

Check Specific Endpoint

# Delivery rate for specific endpoint
rate(heliosdb_webhook_delivery_total{webhook_url="https://example.com"}[5m])
# Error rate for specific endpoint
rate(heliosdb_webhook_delivery_total{webhook_url="https://example.com",status="failure"}[5m])

Debug Logging

# Enable debug logging
logging:
  level: debug
  modules:
    heliosdb_cdc: debug
    heliosdb_webhooks: debug

Trace Individual Events

// Enable tracing for specific events
processor.enable_event_tracing("evt_123").await?;
// Later, check trace
let trace = processor.get_event_trace("evt_123").await?;
for step in trace.steps {
    println!("{}: {} ({}ms)", step.timestamp, step.action, step.duration_ms);
}

Common Error Messages

| Error | Cause | Solution |
|---|---|---|
| Connection refused | Endpoint not listening | Check endpoint is running |
| Connection timed out | Network/firewall issue | Check network connectivity |
| SSL certificate error | Invalid/expired cert | Update certificate |
| 429 Too Many Requests | Rate limited | Reduce delivery rate |
| 503 Service Unavailable | Endpoint overloaded | Scale endpoint |
| Circuit breaker open | Too many failures | Wait for recovery |
| Queue overflow | Backlog too large | Scale workers |