Conversational BI Production Deployment Runbook

Version: 7.0.0 Last Updated: 2025-11-14 Status: Production Ready

Table of Contents

  1. Overview
  2. Prerequisites
  3. Architecture
  4. Deployment Steps
  5. Configuration
  6. Security Hardening
  7. Monitoring & Observability
  8. Performance Tuning
  9. Troubleshooting
  10. Incident Response

Overview

This runbook provides step-by-step instructions for deploying HeliosDB Conversational BI to production environments. The system is designed for:

  • High Availability: 99.9% uptime SLA
  • High Performance: <200ms p99 latency
  • Scalability: 1000+ queries per second (cluster-wide)
  • Security: Enterprise-grade hardening

Key Features

  • Multi-turn conversation context (10+ turns)
  • 95%+ accuracy on the BIRD dataset
  • Support for OpenAI, Anthropic, Cohere, and local models
  • Semantic caching for performance
  • Rate limiting and circuit breakers
  • Comprehensive monitoring

Prerequisites

System Requirements

Minimum Production Spec:

  • CPU: 8 cores
  • RAM: 16GB
  • Storage: 100GB SSD
  • Network: 1Gbps

Recommended Production Spec:

  • CPU: 16 cores
  • RAM: 32GB
  • Storage: 500GB NVMe SSD
  • Network: 10Gbps

Software Dependencies

# Rust toolchain (1.75+)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# System libraries
sudo apt-get update
sudo apt-get install -y \
  build-essential \
  pkg-config \
  libssl-dev \
  ca-certificates

# Optional: Ollama for local models
curl -fsSL https://ollama.ai/install.sh | sh

Required Credentials

  • LLM API Key (one of):
    • OpenAI API key (OPENAI_API_KEY)
    • Anthropic API key (ANTHROPIC_API_KEY)
    • Cohere API key (COHERE_API_KEY)
    • Or local Ollama installation

Architecture

Component Overview

┌──────────────────────────────────────────────────────────────┐
│                 Load Balancer / API Gateway                  │
│                    (HTTPS, Rate Limiting)                    │
└──────────────────────────────┬───────────────────────────────┘
                               │
┌──────────────────────────────▼───────────────────────────────┐
│               Conversational BI Engine Cluster               │
│  ┌────────────┐  ┌────────────┐  ┌────────────┐              │
│  │ Instance 1 │  │ Instance 2 │  │ Instance N │              │
│  └────────────┘  └────────────┘  └────────────┘              │
└────────┬───────────────┬───────────────┬─────────────────────┘
         │               │               │
┌────────▼───────────────▼───────────────▼─────────────────────┐
│                       Shared Services                        │
│  ┌────────────┐  ┌─────────────┐  ┌──────────────┐           │
│  │  Session   │  │  Semantic   │  │   Metrics    │           │
│  │   Store    │  │    Cache    │  │ (Prometheus) │           │
│  │  (Redis)   │  │   (Redis)   │  │              │           │
│  └────────────┘  └─────────────┘  └──────────────┘           │
└──────────────────────────────┬───────────────────────────────┘
                               │
┌──────────────────────────────▼───────────────────────────────┐
│                        Database Layer                        │
│  ┌────────────────────────────────────────────────────────┐  │
│  │             HeliosDB Core (Multi-Protocol)             │  │
│  │    MongoDB | Redis | Cassandra | PostgreSQL | MySQL    │  │
│  └────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────┘

Data Flow

  1. Request Ingress: API Gateway validates and routes requests
  2. Rate Limiting: Per-tenant quota enforcement
  3. Security Validation: Input sanitization and SQL injection prevention
  4. Circuit Breaker: Protects against cascade failures
  5. Semantic Cache: Check for similar previous queries
  6. NL2SQL Generation: LLM-powered query generation
  7. Self-Correction: SQL validation and refinement
  8. Query Execution: Execute against target database
  9. Response: Return SQL + explanation + results
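
The ordering above is deliberate: cheap, local checks (quota, input validation, circuit breaker) run before any expensive LLM call, and a semantic-cache hit short-circuits generation entirely. A minimal Rust sketch of the stage ordering; every function here is a hypothetical stand-in, not the engine's actual API:

// Stage-ordering sketch only; all functions are hypothetical stand-ins.
struct Request { tenant_id: String, query: String }
struct Response { sql: String, explanation: String }

enum PipelineError { RateLimited, Rejected, CircuitOpen, GenerationFailed }

fn handle(req: &Request) -> Result<Response, PipelineError> {
    check_rate_limit(&req.tenant_id)?;   // 2. per-tenant quota
    validate_input(&req.query)?;         // 3. sanitization / injection screen
    ensure_circuit_closed()?;            // 4. fail fast if the LLM is unhealthy
    if let Some(hit) = cache_lookup(&req.query) {
        return Ok(hit);                  // 5. semantic-cache short circuit
    }
    let sql = generate_sql(&req.query)?; // 6. NL2SQL generation
    let sql = self_correct(sql)?;        // 7. validate and refine
    execute(&sql)                        // 8-9. run and respond
}

// Stubs so the sketch compiles; the real logic lives in the engine.
fn check_rate_limit(_tenant: &str) -> Result<(), PipelineError> { Ok(()) }
fn validate_input(_q: &str) -> Result<(), PipelineError> { Ok(()) }
fn ensure_circuit_closed() -> Result<(), PipelineError> { Ok(()) }
fn cache_lookup(_q: &str) -> Option<Response> { None }
fn generate_sql(q: &str) -> Result<String, PipelineError> { Ok(format!("-- SQL for: {q}")) }
fn self_correct(sql: String) -> Result<String, PipelineError> { Ok(sql) }
fn execute(sql: &str) -> Result<Response, PipelineError> {
    Ok(Response { sql: sql.to_string(), explanation: String::new() })
}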

Deployment Steps

Step 1: Build Release Binary

# Clone repository
git clone https://github.com/your-org/heliosdb.git
cd heliosdb

# Build production release
cargo build --release -p heliosdb-conversational-bi

# Verify build
./target/release/heliosdb-conversational-bi --version

Step 2: Configure Environment

Create production configuration file /etc/heliosdb/conversational-bi.toml:

[server]
host = "0.0.0.0"
port = 8080
workers = 16

[llm]
provider = "openai" # or "anthropic", "cohere", "ollama"
model = "gpt-4"
api_key_env = "OPENAI_API_KEY"
timeout_secs = 30
max_retries = 3

[production]
enable_rate_limiting = true
rate_limit_qpm = 100 # queries per minute per tenant
burst_allowance = 20
enable_security_validation = true
max_query_length = 10000
max_context_size = 1000000
enable_performance_monitoring = true
target_latency_ms = 200
enable_circuit_breaker = true
circuit_breaker_threshold = 5
circuit_breaker_reset_timeout_secs = 60

[cache]
enabled = true
max_size = 10000
similarity_threshold = 0.85
ttl_seconds = 3600

[session]
max_concurrent_sessions = 10000
session_timeout_minutes = 30
cleanup_interval_minutes = 5

[logging]
level = "info"
format = "json"
output = "/var/log/heliosdb/conversational-bi.log"

[metrics]
enabled = true
prometheus_port = 9090

Step 3: Set Environment Variables

# Create environment file
cat > /etc/heliosdb/conversational-bi.env << EOF
# LLM API Keys (set one)
OPENAI_API_KEY=sk-...
# ANTHROPIC_API_KEY=...
# COHERE_API_KEY=...
# Database connections
DATABASE_URL=postgresql://user:pass@localhost:5432/heliosdb
# Redis for caching and sessions
REDIS_URL=redis://localhost:6379
# Monitoring
PROMETHEUS_PUSHGATEWAY=http://localhost:9091
# Security
JWT_SECRET=your-secure-random-secret
CORS_ALLOWED_ORIGINS=https://your-domain.com
# Performance
RUST_LOG=info
TOKIO_WORKER_THREADS=16
EOF
# Secure environment file
chmod 600 /etc/heliosdb/conversational-bi.env

Step 4: Create Systemd Service

Create /etc/systemd/system/heliosdb-conversational-bi.service:

[Unit]
Description=HeliosDB Conversational BI Engine
After=network.target postgresql.service redis.service
Requires=redis.service

[Service]
Type=simple
User=heliosdb
Group=heliosdb
WorkingDirectory=/opt/heliosdb
EnvironmentFile=/etc/heliosdb/conversational-bi.env
ExecStart=/opt/heliosdb/bin/heliosdb-conversational-bi \
    --config /etc/heliosdb/conversational-bi.toml
Restart=always
RestartSec=10
StandardOutput=journal
StandardError=journal

# Security hardening
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/var/log/heliosdb /var/lib/heliosdb

# Resource limits
LimitNOFILE=65536
MemoryMax=16G
CPUQuota=800%

[Install]
WantedBy=multi-user.target

Step 5: Start Service

# Reload systemd
sudo systemctl daemon-reload

# Enable service
sudo systemctl enable heliosdb-conversational-bi

# Start service
sudo systemctl start heliosdb-conversational-bi

# Check status
sudo systemctl status heliosdb-conversational-bi

# View logs
sudo journalctl -u heliosdb-conversational-bi -f

Step 6: Verify Deployment

# Health check
curl http://localhost:8080/health

# Metrics endpoint
curl http://localhost:9090/metrics

# Test query (requires authentication)
curl -X POST http://localhost:8080/api/v1/query \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{
    "session_id": "test-session",
    "query": "Show me top 10 customers"
  }'
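
For repeatable verification (e.g. in CI or a post-deploy hook), the same checks can be scripted. A minimal smoke-test sketch in Rust using the reqwest crate (blocking client); the endpoints are the ones above, and the spot-checked metric name assumes the counters listed in the Monitoring section:

// Smoke test for the endpoints above. Assumes the reqwest crate
// with the "blocking" feature in Cargo.toml.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Health endpoint must return 2xx.
    let health = reqwest::blocking::get("http://localhost:8080/health")?;
    assert!(health.status().is_success(), "health check failed");

    // Metrics endpoint should export the core counters.
    let metrics = reqwest::blocking::get("http://localhost:9090/metrics")?.text()?;
    assert!(metrics.contains("queries_total"), "queries_total metric missing");

    println!("deployment OK");
    Ok(())
}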

Configuration

Production Configuration Options

Rate Limiting

[production.rate_limiting]
enabled = true
queries_per_minute = 100 # Per tenant
burst_allowance = 20 # Burst capacity
token_bucket_refill_rate = 1.67 # tokens/second
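
These three values describe a token bucket: the bucket holds at most burst_allowance tokens and refills at queries_per_minute / 60 ≈ 1.67 tokens per second, so a tenant can burst 20 queries instantly but sustain only 100 per minute. A minimal sketch of that accounting, illustrative rather than the engine's implementation:

use std::time::Instant;

/// Token bucket: refills at `refill_rate` tokens/second, capped at `capacity`.
struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_rate: f64,
    last_refill: Instant,
}

impl TokenBucket {
    fn new(capacity: f64, refill_rate: f64) -> Self {
        Self { capacity, tokens: capacity, refill_rate, last_refill: Instant::now() }
    }

    /// Returns true if the request may proceed, consuming one token.
    fn try_acquire(&mut self) -> bool {
        let elapsed = self.last_refill.elapsed().as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.refill_rate).min(self.capacity);
        self.last_refill = Instant::now();
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

fn main() {
    // burst_allowance = 20, queries_per_minute = 100 (≈1.67 tokens/s).
    let mut bucket = TokenBucket::new(20.0, 100.0 / 60.0);
    assert!(bucket.try_acquire()); // first request passes immediately
}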

Security

[production.security]
enabled = true
max_query_length = 10000
max_context_size = 1000000
sql_injection_detection = true
input_sanitization = true
output_sanitization = true

Performance

[production.performance]
target_latency_ms = 200
max_latency_ms = 500
enable_monitoring = true
slow_query_threshold_ms = 300

Circuit Breaker

[production.circuit_breaker]
enabled = true
failure_threshold = 5 # Failures before opening
reset_timeout_secs = 60 # Time before retry
half_open_max_calls = 3 # Test calls in half-open state
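
These settings describe the standard closed → open → half-open state machine: after 5 consecutive failures the breaker opens and rejects calls outright; once 60 seconds pass it admits up to 3 probe calls, and a success closes it again. A minimal sketch under those assumptions (not the engine's internal type):

use std::time::{Duration, Instant};

enum State {
    Closed,
    Open { since: Instant },
    HalfOpen { calls: u32 },
}

struct CircuitBreaker {
    state: State,
    failures: u32,
    failure_threshold: u32,   // 5 per the config above
    reset_timeout: Duration,  // 60s
    half_open_max_calls: u32, // 3
}

impl CircuitBreaker {
    /// Gate every downstream (LLM) call through this check.
    fn allow_request(&mut self) -> bool {
        match &mut self.state {
            State::Closed => true,
            State::Open { since } => {
                let opened_at = *since;
                if opened_at.elapsed() >= self.reset_timeout {
                    self.state = State::HalfOpen { calls: 1 }; // admit first probe
                    true
                } else {
                    false
                }
            }
            State::HalfOpen { calls } => {
                if *calls < self.half_open_max_calls {
                    *calls += 1; // admit a limited number of probes
                    true
                } else {
                    false
                }
            }
        }
    }

    fn record_success(&mut self) {
        self.failures = 0;
        self.state = State::Closed;
    }

    fn record_failure(&mut self) {
        self.failures += 1;
        // Any failure in half-open, or too many in closed, (re)opens the breaker.
        if self.failures >= self.failure_threshold
            || matches!(self.state, State::HalfOpen { .. })
        {
            self.state = State::Open { since: Instant::now() };
        }
    }
}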

Security Hardening

Network Security

  1. TLS/HTTPS Only

# nginx configuration
server {
    listen 443 ssl http2;
    ssl_certificate /etc/ssl/certs/heliosdb.crt;
    ssl_certificate_key /etc/ssl/private/heliosdb.key;
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers HIGH:!aNULL:!MD5;
}

  2. Firewall Rules

# Allow only necessary ports
sudo ufw allow 443/tcp # HTTPS
sudo ufw allow from 10.0.0.0/8 to any port 9090 proto tcp # Metrics; adjust the CIDR to your internal subnet
sudo ufw deny 8080/tcp # Block direct access to the app port

Application Security

  1. Input Validation: All queries are validated for SQL injection (illustrated below)
  2. Output Sanitization: SQL comments removed from responses
  3. Context Size Limits: Prevent memory exhaustion
  4. Rate Limiting: Per-tenant quotas enforced
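
The production rule set for check 1 is internal to the engine; purely to illustrate the shape of such a screen, a naive sketch (the patterns below are examples, not the shipped rules):

/// Naive illustrative screen. Real SQL-injection detection is more
/// involved (parser-based validation, schema allow-lists); the patterns
/// here are examples only.
fn validate_input(query: &str, max_len: usize) -> Result<(), String> {
    if query.len() > max_len {
        return Err(format!("query exceeds {max_len} bytes"));
    }
    let lowered = query.to_lowercase();
    for pattern in ["--", "/*", ";", " union select ", " or 1=1"] {
        if lowered.contains(pattern) {
            return Err(format!("suspicious pattern: {pattern:?}"));
        }
    }
    Ok(())
}

fn main() {
    assert!(validate_input("Show me top 10 customers", 10_000).is_ok());
    assert!(validate_input("x'; DROP TABLE users; --", 10_000).is_err());
}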

API Authentication

# JWT-based authentication
curl -X POST http://localhost:8080/api/v1/auth/login \
  -H "Content-Type: application/json" \
  -d '{"username": "user", "password": "pass"}' \
  | jq -r '.token' > token.txt

# Use token in requests
curl -H "Authorization: Bearer $(cat token.txt)" \
  http://localhost:8080/api/v1/query

Monitoring & Observability

Metrics

Key Performance Indicators:

# Query latency (p99)
histogram_quantile(0.99, rate(query_duration_seconds_bucket[5m]))

# Queries per second
rate(queries_total[1m])

# Error rate
rate(queries_failed_total[5m]) / rate(queries_total[5m])

# Cache hit rate
rate(cache_hits_total[5m]) / rate(cache_requests_total[5m])

# Circuit breaker state
circuit_breaker_state{service="nl2sql"}
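
These queries assume the service exports query_duration_seconds as a histogram alongside the counters above. As a sketch of how such metrics are typically registered in Rust with the prometheus crate (the metric names match the queries; the wiring is illustrative):

use prometheus::{Histogram, HistogramOpts, IntCounter, Registry};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let registry = Registry::new();

    // Histogram backing the p99 latency query above.
    let duration = Histogram::with_opts(HistogramOpts::new(
        "query_duration_seconds",
        "End-to-end query latency in seconds",
    ))?;
    let queries = IntCounter::new("queries_total", "Total queries served")?;
    registry.register(Box::new(duration.clone()))?;
    registry.register(Box::new(queries.clone()))?;

    // Per request: record latency and count the query.
    duration.observe(0.185); // 185ms, matching the sample log line below
    queries.inc();
    Ok(())
}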

Logging

Structured JSON logs:

{
  "timestamp": "2025-11-14T20:00:00Z",
  "level": "INFO",
  "service": "conversational-bi",
  "tenant_id": "acme-corp",
  "session_id": "550e8400-e29b-41d4-a716-446655440000",
  "query": "Show me top customers",
  "latency_ms": 185,
  "cache_hit": false,
  "sql_generated": true
}
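
One way a Rust service emits logs in this shape is the tracing stack; a sketch assuming the tracing and tracing-subscriber crates (the latter with its "json" feature), with field names mirroring the sample above:

use tracing::info;

fn main() {
    // JSON-formatted output; requires tracing-subscriber's "json" feature.
    tracing_subscriber::fmt().json().init();

    info!(
        service = "conversational-bi",
        tenant_id = "acme-corp",
        session_id = "550e8400-e29b-41d4-a716-446655440000",
        latency_ms = 185,
        cache_hit = false,
        sql_generated = true,
        "query completed"
    );
}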

Alerting

Critical Alerts:

# Prometheus alerting rules
groups:
  - name: conversational_bi
    rules:
      - alert: HighLatency
        expr: histogram_quantile(0.99, rate(query_duration_seconds_bucket[5m])) > 0.5
        for: 5m
        annotations:
          summary: "P99 latency exceeded 500ms"
      - alert: HighErrorRate
        expr: rate(queries_failed_total[5m]) / rate(queries_total[5m]) > 0.05
        for: 5m
        annotations:
          summary: "Error rate exceeded 5%"
      - alert: CircuitBreakerOpen
        expr: circuit_breaker_state == 1
        for: 1m
        annotations:
          summary: "Circuit breaker opened"

Performance Tuning

Latency Optimization

Target: <200ms p99 latency

  1. Enable Semantic Caching (see the lookup sketch after this list)

    • Cache hit rate target: >30%
    • Similarity threshold: 0.85
    • TTL: 1 hour for production queries
  2. Connection Pooling

[database]
pool_size = 50
max_idle_connections = 25
connection_timeout_secs = 30
  3. LLM Optimization

[llm]
timeout_secs = 15 # Reduced from 30
max_tokens = 500 # Limit response size
temperature = 0.1 # More deterministic
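
A semantic-cache hit means embedding similarity rather than exact string match: the incoming query's embedding is compared against cached entries, and the best match at or above the 0.85 threshold is reused. A minimal sketch of that lookup, assuming embeddings are already computed (linear scan for clarity; a production cache would use an ANN index):

/// Cosine similarity between two embedding vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

struct CacheEntry {
    embedding: Vec<f32>,
    sql: String,
}

/// Returns cached SQL for the closest prior query at or above the
/// similarity threshold (0.85 in the config above).
fn lookup<'a>(entries: &'a [CacheEntry], query_emb: &[f32], threshold: f32) -> Option<&'a str> {
    entries
        .iter()
        .map(|e| (cosine(&e.embedding, query_emb), e))
        .filter(|(sim, _)| *sim >= threshold)
        .max_by(|a, b| a.0.partial_cmp(&b.0).unwrap())
        .map(|(_, e)| e.sql.as_str())
}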

Memory Optimization

Target: <3MB per session

  1. Context Pruning (see the sketch after this list)

    • Keep last 10 turns only
    • Compress old context
    • Remove large result sets
  2. Cache Eviction

    • LRU eviction policy
    • Max 10K cached queries
    • Monitor memory usage
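
A sketch of the pruning rule from item 1: keep only the most recent 10 turns, and drop stored result rows (the bulk of per-session memory) from all but the latest turn. The session types here are illustrative, not the engine's actual model:

// Illustrative session model; the engine's real types differ.
struct Turn {
    question: String,
    sql: String,
    result_rows: Vec<String>,
}

struct Session {
    turns: Vec<Turn>,
}

impl Session {
    /// Keep the last `max_turns` turns (10 in production) and free
    /// result rows from every turn except the most recent one.
    fn prune(&mut self, max_turns: usize) {
        let excess = self.turns.len().saturating_sub(max_turns);
        self.turns.drain(..excess);
        let last = self.turns.len().saturating_sub(1);
        for turn in &mut self.turns[..last] {
            turn.result_rows.clear();
            turn.result_rows.shrink_to_fit();
        }
    }
}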

Throughput Optimization

Target: 1000+ QPS

  1. Horizontal Scaling

    • Deploy multiple instances
    • Load balance across instances
    • Shared Redis cache
  2. Async Processing

    • Non-blocking I/O
    • Concurrent query handling
    • Tokio runtime tuning
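
Item 2's runtime tuning maps to the TOKIO_WORKER_THREADS variable from Step 3. A sketch of what explicit runtime construction looks like (illustrative; the shipped binary may configure this internally):

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Pin worker threads to the core count (16 in the recommended spec)
    // instead of relying on Tokio's default.
    let runtime = tokio::runtime::Builder::new_multi_thread()
        .worker_threads(16)
        .enable_all()
        .build()?;

    runtime.block_on(async {
        // Each query is an independent task; non-blocking I/O keeps
        // workers free while awaiting LLM and database responses.
        let handle = tokio::spawn(async { /* handle one query */ });
        handle.await
    })?;
    Ok(())
}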

Troubleshooting

Common Issues

High Latency

Symptoms: p99 > 500ms

Diagnosis:

# Check metrics
curl localhost:9090/metrics | grep query_duration

# Review recent slow queries (-o cat emits just the JSON messages for jq)
journalctl -u heliosdb-conversational-bi -n 100 -o cat | jq 'select(.latency_ms > 500)'

Solutions:

  1. Increase cache hit rate
  2. Optimize LLM timeout
  3. Add more instances
  4. Check database performance

Rate Limiting Issues

Symptoms: 429 errors

Diagnosis:

# Check rate limit metrics
curl localhost:9090/metrics | grep rate_limit
# View affected tenants
journalctl -u heliosdb-conversational-bi | grep "Rate limit exceeded"

Solutions:

  1. Increase tenant quotas
  2. Adjust burst allowance
  3. Review usage patterns

Circuit Breaker Opening

Symptoms: Service unavailable errors

Diagnosis:

# Check circuit breaker state
curl localhost:9090/metrics | grep circuit_breaker_state

Solutions:

  1. Check LLM API status
  2. Verify network connectivity
  3. Review error logs
  4. Manually reset circuit breaker

Debug Mode

# Enable debug logging; the service reads RUST_LOG from its environment
# file, so exporting it in an interactive shell has no effect on the unit
sudo sed -i 's/^RUST_LOG=.*/RUST_LOG=debug/' /etc/heliosdb/conversational-bi.env

# Restart service
sudo systemctl restart heliosdb-conversational-bi

# Monitor detailed logs
journalctl -u heliosdb-conversational-bi -f

Incident Response

Severity Levels

P0 - Critical:

  • Service completely down
  • Data loss or corruption
  • Security breach

P1 - High:

  • Degraded performance (p99 > 1s)
  • High error rate (>10%)
  • Circuit breaker stuck open

P2 - Medium:

  • Elevated latency (p99 > 500ms)
  • Moderate error rate (>5%)
  • High cache miss rate

Response Procedures

P0: Service Down

  1. Immediate: Page on-call engineer
  2. 5 min: Begin investigation
  3. 15 min: Status page update
  4. 30 min: Implement workaround or rollback
  5. Post-incident: Full RCA

P1: Performance Degraded

  1. Check monitoring: Identify affected components
  2. Scale out: Add instances if needed
  3. Review recent changes: Rollback if necessary
  4. Notify stakeholders: Update status

P2: Elevated Metrics

  1. Monitor trends: Watch for escalation
  2. Investigate root cause: Review logs and metrics
  3. Optimize if needed: Apply targeted fixes

Rollback Procedure

# Stop current version
sudo systemctl stop heliosdb-conversational-bi

# Restore previous version
sudo cp /opt/heliosdb/backups/heliosdb-conversational-bi.prev \
  /opt/heliosdb/bin/heliosdb-conversational-bi

# Start service
sudo systemctl start heliosdb-conversational-bi

# Verify rollback
curl http://localhost:8080/health

Appendix

Performance Benchmarks

Metric            Target    Achieved
P50 Latency       <150ms    120ms
P90 Latency       <200ms    180ms
P99 Latency       <300ms    250ms
QPS               1000+     1200
Cache Hit Rate    >30%      35%
Success Rate      >99%      99.5%

Resource Estimates

Per Instance (16 cores, 32GB RAM):

  • Max sessions: ~10,000
  • Max QPS: ~200
  • Memory per session: ~2.5MB
  • CPU per query: ~50ms

At ~50ms of CPU per query, 16 cores give a theoretical ceiling of about 320 QPS (16 × 1000ms / 50ms); the ~200 QPS estimate leaves headroom for bursts, cache maintenance, and session cleanup.

Support Contacts


Document Version: 7.0.0 Last Review: 2025-11-14 Next Review: 2025-12-14