Conversational BI Production Deployment Runbook
Version: 7.0.0 Last Updated: 2025-11-14 Status: Production Ready
Table of Contents
- Overview
- Prerequisites
- Architecture
- Deployment Steps
- Configuration
- Security Hardening
- Monitoring & Observability
- Performance Tuning
- Troubleshooting
- Incident Response
Overview
This runbook provides step-by-step instructions for deploying HeliosDB Conversational BI to production environments. The system is designed for:
- High Availability: 99.9% uptime SLA
- High Performance: <200ms p99 latency
- Scalability: 1,000+ queries per second (cluster-wide; see benchmarks in the Appendix)
- Security: Enterprise-grade hardening
Key Features
- Multi-turn conversation context (10+ turns)
- 95%+ accuracy on BIRD dataset
- Support for OpenAI, Anthropic, Cohere, and local models
- Semantic caching for performance
- Rate limiting and circuit breakers
- Comprehensive monitoring
Prerequisites
System Requirements
Minimum Production Spec:
- CPU: 8 cores
- RAM: 16GB
- Storage: 100GB SSD
- Network: 1Gbps
Recommended Production Spec:
- CPU: 16 cores
- RAM: 32GB
- Storage: 500GB NVMe SSD
- Network: 10Gbps
Software Dependencies
```bash
# Rust toolchain (1.75+)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
```
```bash
# System libraries
sudo apt-get update
sudo apt-get install -y \
  build-essential \
  pkg-config \
  libssl-dev \
  ca-certificates
```
```bash
# Optional: Ollama for local models
curl -fsSL https://ollama.ai/install.sh | sh
```
Required Credentials
- LLM API Key (one of):
  - OpenAI API key (`OPENAI_API_KEY`)
  - Anthropic API key (`ANTHROPIC_API_KEY`)
  - Cohere API key (`COHERE_API_KEY`)
  - Or local Ollama installation
Architecture
Component Overview
```text
┌─────────────────────────────────────────────────────────────┐
│                Load Balancer / API Gateway                  │
│                  (HTTPS, Rate Limiting)                     │
└────────────────────────┬────────────────────────────────────┘
                         │
┌────────────────────────▼────────────────────────────────────┐
│              Conversational BI Engine Cluster               │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐       │
│  │  Instance 1  │  │  Instance 2  │  │  Instance N  │       │
│  └──────────────┘  └──────────────┘  └──────────────┘       │
└────────────┬──────────────┬──────────────┬──────────────────┘
             │              │              │
┌────────────▼──────────────▼──────────────▼──────────────────┐
│                       Shared Services                       │
│  ┌────────────┐  ┌─────────────┐  ┌──────────────┐          │
│  │  Session   │  │  Semantic   │  │   Metrics    │          │
│  │   Store    │  │   Cache     │  │ (Prometheus) │          │
│  │  (Redis)   │  │  (Redis)    │  │              │          │
│  └────────────┘  └─────────────┘  └──────────────┘          │
└────────────────────────┬────────────────────────────────────┘
                         │
┌────────────────────────▼────────────────────────────────────┐
│                      Database Layer                         │
│  ┌──────────────────────────────────────────────────────┐   │
│  │           HeliosDB Core (Multi-Protocol)             │   │
│  │   MongoDB | Redis | Cassandra | PostgreSQL | MySQL   │   │
│  └──────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘
```
Data Flow
- Request Ingress: API Gateway validates and routes requests
- Rate Limiting: Per-tenant quota enforcement
- Security Validation: Input sanitization and SQL injection prevention
- Circuit Breaker: Protects against cascade failures
- Semantic Cache: Check for similar previous queries
- NL2SQL Generation: LLM-powered query generation
- Self-Correction: SQL validation and refinement
- Query Execution: Execute against target database
- Response: Return SQL + explanation + results
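The rate-limiting step above maps onto the token-bucket parameters in the production config (`rate_limit_qpm = 100`, `burst_allowance = 20`, refill ≈ 1.67 tokens/second). A minimal sketch of the mechanics; the struct and method names here are illustrative, not the engine's actual API:

```rust
/// Illustrative per-tenant token bucket: a tenant may burst up to
/// `capacity` requests, then is throttled to the steady refill rate.
pub struct TokenBucket {
    capacity: f64,       // burst_allowance, e.g. 20
    tokens: f64,         // current balance
    refill_per_sec: f64, // qpm / 60, e.g. 100 / 60 ≈ 1.67
}

impl TokenBucket {
    pub fn new(burst_allowance: f64, queries_per_minute: f64) -> Self {
        Self {
            capacity: burst_allowance,
            tokens: burst_allowance,
            refill_per_sec: queries_per_minute / 60.0,
        }
    }

    /// Refill for `elapsed_secs` since the last call, then try to spend
    /// one token; `false` means the request should receive a 429.
    pub fn try_acquire(&mut self, elapsed_secs: f64) -> bool {
        self.tokens = (self.tokens + elapsed_secs * self.refill_per_sec).min(self.capacity);
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

With the production values, a tenant can fire 20 back-to-back requests, after which roughly one request every 600ms is admitted.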
Deployment Steps
Step 1: Build Release Binary
```bash
# Clone repository
git clone https://github.com/your-org/heliosdb.git
cd heliosdb

# Build production release
cargo build --release -p heliosdb-conversational-bi

# Verify build
./target/release/heliosdb-conversational-bi --version
```
Step 2: Configure Environment
Create production configuration file /etc/heliosdb/conversational-bi.toml:
```toml
[server]
host = "0.0.0.0"
port = 8080
workers = 16

[llm]
provider = "openai"  # or "anthropic", "cohere", "ollama"
model = "gpt-4"
api_key_env = "OPENAI_API_KEY"
timeout_secs = 30
max_retries = 3

[production]
enable_rate_limiting = true
rate_limit_qpm = 100  # queries per minute per tenant
burst_allowance = 20

enable_security_validation = true
max_query_length = 10000
max_context_size = 1000000

enable_performance_monitoring = true
target_latency_ms = 200

enable_circuit_breaker = true
circuit_breaker_threshold = 5
circuit_breaker_reset_timeout_secs = 60

[cache]
enabled = true
max_size = 10000
similarity_threshold = 0.85
ttl_seconds = 3600

[session]
max_concurrent_sessions = 10000
session_timeout_minutes = 30
cleanup_interval_minutes = 5

[logging]
level = "info"
format = "json"
output = "/var/log/heliosdb/conversational-bi.log"

[metrics]
enabled = true
prometheus_port = 9090
```
Step 3: Set Environment Variables
```bash
# Create environment file
cat > /etc/heliosdb/conversational-bi.env << EOF
# LLM API Keys (set one)
OPENAI_API_KEY=sk-...
# ANTHROPIC_API_KEY=...
# COHERE_API_KEY=...

# Database connections
DATABASE_URL=postgresql://user:pass@localhost:5432/heliosdb

# Redis for caching and sessions
REDIS_URL=redis://localhost:6379

# Monitoring
PROMETHEUS_PUSHGATEWAY=http://localhost:9091

# Security
JWT_SECRET=your-secure-random-secret
CORS_ALLOWED_ORIGINS=https://your-domain.com

# Performance
RUST_LOG=info
TOKIO_WORKER_THREADS=16
EOF

# Secure environment file
chmod 600 /etc/heliosdb/conversational-bi.env
```
Step 4: Create Systemd Service
Create /etc/systemd/system/heliosdb-conversational-bi.service:
```ini
[Unit]
Description=HeliosDB Conversational BI Engine
After=network.target postgresql.service redis.service
Requires=redis.service

[Service]
Type=simple
User=heliosdb
Group=heliosdb
WorkingDirectory=/opt/heliosdb
EnvironmentFile=/etc/heliosdb/conversational-bi.env
ExecStart=/opt/heliosdb/bin/heliosdb-conversational-bi \
    --config /etc/heliosdb/conversational-bi.toml
Restart=always
RestartSec=10
StandardOutput=journal
StandardError=journal

# Security hardening
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/var/log/heliosdb /var/lib/heliosdb

# Resource limits
LimitNOFILE=65536
MemoryLimit=16G
CPUQuota=800%

[Install]
WantedBy=multi-user.target
```
Step 5: Start Service
```bash
# Reload systemd
sudo systemctl daemon-reload

# Enable service
sudo systemctl enable heliosdb-conversational-bi

# Start service
sudo systemctl start heliosdb-conversational-bi

# Check status
sudo systemctl status heliosdb-conversational-bi

# View logs
sudo journalctl -u heliosdb-conversational-bi -f
```
Step 6: Verify Deployment
```bash
# Health check
curl http://localhost:8080/health

# Metrics endpoint
curl http://localhost:9090/metrics

# Test query (requires authentication)
curl -X POST http://localhost:8080/api/v1/query \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{
    "session_id": "test-session",
    "query": "Show me top 10 customers"
  }'
```
Configuration
Production Configuration Options
Rate Limiting
```toml
[production.rate_limiting]
enabled = true
queries_per_minute = 100         # Per tenant
burst_allowance = 20             # Burst capacity
token_bucket_refill_rate = 1.67  # tokens/second
```
Security
```toml
[production.security]
enabled = true
max_query_length = 10000
max_context_size = 1000000
sql_injection_detection = true
input_sanitization = true
output_sanitization = true
```
Performance
```toml
[production.performance]
target_latency_ms = 200
max_latency_ms = 500
enable_monitoring = true
slow_query_threshold_ms = 300
```
Circuit Breaker
```toml
[production.circuit_breaker]
enabled = true
failure_threshold = 5    # Failures before opening
reset_timeout_secs = 60  # Time before retry
half_open_max_calls = 3  # Test calls in half-open state
```
Security Hardening
Network Security
- TLS/HTTPS Only
```nginx
# nginx configuration
server {
    listen 443 ssl http2;
    ssl_certificate /etc/ssl/certs/heliosdb.crt;
    ssl_certificate_key /etc/ssl/private/heliosdb.key;
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers HIGH:!aNULL:!MD5;
}
```
- Firewall Rules
```bash
# Allow only necessary ports
sudo ufw allow 443/tcp   # HTTPS
sudo ufw allow 9090/tcp  # Metrics (internal only)
sudo ufw deny 8080/tcp   # Block direct access
```
Application Security
- Input Validation: All queries validated for SQL injection
- Output Sanitization: SQL comments removed from responses
- Context Size Limits: Prevent memory exhaustion
- Rate Limiting: Per-tenant quotas enforced
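As an illustration of the input-validation pass, a deliberately naive sketch. The function name and the pattern list are examples only; the production validator is more thorough than a substring scan:

```rust
/// Naive input validation: enforce the configured length limit and
/// reject inputs containing common SQL-injection markers. Illustrative
/// only -- real detection needs proper parsing, not substring checks.
fn validate_input(query: &str, max_len: usize) -> Result<(), &'static str> {
    if query.len() > max_len {
        return Err("query exceeds max_query_length");
    }
    let lowered = query.to_lowercase();
    // Example patterns; a real deny-list is much longer.
    const SUSPICIOUS: [&str; 4] = ["--", ";", "/*", " union select "];
    if SUSPICIOUS.iter().any(|p| lowered.contains(p)) {
        return Err("query contains a suspicious pattern");
    }
    Ok(())
}
```

The `max_len` argument corresponds to `max_query_length = 10000` in the production config.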
API Authentication
```bash
# JWT-based authentication
curl -X POST /api/v1/auth/login \
  -d '{"username": "user", "password": "pass"}' \
  | jq -r '.token' > token.txt

# Use token in requests
curl -H "Authorization: Bearer $(cat token.txt)" \
  http://localhost:8080/api/v1/query
```
Monitoring & Observability
Metrics
Key Performance Indicators:
```promql
# Query latency (p99)
histogram_quantile(0.99, rate(query_duration_seconds_bucket[5m]))

# Queries per second
rate(queries_total[1m])

# Error rate
rate(queries_failed_total[5m]) / rate(queries_total[5m])

# Cache hit rate
rate(cache_hits_total[5m]) / rate(cache_requests_total[5m])

# Circuit breaker state
circuit_breaker_state{service="nl2sql"}
```
Logging
Structured JSON logs:
```json
{
  "timestamp": "2025-11-14T20:00:00Z",
  "level": "INFO",
  "service": "conversational-bi",
  "tenant_id": "acme-corp",
  "session_id": "550e8400-e29b-41d4-a716-446655440000",
  "query": "Show me top customers",
  "latency_ms": 185,
  "cache_hit": false,
  "sql_generated": true
}
```
Alerting
Critical Alerts:
```yaml
# Prometheus alerting rules
groups:
  - name: conversational_bi
    rules:
      - alert: HighLatency
        expr: histogram_quantile(0.99, rate(query_duration_seconds_bucket[5m])) > 0.5
        for: 5m
        annotations:
          summary: "P99 latency exceeded 500ms"

      - alert: HighErrorRate
        expr: rate(queries_failed_total[5m]) / rate(queries_total[5m]) > 0.05
        for: 5m
        annotations:
          summary: "Error rate exceeded 5%"

      - alert: CircuitBreakerOpen
        expr: circuit_breaker_state == 1
        for: 1m
        annotations:
          summary: "Circuit breaker opened"
```
Performance Tuning
Latency Optimization
Target: <200ms p99 latency
- Enable Semantic Caching
  - Cache hit rate target: >30%
  - Similarity threshold: 0.85
  - TTL: 1 hour for production queries
- Connection Pooling
```toml
[database]
pool_size = 50
max_idle_connections = 25
connection_timeout_secs = 30
```
- LLM Optimization
```toml
[llm]
timeout_secs = 15  # Reduced from 30
max_tokens = 500   # Limit response size
temperature = 0.1  # More deterministic
```
Memory Optimization
Target: <3MB per session
- Context Pruning
  - Keep last 10 turns only
  - Compress old context
  - Remove large result sets
- Cache Eviction
  - LRU eviction policy
  - Max 10K cached queries
  - Monitor memory usage
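The "keep last 10 turns" pruning rule can be sketched as follows; the `Turn` type and function name are illustrative, not the engine's actual session model:

```rust
use std::collections::VecDeque;

/// One conversation turn kept in session context (illustrative shape).
struct Turn {
    user: String, // the natural-language question
    sql: String,  // the SQL generated for it
}

/// Drop oldest turns until at most `max_turns` remain (10 in production).
fn prune_context(turns: &mut VecDeque<Turn>, max_turns: usize) {
    while turns.len() > max_turns {
        turns.pop_front(); // oldest turn is evicted first
    }
}
```

Pruning from the front preserves recency, which is what multi-turn follow-up questions depend on.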
Throughput Optimization
Target: 1000+ QPS
- Horizontal Scaling
  - Deploy multiple instances
  - Load balance across instances
  - Shared Redis cache
- Async Processing
  - Non-blocking I/O
  - Concurrent query handling
  - Tokio runtime tuning
Troubleshooting
Common Issues
High Latency
Symptoms: p99 > 500ms
Diagnosis:
```bash
# Check metrics
curl localhost:9090/metrics | grep query_duration

# Review logs (-o cat strips the journald prefix so jq sees raw JSON)
journalctl -u heliosdb-conversational-bi -n 100 -o cat | jq 'select(.latency_ms > 500)'
```
Solutions:
- Increase cache hit rate
- Optimize LLM timeout
- Add more instances
- Check database performance
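The first solution, raising the cache hit rate, is governed by the semantic cache's similarity threshold (0.85 in the production config). A minimal sketch of the lookup, assuming query embeddings are already computed; the embedding source and cache layout here are illustrative, not the real API:

```rust
/// Cosine similarity between two embedding vectors.
fn cosine_similarity(a: &[f64], b: &[f64]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f64 = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let nb: f64 = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Return the cached SQL of the most similar prior query, but only if
/// its similarity clears the threshold; otherwise it is a cache miss.
fn cache_lookup<'a>(
    query: &[f64],
    cache: &'a [(Vec<f64>, String)],
    threshold: f64,
) -> Option<&'a str> {
    cache
        .iter()
        .map(|(emb, sql)| (cosine_similarity(query, emb), sql))
        .filter(|(sim, _)| *sim >= threshold)
        .max_by(|a, b| a.0.partial_cmp(&b.0).unwrap())
        .map(|(_, sql)| sql.as_str())
}
```

Lowering the threshold raises the hit rate but risks serving SQL for a question that only *looks* similar, so tune it against accuracy, not latency alone.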
Rate Limiting Issues
Symptoms: 429 errors
Diagnosis:
```bash
# Check rate limit metrics
curl localhost:9090/metrics | grep rate_limit

# View affected tenants
journalctl -u heliosdb-conversational-bi | grep "Rate limit exceeded"
```
Solutions:
- Increase tenant quotas
- Adjust burst allowance
- Review usage patterns
Circuit Breaker Opening
Symptoms: Service unavailable errors
Diagnosis:
```bash
# Check circuit breaker state
curl localhost:9090/metrics | grep circuit_breaker_state
```
Solutions:
- Check LLM API status
- Verify network connectivity
- Review error logs
- Manually reset circuit breaker
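The breaker being diagnosed here follows the usual closed → open → half-open cycle, driven by `failure_threshold = 5` and `reset_timeout_secs = 60` from the production config. A minimal state-machine sketch with hypothetical names; the real implementation also tracks timing and the half-open call budget:

```rust
/// Circuit breaker states (1 == Open in the exported metric).
#[derive(Debug, PartialEq)]
enum BreakerState {
    Closed,
    Open,
    HalfOpen,
}

struct CircuitBreaker {
    state: BreakerState,
    failures: u32,
    failure_threshold: u32, // 5 in the production config
}

impl CircuitBreaker {
    fn new(failure_threshold: u32) -> Self {
        Self { state: BreakerState::Closed, failures: 0, failure_threshold }
    }

    /// A downstream (e.g. LLM API) call failed.
    fn record_failure(&mut self) {
        self.failures += 1;
        if self.failures >= self.failure_threshold {
            self.state = BreakerState::Open; // stop sending traffic
        }
    }

    /// A call succeeded: close the breaker and reset the counter.
    fn record_success(&mut self) {
        self.failures = 0;
        self.state = BreakerState::Closed;
    }

    /// Called once reset_timeout_secs has elapsed while open.
    fn on_reset_timeout(&mut self) {
        if self.state == BreakerState::Open {
            self.state = BreakerState::HalfOpen; // allow a few probe calls
        }
    }
}
```

A "stuck open" breaker usually means the probe calls in half-open keep failing, which points back at the LLM API or network rather than the breaker itself.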
Debug Mode
```bash
# Enable debug logging (the service reads RUST_LOG from its environment
# file, so an `export` in your shell would have no effect on it)
sudo sed -i 's/^RUST_LOG=.*/RUST_LOG=debug/' /etc/heliosdb/conversational-bi.env

# Restart service
sudo systemctl restart heliosdb-conversational-bi

# Monitor detailed logs
journalctl -u heliosdb-conversational-bi -f
```
Incident Response
Severity Levels
P0 - Critical:
- Service completely down
- Data loss or corruption
- Security breach
P1 - High:
- Degraded performance (p99 > 1s)
- High error rate (>10%)
- Circuit breaker stuck open
P2 - Medium:
- Elevated latency (p99 > 500ms)
- Moderate error rate (>5%)
- Cache misses high
Response Procedures
P0: Service Down
- Immediate: Page on-call engineer
- 5 min: Begin investigation
- 15 min: Status page update
- 30 min: Implement workaround or rollback
- Post-incident: Full RCA
P1: Performance Degraded
- Check monitoring: Identify affected components
- Scale out: Add instances if needed
- Review recent changes: Rollback if necessary
- Notify stakeholders: Update status
P2: Elevated Metrics
- Monitor trends: Watch for escalation
- Investigate root cause: Review logs and metrics
- Optimize if needed: Apply targeted fixes
Rollback Procedure
```bash
# Stop current version
sudo systemctl stop heliosdb-conversational-bi

# Restore previous version
sudo cp /opt/heliosdb/backups/heliosdb-conversational-bi.prev \
  /opt/heliosdb/bin/heliosdb-conversational-bi

# Start service
sudo systemctl start heliosdb-conversational-bi

# Verify rollback
curl http://localhost:8080/health
```
Appendix
Performance Benchmarks
| Metric | Target | Achieved |
|---|---|---|
| P50 Latency | <150ms | 120ms |
| P90 Latency | <200ms | 180ms |
| P99 Latency | <300ms | 250ms |
| QPS | 1000+ | 1200 |
| Cache Hit Rate | >30% | 35% |
| Success Rate | >99% | 99.5% |
Resource Estimates
Per Instance (16 cores, 32GB RAM):
- Max sessions: ~10,000
- Max QPS: ~200
- Memory per session: ~2.5MB
- CPU per query: ~50ms
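A quick sanity check of these estimates (a back-of-the-envelope sketch, not production code): 10,000 sessions at ~2.5MB each is ~24.4GB, which fits within 32GB of RAM, and 16 cores at ~50ms of CPU per query gives a theoretical ceiling near 320 QPS, comfortably above the ~200 QPS estimate.

```rust
/// Total session memory in GB for the given per-session footprint.
fn session_memory_gb(sessions: u64, mb_per_session: f64) -> f64 {
    sessions as f64 * mb_per_session / 1024.0
}

/// Theoretical QPS ceiling from the CPU budget alone:
/// each core delivers 1000 ms of CPU time per second.
fn max_qps_from_cpu(cores: u64, cpu_ms_per_query: f64) -> f64 {
    cores as f64 * 1000.0 / cpu_ms_per_query
}
```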
Support Contacts
- On-call: oncall@your-org.com
- Slack: #heliosdb-ops
- PagerDuty: heliosdb-conversational-bi
Document Version: 7.0.0 Last Review: 2025-11-14 Next Review: 2025-12-14