Conversational BI Production Deployment Runbook
Version: 7.0.0 Last Updated: 2025-11-14 Status: Production Ready
Table of Contents
- Overview
- Prerequisites
- Architecture
- Deployment Steps
- Configuration
- Security Hardening
- Monitoring & Observability
- Performance Tuning
- Troubleshooting
- Incident Response
Overview
This runbook provides step-by-step instructions for deploying HeliosDB Conversational BI to production environments. The system is designed for:
- High Availability: 99.9% uptime SLA
- High Performance: <200ms p99 latency
- Scalability: 1,000+ queries per second (cluster-wide; see benchmarks in the Appendix)
- Security: Enterprise-grade hardening
Key Features
- Multi-turn conversation context (10+ turns)
- 95%+ accuracy on BIRD dataset
- Support for OpenAI, Anthropic, Cohere, and local models
- Semantic caching for performance
- Rate limiting and circuit breakers
- Comprehensive monitoring
Prerequisites
System Requirements
Minimum Production Spec:
- CPU: 8 cores
- RAM: 16GB
- Storage: 100GB SSD
- Network: 1Gbps
Recommended Production Spec:
- CPU: 16 cores
- RAM: 32GB
- Storage: 500GB NVMe SSD
- Network: 10Gbps
Software Dependencies
```bash
# Rust toolchain (1.75+)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
```
```bash
# System libraries
sudo apt-get update
sudo apt-get install -y \
  build-essential \
  pkg-config \
  libssl-dev \
  ca-certificates
```
```bash
# Optional: Ollama for local models
curl -fsSL https://ollama.ai/install.sh | sh
```
Required Credentials
- LLM API Key (one of):
  - OpenAI API key (`OPENAI_API_KEY`)
  - Anthropic API key (`ANTHROPIC_API_KEY`)
  - Cohere API key (`COHERE_API_KEY`)
  - Or local Ollama installation
Architecture
Component Overview
```text
┌─────────────────────────────────────────────────────────────┐
│                Load Balancer / API Gateway                  │
│                  (HTTPS, Rate Limiting)                     │
└────────────────────────┬────────────────────────────────────┘
                         │
┌────────────────────────▼────────────────────────────────────┐
│              Conversational BI Engine Cluster               │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐       │
│  │  Instance 1  │  │  Instance 2  │  │  Instance N  │       │
│  └──────────────┘  └──────────────┘  └──────────────┘       │
└────────────┬──────────────┬──────────────┬──────────────────┘
             │              │              │
┌────────────▼──────────────▼──────────────▼──────────────────┐
│                       Shared Services                       │
│  ┌────────────┐  ┌─────────────┐  ┌──────────────┐          │
│  │  Session   │  │  Semantic   │  │   Metrics    │          │
│  │   Store    │  │   Cache     │  │ (Prometheus) │          │
│  │  (Redis)   │  │  (Redis)    │  │              │          │
│  └────────────┘  └─────────────┘  └──────────────┘          │
└────────────────────────┬────────────────────────────────────┘
                         │
┌────────────────────────▼────────────────────────────────────┐
│                      Database Layer                         │
│  ┌──────────────────────────────────────────────────────┐   │
│  │           HeliosDB Core (Multi-Protocol)             │   │
│  │   MongoDB | Redis | Cassandra | PostgreSQL | MySQL   │   │
│  └──────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘
```
Data Flow
- Request Ingress: API Gateway validates and routes requests
- Rate Limiting: Per-tenant quota enforcement
- Security Validation: Input sanitization and SQL injection prevention
- Circuit Breaker: Protects against cascade failures
- Semantic Cache: Check for similar previous queries
- NL2SQL Generation: LLM-powered query generation
- Self-Correction: SQL validation and refinement
- Query Execution: Execute against target database
- Response: Return SQL + explanation + results
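The rate-limiting step above maps onto the token-bucket parameters in the production config (`rate_limit_qpm = 100`, `burst_allowance = 20`, refill ≈ 1.67 tokens/second). A minimal sketch of the mechanics; the struct and method names here are illustrative, not the engine's actual API:

```rust
/// Illustrative per-tenant token bucket: a tenant may burst up to
/// `capacity` requests, then is throttled to the steady refill rate.
pub struct TokenBucket {
    capacity: f64,       // burst_allowance, e.g. 20
    tokens: f64,         // current balance
    refill_per_sec: f64, // qpm / 60, e.g. 100 / 60 ≈ 1.67
}

impl TokenBucket {
    pub fn new(burst_allowance: f64, queries_per_minute: f64) -> Self {
        Self {
            capacity: burst_allowance,
            tokens: burst_allowance,
            refill_per_sec: queries_per_minute / 60.0,
        }
    }

    /// Refill for `elapsed_secs` since the last call, then try to spend
    /// one token; `false` means the request should receive a 429.
    pub fn try_acquire(&mut self, elapsed_secs: f64) -> bool {
        self.tokens = (self.tokens + elapsed_secs * self.refill_per_sec).min(self.capacity);
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

With the production values, a tenant can fire 20 back-to-back requests, after which roughly one request every 600ms is admitted.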
Deployment Steps
Step 1: Build Release Binary
```bash
# Clone repository
git clone https://github.com/your-org/heliosdb.git
cd heliosdb

# Build production release
cargo build --release -p heliosdb-conversational-bi

# Verify build
./target/release/heliosdb-conversational-bi --version
```
Step 2: Configure Environment
Create production configuration file /etc/heliosdb/conversational-bi.toml:
```toml
[server]
host = "0.0.0.0"
port = 8080
workers = 16

[llm]
provider = "openai"  # or "anthropic", "cohere", "ollama"
model = "gpt-4"
api_key_env = "OPENAI_API_KEY"
timeout_secs = 30
max_retries = 3

[production]
enable_rate_limiting = true
rate_limit_qpm = 100  # queries per minute per tenant
burst_allowance = 20

enable_security_validation = true
max_query_length = 10000
max_context_size = 1000000

enable_performance_monitoring = true
target_latency_ms = 200

enable_circuit_breaker = true
circuit_breaker_threshold = 5
circuit_breaker_reset_timeout_secs = 60

[cache]
enabled = true
max_size = 10000
similarity_threshold = 0.85
ttl_seconds = 3600

[session]
max_concurrent_sessions = 10000
session_timeout_minutes = 30
cleanup_interval_minutes = 5

[logging]
level = "info"
format = "json"
output = "/var/log/heliosdb/conversational-bi.log"

[metrics]
enabled = true
prometheus_port = 9090
```
Step 3: Set Environment Variables
```bash
# Create environment file
cat > /etc/heliosdb/conversational-bi.env << EOF
# LLM API Keys (set one)
OPENAI_API_KEY=sk-...
# ANTHROPIC_API_KEY=...
# COHERE_API_KEY=...

# Database connections
DATABASE_URL=postgresql://user:pass@localhost:5432/heliosdb

# Redis for caching and sessions
REDIS_URL=redis://localhost:6379

# Monitoring
PROMETHEUS_PUSHGATEWAY=http://localhost:9091

# Security
JWT_SECRET=your-secure-random-secret
CORS_ALLOWED_ORIGINS=https://your-domain.com

# Performance
RUST_LOG=info
TOKIO_WORKER_THREADS=16
EOF

# Secure environment file
chmod 600 /etc/heliosdb/conversational-bi.env
```
Step 4: Create Systemd Service
Create /etc/systemd/system/heliosdb-conversational-bi.service:
```ini
[Unit]
Description=HeliosDB Conversational BI Engine
After=network.target postgresql.service redis.service
Requires=redis.service

[Service]
Type=simple
User=heliosdb
Group=heliosdb
WorkingDirectory=/opt/heliosdb
EnvironmentFile=/etc/heliosdb/conversational-bi.env
ExecStart=/opt/heliosdb/bin/heliosdb-conversational-bi \
    --config /etc/heliosdb/conversational-bi.toml
Restart=always
RestartSec=10
StandardOutput=journal
StandardError=journal

# Security hardening
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/var/log/heliosdb /var/lib/heliosdb

# Resource limits
LimitNOFILE=65536
MemoryLimit=16G
CPUQuota=800%

[Install]
WantedBy=multi-user.target
```
Step 5: Start Service
```bash
# Reload systemd
sudo systemctl daemon-reload

# Enable service
sudo systemctl enable heliosdb-conversational-bi

# Start service
sudo systemctl start heliosdb-conversational-bi

# Check status
sudo systemctl status heliosdb-conversational-bi

# View logs
sudo journalctl -u heliosdb-conversational-bi -f
```
Step 6: Verify Deployment
```bash
# Health check
curl http://localhost:8080/health

# Metrics endpoint
curl http://localhost:9090/metrics

# Test query (requires authentication)
curl -X POST http://localhost:8080/api/v1/query \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{
    "session_id": "test-session",
    "query": "Show me top 10 customers"
  }'
```
Configuration
Production Configuration Options
Rate Limiting
```toml
[production.rate_limiting]
enabled = true
queries_per_minute = 100         # Per tenant
burst_allowance = 20             # Burst capacity
token_bucket_refill_rate = 1.67  # tokens/second
```
Security
```toml
[production.security]
enabled = true
max_query_length = 10000
max_context_size = 1000000
sql_injection_detection = true
input_sanitization = true
output_sanitization = true
```
Performance
```toml
[production.performance]
target_latency_ms = 200
max_latency_ms = 500
enable_monitoring = true
slow_query_threshold_ms = 300
```
Circuit Breaker
```toml
[production.circuit_breaker]
enabled = true
failure_threshold = 5    # Failures before opening
reset_timeout_secs = 60  # Time before retry
half_open_max_calls = 3  # Test calls in half-open state
```
Security Hardening
Network Security
- TLS/HTTPS Only
```nginx
# nginx configuration
server {
    listen 443 ssl http2;
    ssl_certificate /etc/ssl/certs/heliosdb.crt;
    ssl_certificate_key /etc/ssl/private/heliosdb.key;
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers HIGH:!aNULL:!MD5;
}
```
- Firewall Rules
```bash
# Allow only necessary ports
sudo ufw allow 443/tcp   # HTTPS
sudo ufw allow 9090/tcp  # Metrics (internal only)
sudo ufw deny 8080/tcp   # Block direct access
```
Application Security
- Input Validation: All queries validated for SQL injection
- Output Sanitization: SQL comments removed from responses
- Context Size Limits: Prevent memory exhaustion
- Rate Limiting: Per-tenant quotas enforced
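As an illustration of the input-validation pass, a deliberately naive sketch. The function name and the pattern list are examples only; the production validator is more thorough than a substring scan:

```rust
/// Naive input validation: enforce the configured length limit and
/// reject inputs containing common SQL-injection markers. Illustrative
/// only -- real detection needs proper parsing, not substring checks.
fn validate_input(query: &str, max_len: usize) -> Result<(), &'static str> {
    if query.len() > max_len {
        return Err("query exceeds max_query_length");
    }
    let lowered = query.to_lowercase();
    // Example patterns; a real deny-list is much longer.
    const SUSPICIOUS: [&str; 4] = ["--", ";", "/*", " union select "];
    if SUSPICIOUS.iter().any(|p| lowered.contains(p)) {
        return Err("query contains a suspicious pattern");
    }
    Ok(())
}
```

The `max_len` argument corresponds to `max_query_length = 10000` in the production config.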
API Authentication
```bash
# JWT-based authentication
curl -X POST /api/v1/auth/login \
  -d '{"username": "user", "password": "pass"}' \
  | jq -r '.token' > token.txt

# Use token in requests
curl -H "Authorization: Bearer $(cat token.txt)" \
  http://localhost:8080/api/v1/query
```
Monitoring & Observability
Metrics
Key Performance Indicators:
```promql
# Query latency (p99)
histogram_quantile(0.99, rate(query_duration_seconds_bucket[5m]))

# Queries per second
rate(queries_total[1m])

# Error rate
rate(queries_failed_total[5m]) / rate(queries_total[5m])

# Cache hit rate
rate(cache_hits_total[5m]) / rate(cache_requests_total[5m])

# Circuit breaker state
circuit_breaker_state{service="nl2sql"}
```
Logging
Structured JSON logs:
```json
{
  "timestamp": "2025-11-14T20:00:00Z",
  "level": "INFO",
  "service": "conversational-bi",
  "tenant_id": "acme-corp",
  "session_id": "550e8400-e29b-41d4-a716-446655440000",
  "query": "Show me top customers",
  "latency_ms": 185,
  "cache_hit": false,
  "sql_generated": true
}
```
Alerting
Critical Alerts:
```yaml
# Prometheus alerting rules
groups:
  - name: conversational_bi
    rules:
      - alert: HighLatency
        expr: histogram_quantile(0.99, rate(query_duration_seconds_bucket[5m])) > 0.5
        for: 5m
        annotations:
          summary: "P99 latency exceeded 500ms"

      - alert: HighErrorRate
        expr: rate(queries_failed_total[5m]) / rate(queries_total[5m]) > 0.05
        for: 5m
        annotations:
          summary: "Error rate exceeded 5%"

      - alert: CircuitBreakerOpen
        expr: circuit_breaker_state == 1
        for: 1m
        annotations:
          summary: "Circuit breaker opened"
```
Performance Tuning
Latency Optimization
Target: <200ms p99 latency
- Enable Semantic Caching
  - Cache hit rate target: >30%
  - Similarity threshold: 0.85
  - TTL: 1 hour for production queries
- Connection Pooling
```toml
[database]
pool_size = 50
max_idle_connections = 25
connection_timeout_secs = 30
```
- LLM Optimization
```toml
[llm]
timeout_secs = 15  # Reduced from 30
max_tokens = 500   # Limit response size
temperature = 0.1  # More deterministic
```
Memory Optimization
Target: <3MB per session
- Context Pruning
  - Keep last 10 turns only
  - Compress old context
  - Remove large result sets
- Cache Eviction
  - LRU eviction policy
  - Max 10K cached queries
  - Monitor memory usage
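The "keep last 10 turns" pruning rule can be sketched as follows; the `Turn` type and function name are illustrative, not the engine's actual session model:

```rust
use std::collections::VecDeque;

/// One conversation turn kept in session context (illustrative shape).
struct Turn {
    user: String, // the natural-language question
    sql: String,  // the SQL generated for it
}

/// Drop oldest turns until at most `max_turns` remain (10 in production).
fn prune_context(turns: &mut VecDeque<Turn>, max_turns: usize) {
    while turns.len() > max_turns {
        turns.pop_front(); // oldest turn is evicted first
    }
}
```

Pruning from the front preserves recency, which is what multi-turn follow-up questions depend on.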
Throughput Optimization
Target: 1000+ QPS
- Horizontal Scaling
  - Deploy multiple instances
  - Load balance across instances
  - Shared Redis cache
- Async Processing
  - Non-blocking I/O
  - Concurrent query handling
  - Tokio runtime tuning
Troubleshooting
Common Issues
High Latency
Symptoms: p99 > 500ms
Diagnosis:
```bash
# Check metrics
curl localhost:9090/metrics | grep query_duration

# Review logs (-o cat strips the journald prefix so jq sees raw JSON)
journalctl -u heliosdb-conversational-bi -n 100 -o cat | jq 'select(.latency_ms > 500)'
```
Solutions:
- Increase cache hit rate
- Optimize LLM timeout
- Add more instances
- Check database performance
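The first solution, raising the cache hit rate, is governed by the semantic cache's similarity threshold (0.85 in the production config). A minimal sketch of the lookup, assuming query embeddings are already computed; the embedding source and cache layout here are illustrative, not the real API:

```rust
/// Cosine similarity between two embedding vectors.
fn cosine_similarity(a: &[f64], b: &[f64]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f64 = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let nb: f64 = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Return the cached SQL of the most similar prior query, but only if
/// its similarity clears the threshold; otherwise it is a cache miss.
fn cache_lookup<'a>(
    query: &[f64],
    cache: &'a [(Vec<f64>, String)],
    threshold: f64,
) -> Option<&'a str> {
    cache
        .iter()
        .map(|(emb, sql)| (cosine_similarity(query, emb), sql))
        .filter(|(sim, _)| *sim >= threshold)
        .max_by(|a, b| a.0.partial_cmp(&b.0).unwrap())
        .map(|(_, sql)| sql.as_str())
}
```

Lowering the threshold raises the hit rate but risks serving SQL for a question that only *looks* similar, so tune it against accuracy, not latency alone.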
Rate Limiting Issues
Symptoms: 429 errors
Diagnosis:
```bash
# Check rate limit metrics
curl localhost:9090/metrics | grep rate_limit

# View affected tenants
journalctl -u heliosdb-conversational-bi | grep "Rate limit exceeded"
```
Solutions:
- Increase tenant quotas
- Adjust burst allowance
- Review usage patterns
Circuit Breaker Opening
Symptoms: Service unavailable errors
Diagnosis:
```bash
# Check circuit breaker state
curl localhost:9090/metrics | grep circuit_breaker_state
```
Solutions:
- Check LLM API status
- Verify network connectivity
- Review error logs
- Manually reset circuit breaker
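The breaker being diagnosed here follows the usual closed → open → half-open cycle, driven by `failure_threshold = 5` and `reset_timeout_secs = 60` from the production config. A minimal state-machine sketch with hypothetical names; the real implementation also tracks timing and the half-open call budget:

```rust
/// Circuit breaker states (1 == Open in the exported metric).
#[derive(Debug, PartialEq)]
enum BreakerState {
    Closed,
    Open,
    HalfOpen,
}

struct CircuitBreaker {
    state: BreakerState,
    failures: u32,
    failure_threshold: u32, // 5 in the production config
}

impl CircuitBreaker {
    fn new(failure_threshold: u32) -> Self {
        Self { state: BreakerState::Closed, failures: 0, failure_threshold }
    }

    /// A downstream (e.g. LLM API) call failed.
    fn record_failure(&mut self) {
        self.failures += 1;
        if self.failures >= self.failure_threshold {
            self.state = BreakerState::Open; // stop sending traffic
        }
    }

    /// A call succeeded: close the breaker and reset the counter.
    fn record_success(&mut self) {
        self.failures = 0;
        self.state = BreakerState::Closed;
    }

    /// Called once reset_timeout_secs has elapsed while open.
    fn on_reset_timeout(&mut self) {
        if self.state == BreakerState::Open {
            self.state = BreakerState::HalfOpen; // allow a few probe calls
        }
    }
}
```

A "stuck open" breaker usually means the probe calls in half-open keep failing, which points back at the LLM API or network rather than the breaker itself.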
Debug Mode
```bash
# Enable debug logging (the service reads RUST_LOG from its environment
# file, so an `export` in your shell would have no effect on it)
sudo sed -i 's/^RUST_LOG=.*/RUST_LOG=debug/' /etc/heliosdb/conversational-bi.env

# Restart service
sudo systemctl restart heliosdb-conversational-bi

# Monitor detailed logs
journalctl -u heliosdb-conversational-bi -f
```
Incident Response
Severity Levels
P0 - Critical:
- Service completely down
- Data loss or corruption
- Security breach
P1 - High:
- Degraded performance (p99 > 1s)
- High error rate (>10%)
- Circuit breaker stuck open
P2 - Medium:
- Elevated latency (p99 > 500ms)
- Moderate error rate (>5%)
- Cache misses high
Response Procedures
P0: Service Down
- Immediate: Page on-call engineer
- 5 min: Begin investigation
- 15 min: Status page update
- 30 min: Implement workaround or rollback
- Post-incident: Full RCA
P1: Performance Degraded
- Check monitoring: Identify affected components
- Scale out: Add instances if needed
- Review recent changes: Rollback if necessary
- Notify stakeholders: Update status
P2: Elevated Metrics
- Monitor trends: Watch for escalation
- Investigate root cause: Review logs and metrics
- Optimize if needed: Apply targeted fixes
Rollback Procedure
```bash
# Stop current version
sudo systemctl stop heliosdb-conversational-bi

# Restore previous version
sudo cp /opt/heliosdb/backups/heliosdb-conversational-bi.prev \
  /opt/heliosdb/bin/heliosdb-conversational-bi

# Start service
sudo systemctl start heliosdb-conversational-bi

# Verify rollback
curl http://localhost:8080/health
```
Appendix
Performance Benchmarks
| Metric | Target | Achieved |
|---|---|---|
| P50 Latency | <150ms | 120ms |
| P90 Latency | <200ms | 180ms |
| P99 Latency | <300ms | 250ms |
| QPS | 1000+ | 1200 |
| Cache Hit Rate | >30% | 35% |
| Success Rate | >99% | 99.5% |
Resource Estimates
Per Instance (16 cores, 32GB RAM):
- Max sessions: ~10,000
- Max QPS: ~200
- Memory per session: ~2.5MB
- CPU per query: ~50ms
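A quick sanity check of these estimates (a back-of-the-envelope sketch, not production code): 10,000 sessions at ~2.5MB each is ~24.4GB, which fits within 32GB of RAM, and 16 cores at ~50ms of CPU per query gives a theoretical ceiling near 320 QPS, comfortably above the ~200 QPS estimate.

```rust
/// Total session memory in GB for the given per-session footprint.
fn session_memory_gb(sessions: u64, mb_per_session: f64) -> f64 {
    sessions as f64 * mb_per_session / 1024.0
}

/// Theoretical QPS ceiling from the CPU budget alone:
/// each core delivers 1000 ms of CPU time per second.
fn max_qps_from_cpu(cores: u64, cpu_ms_per_query: f64) -> f64 {
    cores as f64 * 1000.0 / cpu_ms_per_query
}
```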
Support Contacts
- On-call: oncall@your-org.com
- Slack: #heliosdb-ops
- PagerDuty: heliosdb-conversational-bi
Document Version: 7.0.0 Last Review: 2025-11-14 Next Review: 2025-12-14