HeliosDB NL2SQL Production Deployment Guide
Version: 1.0 Last Updated: October 28, 2025 Status: Production Ready
Table of Contents
- Deployment Overview
- Configuration
- Installation & Setup
- Monitoring & Observability
- Operational Runbooks
- Production Checklist
- Scaling Guidelines
- Troubleshooting Guide
1. Deployment Overview
1.1 Architecture Diagram
┌─────────────────────────────────────────────────────────────┐
│                        CLIENT LAYER                         │
│                 (Applications, Web UI, APIs)                │
└──────────────────────────────┬──────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                    NL2SQL ENGINE (Core)                     │
│                                                             │
│  Query Complexity ──▶ Agentic     ◀── Code Advisor          │
│  Assessment           Decomposer      (Performance          │
│                           │            Analysis)            │
│                           ▼                                 │
│  Schema Retriever & Cache                                   │
│  (Skeleton Cache + Metadata Integration)                    │
│                           │                                 │
│                           ▼                                 │
│  LLM Translation Layer                                      │
│  (OpenAI, Anthropic, DeepSeek, Grok - Multi-provider)       │
└──────────────────────────────┬──────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                 STATISTICS & METADATA LAYER                 │
│                                                             │
│  Statistics Service    Metadata Adapter    Refresh Manager  │
│  (Cardinality,         (Schema             (Priority        │
│   Selectivity)          Integration)        Queue)          │
└──────────────────────────────┬──────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                   DATABASE METADATA LAYER                   │
│            (heliosdb-metadata: Tables, Columns,             │
│             Indexes, Constraints, Statistics)               │
└─────────────────────────────────────────────────────────────┘

1.2 Component Dependencies
Critical Dependencies:
- heliosdb-metadata: Schema service, constraint metadata
- heliosdb-nl2sql: Core NL2SQL engine, agentic decomposer, code advisor
- LLM Provider APIs: OpenAI, Anthropic, DeepSeek, Grok
- Statistics cache: In-memory (parking_lot RwLock) or Redis for distributed deployments
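The in-memory statistics cache is a read-mostly TTL map behind a lock (a parking_lot RwLock in the Rust implementation). The following Python sketch illustrates the intended semantics only; `StatisticsCache`, `put`, and `get` are illustrative names, not HeliosDB's actual API:

```python
import threading
import time


class StatisticsCache:
    """Minimal sketch of a TTL-bounded statistics cache.

    Entries older than `ttl_seconds` are treated as absent and lazily
    evicted on read. The clock is injectable so expiry is testable.
    """

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.RLock()
        self._entries = {}  # key -> (value, inserted_at)

    def put(self, key, value):
        with self._lock:
            self._entries[key] = (value, self._clock())

    def get(self, key):
        with self._lock:
            entry = self._entries.get(key)
            if entry is None:
                return None
            value, inserted_at = entry
            if self._clock() - inserted_at > self._ttl:
                # Stale entry: evict and report a miss.
                del self._entries[key]
                return None
            return value
```

In a distributed deployment the same get/put-with-TTL contract is served by Redis instead of process memory, which is why the two backends are interchangeable in the configuration.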
Data Flow:
- Natural language query → Complexity assessment
- Simple queries → Direct translation
- Complex queries → Agentic decomposition → Task execution
- Generated SQL → Code Advisor analysis → Performance recommendations
- Statistics service → Real-time cardinality/selectivity estimates
- Metadata adapter → Schema integration → Index recommendations
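The flow above amounts to two small decisions: route on the complexity score (mirroring `nl2sql.agentic.complexity_threshold`), and walk the provider fallback chain when the primary LLM fails. A hedged sketch of that control flow — all function names here are hypothetical, not the engine's code:

```python
def route_query(complexity: int, threshold: int = 5) -> str:
    """Route a query by its complexity score (1-10).

    `threshold` mirrors nl2sql.agentic.complexity_threshold: queries
    scoring above it go through agentic decomposition, the rest are
    translated directly.
    """
    return "agentic_decomposition" if complexity > threshold else "direct_translation"


def translate_with_fallback(query, providers, translate):
    """Try providers in order (default_provider first, then the
    fallback_providers list), returning the first success."""
    errors = {}
    for provider in providers:
        try:
            return provider, translate(provider, query)
        except RuntimeError as exc:  # e.g. provider timeout or outage
            errors[provider] = str(exc)
    raise RuntimeError(f"all providers failed: {errors}")
```

Raising or lowering the threshold trades latency for decomposition quality, which is why the tuning sections below adjust it per workload.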
1.3 System Requirements
Minimum Production Requirements:
- CPU: 4 cores (8 cores recommended for high throughput)
- Memory: 8GB RAM (16GB+ for large schemas)
- Storage: 50GB SSD (for caching, logs, statistics)
- Network: 1Gbps (low latency to LLM APIs critical)
- OS: Linux (Ubuntu 22.04 LTS, RHEL 8+, or equivalent)
Recommended Production Configuration:
- CPU: 16+ cores for parallel task execution
- Memory: 32GB+ RAM for large-scale statistics caching
- Storage: 500GB+ NVMe SSD for high IOPS
- Network: 10Gbps with dedicated LLM API egress
- Database: PostgreSQL 14+, MySQL 8.0+, or HeliosDB native
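As a rough cross-check of the figures above, hardware can be sized from expected peak query rate. This is a sketch under the same rule of thumb the capacity planning section uses (~100 qps per 4 cores / 8GB RAM); `estimate_resources` is a hypothetical helper, not a HeliosDB tool:

```python
import math


def estimate_resources(peak_qps: float) -> tuple:
    """Estimate (cpu_cores, memory_gb) for a given peak query rate.

    Assumption: ~100 qps needs about 4 CPU cores and 8GB RAM, with the
    4-core / 8GB minimum production floor from this guide applied.
    """
    scale = peak_qps / 100.0
    cores = max(4, math.ceil(scale * 4))
    memory_gb = max(8, math.ceil(scale * 8))
    return cores, memory_gb
```

For example, a 400 qps peak lands at roughly the recommended 16-core / 32GB configuration, while low-traffic deployments bottom out at the stated minimums.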
1.4 Prerequisites
Software:
- Rust 1.70+ (for compilation)
- PostgreSQL 14+ or MySQL 8.0+ (for metadata storage)
- Redis 7+ (optional, for distributed caching)
- Prometheus + Grafana (for monitoring)
- Docker + Docker Compose (optional, for containerized deployment)
API Keys:
- OpenAI API key (GPT-4, GPT-3.5-turbo)
- Anthropic API key (Claude 3 Opus/Sonnet)
- DeepSeek API key (optional)
- Grok API key (optional)
Network Access:
- Outbound HTTPS to LLM provider endpoints
- Inbound on application ports (default: 8080 for HTTP API)
- Database connectivity (PostgreSQL: 5432, MySQL: 3306)
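Before first start it is worth verifying that the required secrets from this section are actually present in the environment. A minimal preflight sketch (the `missing_required` helper and the variable list reflect this guide's setup, not a shipped HeliosDB command):

```python
import os

# Required by the deployment described in this guide; provider keys for
# DeepSeek/Grok and REDIS_URL are optional and deliberately omitted.
REQUIRED_ENV = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY", "DATABASE_URL", "JWT_SECRET"]


def missing_required(env: dict) -> list:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_ENV if not env.get(name)]


# Example usage before starting the service:
#   missing = missing_required(dict(os.environ))
#   if missing: abort with the list of missing variables
```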
2. Configuration
2.1 Complete TOML Configuration Reference
/etc/heliosdb/nl2sql.toml:
# ===== NL2SQL Core Configuration =====
[nl2sql]
# Enable/disable NL2SQL engine
enabled = true

# Query timeout (seconds)
query_timeout = 30

# Maximum concurrent queries
max_concurrent_queries = 100

# Cache size (MB) for skeleton cache
cache_size_mb = 500

# Cache TTL (seconds)
cache_ttl = 3600

# Default LLM provider (openai, anthropic, deepseek, grok)
default_provider = "openai"

# Fallback providers (used if primary fails)
fallback_providers = ["anthropic", "deepseek"]
# ===== Agentic Decomposition =====
[nl2sql.agentic]
# Enable agentic task decomposition for complex queries
enabled = true

# Complexity threshold (1-10) - queries above this trigger decomposition
complexity_threshold = 5

# Maximum tasks per decomposition
max_tasks = 10

# Maximum execution stages
max_stages = 5

# Agentic execution timeout (seconds)
timeout_seconds = 30

# Enable parallel task execution
parallel_execution = true

# Maximum parallel tasks per stage
max_parallel_tasks = 4
# ===== Code Advisor =====
[nl2sql.advisor]
# Enable Code Advisor for query optimization
enabled = true

# Automatically recommend indexes
auto_recommend_indexes = true

# Minimum quality score threshold (0-100)
min_quality_score = 50

# Enable statistics integration
use_statistics = true

# Enable metadata integration
use_metadata = true

# Bottleneck detection threshold
bottleneck_severity_threshold = 6
# ===== Statistics Service =====
[nl2sql.statistics]
# Enable statistics service
enabled = true

# Statistics refresh interval (seconds)
refresh_interval_seconds = 3600

# Cache size for table statistics (MB)
cache_size_mb = 100

# Cache size for column statistics (MB)
column_cache_size_mb = 200

# Use histogram-based estimates
enable_histograms = true

# Histogram bucket count
histogram_buckets = 100

# Minimum table size (rows) for statistics collection
min_table_size = 1000
# ===== Metadata Adapter =====
[nl2sql.metadata]
# Enable metadata integration
enabled = true

# Metadata refresh interval (seconds)
refresh_interval = 3600

# Collect constraint metadata
collect_constraints = true

# Collect index metadata
collect_indexes = true

# Collect foreign key metadata
collect_foreign_keys = true

# Max concurrent metadata collection tasks
max_concurrent_collections = 10
# ===== Statistics Refresh Manager =====
[nl2sql.refresh_manager]
# Enable background statistics refresh
enabled = true

# Refresh queue capacity
queue_capacity = 1000

# Worker threads for refresh tasks
worker_threads = 4

# Priority levels (higher = more urgent)
high_priority_threshold = 8
medium_priority_threshold = 5

# Auto-refresh on query (adaptive)
auto_refresh_on_query = true

# Staleness threshold (seconds) - trigger refresh if stats older than this
staleness_threshold = 7200
# ===== LLM Provider Configuration =====
[llm.openai]
# API key (or use OPENAI_API_KEY environment variable)
api_key = "${OPENAI_API_KEY}"

# Model to use
model = "gpt-4-turbo-preview"

# API endpoint
endpoint = "https://api.openai.com/v1"

# Request timeout (seconds)
timeout = 30

# Max retries
max_retries = 3

# Rate limit (requests per minute)
rate_limit = 500

# Temperature (0.0-1.0)
temperature = 0.0

# Max tokens
max_tokens = 2000
[llm.anthropic]
api_key = "${ANTHROPIC_API_KEY}"
model = "claude-3-5-sonnet-20241022"
endpoint = "https://api.anthropic.com/v1"
timeout = 30
max_retries = 3
rate_limit = 100
temperature = 0.0
max_tokens = 4000

[llm.deepseek]
api_key = "${DEEPSEEK_API_KEY}"
model = "deepseek-chat"
endpoint = "https://api.deepseek.com/v1"
timeout = 30
max_retries = 3
rate_limit = 200
temperature = 0.0
max_tokens = 2000

[llm.grok]
api_key = "${GROK_API_KEY}"
model = "grok-1"
endpoint = "https://api.x.ai/v1"
timeout = 30
max_retries = 3
rate_limit = 100
temperature = 0.0
max_tokens = 2000
# ===== Database Connection =====
[database]
# Database type (postgres, mysql, heliosdb)
type = "postgres"

# Connection string
connection_string = "postgresql://heliosdb:password@localhost:5432/heliosdb_metadata"

# Connection pool size
pool_size = 20

# Connection timeout (seconds)
connection_timeout = 10

# Query timeout (seconds)
query_timeout = 30
# ===== Distributed Caching (Redis) =====
[cache.redis]
# Enable Redis for distributed caching
enabled = false

# Redis connection string
url = "redis://localhost:6379"

# Key prefix for NL2SQL cache entries
key_prefix = "nl2sql:"

# Connection pool size
pool_size = 10
# ===== Observability =====
[observability]
# Enable metrics collection
metrics_enabled = true

# Metrics port (Prometheus scrape endpoint)
metrics_port = 9090

# Metrics path
metrics_path = "/metrics"

# Enable distributed tracing
tracing_enabled = true

# Tracing exporter (jaeger, zipkin, otlp)
tracing_exporter = "jaeger"

# Jaeger agent endpoint
jaeger_agent_endpoint = "localhost:6831"

# Log level (trace, debug, info, warn, error)
log_level = "info"

# Structured logging format (json, text)
log_format = "json"

# Log output (stdout, file)
log_output = "stdout"

# Log file path (if log_output = "file")
log_file = "/var/log/heliosdb/nl2sql.log"
# ===== Security =====
[security]
# Enable API authentication
require_auth = true

# JWT secret (or use JWT_SECRET environment variable)
jwt_secret = "${JWT_SECRET}"

# JWT expiration (seconds)
jwt_expiration = 3600

# Enable TLS
enable_tls = true

# TLS certificate path
tls_cert = "/etc/heliosdb/certs/server.crt"

# TLS key path
tls_key = "/etc/heliosdb/certs/server.key"

# Rate limiting (requests per minute per client)
rate_limit_per_client = 100
# ===== Feature Flags =====
[features]
# Enable experimental features
experimental = false

# Enable A/B testing
ab_testing = false

# A/B test split (0.0-1.0)
ab_test_split = 0.5

# Enable verbose error messages (disable in production)
verbose_errors = false

2.2 Environment Variables
Required:
# LLM API Keys
export OPENAI_API_KEY="sk-proj-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export DEEPSEEK_API_KEY="..."  # Optional
export GROK_API_KEY="..."      # Optional

# Database
export DATABASE_URL="postgresql://user:pass@localhost:5432/heliosdb"

# Security
export JWT_SECRET="your-256-bit-secret-key-here"

# Deployment
export RUST_LOG="info,heliosdb_nl2sql=debug"
export RUST_BACKTRACE="1"

Optional:

# Redis (for distributed caching)
export REDIS_URL="redis://localhost:6379"

# Observability
export JAEGER_AGENT_HOST="localhost"
export JAEGER_AGENT_PORT="6831"

# Performance tuning
export NL2SQL_WORKER_THREADS="16"
export NL2SQL_MAX_CONCURRENT_QUERIES="200"

2.3 Performance Tuning Parameters
CPU-Bound Workloads (complex decomposition):
[nl2sql.agentic]
max_parallel_tasks = 8  # Match CPU core count
parallel_execution = true
timeout_seconds = 60

[nl2sql.refresh_manager]
worker_threads = 8  # Match CPU core count

Memory-Bound Workloads (large schemas):

[nl2sql]
cache_size_mb = 2000  # Increase for large skeleton cache

[nl2sql.statistics]
cache_size_mb = 500
column_cache_size_mb = 1000
enable_histograms = true
histogram_buckets = 200

High-Throughput Workloads (many concurrent queries):

[nl2sql]
max_concurrent_queries = 500

[database]
pool_size = 50

[cache.redis]
enabled = true
pool_size = 20

Low-Latency Workloads (fast response times):

[nl2sql]
query_timeout = 10
cache_ttl = 7200  # Longer cache retention

[llm.openai]
model = "gpt-3.5-turbo"  # Faster, lower latency
timeout = 10

[nl2sql.agentic]
complexity_threshold = 7  # Avoid decomposition for most queries

2.4 Security Settings
Production Security Hardening:
[security]
require_auth = true
enable_tls = true
rate_limit_per_client = 50  # Stricter rate limit

# TLS 1.3 only
tls_min_version = "1.3"

# Disable verbose errors (prevent information disclosure)
[features]
verbose_errors = false

# Enable audit logging
[observability]
audit_log_enabled = true
audit_log_path = "/var/log/heliosdb/audit.log"

API Key Rotation:

# Store API keys in a secret manager (AWS Secrets Manager, HashiCorp Vault)
export OPENAI_API_KEY=$(aws secretsmanager get-secret-value \
  --secret-id heliosdb/nl2sql/openai --query SecretString --output text)

3. Installation & Setup
3.1 Step-by-Step Deployment Procedure
Option A: Docker Deployment (Recommended for Quick Start)
Step 1: Build Docker image
cd /path/to/heliosdb
docker build -t heliosdb-nl2sql:latest -f docker/Dockerfile.nl2sql .

Step 2: Create configuration

mkdir -p /etc/heliosdb
cp config/nl2sql.toml.example /etc/heliosdb/nl2sql.toml
# Edit /etc/heliosdb/nl2sql.toml with your settings

Step 3: Run with Docker Compose

version: '3.8'

services:
  nl2sql:
    image: heliosdb-nl2sql:latest
    ports:
      - "8080:8080"
      - "9090:9090"
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - DATABASE_URL=${DATABASE_URL}
      - JWT_SECRET=${JWT_SECRET}
      - RUST_LOG=info
    volumes:
      - /etc/heliosdb/nl2sql.toml:/app/config.toml:ro
      - nl2sql-cache:/var/cache/nl2sql
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3

  postgres:
    image: postgres:14
    environment:
      POSTGRES_DB: heliosdb_metadata
      POSTGRES_USER: heliosdb
      POSTGRES_PASSWORD: secure_password
    volumes:
      - postgres-data:/var/lib/postgresql/data
    ports:
      - "5432:5432"

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    volumes:
      - redis-data:/data

volumes:
  nl2sql-cache:
  postgres-data:
  redis-data:

docker-compose up -d

Option B: Native Deployment (Production)
Step 1: Compile HeliosDB with NL2SQL feature
cd /path/to/heliosdb
cargo build --release --features nl2sql,statistics,advisor

sudo cp target/release/heliosdb /usr/local/bin/

Step 2: Create system user

sudo useradd -r -s /bin/false -d /var/lib/heliosdb heliosdb
sudo mkdir -p /var/lib/heliosdb /var/log/heliosdb /etc/heliosdb
sudo chown -R heliosdb:heliosdb /var/lib/heliosdb /var/log/heliosdb

Step 3: Install configuration

sudo cp config/nl2sql.toml /etc/heliosdb/
sudo chmod 640 /etc/heliosdb/nl2sql.toml
sudo chown root:heliosdb /etc/heliosdb/nl2sql.toml

Step 4: Create systemd service

sudo tee /etc/systemd/system/heliosdb-nl2sql.service > /dev/null <<EOF
[Unit]
Description=HeliosDB NL2SQL Engine
After=network.target postgresql.service

[Service]
Type=simple
User=heliosdb
Group=heliosdb
WorkingDirectory=/var/lib/heliosdb
ExecStart=/usr/local/bin/heliosdb nl2sql --config /etc/heliosdb/nl2sql.toml
Restart=on-failure
RestartSec=10
LimitNOFILE=65536

# Security hardening
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/var/lib/heliosdb /var/log/heliosdb

# Environment
Environment="RUST_LOG=info,heliosdb_nl2sql=debug"
EnvironmentFile=/etc/heliosdb/nl2sql.env

[Install]
WantedBy=multi-user.target
EOF

Step 5: Create environment file

sudo tee /etc/heliosdb/nl2sql.env > /dev/null <<EOF
OPENAI_API_KEY=sk-proj-...
ANTHROPIC_API_KEY=sk-ant-...
DATABASE_URL=postgresql://heliosdb:password@localhost:5432/heliosdb_metadata
JWT_SECRET=your-256-bit-secret-here
EOF

sudo chmod 600 /etc/heliosdb/nl2sql.env
sudo chown root:heliosdb /etc/heliosdb/nl2sql.env

Step 6: Enable and start service

sudo systemctl daemon-reload
sudo systemctl enable heliosdb-nl2sql
sudo systemctl start heliosdb-nl2sql
sudo systemctl status heliosdb-nl2sql

3.2 Database Initialization
Step 1: Create metadata database
-- PostgreSQL
CREATE DATABASE heliosdb_metadata;
CREATE USER heliosdb WITH PASSWORD 'secure_password';
GRANT ALL PRIVILEGES ON DATABASE heliosdb_metadata TO heliosdb;

-- MySQL
CREATE DATABASE heliosdb_metadata CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
CREATE USER 'heliosdb'@'localhost' IDENTIFIED BY 'secure_password';
GRANT ALL PRIVILEGES ON heliosdb_metadata.* TO 'heliosdb'@'localhost';
FLUSH PRIVILEGES;

Step 2: Run schema migrations

heliosdb migrate --database $DATABASE_URL --migrations ./migrations/nl2sql

Step 3: Verify schema

-- PostgreSQL
\dt
\di

-- Expected tables:
-- - nl2sql_cache
-- - nl2sql_statistics_table
-- - nl2sql_statistics_column
-- - nl2sql_refresh_queue
-- - nl2sql_audit_log

3.3 Schema Setup
Step 1: Initialize schema metadata (for target database)
# Connect NL2SQL to your application database
heliosdb nl2sql schema sync \
  --source-db "postgresql://app_user:pass@localhost:5432/app_db" \
  --target-metadata "postgresql://heliosdb:pass@localhost:5432/heliosdb_metadata"

Step 2: Verify schema sync

SELECT table_name, column_count, last_updated
FROM metadata_tables
ORDER BY table_name;

Step 3: Configure schema refresh

[nl2sql.metadata]
enabled = true
refresh_interval = 3600  # Re-sync every hour

# Auto-detect schema changes
auto_sync_on_ddl = true

3.4 Initial Statistics Collection
Step 1: Collect baseline statistics
heliosdb nl2sql statistics collect --all-tables

Step 2: Verify statistics

SELECT
  ts.table_name,
  ts.row_count,
  ts.size_bytes,
  COUNT(cs.column_name) AS columns_with_stats
FROM nl2sql_statistics_table ts
LEFT JOIN nl2sql_statistics_column cs ON ts.table_name = cs.table_name
GROUP BY ts.table_name, ts.row_count, ts.size_bytes
ORDER BY ts.row_count DESC;

Step 3: Schedule periodic statistics refresh

# Cron job for daily statistics refresh
0 2 * * * /usr/local/bin/heliosdb nl2sql statistics refresh --priority auto

Step 4: Monitor statistics staleness

SELECT
  table_name,
  last_updated,
  NOW() - last_updated AS staleness
FROM nl2sql_statistics_table
WHERE NOW() - last_updated > INTERVAL '24 hours'
ORDER BY staleness DESC;

4. Monitoring & Observability
4.1 Key Metrics to Monitor
Query Performance Metrics:
# Query latency (histogram)
nl2sql_query_duration_seconds{quantile="0.5"}
nl2sql_query_duration_seconds{quantile="0.95"}
nl2sql_query_duration_seconds{quantile="0.99"}

# Query throughput
rate(nl2sql_queries_total[5m])

# Query success rate
rate(nl2sql_queries_total{status="success"}[5m]) /
rate(nl2sql_queries_total[5m])

# LLM latency
nl2sql_llm_request_duration_seconds{provider="openai"}
nl2sql_llm_request_duration_seconds{provider="anthropic"}

# Cache hit rate
rate(nl2sql_cache_hits[5m]) /
(rate(nl2sql_cache_hits[5m]) + rate(nl2sql_cache_misses[5m]))

Agentic Decomposition Metrics:

# Decomposition frequency
rate(nl2sql_agentic_decompositions_total[5m])

# Average task count per decomposition
nl2sql_agentic_tasks_per_query{quantile="0.95"}

# Stage execution time
nl2sql_agentic_stage_duration_seconds{stage="1"}
nl2sql_agentic_stage_duration_seconds{stage="2"}

# Parallel efficiency
nl2sql_agentic_parallel_speedup_ratio

Code Advisor Metrics:

# Advisor analysis frequency
rate(nl2sql_advisor_analyses_total[5m])

# Quality scores
nl2sql_advisor_quality_score{quantile="0.5"}

# Bottleneck detection rate
rate(nl2sql_advisor_bottlenecks_detected[5m])

# Index recommendation rate
rate(nl2sql_advisor_index_recommendations[5m])

Statistics Service Metrics:

# Statistics cache size
nl2sql_statistics_cache_entries{type="table"}
nl2sql_statistics_cache_entries{type="column"}

# Statistics refresh rate
rate(nl2sql_statistics_refreshes_total[5m])

# Statistics staleness
nl2sql_statistics_age_seconds{quantile="0.95"}

# Cardinality estimation accuracy (if ground truth available)
nl2sql_statistics_estimation_error_ratio

System Resource Metrics:

# Memory usage
process_resident_memory_bytes
nl2sql_cache_memory_bytes

# CPU utilization
rate(process_cpu_seconds_total[5m])

# Connection pool usage
nl2sql_db_connections_active
nl2sql_db_connections_idle

4.2 Grafana Dashboard Configuration
Dashboard JSON (/etc/grafana/dashboards/nl2sql.json):
{
  "dashboard": {
    "title": "HeliosDB NL2SQL Production Dashboard",
    "panels": [
      {
        "title": "Query Throughput",
        "type": "graph",
        "targets": [
          { "expr": "rate(nl2sql_queries_total[5m])", "legendFormat": "Total Queries/sec" },
          { "expr": "rate(nl2sql_queries_total{status=\"success\"}[5m])", "legendFormat": "Success" },
          { "expr": "rate(nl2sql_queries_total{status=\"error\"}[5m])", "legendFormat": "Errors" }
        ]
      },
      {
        "title": "Query Latency (p50, p95, p99)",
        "type": "graph",
        "targets": [
          { "expr": "nl2sql_query_duration_seconds{quantile=\"0.5\"}", "legendFormat": "p50" },
          { "expr": "nl2sql_query_duration_seconds{quantile=\"0.95\"}", "legendFormat": "p95" },
          { "expr": "nl2sql_query_duration_seconds{quantile=\"0.99\"}", "legendFormat": "p99" }
        ]
      },
      {
        "title": "Cache Hit Rate",
        "type": "gauge",
        "targets": [
          { "expr": "rate(nl2sql_cache_hits[5m]) / (rate(nl2sql_cache_hits[5m]) + rate(nl2sql_cache_misses[5m]))", "legendFormat": "Hit Rate" }
        ],
        "thresholds": [
          { "value": 0.7, "color": "green" },
          { "value": 0.5, "color": "yellow" },
          { "value": 0, "color": "red" }
        ]
      },
      {
        "title": "LLM Provider Latency",
        "type": "graph",
        "targets": [
          { "expr": "nl2sql_llm_request_duration_seconds{provider=\"openai\", quantile=\"0.95\"}", "legendFormat": "OpenAI p95" },
          { "expr": "nl2sql_llm_request_duration_seconds{provider=\"anthropic\", quantile=\"0.95\"}", "legendFormat": "Anthropic p95" }
        ]
      },
      {
        "title": "Agentic Decomposition Rate",
        "type": "graph",
        "targets": [
          { "expr": "rate(nl2sql_agentic_decompositions_total[5m])", "legendFormat": "Decompositions/sec" },
          { "expr": "nl2sql_agentic_tasks_per_query{quantile=\"0.95\"}", "legendFormat": "Tasks per Query (p95)" }
        ]
      },
      {
        "title": "Statistics Cache Health",
        "type": "stat",
        "targets": [
          { "expr": "nl2sql_statistics_cache_entries{type=\"table\"}", "legendFormat": "Table Stats" },
          { "expr": "nl2sql_statistics_cache_entries{type=\"column\"}", "legendFormat": "Column Stats" }
        ]
      }
    ]
  }
}

Quick Setup:

# Import dashboard
curl -X POST http://admin:admin@localhost:3000/api/dashboards/db \
  -H "Content-Type: application/json" \
  -d @/etc/grafana/dashboards/nl2sql.json

4.3 Alert Thresholds
Prometheus Alerting Rules (/etc/prometheus/rules/nl2sql.yml):
groups:
  - name: nl2sql_alerts
    interval: 30s
    rules:
      # High error rate
      - alert: NL2SQLHighErrorRate
        expr: |
          rate(nl2sql_queries_total{status="error"}[5m]) /
          rate(nl2sql_queries_total[5m]) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "NL2SQL error rate above 5%"
          description: "Error rate is {{ $value | humanizePercentage }}"

      # Query latency too high
      - alert: NL2SQLHighLatency
        expr: nl2sql_query_duration_seconds{quantile="0.95"} > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "NL2SQL p95 latency above 10s"
          description: "p95 latency is {{ $value }}s"

      # LLM provider failure
      - alert: NL2SQLLLMProviderDown
        expr: |
          rate(nl2sql_llm_requests_total{status="error"}[5m]) /
          rate(nl2sql_llm_requests_total[5m]) > 0.5
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "LLM provider {{ $labels.provider }} has high failure rate"
          description: "Failure rate is {{ $value | humanizePercentage }}"

      # Cache hit rate too low
      - alert: NL2SQLLowCacheHitRate
        expr: |
          rate(nl2sql_cache_hits[5m]) /
          (rate(nl2sql_cache_hits[5m]) + rate(nl2sql_cache_misses[5m])) < 0.5
        for: 10m
        labels:
          severity: info
        annotations:
          summary: "NL2SQL cache hit rate below 50%"
          description: "Hit rate is {{ $value | humanizePercentage }}"

      # Statistics stale
      - alert: NL2SQLStaleStatistics
        expr: nl2sql_statistics_age_seconds{quantile="0.95"} > 86400
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "NL2SQL statistics are stale (>24h)"
          description: "p95 age is {{ $value | humanizeDuration }}"

      # High memory usage
      - alert: NL2SQLHighMemoryUsage
        expr: nl2sql_cache_memory_bytes > 10737418240  # 10GB
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "NL2SQL cache memory usage above 10GB"
          description: "Current usage: {{ $value | humanize1024 }}B"

      # Connection pool exhaustion
      - alert: NL2SQLConnectionPoolExhausted
        expr: |
          nl2sql_db_connections_active /
          (nl2sql_db_connections_active + nl2sql_db_connections_idle) > 0.9
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "NL2SQL connection pool >90% utilized"
          description: "Utilization: {{ $value | humanizePercentage }}"

4.4 Log Aggregation Setup
Structured JSON Logging (configured in nl2sql.toml):
[observability]
log_format = "json"
log_output = "stdout"

Example JSON Log:

{
  "timestamp": "2025-10-28T10:23:45.123Z",
  "level": "INFO",
  "target": "heliosdb_nl2sql::engine",
  "message": "Query translation completed",
  "fields": {
    "query_id": "q-123456",
    "nl_query": "Show top 10 customers by revenue",
    "complexity": 7,
    "agentic_decomposition": true,
    "tasks": 4,
    "latency_ms": 1523,
    "llm_provider": "openai",
    "cache_hit": false,
    "confidence": 0.92
  },
  "span": {
    "name": "translate_query",
    "trace_id": "abc123def456"
  }
}

Elasticsearch Integration (via Filebeat):
filebeat.inputs:
  - type: container
    paths:
      - '/var/lib/docker/containers/*/*.log'
    json.keys_under_root: true
    json.add_error_key: true
    processors:
      - add_fields:
          target: ''
          fields:
            service: heliosdb-nl2sql

output.elasticsearch:
  hosts: ["elasticsearch:9200"]
  index: "heliosdb-nl2sql-%{+yyyy.MM.dd}"

Log Queries (Elasticsearch/Kibana):
# Find slow queries
GET heliosdb-nl2sql-*/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "target": "heliosdb_nl2sql::engine" }},
        { "range": { "fields.latency_ms": { "gte": 5000 }}}
      ]
    }
  },
  "sort": [{ "fields.latency_ms": "desc" }]
}

# Analyze error patterns
GET heliosdb-nl2sql-*/_search
{
  "query": { "match": { "level": "ERROR" }},
  "aggs": {
    "error_types": {
      "terms": { "field": "fields.error_type.keyword" }
    }
  }
}

4.5 Health Check Endpoints
Liveness Probe (checks if service is running):
curl -f http://localhost:8080/health/liveness
# Response: { "status": "alive", "timestamp": "2025-10-28T10:23:45Z" }

Readiness Probe (checks if service can accept traffic):

curl -f http://localhost:8080/health/readiness
# Response:
# {
#   "status": "ready",
#   "checks": {
#     "database": "ok",
#     "llm_providers": "ok",
#     "statistics_cache": "ok"
#   },
#   "timestamp": "2025-10-28T10:23:45Z"
# }

Startup Probe (checks if service has started):

curl -f http://localhost:8080/health/startup
# Response: { "status": "started", "uptime_seconds": 3600 }

5. Operational Runbooks
5.1 Common Operational Tasks
Task 1: Refresh Statistics Manually
When to Use: After bulk data load, schema changes, or when statistics are stale.
Procedure:
# Refresh all tables
heliosdb nl2sql statistics refresh --all

# Refresh specific table
heliosdb nl2sql statistics refresh --table customers --priority high

# Refresh based on staleness
heliosdb nl2sql statistics refresh --stale-threshold 24h --priority auto

Verification:

SELECT table_name, last_updated
FROM nl2sql_statistics_table
WHERE table_name = 'customers';

Task 2: Adjust Refresh Priorities
When to Use: To optimize statistics freshness for critical tables.
Procedure:
# Set high priority for critical tables
heliosdb nl2sql refresh-manager set-priority \
  --table orders --priority 9

heliosdb nl2sql refresh-manager set-priority \
  --table customers --priority 8

# Lower priority for archival tables
heliosdb nl2sql refresh-manager set-priority \
  --table historical_data --priority 2

Configuration:

[nl2sql.refresh_manager]
high_priority_threshold = 8
medium_priority_threshold = 5

Task 3: Clear Cache (Emergency)
When to Use: After schema changes, corrupt cache, or to force re-translation.
Procedure:
# Clear entire cache
heliosdb nl2sql cache clear --all

# Clear specific table entries
heliosdb nl2sql cache clear --table customers

# Clear stale entries only
heliosdb nl2sql cache clear --older-than 7d

Redis Cache Clear:

redis-cli --scan --pattern "nl2sql:*" | xargs redis-cli DEL

Task 4: Update LLM Provider Priorities
When to Use: When primary provider is down or has degraded performance.
Procedure:
# Update config
[nl2sql]
default_provider = "anthropic"  # Switch from OpenAI
fallback_providers = ["openai", "deepseek"]

# Or via environment variable
export NL2SQL_DEFAULT_PROVIDER="anthropic"

Reload Configuration (without restart):

sudo systemctl reload heliosdb-nl2sql
# Or send SIGHUP
sudo kill -HUP $(pgrep heliosdb)

Task 5: Enable/Disable Agentic Decomposition
When to Use: To reduce latency or troubleshoot complex query issues.
Procedure:
# Disable agentic decomposition
[nl2sql.agentic]
enabled = false

# Or increase threshold to reduce usage
complexity_threshold = 8  # Only for very complex queries

Runtime Toggle (via API):

curl -X POST http://localhost:8080/admin/features \
  -H "Authorization: Bearer $JWT_TOKEN" \
  -d '{"agentic_enabled": false}'

5.2 Troubleshooting Procedures
Issue: High Query Latency
Symptoms:
- p95 latency >5s
- Timeout errors
- User complaints
Decision Tree:
High Latency?
├─ Check LLM Provider Latency
│  ├─ >3s? → Switch provider or increase timeout
│  └─ <3s → Continue
├─ Check Agentic Decomposition
│  ├─ Many tasks (>6)? → Increase complexity_threshold
│  └─ Few tasks → Continue
├─ Check Statistics Staleness
│  ├─ >24h? → Refresh statistics
│  └─ Fresh → Continue
└─ Check Database Connection
   ├─ Pool exhausted? → Increase pool_size
   └─ OK → Check cache hit rate

Actions:
# 1. Check current latency breakdown
curl http://localhost:9090/metrics | grep nl2sql_query_duration

# 2. Identify bottleneck
# - If LLM latency high: Switch provider
# - If many tasks: Increase complexity threshold
# - If cache miss: Warm up cache

# 3. Temporarily disable agentic for latency-sensitive queries
curl -X POST http://localhost:8080/admin/features \
  -d '{"agentic_complexity_threshold": 10}'

Issue: LLM Provider Failures
Symptoms:
- `llm_request_error` entries in logs
- Increased error rate
- Fallback provider usage spiking
Diagnosis:
# Check LLM provider status
curl http://localhost:9090/metrics | grep nl2sql_llm_requests_total

# Check error logs
journalctl -u heliosdb-nl2sql -n 100 | grep "llm_request_error"

Actions:

# 1. Verify API key validity
echo $OPENAI_API_KEY | wc -c  # Should be ~50+ chars

# 2. Test API connectivity
curl https://api.openai.com/v1/models \
  -H "Authorization: Bearer $OPENAI_API_KEY"

# 3. Switch to fallback provider
export NL2SQL_DEFAULT_PROVIDER="anthropic"
sudo systemctl reload heliosdb-nl2sql

# 4. Monitor fallback performance
watch -n 5 'curl -s http://localhost:9090/metrics | grep nl2sql_llm_requests_total'

Issue: Stale Statistics
Symptoms:
- Poor query plans
- Unexpected full table scans
- Inaccurate cardinality estimates
Diagnosis:
SELECT
  table_name,
  last_updated,
  NOW() - last_updated AS staleness
FROM nl2sql_statistics_table
WHERE NOW() - last_updated > INTERVAL '24 hours'
ORDER BY staleness DESC;

Actions:

# 1. Identify stale tables
heliosdb nl2sql statistics list --stale-threshold 24h

# 2. Prioritize refresh
for table in orders customers products; do
  heliosdb nl2sql statistics refresh --table $table --priority high
done

# 3. Verify refresh
watch -n 10 'psql -c "SELECT table_name, last_updated FROM nl2sql_statistics_table WHERE table_name IN ('"'"'orders'"'"', '"'"'customers'"'"');"'

# 4. Update refresh schedule
# Edit cron job to run more frequently
crontab -e
# Change: 0 2 * * * → 0 */6 * * * (every 6 hours)

Issue: Memory Exhaustion
Symptoms:
- OOM kills
- High swap usage
- Cache eviction rate high
Diagnosis:
# Check memory usage
free -h
ps aux | grep heliosdb | awk '{print $2, $4, $6}' | sort -k3 -rn | head -1

# Check cache size
curl http://localhost:9090/metrics | grep nl2sql_cache_memory_bytes

Actions:

# 1. Reduce cache sizes
[nl2sql]
cache_size_mb = 250  # Reduce from 500

[nl2sql.statistics]
cache_size_mb = 50  # Reduce from 100
column_cache_size_mb = 100  # Reduce from 200

# 2. Enable Redis for distributed caching (offload memory)
[cache.redis]
enabled = true

# 3. Restart service
sudo systemctl restart heliosdb-nl2sql

# 4. Monitor memory
watch -n 5 'free -h && curl -s http://localhost:9090/metrics | grep cache_memory'

5.3 Performance Optimization
Optimization 1: Increase Cache Hit Rate
Target: >70% cache hit rate
Actions:
# 1. Increase cache TTL
[nl2sql]
cache_ttl = 7200  # 2 hours

# 2. Pre-warm cache with common queries
# Create warmup script
#!/bin/bash
QUERIES=(
  "Show all active customers"
  "List top 10 products by sales"
  "Count orders by status"
)

for query in "${QUERIES[@]}"; do
  curl -X POST http://localhost:8080/api/nl2sql/translate \
    -H "Content-Type: application/json" \
    -d "{\"query\": \"$query\"}"
done

Optimization 2: Reduce Agentic Decomposition Overhead
Target: <20% of queries use decomposition
Actions:
# Increase complexity threshold
[nl2sql.agentic]
complexity_threshold = 7  # Up from 5
max_tasks = 6  # Cap at 6 tasks

Monitor Impact:

# Before and after comparison
curl -s http://localhost:9090/metrics | grep nl2sql_agentic_decompositions_total

Optimization 3: Optimize Statistics Collection
Target: Statistics refresh <5 minutes
Actions:
# 1. Identify slow tables
heliosdb nl2sql statistics analyze --report

# 2. Disable histogram collection for large tables
[nl2sql.statistics]
enable_histograms = false  # Or reduce buckets
histogram_buckets = 50  # Down from 100
min_table_size = 10000  # Skip small tables

5.4 Capacity Planning
Metrics to Track:
# Query rate growth
rate(nl2sql_queries_total[1d]) - rate(nl2sql_queries_total[1d] offset 7d)

# P95 latency trend
nl2sql_query_duration_seconds{quantile="0.95"} offset 7d

# Memory growth rate
delta(nl2sql_cache_memory_bytes[1d])

Capacity Projections:
```python
# Example: Estimate capacity needs
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Historical data (queries per day)
data = pd.Series([1000, 1200, 1350, 1500, 1700, 2000, 2400])

# Linear regression for 30-day projection
X = np.array(range(len(data))).reshape(-1, 1)
y = data.values
model = LinearRegression().fit(X, y)

# Predict day 30
predicted = model.predict([[30]])[0]
print(f"Projected queries/day in 30 days: {predicted:.0f}")

# Calculate required resources
# Assume: 100 qps = 4 CPU cores, 8GB RAM
current_qps = 2400 / 86400    # ~0.03 qps daily average; size for peak qps instead
projected_qps = predicted / 86400
cpu_cores_needed = (projected_qps / 100) * 4
memory_gb_needed = (projected_qps / 100) * 8

print(f"Resources needed:")
print(f"  CPU cores: {cpu_cores_needed:.0f}")
print(f"  Memory: {memory_gb_needed:.0f}GB")
```

5.5 Backup and Recovery
Backup Strategy:
1. Configuration Backup:
```bash
#!/bin/bash
BACKUP_DIR="/var/backups/heliosdb-nl2sql"
DATE=$(date +%Y%m%d_%H%M%S)

mkdir -p $BACKUP_DIR
tar -czf $BACKUP_DIR/config-$DATE.tar.gz \
  /etc/heliosdb/nl2sql.toml \
  /etc/heliosdb/nl2sql.env

# Retain 30 days
find $BACKUP_DIR -name "config-*.tar.gz" -mtime +30 -delete
```

2. Statistics Backup (PostgreSQL):
```bash
#!/bin/bash
BACKUP_DIR="/var/backups/heliosdb-nl2sql"
DATE=$(date +%Y%m%d_%H%M%S)

pg_dump -h localhost -U heliosdb \
  -t nl2sql_statistics_table \
  -t nl2sql_statistics_column \
  heliosdb_metadata | gzip > $BACKUP_DIR/statistics-$DATE.sql.gz

# Retain 7 days
find $BACKUP_DIR -name "statistics-*.sql.gz" -mtime +7 -delete
```

3. Cache Snapshot (Redis):
```bash
#!/bin/bash
redis-cli --rdb /var/backups/heliosdb-nl2sql/cache-$(date +%Y%m%d).rdb
```

Recovery Procedure:

```bash
# 1. Restore configuration
tar -xzf /var/backups/heliosdb-nl2sql/config-20251028_120000.tar.gz -C /

# 2. Restore statistics
gunzip < /var/backups/heliosdb-nl2sql/statistics-20251028.sql.gz | \
  psql -h localhost -U heliosdb heliosdb_metadata

# 3. Restart service
sudo systemctl restart heliosdb-nl2sql

# 4. Verify health
curl http://localhost:8080/health/readiness
```

6. Production Checklist
6.1 Pre-Deployment Validation
Configuration Validation:
```bash
# Validate TOML syntax
heliosdb nl2sql config validate --config /etc/heliosdb/nl2sql.toml

# Test database connectivity
heliosdb nl2sql config test-db --config /etc/heliosdb/nl2sql.toml

# Test LLM provider connectivity
heliosdb nl2sql config test-llm --provider all --config /etc/heliosdb/nl2sql.toml

# Verify API keys are set
env | grep -E '(OPENAI|ANTHROPIC|DEEPSEEK|GROK)_API_KEY'

# Check filesystem permissions
ls -la /etc/heliosdb/nl2sql.toml  # Should be 640, root:heliosdb
ls -ld /var/lib/heliosdb          # Should be 755, heliosdb:heliosdb
```

Security Validation:
```bash
# Verify TLS certificates
openssl x509 -in /etc/heliosdb/certs/server.crt -noout -dates

# Test JWT secret strength
echo -n $JWT_SECRET | wc -c  # Should be ≥32 characters

# Verify rate limiting
curl -I http://localhost:8080/health  # Check for X-RateLimit headers

# Check firewall rules
sudo iptables -L -n | grep 8080
```

Performance Validation:
```bash
# Load test with sample queries
ab -n 100 -c 10 -T 'application/json' \
  -p test-query.json \
  http://localhost:8080/api/nl2sql/translate

# Verify connection pool size
curl http://localhost:9090/metrics | grep nl2sql_db_connections

# Check memory limits
cat /proc/$(pgrep heliosdb)/limits | grep "Max resident set"
```

6.2 Post-Deployment Verification
Functional Tests:
```bash
# Test simple query translation
curl -X POST http://localhost:8080/api/nl2sql/translate \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $JWT_TOKEN" \
  -d '{"query": "Show all customers"}' | jq .

# Test complex query (agentic decomposition)
curl -X POST http://localhost:8080/api/nl2sql/translate \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $JWT_TOKEN" \
  -d '{"query": "Find top 10 customers by revenue in Q1 2024 with order counts"}' | jq .

# Test Code Advisor
curl -X POST http://localhost:8080/api/nl2sql/advisor/analyze \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $JWT_TOKEN" \
  -d '{"sql": "SELECT * FROM orders WHERE customer_id = 123"}' | jq .

# Test statistics endpoint
curl http://localhost:8080/api/nl2sql/statistics/tables | jq .
```

Monitoring Tests:
```bash
# Verify metrics endpoint
curl http://localhost:9090/metrics | grep nl2sql_queries_total

# Check health endpoints
curl http://localhost:8080/health/liveness
curl http://localhost:8080/health/readiness
curl http://localhost:8080/health/startup

# Verify logs are being written
journalctl -u heliosdb-nl2sql -n 10 --no-pager

# Test alerting (trigger an error condition)
# Check that Prometheus alerts fire and PagerDuty notifications are sent
```

Integration Tests:
```bash
# Verify schema sync
heliosdb nl2sql schema list | grep -q "customers" && echo "Schema synced"

# Verify statistics collection
psql -c "SELECT COUNT(*) FROM nl2sql_statistics_table;" | grep -q "10" && echo "Statistics collected"

# Test cache functionality: run the same query twice, verify the second is faster
time1=$(curl -w "%{time_total}" -o /dev/null -s -X POST http://localhost:8080/api/nl2sql/translate -d '{"query":"test"}')
time2=$(curl -w "%{time_total}" -o /dev/null -s -X POST http://localhost:8080/api/nl2sql/translate -d '{"query":"test"}')
echo "First: ${time1}s, Second: ${time2}s (should be faster)"
```

6.3 Performance Baseline
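The baseline figures below are latency percentiles; if the benchmark tooling is unavailable, the same numbers can be computed from raw per-query timings with the standard library (the latencies here are illustrative samples — collect real ones from logs):

```python
import statistics

# Illustrative per-query latencies in seconds.
latencies = [0.10, 0.11, 0.11, 0.12, 0.12, 0.13, 0.15, 0.35, 0.40, 1.80]

p50 = statistics.median(latencies)
# quantiles(n=100) yields 99 cut points; index 94 is the 95th percentile.
p95 = statistics.quantiles(latencies, n=100)[94]
print(f"p50={p50:.2f}s p95={p95:.2f}s")
```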
Establish Baseline Metrics:
```bash
# Run benchmark suite
heliosdb nl2sql benchmark run --output baseline-$(date +%Y%m%d).json

# Example baseline results (durations in seconds):
# {
#   "simple_query_p50": 0.12,
#   "simple_query_p95": 0.35,
#   "complex_query_p50": 1.8,
#   "complex_query_p95": 3.2,
#   "cache_hit_rate": 0.72,
#   "throughput_qps": 350,
#   "agentic_decomposition_rate": 0.18
# }
```

Compare Against Baseline:

```bash
# After 1 week, re-run benchmark
heliosdb nl2sql benchmark run --output week1-$(date +%Y%m%d).json

# Compare
heliosdb nl2sql benchmark compare \
  --baseline baseline-20251028.json \
  --current week1-20251104.json
```

6.4 Monitoring Setup Verification
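The target-health check in this section can also run unattended in CI; the filtering logic is simple enough to script. A sketch operating on data shaped like the `.data.activeTargets` field of the Prometheus `/api/v1/targets` response (the sample payload is illustrative):

```python
def job_healthy(active_targets, job="heliosdb-nl2sql"):
    """True when at least one target for `job` exists and all report health 'up'."""
    ours = [t for t in active_targets if t.get("labels", {}).get("job") == job]
    return bool(ours) and all(t.get("health") == "up" for t in ours)

# Shaped like .data.activeTargets from Prometheus /api/v1/targets.
sample = [
    {"labels": {"job": "heliosdb-nl2sql"}, "health": "up"},
    {"labels": {"job": "heliosdb-nl2sql"}, "health": "down"},
]
print(job_healthy(sample))  # False — one replica is down
```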
Prometheus Targets:
```bash
# Verify Prometheus is scraping NL2SQL metrics
curl http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | select(.labels.job=="heliosdb-nl2sql")'

# Should show:
# {
#   "health": "up",
#   "lastScrape": "2025-10-28T10:23:45Z",
#   "scrapeUrl": "http://heliosdb-nl2sql:9090/metrics"
# }
```

Grafana Dashboards:

```bash
# Verify dashboard exists
curl -H "Authorization: Bearer $GRAFANA_API_KEY" \
  http://localhost:3000/api/search?query=NL2SQL | jq .

# Test dashboard data
curl -H "Authorization: Bearer $GRAFANA_API_KEY" \
  'http://localhost:3000/api/datasources/proxy/1/api/v1/query?query=nl2sql_queries_total'
```

Alerting:

```bash
# Verify alert rules loaded
curl http://localhost:9090/api/v1/rules | jq '.data.groups[] | select(.name=="nl2sql_alerts")'

# Test alert firing (optional - trigger the condition manually)
# Disable an LLM provider temporarily to trigger the NL2SQLLLMProviderDown alert
```

6.5 Documentation Handoff
Required Documentation:
- Architecture diagram (saved to /docs/nl2sql/PRODUCTION_DEPLOYMENT_GUIDE.md)
- Configuration reference (TOML + env vars)
- Runbook procedures (common tasks, troubleshooting)
- Monitoring dashboard URLs
- Alert contact information (PagerDuty, Slack)
- Backup/restore procedures
- Escalation matrix

Handoff Checklist:

- [ ] Operations team trained on deployment guide
- [ ] On-call rotation established
- [ ] Runbook procedures tested in staging
- [ ] Grafana dashboard access granted
- [ ] PagerDuty integration tested
- [ ] Backup/restore procedures verified
- [ ] Emergency contacts documented
- [ ] Knowledge transfer session completed

7. Scaling Guidelines
7.1 Horizontal Scaling Strategies
Load Balancer Configuration (NGINX):
```nginx
upstream nl2sql_backend {
    least_conn;  # Use least-connections algorithm

    server nl2sql-1:8080 max_fails=3 fail_timeout=30s;
    server nl2sql-2:8080 max_fails=3 fail_timeout=30s;
    server nl2sql-3:8080 max_fails=3 fail_timeout=30s;

    keepalive 32;
}

server {
    listen 80;
    server_name nl2sql.heliosdb.com;

    location /api/nl2sql/ {
        proxy_pass http://nl2sql_backend;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

        # Increase timeout for complex queries
        proxy_read_timeout 60s;
        proxy_connect_timeout 10s;
    }

    location /health/ {
        proxy_pass http://nl2sql_backend;
        proxy_read_timeout 5s;
    }
}
```

Kubernetes Deployment:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: heliosdb-nl2sql
  labels:
    app: heliosdb-nl2sql
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: heliosdb-nl2sql
  template:
    metadata:
      labels:
        app: heliosdb-nl2sql
    spec:
      containers:
        - name: nl2sql
          image: heliosdb-nl2sql:latest
          ports:
            - containerPort: 8080
              name: http
            - containerPort: 9090
              name: metrics
          env:
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: llm-api-keys
                  key: openai
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: database-credentials
                  key: url
          resources:
            requests:
              cpu: "4"
              memory: "8Gi"
            limits:
              cpu: "8"
              memory: "16Gi"
          livenessProbe:
            httpGet:
              path: /health/liveness
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health/readiness
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
          volumeMounts:
            - name: config
              mountPath: /app/config.toml
              subPath: nl2sql.toml
      volumes:
        - name: config
          configMap:
            name: nl2sql-config
---
apiVersion: v1
kind: Service
metadata:
  name: heliosdb-nl2sql
spec:
  type: LoadBalancer
  selector:
    app: heliosdb-nl2sql
  ports:
    - name: http
      port: 80
      targetPort: 8080
    - name: metrics
      port: 9090
      targetPort: 9090
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: heliosdb-nl2sql-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: heliosdb-nl2sql
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Pods
      pods:
        metric:
          name: nl2sql_queries_total
        target:
          type: AverageValue
          averageValue: "100"  # 100 qps per pod
```

7.2 Resource Allocation
Sizing Guide (per instance):
| Workload Type | CPU Cores | Memory | Queries/sec |
|---|---|---|---|
| Light (simple queries) | 2-4 | 4-8GB | 50-100 |
| Medium (mixed) | 4-8 | 8-16GB | 100-200 |
| Heavy (complex, agentic) | 8-16 | 16-32GB | 200-400 |
| Very Heavy (high concurrency) | 16-32 | 32-64GB | 400-800 |
Formula:
```
CPU_cores = (target_qps × avg_query_latency_seconds × 1.5 overhead) / concurrency_factor
Memory_GB = (cache_size_mb + column_cache_mb + 2048 base) / 1024
```

Example:
```python
# Target: 300 qps, avg latency 1.5s
target_qps = 300
avg_latency = 1.5
overhead = 1.5
concurrency = 0.8  # 80% of queries can run in parallel

cpu_cores = (target_qps * avg_latency * overhead) / concurrency
# = (300 * 1.5 * 1.5) / 0.8 ≈ 844 concurrent request slots
# Most of a query's latency is spent waiting on the LLM (I/O-bound),
# so slots map to far fewer physical cores — roughly 12 in practice.

# Memory
cache_mb = 500
column_cache_mb = 200
base_mb = 2048

memory_gb = (cache_mb + column_cache_mb + base_mb) / 1024
# = 2748 / 1024 ≈ 2.7GB → Recommend 8GB with headroom
```

7.3 Load Balancing
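Session affinity pays off because repeat queries from the same client land on the node whose caches are already warm. The mechanism behind NGINX's `ip_hash` is a stable hash over the client address; a minimal sketch (backend names are illustrative):

```python
import hashlib

backends = ["nl2sql-1:8080", "nl2sql-2:8080", "nl2sql-3:8080"]

def pick_backend(client_ip: str) -> str:
    """Deterministic hash: the same client always maps to the same backend."""
    digest = hashlib.sha256(client_ip.encode()).hexdigest()
    return backends[int(digest, 16) % len(backends)]

# The mapping is stable across calls, which is what keeps caches warm.
print(pick_backend("203.0.113.7") == pick_backend("203.0.113.7"))  # True
```

Note that adding or removing a backend reshuffles most assignments under plain modulo hashing; this is why service meshes offer consistent hashing, as in the Istio example below's `consistentHash` policy.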
Session Affinity (for cache locality):
```nginx
# NGINX - IP hash for session affinity
upstream nl2sql_backend {
    ip_hash;  # Route same IP to same backend
    server nl2sql-1:8080;
    server nl2sql-2:8080;
    server nl2sql-3:8080;
}
```

Kubernetes Service Mesh (Istio):
```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: nl2sql-lb
spec:
  host: heliosdb-nl2sql
  trafficPolicy:
    loadBalancer:
      consistentHash:
        httpHeaderName: "X-User-ID"  # Route based on user ID
```

Health Check-Based Routing:
```nginx
# Remove unhealthy backends automatically
# (the check directives require the nginx_upstream_check_module)
upstream nl2sql_backend {
    server nl2sql-1:8080 max_fails=3 fail_timeout=30s;
    server nl2sql-2:8080 max_fails=3 fail_timeout=30s backup;  # Backup server

    check interval=5000 rise=2 fall=3 timeout=3000 type=http;
    check_http_send "GET /health/readiness HTTP/1.0\r\n\r\n";
    check_http_expect_alive http_2xx http_3xx;
}
```

7.4 Multi-Region Deployment
Architecture:
```
                 ┌─────────────────┐
                 │   Global DNS    │
                 │   (Route 53)    │
                 └────────┬────────┘
                          │
         ┌────────────────┼────────────────┐
         ▼                ▼                ▼
 ┌──────────────┐  ┌──────────────┐  ┌──────────────┐
 │  US-EAST-1   │  │  EU-WEST-1   │  │  AP-SOUTH-1  │
 │  (Primary)   │  │ (Secondary)  │  │ (Secondary)  │
 └──────┬───────┘  └──────┬───────┘  └──────┬───────┘
        │                 │                 │
 ┌──────▼───────┐  ┌──────▼───────┐  ┌──────▼───────┐
 │    NL2SQL    │  │    NL2SQL    │  │    NL2SQL    │
 │   Cluster    │  │   Cluster    │  │   Cluster    │
 │  (3 nodes)   │  │  (3 nodes)   │  │  (3 nodes)   │
 └──────┬───────┘  └──────┬───────┘  └──────┬───────┘
        │                 │                 │
 ┌──────▼─────────────────▼─────────────────▼──────┐
 │          Global Statistics Replication          │
 │      (PostgreSQL Multi-Master / CockroachDB)    │
 └─────────────────────────────────────────────────┘
```

Configuration (per region):
```toml
[deployment]
region = "us-east-1"
primary_region = true

# Replicate statistics to other regions
[statistics.replication]
enabled = true
targets = [
  "eu-west-1.nl2sql.heliosdb.com",
  "ap-south-1.nl2sql.heliosdb.com"
]
sync_interval = 300  # 5 minutes
```

Failover Strategy:
```yaml
# Route 53 health check
Type: HTTPS
ResourcePath: /health/readiness
FailureThreshold: 3
MeasureLatency: true
Regions:
  - us-east-1 (Primary - Weight: 100)
  - eu-west-1 (Failover - Weight: 0)
  - ap-south-1 (Failover - Weight: 0)

# Automatic failover if primary unhealthy for >1 minute
```

8. Troubleshooting Guide
8.1 Common Issues and Solutions
Issue 1: Query Translation Failures
Symptoms:
- `translation_error` in the response
- Error: "Failed to parse SQL from LLM response"
- Confidence score consistently low (<0.5)
Root Causes:
- Ambiguous natural language query
- LLM provider issue
- Schema mismatch
- Complex query beyond LLM capabilities
Solutions:
Step 1: Check query complexity
```bash
curl -X POST http://localhost:8080/api/nl2sql/analyze \
  -d '{"query": "YOUR_QUERY"}' | jq '.complexity'

# If complexity >8, the query may be too complex
```

Step 2: Verify schema is synced

```sql
SELECT table_name, last_updated
FROM metadata_tables
WHERE table_name IN ('referenced_table_1', 'referenced_table_2');
```

```bash
# If last_updated is old, re-sync
heliosdb nl2sql schema sync --force
```

Step 3: Try a different LLM provider

```bash
curl -X POST http://localhost:8080/api/nl2sql/translate \
  -d '{"query": "YOUR_QUERY", "provider": "anthropic"}' | jq .
```

Step 4: Simplify the query

```
# Instead of:
"Show me the top 10 customers by total revenue in Q1 2024, grouped by region, with running totals and percent of total"

# Try:
"Show top 10 customers by revenue in 2024"
```

Issue 2: Agentic Decomposition Timeouts
Symptoms:
- `agentic_timeout` errors
- Queries taking >30s
- Many tasks (>8) in decomposition
Root Causes:
- Too many tasks generated
- Sequential execution (not parallelized)
- LLM provider slow
- Database connection slow
Solutions:
Step 1: Increase timeout temporarily
```toml
[nl2sql.agentic]
timeout_seconds = 60  # Increase from 30
```

Step 2: Enable parallel execution

```toml
[nl2sql.agentic]
parallel_execution = true
max_parallel_tasks = 6  # Increase from 4
```

Step 3: Reduce max tasks

```toml
[nl2sql.agentic]
max_tasks = 6             # Reduce from 10
complexity_threshold = 7  # Increase to reduce decomposition frequency
```

Step 4: Check LLM provider latency

```bash
curl http://localhost:9090/metrics | grep nl2sql_llm_request_duration_seconds

# If >3s, switch providers
```

Issue 3: Statistics Collection Failures
Symptoms:
- `statistics_collection_error` in logs
- `get_table_stats` returns `None`
- Cardinality estimates are wrong
Root Causes:
- Database permission issues
- Table not in metadata
- Statistics refresh manager queue full
- Database connection timeout
Solutions:
Step 1: Check database permissions
```sql
-- PostgreSQL
GRANT SELECT ON ALL TABLES IN SCHEMA public TO heliosdb;
GRANT SELECT ON pg_statistic TO heliosdb;

-- MySQL
GRANT SELECT ON *.* TO 'heliosdb'@'localhost';
GRANT SELECT ON mysql.innodb_table_stats TO 'heliosdb'@'localhost';
```

Step 2: Verify the table exists in metadata

```sql
SELECT * FROM metadata_tables WHERE table_name = 'your_table';
-- If not found, re-sync the schema
```

Step 3: Check the refresh queue

```bash
heliosdb nl2sql refresh-manager status
```

```toml
# If the queue is full, increase capacity
[nl2sql.refresh_manager]
queue_capacity = 2000  # Increase from 1000
worker_threads = 8     # Increase from 4
```

Step 4: Manual statistics collection

```bash
heliosdb nl2sql statistics refresh --table your_table --force
```

Issue 4: Code Advisor Performance Issues
Symptoms:
- Advisor analysis taking >5s
- Memory spikes during analysis
- Incomplete recommendations
Root Causes:
- Large SQL queries (>10KB)
- Statistics lookup overhead
- Metadata adapter slow
- Too many index recommendations
Solutions:
Step 1: Disable statistics integration temporarily
```toml
[nl2sql.advisor]
use_statistics = false  # Temporarily disable
```

Step 2: Limit index recommendations

```toml
[nl2sql.advisor]
max_index_recommendations = 5  # Cap at 5
```

Step 3: Cache advisor results

```toml
# Enable Redis caching for advisor results
[cache.redis]
enabled = true
advisor_cache_ttl = 1800  # 30 minutes
```

Step 4: Increase the advisor timeout

```toml
[nl2sql.advisor]
analysis_timeout_seconds = 10  # Increase from 5
```

8.2 Debug Procedures
Enable Debug Logging:
```bash
# Temporary (runtime)
export RUST_LOG="debug,heliosdb_nl2sql=trace"
sudo systemctl restart heliosdb-nl2sql
```

```toml
# Permanent (config)
[observability]
log_level = "debug"
```

Trace Specific Query:

```bash
# Enable tracing for a single query
curl -X POST http://localhost:8080/api/nl2sql/translate \
  -H "X-Trace-Enabled: true" \
  -d '{"query": "YOUR_QUERY"}' | jq .

# View the trace in Jaeger
open http://localhost:16686
```

Profile Memory Usage:

```bash
# Using heaptrack
heaptrack /usr/local/bin/heliosdb nl2sql --config /etc/heliosdb/nl2sql.toml

# Analyze
heaptrack_gui heaptrack.heliosdb.*.gz
```

SQL Query Inspection:

```sql
-- Enable SQL query logging (PostgreSQL)
ALTER SYSTEM SET log_statement = 'all';
SELECT pg_reload_conf();
```

```bash
# View logs
tail -f /var/lib/postgresql/14/main/log/postgresql-*.log | grep heliosdb
```

8.3 Log Analysis
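The jq pipelines in this section assume structured JSON logs. The same error-pattern analysis can be done in Python for hosts without jq (the log shape — `level` plus `fields.error_type` — is inferred from the fields referenced below; sample lines are illustrative):

```python
import json
from collections import Counter

# Illustrative journalctl JSON lines.
lines = [
    '{"level": "ERROR", "fields": {"error_type": "translation_error"}}',
    '{"level": "INFO",  "fields": {}}',
    '{"level": "ERROR", "fields": {"error_type": "agentic_timeout"}}',
    '{"level": "ERROR", "fields": {"error_type": "translation_error"}}',
]

records = [json.loads(line) for line in lines]
errors = Counter(
    r["fields"]["error_type"] for r in records if r["level"] == "ERROR"
)
for error_type, count in errors.most_common():
    print(count, error_type)  # most frequent error types first
```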
Find Slow Queries:
```bash
journalctl -u heliosdb-nl2sql --since "1 hour ago" | \
  jq -r 'select(.fields.latency_ms > 5000) | "\(.timestamp) \(.fields.query_id) \(.fields.latency_ms)ms"'
```

Analyze Error Patterns:

```bash
journalctl -u heliosdb-nl2sql --since "1 day ago" | \
  jq -r 'select(.level == "ERROR") | .fields.error_type' | \
  sort | uniq -c | sort -rn
```

Track LLM Provider Usage:

```bash
journalctl -u heliosdb-nl2sql --since "1 hour ago" | \
  jq -r 'select(.target == "heliosdb_nl2sql::llm") | .fields.provider' | \
  sort | uniq -c
```

8.4 Performance Tuning
Identify Bottlenecks:
```bash
# CPU profiling
perf record -F 99 -p $(pgrep heliosdb) -g -- sleep 30
perf report

# I/O profiling
iotop -p $(pgrep heliosdb)

# Network profiling
nethogs -p $(pgrep heliosdb)
```

Optimize Configuration:

```bash
# Benchmark different configurations
for threshold in 5 6 7 8; do
  echo "Testing complexity_threshold=$threshold"
  sed -i "s/complexity_threshold = .*/complexity_threshold = $threshold/" /etc/heliosdb/nl2sql.toml
  sudo systemctl reload heliosdb-nl2sql
  sleep 10
  heliosdb nl2sql benchmark run --duration 60s --output "benchmark-$threshold.json"
done

# Compare results
heliosdb nl2sql benchmark compare benchmark-*.json
```

Appendix
A. Configuration Template
See Section 2.1 for complete TOML configuration template.
B. Metrics Reference
Complete metrics list:
```
# Query metrics
nl2sql_queries_total{status="success|error"}
nl2sql_query_duration_seconds{quantile="0.5|0.95|0.99"}
nl2sql_query_complexity_score{quantile="0.5|0.95|0.99"}

# Agentic metrics
nl2sql_agentic_decompositions_total
nl2sql_agentic_tasks_per_query{quantile="0.5|0.95|0.99"}
nl2sql_agentic_stages_per_query{quantile="0.5|0.95|0.99"}
nl2sql_agentic_stage_duration_seconds{stage="1|2|3|..."}
nl2sql_agentic_parallel_speedup_ratio

# Code Advisor metrics
nl2sql_advisor_analyses_total
nl2sql_advisor_quality_score{quantile="0.5|0.95|0.99"}
nl2sql_advisor_bottlenecks_detected{kind="FullTableScan|MissingIndex|..."}
nl2sql_advisor_index_recommendations_total

# Statistics metrics
nl2sql_statistics_cache_entries{type="table|column"}
nl2sql_statistics_cache_hits
nl2sql_statistics_cache_misses
nl2sql_statistics_refreshes_total{priority="high|medium|low"}
nl2sql_statistics_refresh_duration_seconds
nl2sql_statistics_age_seconds{quantile="0.5|0.95|0.99"}

# LLM metrics
nl2sql_llm_requests_total{provider="openai|anthropic|...", status="success|error"}
nl2sql_llm_request_duration_seconds{provider="...", quantile="0.5|0.95|0.99"}
nl2sql_llm_tokens_used_total{provider="...", type="input|output"}

# Cache metrics
nl2sql_cache_hits
nl2sql_cache_misses
nl2sql_cache_entries
nl2sql_cache_memory_bytes
nl2sql_cache_evictions_total

# Database metrics
nl2sql_db_connections_active
nl2sql_db_connections_idle
nl2sql_db_query_duration_seconds{quantile="0.5|0.95|0.99"}

# System metrics
process_cpu_seconds_total
process_resident_memory_bytes
process_open_fds
```

C. API Endpoints
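Scripts that exercise several of these endpoints benefit from a thin client wrapper; a sketch around the translation endpoint (the base URL default and bearer-token handling are deployment-specific assumptions):

```python
import json
import urllib.request

class NL2SQLClient:
    """Minimal wrapper for the NL2SQL HTTP API."""

    def __init__(self, base_url="http://localhost:8080", token=None):
        self.base_url = base_url.rstrip("/")
        self.token = token

    def _headers(self):
        headers = {"Content-Type": "application/json"}
        if self.token:
            headers["Authorization"] = f"Bearer {self.token}"
        return headers

    def translate(self, query: str) -> dict:
        """POST /api/nl2sql/translate with a natural-language query."""
        req = urllib.request.Request(
            f"{self.base_url}/api/nl2sql/translate",
            data=json.dumps({"query": query}).encode(),
            headers=self._headers(),
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
```

Usage: `NL2SQLClient(token=os.environ["JWT_TOKEN"]).translate("Show all customers")`.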
Translation API:
- `POST /api/nl2sql/translate` - Translate natural language to SQL
- `POST /api/nl2sql/analyze` - Analyze query complexity (without translation)
Code Advisor API:
- `POST /api/nl2sql/advisor/analyze` - Analyze a SQL query
- `POST /api/nl2sql/advisor/recommend-indexes` - Get index recommendations
Statistics API:
- `GET /api/nl2sql/statistics/tables` - List table statistics
- `GET /api/nl2sql/statistics/tables/{table}` - Get statistics for one table
- `POST /api/nl2sql/statistics/refresh` - Trigger a statistics refresh
Admin API:
- `POST /admin/features` - Toggle feature flags
- `POST /admin/cache/clear` - Clear cache
- `GET /admin/config` - Get current configuration
Health API:
- `GET /health/liveness` - Liveness probe
- `GET /health/readiness` - Readiness probe
- `GET /health/startup` - Startup probe
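Deployment scripts typically gate traffic cutover on the readiness probe; a minimal polling helper (probe path from the list above; the host, port, and timeout defaults are assumptions):

```python
import time
import urllib.request
import urllib.error

def wait_ready(base_url="http://localhost:8080", timeout_s=60, interval_s=2):
    """Poll /health/readiness until it returns 200 or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/health/readiness", timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # service not up yet; retry
        time.sleep(interval_s)
    return False

# A zero timeout returns immediately without probing.
print(wait_ready(timeout_s=0))  # False
```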
D. Emergency Contacts
On-Call Rotation:
- Primary: oncall-nl2sql@heliosdb.com
- Secondary: oncall-platform@heliosdb.com
- Escalation: engineering-leads@heliosdb.com
PagerDuty:
- Service: HeliosDB-NL2SQL-Production
- Escalation Policy: Engineering-NL2SQL
Slack Channels:
- #heliosdb-nl2sql-prod (production alerts)
- #heliosdb-nl2sql-oncall (on-call coordination)
- #heliosdb-platform (general platform issues)
End of Production Deployment Guide
Version: 1.0 Maintained by: HeliosDB Platform Team Last Review: October 28, 2025 Next Review: January 28, 2026