Phase 2 Production Deployment - Quick Reference Guide

Version: 1.0 Created: November 24, 2025 Target Audience: DevOps Engineers, SREs

One-Page Summary

Deployment Checklist

☐ Pre-Deployment
  ☐ Run pre-flight checks
  ☐ Create full backup
  ☐ Verify monitoring setup
  ☐ Review rollback plan
  ☐ Notify stakeholders

☐ Deployment
  ☐ Execute rolling deployment
  ☐ Monitor metrics continuously
  ☐ Run smoke tests
  ☐ Verify all nodes healthy

☐ Post-Deployment
  ☐ Run validation suite
  ☐ Update documentation
  ☐ Send completion notice
  ☐ Schedule retrospective

Key Commands

# Health check
heliosdb-cli health-check

# Deploy
./runbooks/rolling-deployment.sh v7.0.0 canary

# Backup
./runbooks/backup.sh full

# Rollback
./runbooks/rollback-deployment.sh v6.0.0

# Monitor
watch -n 5 'heliosdb-cli cluster-status'

Emergency Contacts

P0 Hotline: +1-XXX-XXX-XXXX
Slack: #incidents
PagerDuty: heliosdb-production

Quick Start Commands

Pre-Deployment

# Pre-flight checks
./runbooks/pre-deployment-checklist.sh

# Create backup
./runbooks/backup.sh full

# Verify monitoring
curl -sf http://localhost:9090/-/healthy
curl -sf http://localhost:3000/api/health

Deployment

# Canary deployment (recommended)
./runbooks/rolling-deployment.sh v7.0.0 canary

# Rolling deployment
./runbooks/rolling-deployment.sh v7.0.0 rolling

# Blue-green deployment
./runbooks/rolling-deployment.sh v7.0.0 blue-green

Post-Deployment

# Validation
./runbooks/post-deployment-validation.sh

# Check version
heliosdb-cli version --all-nodes

# Monitor
./scripts/monitor-deployment.sh 3600

Monitoring URLs

Service	URL	Port
Prometheus	http://localhost:9090	9090
Grafana	http://localhost:3000	3000
Alertmanager	http://localhost:9093	9093
HeliosDB Metrics	http://localhost:9091	9091

Key Dashboards

Cluster Overview: Grafana → Dashboards → HeliosDB Overview
Performance: Grafana → Dashboards → HeliosDB Performance
Reliability: Grafana → Dashboards → HeliosDB Reliability

Critical Metrics

Health Indicators

# Cluster health
heliosdb-cli health-check

# Node status
heliosdb-cli list-nodes --status

# Quorum status
heliosdb-cli quorum-status

# Replication lag
heliosdb-cli replication-lag

Performance Metrics

# Query latency
heliosdb-cli metrics --query "query_latency_p99"

# Throughput
heliosdb-cli metrics --query "qps"

# Resource usage
heliosdb-cli resource-usage --all-nodes

Common Incidents & Quick Fixes

P0: Cluster Down

# 1. Check status
heliosdb-cli cluster-status

# 2. Attempt recovery
./runbooks/incident-p0-cluster-outage.sh

# 3. If failed, DR failover
./runbooks/dr-failover.sh

P1: High Latency

# 1. Identify slow queries
heliosdb-cli slow-query-log --last 10m

# 2. Check resources
heliosdb-cli resource-usage

# 3. Apply fixes
./runbooks/incident-p1-high-latency.sh

P2: Single Node Down

# Restart node
heliosdb-cli restart-node <node-id>

# If restart fails, replace
heliosdb-cli replace-node <node-id>

P2: Disk Space Low

# Free up space
heliosdb-cli cleanup-old-logs <node-id>
heliosdb-cli cleanup-old-snapshots <node-id>

# Expand storage
heliosdb-cli expand-storage <node-id> --size +100GB

Backup & Restore

Create Backup

# Full backup
./runbooks/backup.sh full

# Incremental backup
./runbooks/backup.sh incremental

# WAL archive
./runbooks/backup.sh wal

Verify Backup

# Verify latest backup
heliosdb-backup verify --latest

# Verify specific backup
heliosdb-backup verify --path /backup/heliosdb/2025-11-24

Restore

# Full restore
./runbooks/restore.sh full 2025-11-24

# Point-in-time restore
./runbooks/pitr-restore.sh "2025-11-24T14:30:00Z"

Configuration Files

Locations

Main config: /etc/heliosdb/config.yaml
Secrets: /etc/heliosdb/secrets.yaml
TLS certs: /etc/heliosdb/tls/
Logs: /var/log/heliosdb/
Data: /var/lib/heliosdb/data/

Key Settings

cluster:
  name: production
  nodes: 3
  replication_factor: 3

performance:
  cache_size: 32GB
  max_connections: 1000
  query_timeout: 30s

reliability:
  backup_schedule: "0 2 * * *"
  wal_archive: true
  auto_failover: true

Alert Reference

Critical Alerts (P0)

Alert	Threshold	Action
ClusterQuorumLost	quorum < 2	Run incident-p0-cluster-outage.sh
NodeDown	up == 0 for 2min	Restart/replace node
DataCorruption	checksum_mismatches > 0	Run integrity check
BackupFailed	backup_failed > 0	Investigate, retry backup

Warning Alerts (P1)

Alert	Threshold	Action
HighQueryLatency	p99 > 100ms for 5min	Run incident-p1-high-latency.sh
HighCPUUsage	cpu > 85% for 10min	Check queries, scale if needed
DiskSpaceLow	free < 15%	Cleanup or expand storage
ReplicationLag	lag > 5min	Check network, resources

Rollback Procedures

Immediate Rollback

# Within 1 hour of deployment
./runbooks/rollback-deployment.sh v6.0.0

Delayed Rollback

# After 1 hour (requires data sync)
./runbooks/delayed-rollback.sh v6.0.0

Rollback Verification

# Check version
heliosdb-cli version --all-nodes

# Verify health
heliosdb-cli health-check

# Run tests
./tests/smoke-tests.sh

Migration Quick Reference

Zero-Downtime Migration

# Phase 1: Prepare
./scripts/deployment/zero-downtime-migration.sh $DB $CLUSTER prepare

# Phase 2: Initial sync
./scripts/deployment/zero-downtime-migration.sh $DB $CLUSTER initial-sync

# Phase 3: Delta sync
./scripts/deployment/zero-downtime-migration.sh $DB $CLUSTER delta-sync

# Phase 4: Cutover
./scripts/deployment/zero-downtime-migration.sh $DB $CLUSTER cutover

# Phase 5: Verify
./scripts/deployment/zero-downtime-migration.sh $DB $CLUSTER verify

Migration Status

# Check progress
heliosdb-migrate status

# Check lag
heliosdb-migrate lag-check

# Verify data
./scripts/deployment/data-validation.sh

Performance Tuning

Quick Wins

# Increase cache size
heliosdb-cli set-config cache.size=64GB

# Increase connection pool
heliosdb-cli set-config connections.max=2000

# Enable query cache
heliosdb-cli set-config query_cache.enabled=true

# Update statistics
heliosdb-cli update-statistics --all-tables

Index Optimization

# Get recommendations
heliosdb-cli index-recommendations

# Create recommended indexes
heliosdb-cli create-recommended-indexes

# Rebuild indexes
heliosdb-cli rebuild-indexes --online

Security Quick Checks

Pre-Deployment Security

# Network security
sudo ufw status

# TLS certificates
openssl x509 -in /etc/heliosdb/tls/server.crt -noout -dates

# Encryption at rest
heliosdb-cli get-config encryption.at_rest

# Audit logging
heliosdb-cli get-config audit.enabled

Compliance Validation

# SOC2 controls
heliosdb-cli compliance-check --framework soc2

# HIPAA controls
heliosdb-cli compliance-check --framework hipaa

# GDPR controls
heliosdb-cli compliance-check --framework gdpr

Capacity Planning

Current Capacity

# Cluster capacity
heliosdb-cli capacity-report

# Resource usage
heliosdb-cli resource-usage --detailed

# Growth projection
heliosdb-cli capacity-forecast --days 90

Scaling Decisions

Scale Up When:

CPU > 70% sustained for 1 hour
Memory > 80% sustained for 30 minutes
Disk > 75% usage
P99 latency > 50ms sustained

Scale Down When:

CPU < 30% sustained for 7 days
Memory < 40% sustained for 7 days
Cost optimization opportunity identified

Troubleshooting Quick Tips

Slow Queries

# Identify
heliosdb-cli slow-query-log --last 1h

# Analyze
heliosdb-cli explain-slow-queries

# Fix
heliosdb-cli create-recommended-indexes

High CPU

# Identify culprit
heliosdb-cli queries --sort-by cpu

# Kill slow queries
heliosdb-cli kill-slow-queries --min-duration 30s

# Check for hotspots
heliosdb-cli hotspot-analysis

Memory Issues

# Check usage
heliosdb-cli memory-usage

# Flush cache
heliosdb-cli cache-flush

# Reduce cache size
heliosdb-cli set-config cache.size=16GB

Useful CLI Commands

Cluster Management

# Status
heliosdb-cli cluster-status
heliosdb-cli list-nodes
heliosdb-cli quorum-status

# Operations
heliosdb-cli cluster-start
heliosdb-cli cluster-stop --graceful
heliosdb-cli cluster-restart

Node Management

# Node operations
heliosdb-cli node-status <node-id>
heliosdb-cli restart-node <node-id>
heliosdb-cli drain-node <node-id>
heliosdb-cli replace-node <node-id>

Query Management

# Active queries
heliosdb-cli queries
heliosdb-cli slow-query-log

# Query operations
heliosdb-cli kill-query <query-id>
heliosdb-cli kill-slow-queries
heliosdb-cli explain-query <query-id>

Backup/Restore

# Backup
heliosdb-backup full --output /backup/today
heliosdb-backup incremental --base /backup/base
heliosdb-backup wal-archive

# Restore
heliosdb-restore full --input /backup/today
heliosdb-restore pitr --target-time "2025-11-24T14:00:00Z"

Log Locations

# Application logs
/var/log/heliosdb/heliosdb.log

# Slow query log
/var/log/heliosdb/slow-queries.log

# Audit log
/var/log/heliosdb/audit.log

# Error log
/var/log/heliosdb/error.log

# View recent logs
tail -f /var/log/heliosdb/heliosdb.log

# Search logs
grep "ERROR" /var/log/heliosdb/heliosdb.log

Support Resources

Documentation

Full Deployment Plan: PHASE2_PRODUCTION_DEPLOYMENT_PLAN.md
Operations Runbooks: ../operations/PHASE2_OPERATIONS_RUNBOOKS.md
Executive Summary: PHASE2_PRODUCTION_DEPLOYMENT_EXECUTIVE_SUMMARY.md

Contact

Slack: #production-support
Email: support@heliosdb.io
Hotline: +1-XXX-XXX-XXXX (24/7)
PagerDuty: heliosdb-production

On-Call

# Check on-call
heliosdb-cli oncall-status

# Page on-call
heliosdb-cli page-oncall "Brief description"

Keyboard Shortcuts for Common Tasks

# Add to ~/.bashrc for quick access

alias hdb-health='heliosdb-cli health-check'
alias hdb-status='heliosdb-cli cluster-status'
alias hdb-nodes='heliosdb-cli list-nodes'
alias hdb-backup='./runbooks/backup.sh full'
alias hdb-logs='tail -f /var/log/heliosdb/heliosdb.log'
alias hdb-queries='heliosdb-cli queries'
alias hdb-metrics='heliosdb-cli metrics'
alias hdb-incident='./runbooks/incident-p0-cluster-outage.sh'

Keep This Handy: Print or bookmark for quick reference during deployments and incidents.

Last Updated: November 24, 2025 Document Status: Production Ready