Phase 2 Production Deployment - Quick Reference Guide

Version: 1.0
Created: November 24, 2025
Target Audience: DevOps Engineers, SREs


One-Page Summary

Deployment Checklist

☐ Pre-Deployment
  ☐ Run pre-flight checks
  ☐ Create full backup
  ☐ Verify monitoring setup
  ☐ Review rollback plan
  ☐ Notify stakeholders
☐ Deployment
  ☐ Execute rolling deployment
  ☐ Monitor metrics continuously
  ☐ Run smoke tests
  ☐ Verify all nodes healthy
☐ Post-Deployment
  ☐ Run validation suite
  ☐ Update documentation
  ☐ Send completion notice
  ☐ Schedule retrospective

Key Commands

# Health check
heliosdb-cli health-check
# Deploy
./runbooks/rolling-deployment.sh v7.0.0 canary
# Backup
./runbooks/backup.sh full
# Rollback
./runbooks/rollback-deployment.sh v6.0.0
# Monitor
watch -n 5 'heliosdb-cli cluster-status'

Emergency Contacts

  • P0 Hotline: +1-XXX-XXX-XXXX
  • Slack: #incidents
  • PagerDuty: heliosdb-production

Quick Start Commands

Pre-Deployment

# Pre-flight checks
./runbooks/pre-deployment-checklist.sh
# Create backup
./runbooks/backup.sh full
# Verify monitoring
curl -sf http://localhost:9090/-/healthy
curl -sf http://localhost:3000/api/health
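
The two probes above can be wrapped so that a pre-deployment script stops at the first unhealthy endpoint. The following is a minimal sketch, not part of the shipped runbooks: `check_monitoring` is a hypothetical helper name, and the default URLs are simply the Prometheus and Grafana probes shown above.

```shell
# Sketch: fail fast if any monitoring endpoint is unhealthy.
# check_monitoring is a hypothetical helper; with no arguments it probes the
# Prometheus and Grafana health URLs used in this guide.
check_monitoring() {
  local url
  if [ "$#" -eq 0 ]; then
    set -- "http://localhost:9090/-/healthy" "http://localhost:3000/api/health"
  fi
  for url in "$@"; do
    # -s hides progress output; -f makes curl exit non-zero on HTTP errors.
    if ! curl -sf --max-time 5 "$url" > /dev/null; then
      echo "UNHEALTHY: $url" >&2
      return 1
    fi
  done
  echo "monitoring OK"
}
```

A caller might gate the rollout on it, for example `check_monitoring && ./runbooks/rolling-deployment.sh v7.0.0 canary`.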

Deployment

# Canary deployment (recommended)
./runbooks/rolling-deployment.sh v7.0.0 canary
# Rolling deployment
./runbooks/rolling-deployment.sh v7.0.0 rolling
# Blue-green deployment
./runbooks/rolling-deployment.sh v7.0.0 blue-green

Post-Deployment

# Validation
./runbooks/post-deployment-validation.sh
# Check version
heliosdb-cli version --all-nodes
# Monitor
./scripts/monitor-deployment.sh 3600
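
Before running the validation suite it can help to block until the cluster reports healthy, with an upper bound on the wait. This is a sketch under one assumption that this guide does not confirm: that `heliosdb-cli health-check` exits 0 only when the cluster is healthy. `wait_until_healthy` and the `HELIOSDB_CLI` override variable are hypothetical names.

```shell
# Sketch: poll the cluster until it reports healthy or a timeout expires.
# Assumption: `heliosdb-cli health-check` exits 0 only when healthy.
# HELIOSDB_CLI is a hypothetical override for the CLI path.
wait_until_healthy() {
  local timeout="${1:-300}"   # seconds to wait in total
  local interval="${2:-5}"    # seconds between polls (must be > 0)
  local elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    if "${HELIOSDB_CLI:-heliosdb-cli}" health-check > /dev/null 2>&1; then
      echo "healthy after ${elapsed}s"
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "not healthy after ${timeout}s" >&2
  return 1
}
```

For example, `wait_until_healthy 600 10 && ./runbooks/post-deployment-validation.sh` waits up to ten minutes before validating.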

Monitoring URLs

| Service | URL | Port |
| --- | --- | --- |
| Prometheus | http://localhost:9090 | 9090 |
| Grafana | http://localhost:3000 | 3000 |
| Alertmanager | http://localhost:9093 | 9093 |
| HeliosDB Metrics | http://localhost:9091 | 9091 |

Key Dashboards

  • Cluster Overview: Grafana → Dashboards → HeliosDB Overview
  • Performance: Grafana → Dashboards → HeliosDB Performance
  • Reliability: Grafana → Dashboards → HeliosDB Reliability

Critical Metrics

Health Indicators

# Cluster health
heliosdb-cli health-check
# Node status
heliosdb-cli list-nodes --status
# Quorum status
heliosdb-cli quorum-status
# Replication lag
heliosdb-cli replication-lag

Performance Metrics

# Query latency
heliosdb-cli metrics --query "query_latency_p99"
# Throughput
heliosdb-cli metrics --query "qps"
# Resource usage
heliosdb-cli resource-usage --all-nodes

Common Incidents & Quick Fixes

P0: Cluster Down

# 1. Check status
heliosdb-cli cluster-status
# 2. Attempt recovery
./runbooks/incident-p0-cluster-outage.sh
# 3. If failed, DR failover
./runbooks/dr-failover.sh

P1: High Latency

# 1. Identify slow queries
heliosdb-cli slow-query-log --last 10m
# 2. Check resources
heliosdb-cli resource-usage
# 3. Apply fixes
./runbooks/incident-p1-high-latency.sh

P2: Single Node Down

# Restart node
heliosdb-cli restart-node <node-id>
# If restart fails, replace
heliosdb-cli replace-node <node-id>
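
The restart-then-replace sequence above can be expressed as a single fallback. This is a sketch that assumes both subcommands exit non-zero on failure, which this guide does not state; `recover_node` and `HELIOSDB_CLI` are hypothetical names.

```shell
# Sketch: restart a node and fall back to replacement if the restart fails.
# Assumption: restart-node and replace-node exit non-zero on failure.
recover_node() {
  local node_id="${1:-}"
  local cli="${HELIOSDB_CLI:-heliosdb-cli}"
  if [ -z "$node_id" ]; then
    echo "usage: recover_node <node-id>" >&2
    return 2
  fi
  if "$cli" restart-node "$node_id"; then
    echo "restarted $node_id"
  elif "$cli" replace-node "$node_id"; then
    echo "replaced $node_id"
  else
    echo "restart and replace both failed for $node_id" >&2
    return 1
  fi
}
```

Replacement is attempted only after the restart has failed, so a healthy restart never triggers a node swap.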

P2: Disk Space Low

# Free up space
heliosdb-cli cleanup-old-logs <node-id>
heliosdb-cli cleanup-old-snapshots <node-id>
# Expand storage
heliosdb-cli expand-storage <node-id> --size +100GB
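
To decide whether cleanup is needed on a given host, free space can be read with standard tools and compared with the DiskSpaceLow threshold (free < 15%) from the Alert Reference in this guide. A minimal sketch; `free_pct_from_df` and `disk_space_low` are hypothetical helper names, and the data path is the one listed under Configuration Files.

```shell
# Sketch: derive the free-space percentage from `df -P` output on stdin.
# In POSIX df output, column 5 of the second line is the used capacity
# as a percentage, for example "88%".
free_pct_from_df() {
  awk 'NR == 2 { sub("%", "", $5); print 100 - $5 }'
}

# Succeeds (exit 0) when free space is below the threshold (default 15%).
disk_space_low() {
  local free="$1"
  local threshold="${2:-15}"
  [ "$free" -lt "$threshold" ]
}
```

For example: `free="$(df -P /var/lib/heliosdb/data | free_pct_from_df)"; disk_space_low "$free" && echo "cleanup needed"`.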

Backup & Restore

Create Backup

# Full backup
./runbooks/backup.sh full
# Incremental backup
./runbooks/backup.sh incremental
# WAL archive
./runbooks/backup.sh wal

Verify Backup

# Verify latest backup
heliosdb-backup verify --latest
# Verify specific backup
heliosdb-backup verify --path /backup/heliosdb/2025-11-24

Restore

# Full restore
./runbooks/restore.sh full 2025-11-24
# Point-in-time restore
./runbooks/pitr-restore.sh "2025-11-24T14:30:00Z"
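
A mistyped restore target is easy to miss during an incident, so it may be worth checking the timestamp before starting a point-in-time restore. This sketch assumes the restore script expects UTC timestamps in the form used in the example above (`YYYY-MM-DDTHH:MM:SSZ`); `is_utc_timestamp` is a hypothetical helper name. It validates the format and field ranges only, not calendar validity (it would accept February 30).

```shell
# Sketch: check that a restore target looks like 2025-11-24T14:30:00Z.
# Assumption: the restore script expects this exact UTC form.
is_utc_timestamp() {
  printf '%s\n' "${1:-}" |
    grep -Eq '^[0-9]{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])T([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]Z$'
}
```

For example: `is_utc_timestamp "$TARGET" && ./runbooks/pitr-restore.sh "$TARGET"`.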

Configuration Files

Locations

  • Main config: /etc/heliosdb/config.yaml
  • Secrets: /etc/heliosdb/secrets.yaml
  • TLS certs: /etc/heliosdb/tls/
  • Logs: /var/log/heliosdb/
  • Data: /var/lib/heliosdb/data/

Key Settings

# /etc/heliosdb/config.yaml
cluster:
  name: production
  nodes: 3
  replication_factor: 3
performance:
  cache_size: 32GB
  max_connections: 1000
  query_timeout: 30s
reliability:
  backup_schedule: "0 2 * * *"
  wal_archive: true
  auto_failover: true

Alert Reference

Critical Alerts (P0)

| Alert | Threshold | Action |
| --- | --- | --- |
| ClusterQuorumLost | quorum < 2 | Run incident-p0-cluster-outage.sh |
| NodeDown | up == 0 for 2 min | Restart/replace node |
| DataCorruption | checksum_mismatches > 0 | Run integrity check |
| BackupFailed | backup_failed > 0 | Investigate, retry backup |
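
The `quorum < 2` threshold matches a simple-majority quorum on the three-node cluster configured under Key Settings. Whether HeliosDB actually uses simple majority is an assumption here, not something this guide states; under that assumption, the threshold for other cluster sizes could be derived as in this sketch (`quorum_size` is a hypothetical helper name).

```shell
# Sketch: minimum number of healthy nodes for a simple-majority quorum.
# Assumption: HeliosDB uses simple majority (not confirmed by this guide).
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}
```

With three nodes this gives 2, so losing a second node drops the cluster below quorum.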

Warning Alerts (P1)

| Alert | Threshold | Action |
| --- | --- | --- |
| HighQueryLatency | p99 > 100 ms for 5 min | Run incident-p1-high-latency.sh |
| HighCPUUsage | cpu > 85% for 10 min | Check queries, scale if needed |
| DiskSpaceLow | free < 15% | Cleanup or expand storage |
| ReplicationLag | lag > 5 min | Check network, resources |

Rollback Procedures

Immediate Rollback

# Within 1 hour of deployment
./runbooks/rollback-deployment.sh v6.0.0

Delayed Rollback

# After 1 hour (requires data sync)
./runbooks/delayed-rollback.sh v6.0.0
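
Choosing between the two runbooks depends only on how long ago the deployment finished, so the choice can be scripted. A minimal sketch: the one-hour boundary and both script paths come from this guide, while `choose_rollback_script` is a hypothetical helper and treating exactly one hour as still "immediate" is an assumption.

```shell
# Sketch: pick the rollback runbook from the time elapsed since deployment.
# Arguments are epoch seconds; the second defaults to the current time.
choose_rollback_script() {
  local deployed_at="$1"
  local now="${2:-$(date +%s)}"
  local elapsed="$(( $now - $deployed_at ))"
  if [ "$elapsed" -le 3600 ]; then
    echo "./runbooks/rollback-deployment.sh"
  else
    # Older deployments need the data-sync path.
    echo "./runbooks/delayed-rollback.sh"
  fi
}
```

For example: `"$(choose_rollback_script "$DEPLOYED_AT")" v6.0.0`, where `DEPLOYED_AT` was recorded with `date +%s` when the rollout finished.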

Rollback Verification

# Check version
heliosdb-cli version --all-nodes
# Verify health
heliosdb-cli health-check
# Run tests
./tests/smoke-tests.sh

Migration Quick Reference

Zero-Downtime Migration

# Phase 1: Prepare
./scripts/deployment/zero-downtime-migration.sh $DB $CLUSTER prepare
# Phase 2: Initial sync
./scripts/deployment/zero-downtime-migration.sh $DB $CLUSTER initial-sync
# Phase 3: Delta sync
./scripts/deployment/zero-downtime-migration.sh $DB $CLUSTER delta-sync
# Phase 4: Cutover
./scripts/deployment/zero-downtime-migration.sh $DB $CLUSTER cutover
# Phase 5: Verify
./scripts/deployment/zero-downtime-migration.sh $DB $CLUSTER verify
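
Because each phase depends on the previous one, a wrapper that runs them in order and stops at the first failure avoids a cutover after a failed sync. This is a sketch that assumes the migration script exits non-zero when a phase fails; `run_migration` and the `MIGRATE_CMD` override variable are hypothetical names.

```shell
# Sketch: run the five migration phases in order, stopping on first failure.
# Assumption: the migration script exits non-zero when a phase fails.
run_migration() {
  local db="$1"
  local cluster="$2"
  local cmd="${MIGRATE_CMD:-./scripts/deployment/zero-downtime-migration.sh}"
  local phase
  for phase in prepare initial-sync delta-sync cutover verify; do
    echo "phase: $phase"
    if ! "$cmd" "$db" "$cluster" "$phase"; then
      echo "phase '$phase' failed; stopping before later phases" >&2
      return 1
    fi
  done
  echo "migration complete"
}
```

For example: `run_migration "$DB" "$CLUSTER"`. In practice an operator would likely want a manual confirmation before `cutover`; the sketch leaves that out for brevity.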

Migration Status

# Check progress
heliosdb-migrate status
# Check lag
heliosdb-migrate lag-check
# Verify data
./scripts/deployment/data-validation.sh

Performance Tuning

Quick Wins

# Increase cache size
heliosdb-cli set-config cache.size=64GB
# Increase connection pool
heliosdb-cli set-config connections.max=2000
# Enable query cache
heliosdb-cli set-config query_cache.enabled=true
# Update statistics
heliosdb-cli update-statistics --all-tables

Index Optimization

# Get recommendations
heliosdb-cli index-recommendations
# Create recommended indexes
heliosdb-cli create-recommended-indexes
# Rebuild indexes
heliosdb-cli rebuild-indexes --online

Security Quick Checks

Pre-Deployment Security

# Network security
sudo ufw status
# TLS certificates
openssl x509 -in /etc/heliosdb/tls/server.crt -noout -dates
# Encryption at rest
heliosdb-cli get-config encryption.at_rest
# Audit logging
heliosdb-cli get-config audit.enabled

Compliance Validation

# SOC2 controls
heliosdb-cli compliance-check --framework soc2
# HIPAA controls
heliosdb-cli compliance-check --framework hipaa
# GDPR controls
heliosdb-cli compliance-check --framework gdpr

Capacity Planning

Current Capacity

# Cluster capacity
heliosdb-cli capacity-report
# Resource usage
heliosdb-cli resource-usage --detailed
# Growth projection
heliosdb-cli capacity-forecast --days 90

Scaling Decisions

Scale Up When:

  • CPU > 70% sustained for 1 hour
  • Memory > 80% sustained for 30 minutes
  • Disk > 75% usage
  • P99 latency > 50ms sustained

Scale Down When:

  • CPU < 30% sustained for 7 days
  • Memory < 40% sustained for 7 days
  • Cost optimization opportunity identified
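
The thresholds above can be encoded as a small decision function. This sketch makes several assumptions that the guide leaves open: inputs are integer values already averaged over the sustained windows listed above, any single scale-up condition is enough to scale up, and scale-down requires both the CPU and memory conditions. `scaling_advice` is a hypothetical helper name.

```shell
# Sketch: map utilisation figures onto the scale-up / scale-down rules.
# Arguments: CPU %, memory %, disk %, P99 latency in ms (integers, assumed
# to be averages over the sustained windows listed in this guide).
scaling_advice() {
  local cpu="$1" mem="$2" disk="$3" p99_ms="$4"
  if [ "$cpu" -gt 70 ] || [ "$mem" -gt 80 ] ||
     [ "$disk" -gt 75 ] || [ "$p99_ms" -gt 50 ]; then
    echo "scale-up"
  elif [ "$cpu" -lt 30 ] && [ "$mem" -lt 40 ]; then
    echo "scale-down"
  else
    echo "hold"
  fi
}
```

Scale-up is checked first, so a cluster with idle CPU but a nearly full disk is still flagged for more capacity.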

Troubleshooting Quick Tips

Slow Queries

# Identify
heliosdb-cli slow-query-log --last 1h
# Analyze
heliosdb-cli explain-slow-queries
# Fix
heliosdb-cli create-recommended-indexes

High CPU

# Identify culprit
heliosdb-cli queries --sort-by cpu
# Kill slow queries
heliosdb-cli kill-slow-queries --min-duration 30s
# Check for hotspots
heliosdb-cli hotspot-analysis

Memory Issues

# Check usage
heliosdb-cli memory-usage
# Flush cache
heliosdb-cli cache-flush
# Reduce cache size
heliosdb-cli set-config cache.size=16GB

Useful CLI Commands

Cluster Management

# Status
heliosdb-cli cluster-status
heliosdb-cli list-nodes
heliosdb-cli quorum-status
# Operations
heliosdb-cli cluster-start
heliosdb-cli cluster-stop --graceful
heliosdb-cli cluster-restart

Node Management

# Node operations
heliosdb-cli node-status <node-id>
heliosdb-cli restart-node <node-id>
heliosdb-cli drain-node <node-id>
heliosdb-cli replace-node <node-id>

Query Management

# Active queries
heliosdb-cli queries
heliosdb-cli slow-query-log
# Query operations
heliosdb-cli kill-query <query-id>
heliosdb-cli kill-slow-queries
heliosdb-cli explain-query <query-id>

Backup/Restore

# Backup
heliosdb-backup full --output /backup/today
heliosdb-backup incremental --base /backup/base
heliosdb-backup wal-archive
# Restore
heliosdb-restore full --input /backup/today
heliosdb-restore pitr --target-time "2025-11-24T14:00:00Z"

Log Locations

# Application logs
/var/log/heliosdb/heliosdb.log
# Slow query log
/var/log/heliosdb/slow-queries.log
# Audit log
/var/log/heliosdb/audit.log
# Error log
/var/log/heliosdb/error.log
# View recent logs
tail -f /var/log/heliosdb/heliosdb.log
# Search logs
grep "ERROR" /var/log/heliosdb/heliosdb.log

Support Resources

Documentation

Contact

  • Slack: #production-support
  • Email: support@heliosdb.io
  • Hotline: +1-XXX-XXX-XXXX (24/7)
  • PagerDuty: heliosdb-production

On-Call

# Check on-call
heliosdb-cli oncall-status
# Page on-call
heliosdb-cli page-oncall "Brief description"

Shell Aliases for Common Tasks

# Add to ~/.bashrc for quick access.
# The runbook aliases use relative paths, so run them from the repository root.
alias hdb-health='heliosdb-cli health-check'
alias hdb-status='heliosdb-cli cluster-status'
alias hdb-nodes='heliosdb-cli list-nodes'
alias hdb-backup='./runbooks/backup.sh full'
alias hdb-logs='tail -f /var/log/heliosdb/heliosdb.log'
alias hdb-queries='heliosdb-cli queries'
alias hdb-metrics='heliosdb-cli metrics'
alias hdb-incident='./runbooks/incident-p0-cluster-outage.sh'

Keep This Handy: Print or bookmark for quick reference during deployments and incidents.

Last Updated: November 24, 2025
Document Status: Production Ready