F1.3 Flink Streaming - Production Deployment Guide

Feature: Real-Time Stream Processing Engine
Version: v5.0-v5.4
Status: Production-Ready (152 Tests Passing)
Date: October 2025


📋 Table of Contents

  1. Overview
  2. Quick Start
  3. Docker Deployment
  4. Kubernetes Deployment
  5. Monitoring & Observability
  6. Operational Runbook
  7. Security Configuration
  8. Capacity Planning
  9. Troubleshooting

1. Overview

1.1 Deployment Options

| Option | Use Case | Complexity | Scalability | Cost |
|---|---|---|---|---|
| Docker Compose | Dev/Test, Small prod | Low | Limited (1-4 nodes) | $100-500/month |
| Kubernetes | Production, Enterprise | Medium | High (unlimited) | $500-5000/month |
| Managed K8s (EKS/AKS/GKE) | Production, Cloud-native | Medium-High | Very High | $1000-10000/month |
| Bare Metal | On-premise, High performance | High | Medium | Hardware dependent |

Recommended: Kubernetes (EKS/AKS/GKE) for production


1.2 System Requirements

Minimum (Single Node):

  • CPU: 4 cores (8 threads)
  • RAM: 8 GB
  • Storage: 50 GB SSD
  • Network: 1 Gbps
  • OS: Linux (Ubuntu 22.04, RHEL 8+, Amazon Linux 2)

Recommended (Production Node):

  • CPU: 8-16 cores
  • RAM: 16-32 GB
  • Storage: 100-500 GB NVMe SSD
  • Network: 10 Gbps
  • OS: Ubuntu 22.04 LTS

High-Performance (Large Deployments):

  • CPU: 32-64 cores
  • RAM: 64-128 GB
  • Storage: 1-2 TB NVMe SSD
  • Network: 25-100 Gbps
  • OS: Ubuntu 22.04 LTS + kernel tuning
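
A minimal sketch of the kernel tuning mentioned above (illustrative starting values, not benchmarked recommendations; tune per workload):

Terminal window
# Raise connection backlog, socket buffer ceilings, and file-handle limits
# for high-throughput streaming; keep swapping to a minimum
sudo tee /etc/sysctl.d/99-heliosdb.conf <<'EOF'
net.core.somaxconn = 65535
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
vm.swappiness = 1
fs.file-max = 2097152
EOF
sudo sysctl --system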

1.3 Prerequisites

Required:

  • Docker 24.0+ (if using containers)
  • Kubernetes 1.28+ (if using K8s)
  • kubectl CLI
  • Helm 3.12+ (optional, for easy K8s deployment)
  • Git (for cloning repository)
  • Rust 1.75+ (for building from source)

Optional:

  • Prometheus (monitoring)
  • Grafana (dashboards)
  • Kafka/Pulsar (event sources)
  • Redis (for distributed caching)

2. Quick Start

2.1 Docker Compose (5 Minutes)

For: Development, testing, small production deployments

Terminal window
# 1. Clone repository
git clone https://github.com/danimoya/HeliosDB.git
cd HeliosDB/heliosdb-streaming
# 2. Create docker-compose.yml (see section 3.2)
# Copy the provided docker-compose.yml to your directory
# 3. Start services
docker-compose up -d
# 4. Check status
docker-compose ps
# 5. View logs
docker-compose logs -f heliosdb-streaming
# 6. Run tests (requires an image that includes the Rust toolchain;
# the slim runtime image from section 3.1 ships only the compiled binary)
docker-compose exec heliosdb-streaming cargo test
# Expected: 152 tests passing in ~2 seconds

Access:

  • HeliosDB Streaming: http://localhost:8080
  • HeliosDB metrics: http://localhost:9090/metrics
  • Prometheus: http://localhost:9091 (mapped from container port 9090; see section 3.2)
  • Grafana: http://localhost:3000 (admin/admin)
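
A quick smoke test of the endpoints above, using the health endpoint and metrics port the rest of this guide assumes:

Terminal window
# Liveness
curl -s http://localhost:8080/health
# First few exposed Prometheus metrics
curl -s http://localhost:9090/metrics | head -n 20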

2.2 Kubernetes (15 Minutes)

For: Production deployments, auto-scaling, high availability

Terminal window
# 1. Create namespace
kubectl create namespace heliosdb
# 2. Deploy HeliosDB Streaming (manifests from section 4)
kubectl apply -f k8s/rbac.yaml
kubectl apply -f k8s/configmap.yaml
kubectl apply -f k8s/secret.yaml
kubectl apply -f k8s/statefulset.yaml
# 3. Create service
kubectl apply -f k8s/service.yaml
# 4. Deploy monitoring (optional)
kubectl apply -f k8s/prometheus.yaml
kubectl apply -f k8s/grafana.yaml
# 5. Check deployment
kubectl get pods -n heliosdb
# Expected output:
# NAME                   READY   STATUS    RESTARTS   AGE
# heliosdb-streaming-0   1/1     Running   0          2m
# heliosdb-streaming-1   1/1     Running   0          2m
# heliosdb-streaming-2   1/1     Running   0          2m
# heliosdb-streaming-3   1/1     Running   0          2m

3. Docker Deployment

3.1 Dockerfile

Create Dockerfile in heliosdb-streaming/:

# Multi-stage build for optimal image size

# Stage 1: Builder
FROM rust:1.75-slim AS builder

# Install build dependencies
RUN apt-get update && apt-get install -y \
    pkg-config \
    libssl-dev \
    cmake \
    protobuf-compiler \
    && rm -rf /var/lib/apt/lists/*

# Create app directory
WORKDIR /app

# Copy Cargo files
COPY Cargo.toml Cargo.lock ./
COPY heliosdb-streaming/Cargo.toml ./heliosdb-streaming/

# Copy source code
COPY heliosdb-streaming/src ./heliosdb-streaming/src

# Build release binary
RUN cargo build --release --manifest-path heliosdb-streaming/Cargo.toml

# Run tests to validate
RUN cargo test --release --manifest-path heliosdb-streaming/Cargo.toml

# Stage 2: Runtime
FROM ubuntu:22.04

# Install runtime dependencies (curl is required by the HEALTHCHECK below)
RUN apt-get update && apt-get install -y \
    ca-certificates \
    curl \
    libssl3 \
    && rm -rf /var/lib/apt/lists/*

# Create non-root user
RUN useradd -m -u 1000 heliosdb

# Create directories
RUN mkdir -p /app/data /app/logs /app/checkpoints && \
    chown -R heliosdb:heliosdb /app

# Switch to non-root user
USER heliosdb
WORKDIR /app

# Copy binary from builder
COPY --from=builder --chown=heliosdb:heliosdb \
    /app/target/release/heliosdb-streaming /app/

# Copy config
COPY --chown=heliosdb:heliosdb heliosdb-streaming/config.yaml /app/

# Expose ports
EXPOSE 8080 9090

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=40s --retries=3 \
    CMD curl -f http://localhost:8080/health || exit 1

# Set environment variables
ENV RUST_LOG=info
ENV RUST_BACKTRACE=1
ENV HELIOSDB_DATA_DIR=/app/data
ENV HELIOSDB_CHECKPOINT_DIR=/app/checkpoints

# Run application
CMD ["/app/heliosdb-streaming"]

Build:

Terminal window
docker build -t heliosdb/streaming:v5.4 -f heliosdb-streaming/Dockerfile .

Image Size: ~150 MB (optimized)
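
To confirm the multi-stage build produced a slim image, inspect the size and layer history (build-stage layers should not appear in the final image):

Terminal window
docker image inspect heliosdb/streaming:v5.4 --format '{{.Size}} bytes'
docker history heliosdb/streaming:v5.4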


3.2 docker-compose.yml

Create docker-compose.yml:

version: '3.8'

services:
  # HeliosDB Streaming
  heliosdb-streaming:
    image: heliosdb/streaming:v5.4
    container_name: heliosdb-streaming
    restart: unless-stopped
    ports:
      - "8080:8080"   # API
      - "9090:9090"   # Metrics
    volumes:
      - ./data:/app/data
      - ./checkpoints:/app/checkpoints
      - ./logs:/app/logs
      - ./config.yaml:/app/config.yaml:ro
    environment:
      - RUST_LOG=info
      - RUST_BACKTRACE=1
      - HELIOSDB_THREADS=4
      - HELIOSDB_MAX_MEMORY_MB=4096
      - HELIOSDB_CHECKPOINT_INTERVAL_SECS=60
      - HELIOSDB_KMS_PROVIDER=local  # or aws_kms, azure_keyvault
    networks:
      - heliosdb-network
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 8G
        reservations:
          cpus: '2'
          memory: 4G

  # Kafka (optional, for event sources)
  kafka:
    image: confluentinc/cp-kafka:7.5.0
    container_name: kafka
    restart: unless-stopped
    ports:
      - "9092:9092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
    networks:
      - heliosdb-network
    depends_on:
      - zookeeper

  zookeeper:
    image: confluentinc/cp-zookeeper:7.5.0
    container_name: zookeeper
    restart: unless-stopped
    ports:
      - "2181:2181"
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
    networks:
      - heliosdb-network

  # Redis (optional, for distributed caching)
  redis:
    image: redis:7.2-alpine
    container_name: redis
    restart: unless-stopped
    ports:
      - "6379:6379"
    command: redis-server --appendonly yes
    volumes:
      - redis-data:/data
    networks:
      - heliosdb-network

  # Prometheus (monitoring)
  prometheus:
    image: prom/prometheus:v2.47.0
    container_name: prometheus
    restart: unless-stopped
    ports:
      - "9091:9090"
    volumes:
      - ./monitoring/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus-data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=30d'
    networks:
      - heliosdb-network

  # Grafana (dashboards)
  grafana:
    image: grafana/grafana:10.1.0
    container_name: grafana
    restart: unless-stopped
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=admin
      - GF_USERS_ALLOW_SIGN_UP=false
    volumes:
      - ./monitoring/grafana-datasources.yml:/etc/grafana/provisioning/datasources/datasources.yml:ro
      - ./monitoring/grafana-dashboards.yml:/etc/grafana/provisioning/dashboards/dashboards.yml:ro
      - ./monitoring/dashboards:/var/lib/grafana/dashboards:ro
      - grafana-data:/var/lib/grafana
    networks:
      - heliosdb-network
    depends_on:
      - prometheus

networks:
  heliosdb-network:
    driver: bridge

volumes:
  redis-data:
  prometheus-data:
  grafana-data:

Usage:

Terminal window
# Start all services
docker-compose up -d
# Scale HeliosDB instances (first remove container_name and the fixed
# host-port mappings from the heliosdb-streaming service; both must be
# unique per container, so fixed values block scaling)
docker-compose up -d --scale heliosdb-streaming=4
# Stop services
docker-compose down
# Stop and remove volumes
docker-compose down -v

3.3 Configuration File

Create config.yaml:

# HeliosDB Streaming Configuration

# Server Configuration
server:
  host: "0.0.0.0"
  port: 8080
  metrics_port: 9090
  max_connections: 1000

# Threading
threads:
  worker_threads: 4
  blocking_threads: 4

# Memory Management
memory:
  max_memory_mb: 4096
  buffer_pool_size_mb: 512
  gc_interval_secs: 300

# Checkpoint Configuration
checkpoint:
  enabled: true
  interval_secs: 60
  timeout_secs: 10
  min_pause_secs: 30
  max_concurrent: 1
  directory: "/app/checkpoints"
  compression: true
  encryption: true

# State Backend
state_backend:
  type: "rocksdb"  # or "inmemory" for testing
  path: "/app/data/rocksdb"
  cache_size_mb: 1024
  block_cache_mb: 512

# KMS Configuration (choose one)
kms:
  provider: "local"  # or "aws_kms", "azure_keyvault", "gcp_kms"

  # AWS KMS (uncomment if using AWS)
  # aws:
  #   key_id: "arn:aws:kms:us-east-1:123456789012:key/abc-123"
  #   region: "us-east-1"

  # Azure Key Vault (uncomment if using Azure)
  # azure:
  #   vault_url: "https://my-vault.vault.azure.net/"
  #   key_name: "heliosdb-master-key"

  # Local (for dev/test)
  local:
    master_key_file: "/app/data/master.key"

# Key Rotation Policy
key_rotation:
  enabled: true
  interval_secs: 2592000  # 30 days
  max_previous_keys: 3
  auto_rotate: true

# Backpressure Configuration
backpressure:
  strategy: "adaptive"  # or "block", "drop_oldest", "drop_newest", "signal"
  initial_buffer_size: 100
  min_buffer_size: 10
  max_buffer_size: 200

# Connectors
connectors:
  kafka:
    enabled: true
    bootstrap_servers:
      - "kafka:9092"
    consumer_group: "heliosdb-streaming"
  redis:
    enabled: true
    host: "redis"
    port: 6379
    pool_size: 10

# Monitoring
monitoring:
  prometheus:
    enabled: true
    port: 9090
  logging:
    level: "info"   # debug, info, warn, error
    format: "json"  # or "text"
    output: "/app/logs/heliosdb.log"

# Performance Tuning
performance:
  event_batch_size: 1000
  window_size_secs: 60
  watermark_interval_secs: 1
  allowed_lateness_secs: 60
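
A quick syntax check before mounting the file (a sketch assuming Mike Farah's yq v4 is installed; any YAML parser works):

Terminal window
# Parse the whole file; a syntax error exits non-zero
yq eval '.' config.yaml > /dev/null && echo "config.yaml OK"
# Spot-check a value
yq eval '.checkpoint.interval_secs' config.yaml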

4. Kubernetes Deployment

4.1 Namespace

Create k8s/namespace.yaml:

apiVersion: v1
kind: Namespace
metadata:
  name: heliosdb
  labels:
    app: heliosdb
    environment: production

4.2 ConfigMap

Create k8s/configmap.yaml:

apiVersion: v1
kind: ConfigMap
metadata:
  name: heliosdb-streaming-config
  namespace: heliosdb
data:
  config.yaml: |
    server:
      host: "0.0.0.0"
      port: 8080
      metrics_port: 9090
    threads:
      worker_threads: 8
      blocking_threads: 4
    memory:
      max_memory_mb: 8192
      buffer_pool_size_mb: 1024
    checkpoint:
      enabled: true
      interval_secs: 60
      directory: "/data/checkpoints"
      compression: true
      encryption: true
    state_backend:
      type: "rocksdb"
      path: "/data/rocksdb"
      cache_size_mb: 2048
    kms:
      provider: "aws_kms"
      aws:
        key_id: "${AWS_KMS_KEY_ID}"
        region: "${AWS_REGION}"
    backpressure:
      strategy: "adaptive"
      initial_buffer_size: 100
    monitoring:
      prometheus:
        enabled: true
        port: 9090
      logging:
        level: "info"
        format: "json"

4.3 Secret

Create k8s/secret.yaml:

apiVersion: v1
kind: Secret
metadata:
  name: heliosdb-streaming-secrets
  namespace: heliosdb
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: "your-aws-access-key"
  AWS_SECRET_ACCESS_KEY: "your-aws-secret-key"
  AWS_REGION: "us-east-1"
  AWS_KMS_KEY_ID: "arn:aws:kms:us-east-1:123456789012:key/abc-123"

Note: Use Kubernetes Secrets or external secret managers (AWS Secrets Manager, Azure Key Vault, HashiCorp Vault) for production.
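
To keep credentials out of version control entirely, the same Secret can be created straight from the command line:

Terminal window
kubectl create secret generic heliosdb-streaming-secrets \
  --from-literal=AWS_ACCESS_KEY_ID="your-aws-access-key" \
  --from-literal=AWS_SECRET_ACCESS_KEY="your-aws-secret-key" \
  --from-literal=AWS_REGION="us-east-1" \
  --from-literal=AWS_KMS_KEY_ID="arn:aws:kms:us-east-1:123456789012:key/abc-123" \
  -n heliosdb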


4.4 StatefulSet

Create k8s/statefulset.yaml:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: heliosdb-streaming
  namespace: heliosdb
  labels:
    app: heliosdb-streaming
    version: v5.4
spec:
  serviceName: heliosdb-streaming
  replicas: 4
  selector:
    matchLabels:
      app: heliosdb-streaming
  template:
    metadata:
      labels:
        app: heliosdb-streaming
        version: v5.4
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9090"
        prometheus.io/path: "/metrics"
    spec:
      serviceAccountName: heliosdb-streaming
      securityContext:
        fsGroup: 1000
        runAsUser: 1000
        runAsNonRoot: true
      containers:
        - name: heliosdb-streaming
          image: heliosdb/streaming:v5.4
          imagePullPolicy: IfNotPresent
          ports:
            - name: api
              containerPort: 8080
              protocol: TCP
            - name: metrics
              containerPort: 9090
              protocol: TCP
          env:
            - name: RUST_LOG
              value: "info"
            - name: RUST_BACKTRACE
              value: "1"
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
          envFrom:
            - secretRef:
                name: heliosdb-streaming-secrets
          volumeMounts:
            - name: config
              mountPath: /app/config.yaml
              subPath: config.yaml
            - name: data
              mountPath: /data
            - name: logs
              mountPath: /app/logs
          resources:
            requests:
              cpu: "2"
              memory: "8Gi"
            limits:
              cpu: "8"
              memory: "16Gi"
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 60
            periodSeconds: 30
            timeoutSeconds: 10
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
      volumes:
        - name: config
          configMap:
            name: heliosdb-streaming-config
        - name: logs
          emptyDir: {}
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: gp3  # or "standard", "premium-rwo" depending on cloud provider
        resources:
          requests:
            storage: 100Gi

4.5 Service

Create k8s/service.yaml:

apiVersion: v1
kind: Service
metadata:
  name: heliosdb-streaming
  namespace: heliosdb
  labels:
    app: heliosdb-streaming
spec:
  type: ClusterIP
  clusterIP: None  # Headless service for StatefulSet
  ports:
    - name: api
      port: 8080
      targetPort: 8080
      protocol: TCP
    - name: metrics
      port: 9090
      targetPort: 9090
      protocol: TCP
  selector:
    app: heliosdb-streaming
---
apiVersion: v1
kind: Service
metadata:
  name: heliosdb-streaming-lb
  namespace: heliosdb
  labels:
    app: heliosdb-streaming
spec:
  type: LoadBalancer
  ports:
    - name: api
      port: 8080
      targetPort: 8080
      protocol: TCP
  selector:
    app: heliosdb-streaming

4.6 ServiceAccount & RBAC

Create k8s/rbac.yaml:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: heliosdb-streaming
  namespace: heliosdb
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: heliosdb-streaming
  namespace: heliosdb
rules:
  - apiGroups: [""]
    resources: ["pods", "configmaps", "secrets"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["pods/log"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: heliosdb-streaming
  namespace: heliosdb
subjects:
  - kind: ServiceAccount
    name: heliosdb-streaming
    namespace: heliosdb
roleRef:
  kind: Role
  name: heliosdb-streaming
  apiGroup: rbac.authorization.k8s.io

4.7 HorizontalPodAutoscaler (HPA)

Create k8s/hpa.yaml:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: heliosdb-streaming-hpa
  namespace: heliosdb
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: heliosdb-streaming
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 50
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Percent
          value: 100
          periodSeconds: 30
        - type: Pods
          value: 2
          periodSeconds: 30
      selectPolicy: Max

4.8 Deployment Commands

Terminal window
# Create namespace
kubectl apply -f k8s/namespace.yaml
# Deploy RBAC
kubectl apply -f k8s/rbac.yaml
# Create ConfigMap and Secrets
kubectl apply -f k8s/configmap.yaml
kubectl apply -f k8s/secret.yaml
# Deploy StatefulSet
kubectl apply -f k8s/statefulset.yaml
# Create Services
kubectl apply -f k8s/service.yaml
# Deploy HPA (optional)
kubectl apply -f k8s/hpa.yaml
# Check deployment
kubectl get all -n heliosdb
# Check logs
kubectl logs -f heliosdb-streaming-0 -n heliosdb
# Scale manually
kubectl scale statefulset heliosdb-streaming --replicas=8 -n heliosdb
# Rolling update
kubectl set image statefulset/heliosdb-streaming heliosdb-streaming=heliosdb/streaming:v5.5 -n heliosdb

5. Monitoring & Observability

5.1 Prometheus Configuration

Create monitoring/prometheus.yml:

global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    cluster: 'heliosdb-production'
    environment: 'prod'

scrape_configs:
  # HeliosDB Streaming metrics
  - job_name: 'heliosdb-streaming'
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names:
            - heliosdb
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app]
        action: keep
        regex: heliosdb-streaming
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      - source_labels: [__meta_kubernetes_namespace]
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: kubernetes_pod_name
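
Prometheus ships a linter for this file; running it before deploying catches YAML and relabeling mistakes early:

Terminal window
promtool check config monitoring/prometheus.yml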

5.2 Grafana Dashboards

Create monitoring/dashboards/heliosdb-streaming.json:

{
  "dashboard": {
    "title": "HeliosDB Streaming - Production",
    "tags": ["heliosdb", "streaming", "production"],
    "timezone": "browser",
    "panels": [
      {
        "title": "Throughput (Events/sec)",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(heliosdb_events_processed_total[5m])",
            "legendFormat": "{{pod}}"
          }
        ]
      },
      {
        "title": "Latency (p99)",
        "type": "graph",
        "targets": [
          {
            "expr": "histogram_quantile(0.99, rate(heliosdb_latency_seconds_bucket[5m]))",
            "legendFormat": "p99"
          }
        ]
      },
      {
        "title": "Memory Usage",
        "type": "graph",
        "targets": [
          {
            "expr": "heliosdb_memory_usage_bytes / 1024 / 1024",
            "legendFormat": "{{pod}} - Memory (MB)"
          }
        ]
      },
      {
        "title": "Backpressure Events",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(heliosdb_backpressure_events_total[5m])",
            "legendFormat": "{{pod}}"
          }
        ]
      },
      {
        "title": "Checkpoint Duration",
        "type": "graph",
        "targets": [
          {
            "expr": "heliosdb_checkpoint_duration_seconds",
            "legendFormat": "{{pod}}"
          }
        ]
      }
    ]
  }
}

5.3 Key Metrics

| Metric | Description | Alert Threshold |
|---|---|---|
| heliosdb_events_processed_total | Total events processed | < 100/sec (low throughput) |
| heliosdb_latency_seconds | Event processing latency | p99 > 10ms |
| heliosdb_memory_usage_bytes | Current memory usage | > 80% of limit |
| heliosdb_backpressure_events_total | Backpressure triggers | > 100/min |
| heliosdb_checkpoint_duration_seconds | Checkpoint time | > 1 second |
| heliosdb_checkpoint_failures_total | Failed checkpoints | > 0 |
| heliosdb_errors_total | Total errors | > 10/min |
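
The same signals can be spot-checked ad hoc through the Prometheus HTTP API (port 9091 per the docker-compose mapping in section 3.2):

Terminal window
# Current throughput per pod
curl -s 'http://localhost:9091/api/v1/query' \
  --data-urlencode 'query=rate(heliosdb_events_processed_total[5m])'
# p99 processing latency
curl -s 'http://localhost:9091/api/v1/query' \
  --data-urlencode 'query=histogram_quantile(0.99, rate(heliosdb_latency_seconds_bucket[5m]))'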

5.4 Alerting Rules

Create monitoring/alerts.yaml:

groups:
  - name: heliosdb_streaming
    interval: 30s
    rules:
      # High latency
      - alert: HighLatency
        expr: histogram_quantile(0.99, rate(heliosdb_latency_seconds_bucket[5m])) > 0.01
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High latency detected (p99 > 10ms)"
          description: "Pod {{ $labels.pod }} has p99 latency of {{ $value }}s"

      # Low throughput
      - alert: LowThroughput
        expr: rate(heliosdb_events_processed_total[5m]) < 100
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Low throughput detected (< 100 events/sec)"

      # High memory usage
      - alert: HighMemoryUsage
        expr: heliosdb_memory_usage_bytes / heliosdb_memory_limit_bytes > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage (> 80%)"

      # Checkpoint failures
      - alert: CheckpointFailures
        expr: increase(heliosdb_checkpoint_failures_total[5m]) > 0
        labels:
          severity: critical
        annotations:
          summary: "Checkpoint failures detected"

      # Pod down
      - alert: PodDown
        expr: up{job="heliosdb-streaming"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "HeliosDB pod is down"

6. Operational Runbook

6.1 Starting the Service

Docker Compose:

Terminal window
# Start all services
docker-compose up -d
# Verify
docker-compose ps
docker-compose logs -f heliosdb-streaming
# Check health
curl http://localhost:8080/health

Kubernetes:

Terminal window
# Deploy (if not already deployed)
kubectl apply -f k8s/
# Check status
kubectl get pods -n heliosdb
kubectl logs -f heliosdb-streaming-0 -n heliosdb
# Check health
kubectl port-forward -n heliosdb heliosdb-streaming-0 8080:8080
curl http://localhost:8080/health

6.2 Stopping the Service

Graceful Shutdown (recommended):

Terminal window
# Docker
docker-compose stop heliosdb-streaming
# Kubernetes
kubectl scale statefulset heliosdb-streaming --replicas=0 -n heliosdb

Force Shutdown (if needed):

Terminal window
# Docker
docker-compose kill heliosdb-streaming
# Kubernetes
kubectl delete pod heliosdb-streaming-0 -n heliosdb --grace-period=0 --force

6.3 Scaling

Horizontal Scaling:

Terminal window
# Docker Compose
docker-compose up -d --scale heliosdb-streaming=8
# Kubernetes (manual)
kubectl scale statefulset heliosdb-streaming --replicas=8 -n heliosdb
# Kubernetes (auto-scaling with HPA)
# HPA will automatically scale based on CPU/memory
kubectl get hpa -n heliosdb

Vertical Scaling (increase resources):

Terminal window
# Edit StatefulSet
kubectl edit statefulset heliosdb-streaming -n heliosdb
# Update resources:
#   resources:
#     requests:
#       cpu: "4"
#       memory: "16Gi"
#     limits:
#       cpu: "16"
#       memory: "32Gi"
# Rolling restart
kubectl rollout restart statefulset/heliosdb-streaming -n heliosdb

6.4 Rolling Updates

Terminal window
# Update image
kubectl set image statefulset/heliosdb-streaming \
heliosdb-streaming=heliosdb/streaming:v5.5 \
-n heliosdb
# Check rollout status
kubectl rollout status statefulset/heliosdb-streaming -n heliosdb
# Rollback if needed
kubectl rollout undo statefulset/heliosdb-streaming -n heliosdb

6.5 Backup & Restore

Backup Checkpoints:

Terminal window
# Docker (copy from container)
docker cp heliosdb-streaming:/app/checkpoints ./backup/checkpoints-$(date +%Y%m%d)
# Kubernetes (copy from pod)
kubectl cp heliosdb/heliosdb-streaming-0:/data/checkpoints \
./backup/checkpoints-$(date +%Y%m%d)
# Backup to S3
aws s3 sync /data/checkpoints s3://heliosdb-backups/checkpoints/$(date +%Y%m%d)/
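
For unattended operation, a sketch of a nightly backup script with simple retention pruning (assumes the heliosdb-backups bucket above, GNU date, and a 14-day retention window):

Terminal window
#!/usr/bin/env bash
set -euo pipefail
BUCKET="s3://heliosdb-backups/checkpoints"
# Ship today's checkpoints
aws s3 sync /data/checkpoints "${BUCKET}/$(date +%Y%m%d)/"
# Prune date-prefixed folders older than 14 days
# (lexicographic comparison is safe for YYYYMMDD)
CUTOFF=$(date -d '14 days ago' +%Y%m%d)
aws s3 ls "${BUCKET}/" | awk '{print $2}' | tr -d '/' | while read -r day; do
  if [[ "$day" < "$CUTOFF" ]]; then
    aws s3 rm --recursive "${BUCKET}/${day}/"
  fi
done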

Restore Checkpoints:

Terminal window
# Docker
docker cp ./backup/checkpoints-20251029 heliosdb-streaming:/app/checkpoints
# Kubernetes
kubectl cp ./backup/checkpoints-20251029 \
heliosdb/heliosdb-streaming-0:/data/checkpoints
# Restore from S3
aws s3 sync s3://heliosdb-backups/checkpoints/20251029/ /data/checkpoints/

6.6 Log Management

View Logs:

Terminal window
# Docker
docker-compose logs -f heliosdb-streaming
# Kubernetes
kubectl logs -f heliosdb-streaming-0 -n heliosdb
# All pods
kubectl logs -l app=heliosdb-streaming -n heliosdb --tail=100

Log Aggregation (ELK/EFK Stack):

Terminal window
# Install Elasticsearch + Kibana (optional)
helm repo add elastic https://helm.elastic.co
helm install elasticsearch elastic/elasticsearch -n heliosdb
helm install kibana elastic/kibana -n heliosdb
# Configure Fluentd/Filebeat to ship logs

7. Security Configuration

7.1 TLS/SSL

Create TLS secret:

Terminal window
# Generate self-signed cert (dev/test only)
openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
-keyout tls.key -out tls.crt \
-subj "/CN=heliosdb-streaming.heliosdb.svc.cluster.local"
# Create Kubernetes secret
kubectl create secret tls heliosdb-streaming-tls \
--cert=tls.crt --key=tls.key \
-n heliosdb

Update StatefulSet to use TLS:

# Add to the StatefulSet's volumes:
volumes:
  - name: tls
    secret:
      secretName: heliosdb-streaming-tls

# Add to the container's volumeMounts:
volumeMounts:
  - name: tls
    mountPath: /app/tls
    readOnly: true

# Add to the container's env:
env:
  - name: HELIOSDB_TLS_ENABLED
    value: "true"
  - name: HELIOSDB_TLS_CERT
    value: "/app/tls/tls.crt"
  - name: HELIOSDB_TLS_KEY
    value: "/app/tls/tls.key"

7.2 Network Policies

Create k8s/network-policy.yaml:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: heliosdb-streaming-network-policy
  namespace: heliosdb
spec:
  podSelector:
    matchLabels:
      app: heliosdb-streaming
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: ingress-nginx
        - podSelector:
            matchLabels:
              app: prometheus
      ports:
        - protocol: TCP
          port: 8080
        - protocol: TCP
          port: 9090
  egress:
    - to:
        - namespaceSelector: {}
      ports:
        - protocol: TCP
          port: 9092  # Kafka
        - protocol: TCP
          port: 6379  # Redis
        - protocol: TCP
          port: 443   # HTTPS (for KMS)
        - protocol: UDP
          port: 53    # DNS (required once egress is restricted)
        - protocol: TCP
          port: 53    # DNS over TCP
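
To confirm the policy permits the intended traffic, a throwaway pod labeled like the app can probe an allowed egress target (a sketch assuming a kafka Service is resolvable from the namespace; busybox ships nc):

Terminal window
kubectl run np-test -it --rm --restart=Never --image=busybox \
  --labels=app=heliosdb-streaming -n heliosdb -- nc -zv kafka 9092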

7.3 Pod Security Admission

PodSecurityPolicy (policy/v1beta1) was removed in Kubernetes 1.25, so it cannot be used on the 1.28+ clusters this guide targets. Use the built-in Pod Security Admission controller instead: label the namespace to enforce the restricted profile, which imposes the same constraints the old PSP expressed (no privileged containers, no privilege escalation, all capabilities dropped, non-root user, no host namespaces).

Update k8s/namespace.yaml:

apiVersion: v1
kind: Namespace
metadata:
  name: heliosdb
  labels:
    app: heliosdb
    environment: production
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted

Note: The restricted profile also requires each container's securityContext to set allowPrivilegeEscalation: false, capabilities.drop: ["ALL"], and seccompProfile.type: RuntimeDefault; add these to the StatefulSet in section 4.4 before enforcing.

8. Capacity Planning

8.1 Sizing Guidelines

Small Deployment (< 10K events/sec):

  • Nodes: 2-4
  • CPU: 4 cores per node
  • RAM: 8 GB per node
  • Storage: 100 GB per node
  • Cost: $500-1000/month

Medium Deployment (10K-100K events/sec):

  • Nodes: 4-8
  • CPU: 8 cores per node
  • RAM: 16 GB per node
  • Storage: 200 GB per node
  • Cost: $2000-4000/month

Large Deployment (100K-1M events/sec):

  • Nodes: 8-32
  • CPU: 16 cores per node
  • RAM: 32 GB per node
  • Storage: 500 GB per node
  • Cost: $8000-20000/month

8.2 Resource Formulas

Memory Calculation:

Required Memory = Base Memory + (Buffer Size × Number of Streams) + State Size
- Base Memory: ~2 GB
- Buffer Size: 100 MB per stream (configurable)
- State Size: Depends on window size and event rate

Storage Calculation:

Required Storage = Checkpoint Size × Retention Count + Log Size
- Checkpoint Size: ~10-20% of memory
- Retention Count: 3-10 (configurable)
- Log Size: ~1 GB per day (with log rotation)

CPU Calculation:

Required CPU = Base CPU + (Events/sec × CPU per Event)
- Base CPU: 1 core
- CPU per Event: ~0.1ms (0.0001 core-seconds)
- Example: 10K events/sec = 1 + (10000 × 0.0001) = 2 cores minimum
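
Putting the formulas together as a quick sizing helper (a sketch; the constants are the rough per-event and per-stream figures above and should be re-measured against your workload):

Terminal window
#!/usr/bin/env bash
# Usage: ./size.sh EVENTS_PER_SEC NUM_STREAMS
set -euo pipefail
EVENTS=${1:?events/sec required}
STREAMS=${2:?stream count required}
# CPU: 1 base core + 0.0001 core-seconds per event
CPU=$(awk -v e="$EVENTS" 'BEGIN { printf "%.1f", 1 + e * 0.0001 }')
# Memory: 2 GB base + 0.1 GB buffer per stream (state size excluded)
MEM=$(awk -v s="$STREAMS" 'BEGIN { printf "%.1f", 2 + s * 0.1 }')
echo "Minimum per workload: ${CPU} cores, ${MEM} GB RAM (+ state size)"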

9. Troubleshooting

9.1 Common Issues

Issue 1: High Latency (p99 > 10ms)

Symptoms:

  • Slow event processing
  • Increasing backlog

Diagnosis:

Terminal window
# Check metrics
kubectl exec -it heliosdb-streaming-0 -n heliosdb -- \
curl localhost:9090/metrics | grep latency
# Check CPU/memory
kubectl top pod heliosdb-streaming-0 -n heliosdb

Solutions:

  • Scale horizontally (add more pods)
  • Increase CPU/memory resources
  • Optimize window sizes
  • Enable compression for checkpoints

Issue 2: Checkpoint Failures

Symptoms:

  • heliosdb_checkpoint_failures_total > 0
  • Errors in logs

Diagnosis:

Terminal window
# Check logs
kubectl logs heliosdb-streaming-0 -n heliosdb | grep checkpoint
# Check disk space
kubectl exec -it heliosdb-streaming-0 -n heliosdb -- df -h

Solutions:

  • Increase checkpoint timeout
  • Check disk space (increase PVC size)
  • Verify KMS access (check AWS/Azure credentials)
  • Check state backend configuration

Issue 3: Memory Leaks

Symptoms:

  • Memory usage continuously increasing
  • OOMKilled pods

Diagnosis:

Terminal window
# Check memory metrics
kubectl exec -it heliosdb-streaming-0 -n heliosdb -- \
curl localhost:9090/metrics | grep memory
# Check pod restarts
kubectl get pods -n heliosdb

Solutions:

  • Reduce buffer sizes
  • Enable aggressive garbage collection
  • Check for event accumulation
  • Increase memory limits

Issue 4: Pod Not Starting

Symptoms:

  • Pod stuck in Pending or CrashLoopBackOff

Diagnosis:

Terminal window
# Check pod status
kubectl describe pod heliosdb-streaming-0 -n heliosdb
# Check events
kubectl get events -n heliosdb --sort-by='.lastTimestamp'

Solutions:

  • Check resource requests (CPU/memory)
  • Verify PVC is bound
  • Check secrets/configmaps exist
  • Verify image pull secrets (if using private registry)

9.2 Debug Commands

Terminal window
# Get pod shell
kubectl exec -it heliosdb-streaming-0 -n heliosdb -- /bin/bash
# Check running processes
kubectl exec heliosdb-streaming-0 -n heliosdb -- ps aux
# Check network connectivity (Kafka speaks a binary protocol; test the TCP handshake)
kubectl exec heliosdb-streaming-0 -n heliosdb -- curl -v telnet://kafka:9092
# Check file permissions
kubectl exec heliosdb-streaming-0 -n heliosdb -- ls -la /data
# Note: cargo is not included in the runtime image; run tests in the builder
# stage (see section 3.1) or a dev container instead of the production pod
# Check Rust binary
kubectl exec heliosdb-streaming-0 -n heliosdb -- /app/heliosdb-streaming --version

📞 Support

Documentation: https://docs.heliosdb.com
GitHub Issues: https://github.com/danimoya/HeliosDB/issues
Email: support@heliosdb.com
Slack: heliosdb.slack.com


Document Version: 1.0
Last Updated: October 29, 2025
Status: Production Deployment Guide
Next Review: January 2026