HeliosDB Streaming - Architecture Diagrams

System Overview

This document provides visual representations of the HeliosDB Streaming Analytics architecture, deployment topologies, and operational flows.

1. High-Level System Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                          HeliosDB Streaming Platform                         │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                               │
│  ┌──────────────────────┐         ┌──────────────────────┐                 │
│  │   Data Sources       │         │   Stream Ingestion   │                 │
│  │                      │         │                      │                 │
│  │  • Kafka Topics      │────────▶│  • Source Connectors │                 │
│  │  • Apache Pulsar     │         │  • Kafka Consumer    │                 │
│  │  • Webhooks          │         │  • Pulsar Consumer   │                 │
│  │  • File Streams      │         │  • HTTP Endpoints    │                 │
│  │  • Database CDC      │         │  • Backpressure Ctrl │                 │
│  └──────────────────────┘         └──────────┬───────────┘                 │
│                                               │                              │
│                                               ▼                              │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │                    Stream Processing Engine                          │   │
│  │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐             │   │
│  │  │   Windows    │  │  Continuous  │  │    Joins     │             │   │
│  │  │              │  │   Queries    │  │              │             │   │
│  │  │ • Tumbling   │  │ • Real-time  │  │ • Stream-    │             │   │
│  │  │ • Sliding    │  │ • Aggregates │  │   Stream     │             │   │
│  │  │ • Session    │  │ • Transforms │  │ • Stream-    │             │   │
│  │  │ • Event-time │  │ • Filters    │  │   Table      │             │   │
│  │  └──────────────┘  └──────────────┘  └──────────────┘             │   │
│  │                                                                      │   │
│  │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐             │   │
│  │  │ Pattern      │  │  SQL Layer   │  │   State      │             │   │
│  │  │ Matching     │  │              │  │ Management   │             │   │
│  │  │              │  │ • SELECT/    │  │              │             │   │
│  │  │ • CEP/NFA    │  │   WHERE/     │  │ • Checkpoints│             │   │
│  │  │ • Sequences  │  │   GROUP BY   │  │ • Savepoints │             │   │
│  │  │ • MATCH_     │  │ • Window     │  │ • Recovery   │             │   │
│  │  │   RECOGNIZE  │  │   Functions  │  │ • Encryption │             │   │
│  │  └──────────────┘  └──────────────┘  └──────────────┘             │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                               │                              │
│                                               ▼                              │
│  ┌──────────────────────┐         ┌──────────────────────┐                 │
│  │   Result Delivery    │         │   Data Sinks         │                 │
│  │                      │         │                      │                 │
│  │  • Sink Connectors   │────────▶│  • Kafka Topics      │                 │
│  │  • Kafka Producer    │         │  • Databases         │                 │
│  │  • Database Writers  │         │  • Analytics DBs     │                 │
│  │  • Webhooks          │         │  • Dashboards        │                 │
│  │  • 2PC Transactions  │         │  • Alerting Systems  │                 │
│  └──────────────────────┘         └──────────────────────┘                 │
│                                                                               │
└─────────────────────────────────────────────────────────────────────────────┘

         ┌─────────────────────────────────────────────────────────────┐
         │              Management & Monitoring Layer                   │
         │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐   │
         │  │   Job    │  │ Resource │  │  Auth &  │  │ Metrics  │   │
         │  │Management│  │Management│  │   Rate   │  │   API    │   │
         │  │   API    │  │   Pool   │  │  Limit   │  │(Prom.)   │   │
         │  └──────────┘  └──────────┘  └──────────┘  └──────────┘   │
         └─────────────────────────────────────────────────────────────┘

2. Data Flow Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                          Event Processing Pipeline                           │
└─────────────────────────────────────────────────────────────────────────────┘

Input Event         Watermark           Window              Aggregate
   Stream          Generation         Assignment          Computation
     │                 │                  │                    │
     ▼                 ▼                  ▼                    ▼
┌─────────┐      ┌────────────┐      ┌──────────┐       ┌──────────┐
│ {event, │      │ Watermark  │      │Assign to │       │ Window   │
│  time}  │─────▶│ Generator  │─────▶│Window(s) │──────▶│Aggregator│──┐
└─────────┘      │            │      │          │       │          │  │
                 │• Periodic  │      │• Tumbling│       │• Count   │  │
                 │• Punctuated│      │• Sliding │       │• Sum     │  │
                 │• Aligned   │      │• Session │       │• Avg     │  │
                 └────────────┘      └──────────┘       │• Custom  │  │
                                                        └──────────┘  │
                                                                      │
                    ┌─────────────────────────────────────────────────┘
                    │
                    ▼
              ┌──────────┐        ┌──────────┐        ┌──────────┐
              │ Window   │        │  State   │        │  Output  │
              │ Trigger  │───────▶│ Snapshot │───────▶│  Event   │
              └──────────┘        └──────────┘        └──────────┘
                                       │
                                       ▼
                                ┌─────────────┐
                                │ Checkpoint  │
                                │   Storage   │
                                │ (Encrypted) │
                                └─────────────┘
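
The window assignment and trigger stages of the pipeline can be sketched in a few lines. This is a minimal illustration, not the HeliosDB implementation: the function names, the millisecond timestamps, half-open [start, end) windows, and the "maximum timestamp seen minus allowed lateness" watermark rule are all assumptions made for the example.

```python
from typing import List, Tuple

Window = Tuple[int, int]  # half-open interval [start, end) in milliseconds


def tumbling_window(ts: int, size: int) -> Window:
    """Assign an event timestamp to its single tumbling window."""
    start = ts - (ts % size)
    return (start, start + size)


def sliding_windows(ts: int, size: int, slide: int) -> List[Window]:
    """Assign an event timestamp to every sliding window that contains it."""
    windows = []
    start = ts - (ts % slide)  # latest window start at or before ts
    while start > ts - size:   # earlier windows still cover ts
        windows.append((start, start + size))
        start -= slide
    return windows


def watermark(max_seen_ts: int, max_lateness: int) -> int:
    """Periodic watermark (assumed rule): no older event is expected."""
    return max_seen_ts - max_lateness


def should_fire(window: Window, wm: int) -> bool:
    """A window triggers once the watermark reaches its end."""
    return wm >= window[1]
```

With a 10-second window sliding every 5 seconds, an event at t = 12 s belongs to two windows, [10 s, 20 s) and [5 s, 15 s). With 2 seconds of allowed lateness the watermark at that point is 10 s, so the window [0 s, 10 s) fires while the two open windows keep accumulating.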

3. Security Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                          Security Layers                                     │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                               │
│  External Request                                                             │
│       │                                                                        │
│       ▼                                                                        │
│  ┌────────────────────────────────────────────────────────────────────┐     │
│  │  Layer 1: Rate Limiting (DDoS Protection)                           │     │
│  │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐             │     │
│  │  │   IP-based   │  │  User-based  │  │    Global    │             │     │
│  │  │ 100 req/min  │  │ 500 req/min  │  │10K req/min   │             │     │
│  │  │  +10 burst   │  │  +50 burst   │  │   (strict)   │             │     │
│  │  └──────────────┘  └──────────────┘  └──────────────┘             │     │
│  └────────────────────────────────────────────────────────────────────┘     │
│       │                                                                        │
│       ▼                                                                        │
│  ┌────────────────────────────────────────────────────────────────────┐     │
│  │  Layer 2: Authentication (JWT Tokens)                               │     │
│  │  ┌──────────────────────────────────────────────────────────┐     │     │
│  │  │  Token Validation                                          │     │     │
│  │  │  • HS256 signature verification                            │     │     │
│  │  │  • Expiration check (24h default)                          │     │     │
│  │  │  • User existence validation                               │     │     │
│  │  │  • Enabled status check                                    │     │     │
│  │  └──────────────────────────────────────────────────────────┘     │     │
│  │  ┌──────────────────────────────────────────────────────────┐     │     │
│  │  │  Password Security                                         │     │     │
│  │  │  • bcrypt hashing (cost=12)                                │     │     │
│  │  │  • Salt per password                                       │     │     │
│  │  │  • No plaintext storage                                    │     │     │
│  │  └──────────────────────────────────────────────────────────┘     │     │
│  └────────────────────────────────────────────────────────────────────┘     │
│       │                                                                        │
│       ▼                                                                        │
│  ┌────────────────────────────────────────────────────────────────────┐     │
│  │  Layer 3: Authorization (RBAC)                                      │     │
│  │  ┌────────────┐   ┌────────────┐   ┌────────────┐                 │     │
│  │  │   Admin    │   │  Operator  │   │   Viewer   │                 │     │
│  │  │            │   │            │   │            │                 │     │
│  │  │ All perms  │   │ • Read     │   │ • Read     │                 │     │
│  │  │            │   │ • Execute  │   │   only     │                 │     │
│  │  │            │   │ • Cancel   │   │            │                 │     │
│  │  └────────────┘   └────────────┘   └────────────┘                 │     │
│  │                                                                      │     │
│  │  Permissions: Read, Write, Execute, Delete, Admin, Cancel, Manage  │     │
│  └────────────────────────────────────────────────────────────────────┘     │
│       │                                                                        │
│       ▼                                                                        │
│  ┌────────────────────────────────────────────────────────────────────┐     │
│  │  Layer 4: Data Encryption                                           │     │
│  │  ┌──────────────────────────────────────────────────────────┐     │     │
│  │  │  At Rest (State/Checkpoints)                              │     │     │
│  │  │  • AES-256-GCM encryption                                 │     │     │
│  │  │  • Multi-cloud KMS support (AWS/Azure/GCP)                │     │     │
│  │  │  • Automatic key rotation (30/60/90 days)                 │     │     │
│  │  │  • Per-checkpoint encryption keys                          │     │     │
│  │  └──────────────────────────────────────────────────────────┘     │     │
│  │  ┌──────────────────────────────────────────────────────────┐     │     │
│  │  │  In Transit                                                │     │     │
│  │  │  • TLS 1.3 for all network communication                   │     │     │
│  │  │  • Certificate-based mutual auth                           │     │     │
│  │  └──────────────────────────────────────────────────────────┘     │     │
│  └────────────────────────────────────────────────────────────────────┘     │
│       │                                                                        │
│       ▼                                                                        │
│  ┌────────────────────────────────────────────────────────────────────┐     │
│  │  Layer 5: Audit Logging                                             │     │
│  │  • All API requests logged with user/IP/timestamp                   │     │
│  │  • Job lifecycle events tracked                                     │     │
│  │  • Permission checks recorded                                       │     │
│  │  • Security events (auth failures) alerted                          │     │
│  └────────────────────────────────────────────────────────────────────┘     │
│       │                                                                        │
│       ▼                                                                        │
│  Authorized & Secure Processing                                               │
│                                                                               │
└─────────────────────────────────────────────────────────────────────────────┘
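
The per-IP limit in Layer 1 (100 req/min, +10 burst) maps naturally onto a token bucket. The sketch below is one possible reading, not the actual limiter: the class name is hypothetical, and treating "+10 burst" as the bucket capacity is an assumption. The clock is passed in explicitly so the behavior is deterministic.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    rate_per_min: float  # sustained refill rate, e.g. 100 for the IP tier
    burst: int           # bucket capacity (assumed meaning of "+N burst")
    tokens: float = 0.0
    last: float = 0.0    # timestamp of the previous call, in seconds

    def __post_init__(self) -> None:
        self.tokens = float(self.burst)  # start with a full bucket

    def allow(self, now: float) -> bool:
        """Refill for the elapsed time, then spend one token if available."""
        elapsed = max(0.0, now - self.last)
        refill = elapsed * self.rate_per_min / 60.0
        self.tokens = min(float(self.burst), self.tokens + refill)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


ip_limit = TokenBucket(rate_per_min=100, burst=10)
decisions = [ip_limit.allow(now=0.0) for _ in range(11)]
# The first 10 requests drain the bucket; the 11th is rejected.
```

A user-based or global tier would be another bucket with different parameters, and a request would be admitted only if every applicable bucket allows it.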

4. State Management Lifecycle

┌─────────────────────────────────────────────────────────────────────────────┐
│                     State Management & Fault Tolerance                       │
└─────────────────────────────────────────────────────────────────────────────┘

Normal Operation                  Checkpoint                    Recovery
      │                               │                             │
      ▼                               ▼                             ▼
┌──────────┐                   ┌──────────┐                  ┌──────────┐
│ Operator │                   │Checkpoint│                  │  Detect  │
│Processing│                   │ Trigger  │                  │  Failure │
│          │                   │          │                  │          │
│• Window  │                   │• Aligned │                  │• Timeout │
│• Agg     │                   │• Barrier │                  │• Error   │
│• Join    │                   │• Async   │                  │• Restart │
└────┬─────┘                   └────┬─────┘                  └────┬─────┘
     │                              │                              │
     │    Periodic                  │                              │
     │    (e.g., 60s)               │                              │
     │                              │                              │
     ▼                              ▼                              ▼
┌──────────┐                   ┌──────────┐                  ┌──────────┐
│  State   │                   │ Snapshot │                  │  Load    │
│  Update  │                   │  State   │                  │ Latest   │
│          │                   │          │                  │Checkpoint│
│• In-Mem  │──────────────────▶│• Barrier │                  │          │
│• Pending │                   │  Align   │◀─────────────────│• ID/Path │
│• Buffer  │                   │• Collect │                  │• Decrypt │
└──────────┘                   └────┬─────┘                  └────┬─────┘
                                    │                              │
                                    ▼                              ▼
                               ┌──────────┐                  ┌──────────┐
                               │ Encrypt  │                  │ Restore  │
                               │          │                  │  State   │
                               │• AES-GCM │                  │          │
                               │• KMS Key │                  │• Replay  │
                               │• Per-CP  │                  │• Resume  │
                               └────┬─────┘                  └────┬─────┘
                                    │                              │
                                    ▼                              ▼
                               ┌──────────┐                  ┌──────────┐
                               │  Write   │                  │ Continue │
                               │ Backend  │                  │Processing│
                               │          │                  │          │
                               │• Memory  │                  │• From CP │
                               │• File    │                  │• No Loss │
                               │• S3/HDFS │                  │          │
                               └────┬─────┘                  └──────────┘
                                    │
                                    ▼
                               ┌──────────┐
                               │ Confirm  │
                               │          │
                               │• Success │
                               │• Metrics │
                               └──────────┘

┌─────────────────────────────────────────────────────────────────────────────┐
│  Savepoint (User-Triggered)                                                  │
│  • Same as checkpoint but initiated manually                                 │
│  • Used for version upgrades, config changes, cluster migration              │
│  • Retained indefinitely (until explicitly deleted)                          │
│  • Can be used to restore to a specific logical point                        │
└─────────────────────────────────────────────────────────────────────────────┘
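
The checkpoint and recovery columns above boil down to "snapshot the state together with the input position, then restore and replay". The toy operator below illustrates that idea only. It assumes a replayable source (for example, an offset-addressable log), uses hypothetical names throughout, and leaves out barrier alignment, encryption, and the storage backend.

```python
import copy
from typing import Dict, List, Tuple

Snapshot = Tuple[int, Dict[str, int]]  # (next input offset, operator state)


class CountingOperator:
    """Toy stateful operator that counts events per key."""

    def __init__(self) -> None:
        self.state: Dict[str, int] = {}
        self.offset = 0  # index of the next event to process

    def process(self, events: List[str], upto: int) -> None:
        """Consume events from the current offset up to (not including) upto."""
        end = min(upto, len(events))
        for key in events[self.offset:end]:
            self.state[key] = self.state.get(key, 0) + 1
        self.offset = max(self.offset, end)

    def snapshot(self) -> Snapshot:
        """Checkpoint: copy the state together with the input position."""
        return (self.offset, copy.deepcopy(self.state))

    @classmethod
    def restore(cls, snap: Snapshot) -> "CountingOperator":
        """Recovery: rebuild the operator from a checkpoint."""
        op = cls()
        op.offset = snap[0]
        op.state = copy.deepcopy(snap[1])
        return op


events = ["a", "b", "a", "c", "a"]
op = CountingOperator()
op.process(events, upto=3)
checkpoint = op.snapshot()            # taken after 3 events
op.process(events, upto=5)            # ... then the process fails

recovered = CountingOperator.restore(checkpoint)
recovered.process(events, upto=5)     # replay from the checkpointed offset
```

Because the offset is stored with the state, replay resumes exactly where the snapshot was taken, so the recovered counts match what the failed operator had computed. A savepoint would use the same snapshot format, differing only in who triggers it and how long it is kept.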

5. Deployment Topology Options

5.1 Single-Node Development

┌────────────────────────────────┐
│    Development Machine         │
│                                │
│  ┌──────────────────────────┐ │
│  │   HeliosDB Streaming     │ │
│  │                          │ │
│  │  • All-in-one process    │ │
│  │  • In-memory state       │ │
│  │  • Local Kafka/Pulsar    │ │
│  │  • SQLite/File backend   │ │
│  └──────────────────────────┘ │
│                                │
│  Ideal for:                    │
│  • Local development           │
│  • Testing                     │
│  • Prototyping                 │
└────────────────────────────────┘

5.2 Small Production Cluster (3-5 nodes)

┌─────────────────────────────────────────────────────────────────────────┐
│                         Load Balancer                                    │
│                    (HAProxy / Nginx / ALB)                               │
└────────────────────────────┬────────────────────────────────────────────┘
                             │
              ┌──────────────┼──────────────┐
              │              │              │
              ▼              ▼              ▼
    ┌─────────────┐  ┌─────────────┐  ┌─────────────┐
    │   Node 1    │  │   Node 2    │  │   Node 3    │
    │             │  │             │  │             │
    │ • Stream    │  │ • Stream    │  │ • Stream    │
    │   Engine    │  │   Engine    │  │   Engine    │
    │ • Job Mgr   │  │ • Job Mgr   │  │ • Job Mgr   │
    │ • Resource  │  │ • Resource  │  │ • Resource  │
    │   Pool      │  │   Pool      │  │   Pool      │
    └─────────────┘  └─────────────┘  └─────────────┘
           │                │                │
           └────────────────┼────────────────┘
                           │
                           ▼
           ┌───────────────────────────────────┐
           │     Shared State Storage          │
           │  • S3 / HDFS / NFS                │
           │  • Encrypted checkpoints          │
           │  • Coordinated via consensus      │
           └───────────────────────────────────┘
                           │
                           ▼
           ┌───────────────────────────────────┐
           │       External Services           │
           │  • Kafka / Pulsar                 │
           │  • PostgreSQL / TimescaleDB       │
           │  • Prometheus / Grafana           │
           │  • KMS (AWS/Azure/GCP)            │
           └───────────────────────────────────┘

Ideal for:
• Production workloads (up to 100K events/sec)
• High availability (N+1 redundancy)
• Regional deployments

5.3 Large Enterprise Cluster (10+ nodes)

┌─────────────────────────────────────────────────────────────────────────┐
│                    Global Load Balancer (Route53 / Traffic Manager)     │
└────────────────────────────┬────────────────────────────────────────────┘
                             │
              ┌──────────────┴──────────────┐
              │                             │
              ▼                             ▼
    ┌──────────────────┐          ┌──────────────────┐
    │   Region US-EAST │          │   Region EU-WEST │
    └──────────────────┘          └──────────────────┘
              │                             │
    ┌─────────┴─────────┐         ┌────────┴────────┐
    │                   │         │                 │
    ▼                   ▼         ▼                 ▼
┌─────────┐       ┌─────────┐ ┌─────────┐    ┌─────────┐
│ Zone A  │       │ Zone B  │ │ Zone A  │    │ Zone B  │
│(3 nodes)│       │(3 nodes)│ │(3 nodes)│    │(3 nodes)│
└─────────┘       └─────────┘ └─────────┘    └─────────┘

Per-Region Architecture:
┌───────────────────────────────────────────────────────────────────┐
│  Job Coordinator Cluster (HA)                                     │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐                        │
│  │ Leader   │  │ Follower │  │ Follower │                        │
│  │ (Active) │  │(Standby) │  │(Standby) │                        │
│  └──────────┘  └──────────┘  └──────────┘                        │
└────────────────────────┬──────────────────────────────────────────┘
                         │
        ┌────────────────┼────────────────┬───────────────┐
        │                │                │               │
        ▼                ▼                ▼               ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│  Worker Pool │ │  Worker Pool │ │  Worker Pool │ │  Worker Pool │
│  (3 nodes)   │ │  (3 nodes)   │ │  (3 nodes)   │ │  (3 nodes)   │
│              │ │              │ │              │ │              │
│ • Ingestion  │ │ • Processing │ │ • Analytics  │ │ • Output     │
│ • Validation │ │ • Transforms │ │ • Aggregates │ │ • Sinks      │
│ • Routing    │ │ • Enrichment │ │ • Windows    │ │ • Webhooks   │
└──────────────┘ └──────────────┘ └──────────────┘ └──────────────┘

Shared Infrastructure:
┌───────────────────────────────────────────────────────────────────┐
│  Distributed State Backend                                        │
│  • S3 (multi-region replication)                                  │
│  • Encrypted checkpoints (KMS per region)                         │
│  • Cross-region disaster recovery                                 │
└───────────────────────────────────────────────────────────────────┘
┌───────────────────────────────────────────────────────────────────┐
│  Monitoring & Observability                                       │
│  • Prometheus (HA pair per region)                                │
│  • Grafana (global view)                                          │
│  • Distributed tracing (Jaeger)                                   │
│  • Centralized logging (ELK/CloudWatch)                           │
└───────────────────────────────────────────────────────────────────┘

Ideal for:
• Mission-critical workloads (1M+ events/sec)
• Multi-region deployments
• Geographic data sovereignty
• 99.99% availability SLA
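
For a sense of scale, an availability target translates directly into a downtime budget. The helper below is plain arithmetic. The function name is hypothetical, and a 365-day year and 30-day month are assumed.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 (non-leap year)


def downtime_budget_minutes(availability: float,
                            period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Maximum downtime allowed in a period for a given availability target."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * period_minutes


yearly = downtime_budget_minutes(0.9999)
monthly = downtime_budget_minutes(0.9999, period_minutes=30 * 24 * 60)
print(f"{yearly:.2f} min/year, {monthly:.2f} min per 30 days")
# prints "52.56 min/year, 4.32 min per 30 days"
```

A 99.99% target therefore leaves under an hour of total downtime per year, which is why this topology spreads nodes across zones and regions.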

6. Component Interaction Diagram

┌─────────────────────────────────────────────────────────────────────────────┐
│                      API Request Flow                                        │
└─────────────────────────────────────────────────────────────────────────────┘

Client                REST API            Job Manager         Stream Engine
  │                      │                      │                    │
  │  POST /jobs          │                      │                    │
  │─────────────────────▶│                      │                    │
  │                      │                      │                    │
  │                      │  Rate Limit Check    │                    │
  │                      │──┐                   │                    │
  │                      │  │ (IP/User/Global)  │                    │
  │                      │◀─┘                   │                    │
  │                      │                      │                    │
  │                      │  JWT Validation      │                    │
  │                      │──┐                   │                    │
  │                      │  │ (Signature/Exp)   │                    │
  │                      │◀─┘                   │                    │
  │                      │                      │                    │
  │                      │  RBAC Check          │                    │
  │                      │──┐                   │                    │
  │                      │  │ (Permission)      │                    │
  │                      │◀─┘                   │                    │
  │                      │                      │                    │
  │                      │  Create Job          │                    │
  │                      │─────────────────────▶│                    │
  │                      │                      │                    │
  │                      │                      │  Allocate Resources│
  │                      │                      │──┐                 │
  │                      │                      │  │ (CPU/Memory)    │
  │                      │                      │◀─┘                 │
  │                      │                      │                    │
  │                      │                      │  Initialize Stream │
  │                      │                      │───────────────────▶│
  │                      │                      │                    │
  │                      │                      │                    │  Setup
  │                      │                      │                    │──┐
  │                      │                      │                    │  │ Sources
  │                      │                      │                    │  │ Windows
  │                      │                      │                    │  │ State
  │                      │                      │                    │◀─┘
  │                      │                      │                    │
  │                      │                      │  Job Started       │
  │                      │                      │◀───────────────────│
  │                      │                      │                    │
  │                      │  Job ID + Status     │                    │
  │                      │◀─────────────────────│                    │
  │                      │                      │                    │
  │  200 OK {jobId}      │                      │                    │
  │◀─────────────────────│                      │                    │
  │                      │                      │                    │
  │                      │                      │  Processing...     │
  │                      │                      │                    │──┐
  │                      │                      │                    │  │ Events
  │                      │                      │                    │  │ Windows
  │                      │                      │                    │  │ Aggregates
  │                      │                      │                    │◀─┘
  │                      │                      │                    │
  │  WebSocket /metrics  │                      │                    │
  │─────────────────────▶│                      │                    │
  │                      │  Subscribe Metrics   │                    │
  │                      │─────────────────────▶│                    │
  │                      │                      │                    │
  │                      │  Metrics Stream      │                    │
  │                      │◀─────────────────────│                    │
  │  {throughput,        │                      │                    │
  │   latency, ...}      │                      │                    │
  │◀─────────────────────│                      │                    │
  │                      │                      │                    │
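
The admission checks at the top of this flow (rate limit, JWT validation, RBAC) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the service's code: the function names, the "role" claim, and the choice of "execute" as the permission required for POST /jobs are hypothetical. The role table follows the RBAC layer in section 3, and the rate-limit result is passed in as a flag.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

# Role -> permissions, following the RBAC layer in section 3.
ROLE_PERMISSIONS = {
    "admin": {"read", "write", "execute", "delete", "admin", "cancel", "manage"},
    "operator": {"read", "execute", "cancel"},
    "viewer": {"read"},
}


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Build a compact HS256 JWT (helper for this example only)."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url(sig)}"


def verify_hs256(token: str, secret: bytes, now: int) -> Optional[dict]:
    """Return the claims if the signature and expiry check out, else None."""
    try:
        header, payload, sig = token.split(".")
        expected = hmac.new(secret, f"{header}.{payload}".encode(),
                            hashlib.sha256).digest()
        if not hmac.compare_digest(expected, _b64url_decode(sig)):
            return None
        if json.loads(_b64url_decode(header)).get("alg") != "HS256":
            return None
        claims = json.loads(_b64url_decode(payload))
        if claims.get("exp", 0) <= now:
            return None  # expired or missing expiry
        return claims
    except (ValueError, TypeError, AttributeError):
        return None  # malformed token


def create_job_status(token: str, secret: bytes, now: int,
                      rate_ok: bool = True) -> int:
    """Run the checks in diagram order and return an HTTP status code."""
    if not rate_ok:
        return 429  # rate limit check failed
    claims = verify_hs256(token, secret, now)
    if claims is None:
        return 401  # JWT validation failed
    role = claims.get("role")
    allowed = ROLE_PERMISSIONS.get(role, set()) if isinstance(role, str) else set()
    if "execute" not in allowed:
        return 403  # RBAC check failed (required permission is an assumption)
    return 200  # the request would now be handed to the Job Manager
```

An operator token that is still valid passes all three checks, a viewer token stops at the RBAC step with 403, and an expired or tampered token stops at 401 before any permission lookup happens.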

7. Kubernetes Deployment Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                         Kubernetes Cluster                                   │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                               │
│  Namespace: heliosdb-streaming                                               │
│  ┌────────────────────────────────────────────────────────────────────┐    │
│  │  Ingress (nginx-ingress / ALB)                                      │    │
│  │  • TLS termination                                                  │    │
│  │  • Path-based routing (/api/*, /metrics, /health)                  │    │
│  │  • Rate limiting (nginx plugin)                                     │    │
│  └──────────────────────────┬─────────────────────────────────────────┘    │
│                             │                                                │
│         ┌───────────────────┼───────────────────┐                           │
│         │                   │                   │                           │
│         ▼                   ▼                   ▼                           │
│  ┌────────────┐      ┌────────────┐      ┌────────────┐                   │
│  │  Service   │      │  Service   │      │  Service   │                   │
│  │   (API)    │      │ (Metrics)  │      │  (Health)  │                   │
│  │            │      │            │      │            │                   │
│  │ Type: LB   │      │ Type: LB   │      │Type: NodeP.│                   │
│  │ Port: 8080 │      │ Port: 9090 │      │ Port: 8081 │                   │
│  └─────┬──────┘      └─────┬──────┘      └─────┬──────┘                   │
│        │                   │                   │                           │
│        └───────────────────┼───────────────────┘                           │
│                           │                                                │
│  ┌────────────────────────┴────────────────────────────────────────────┐  │
│  │  StatefulSet: heliosdb-streaming-workers                            │  │
│  │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐              │  │
│  │  │   Pod 0      │  │   Pod 1      │  │   Pod 2      │              │  │
│  │  │              │  │              │  │              │              │  │
│  │  │ Container:   │  │ Container:   │  │ Container:   │              │  │
│  │  │ heliosdb     │  │ heliosdb     │  │ heliosdb     │              │  │
│  │  │              │  │              │  │              │              │  │
│  │  │ Resources:   │  │ Resources:   │  │ Resources:   │              │  │
│  │  │ CPU: 2       │  │ CPU: 2       │  │ CPU: 2       │              │  │
│  │  │ Mem: 4Gi     │  │ Mem: 4Gi     │  │ Mem: 4Gi     │              │  │
│  │  │              │  │              │  │              │              │  │
│  │  │ Volume:      │  │ Volume:      │  │ Volume:      │              │  │
│  │  │ PVC-0 (10Gi) │  │ PVC-1 (10Gi) │  │ PVC-2 (10Gi) │              │  │
│  │  └──────────────┘  └──────────────┘  └──────────────┘              │  │
│  │                                                                       │  │
│  │  • Replicas: 3 (auto-scale 3-10 based on CPU/custom metrics)        │  │
│  │  • Anti-affinity: spread across availability zones                   │  │
│  │  • Liveness probe: /health (every 30s)                               │  │
│  │  • Readiness probe: /ready (every 10s)                               │  │
│  │  • PodDisruptionBudget: minAvailable=2                               │  │
│  └─────────────────────────────────────────────────────────────────────┘  │
│                                                                               │
│  ┌────────────────────────────────────────────────────────────────────┐    │
│  │  ConfigMap: heliosdb-config                                         │    │
│  │  • RUST_LOG=info                                                    │    │
│  │  • CHECKPOINT_INTERVAL=60                                           │    │
│  │  • STATE_BACKEND=s3://heliosdb-state/checkpoints                    │    │
│  └────────────────────────────────────────────────────────────────────┘    │
│                                                                               │
│  ┌────────────────────────────────────────────────────────────────────┐    │
│  │  Secret: heliosdb-secrets                                           │    │
│  │  • JWT_SECRET (base64)                                              │    │
│  │  • AWS_ACCESS_KEY_ID                                                │    │
│  │  • AWS_SECRET_ACCESS_KEY                                            │    │
│  │  • KAFKA_PASSWORD                                                   │    │
│  └────────────────────────────────────────────────────────────────────┘    │
│                                                                               │
│  ┌────────────────────────────────────────────────────────────────────┐    │
│  │  PersistentVolumeClaim (per pod)                                    │    │
│  │  • StorageClass: gp3 (AWS EBS) / premium-ssd (Azure)                │    │
│  │  • Size: 10Gi per pod                                               │    │
│  │  • Access: ReadWriteOnce                                            │    │
│  │  • Reclaim: Retain (for checkpoints)                                │    │
│  └────────────────────────────────────────────────────────────────────┘    │
│                                                                               │
│  ┌────────────────────────────────────────────────────────────────────┐    │
│  │  HorizontalPodAutoscaler                                            │    │
│  │  • Target: heliosdb-streaming-workers                               │    │
│  │  • Min replicas: 3                                                  │    │
│  │  • Max replicas: 10                                                 │    │
│  │  • Metrics:                                                         │    │
│  │    - CPU utilization > 70%                                          │    │
│  │    - Custom: events_per_second > 50000                              │    │
│  │    - Custom: backpressure_ratio > 0.8                               │    │
│  └────────────────────────────────────────────────────────────────────┘    │
│                                                                               │
└─────────────────────────────────────────────────────────────────────────────┘
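
The HorizontalPodAutoscaler box above lists three scaling signals. As a rough sketch of how they combine, the standard Kubernetes rule computes ceil(currentReplicas × currentValue / targetValue) for each metric, takes the largest result, and clamps it to the min/max replica bounds. The code below assumes the thresholds in the diagram (70%, 50000, 0.8) are the target values; the `Metric` struct and `desired_replicas` function are hypothetical names for illustration, and the sketch ignores the autoscaler's tolerance band and stabilization window.

```rust
/// One autoscaling signal: the observed value and the target it is compared to.
struct Metric {
    current: f64,
    target: f64,
}

/// Replica count proposed by the standard HPA rule:
/// ceil(current_replicas * current / target), taking the largest proposal
/// across all metrics and clamping it to [min, max].
/// With no metrics, the current replica count is kept (then clamped).
fn desired_replicas(current_replicas: u32, metrics: &[Metric], min: u32, max: u32) -> u32 {
    let proposed = metrics
        .iter()
        .map(|m| (current_replicas as f64 * m.current / m.target).ceil() as u32)
        .max()
        .unwrap_or(current_replicas);
    proposed.clamp(min, max)
}
```

With 3 replicas, CPU at 90% against a 70% target proposes 4 replicas, 75000 events per second against a 50000 target proposes 5, and a backpressure ratio of 0.4 against 0.8 proposes 2; the largest proposal wins, so `desired_replicas` returns 5.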

External Dependencies:
┌─────────────────────────────────────────────────────────────────────────────┐
│  • Kafka / Pulsar (managed service or separate cluster)                     │
│  • S3 / Azure Blob / GCS (checkpoint storage)                               │
│  • AWS KMS / Azure Key Vault / GCP KMS (encryption keys)                    │
│  • Prometheus (metrics collection)                                          │
│  • Grafana (visualization)                                                  │
└─────────────────────────────────────────────────────────────────────────────┘
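
The ConfigMap values shown earlier point each worker at the checkpoint store listed among the external dependencies. A minimal sketch of turning those settings into a typed configuration follows. It assumes CHECKPOINT_INTERVAL is expressed in seconds (the diagram does not state a unit) and handles only the `s3://` scheme that appears in the diagram; `WorkerConfig` and `parse_config` are hypothetical names, not HeliosDB's actual API.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Settings read from the ConfigMap keys shown in the diagram.
#[derive(Debug, PartialEq)]
struct WorkerConfig {
    checkpoint_interval: Duration,
    state_bucket: String,
    state_prefix: String,
}

/// Parses environment-style key/value pairs into a typed config.
/// Assumptions: CHECKPOINT_INTERVAL is in seconds, STATE_BACKEND is an s3:// URI.
fn parse_config(env: &HashMap<String, String>) -> Result<WorkerConfig, String> {
    let secs = env
        .get("CHECKPOINT_INTERVAL")
        .ok_or("CHECKPOINT_INTERVAL is not set")?
        .parse::<u64>()
        .map_err(|e| format!("CHECKPOINT_INTERVAL is not an integer: {}", e))?;

    let uri = env.get("STATE_BACKEND").ok_or("STATE_BACKEND is not set")?;
    let rest = uri
        .strip_prefix("s3://")
        .ok_or_else(|| format!("unsupported STATE_BACKEND scheme: {}", uri))?;

    // Split "bucket/prefix" at the first slash; the prefix may be empty.
    let (bucket, prefix) = rest.split_once('/').unwrap_or((rest, ""));
    if bucket.is_empty() {
        return Err("STATE_BACKEND has no bucket name".to_string());
    }

    Ok(WorkerConfig {
        checkpoint_interval: Duration::from_secs(secs),
        state_bucket: bucket.to_string(),
        state_prefix: prefix.to_string(),
    })
}
```

Validating these values at startup means a misconfigured pod fails before it begins processing, rather than at its first checkpoint.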

8. Monitoring & Observability Stack

┌─────────────────────────────────────────────────────────────────────────────┐
│                    Monitoring Architecture                                   │
└─────────────────────────────────────────────────────────────────────────────┘

HeliosDB Streaming Cluster
     │
     │ /metrics (Prometheus format)
     │
     ▼
┌────────────────────────────────┐
│  Prometheus Server (HA)        │
│  • Scrape interval: 15s        │
│  • Retention: 15 days          │
│  • Alert evaluation: 1m        │
│                                │
│  Key Metrics:                  │
│  • events_processed_total      │
│  • window_trigger_duration     │
│  • checkpoint_duration         │
│  • backpressure_ratio          │
│  • job_status                  │
│  • rate_limit_hits             │
└────────┬───────────────────────┘
         │
         ├────────────────┬────────────────┐
         │                │                │
         ▼                ▼                ▼
┌────────────────┐ ┌──────────────┐ ┌───────────────┐
│   Grafana      │ │  AlertManager│ │   Thanos      │
│                │ │              │ │  (Long-term)  │
│ • Dashboards   │ │ • Routing    │ │               │
│ • Variables    │ │ • Grouping   │ │ • Retain: 90d │
│ • Alerts       │ │ • Silencing  │ │ • Downsampling│
│                │ │              │ │ • S3 storage  │
│ Pre-built:     │ │ Channels:    │ └───────────────┘
│ • Overview     │ │ • PagerDuty  │
│ • Job Status   │ │ • Slack      │
│ • Performance  │ │ • Email      │
│ • Security     │ │ • Webhooks   │
└────────────────┘ └──────┬───────┘
                          │
                          │ Alert: Job Failed
                          │ Alert: High Latency
                          │ Alert: Rate Limit Exceeded
                          │
                          ▼
                   ┌──────────────┐
                   │  On-Call     │
                   │  Engineers   │
                   └──────────────┘
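
The /metrics endpoint in the diagram above serves the Prometheus text exposition format, in which each metric is a `# TYPE` line followed by a `name value` sample. The sketch below renders two of the listed metrics that way using only the standard library. Which metrics are counters and which are gauges is an assumption here, and `Kind` and `render_metrics` are hypothetical names; a real deployment would more likely rely on the Prometheus client crate listed in the technology stack.

```rust
use std::fmt::Write;

/// Metric kinds used in this sketch (a subset of the Prometheus types).
enum Kind {
    Counter,
    Gauge,
}

/// Renders samples in the Prometheus text exposition format:
/// a `# TYPE` line followed by `name value` for each metric.
fn render_metrics(samples: &[(&str, Kind, f64)]) -> String {
    let mut out = String::new();
    for (name, kind, value) in samples {
        let kind = match kind {
            Kind::Counter => "counter",
            Kind::Gauge => "gauge",
        };
        // Writing to a String cannot fail, so the results are ignored.
        let _ = writeln!(out, "# TYPE {} {}", name, kind);
        let _ = writeln!(out, "{} {}", name, value);
    }
    out
}
```

Passing `("events_processed_total", Kind::Counter, 1200.0)` and `("backpressure_ratio", Kind::Gauge, 0.35)` yields four lines: a type line and a sample line for each metric, ready to be scraped at the 15s interval shown above.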

Distributed Tracing:
┌────────────────────────────────┐
│  Jaeger / Zipkin               │
│  • Request traces              │
│  • Span relationships          │
│  • Latency breakdown           │
│  • Error tracking              │
└────────────────────────────────┘

Log Aggregation:
┌────────────────────────────────┐
│  ELK / Loki / CloudWatch       │
│  • Structured JSON logs        │
│  • Full-text search            │
│  • Log correlation with traces │
│  • Retention: 30 days          │
└────────────────────────────────┘
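
Correlating logs with traces, as listed above, usually means stamping every structured log line with the ID of the active trace so the log backend can link the two. A hand-rolled sketch of such a JSON line follows. The field names (`level`, `message`, `trace_id`) and both function names are assumptions for illustration; in practice the tracing and tracing-subscriber crates named in the technology stack would produce this output.

```rust
/// Escapes the characters that must not appear raw inside a JSON string.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \u00XX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Builds one structured log line carrying the trace ID used for correlation.
fn log_line(level: &str, message: &str, trace_id: &str) -> String {
    format!(
        "{{\"level\":\"{}\",\"message\":\"{}\",\"trace_id\":\"{}\"}}",
        json_escape(level),
        json_escape(message),
        json_escape(trace_id)
    )
}
```

Searching the log store for a given `trace_id` then returns every line emitted while that request was in flight, which is what makes the jump from a slow span in Jaeger or Zipkin to its logs possible.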

9. Technology Stack

┌─────────────────────────────────────────────────────────────────────────────┐
│                         HeliosDB Streaming Stack                             │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                               │
│  Core Language: Rust (2021 Edition)                                          │
│  ┌────────────────────────────────────────────────────────────────────┐    │
│  │  • Memory safety without garbage collection                         │    │
│  │  • Zero-cost abstractions for high performance                      │    │
│  │  • Data-race-free concurrency (ownership, Send/Sync)                │    │
│  └────────────────────────────────────────────────────────────────────┘    │
│                                                                               │
│  Async Runtime: Tokio                                                        │
│  ┌────────────────────────────────────────────────────────────────────┐    │
│  │  • Multi-threaded work-stealing scheduler                           │    │
│  │  • Efficient I/O with epoll/kqueue                                  │    │
│  │  • Async channels for message passing                               │    │
│  └────────────────────────────────────────────────────────────────────┘    │
│                                                                               │
│  Web Framework: Axum + Tower                                                 │
│  ┌────────────────────────────────────────────────────────────────────┐    │
│  │  • Type-safe HTTP routing                                           │    │
│  │  • Middleware composition (auth, rate limiting, tracing)            │    │
│  │  • WebSocket support for real-time metrics                          │    │
│  └────────────────────────────────────────────────────────────────────┘    │
│                                                                               │
│  Data Processing: Arrow + DataFusion                                         │
│  ┌────────────────────────────────────────────────────────────────────┐    │
│  │  • Columnar data format (Apache Arrow)                              │    │
│  │  • SQL query engine (DataFusion)                                    │    │
│  │  • Vectorized execution for SIMD                                    │    │
│  └────────────────────────────────────────────────────────────────────┘    │
│                                                                               │
│  Message Brokers: Kafka + Pulsar                                             │
│  ┌────────────────────────────────────────────────────────────────────┐    │
│  │  • rdkafka client (librdkafka bindings)                             │    │
│  │  • Exactly-once semantics                                           │    │
│  │  • Consumer groups for load balancing                               │    │
│  └────────────────────────────────────────────────────────────────────┘    │
│                                                                               │
│  Security:                                                                   │
│  ┌────────────────────────────────────────────────────────────────────┐    │
│  │  • AES-256-GCM (aes-gcm crate)                                      │    │
│  │  • bcrypt for password hashing                                      │    │
│  │  • JWT (jsonwebtoken crate)                                         │    │
│  │  • AWS/Azure/GCP KMS SDKs                                           │    │
│  └────────────────────────────────────────────────────────────────────┘    │
│                                                                               │
│  State Management:                                                           │
│  ┌────────────────────────────────────────────────────────────────────┐    │
│  │  • DashMap for concurrent hash maps                                 │    │
│  │  • parking_lot for efficient locks                                  │    │
│  │  • Bincode for binary serialization                                 │    │
│  └────────────────────────────────────────────────────────────────────┘    │
│                                                                               │
│  Observability:                                                              │
│  ┌────────────────────────────────────────────────────────────────────┐    │
│  │  • Prometheus client (metrics export)                               │    │
│  │  • tracing (structured logging + distributed tracing)               │    │
│  │  • tracing-subscriber (log formatting)                              │    │
│  └────────────────────────────────────────────────────────────────────┘    │
│                                                                               │
└─────────────────────────────────────────────────────────────────────────────┘
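
Message passing over bounded channels is what lets a slow consumer throttle a fast producer, which is the basis of backpressure. The sketch below illustrates the idea with the standard library's `std::sync::mpsc::sync_channel` and OS threads as a stand-in for Tokio's async bounded channels, so it runs without external crates. `run_pipeline` is a hypothetical name, and the summing consumer is only a placeholder for real event processing.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Sends the events 1..=n through a bounded channel and returns their sum as
/// seen by the consumer. The producer blocks whenever `capacity` events are queued.
fn run_pipeline(n: u64, capacity: usize) -> u64 {
    let (tx, rx) = sync_channel::<u64>(capacity);

    let producer = thread::spawn(move || {
        for event in 1..=n {
            // Blocks while the buffer is full, which throttles the producer.
            tx.send(event).expect("consumer hung up");
        }
        // `tx` is dropped here, which closes the channel.
    });

    // The iterator ends once the sender is dropped and the buffer is drained.
    let total: u64 = rx.iter().sum();
    producer.join().expect("producer panicked");
    total
}
```

Calling `run_pipeline(100, 4)` returns 5050 regardless of the capacity chosen; the capacity only bounds how far the producer can run ahead of the consumer, and therefore how much memory queued events can occupy.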

Summary

This architecture document provides comprehensive visual representations of:

System Architecture - High-level component overview
Data Flow - Event processing pipeline with watermarks
Security - Multi-layered defense in depth approach
State Management - Checkpoint lifecycle and fault tolerance
Deployment Options - From dev to enterprise-scale
API Interactions - Request flow through security layers
Kubernetes Setup - Production-ready container orchestration
Monitoring Stack - Observability and alerting infrastructure
Technology Stack - Modern Rust ecosystem choices

These diagrams support the Series A fundraising materials and give technical decision-makers a clear understanding of the platform’s architecture, scalability, and production readiness.

Key Architectural Advantages:

Modular Design: Each component can scale independently
Defense in Depth: Multiple security layers (rate limiting → auth → RBAC → encryption)
Fault Tolerance: Automatic recovery from failures with zero data loss
Cloud Native: Kubernetes-ready with multi-cloud KMS support
Observable: Comprehensive metrics, logging, and tracing
Performant: Rust + Arrow + async I/O = high throughput, low latency