How a B2B SaaS platform replaced 200+ RDS instances with a single HeliosDB-Lite deployment using copy-on-write branching.
A B2B SaaS company provides a project management platform used by over 200 companies ranging from 10-person startups to 2,000-person enterprises. Each tenant has between 500 and 150,000 records across projects, tasks, time entries, documents, and team configurations. The platform had been running on AWS RDS PostgreSQL since its founding, starting with a shared-database model where all tenants lived in a single PostgreSQL instance with a tenant_id column on every table.
As the company grew past 100 tenants, the shared-database model started breaking down. Large tenants' queries caused lock contention that affected smaller tenants. A single tenant accidentally running an unfiltered SELECT * on the time entries table would saturate the connection pool. Row-Level Security policies added complexity but not true isolation -- a bug in any query could theoretically expose cross-tenant data. After a near-miss where a reporting query briefly returned rows belonging to another tenant (caught in QA, never reached production), the CTO mandated a move to database-per-tenant isolation.
The database-per-tenant migration solved the isolation problem but created a cost problem. Each tenant got their own RDS instance (db.t3.medium minimum for production SLA), and the monthly bill went from $2,800 for a single large RDS instance to $18,400 for 200+ individual instances. Worse, provisioning a new tenant took 30 seconds -- spinning up an RDS instance, running migrations, and seeding default data -- an eternity when a prospect was signing up for a free trial. Schema migrations became a nightmare: rolling out a single ALTER TABLE meant orchestrating changes across 200+ independent databases, and any failure left the fleet in an inconsistent state.
| Problem | Impact |
|---|---|
| Linear cost scaling | Each tenant = one RDS instance ($72-$180/month minimum) |
| $18.4K/month for 200 tenants | Projected $50K+ at 500 tenants; unsustainable unit economics |
| 30-second tenant provisioning | Free trial signup conversion dropped 15% due to loading time |
| Schema migration across 200+ DBs | 4-6 hours of orchestrated rolling updates per migration |
| Migration failure rate | ~3% of instances failed per migration, requiring manual intervention |
| Idle resource waste | 60% of tenant instances averaged <5% CPU utilization |
| Backup management | 200+ individual backup schedules and retention policies |
| Connection pool exhaustion | Each instance maintained a warm pool; 200 pools = 2,000+ idle connections |
The team replaced their fleet of 200+ RDS instances with a single HeliosDB-Lite deployment using database branching for tenant isolation. Each tenant gets their own branch -- a lightweight, copy-on-write fork of a template database. Branches share unchanged data pages, so 200 tenants with similar schemas use a fraction of the storage that 200 independent databases would require.
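The page-sharing claim is easiest to see in miniature. The sketch below is a toy Python model of copy-on-write branching -- illustrative only, and not a description of HeliosDB-Lite's actual storage engine: reads fall through to the parent branch until a page is written, at which point only that page is materialized on the child.

```python
# Toy model of copy-on-write branching (illustrative only; this is not
# HeliosDB-Lite's storage engine). Each branch stores only the pages it
# has modified; unmodified reads fall through to the parent branch.

class Branch:
    def __init__(self, parent=None):
        self.parent = parent
        self.pages = {}  # page_id -> contents written on THIS branch

    def read(self, page_id):
        # Walk up the branch chain until some ancestor owns the page.
        branch = self
        while branch is not None:
            if page_id in branch.pages:
                return branch.pages[page_id]
            branch = branch.parent
        raise KeyError(page_id)

    def write(self, page_id, contents):
        # Copy-on-write: the page is materialized on this branch only.
        self.pages[page_id] = contents

    def owned_pages(self):
        return len(self.pages)


# A template with 1,000 pages, forked for 200 tenants.
template = Branch()
for page in range(1000):
    template.write(page, f"seed-{page}")

tenants = [Branch(parent=template) for _ in range(200)]
tenants[0].write(0, "tenant-0 edit")  # only this one page is copied

# Forking was free: a tenant branch owns zero pages until it writes.
total_pages = template.owned_pages() + sum(t.owned_pages() for t in tenants)
print(total_pages)         # 1001, not 200 * 1000
print(tenants[1].read(0))  # "seed-0" -- unchanged, read from template
print(tenants[0].read(0))  # "tenant-0 edit" -- diverged copy
```

Storage grows with the number of modified pages, not with the number of tenants -- which is why 200 branches of a shared template cost a fraction of 200 full database copies.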
The deployment uses a single HeliosDB-Lite server with a template branch containing the base schema and default data. When a new tenant signs up, the platform creates a branch from the template in under 500 milliseconds. Schema migrations are applied to the template branch first, then each tenant branch is rebased -- or, for breaking changes, a new branch is created from the updated template and data is migrated.
-- The template branch contains the base schema and seed data.
-- It is maintained as the "golden master" for all tenants.
-- Step 1: Create the template (done once, updated with each migration)
CREATE BRANCH template_v47;
-- Switch to the template branch to set up the schema
USE BRANCH template_v47;
CREATE TABLE projects (
id BIGSERIAL PRIMARY KEY,
name VARCHAR(200) NOT NULL,
description TEXT,
status VARCHAR(20) DEFAULT 'active',
owner_id BIGINT NOT NULL,
settings JSONB DEFAULT '{}',
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW()
);
CREATE TABLE tasks (
id BIGSERIAL PRIMARY KEY,
project_id BIGINT NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
title VARCHAR(500) NOT NULL,
description TEXT,
status VARCHAR(20) DEFAULT 'todo',
priority INT DEFAULT 0,
assignee_id BIGINT,
due_date DATE,
estimated_hours DECIMAL(6, 2),
tags JSONB DEFAULT '[]',
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW()
);
CREATE TABLE time_entries (
id BIGSERIAL PRIMARY KEY,
task_id BIGINT NOT NULL REFERENCES tasks(id) ON DELETE CASCADE,
user_id BIGINT NOT NULL,
started_at TIMESTAMP NOT NULL,
ended_at TIMESTAMP,
duration_minutes INT,
notes TEXT,
billable BOOLEAN DEFAULT TRUE
);
CREATE TABLE team_members (
id BIGSERIAL PRIMARY KEY,
email VARCHAR(255) NOT NULL UNIQUE,
display_name VARCHAR(200) NOT NULL,
role VARCHAR(50) DEFAULT 'member',
settings JSONB DEFAULT '{}',
created_at TIMESTAMP DEFAULT NOW()
);
-- Indexes
CREATE INDEX idx_tasks_project ON tasks (project_id, status);
CREATE INDEX idx_tasks_assignee ON tasks (assignee_id) WHERE status != 'done';
CREATE INDEX idx_time_task ON time_entries (task_id, started_at DESC);
-- Seed default data (project templates, default settings, etc.)
INSERT INTO projects (name, description, status, owner_id, settings)
VALUES ('Getting Started', 'Your first project', 'active', 0,
'{"template": true, "tutorial": true}');
-- Step 2: Create a tenant branch (takes <500ms)
-- This is a copy-on-write fork -- no data is physically copied.
CREATE BRANCH tenant_acme FROM template_v47;
-- The tenant immediately has the full schema + seed data.
-- Writes go to the tenant's branch; reads fall through to shared pages.
import { Pool } from "pg";
interface TenantInfo {
tenantId: string;
companyName: string;
plan: "free" | "pro" | "enterprise";
branchName: string;
createdAt: Date;
}
interface ProvisioningResult {
tenantId: string;
branchName: string;
provisioningMs: number;
status: "ready" | "error";
}
class TenantManager {
private adminPool: Pool;
private tenantPools: Map<string, Pool> = new Map();
constructor(heliosDsn: string) {
// Admin pool for branch management operations
this.adminPool = new Pool({
connectionString: heliosDsn,
max: 10,
});
}
/**
* Provision a new tenant by creating a branch from the template.
*
* Previous architecture (AWS RDS):
* 1. CreateDBInstance API call (15-25s)
* 2. Wait for instance available (5-10s)
* 3. Run schema migrations (2-3s)
* 4. Seed default data (1-2s)
* Total: ~30 seconds
*
* HeliosDB branching:
* 1. CREATE BRANCH (copy-on-write fork) (<500ms)
* Total: <500ms
*/
async provisionTenant(
tenantId: string,
companyName: string,
plan: "free" | "pro" | "enterprise"
): Promise<ProvisioningResult> {
    const start = Date.now();
    // Branch names are interpolated into DDL below, so reject tenant IDs
    // containing characters outside a safe set before building the name.
    if (!/^[a-z0-9_-]+$/.test(tenantId)) {
      throw new Error(`Invalid tenant id: ${tenantId}`);
    }
    const branchName = `tenant_${tenantId}`;
    const client = await this.adminPool.connect();
try {
// Create a copy-on-write branch from the template.
// This is nearly instantaneous regardless of template size
// because no data is physically copied.
await client.query(`CREATE BRANCH ${branchName} FROM template_v47`);
// Switch to the new branch and customize for this tenant
await client.query(`USE BRANCH ${branchName}`);
// Set tenant-specific configuration
await client.query(
`
INSERT INTO team_members (email, display_name, role, settings)
VALUES ($1, $2, 'admin', $3)
`,
[
`admin@${tenantId}.example.com`,
`${companyName} Admin`,
JSON.stringify({ plan, onboarding: true }),
]
);
// Update the getting-started project for this tenant
await client.query(
`
UPDATE projects
SET settings = settings || $1::jsonb
WHERE settings->>'template' = 'true'
`,
[JSON.stringify({ company_name: companyName, plan })]
);
const elapsed = Date.now() - start;
return {
tenantId,
branchName,
provisioningMs: elapsed,
status: "ready",
};
    } catch (err) {
      console.error(`Failed to provision tenant ${tenantId}:`, err);
      // Clean up on failure. Switch back to the template first, since the
      // session may still be sitting on the half-provisioned branch.
      try {
        await client.query(`USE BRANCH template_v47`);
        await client.query(`DROP BRANCH IF EXISTS ${branchName}`);
      } catch {
        // Best effort cleanup
      }
      return {
        tenantId,
        branchName,
        provisioningMs: Date.now() - start,
        status: "error",
      };
    } finally {
      // Reset the session branch so the pooled admin connection does not
      // leak tenant-branch state to later callers.
      try {
        await client.query(`USE BRANCH template_v47`);
      } catch {
        // Ignore -- an unusable connection is discarded by the pool.
      }
      client.release();
    }
}
/**
* Get a connection pool for a specific tenant's branch.
* Pools are created lazily and cached.
*/
getTenantPool(tenantId: string): Pool {
const branchName = `tenant_${tenantId}`;
if (!this.tenantPools.has(tenantId)) {
const pool = new Pool({
connectionString: `postgresql://app:password@localhost:5433/saasdb?options=-c%20helios.branch=${branchName}`,
max: 5, // Per-tenant pool size -- much smaller than RDS per-instance
idleTimeoutMillis: 30000,
});
this.tenantPools.set(tenantId, pool);
}
return this.tenantPools.get(tenantId)!;
}
/**
* Execute a query in a tenant's isolated branch.
* Complete data isolation -- no tenant_id filters needed.
*/
async queryTenant<T>(
tenantId: string,
sql: string,
params?: any[]
): Promise<T[]> {
const pool = this.getTenantPool(tenantId);
const result = await pool.query(sql, params);
return result.rows as T[];
}
/**
* Delete a tenant and all their data by dropping the branch.
* Instant, regardless of data volume.
*/
async deprovisionTenant(tenantId: string): Promise<void> {
const branchName = `tenant_${tenantId}`;
// Close and remove the connection pool
const pool = this.tenantPools.get(tenantId);
if (pool) {
await pool.end();
this.tenantPools.delete(tenantId);
}
// Drop the branch -- instant, frees copy-on-write pages
const client = await this.adminPool.connect();
try {
await client.query(`DROP BRANCH ${branchName}`);
} finally {
client.release();
}
}
}
// Usage: Tenant provisioning during free trial signup
const manager = new TenantManager(
"postgresql://admin:password@localhost:5433/saasdb"
);
// This completes in <500ms -- the prospect sees their dashboard immediately
const result = await manager.provisionTenant(
"acme-corp",
"Acme Corporation",
"free"
);
console.log(`Provisioned in ${result.provisioningMs}ms`);
// Querying is standard SQL -- no tenant_id filters needed
// The branch provides complete isolation at the database level
const projects = await manager.queryTenant<{ name: string; status: string }>(
"acme-corp",
"SELECT name, status FROM projects ORDER BY created_at DESC LIMIT 10"
);
The most operationally impactful capability -- migrating 200+ tenant databases safely:
import psycopg2
import time
from dataclasses import dataclass
from typing import Optional
from concurrent.futures import ThreadPoolExecutor, as_completed
@dataclass
class MigrationResult:
tenant_id: str
status: str # 'success', 'failed', 'rolled_back'
duration_ms: float
error: Optional[str] = None
class BranchMigrator:
"""
Performs zero-downtime schema migrations across all tenant branches.
Strategy:
1. Apply migration to the template branch (golden master)
2. For each tenant, create a migration branch, apply changes, verify, swap
3. If any step fails, the tenant stays on their original branch
Previous approach (200 independent RDS instances):
- Serial execution: 4-6 hours
- ~3% failure rate requiring manual intervention
- Risky: failed migration left database in inconsistent state
HeliosDB branching approach:
- Parallel execution: 2-3 minutes for 200 tenants
- 0% data loss risk: original branch preserved until verified
- Atomic swap: tenant sees old or new schema, never a partial state
"""
def __init__(self, dsn: str, max_workers: int = 20):
self.dsn = dsn
self.max_workers = max_workers
def migrate_all_tenants(
self, migration_sql: str, version: int
) -> list[MigrationResult]:
"""Apply a migration to all tenant branches in parallel."""
# Step 1: Apply to template first
print(f"Applying migration v{version} to template...")
self._apply_to_template(migration_sql, version)
# Step 2: Get all tenant branches
tenant_ids = self._list_tenant_branches()
print(f"Migrating {len(tenant_ids)} tenant branches...")
# Step 3: Migrate tenants in parallel
results = []
with ThreadPoolExecutor(max_workers=self.max_workers) as executor:
futures = {
executor.submit(
self._migrate_tenant, tid, migration_sql, version
): tid
for tid in tenant_ids
}
for future in as_completed(futures):
result = future.result()
results.append(result)
symbol = "OK" if result.status == "success" else "FAIL"
print(
f" [{symbol}] {result.tenant_id} "
f"({result.duration_ms:.0f}ms)"
)
# Summary
succeeded = sum(1 for r in results if r.status == "success")
failed = sum(1 for r in results if r.status != "success")
print(f"\nMigration v{version} complete: "
f"{succeeded} succeeded, {failed} failed")
return results
def _apply_to_template(self, migration_sql: str, version: int) -> None:
"""Apply migration to the template branch (golden master)."""
conn = psycopg2.connect(self.dsn)
try:
with conn:
with conn.cursor() as cur:
# Update template to new version
cur.execute(
f"CREATE BRANCH template_v{version} "
f"FROM template_v{version - 1}"
)
cur.execute(f"USE BRANCH template_v{version}")
cur.execute(migration_sql)
finally:
conn.close()
def _migrate_tenant(
self, tenant_id: str, migration_sql: str, version: int
) -> MigrationResult:
"""
Migrate a single tenant using branch-and-swap strategy.
1. Create a migration branch from the tenant's current branch
2. Apply the migration SQL
3. Run verification queries
4. Swap the tenant to the migrated branch
5. Keep old branch for 24h rollback window
"""
start = time.time()
conn = psycopg2.connect(self.dsn)
old_branch = f"tenant_{tenant_id}"
new_branch = f"tenant_{tenant_id}_v{version}"
try:
with conn:
with conn.cursor() as cur:
# Create migration branch (copy-on-write, instant)
cur.execute(
f"CREATE BRANCH {new_branch} FROM {old_branch}"
)
# Apply migration on the new branch
cur.execute(f"USE BRANCH {new_branch}")
cur.execute(migration_sql)
# Verify: check row counts match (data integrity)
cur.execute(f"USE BRANCH {old_branch}")
cur.execute(
"SELECT COUNT(*) FROM projects"
)
old_count = cur.fetchone()[0]
cur.execute(f"USE BRANCH {new_branch}")
cur.execute(
"SELECT COUNT(*) FROM projects"
)
new_count = cur.fetchone()[0]
if old_count != new_count:
raise ValueError(
f"Row count mismatch: {old_count} vs {new_count}"
)
# Swap: rename branches atomically
cur.execute(
f"ALTER BRANCH {old_branch} "
f"RENAME TO {old_branch}_retired_v{version}"
)
cur.execute(
f"ALTER BRANCH {new_branch} "
f"RENAME TO {old_branch}"
)
elapsed = (time.time() - start) * 1000
return MigrationResult(tenant_id, "success", elapsed)
except Exception as e:
elapsed = (time.time() - start) * 1000
# Rollback: drop the migration branch, tenant stays on original
try:
with conn:
with conn.cursor() as cur:
cur.execute(f"DROP BRANCH IF EXISTS {new_branch}")
except Exception:
pass
return MigrationResult(tenant_id, "failed", elapsed, str(e))
finally:
conn.close()
def _list_tenant_branches(self) -> list[str]:
"""List all active tenant branch names."""
conn = psycopg2.connect(self.dsn)
try:
with conn.cursor() as cur:
                cur.execute(r"""
                    SELECT branch_name
                    FROM helios_branches()
                    WHERE branch_name LIKE 'tenant\_%'
                      AND branch_name NOT LIKE '%\_retired\_%'
                      AND branch_name NOT LIKE '%\_failed\_%'
                    ORDER BY branch_name
                """)
                # Strip only the leading prefix; str.replace("tenant_", "")
                # would also mangle tenant ids that contain "tenant_".
                return [row[0].removeprefix("tenant_") for row in cur.fetchall()]
finally:
conn.close()
def rollback_tenant(self, tenant_id: str, version: int) -> None:
"""
Rollback a single tenant to their pre-migration branch.
The retired branch is preserved for exactly this purpose.
"""
conn = psycopg2.connect(self.dsn)
try:
with conn:
with conn.cursor() as cur:
current = f"tenant_{tenant_id}"
retired = f"tenant_{tenant_id}_retired_v{version}"
# Swap back
cur.execute(
f"ALTER BRANCH {current} "
f"RENAME TO tenant_{tenant_id}_failed_v{version}"
)
cur.execute(
f"ALTER BRANCH {retired} RENAME TO {current}"
)
finally:
conn.close()
# Usage: Adding a 'priority_label' column to tasks table
migrator = BranchMigrator(
"postgresql://admin:password@localhost:5433/saasdb",
max_workers=20,
)
migration_sql = """
ALTER TABLE tasks ADD COLUMN priority_label VARCHAR(20);
UPDATE tasks SET priority_label = CASE
WHEN priority >= 8 THEN 'critical'
WHEN priority >= 5 THEN 'high'
WHEN priority >= 3 THEN 'medium'
ELSE 'low'
END;
CREATE INDEX idx_tasks_priority_label ON tasks (priority_label)
WHERE status != 'done';
"""
results = migrator.migrate_all_tenants(migration_sql, version=48)
# If any tenant had issues, roll them back individually
for r in results:
if r.status == "failed":
print(f"Rolling back {r.tenant_id}: {r.error}")
migrator.rollback_tenant(r.tenant_id, version=48)
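The migrator's docstring promises a 24-hour rollback window, but nothing above ever reclaims the retired branches, so they would accumulate. A periodic cleanup job could drop them once the window passes; the selection logic is sketched below. Recording when each branch was retired (for example, in an ops table written during the swap) is an assumption here -- the code above does not show where that timestamp would come from.

```python
# Sketch: selecting retired branches whose 24-hour rollback window has
# passed, so a periodic job can DROP BRANCH them. The (name, retired_at)
# pairs are assumed to come from bookkeeping done at swap time; that
# bookkeeping is an assumption, not shown in the migration code.

import re
from datetime import datetime, timedelta

RETIRED = re.compile(r"^tenant_.+_retired_v\d+$")

def expired_retired_branches(branches, now, window=timedelta(hours=24)):
    """branches: iterable of (branch_name, retired_at) pairs."""
    return [
        name
        for name, retired_at in branches
        if RETIRED.match(name) and now - retired_at >= window
    ]

now = datetime(2024, 6, 2, 12, 0)
branches = [
    ("tenant_acme", datetime(2024, 5, 1)),                    # live branch: never matched
    ("tenant_acme_retired_v48", datetime(2024, 6, 1, 9)),     # retired >24h ago
    ("tenant_globex_retired_v48", datetime(2024, 6, 2, 11)),  # still in the window
]
print(expired_retired_branches(branches, now))
# ['tenant_acme_retired_v48']
```

Each name returned would then be dropped with `DROP BRANCH IF EXISTS`, mirroring the cleanup path in `_migrate_tenant`.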
Queries that run within a tenant's isolated branch -- no WHERE tenant_id = ? needed:
-- Dashboard query: project health overview
-- Runs in the tenant's branch -- complete isolation, no tenant_id filter
-- Time entries are pre-aggregated per task; joining them row-by-row would
-- fan out the tasks join and inflate every count and sum.
WITH task_time AS (
    SELECT task_id, SUM(duration_minutes) AS minutes_logged
    FROM time_entries
    GROUP BY task_id
)
SELECT
    p.id,
    p.name,
    p.status,
    COUNT(t.id) AS total_tasks,
    SUM(CASE WHEN t.status = 'done' THEN 1 ELSE 0 END) AS completed_tasks,
    SUM(CASE WHEN t.status = 'done' THEN 1 ELSE 0 END)::DECIMAL
        / NULLIF(COUNT(t.id), 0) * 100 AS completion_pct,
    SUM(CASE WHEN t.due_date < CURRENT_DATE AND t.status != 'done'
        THEN 1 ELSE 0 END) AS overdue_tasks,
    COALESCE(SUM(tt.minutes_logged), 0) / 60.0 AS total_hours_logged,
    COALESCE(SUM(t.estimated_hours), 0) AS total_estimated_hours
FROM projects p
LEFT JOIN tasks t ON t.project_id = p.id
LEFT JOIN task_time tt ON tt.task_id = t.id
WHERE p.status = 'active'
GROUP BY p.id, p.name, p.status
ORDER BY overdue_tasks DESC, completion_pct ASC;
-- Team utilization report: hours by member over the past 4 weeks
WITH weekly_hours AS (
SELECT
tm.display_name,
tm.role,
DATE_TRUNC('week', te.started_at) AS week,
SUM(te.duration_minutes) / 60.0 AS hours_worked,
SUM(CASE WHEN te.billable THEN te.duration_minutes ELSE 0 END)
/ 60.0 AS billable_hours
FROM team_members tm
JOIN time_entries te ON te.user_id = tm.id
WHERE te.started_at >= NOW() - INTERVAL '4 weeks'
GROUP BY tm.display_name, tm.role, DATE_TRUNC('week', te.started_at)
)
SELECT
display_name,
role,
ROUND(AVG(hours_worked)::DECIMAL, 1) AS avg_weekly_hours,
ROUND(AVG(billable_hours)::DECIMAL, 1) AS avg_billable_hours,
ROUND(
AVG(billable_hours) / NULLIF(AVG(hours_worked), 0) * 100, 1
) AS billable_pct,
ROUND(MAX(hours_worked)::DECIMAL, 1) AS peak_week_hours
FROM weekly_hours
GROUP BY display_name, role
ORDER BY avg_weekly_hours DESC;
-- Cross-project task search with JSONB tag filtering
SELECT
t.id,
t.title,
t.status,
t.priority_label,
p.name AS project_name,
tm.display_name AS assignee,
t.due_date,
t.tags
FROM tasks t
JOIN projects p ON t.project_id = p.id
LEFT JOIN team_members tm ON t.assignee_id = tm.id
WHERE t.status != 'done'
AND t.tags @> '["urgent"]'::jsonb
ORDER BY t.priority DESC, t.due_date ASC NULLS LAST;
| Metric | Before (200 RDS Instances) | After (HeliosDB-Lite Branching) | Improvement |
|---|---|---|---|
| Monthly infrastructure cost | $18,400 | $2,700 | 85% reduction |
| Cost per tenant | $92 | $13.50 | 85% reduction |
| Projected cost at 500 tenants | $46,000 | $4,200 | 91% reduction |
| Tenant provisioning time | 30 seconds | 0.5 seconds | 98% faster |
| Schema migration (200 tenants) | 4-6 hours (serial) | 2-3 minutes (parallel) | 99% faster |
| Migration failure rate | ~3% per migration | 0% (branch-and-swap) | Eliminated |
| Migration rollback time | 30-60 minutes (restore from backup) | <1 second (branch rename) | Near-instant |
| Idle connection pools | 2,000+ connections | 200 connections (shared) | 90% reduction |
| Storage efficiency | 200 full copies | Copy-on-write (shared pages) | ~70% reduction |
| Backup management | 200 individual schedules | 1 unified backup | 99.5% simpler |
| Trial signup conversion | 72% (30s wait) | 89% (<1s wait) | +17 percentage points |
| Cost Category | Before (Monthly) | After (Monthly) | Notes |
|---|---|---|---|
| RDS instances (200x db.t3.medium) | $14,400 | $0 | Replaced by HeliosDB-Lite |
| RDS storage (200x 20GB gp3) | $1,600 | $0 | Copy-on-write branching |
| RDS backups | $800 | $0 | Single unified backup |
| RDS data transfer | $600 | $0 | Local connections |
| Migration tooling (Step Functions) | $200 | $0 | Built-in branching |
| Connection pooling (PgBouncer on EC2) | $800 | $0 | Not needed |
| HeliosDB-Lite compute (2x c6g.2xlarge) | $0 | $2,200 | Primary + standby |
| HeliosDB-Lite storage (2TB gp3) | $0 | $320 | Shared via COW |
| Monitoring | $0 | $180 | Prometheus + Grafana |
| Total | $18,400 | $2,700 | 85% reduction |
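The headline per-tenant figures follow directly from the cost table. A quick arithmetic check -- all figures are taken from the tables above, not from an independent cost model:

```python
# Sanity-checking the unit economics from the cost tables above. Under
# instance-per-tenant, cost scales linearly with tenant count; under
# branching, a fixed cluster cost is amortized across all tenants.

RDS_TOTAL = 18_400                 # 200 tenants on RDS, per the cost table
HELIOS_TOTAL = 2_200 + 320 + 180   # compute + storage + monitoring

tenants = 200
print(RDS_TOTAL / tenants)     # 92.0 dollars per tenant
print(HELIOS_TOTAL / tenants)  # 13.5 dollars per tenant
print(round((1 - HELIOS_TOTAL / RDS_TOTAL) * 100))  # 85 percent saved

# The 500-tenant projection assumes per-instance costs stay linear while
# the branching cluster grows only modestly (figures from the results
# table, not derived independently).
print(46_000 / 500, 4_200 / 500)  # 92.0 vs 8.4 dollars per tenant
```

The asymmetry is the point: the RDS line stays at $92/tenant at any scale, while the branching deployment's per-tenant cost falls as tenants are added.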
Removing the tenant_id filter from every query also reduced bugs. With branch-based isolation, there is no possibility of cross-tenant data leakage from a missing WHERE clause. Every query runs in a tenant's isolated branch by default. This eliminated an entire class of security bugs and simplified the application code significantly -- no more RLS policies, no more mandatory tenant_id parameters on every database function.

| Component | Technology |
|---|---|
| Database | HeliosDB-Lite (server mode, 2-node primary/standby) |
| Multi-tenancy | Database branching (copy-on-write) |
| Client Protocol | PostgreSQL wire protocol (SCRAM-SHA-256 auth) |
| Application | TypeScript (Next.js), Python (migration tooling) |
| Driver | pg (Node.js), psycopg2 (Python) |
| Deployment | AWS EC2 (c6g.2xlarge ARM64), gp3 EBS storage |
| Monitoring | Prometheus + Grafana (branch-level metrics) |
| CI/CD | GitHub Actions (migration testing against template branch) |
| Previous Stack (replaced) | 200+ AWS RDS PostgreSQL instances, PgBouncer, AWS Step Functions (migration orchestration) |
Schedule a demo to see how HeliosDB can transform your data infrastructure.