Zero-Downtime Schema Migration Architecture
Document Version: 1.0
Created: November 9, 2025
Status: Phase 2 Architecture Design
Author: System Architecture Designer
Executive Summary
This document defines the complete architecture for HeliosDB’s zero-downtime schema migration system, enabling online DDL operations during live traffic without table locks or application downtime. The system is inspired by proven tools like GitHub’s gh-ost and Percona’s pt-online-schema-change, but extends them with distributed capabilities, AI-driven optimization, and native integration with HeliosDB’s multi-model architecture.
Key Capabilities
- Online DDL: ADD/DROP/MODIFY COLUMN, ADD/DROP INDEX without blocking operations
- Ghost Table Approach: Shadow table creation with trigger-based replication
- Atomic Cutover: Millisecond switchover with automatic rollback
- Zero Data Loss: Transaction-consistent migration with validation
- Performance Monitoring: Real-time overhead tracking and throttling
- Multi-Model Support: Works across document, graph, time-series, and relational models
1. System Architecture
1.1 High-Level Architecture
```
Application Layer
  SQL Client ── GraphQL API ── REST API
                    │
                    ▼
  Migration Coordinator (Router)
    • Route to original or ghost table
    • Handle dual-write during migration
                    │
                    ▼
Zero-Downtime Migration Engine
  Migration Orchestrator
    • State machine management
    • Phase transitions
    • Rollback coordination
  Ghost Table Manager · Copy Engine · Trigger Manager · Validator Engine
  Migration State Store
    • Migration metadata (state, progress, metrics)
    • Cutover checkpoints
    • Rollback snapshots
                    │
                    ▼
Storage Layer
  Original Table ◄── Trigger Layer (Dual Write) ──► Ghost Table (_new_xxx)
    • INSERT/UPDATE/DELETE capture
    • Async replication to ghost
  HeliosDB Storage Engine (LSM Tree, WAL, MVCC, Transaction Manager)
```
1.2 Component Breakdown
1.2.1 Migration Coordinator
- Purpose: Routes queries to appropriate table during migration
- Responsibilities:
- Query routing (original vs ghost table)
- Dual-write coordination
- Read/write splitting during cutover
- Performance monitoring and throttling
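The coordinator's routing rule can be sketched as a small decision function. This is a hypothetical sketch; `Route`, `Phase`, and `route` are illustrative names, not HeliosDB APIs:

```rust
/// Which physical table a statement should touch while a migration is live.
#[derive(Debug, PartialEq)]
enum Route {
    OriginalOnly, // reads (and all traffic when no migration is running)
    DualWrite,    // write original; triggers replicate to the ghost
    GhostOnly,    // after cutover the ghost has become the table
}

#[derive(Clone, Copy, PartialEq)]
enum Phase { Idle, Copying, Synchronizing, CutoverDone }

/// Route a statement based on the migration phase and statement kind.
fn route(phase: Phase, is_write: bool) -> Route {
    match phase {
        Phase::Idle => Route::OriginalOnly,
        // During copy/sync, reads stay on the original; writes hit the
        // original and are captured by triggers for async replication.
        Phase::Copying | Phase::Synchronizing => {
            if is_write { Route::DualWrite } else { Route::OriginalOnly }
        }
        Phase::CutoverDone => Route::GhostOnly,
    }
}
```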
1.2.2 Migration Orchestrator
- Purpose: Manages migration lifecycle and state transitions
- Responsibilities:
- State machine execution
- Phase orchestration (Planning → Executing → Validating → Cutover)
- Error handling and rollback coordination
- Progress tracking and reporting
1.2.3 Ghost Table Manager
- Purpose: Creates and manages shadow tables
- Responsibilities:
- Ghost table creation with new schema
- Naming convention: `_helios_ghost_{table_name}_{timestamp}`
- Cleanup after successful cutover
- Preservation for rollback
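Under the stated convention, ghost and backup table names can be derived mechanically. A minimal sketch; the `YYYYMMDD` timestamp format is an assumption based on the examples later in this document:

```rust
/// Ghost-table name per the documented convention:
/// _helios_ghost_{table_name}_{timestamp}.
fn ghost_table_name(table: &str, yyyymmdd: &str) -> String {
    format!("_helios_ghost_{}_{}", table, yyyymmdd)
}

/// Backup name used for the original table after cutover.
fn old_table_name(table: &str, yyyymmdd: &str) -> String {
    format!("_helios_old_{}_{}", table, yyyymmdd)
}
```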
1.2.4 Copy Engine
- Purpose: Efficiently copy data from original to ghost table
- Responsibilities:
- Chunked, resumable bulk copy
- Throttling to avoid overwhelming storage
- Progress tracking (rows copied, estimated time remaining)
- Delta detection for incremental catch-up
1.2.5 Trigger Manager
- Purpose: Synchronize ongoing changes during migration
- Responsibilities:
- Install INSERT/UPDATE/DELETE triggers on original table
- Capture changes to changelog table
- Async replication to ghost table
- Conflict resolution (last-write-wins)
1.2.6 Validator Engine
- Purpose: Ensure data consistency between original and ghost
- Responsibilities:
- Row count validation
- Checksum comparison (chunk-wise)
- Sample-based deep comparison
- Index validation
1.2.7 Migration State Store
- Purpose: Persistent storage for migration metadata
- Responsibilities:
- Migration state persistence
- Progress checkpoints
- Rollback snapshots
- Audit trail
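As a sketch of what the state store might persist per migration (field names are illustrative, not the actual catalog schema):

```rust
/// One row of migration metadata as the state store might persist it.
#[derive(Debug, Clone)]
struct MigrationRecord {
    id: String,
    table: String,
    state: String,           // e.g. "COPYING"
    rows_copied: u64,
    rows_total: u64,
    last_checkpoint_lsn: u64, // resume point after a crash
}

impl MigrationRecord {
    /// Percent complete, guarding against empty tables.
    fn percent_complete(&self) -> f64 {
        if self.rows_total == 0 {
            100.0
        } else {
            self.rows_copied as f64 * 100.0 / self.rows_total as f64
        }
    }
}
```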
2. Migration Lifecycle State Machine
2.1 State Diagram
```
        ┌──────────────┐
        │   CREATED    │
        └──────┬───────┘
               │ start_migration()
               ▼
        ┌──────────────┐
        │   PLANNING   │──────────► FAILED (validation error)
        └──────┬───────┘
               │ validate()
               ▼
        ┌──────────────┐
        │  PREPARING   │
        │ (ghost table)│
        └──────┬───────┘
               │ create_ghost_table()
               ▼
        ┌──────────────┐   pause_migration()
        │   COPYING    │◄───────────────────► PAUSED
        │ (bulk data)  │   resume_migration()
        └──────┬───────┘
               │ bulk_copy_complete()
               ▼
        ┌──────────────┐
        │SYNCHRONIZING │
        │ (incremental)│
        └──────┬───────┘
               │ catch_up_complete()
               ▼
        ┌──────────────┐
        │  VALIDATING  │──────────► FAILED (validation failed)
        └──────┬───────┘
               │ validation_passed()
               ▼
        ┌──────────────┐
        │  READY FOR   │
        │   CUTOVER    │
        └──────┬───────┘
               │ cutover()
               ▼
        ┌──────────────┐
        │   CUTOVER    │
        │(in progress) │
        └──────┬───────┘
               │ cutover_complete()
               ▼
        ┌──────────────┐
        │  COMPLETED   │
        └──────────────┘

rollback() is reachable from FAILED and from any in-flight state:

        ┌──────────────┐
        │ ROLLING BACK │
        └──────┬───────┘
               │ rollback_complete()
               ▼
        ┌──────────────┐
        │ ROLLED BACK  │
        └──────────────┘
```
2.2 State Transitions
| Current State | Event | Next State | Reversible |
|---|---|---|---|
| CREATED | start_migration() | PLANNING | Yes |
| PLANNING | validate() | PREPARING | Yes |
| PREPARING | create_ghost_table() | COPYING | Yes |
| COPYING | bulk_copy_complete() | SYNCHRONIZING | ❌ No* |
| COPYING | pause_migration() | PAUSED | Yes |
| PAUSED | resume_migration() | COPYING | Yes |
| SYNCHRONIZING | catch_up_complete() | VALIDATING | ❌ No* |
| VALIDATING | validation_passed() | READY FOR CUTOVER | Yes |
| READY FOR CUTOVER | cutover() | CUTOVER | Yes (for 5 min) |
| CUTOVER | cutover_complete() | COMPLETED | ❌ No |
| ANY | rollback() | ROLLING BACK | - |
| ROLLING BACK | rollback_complete() | ROLLED BACK | ❌ No |
*Can rollback but will lose incremental copy progress
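The transition table maps directly onto a pure function. A sketch with events as strings for brevity; the names are illustrative:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    Created, Planning, Preparing, Copying, Paused, Synchronizing,
    Validating, ReadyForCutover, Cutover, Completed,
    RollingBack, RolledBack, Failed,
}

/// Legal transitions from the table above; None = illegal event.
fn next_state(current: State, event: &str) -> Option<State> {
    use State::*;
    match (current, event) {
        (Created, "start_migration") => Some(Planning),
        (Planning, "validate") => Some(Preparing),
        (Preparing, "create_ghost_table") => Some(Copying),
        (Copying, "bulk_copy_complete") => Some(Synchronizing),
        (Copying, "pause_migration") => Some(Paused),
        (Paused, "resume_migration") => Some(Copying),
        (Synchronizing, "catch_up_complete") => Some(Validating),
        (Validating, "validation_passed") => Some(ReadyForCutover),
        (ReadyForCutover, "cutover") => Some(Cutover),
        (Cutover, "cutover_complete") => Some(Completed),
        (RollingBack, "rollback_complete") => Some(RolledBack),
        // Terminal states cannot roll back; everything else can.
        (Completed, "rollback") | (RolledBack, "rollback") => None,
        (_, "rollback") => Some(RollingBack),
        _ => None,
    }
}
```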
2.3 Rollback Points
The system supports rollback at different stages with varying complexity:
1. CREATED → PLANNING → PREPARING: Instant rollback
   - No data copied, no triggers installed
   - Simply drop the ghost table (if created)
2. COPYING → SYNCHRONIZING: Fast rollback
   - Drop triggers
   - Drop ghost table
   - Update migration state
   - Estimated time: <10 seconds
3. VALIDATING → READY FOR CUTOVER: Fast rollback
   - Same as above
   - Validation data already discarded
4. CUTOVER (first 5 minutes): Emergency rollback
   - Requires table-rename swap back
   - Reapply missed writes from changelog
   - Estimated time: <30 seconds
   - Hard limit: 5 minutes; after that, rollback is NOT safe
5. COMPLETED: No rollback
   - Original table dropped
   - Must create a new migration to revert the schema
3. Ghost Table Approach
3.1 Ghost Table Lifecycle
The ghost table approach is inspired by GitHub’s gh-ost but adapted for HeliosDB’s distributed architecture:
Step 1: Create Ghost Table

```
Original Table: users
┌─────────┬───────┬──────────┐
│ id (PK) │ name  │ email    │
├─────────┼───────┼──────────┤
│ 1       │ Alice │ a@ex.com │
│ 2       │ Bob   │ b@ex.com │
└─────────┴───────┴──────────┘
        │ CREATE GHOST TABLE
        ▼
Ghost Table: _helios_ghost_users_20251109
┌─────────┬───────┬──────────┬───────┐
│ id (PK) │ name  │ email    │ phone │  ← new column
├─────────┼───────┼──────────┼───────┤
│ (empty) │       │          │       │
└─────────┴───────┴──────────┴───────┘
```
Step 2: Install Triggers (Dual Write)

```
Original Table: users
  Triggers installed:
    • INSERT → changelog
    • UPDATE → changelog
    • DELETE → changelog
        │ async replication
        ▼
Changelog Table
┌────────┬─────┬─────────────────┐
│ lsn    │ op  │ data            │
├────────┼─────┼─────────────────┤
│ 100001 │ INS │ {id:3,name:...} │
│ 100002 │ UPD │ {id:1,email:..} │
└────────┴─────┴─────────────────┘
```
Step 3: Bulk Copy (Chunked)

```
Original Table: users (100M rows)
        │ chunked copy (10K rows/chunk)
        │ throttled (max 50% I/O)
        ▼
Ghost Table: _helios_ghost_users_20251109
┌─────────┬───────┬──────────┬───────┐
│ id (PK) │ name  │ email    │ phone │
├─────────┼───────┼──────────┼───────┤
│ 1       │ Alice │ a@ex.com │ NULL  │
│ 2       │ Bob   │ b@ex.com │ NULL  │
│ ...     │ ...   │ ...      │ NULL  │
│ (99.9% complete)                   │
└─────────┴───────┴──────────┴───────┘
```
Step 4: Catch-up (Apply Changelog)

```
Changelog Table — apply to the ghost table in LSN order
        │ apply
        ▼
Ghost Table: _helios_ghost_users_20251109
┌─────────┬─────────┬──────────┬───────┐
│ id (PK) │ name    │ email    │ phone │
├─────────┼─────────┼──────────┼───────┤
│ 1       │ Alice   │ new@...  │ NULL  │  ← updated
│ 2       │ Bob     │ b@ex.com │ NULL  │
│ 3       │ Charlie │ c@ex.com │ NULL  │  ← inserted
│ (100% in sync)                       │
└─────────┴─────────┴──────────┴───────┘
```
Step 5: Atomic Cutover (Rename Swap)

```sql
BEGIN TRANSACTION;

-- Lock both tables briefly (milliseconds)
LOCK TABLE users, _helios_ghost_users_20251109;

-- Apply final changelog entries
-- (should be <100 rows accumulated during the lock)

-- Atomic rename
ALTER TABLE users RENAME TO _helios_old_users_20251109;
ALTER TABLE _helios_ghost_users_20251109 RENAME TO users;

-- Update metadata catalog
UPDATE helios_migration_state SET status = 'COMPLETED';

COMMIT;
```
Post-Cutover:

```
New Table: users (with new schema)
┌─────────┬─────────┬──────────┬───────┐
│ id (PK) │ name    │ email    │ phone │
├─────────┼─────────┼──────────┼───────┤
│ 1       │ Alice   │ new@...  │ NULL  │
│ 2       │ Bob     │ b@ex.com │ NULL  │
│ 3       │ Charlie │ c@ex.com │ NULL  │
└─────────┴─────────┴──────────┴───────┘

Backup: _helios_old_users_20251109
(kept for 24 hours for emergency rollback)
```
3.2 Trigger-Based Synchronization
HeliosDB uses a changelog table approach to capture changes:
```sql
-- Changelog table structure
CREATE TABLE _helios_changelog_{table_name}_{timestamp} (
    lsn       BIGINT PRIMARY KEY,    -- Log Sequence Number
    operation VARCHAR(10) NOT NULL,  -- INSERT, UPDATE, DELETE
    pk_value  TEXT NOT NULL,         -- Primary key of affected row
    data      JSONB,                 -- Full row data (for INSERT/UPDATE)
    timestamp TIMESTAMP NOT NULL,    -- When the change occurred
    applied   BOOLEAN DEFAULT false  -- Has been applied to the ghost
);

-- Trigger on original table (pseudo-code)
CREATE TRIGGER capture_changes
AFTER INSERT OR UPDATE OR DELETE ON users
FOR EACH ROW
BEGIN
    INSERT INTO _helios_changelog_users_20251109 (
        lsn, operation, pk_value, data, timestamp
    ) VALUES (
        next_lsn(),
        operation_type,
        NEW.id,            -- or OLD.id for DELETE
        row_to_json(NEW),
        NOW()
    );
END;
```

Replication Worker (async process):
- Polls the changelog table for `applied = false` entries
- Applies changes to the ghost table in LSN order
- Marks entries as `applied = true`
- Handles conflicts (last-write-wins based on timestamp)
- Throttles to avoid overwhelming storage
4. Migration Operations Support
4.1 Supported DDL Operations
| Operation | Complexity | Downtime | Data Copy | Notes |
|---|---|---|---|---|
| ADD COLUMN (nullable) | Low | 0ms | No | Metadata-only change |
| ADD COLUMN (NOT NULL) | Medium | ~ms | Yes | Requires default or backfill |
| DROP COLUMN | Low | ~ms | No | Metadata-only, data ignored |
| MODIFY COLUMN (type) | High | ~ms | Yes | Full table rewrite if incompatible |
| RENAME COLUMN | Low | 0ms | No | Metadata-only change |
| ADD INDEX | Medium | 0ms | No | Built concurrently on ghost |
| DROP INDEX | Low | 0ms | No | Metadata-only change |
| ADD PRIMARY KEY | High | ~ms | Yes | Full table rewrite |
| DROP PRIMARY KEY | High | ~ms | Yes | Full table rewrite |
| ADD FOREIGN KEY | Medium | ~ms | No | Validation required |
| DROP FOREIGN KEY | Low | 0ms | No | Metadata-only change |
| ADD UNIQUE CONSTRAINT | Medium | ~ms | No | Validation + index build |
| DROP UNIQUE CONSTRAINT | Low | 0ms | No | Metadata-only change |
4.2 Operation Implementation Details
4.2.1 ADD COLUMN (nullable)
Scenario: Add optional phone column to users table.
Strategy: Fast Path - Metadata Only
```sql
-- Original command
ALTER TABLE users ADD COLUMN phone VARCHAR(20);

-- HeliosDB internal execution
BEGIN TRANSACTION;

-- Update schema metadata
UPDATE helios_catalog.columns
SET schema_version = schema_version + 1
WHERE table_name = 'users';

INSERT INTO helios_catalog.columns (
    table_name, column_name, data_type, nullable, default_value
) VALUES (
    'users', 'phone', 'VARCHAR(20)', true, NULL
);

-- No data migration needed
-- Existing rows will return NULL for the phone column
COMMIT;
```

Overhead: <10ms (metadata write only)
4.2.2 ADD COLUMN (NOT NULL with default)
Scenario: Add mandatory created_at column with default value.
Strategy: Ghost Table Migration
```sql
-- Original command
ALTER TABLE users ADD COLUMN created_at TIMESTAMP NOT NULL DEFAULT NOW();

-- HeliosDB internal execution
-- Step 1: Create ghost table with the new column
CREATE TABLE _helios_ghost_users_20251109 AS
SELECT *, NOW() AS created_at FROM users WHERE 1=0;

-- Step 2: Install triggers for dual write
-- Step 3: Bulk copy with the default value
INSERT INTO _helios_ghost_users_20251109
SELECT *, NOW() AS created_at
FROM users
ORDER BY id
LIMIT 10000;  -- chunked

-- Step 4: Catch up from the changelog
-- Step 5: Atomic cutover
```

Overhead: proportional to table size, ~1-5 minutes for 100M rows
4.2.3 MODIFY COLUMN (type change)
Scenario: Change user_id from INT to BIGINT.
Strategy: Ghost Table with Type Casting
```sql
-- Original command
ALTER TABLE users MODIFY COLUMN user_id BIGINT;

-- HeliosDB internal execution
-- Create ghost table with the new type
CREATE TABLE _helios_ghost_users_20251109 (
    user_id BIGINT,       -- changed from INT
    name    VARCHAR(255),
    email   VARCHAR(255)
);

-- Copy with casting
INSERT INTO _helios_ghost_users_20251109
SELECT CAST(user_id AS BIGINT), name, email
FROM users
LIMIT 10000;  -- chunked
```

Type Compatibility Matrix:
| From → To | Safe? | Strategy |
|---|---|---|
| INT → BIGINT | Yes | Direct cast |
| BIGINT → INT | ⚠ Warning | Cast with overflow check |
| VARCHAR(N) → VARCHAR(M) where M>N | Yes | Direct |
| VARCHAR(N) → VARCHAR(M) where M<N | ❌ No | Error if data > M |
| TIMESTAMP → DATE | ⚠ Warning | Truncation (time lost) |
| JSON → TEXT | Yes | Stringify |
| TEXT → JSON | ⚠ Warning | Parse (may fail) |
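The matrix can be encoded as a small classifier. A sketch covering only the cases listed above; `CastSafety` and `cast_safety` are illustrative names, not HeliosDB APIs:

```rust
#[derive(Debug, PartialEq)]
enum CastSafety {
    Safe,    // direct cast
    Warning, // cast with runtime check (overflow, truncation, parse)
    Unsafe,  // rejected unless data is verified to fit
}

/// Classify a type change per the compatibility matrix above.
/// Lengths only matter for VARCHAR(N) → VARCHAR(M).
fn cast_safety(from: &str, to: &str, from_len: u32, to_len: u32) -> CastSafety {
    match (from, to) {
        ("INT", "BIGINT") | ("JSON", "TEXT") => CastSafety::Safe,
        ("BIGINT", "INT") | ("TIMESTAMP", "DATE") | ("TEXT", "JSON") => {
            CastSafety::Warning
        }
        ("VARCHAR", "VARCHAR") if to_len >= from_len => CastSafety::Safe,
        ("VARCHAR", "VARCHAR") => CastSafety::Unsafe, // M < N: data may not fit
        _ => CastSafety::Unsafe, // unlisted conversions: reject by default
    }
}
```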
4.2.4 ADD INDEX
Scenario: Add index on email column.
Strategy: Concurrent Index Build
```sql
-- Original command
CREATE INDEX idx_users_email ON users(email);

-- HeliosDB internal execution
-- Build the index on the ghost table while it is being populated.
-- No triggers needed: we are adding an index, not changing data.
CREATE INDEX CONCURRENTLY idx_users_email
ON _helios_ghost_users_20251109(email);
```

Overhead: index build time (~1-10 min for 100M rows)
4.2.5 DROP COLUMN
Scenario: Remove deprecated legacy_field column.
Strategy: Ghost Table without Column
```sql
-- Original command
ALTER TABLE users DROP COLUMN legacy_field;

-- HeliosDB internal execution
CREATE TABLE _helios_ghost_users_20251109 (
    user_id BIGINT,
    name    VARCHAR(255),
    email   VARCHAR(255)
    -- legacy_field omitted
);

-- Copy only the required columns
INSERT INTO _helios_ghost_users_20251109
SELECT user_id, name, email  -- exclude legacy_field
FROM users
LIMIT 10000;
```

Overhead: minimal (column ignored during copy)
5. Rollback Capabilities
5.1 One-Click Rollback Design
Philosophy: Rollback should be as easy as starting the migration.
Implementation:
```rust
// Simple rollback API
migration_engine.rollback(migration_id)?;
```

```sql
-- Or via SQL
ROLLBACK MIGRATION 'migration-uuid-1234';
```

5.2 Rollback Mechanisms by Phase
Phase 1-3 (PLANNING → COPYING): Drop Ghost Table
```rust
async fn rollback_early_phase(migration_id: &str) -> Result<()> {
    // 1. Drop triggers if installed
    drop_triggers(migration_id).await?;

    // 2. Drop ghost table
    let ghost_table = format!("_helios_ghost_{}", migration_id);
    execute_sql(&format!("DROP TABLE IF EXISTS {}", ghost_table)).await?;

    // 3. Drop changelog table
    let changelog_table = format!("_helios_changelog_{}", migration_id);
    execute_sql(&format!("DROP TABLE IF EXISTS {}", changelog_table)).await?;

    // 4. Update migration state
    update_migration_state(migration_id, MigrationState::RolledBack).await?;

    Ok(())
}
```

Time: <10 seconds. Risk: none (no production data affected)
Phase 4 (VALIDATING → READY): Same as Phase 1-3
Phase 5 (CUTOVER - First 5 Minutes): Emergency Rename Swap
```rust
async fn rollback_cutover(migration_id: &str) -> Result<()> {
    // Check that we are within the 5-minute window
    let cutover_time = get_cutover_timestamp(migration_id).await?;
    let elapsed = Utc::now() - cutover_time;
    if elapsed > Duration::minutes(5) {
        return Err(Error::RollbackWindowExpired);
    }

    // Begin transaction
    let txn = begin_transaction().await?;

    // 1. Lock both tables
    txn.execute("LOCK TABLE users, _helios_old_users_20251109").await?;

    // 2. Swap names back
    txn.execute("ALTER TABLE users RENAME TO _helios_temp_broken").await?;
    txn.execute("ALTER TABLE _helios_old_users_20251109 RENAME TO users").await?;
    txn.execute("DROP TABLE _helios_temp_broken").await?;

    // 3. Apply missed writes from the changelog
    let changelog = get_cutover_changelog(migration_id).await?;
    for entry in changelog {
        apply_changelog_entry(&txn, &entry).await?;
    }

    // 4. Update metadata
    txn.execute(&format!(
        "UPDATE helios_migration_state SET status = 'ROLLED_BACK' WHERE id = '{}'",
        migration_id
    )).await?;

    txn.commit().await?;

    Ok(())
}
```

Time: 10-30 seconds. Risk: low (within the 5-minute window). Data loss: none (changelog replay ensures consistency)
Phase 6 (COMPLETED): No Rollback
Once migration is marked COMPLETED and old table is dropped, rollback is NOT possible. To revert schema changes:
- Create a new migration with inverse operations
- Execute new migration to restore old schema
Recommendation: Keep _helios_old_{table} for 24 hours before dropping.
5.3 Rollback Safety Checks
```rust
pub struct RollbackSafetyChecker {
    migration_state: MigrationState,
    cutover_timestamp: Option<DateTime<Utc>>,
    old_table_exists: bool,
    changelog_exists: bool,
}

impl RollbackSafetyChecker {
    pub fn can_rollback(&self) -> Result<RollbackFeasibility> {
        match self.migration_state {
            MigrationState::Planning
            | MigrationState::Preparing
            | MigrationState::Copying => Ok(RollbackFeasibility::Safe {
                estimated_time_seconds: 10,
                data_loss_risk: DataLossRisk::None,
            }),

            MigrationState::Synchronizing
            | MigrationState::Validating
            | MigrationState::ReadyForCutover => Ok(RollbackFeasibility::Safe {
                estimated_time_seconds: 15,
                data_loss_risk: DataLossRisk::Low,
            }),

            MigrationState::Cutover => {
                if let Some(cutover_time) = self.cutover_timestamp {
                    let elapsed = Utc::now() - cutover_time;
                    if elapsed <= Duration::minutes(5) {
                        Ok(RollbackFeasibility::Risky {
                            estimated_time_seconds: 30,
                            data_loss_risk: DataLossRisk::Low,
                            warnings: vec![
                                "Within emergency rollback window".to_string(),
                                "Some queries may fail during rollback".to_string(),
                            ],
                        })
                    } else {
                        Err(Error::RollbackWindowExpired)
                    }
                } else {
                    Err(Error::InvalidState)
                }
            }

            MigrationState::Completed => {
                if self.old_table_exists {
                    Ok(RollbackFeasibility::Possible {
                        estimated_time_seconds: 300,
                        data_loss_risk: DataLossRisk::High,
                        warnings: vec![
                            "Migration completed, old table still exists".to_string(),
                            "Rollback will require a new migration".to_string(),
                        ],
                    })
                } else {
                    Err(Error::CannotRollback {
                        reason: "Old table dropped, rollback impossible".to_string(),
                    })
                }
            }

            _ => Err(Error::InvalidState),
        }
    }
}
```

6. Migration Testing Framework
6.1 Shadow Traffic Validation
Before cutover, HeliosDB can validate the ghost table using shadow traffic:
```
Application Traffic
        │
        ▼
Migration Coordinator (shadow mode enabled)
   │ primary                  │ shadow (read-only)
   ▼                          ▼
Original Table (users)     Ghost Table (_new_users)
   │                          │
   ▼                          ▼
Return result to app       Compare result, log diff
```

Shadow Traffic Mode:
- All read queries executed against BOTH tables
- Results compared for consistency
- Differences logged for analysis
- Only original table results returned to application
- Zero impact on application (async comparison)
Validation Metrics:
- Consistency Rate: % of queries with matching results
- Performance Delta: Ghost table vs original (latency, throughput)
- Data Completeness: Row count, checksum comparison
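These metrics reduce to simple arithmetic over a window of shadow comparisons. A minimal sketch; the `Comparison` record is hypothetical:

```rust
/// Outcome of one shadow-traffic comparison: did the results match,
/// and how long did each table take to answer?
struct Comparison {
    matched: bool,
    original_ms: f64,
    ghost_ms: f64,
}

/// Returns (consistency rate in %, mean latency delta in %) over a window.
/// A positive delta means the ghost table is slower than the original.
fn shadow_metrics(window: &[Comparison]) -> (f64, f64) {
    let n = window.len() as f64;
    let matches = window.iter().filter(|c| c.matched).count() as f64;
    let avg_orig: f64 = window.iter().map(|c| c.original_ms).sum::<f64>() / n;
    let avg_ghost: f64 = window.iter().map(|c| c.ghost_ms).sum::<f64>() / n;
    (matches * 100.0 / n, (avg_ghost - avg_orig) / avg_orig * 100.0)
}
```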
6.2 Automated Testing Before Cutover
```rust
pub struct MigrationTestSuite {
    migration_id: String,
    original_table: String,
    ghost_table: String,
}

impl MigrationTestSuite {
    /// Run comprehensive pre-cutover tests
    pub async fn run_all_tests(&self) -> Result<TestReport> {
        let mut report = TestReport::new();

        // Test 1: Row count validation
        report.add_result(self.test_row_count().await?);

        // Test 2: Checksum comparison
        report.add_result(self.test_checksums().await?);

        // Test 3: Sample-based deep comparison
        report.add_result(self.test_sample_comparison(1000).await?);

        // Test 4: Index validation
        report.add_result(self.test_index_consistency().await?);

        // Test 5: Constraint validation
        report.add_result(self.test_constraints().await?);

        // Test 6: Performance comparison
        report.add_result(self.test_performance().await?);

        // Test 7: Shadow traffic validation (if enabled)
        if self.is_shadow_mode_enabled() {
            report.add_result(self.test_shadow_traffic().await?);
        }

        Ok(report)
    }

    async fn test_row_count(&self) -> Result<TestResult> {
        let original_count = query_one::<i64>(&format!(
            "SELECT COUNT(*) FROM {}", self.original_table
        )).await?;

        let ghost_count = query_one::<i64>(&format!(
            "SELECT COUNT(*) FROM {}", self.ghost_table
        )).await?;

        Ok(TestResult {
            name: "Row Count Validation".to_string(),
            passed: original_count == ghost_count,
            message: format!("Original: {}, Ghost: {}", original_count, ghost_count),
        })
    }

    async fn test_checksums(&self) -> Result<TestResult> {
        // Compute checksums for 100 chunks
        let chunk_size = self.get_table_size().await? / 100;
        let mut mismatches = Vec::new();

        for i in 0..100 {
            let offset = i * chunk_size;
            let original_checksum = self
                .compute_checksum(&self.original_table, offset, chunk_size)
                .await?;
            let ghost_checksum = self
                .compute_checksum(&self.ghost_table, offset, chunk_size)
                .await?;

            if original_checksum != ghost_checksum {
                mismatches.push(i);
            }
        }

        Ok(TestResult {
            name: "Checksum Comparison".to_string(),
            passed: mismatches.is_empty(),
            message: if mismatches.is_empty() {
                "All chunks match".to_string()
            } else {
                format!("Mismatched chunks: {:?}", mismatches)
            },
        })
    }

    async fn test_sample_comparison(&self, sample_size: usize) -> Result<TestResult> {
        // Randomly sample rows and compare
        let sample_ids = self.get_random_sample_ids(sample_size).await?;
        let mut mismatches = 0;

        for id in sample_ids {
            let original_row = self.get_row(&self.original_table, id).await?;
            let ghost_row = self.get_row(&self.ghost_table, id).await?;

            if original_row != ghost_row {
                mismatches += 1;
            }
        }

        Ok(TestResult {
            name: "Sample Comparison".to_string(),
            passed: mismatches == 0,
            message: format!("Sampled {} rows, {} mismatches", sample_size, mismatches),
        })
    }

    async fn test_performance(&self) -> Result<TestResult> {
        // Run representative queries on both tables
        let queries = vec![
            "SELECT * FROM {} WHERE id = 12345",
            "SELECT COUNT(*) FROM {} WHERE created_at > NOW() - INTERVAL '1 day'",
            "SELECT AVG(value) FROM {} GROUP BY category",
        ];

        let mut original_times = Vec::new();
        let mut ghost_times = Vec::new();

        for query_template in queries {
            let original_query = query_template.replace("{}", &self.original_table);
            let ghost_query = query_template.replace("{}", &self.ghost_table);

            original_times.push(measure_query_time(&original_query).await?);
            ghost_times.push(measure_query_time(&ghost_query).await?);
        }

        let avg_original = original_times.iter().sum::<f64>() / original_times.len() as f64;
        let avg_ghost = ghost_times.iter().sum::<f64>() / ghost_times.len() as f64;
        let performance_delta = ((avg_ghost - avg_original) / avg_original) * 100.0;

        // Pass if the ghost table is within 20% of original performance
        let passed = performance_delta.abs() < 20.0;

        Ok(TestResult {
            name: "Performance Comparison".to_string(),
            passed,
            message: format!(
                "Original: {:.2}ms, Ghost: {:.2}ms, Delta: {:.1}%",
                avg_original, avg_ghost, performance_delta
            ),
        })
    }
}
```

6.3 Migration Dry Run
```rust
// Execute migration in dry-run mode
let migration = MigrationBuilder::new()
    .table("users")
    .add_column("phone", DataType::VarChar(20), /* nullable */ true)
    .dry_run(true)  // no actual changes
    .build();

let result = migration_engine.execute(migration).await?;

// A dry run reports:
// - Estimated duration
// - Resource requirements (disk space, I/O)
// - Potential issues
// - Rollback feasibility
```

7. API Design
7.1 SQL Syntax
7.1.1 Start Online Migration
```sql
-- Explicit online migration
ALTER TABLE users
ADD COLUMN phone VARCHAR(20) NULL
WITH (MIGRATION_MODE = 'ONLINE');

-- Shorthand (online by default)
ALTER TABLE users ADD COLUMN phone VARCHAR(20);

-- Force offline migration (faster for large tables when downtime is tolerable)
ALTER TABLE users
ADD COLUMN phone VARCHAR(20)
WITH (MIGRATION_MODE = 'OFFLINE');
```

7.1.2 Migration Options
```sql
ALTER TABLE users MODIFY COLUMN user_id BIGINT
WITH (
    MIGRATION_MODE        = 'ONLINE',
    CHUNK_SIZE            = 10000,  -- rows per chunk
    THROTTLE_IO           = 50,     -- max % of I/O capacity
    ENABLE_SHADOW_TRAFFIC = true,   -- validate with shadow traffic
    ROLLBACK_RETENTION    = '24h'   -- keep old table for 24 hours
);
```

7.1.3 Monitor Migration
```sql
-- Show all active migrations
SHOW MIGRATIONS;

-- Show specific migration details
SHOW MIGRATION 'migration-uuid-1234';

-- Output:
-- Migration ID: migration-uuid-1234
-- Table: users
-- Operation: ADD COLUMN phone VARCHAR(20)
-- Status: COPYING
-- Progress: 67% (67M/100M rows)
-- Elapsed Time: 00:15:32
-- Estimated Remaining: 00:07:45
-- Current Phase: Bulk Copy
-- Rollback Available: Yes
```

7.1.4 Control Migration
```sql
-- Pause migration (useful for maintenance windows)
PAUSE MIGRATION 'migration-uuid-1234';

-- Resume paused migration
RESUME MIGRATION 'migration-uuid-1234';

-- Rollback migration
ROLLBACK MIGRATION 'migration-uuid-1234';

-- Force cutover (skip validation)
CUTOVER MIGRATION 'migration-uuid-1234' WITH (FORCE = true);
```

7.2 Programmatic API (Rust)
```rust
use heliosdb::migration::{MigrationEngine, MigrationBuilder, MigrationMode};

// Create the migration engine
let migration_engine = MigrationEngine::new(config);

// Build a migration
let migration = MigrationBuilder::new()
    .table("users")
    .add_column(ColumnDefinition {
        name: "phone".to_string(),
        data_type: DataType::VarChar(20),
        nullable: true,
        default: None,
    })
    .mode(MigrationMode::Online)
    .chunk_size(10_000)
    .throttle_io_percent(50)
    .enable_shadow_traffic(true)
    .build()?;

// Execute the migration asynchronously
let migration_id = migration_engine.start(migration).await?;

// Monitor progress
let progress = migration_engine.get_progress(&migration_id).await?;
println!("Progress: {}%", progress.percent_complete);

// Pause if needed
migration_engine.pause(&migration_id).await?;

// Resume
migration_engine.resume(&migration_id).await?;

// Rollback
migration_engine.rollback(&migration_id).await?;
```

7.3 REST API
```bash
# Start a migration
curl -X POST http://localhost:8080/api/migrations \
  -H "Content-Type: application/json" \
  -d '{
    "table": "users",
    "operation": {
      "type": "ADD_COLUMN",
      "column": { "name": "phone", "type": "VARCHAR(20)", "nullable": true }
    },
    "mode": "ONLINE",
    "options": {
      "chunk_size": 10000,
      "throttle_io_percent": 50,
      "enable_shadow_traffic": true
    }
  }'

# Response:
# {
#   "migration_id": "migration-uuid-1234",
#   "status": "PLANNING",
#   "estimated_duration_seconds": 450
# }

# Get migration status
curl http://localhost:8080/api/migrations/migration-uuid-1234

# Pause migration
curl -X POST http://localhost:8080/api/migrations/migration-uuid-1234/pause

# Resume migration
curl -X POST http://localhost:8080/api/migrations/migration-uuid-1234/resume

# Rollback migration
curl -X POST http://localhost:8080/api/migrations/migration-uuid-1234/rollback
```

8. Performance Analysis
8.1 Overhead During Migration
| Phase | I/O Overhead | CPU Overhead | Latency Impact | Notes |
|---|---|---|---|---|
| Planning | 0% | <1% | 0ms | Metadata-only |
| Ghost Table Creation | ~5% | ~2% | 0ms | One-time |
| Trigger Installation | 0% | 0% | +0.1ms/write | Small overhead |
| Bulk Copy | 10-50%* | 5-15% | 0ms | Configurable throttle |
| Synchronization | 5-10% | 2-5% | +0.2ms/write | Changelog replication |
| Validation | 20-40% | 10-20% | 0ms | Background process |
| Cutover | 0% | <1% | 5-10ms | Brief lock |
*Throttle configurable via THROTTLE_IO parameter
8.2 Migration Duration Estimates
Assumptions:
- Table size: 100M rows
- Row size: 1KB average
- Total data: 100GB
- I/O capacity: 100MB/s
- Throttle: 50% (50MB/s allocated to migration)
| Phase | Duration | Calculation |
|---|---|---|
| Planning | <1 second | Metadata operations |
| Ghost Table Creation | ~1 second | DDL operation |
| Trigger Installation | <1 second | DDL operation |
| Bulk Copy | ~33 minutes | 100GB ÷ 50MB/s = 2000s ≈ 33 min |
| Synchronization | ~5 minutes | Catch up on incremental changes |
| Validation | ~10 minutes | Checksum + sampling |
| Cutover | <10 seconds | Rename + apply final changelog |
| Total | ~50 minutes | End-to-end |
Scaling:
- 10M rows: ~5 minutes
- 1B rows: ~8 hours
- 10B rows: ~3 days (consider partitioned migration)
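The bulk-copy row of the table above is just data volume divided by the throttled I/O budget. A sketch reproducing that arithmetic, using the same decimal units as the assumptions (1 KB = 1000 bytes):

```rust
/// Estimated bulk-copy duration in minutes:
/// total data (MB) / (I/O capacity × throttle fraction) / 60.
fn bulk_copy_minutes(rows: u64, row_bytes: u64, io_mb_per_s: f64, throttle_pct: f64) -> f64 {
    let total_mb = (rows * row_bytes) as f64 / 1_000_000.0;
    let budget_mb_per_s = io_mb_per_s * throttle_pct / 100.0;
    total_mb / budget_mb_per_s / 60.0
}
```

With the document's assumptions (100M rows × 1 KB, 100 MB/s capacity, 50% throttle), this reproduces the 2000 s ≈ 33 min figure.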
8.3 Optimization Techniques
8.3.1 Adaptive Throttling
```rust
pub struct AdaptiveThrottle {
    target_io_percent: f64,
    current_io_usage: f64,
    adjustment_factor: f64,
}

impl AdaptiveThrottle {
    pub fn adjust_chunk_size(&mut self, current_chunk_size: usize) -> usize {
        // Measure current I/O usage
        self.current_io_usage = self.measure_io_usage();

        // Adjust chunk size based on I/O headroom
        let io_headroom = self.target_io_percent - self.current_io_usage;

        if io_headroom > 10.0 {
            // Plenty of headroom: increase chunk size
            (current_chunk_size as f64 * 1.2) as usize
        } else if io_headroom < -5.0 {
            // Over budget: decrease chunk size
            (current_chunk_size as f64 * 0.8) as usize
        } else {
            // Within acceptable range
            current_chunk_size
        }
    }
}
```

8.3.2 Parallel Chunk Processing
For very large tables, process multiple chunks in parallel:
```rust
pub async fn parallel_copy(
    &self,
    table: &str,
    ghost_table: &str,
    total_rows: u64,
    chunk_size: usize,
    parallelism: usize,
) -> Result<()> {
    let num_chunks = (total_rows as usize + chunk_size - 1) / chunk_size;

    // Create a shared work queue of chunk IDs.
    // (tokio's mpsc receiver cannot be cloned, so wrap it in a mutex.)
    let (tx, rx) = mpsc::channel(num_chunks);
    let rx = Arc::new(Mutex::new(rx));

    // Send all chunk IDs
    for chunk_id in 0..num_chunks {
        tx.send(chunk_id).await?;
    }
    drop(tx);

    // Spawn worker tasks
    let mut handles = Vec::new();
    for _worker_id in 0..parallelism {
        let rx = Arc::clone(&rx);
        let table = table.to_string();
        let ghost_table = ghost_table.to_string();

        let handle = tokio::spawn(async move {
            loop {
                // recv() returns None once the queue is drained
                let chunk_id = match rx.lock().await.recv().await {
                    Some(id) => id,
                    None => break,
                };
                let offset = chunk_id * chunk_size;
                copy_chunk(&table, &ghost_table, offset, chunk_size).await?;
            }
            Ok::<(), Error>(())
        });

        handles.push(handle);
    }

    // Wait for all workers
    for handle in handles {
        handle.await??;
    }

    Ok(())
}
```

Performance: 2-4x faster for large tables (I/O bound)
8.3.3 Compression During Copy
```sql
// Compress data during copy to reduce I/O
INSERT INTO _helios_ghost_users_20251109
SELECT id, name, COMPRESS(large_text_field) as large_text_field
FROM users
WHERE id BETWEEN ? AND ?;
```

Benefit: 30-50% faster for tables with compressible data
9. Failure Modes & Recovery
9.1 Failure Scenarios
| Failure | Phase | Impact | Recovery |
|---|---|---|---|
| Process Crash | Any | Migration paused | Resume from last checkpoint |
| Disk Full | Copying | Migration paused | Free disk space, resume |
| Network Partition | Copying | Partial data loss | Resync from changelog |
| Validation Failure | Validating | Migration failed | Rollback, investigate |
| Cutover Timeout | Cutover | Partial cutover | Emergency rollback |
| Trigger Error | Synchronizing | Data inconsistency | Pause, resync, resume |
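The recovery column of the table above maps cleanly onto a table-driven policy function. A sketch with illustrative enum names — the engine's actual error and action types may differ:

```rust
/// Failure modes from the table above. (Illustrative enums; the real
/// engine's error types are richer.)
#[derive(Debug, Clone, Copy, PartialEq)]
enum Failure { ProcessCrash, DiskFull, NetworkPartition, ValidationFailure, CutoverTimeout, TriggerError }

#[derive(Debug, Clone, Copy, PartialEq)]
enum Recovery { ResumeFromCheckpoint, PauseUntilSpaceFreed, ResyncFromChangelog, RollbackAndInvestigate, EmergencyRollback, PauseResyncResume }

/// Table-driven recovery policy: every failure mode has exactly one
/// prescribed recovery action, so the match is exhaustive by design.
fn recovery_for(failure: Failure) -> Recovery {
    match failure {
        Failure::ProcessCrash => Recovery::ResumeFromCheckpoint,
        Failure::DiskFull => Recovery::PauseUntilSpaceFreed,
        Failure::NetworkPartition => Recovery::ResyncFromChangelog,
        Failure::ValidationFailure => Recovery::RollbackAndInvestigate,
        Failure::CutoverTimeout => Recovery::EmergencyRollback,
        Failure::TriggerError => Recovery::PauseResyncResume,
    }
}

fn main() {
    println!("{:?}", recovery_for(Failure::CutoverTimeout));
}
```

Encoding the policy as an exhaustive match means a newly added failure mode fails compilation until a recovery action is chosen for it.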
9.2 Recovery Strategies
9.2.1 Checkpoint-Based Resume
```rust
pub struct MigrationCheckpoint {
    migration_id: String,
    last_copied_pk: i64,   // Last primary key copied
    last_applied_lsn: i64, // Last LSN applied from changelog
    rows_copied: u64,
    timestamp: DateTime<Utc>,
}

impl MigrationEngine {
    pub async fn resume_from_checkpoint(&self, migration_id: &str) -> Result<()> {
        // Load checkpoint
        let checkpoint = self.load_checkpoint(migration_id).await?;

        // Resume from last position
        match self.get_migration_state(migration_id).await? {
            MigrationState::Copying => {
                // Resume bulk copy from last_copied_pk
                self.resume_bulk_copy(migration_id, checkpoint.last_copied_pk).await?;
            }
            MigrationState::Synchronizing => {
                // Resume changelog application from last_applied_lsn
                self.resume_changelog_sync(migration_id, checkpoint.last_applied_lsn).await?;
            }
            _ => return Err(Error::InvalidState),
        }

        Ok(())
    }
}
```

9.2.2 Automatic Retry with Exponential Backoff
```rust
pub async fn copy_chunk_with_retry(
    &self,
    table: &str,
    ghost_table: &str,
    offset: u64,
    chunk_size: usize,
) -> Result<()> {
    let mut retries = 0;
    let max_retries = 5;

    loop {
        match self.copy_chunk(table, ghost_table, offset, chunk_size).await {
            Ok(_) => return Ok(()),
            Err(e) if retries < max_retries => {
                let backoff = Duration::from_secs(2_u64.pow(retries));
                warn!("Chunk copy failed, retrying in {:?}: {}", backoff, e);
                tokio::time::sleep(backoff).await;
                retries += 1;
            }
            Err(e) => {
                error!("Chunk copy failed after {} retries: {}", retries, e);
                return Err(e);
            }
        }
    }
}
```

9.2.3 Health Checks
```rust
pub struct MigrationHealthChecker {
    migration_id: String,
}

impl MigrationHealthChecker {
    pub async fn run_health_check(&self) -> HealthStatus {
        let mut issues = Vec::new();

        // Check disk space
        if self.available_disk_space().await < self.estimated_space_needed().await {
            issues.push(HealthIssue::InsufficientDisk);
        }

        // Check trigger status
        if !self.triggers_active().await {
            issues.push(HealthIssue::TriggersInactive);
        }

        // Check changelog lag
        let lag = self.changelog_lag().await;
        if lag > Duration::from_secs(300) {
            issues.push(HealthIssue::ChangelogLagging);
        }

        // Check data consistency
        if let Some(drift) = self.detect_data_drift().await {
            issues.push(HealthIssue::DataDrift(drift));
        }

        if issues.is_empty() {
            HealthStatus::Healthy
        } else {
            HealthStatus::Unhealthy(issues)
        }
    }
}
```

10. Integration with HeliosDB
10.1 Schema Management Integration
```rust
impl SchemaService {
    /// Apply schema change with zero-downtime migration
    pub async fn apply_schema_change(
        &self,
        table: &str,
        change: SchemaChange,
        options: MigrationOptions,
    ) -> Result<MigrationId> {
        // Check if online migration is needed
        let requires_migration = self.requires_migration(&change)?;

        if requires_migration {
            // Use zero-downtime migration
            let migration_engine = self.get_migration_engine();
            let migration = MigrationBuilder::new()
                .table(table)
                .change(change)
                .options(options)
                .build()?;

            migration_engine.start(migration).await
        } else {
            // Fast path: metadata-only change
            self.apply_metadata_change(table, change).await
        }
    }
}
```

10.2 Transaction Manager Integration
```rust
impl TransactionManager {
    /// Route query to the appropriate table during migration
    pub async fn execute_query(&self, query: &Query) -> Result<QueryResult> {
        let table = query.table();

        // Check if a migration is in progress
        if let Some(migration) = self.get_active_migration(table).await? {
            match migration.state {
                MigrationState::Copying
                | MigrationState::Synchronizing
                | MigrationState::Validating => {
                    // Route to the original table
                    self.execute_on_table(table, query).await
                }
                MigrationState::Cutover => {
                    // Brief lock during cutover: wait for it to complete
                    self.wait_for_cutover(&migration).await?;
                    self.execute_on_table(table, query).await
                }
                _ => self.execute_on_table(table, query).await,
            }
        } else {
            // No migration, execute normally
            self.execute_on_table(table, query).await
        }
    }
}
```

10.3 Storage Engine Integration
```rust
impl LsmStorage {
    /// Support for ghost table operations
    pub async fn create_ghost_table(
        &self,
        original_table: &str,
        new_schema: &Schema,
    ) -> Result<String> {
        let ghost_table_name = format!(
            "_helios_ghost_{}_{}",
            original_table,
            Utc::now().timestamp()
        );

        // Create a new LSM tree with the new schema
        self.create_table(&ghost_table_name, new_schema).await?;

        Ok(ghost_table_name)
    }

    /// Atomic table rename for cutover
    pub async fn atomic_rename(&self, old_name: &str, new_name: &str) -> Result<()> {
        // Use storage-level atomic rename; this is a metadata-only operation
        self.metadata_store.rename_table(old_name, new_name).await
    }
}
```

10.4 Multi-Model Support
The migration system works across all HeliosDB data models:
10.4.1 Document Model
```sql
// Migrate document collection schema
ALTER COLLECTION users ADD FIELD phone STRING NULL;

// Internally uses the ghost collection approach:
// - Create ghost collection with new schema
// - Copy documents with new field (set to null)
// - Sync incremental changes
// - Atomic cutover
```

10.4.2 Graph Model
```sql
// Migrate vertex schema
ALTER VERTEX User ADD PROPERTY phone STRING NULL;

// Ghost vertex table approach:
// - Create ghost vertex table
// - Copy vertices with new property
// - Preserve edges (referenced by ID)
// - Atomic cutover
```

10.4.3 Time-Series Model
```sql
// Migrate time-series metric schema
ALTER METRIC cpu_usage ADD TAG hostname STRING;

// Special handling for time-series:
// - Only applies to new data (forward-only)
// - Historical data remains unchanged
// - No ghost table needed
```

11. Edge Cases & Solutions
11.1 Large Tables (>1TB)
Challenge: Single migration takes too long (days)
Solution: Partitioned Migration
```rust
// Migrate the table partition by partition
for partition in table.partitions() {
    let migration = MigrationBuilder::new()
        .table(table.name())
        .partition(partition.id())
        .change(change.clone())
        .build()?;

    migration_engine.start(migration).await?;
}

// Cut over all partitions atomically
migration_engine.cutover_all(table.name()).await?;
```

11.2 Complex Constraints
Challenge: Foreign key constraints between tables
Solution: Constraint-Aware Migration
```rust
// Detect foreign key constraints
let constraints = self.get_foreign_keys(table)?;

for constraint in &constraints {
    // Temporarily disable foreign key checks during migration
    self.disable_constraint(constraint)?;
}

// Perform migration
migration_engine.start(migration).await?;

// Re-enable and validate constraints
for constraint in &constraints {
    self.enable_constraint(constraint)?;
    self.validate_constraint(constraint)?;
}
```

11.3 Hot Partitions
Challenge: Some partitions have very high write rate
Solution: Intelligent Scheduling
```rust
pub struct PartitionMigrationScheduler {
    partitions: Vec<Partition>,
}

impl PartitionMigrationScheduler {
    pub fn schedule(&self) -> Vec<MigrationTask> {
        // Measure the write rate for each partition
        let write_rates = self.measure_write_rates();

        // Schedule migrations during low-traffic periods
        let mut tasks = Vec::new();

        for partition in &self.partitions {
            let low_traffic_window = self.find_low_traffic_window(partition);
            tasks.push(MigrationTask {
                partition: partition.clone(),
                scheduled_start: low_traffic_window.start,
                estimated_duration: self.estimate_duration(partition),
            });
        }

        tasks
    }
}
```

11.4 Schema Evolution Conflicts
Challenge: Multiple migrations on same table
Solution: Migration Queue
```rust
pub struct MigrationQueue {
    pending: VecDeque<Migration>,
    active: Option<Migration>,
}

impl MigrationQueue {
    pub async fn enqueue(&mut self, migration: Migration) -> Result<()> {
        // Check for conflicts with the active migration
        if let Some(active) = &self.active {
            if active.table == migration.table {
                // Queue the migration for later
                self.pending.push_back(migration);
                return Ok(());
            }
        }

        // No conflict, start immediately
        self.start(migration).await
    }

    pub async fn on_migration_complete(&mut self, migration_id: &str) -> Result<()> {
        self.active = None;

        // Start the next queued migration
        if let Some(next) = self.pending.pop_front() {
            self.start(next).await?;
        }

        Ok(())
    }
}
```

12. Testing Strategy
12.1 Unit Tests
```rust
#[cfg(test)]
mod tests {
    use super::*;

    #[tokio::test]
    async fn test_ghost_table_creation() {
        let engine = MigrationEngine::new(config);
        let ghost_table = engine.create_ghost_table("users", new_schema).await.unwrap();
        assert!(ghost_table.starts_with("_helios_ghost_users_"));
    }

    #[tokio::test]
    async fn test_chunked_copy() {
        let engine = MigrationEngine::new(config);
        engine.copy_chunk("users", ghost_table, 0, 10_000).await.unwrap();
        let count = query_one::<i64>(&format!("SELECT COUNT(*) FROM {}", ghost_table))
            .await
            .unwrap();
        assert_eq!(count, 10_000);
    }

    #[tokio::test]
    async fn test_rollback_early_phase() {
        let engine = MigrationEngine::new(config);
        let migration_id = engine.start(migration).await.unwrap();
        engine.rollback(&migration_id).await.unwrap();
        // Verify the ghost table is dropped
        assert!(!table_exists(&ghost_table_name));
    }
}
```

12.2 Integration Tests
```rust
#[tokio::test]
async fn test_end_to_end_migration() {
    // Setup: create a table with 1M rows
    let db = TestDatabase::new().await;
    db.execute("CREATE TABLE users (id BIGINT PRIMARY KEY, name VARCHAR(255))")
        .await
        .unwrap();
    db.insert_rows("users", 1_000_000).await.unwrap();

    // Start migration: add a column
    let migration_id = db
        .migrate("ALTER TABLE users ADD COLUMN email VARCHAR(255) NULL")
        .await
        .unwrap();

    // Wait for completion
    db.wait_for_migration(&migration_id).await.unwrap();

    // Verify the new schema
    let schema = db.get_schema("users").await.unwrap();
    assert!(schema.has_column("email"));

    // Verify data integrity
    let count = db.query_one::<i64>("SELECT COUNT(*) FROM users").await.unwrap();
    assert_eq!(count, 1_000_000);
}
```
```rust
#[tokio::test]
async fn test_migration_with_live_traffic() {
    let db = TestDatabase::new().await;
    db.execute("CREATE TABLE users (id BIGINT PRIMARY KEY, name VARCHAR(255))")
        .await
        .unwrap();
    db.insert_rows("users", 100_000).await.unwrap();

    // Start migration
    let migration_id = db
        .migrate("ALTER TABLE users ADD COLUMN email VARCHAR(255) NULL")
        .await
        .unwrap();

    // Simulate live traffic during the migration
    // (clone the handle so the writer task doesn't move `db`)
    let writer_db = db.clone();
    let write_handle = tokio::spawn(async move {
        for i in 100_000..110_000 {
            writer_db
                .execute(&format!(
                    "INSERT INTO users (id, name) VALUES ({}, 'user_{}')",
                    i, i
                ))
                .await
                .unwrap();
            tokio::time::sleep(Duration::from_millis(10)).await;
        }
    });

    // Wait for both the migration and the writes
    db.wait_for_migration(&migration_id).await.unwrap();
    write_handle.await.unwrap();

    // Verify all writes are present
    let count = db.query_one::<i64>("SELECT COUNT(*) FROM users").await.unwrap();
    assert_eq!(count, 110_000);
}
```

12.3 Chaos Engineering Tests
```rust
#[tokio::test]
async fn test_migration_resilience_to_failures() {
    let db = TestDatabase::new().await;
    db.insert_rows("users", 1_000_000).await.unwrap();

    // Start migration
    let migration_id = db
        .migrate("ALTER TABLE users ADD COLUMN email VARCHAR(255) NULL")
        .await
        .unwrap();

    // Wait until 50% complete
    db.wait_for_progress(&migration_id, 50).await.unwrap();

    // Simulate a process crash
    db.crash_and_restart().await;

    // Resume the migration
    db.resume_migration(&migration_id).await.unwrap();

    // Verify completion
    db.wait_for_migration(&migration_id).await.unwrap();
    let count = db.query_one::<i64>("SELECT COUNT(*) FROM users").await.unwrap();
    assert_eq!(count, 1_000_000);
}
```
```rust
#[tokio::test]
async fn test_rollback_during_cutover() {
    let db = TestDatabase::new().await;
    db.insert_rows("users", 100_000).await.unwrap();

    let migration_id = db
        .migrate("ALTER TABLE users MODIFY COLUMN id BIGINT")
        .await
        .unwrap();

    // Wait for cutover
    db.wait_for_state(&migration_id, MigrationState::Cutover).await.unwrap();

    // Trigger an emergency rollback
    db.rollback_migration(&migration_id).await.unwrap();

    // Verify the original schema is restored
    let schema = db.get_schema("users").await.unwrap();
    assert_eq!(schema.get_column("id").data_type, DataType::Int);
}
```

13. Monitoring & Observability
13.1 Metrics
```rust
pub struct MigrationMetrics {
    pub migration_id: String,

    // Progress metrics
    pub rows_copied: Counter,
    pub rows_remaining: Gauge,
    pub percent_complete: Gauge,

    // Performance metrics
    pub copy_throughput_rows_per_sec: Gauge,
    pub copy_throughput_bytes_per_sec: Gauge,
    pub io_wait_time_ms: Histogram,

    // Resource metrics
    pub disk_space_used_bytes: Gauge,
    pub cpu_usage_percent: Gauge,
    pub memory_usage_bytes: Gauge,

    // Reliability metrics
    pub retries: Counter,
    pub errors: Counter,
    pub checkpoint_saves: Counter,

    // Validation metrics
    pub validation_mismatches: Counter,
    pub checksum_failures: Counter,
}

impl MigrationMetrics {
    /// Export metrics in Prometheus text exposition format
    pub fn export_prometheus(&self) -> String {
        format!(
            "# HELP heliosdb_migration_progress Migration progress percentage\n\
             # TYPE heliosdb_migration_progress gauge\n\
             heliosdb_migration_progress{{migration_id=\"{}\"}} {}\n\
             \n\
             # HELP heliosdb_migration_throughput Copy throughput in rows/sec\n\
             # TYPE heliosdb_migration_throughput gauge\n\
             heliosdb_migration_throughput{{migration_id=\"{}\"}} {}\n",
            self.migration_id,
            self.percent_complete.get(),
            self.migration_id,
            self.copy_throughput_rows_per_sec.get()
        )
    }
}
```

13.2 Logging
```rust
// Structured logging for migrations
info!(
    migration_id = %migration_id,
    table = %table,
    operation = %operation,
    state = ?state,
    progress_percent = progress,
    "Migration progress update"
);

warn!(
    migration_id = %migration_id,
    chunk_id = chunk_id,
    error = %err,
    "Chunk copy failed, retrying"
);

error!(
    migration_id = %migration_id,
    reason = %reason,
    "Migration failed, initiating rollback"
);
```

13.3 Dashboards
Recommended Grafana Dashboard Panels:
- Migration Progress
  - % Complete gauge
  - Estimated time remaining
  - Rows copied / total rows
- Throughput
  - Rows/sec line chart
  - MB/sec line chart
- Resource Usage
  - CPU % line chart
  - Memory usage line chart
  - Disk I/O line chart
- Health
  - Error rate counter
  - Retry rate counter
  - Validation failures counter
14. Future Enhancements
14.1 AI-Driven Migration Optimization
Vision: Use ML to predict optimal migration parameters
```rust
pub struct AIOptimizer {
    model: MigrationOptimizationModel,
}

impl AIOptimizer {
    pub fn predict_optimal_params(
        &self,
        table_stats: &TableStats,
        workload_pattern: &WorkloadPattern,
    ) -> MigrationParams {
        // The ML model predicts:
        // - Optimal chunk size
        // - Best time to start the migration
        // - Expected duration
        // - Risk of performance impact
        let features = self.extract_features(table_stats, workload_pattern);
        let prediction = self.model.predict(features);

        MigrationParams {
            chunk_size: prediction.chunk_size,
            throttle_io_percent: prediction.throttle,
            recommended_start_time: prediction.best_start_time,
            estimated_duration: prediction.duration,
        }
    }
}
```

14.2 Cross-Region Migration
Vision: Migrate schemas across geographically distributed clusters
```rust
pub struct CrossRegionMigration {
    source_region: Region,
    target_regions: Vec<Region>,
}

impl CrossRegionMigration {
    pub async fn execute(&self, migration: Migration) -> Result<()> {
        // 1. Migrate in the source region first
        let source_migration_id = self
            .migrate_region(&self.source_region, migration.clone())
            .await?;

        // 2. Replicate to target regions
        for region in &self.target_regions {
            self.replicate_migration(region, &source_migration_id).await?;
        }

        // 3. Coordinated cutover across all regions
        self.global_cutover(migration.table()).await?;

        Ok(())
    }
}
```

14.3 Blue-Green Schema Deployment
Vision: Deploy schema changes with instant rollback capability
```rust
// Deploy the new schema as the "green" version
let green_schema = db.deploy_schema_version("users", "v2", new_schema).await?;

// Route 10% of traffic to green
db.route_traffic("users", TrafficSplit { blue: 90, green: 10 }).await?;

// Monitor for issues
if green_schema.has_errors() {
    // Instant rollback
    db.route_traffic("users", TrafficSplit { blue: 100, green: 0 }).await?;
} else {
    // Gradually shift traffic
    db.route_traffic("users", TrafficSplit { blue: 0, green: 100 }).await?;
}
```

15. Conclusion
The Zero-Downtime Migration Architecture enables HeliosDB to perform schema changes on live production systems without downtime or data loss, and with bounded, throttleable performance impact. Key advantages:
15.1 Benefits
- Zero Downtime: Applications continue operating during migrations
- Zero Data Loss: Transaction-consistent migration with validation
- Rollback Safety: One-click rollback with multiple safety checkpoints
- Performance Control: Throttling prevents overwhelming production systems
- Multi-Model Support: Works across relational, document, graph, and time-series models
- Distributed Ready: Designed for multi-region, multi-tenant deployments
15.2 Competitive Advantage
- vs gh-ost: Native distributed support, multi-model, AI optimization
- vs pt-online-schema-change: Better rollback, shadow traffic validation
- vs AWS DMS: Zero-downtime for schema changes (not just data migration)
- vs Liquibase/Flyway: Online execution, not just versioning
15.3 Production Readiness
This architecture is designed for Phase 2 implementation with clear deliverables:
- Month 1: Core migration engine (ghost table, triggers, copy)
- Month 2: Validation framework, rollback mechanisms
- Month 3: Performance optimization, testing, documentation
Estimated Development: 3 months, 2 senior engineers
Investment: $450K
ROI: Essential for enterprise adoption, enables $50M+ ARR
Appendix A: Core Implementation Skeleton
```rust
pub struct MigrationEngine {
    state_store: Arc<MigrationStateStore>,
    ghost_table_manager: Arc<GhostTableManager>,
    copy_engine: Arc<CopyEngine>,
    trigger_manager: Arc<TriggerManager>,
    validator: Arc<ValidatorEngine>,
    orchestrator: Arc<MigrationOrchestrator>,
}

impl MigrationEngine {
    pub async fn start(&self, migration: Migration) -> Result<MigrationId> {
        // 1. Validate the migration
        self.validate(&migration).await?;

        // 2. Create the migration record
        let migration_id = self.state_store.create(migration).await?;

        // 3. Start orchestration
        self.orchestrator.execute(migration_id).await?;

        Ok(migration_id)
    }

    pub async fn get_progress(&self, migration_id: &MigrationId) -> Result<Progress> {
        self.state_store.get_progress(migration_id).await
    }

    pub async fn rollback(&self, migration_id: &MigrationId) -> Result<()> {
        self.orchestrator.rollback(migration_id).await
    }
}
```
```rust
pub struct MigrationOrchestrator {
    state_machine: Arc<StateMachine>,
}

impl MigrationOrchestrator {
    pub async fn execute(&self, migration_id: MigrationId) -> Result<()> {
        loop {
            let state = self.state_machine.current_state(&migration_id).await?;

            match state {
                State::Planning => self.plan(&migration_id).await?,
                State::Preparing => self.prepare(&migration_id).await?,
                State::Copying => self.copy(&migration_id).await?,
                State::Synchronizing => self.synchronize(&migration_id).await?,
                State::Validating => self.validate(&migration_id).await?,
                State::ReadyForCutover => break,
                State::Failed => return Err(Error::MigrationFailed),
                _ => return Err(Error::InvalidState),
            }
        }

        // Wait for manual cutover or auto-cutover
        Ok(())
    }
}
```

End of Document
Next Steps:
- Review with engineering team
- Prototype core components (ghost table, triggers)
- Define detailed API specifications
- Create implementation plan for Phase 2