Skip to content

Zero-Downtime Migration - Quick Reference

Zero-Downtime Migration - Quick Reference

Last Updated: November 9, 2025 For: Developers, DBAs, DevOps Engineers See Full Architecture: ZERO_DOWNTIME_MIGRATION_ARCHITECTURE.md


Quick Start

Start a Migration

-- Add nullable column (fast, metadata-only)
ALTER TABLE users ADD COLUMN phone VARCHAR(20);
-- Add non-null column (full migration)
ALTER TABLE users ADD COLUMN created_at TIMESTAMP NOT NULL DEFAULT NOW();
-- Modify column type
ALTER TABLE users MODIFY COLUMN user_id BIGINT;
-- Add index (concurrent build)
CREATE INDEX idx_users_email ON users(email);
-- With migration options
ALTER TABLE users ADD COLUMN phone VARCHAR(20)
WITH (
MIGRATION_MODE = 'ONLINE', -- ONLINE, OFFLINE, or HYBRID
CHUNK_SIZE = 10000, -- Rows per chunk
THROTTLE_IO = 50, -- Max % of I/O capacity
ENABLE_SHADOW_TRAFFIC = true -- Validate with shadow traffic
);

Monitor Migration

-- Show all active migrations
SHOW MIGRATIONS;
-- Show specific migration
SHOW MIGRATION 'migration-uuid-1234';
-- Output:
-- Migration ID: migration-uuid-1234
-- Table: users
-- Operation: ADD COLUMN phone VARCHAR(20)
-- Status: COPYING
-- Progress: 67% (67M/100M rows)
-- Elapsed: 00:15:32
-- Remaining: 00:07:45
-- Rollback: Available

Control Migration

-- Pause migration (for maintenance)
PAUSE MIGRATION 'migration-uuid-1234';
-- Resume paused migration
RESUME MIGRATION 'migration-uuid-1234';
-- Rollback migration (if needed)
ROLLBACK MIGRATION 'migration-uuid-1234';
-- Force immediate cutover (skip validation)
CUTOVER MIGRATION 'migration-uuid-1234' WITH (FORCE = true);

Migration Lifecycle

CREATED → PLANNING → PREPARING → COPYING → SYNCHRONIZING →
VALIDATING → READY → CUTOVER → COMPLETED
Rollback available: *
(5 min) ❌
*Rollback loses incremental progress

State Descriptions

StateWhat’s HappeningCan Rollback?Est. Time
PLANNINGValidating schema changeInstant<1s
PREPARINGCreating ghost table<10s<1s
COPYINGBulk copying dataFast~30 min (100M rows)
SYNCHRONIZINGApplying incremental changesFast~5 min
VALIDATINGChecksums, row countsFast~10 min
READYWaiting for cutover approvalFastManual
CUTOVERAtomic rename swap5 min window<10s
COMPLETEDDone, old table kept 24h❌ No*-

*After completion, create new migration to revert schema


Supported Operations

Fast Operations (Metadata-Only)

OperationDowntimeExample
ADD COLUMN (nullable)0msALTER TABLE users ADD COLUMN phone VARCHAR(20);
DROP COLUMN~msALTER TABLE users DROP COLUMN legacy_field;
RENAME COLUMN0msALTER TABLE users RENAME COLUMN email TO email_address;
DROP INDEX0msDROP INDEX idx_users_email;
DROP FOREIGN KEY0msALTER TABLE orders DROP CONSTRAINT fk_user_id;

Moderate Operations (Ghost Table)

OperationDowntimeData CopyExample
ADD COLUMN (NOT NULL)~msYesALTER TABLE users ADD COLUMN created_at TIMESTAMP NOT NULL DEFAULT NOW();
MODIFY COLUMN (compatible)~msYesALTER TABLE users MODIFY COLUMN user_id BIGINT;
ADD INDEX0msNo*CREATE INDEX idx_users_email ON users(email);
ADD UNIQUE CONSTRAINT~msNo*ALTER TABLE users ADD CONSTRAINT uk_email UNIQUE(email);

*Index built concurrently during migration

Complex Operations (Full Table Rewrite)

OperationDowntimeData CopyExample
MODIFY COLUMN (incompatible)~msYesALTER TABLE users MODIFY COLUMN age VARCHAR(10);
ADD PRIMARY KEY~msYesALTER TABLE users ADD PRIMARY KEY (id);
DROP PRIMARY KEY~msYesALTER TABLE users DROP PRIMARY KEY;

Performance Expectations

Throughput (100M rows, 100GB table)

PhaseDurationThroughputNotes
Planning<1s-Metadata validation
Ghost Table Creation<1s-DDL operation
Bulk Copy~33 min50 MB/sThrottle=50%, 10K chunk size
Synchronization~5 min-Catch up on changes
Validation~10 min-Checksums, sampling
Cutover<10s-Atomic rename
Total~50 min-End-to-end

Overhead on Production Traffic

PhaseI/O OverheadCPU OverheadLatency Impact
Triggers5-10%2-5%+0.1ms/write
Bulk Copy10-50%*5-15%0ms (reads unaffected)
Validation20-40%10-20%0ms (background)
Cutover0%<1%5-10ms (brief lock)

*Configurable via THROTTLE_IO parameter

Scaling

Table SizeEst. DurationRecommendation
10M rows (10GB)~5 minStandard migration
100M rows (100GB)~50 minStandard migration
1B rows (1TB)~8 hoursOff-peak hours
10B rows (10TB)~3 daysPartitioned migration

Rollback Guide

When to Rollback

  • ❌ Validation failed (data inconsistency detected)
  • ❌ Performance degradation observed
  • ❌ Application errors after cutover
  • ❌ Unexpected downtime during migration
  • Change of plans (no longer needed)

How to Rollback

-- Immediate rollback
ROLLBACK MIGRATION 'migration-uuid-1234';
-- Check if rollback is safe
SELECT rollback_feasibility FROM helios_migrations
WHERE migration_id = 'migration-uuid-1234';
-- Possible values:
-- 'SAFE' - No data loss, <10s rollback
-- 'RISKY' - Within 5-min cutover window, some queries may fail
-- 'IMPOSSIBLE' - Old table dropped, must create new migration

Rollback Time Estimates

State When Rolled BackTimeRisk
PLANNING → PREPARING<10sNone
COPYING → SYNCHRONIZING<15sLow (loses copy progress)
VALIDATING → READY<15sNone
CUTOVER (first 5 min)<30sLow (brief interruption)
COMPLETEDN/AImpossible*

*Create inverse migration to revert schema


Common Scenarios

Scenario 1: Add Optional Column

Goal: Add phone column to 100M row users table.

-- Step 1: Start migration
ALTER TABLE users ADD COLUMN phone VARCHAR(20);
-- Step 2: Monitor (optional)
SHOW MIGRATION '<migration-id>';
-- Step 3: Wait for completion (or it completes automatically)
-- Duration: ~1 second (metadata-only for nullable columns)

Rollback: Instant, no data loss.


Scenario 2: Add Required Column with Default

Goal: Add mandatory created_at column with default value.

-- Step 1: Start migration
ALTER TABLE users
ADD COLUMN created_at TIMESTAMP NOT NULL DEFAULT NOW();
-- Step 2: Monitor progress
SHOW MIGRATIONS;
-- Step 3: Let it complete automatically
-- Duration: ~50 minutes for 100M rows
-- Step 4: Verify
SELECT COUNT(*) FROM users WHERE created_at IS NOT NULL;

Rollback: Available until COMPLETED state (~50 min window).


Scenario 3: Change Column Type

Goal: Change user_id from INT to BIGINT (approaching 2B limit).

-- Step 1: Start migration with throttling (off-peak hours)
ALTER TABLE users MODIFY COLUMN user_id BIGINT
WITH (
THROTTLE_IO = 30, -- Use only 30% I/O during business hours
ENABLE_SHADOW_TRAFFIC = true -- Validate performance
);
-- Step 2: Monitor shadow traffic validation
SELECT * FROM helios_migration_shadow_stats
WHERE migration_id = '<migration-id>';
-- Step 3: Review validation results
-- - Consistency: 100%
-- - Performance delta: +2.3%
-- - Ready for cutover
-- Step 4: Manual cutover (if auto-cutover disabled)
CUTOVER MIGRATION '<migration-id>';

Rollback: Available until cutover completes.


Scenario 4: Add Index on Large Table

Goal: Add index on email column (100M rows).

-- Step 1: Create index concurrently (no ghost table needed)
CREATE INDEX CONCURRENTLY idx_users_email ON users(email);
-- Step 2: Monitor progress
SHOW MIGRATION '<migration-id>';
-- Progress: 67% (67M/100M rows indexed)
-- Estimated remaining: 00:05:30
-- Step 3: Completion (automatic)
-- Duration: ~15 minutes for 100M rows

Rollback: Drop index if needed.


Scenario 5: Emergency Rollback During Cutover

Goal: Rollback migration that just completed cutover due to unexpected issue.

-- Within 5 minutes of cutover:
ROLLBACK MIGRATION 'migration-uuid-1234';
-- HeliosDB performs:
-- 1. Swap table names back
-- 2. Replay missed writes from changelog
-- 3. Restore original state
-- Duration: ~30 seconds

Warning: After 5 minutes, rollback is no longer safe. Create new migration to revert schema.


Troubleshooting

Migration Stuck in COPYING

Symptoms:

  • Progress not advancing
  • Elapsed time exceeds estimate

Diagnosis:

SELECT * FROM helios_migration_diagnostics
WHERE migration_id = '<migration-id>';

Solutions:

  1. Disk I/O bottleneck: Increase THROTTLE_IO to use more capacity
  2. Lock contention: Check for long-running transactions blocking migration
  3. Network partition: Check replication lag between nodes

Validation Failed

Symptoms:

ERROR: Migration validation failed
Row count mismatch: original=100000000, ghost=99999987

Diagnosis:

SELECT * FROM helios_migration_validation_details
WHERE migration_id = '<migration-id>';

Solutions:

  1. Trigger malfunction: Check trigger status
  2. Changelog overflow: Increase changelog retention
  3. Replication lag: Wait for synchronization to catch up

Out of Disk Space

Symptoms:

ERROR: Insufficient disk space for ghost table
Required: 120 GB, Available: 85 GB

Solutions:

  1. Pause migration: PAUSE MIGRATION '<migration-id>';
  2. Free disk space: Clean up old backups, logs
  3. Resume migration: RESUME MIGRATION '<migration-id>';

Best Practices

Planning

  1. Test in staging first: Validate schema change works as expected
  2. Check disk space: Ensure 2x table size is available
  3. Review migration plan: Use EXPLAIN MIGRATION to see execution plan
  4. Schedule off-peak: Large migrations during low-traffic periods
  5. Enable shadow traffic: Catch performance issues before cutover

Execution

  1. Monitor actively: Check progress every 5-10 minutes
  2. Watch application metrics: Latency, error rate, throughput
  3. Throttle conservatively: Start with 30-50% I/O, increase if safe
  4. Keep old table: Don’t drop until 24h after successful cutover
  5. Document changes: Update schema documentation

Rollback

  1. Validate before cutover: Always run validation phase
  2. Have rollback plan: Know when and how to rollback
  3. Test rollback in staging: Practice emergency rollback
  4. Monitor post-cutover: Watch for 5 minutes after cutover
  5. Communicate status: Keep stakeholders informed

API Reference (Rust)

use heliosdb::migration::{MigrationEngine, MigrationBuilder, MigrationMode};
// Create migration engine
let migration_engine = MigrationEngine::new(config);
// Build migration
let migration = MigrationBuilder::new()
.table("users")
.add_column(ColumnDefinition {
name: "phone".to_string(),
data_type: DataType::VarChar(20),
nullable: true,
default: None,
})
.mode(MigrationMode::Online)
.chunk_size(10_000)
.throttle_io_percent(50)
.enable_shadow_traffic(true)
.build()?;
// Execute migration
let migration_id = migration_engine.start(migration).await?;
// Monitor progress
let progress = migration_engine.get_progress(&migration_id).await?;
println!("Progress: {}%", progress.percent_complete);
// Pause/Resume
migration_engine.pause(&migration_id).await?;
migration_engine.resume(&migration_id).await?;
// Rollback
migration_engine.rollback(&migration_id).await?;

REST API Examples

Terminal window
# Start migration
curl -X POST http://localhost:8080/api/migrations \
-H "Content-Type: application/json" \
-d '{
"table": "users",
"operation": {
"type": "ADD_COLUMN",
"column": {
"name": "phone",
"type": "VARCHAR(20)",
"nullable": true
}
},
"mode": "ONLINE"
}'
# Get status
curl http://localhost:8080/api/migrations/{migration_id}
# Pause
curl -X POST http://localhost:8080/api/migrations/{migration_id}/pause
# Resume
curl -X POST http://localhost:8080/api/migrations/{migration_id}/resume
# Rollback
curl -X POST http://localhost:8080/api/migrations/{migration_id}/rollback

FAQ

Q: Can I run multiple migrations simultaneously?

A: No, only one migration per table at a time. Migrations on different tables can run in parallel.

Q: What happens to foreign keys during migration?

A: Foreign keys are temporarily disabled during migration and re-enabled/validated after cutover.

Q: Can I cancel a migration?

A: Yes, rollback is available in all states except COMPLETED (after old table is dropped).

Q: How much disk space is needed?

A: Approximately 2x the table size (original + ghost table). After cutover, original table is kept for 24h then dropped.

Q: What about triggers on the original table?

A: User-defined triggers are copied to the ghost table. Migration triggers are separate and removed after completion.

Q: Can I modify the schema during migration?

A: No, the schema is locked during migration. Queue additional changes to run after completion.

Q: How do I migrate a partitioned table?

A: Migrate each partition separately, or use the PARTITIONED_MIGRATION mode to migrate all at once with coordinated cutover.

Q: What if my application caches schema metadata?

A: Refresh schema cache after migration completion. HeliosDB sends schema change notifications to clients.


Support

  • Documentation: Full Architecture
  • Troubleshooting: Check logs in helios_migration_logs table
  • Monitoring: Metrics exposed at /metrics endpoint (Prometheus format)
  • Slack: #heliosdb-migrations

Last Updated: November 9, 2025 Version: 1.0 Status: Phase 2 Architecture