Zero-Downtime Migration - Quick Reference
Zero-Downtime Migration - Quick Reference
Last Updated: November 9, 2025 For: Developers, DBAs, DevOps Engineers See Full Architecture: ZERO_DOWNTIME_MIGRATION_ARCHITECTURE.md
Quick Start
Start a Migration
-- Add nullable column (fast, metadata-only)ALTER TABLE users ADD COLUMN phone VARCHAR(20);
-- Add non-null column (full migration)ALTER TABLE users ADD COLUMN created_at TIMESTAMP NOT NULL DEFAULT NOW();
-- Modify column typeALTER TABLE users MODIFY COLUMN user_id BIGINT;
-- Add index (concurrent build)CREATE INDEX idx_users_email ON users(email);
-- With migration optionsALTER TABLE users ADD COLUMN phone VARCHAR(20)WITH ( MIGRATION_MODE = 'ONLINE', -- ONLINE, OFFLINE, or HYBRID CHUNK_SIZE = 10000, -- Rows per chunk THROTTLE_IO = 50, -- Max % of I/O capacity ENABLE_SHADOW_TRAFFIC = true -- Validate with shadow traffic);Monitor Migration
-- Show all active migrationsSHOW MIGRATIONS;
-- Show specific migrationSHOW MIGRATION 'migration-uuid-1234';
-- Output:-- Migration ID: migration-uuid-1234-- Table: users-- Operation: ADD COLUMN phone VARCHAR(20)-- Status: COPYING-- Progress: 67% (67M/100M rows)-- Elapsed: 00:15:32-- Remaining: 00:07:45-- Rollback: AvailableControl Migration
-- Pause migration (for maintenance)PAUSE MIGRATION 'migration-uuid-1234';
-- Resume paused migrationRESUME MIGRATION 'migration-uuid-1234';
-- Rollback migration (if needed)ROLLBACK MIGRATION 'migration-uuid-1234';
-- Force immediate cutover (skip validation)CUTOVER MIGRATION 'migration-uuid-1234' WITH (FORCE = true);Migration Lifecycle
CREATED → PLANNING → PREPARING → COPYING → SYNCHRONIZING →VALIDATING → READY → CUTOVER → COMPLETED
Rollback available: * (5 min) ❌
*Rollback loses incremental progressState Descriptions
| State | What’s Happening | Can Rollback? | Est. Time |
|---|---|---|---|
| PLANNING | Validating schema change | Instant | <1s |
| PREPARING | Creating ghost table | <10s | <1s |
| COPYING | Bulk copying data | Fast | ~30 min (100M rows) |
| SYNCHRONIZING | Applying incremental changes | Fast | ~5 min |
| VALIDATING | Checksums, row counts | Fast | ~10 min |
| READY | Waiting for cutover approval | Fast | Manual |
| CUTOVER | Atomic rename swap | 5 min window | <10s |
| COMPLETED | Done, old table kept 24h | ❌ No* | - |
*After completion, create new migration to revert schema
Supported Operations
Fast Operations (Metadata-Only)
| Operation | Downtime | Example |
|---|---|---|
| ADD COLUMN (nullable) | 0ms | ALTER TABLE users ADD COLUMN phone VARCHAR(20); |
| DROP COLUMN | ~ms | ALTER TABLE users DROP COLUMN legacy_field; |
| RENAME COLUMN | 0ms | ALTER TABLE users RENAME COLUMN email TO email_address; |
| DROP INDEX | 0ms | DROP INDEX idx_users_email; |
| DROP FOREIGN KEY | 0ms | ALTER TABLE orders DROP CONSTRAINT fk_user_id; |
Moderate Operations (Ghost Table)
| Operation | Downtime | Data Copy | Example |
|---|---|---|---|
| ADD COLUMN (NOT NULL) | ~ms | Yes | ALTER TABLE users ADD COLUMN created_at TIMESTAMP NOT NULL DEFAULT NOW(); |
| MODIFY COLUMN (compatible) | ~ms | Yes | ALTER TABLE users MODIFY COLUMN user_id BIGINT; |
| ADD INDEX | 0ms | No* | CREATE INDEX idx_users_email ON users(email); |
| ADD UNIQUE CONSTRAINT | ~ms | No* | ALTER TABLE users ADD CONSTRAINT uk_email UNIQUE(email); |
*Index built concurrently during migration
Complex Operations (Full Table Rewrite)
| Operation | Downtime | Data Copy | Example |
|---|---|---|---|
| MODIFY COLUMN (incompatible) | ~ms | Yes | ALTER TABLE users MODIFY COLUMN age VARCHAR(10); |
| ADD PRIMARY KEY | ~ms | Yes | ALTER TABLE users ADD PRIMARY KEY (id); |
| DROP PRIMARY KEY | ~ms | Yes | ALTER TABLE users DROP PRIMARY KEY; |
Performance Expectations
Throughput (100M rows, 100GB table)
| Phase | Duration | Throughput | Notes |
|---|---|---|---|
| Planning | <1s | - | Metadata validation |
| Ghost Table Creation | <1s | - | DDL operation |
| Bulk Copy | ~33 min | 50 MB/s | Throttle=50%, 10K chunk size |
| Synchronization | ~5 min | - | Catch up on changes |
| Validation | ~10 min | - | Checksums, sampling |
| Cutover | <10s | - | Atomic rename |
| Total | ~50 min | - | End-to-end |
Overhead on Production Traffic
| Phase | I/O Overhead | CPU Overhead | Latency Impact |
|---|---|---|---|
| Triggers | 5-10% | 2-5% | +0.1ms/write |
| Bulk Copy | 10-50%* | 5-15% | 0ms (reads unaffected) |
| Validation | 20-40% | 10-20% | 0ms (background) |
| Cutover | 0% | <1% | 5-10ms (brief lock) |
*Configurable via THROTTLE_IO parameter
Scaling
| Table Size | Est. Duration | Recommendation |
|---|---|---|
| 10M rows (10GB) | ~5 min | Standard migration |
| 100M rows (100GB) | ~50 min | Standard migration |
| 1B rows (1TB) | ~8 hours | Off-peak hours |
| 10B rows (10TB) | ~3 days | Partitioned migration |
Rollback Guide
When to Rollback
- ❌ Validation failed (data inconsistency detected)
- ❌ Performance degradation observed
- ❌ Application errors after cutover
- ❌ Unexpected downtime during migration
- Change of plans (no longer needed)
How to Rollback
-- Immediate rollbackROLLBACK MIGRATION 'migration-uuid-1234';
-- Check if rollback is safeSELECT rollback_feasibility FROM helios_migrationsWHERE migration_id = 'migration-uuid-1234';
-- Possible values:-- 'SAFE' - No data loss, <10s rollback-- 'RISKY' - Within 5-min cutover window, some queries may fail-- 'IMPOSSIBLE' - Old table dropped, must create new migrationRollback Time Estimates
| State When Rolled Back | Time | Risk |
|---|---|---|
| PLANNING → PREPARING | <10s | None |
| COPYING → SYNCHRONIZING | <15s | Low (loses copy progress) |
| VALIDATING → READY | <15s | None |
| CUTOVER (first 5 min) | <30s | Low (brief interruption) |
| COMPLETED | N/A | Impossible* |
*Create inverse migration to revert schema
Common Scenarios
Scenario 1: Add Optional Column
Goal: Add phone column to 100M row users table.
-- Step 1: Start migrationALTER TABLE users ADD COLUMN phone VARCHAR(20);
-- Step 2: Monitor (optional)SHOW MIGRATION '<migration-id>';
-- Step 3: Wait for completion (or it completes automatically)-- Duration: ~1 second (metadata-only for nullable columns)Rollback: Instant, no data loss.
Scenario 2: Add Required Column with Default
Goal: Add mandatory created_at column with default value.
-- Step 1: Start migrationALTER TABLE usersADD COLUMN created_at TIMESTAMP NOT NULL DEFAULT NOW();
-- Step 2: Monitor progressSHOW MIGRATIONS;
-- Step 3: Let it complete automatically-- Duration: ~50 minutes for 100M rows
-- Step 4: VerifySELECT COUNT(*) FROM users WHERE created_at IS NOT NULL;Rollback: Available until COMPLETED state (~50 min window).
Scenario 3: Change Column Type
Goal: Change user_id from INT to BIGINT (approaching 2B limit).
-- Step 1: Start migration with throttling (off-peak hours)ALTER TABLE users MODIFY COLUMN user_id BIGINTWITH ( THROTTLE_IO = 30, -- Use only 30% I/O during business hours ENABLE_SHADOW_TRAFFIC = true -- Validate performance);
-- Step 2: Monitor shadow traffic validationSELECT * FROM helios_migration_shadow_statsWHERE migration_id = '<migration-id>';
-- Step 3: Review validation results-- - Consistency: 100%-- - Performance delta: +2.3%-- - Ready for cutover
-- Step 4: Manual cutover (if auto-cutover disabled)CUTOVER MIGRATION '<migration-id>';Rollback: Available until cutover completes.
Scenario 4: Add Index on Large Table
Goal: Add index on email column (100M rows).
-- Step 1: Create index concurrently (no ghost table needed)CREATE INDEX CONCURRENTLY idx_users_email ON users(email);
-- Step 2: Monitor progressSHOW MIGRATION '<migration-id>';-- Progress: 67% (67M/100M rows indexed)-- Estimated remaining: 00:05:30
-- Step 3: Completion (automatic)-- Duration: ~15 minutes for 100M rowsRollback: Drop index if needed.
Scenario 5: Emergency Rollback During Cutover
Goal: Rollback migration that just completed cutover due to unexpected issue.
-- Within 5 minutes of cutover:ROLLBACK MIGRATION 'migration-uuid-1234';
-- HeliosDB performs:-- 1. Swap table names back-- 2. Replay missed writes from changelog-- 3. Restore original state-- Duration: ~30 secondsWarning: After 5 minutes, rollback is no longer safe. Create new migration to revert schema.
Troubleshooting
Migration Stuck in COPYING
Symptoms:
- Progress not advancing
- Elapsed time exceeds estimate
Diagnosis:
SELECT * FROM helios_migration_diagnosticsWHERE migration_id = '<migration-id>';Solutions:
- Disk I/O bottleneck: Increase
THROTTLE_IOto use more capacity - Lock contention: Check for long-running transactions blocking migration
- Network partition: Check replication lag between nodes
Validation Failed
Symptoms:
ERROR: Migration validation failedRow count mismatch: original=100000000, ghost=99999987Diagnosis:
SELECT * FROM helios_migration_validation_detailsWHERE migration_id = '<migration-id>';Solutions:
- Trigger malfunction: Check trigger status
- Changelog overflow: Increase changelog retention
- Replication lag: Wait for synchronization to catch up
Out of Disk Space
Symptoms:
ERROR: Insufficient disk space for ghost tableRequired: 120 GB, Available: 85 GBSolutions:
- Pause migration:
PAUSE MIGRATION '<migration-id>'; - Free disk space: Clean up old backups, logs
- Resume migration:
RESUME MIGRATION '<migration-id>';
Best Practices
Planning
- Test in staging first: Validate schema change works as expected
- Check disk space: Ensure 2x table size is available
- Review migration plan: Use
EXPLAIN MIGRATIONto see execution plan - Schedule off-peak: Large migrations during low-traffic periods
- Enable shadow traffic: Catch performance issues before cutover
Execution
- Monitor actively: Check progress every 5-10 minutes
- Watch application metrics: Latency, error rate, throughput
- Throttle conservatively: Start with 30-50% I/O, increase if safe
- Keep old table: Don’t drop until 24h after successful cutover
- Document changes: Update schema documentation
Rollback
- Validate before cutover: Always run validation phase
- Have rollback plan: Know when and how to rollback
- Test rollback in staging: Practice emergency rollback
- Monitor post-cutover: Watch for 5 minutes after cutover
- Communicate status: Keep stakeholders informed
API Reference (Rust)
use heliosdb::migration::{MigrationEngine, MigrationBuilder, MigrationMode};
// Create migration enginelet migration_engine = MigrationEngine::new(config);
// Build migrationlet migration = MigrationBuilder::new() .table("users") .add_column(ColumnDefinition { name: "phone".to_string(), data_type: DataType::VarChar(20), nullable: true, default: None, }) .mode(MigrationMode::Online) .chunk_size(10_000) .throttle_io_percent(50) .enable_shadow_traffic(true) .build()?;
// Execute migrationlet migration_id = migration_engine.start(migration).await?;
// Monitor progresslet progress = migration_engine.get_progress(&migration_id).await?;println!("Progress: {}%", progress.percent_complete);
// Pause/Resumemigration_engine.pause(&migration_id).await?;migration_engine.resume(&migration_id).await?;
// Rollbackmigration_engine.rollback(&migration_id).await?;REST API Examples
# Start migrationcurl -X POST http://localhost:8080/api/migrations \ -H "Content-Type: application/json" \ -d '{ "table": "users", "operation": { "type": "ADD_COLUMN", "column": { "name": "phone", "type": "VARCHAR(20)", "nullable": true } }, "mode": "ONLINE" }'
# Get statuscurl http://localhost:8080/api/migrations/{migration_id}
# Pausecurl -X POST http://localhost:8080/api/migrations/{migration_id}/pause
# Resumecurl -X POST http://localhost:8080/api/migrations/{migration_id}/resume
# Rollbackcurl -X POST http://localhost:8080/api/migrations/{migration_id}/rollbackFAQ
Q: Can I run multiple migrations simultaneously?
A: No, only one migration per table at a time. Migrations on different tables can run in parallel.
Q: What happens to foreign keys during migration?
A: Foreign keys are temporarily disabled during migration and re-enabled/validated after cutover.
Q: Can I cancel a migration?
A: Yes, rollback is available in all states except COMPLETED (after old table is dropped).
Q: How much disk space is needed?
A: Approximately 2x the table size (original + ghost table). After cutover, original table is kept for 24h then dropped.
Q: What about triggers on the original table?
A: User-defined triggers are copied to the ghost table. Migration triggers are separate and removed after completion.
Q: Can I modify the schema during migration?
A: No, the schema is locked during migration. Queue additional changes to run after completion.
Q: How do I migrate a partitioned table?
A: Migrate each partition separately, or use the PARTITIONED_MIGRATION mode to migrate all at once with coordinated cutover.
Q: What if my application caches schema metadata?
A: Refresh schema cache after migration completion. HeliosDB sends schema change notifications to clients.
Support
- Documentation: Full Architecture
- Troubleshooting: Check logs in
helios_migration_logstable - Monitoring: Metrics exposed at
/metricsendpoint (Prometheus format) - Slack: #heliosdb-migrations
Last Updated: November 9, 2025 Version: 1.0 Status: Phase 2 Architecture