F6.21 Tenant Replication API Specification

REST and gRPC API Documentation

Feature ID: F6.21
Version: 6.0
Status: Design Phase
Date: November 2, 2025
Last Updated: November 2, 2025

Table of Contents

1. Overview
2. REST API
3. gRPC API
4. WebSocket API
5. Authentication & Authorization
6. Error Handling
7. Rate Limiting
8. Versioning
9. SDK Examples
10. OpenAPI Specification

1. Overview

1.1 API Architecture

graph TB
    subgraph "Client Layer"
        WEB[Web Dashboard]
        CLI[CLI Tool]
        SDK[Language SDKs]
    end

    subgraph "API Gateway"
        LB[Load Balancer]
        AUTH[Auth Middleware]
        RATELIMIT[Rate Limiter]
    end

    subgraph "API Layer"
        REST[REST API<br/>Axum]
        GRPC[gRPC API<br/>Tonic]
        WS[WebSocket API]
    end

    subgraph "Business Logic"
        CONTROLLER[Replication Controller]
        ORCHESTRATOR[Orchestrator]
    end

    WEB --> LB
    CLI --> LB
    SDK --> LB

    LB --> AUTH
    AUTH --> RATELIMIT

    RATELIMIT --> REST
    RATELIMIT --> GRPC
    RATELIMIT --> WS

    REST --> CONTROLLER
    GRPC --> CONTROLLER
    WS --> CONTROLLER

    CONTROLLER --> ORCHESTRATOR

    style REST fill:#3498DB
    style GRPC fill:#2ECC71
    style WS fill:#E74C3C

1.2 Base URLs

Environment	REST API Base URL	gRPC Endpoint
Production	`https://api.heliosdb.com/v1`	`grpc.heliosdb.com:443`
Staging	`https://api-staging.heliosdb.com/v1`	`grpc-staging.heliosdb.com:443`
Development	`http://localhost:8080/v1`	`localhost:50051`

1.3 Protocol Selection Guide

Use Case	Recommended Protocol	Reason
Web Dashboard	REST	Browser-friendly, simple
CLI Tool	REST	Easy to debug, curl-compatible
SDKs	gRPC	Type-safe, high performance
Real-time Monitoring	WebSocket	Live updates, low latency
Bulk Operations	gRPC Streaming	Efficient, lower overhead

2. REST API

2.1 Replication Management

2.1.1 Create Replication

Endpoint: POST /replications

Description: Create a new replication configuration for a tenant.

Request:

{
  "tenant_id": "tenant-123",
  "source": {
    "connection_id": "550e8400-e29b-41d4-a716-446655440001",
    "database": "production_db",
    "schema": "public"
  },
  "target": {
    "connection_id": "550e8400-e29b-41d4-a716-446655440002",
    "database": "replica_db",
    "schema": "public"
  },
  "config": {
    "qos_tier": "Premium",
    "max_lag_seconds": 5,
    "priority": 90,
    "table_filter": ["users.*", "orders.*", "products.*"],
    "row_filter": {
      "enabled": false
    },
    "transforms": [
      {
        "type": "AnonymizePII",
        "config": {
          "columns": ["email", "phone", "ssn"],
          "method": "Hash"
        }
      }
    ],
    "compression": {
      "enabled": true,
      "level": "Medium"
    },
    "encryption": {
      "enabled": true,
      "algorithm": "AES256GCM"
    }
  }
}

Response (201 Created):

{
  "id": "650e8400-e29b-41d4-a716-446655440003",
  "tenant_id": "tenant-123",
  "status": "Initializing",
  "created_at": "2025-11-02T10:00:00Z",
  "source": {
    "connection_id": "550e8400-e29b-41d4-a716-446655440001",
    "database": "production_db",
    "schema": "public"
  },
  "target": {
    "connection_id": "550e8400-e29b-41d4-a716-446655440002",
    "database": "replica_db",
    "schema": "public"
  },
  "config": {
    "qos_tier": "Premium",
    "max_lag_seconds": 5,
    "priority": 90,
    "table_filter": ["users.*", "orders.*", "products.*"],
    "transforms": [...],
    "compression": {...},
    "encryption": {...}
  },
  "estimated_initial_sync_duration_hours": 2.5,
  "estimated_data_size_gb": 150.3
}

Error Responses:

400 Bad Request: Invalid configuration
401 Unauthorized: Missing or invalid authentication
403 Forbidden: Insufficient permissions
409 Conflict: Replication already exists
500 Internal Server Error: Server error

Example:

curl -X POST https://api.heliosdb.com/v1/replications \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "tenant_id": "tenant-123",
    "source": {"connection_id": "550e8400-e29b-41d4-a716-446655440001"},
    "target": {"connection_id": "550e8400-e29b-41d4-a716-446655440002"},
    "config": {"qos_tier": "Premium", "max_lag_seconds": 5}
  }'
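
The same request can be assembled programmatically. The following is a minimal Python sketch using only the standard library; it builds the request object without sending it. The helper name `build_create_replication_request` and the token value are illustrative assumptions, not part of the API.

```python
import json
import urllib.request

# Production base URL from Section 1.2.
BASE_URL = "https://api.heliosdb.com/v1"


def build_create_replication_request(token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST /replications request.

    Hypothetical helper: only the URL, method, headers and body shape
    come from the specification.
    """
    return urllib.request.Request(
        url=f"{BASE_URL}/replications",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


payload = {
    "tenant_id": "tenant-123",
    "source": {"connection_id": "550e8400-e29b-41d4-a716-446655440001"},
    "target": {"connection_id": "550e8400-e29b-41d4-a716-446655440002"},
    "config": {"qos_tier": "Premium", "max_lag_seconds": 5},
}
request = build_create_replication_request("example-token", payload)
```

Passing `request` to `urllib.request.urlopen` would perform the call; a successful creation is expected to return 201 Created with the new replication as JSON.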

2.1.2 Get Replication

Endpoint: GET /replications/{replication_id}

Description: Retrieve replication configuration and current status.

Response (200 OK):

{
  "id": "650e8400-e29b-41d4-a716-446655440003",
  "tenant_id": "tenant-123",
  "status": "Streaming",
  "health": "Healthy",
  "created_at": "2025-11-02T10:00:00Z",
  "updated_at": "2025-11-02T10:30:00Z",
  "source": {...},
  "target": {...},
  "config": {...},
  "current_state": {
    "replication_lag_seconds": 2.3,
    "replication_lag_bytes": 4096,
    "last_checkpoint": {
      "lsn": 1234567890,
      "timestamp": "2025-11-02T10:29:55Z"
    },
    "throughput": {
      "rows_per_second": 5432.1,
      "bytes_per_second": 1048576
    }
  },
  "statistics": {
    "total_rows_replicated": 15000000,
    "total_bytes_replicated": 5368709120,
    "uptime_seconds": 1800,
    "avg_compression_ratio": 4.2,
    "conflicts_detected": 0,
    "conflicts_resolved": 0
  }
}

Error Responses:

404 Not Found: Replication not found
401 Unauthorized: Missing or invalid authentication
403 Forbidden: Insufficient permissions

2.1.3 List Replications

Endpoint: GET /replications

Description: List all replications with filtering and pagination.

Query Parameters:

tenant_id (optional): Filter by tenant ID
status (optional): Filter by status (Initializing, Syncing, Streaming, Paused, Stopped, Error)
qos_tier (optional): Filter by QoS tier
page (optional, default: 1): Page number
page_size (optional, default: 50, max: 100): Items per page
sort_by (optional, default: created_at): Sort field
sort_order (optional, default: desc): Sort order (asc, desc)

Response (200 OK):

{
  "data": [
    {
      "id": "650e8400-e29b-41d4-a716-446655440003",
      "tenant_id": "tenant-123",
      "status": "Streaming",
      "qos_tier": "Premium",
      "replication_lag_seconds": 2.3,
      "created_at": "2025-11-02T10:00:00Z"
    },
    {
      "id": "650e8400-e29b-41d4-a716-446655440004",
      "tenant_id": "tenant-456",
      "status": "Syncing",
      "qos_tier": "Standard",
      "replication_lag_seconds": 15.7,
      "created_at": "2025-11-02T09:00:00Z"
    }
  ],
  "pagination": {
    "page": 1,
    "page_size": 50,
    "total_pages": 3,
    "total_items": 127
  }
}

Example:

curl -X GET "https://api.heliosdb.com/v1/replications?status=Streaming&qos_tier=Premium&page=1" \
  -H "Authorization: Bearer ${TOKEN}"
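
A client that needs every replication can walk the pages until `total_pages` is reached. The sketch below assumes a caller-supplied `fetch_page(page, page_size)` function (hypothetical) that performs `GET /replications` and returns the decoded JSON body; a stand-in is used here so the loop can be run without a server.

```python
from typing import Callable, Iterator


def iter_replications(fetch_page: Callable[[int, int], dict],
                      page_size: int = 50) -> Iterator[dict]:
    """Yield every replication by requesting pages 1..total_pages in order."""
    page = 1
    while True:
        body = fetch_page(page, page_size)
        yield from body["data"]
        if page >= body["pagination"]["total_pages"]:
            break
        page += 1


def fake_fetch(page: int, page_size: int) -> dict:
    """Stand-in for the HTTP call: serves 5 items in the documented envelope."""
    items = [{"id": f"rep-{i}"} for i in range(5)]
    start = (page - 1) * page_size
    return {
        "data": items[start:start + page_size],
        "pagination": {
            "page": page,
            "page_size": page_size,
            "total_pages": -(-len(items) // page_size),  # ceiling division
            "total_items": len(items),
        },
    }


all_ids = [r["id"] for r in iter_replications(fake_fetch, page_size=2)]
```

Relying on `total_pages` rather than on an empty page avoids one extra request at the end of the listing.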

2.1.4 Update Replication

Endpoint: PATCH /replications/{replication_id}

Description: Update replication configuration (limited fields).

Request:

{
  "config": {
    "qos_tier": "Premium",
    "max_lag_seconds": 3,
    "priority": 95,
    "table_filter": ["users.*", "orders.*", "products.*", "analytics.*"]
  }
}

Response (200 OK):

{
  "id": "650e8400-e29b-41d4-a716-446655440003",
  "tenant_id": "tenant-123",
  "status": "Streaming",
  "config": {
    "qos_tier": "Premium",
    "max_lag_seconds": 3,
    "priority": 95,
    "table_filter": ["users.*", "orders.*", "products.*", "analytics.*"]
  },
  "updated_at": "2025-11-02T11:00:00Z"
}

Note: The source and target fields cannot be updated after creation. Use the Migration API (Section 2.4) instead.

2.1.5 Delete Replication

Endpoint: DELETE /replications/{replication_id}

Description: Stop and delete a replication configuration.

Query Parameters:

cleanup_target (optional, default: false): If true, also delete the target database

Response (204 No Content)

Error Responses:

404 Not Found: Replication not found
409 Conflict: Replication is in Failover state (cannot delete)

Example:

curl -X DELETE "https://api.heliosdb.com/v1/replications/650e8400-e29b-41d4-a716-446655440003?cleanup_target=false" \
  -H "Authorization: Bearer ${TOKEN}"

2.2 Replication Control

2.2.1 Start Replication

Endpoint: POST /replications/{replication_id}/start

Description: Start a replication, or resume one that has been paused.

Request (optional):

{
  "from_checkpoint": "auto",
  "initial_sync_parallelism": 4
}

Response (200 OK):

{
  "id": "650e8400-e29b-41d4-a716-446655440003",
  "status": "Initializing",
  "message": "Replication starting. Initial sync in progress.",
  "estimated_completion": "2025-11-02T12:30:00Z"
}

2.2.2 Pause Replication

Endpoint: POST /replications/{replication_id}/pause

Description: Temporarily pause a replication. The response reports the last checkpoint reached before the pause.

Response (200 OK):

{
  "id": "650e8400-e29b-41d4-a716-446655440003",
  "status": "Paused",
  "paused_at": "2025-11-02T11:30:00Z",
  "last_checkpoint": {
    "lsn": 1234567890,
    "timestamp": "2025-11-02T11:29:55Z"
  }
}

2.2.3 Stop Replication

Endpoint: POST /replications/{replication_id}/stop

Description: Stop replication permanently.

Response (200 OK):

{
  "id": "650e8400-e29b-41d4-a716-446655440003",
  "status": "Stopped",
  "stopped_at": "2025-11-02T11:30:00Z",
  "final_statistics": {
    "total_rows_replicated": 15000000,
    "total_bytes_replicated": 5368709120,
    "uptime_seconds": 5400
  }
}

2.3 Failover Management

2.3.1 Trigger Failover

Endpoint: POST /replications/{replication_id}/failover

Description: Manually trigger a failover that promotes the replica to primary.

Request:

{
  "reason": "Planned maintenance on primary region",
  "force": false,
  "update_routing": true
}

Response (202 Accepted):

{
  "failover_id": "750e8400-e29b-41d4-a716-446655440005",
  "replication_id": "650e8400-e29b-41d4-a716-446655440003",
  "status": "InProgress",
  "started_at": "2025-11-02T12:00:00Z",
  "estimated_duration_seconds": 30,
  "steps": [
    {"step": "ValidateReplicaHealth", "status": "InProgress"},
    {"step": "StopReplication", "status": "Pending"},
    {"step": "PromoteReplica", "status": "Pending"},
    {"step": "UpdateRouting", "status": "Pending"}
  ]
}

2.3.2 Get Failover Status

Endpoint: GET /failovers/{failover_id}

Description: Get failover progress and status.

Response (200 OK):

{
  "id": "750e8400-e29b-41d4-a716-446655440005",
  "replication_id": "650e8400-e29b-41d4-a716-446655440003",
  "status": "Completed",
  "started_at": "2025-11-02T12:00:00Z",
  "completed_at": "2025-11-02T12:00:28Z",
  "duration_seconds": 28,
  "downtime_ms": 350,
  "steps": [
    {"step": "ValidateReplicaHealth", "status": "Completed", "duration_ms": 1200},
    {"step": "StopReplication", "status": "Completed", "duration_ms": 500},
    {"step": "PromoteReplica", "status": "Completed", "duration_ms": 350},
    {"step": "UpdateRouting", "status": "Completed", "duration_ms": 25000}
  ],
  "old_primary": {
    "connection_id": "550e8400-e29b-41d4-a716-446655440001",
    "region": "us-east-1"
  },
  "new_primary": {
    "connection_id": "550e8400-e29b-41d4-a716-446655440002",
    "region": "us-west-2"
  }
}
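
Because the trigger endpoint returns 202 Accepted, a client typically polls this endpoint until the failover reaches a terminal status. The following is a sketch under assumptions: `get_status` is a hypothetical callable that performs `GET /failovers/{failover_id}`, and `"Failed"` is an assumed name for an unsuccessful terminal status (only `InProgress` and `Completed` appear in this document).

```python
import time
from typing import Callable

# "Completed" appears in the responses above; "Failed" is an assumed name.
TERMINAL_STATUSES = {"Completed", "Failed"}


def wait_for_failover(get_status: Callable[[], dict],
                      poll_interval: float = 1.0,
                      timeout: float = 120.0,
                      sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the failover reports a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status["status"] in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"failover still {status['status']} after {timeout}s")
        sleep(poll_interval)


def slowest_step(status: dict) -> str:
    """Return the name of the step with the largest duration_ms."""
    return max(status["steps"], key=lambda s: s.get("duration_ms", 0))["step"]


# Scripted responses stand in for the HTTP calls.
responses = iter([
    {"status": "InProgress", "steps": []},
    {"status": "Completed", "steps": [
        {"step": "ValidateReplicaHealth", "status": "Completed", "duration_ms": 1200},
        {"step": "UpdateRouting", "status": "Completed", "duration_ms": 25000},
    ]},
])
final = wait_for_failover(lambda: next(responses), sleep=lambda _seconds: None)
```

Injecting `sleep` keeps the helper testable without real delays.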

2.3.3 List Failover History

Endpoint: GET /failovers

Description: List historical failovers.

Query Parameters:

replication_id (optional): Filter by replication
status (optional): Filter by status
page, page_size: Pagination

Response (200 OK):

{
  "data": [
    {
      "id": "750e8400-e29b-41d4-a716-446655440005",
      "replication_id": "650e8400-e29b-41d4-a716-446655440003",
      "trigger": "Manual",
      "status": "Completed",
      "duration_seconds": 28,
      "downtime_ms": 350,
      "started_at": "2025-11-02T12:00:00Z"
    }
  ],
  "pagination": {...}
}

2.4 Migration Management

2.4.1 Start Migration

Endpoint: POST /migrations

Description: Start live tenant migration across regions.

Request:

{
  "tenant_id": "tenant-123",
  "source_region": "us-east-1",
  "target_region": "us-west-2",
  "migration_config": {
    "bulk_copy_parallelism": 8,
    "cdc_lag_threshold_seconds": 1,
    "cutover_strategy": "Automatic",
    "cleanup_source": false
  }
}

Response (202 Accepted):

{
  "id": "850e8400-e29b-41d4-a716-446655440006",
  "tenant_id": "tenant-123",
  "source_region": "us-east-1",
  "target_region": "us-west-2",
  "status": "InProgress",
  "current_phase": "BulkCopy",
  "started_at": "2025-11-02T13:00:00Z",
  "estimated_completion": "2025-11-02T15:30:00Z",
  "phases": [
    {
      "phase": "BulkCopy",
      "status": "InProgress",
      "progress_percent": 42.5,
      "estimated_data_size_gb": 150.3
    },
    {
      "phase": "CDCCatchup",
      "status": "Pending"
    },
    {
      "phase": "Cutover",
      "status": "Pending"
    }
  ]
}

2.4.2 Get Migration Status

Endpoint: GET /migrations/{migration_id}

Description: Get migration progress.

Response (200 OK):

{
  "id": "850e8400-e29b-41d4-a716-446655440006",
  "tenant_id": "tenant-123",
  "status": "Completed",
  "started_at": "2025-11-02T13:00:00Z",
  "completed_at": "2025-11-02T15:25:00Z",
  "total_duration_seconds": 8700,
  "downtime_ms": 85,
  "data_transferred_gb": 150.3,
  "phases": [
    {
      "phase": "BulkCopy",
      "status": "Completed",
      "duration_seconds": 7200,
      "data_size_gb": 150.3
    },
    {
      "phase": "CDCCatchup",
      "status": "Completed",
      "duration_seconds": 1495,
      "final_lag_seconds": 0.5
    },
    {
      "phase": "Cutover",
      "status": "Completed",
      "duration_ms": 85
    }
  ]
}

2.4.3 Cancel Migration

Endpoint: POST /migrations/{migration_id}/cancel

Description: Cancel in-progress migration (before cutover).

Response (200 OK):

{
  "id": "850e8400-e29b-41d4-a716-446655440006",
  "status": "Cancelled",
  "cancelled_at": "2025-11-02T14:00:00Z",
  "rollback_status": "Completed"
}

Error: A migration cannot be cancelled once the cutover phase has started.

2.5 Metrics & Monitoring

2.5.1 Get Replication Metrics

Endpoint: GET /replications/{replication_id}/metrics

Description: Get time-series metrics for a replication.

Query Parameters:

start_time: Start timestamp (ISO 8601)
end_time: End timestamp (ISO 8601)
resolution: Data resolution (1m, 5m, 1h, 1d)

Response (200 OK):

{
  "replication_id": "650e8400-e29b-41d4-a716-446655440003",
  "time_range": {
    "start": "2025-11-02T10:00:00Z",
    "end": "2025-11-02T11:00:00Z",
    "resolution": "1m"
  },
  "metrics": {
    "replication_lag_seconds": [
      {"timestamp": "2025-11-02T10:00:00Z", "value": 2.1},
      {"timestamp": "2025-11-02T10:01:00Z", "value": 2.3},
      {"timestamp": "2025-11-02T10:02:00Z", "value": 1.9}
    ],
    "throughput_rows_per_second": [
      {"timestamp": "2025-11-02T10:00:00Z", "value": 5234.2},
      {"timestamp": "2025-11-02T10:01:00Z", "value": 5432.1}
    ],
    "compression_ratio": [
      {"timestamp": "2025-11-02T10:00:00Z", "value": 4.1},
      {"timestamp": "2025-11-02T10:01:00Z", "value": 4.3}
    ]
  }
}
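
As an illustration of how the query parameters and the response fit together, the sketch below builds the metrics URL and reduces one returned series to min, max and average. The helper names `metrics_url` and `summarize` are hypothetical; the parameter names and response shape are taken from this section.

```python
from statistics import mean
from urllib.parse import urlencode

# Production base URL from Section 1.2.
BASE_URL = "https://api.heliosdb.com/v1"


def metrics_url(replication_id: str, start_time: str, end_time: str,
                resolution: str = "1m") -> str:
    """Build the GET /replications/{id}/metrics URL with encoded query parameters."""
    query = urlencode({
        "start_time": start_time,
        "end_time": end_time,
        "resolution": resolution,
    })
    return f"{BASE_URL}/replications/{replication_id}/metrics?{query}"


def summarize(series: list) -> dict:
    """Reduce a list of {"timestamp", "value"} points to min, max and average."""
    values = [point["value"] for point in series]
    return {"min": min(values), "max": max(values), "avg": mean(values)}


url = metrics_url(
    "650e8400-e29b-41d4-a716-446655440003",
    "2025-11-02T10:00:00Z",
    "2025-11-02T11:00:00Z",
)
lag_summary = summarize([
    {"timestamp": "2025-11-02T10:00:00Z", "value": 2.1},
    {"timestamp": "2025-11-02T10:01:00Z", "value": 2.3},
    {"timestamp": "2025-11-02T10:02:00Z", "value": 1.9},
])
```

`urlencode` percent-encodes the colons in the ISO 8601 timestamps, so the values can be passed through unchanged.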

2.5.2 Get Aggregated Metrics

Endpoint: GET /metrics/aggregate

Description: Get aggregated metrics across all replications.

Query Parameters:

group_by: Grouping dimension (tenant_id, qos_tier, region)
start_time, end_time: Time range

Response (200 OK):

{
  "time_range": {...},
  "groups": [
    {
      "group_key": {"qos_tier": "Premium"},
      "metrics": {
        "avg_replication_lag_seconds": 2.1,
        "p99_replication_lag_seconds": 4.8,
        "total_replications": 45,
        "healthy_replications": 44,
        "avg_compression_ratio": 4.2
      }
    },
    {
      "group_key": {"qos_tier": "Standard"},
      "metrics": {
        "avg_replication_lag_seconds": 12.3,
        "p99_replication_lag_seconds": 28.5,
        "total_replications": 120,
        "healthy_replications": 118,
        "avg_compression_ratio": 3.8
      }
    }
  ]
}

2.6 Health & Status

2.6.1 Health Check

Endpoint: GET /health

Description: Service health check.

Response (200 OK):

{
  "status": "Healthy",
  "version": "6.0.0",
  "uptime_seconds": 345600,
  "checks": {
    "database": "Healthy",
    "message_queue": "Healthy",
    "worker_pool": "Healthy"
  }
}

2.6.2 Readiness Check

Endpoint: GET /ready

Description: Service readiness check (for Kubernetes).

Response (200 OK):

{
  "ready": true
}

3. gRPC API

3.1 Service Definition (Protocol Buffers)

syntax = "proto3";

package heliosdb.replication.v1;

import "google/protobuf/duration.proto";
import "google/protobuf/empty.proto";
import "google/protobuf/timestamp.proto";

// Replication service
service ReplicationService {
  // Replication management
  rpc CreateReplication(CreateReplicationRequest) returns (Replication);
  rpc GetReplication(GetReplicationRequest) returns (Replication);
  rpc ListReplications(ListReplicationsRequest) returns (ListReplicationsResponse);
  rpc UpdateReplication(UpdateReplicationRequest) returns (Replication);
  rpc DeleteReplication(DeleteReplicationRequest) returns (google.protobuf.Empty);

  // Replication control
  rpc StartReplication(StartReplicationRequest) returns (StartReplicationResponse);
  rpc PauseReplication(PauseReplicationRequest) returns (PauseReplicationResponse);
  rpc StopReplication(StopReplicationRequest) returns (StopReplicationResponse);

  // Streaming
  rpc StreamReplicationMetrics(StreamMetricsRequest) returns (stream ReplicationMetrics);
  rpc StreamReplicationEvents(StreamEventsRequest) returns (stream ReplicationEvent);
}

// Failover service
service FailoverService {
  rpc TriggerFailover(TriggerFailoverRequest) returns (FailoverResponse);
  rpc GetFailoverStatus(GetFailoverStatusRequest) returns (FailoverStatus);
  rpc ListFailoverHistory(ListFailoverHistoryRequest) returns (ListFailoverHistoryResponse);
}

// Migration service
service MigrationService {
  rpc StartMigration(StartMigrationRequest) returns (Migration);
  rpc GetMigrationStatus(GetMigrationStatusRequest) returns (Migration);
  rpc CancelMigration(CancelMigrationRequest) returns (CancelMigrationResponse);
  rpc ListMigrations(ListMigrationsRequest) returns (ListMigrationsResponse);
}

// Messages
message Replication {
  string id = 1;
  string tenant_id = 2;
  ReplicationStatus status = 3;
  ConnectionInfo source = 4;
  ConnectionInfo target = 5;
  ReplicationConfig config = 6;
  ReplicationState current_state = 7;
  ReplicationStatistics statistics = 8;
  google.protobuf.Timestamp created_at = 9;
  google.protobuf.Timestamp updated_at = 10;
}

message ConnectionInfo {
  string connection_id = 1;
  string database = 2;
  string schema = 3;
  string region = 4;
}

message ReplicationConfig {
  QoSTier qos_tier = 1;
  int32 max_lag_seconds = 2;
  int32 priority = 3;
  repeated string table_filter = 4;
  RowFilter row_filter = 5;
  repeated Transform transforms = 6;
  CompressionConfig compression = 7;
  EncryptionConfig encryption = 8;
}

enum QoSTier {
  BEST_EFFORT = 0;
  STANDARD = 1;
  PREMIUM = 2;
  SYNCHRONOUS = 3;
}

enum ReplicationStatus {
  INITIALIZING = 0;
  SYNCING = 1;
  STREAMING = 2;
  PAUSED = 3;
  STOPPED = 4;
  ERROR = 5;
}

message ReplicationState {
  double replication_lag_seconds = 1;
  int64 replication_lag_bytes = 2;
  Checkpoint last_checkpoint = 3;
  Throughput throughput = 4;
  string health = 5;
}

message Checkpoint {
  int64 lsn = 1;
  google.protobuf.Timestamp timestamp = 2;
  int64 transaction_id = 3;
}

message Throughput {
  double rows_per_second = 1;
  int64 bytes_per_second = 2;
  double transactions_per_second = 3;
}

message ReplicationStatistics {
  int64 total_rows_replicated = 1;
  int64 total_bytes_replicated = 2;
  int64 uptime_seconds = 3;
  double avg_compression_ratio = 4;
  int32 conflicts_detected = 5;
  int32 conflicts_resolved = 6;
}

message Transform {
  oneof transform_type {
    AnonymizePIITransform anonymize_pii = 1;
    AggregateTransform aggregate = 2;
    FilterTransform filter = 3;
    CompressColumnsTransform compress_columns = 4;
  }
}

message AnonymizePIITransform {
  repeated string columns = 1;
  AnonymizationMethod method = 2;

  enum AnonymizationMethod {
    HASH = 0;
    TOKENIZE = 1;
    REDACT = 2;
    GENERALIZE = 3;
  }
}

message CompressionConfig {
  bool enabled = 1;
  CompressionLevel level = 2;

  enum CompressionLevel {
    LOW = 0;
    MEDIUM = 1;
    HIGH = 2;
  }
}

message EncryptionConfig {
  bool enabled = 1;
  string algorithm = 2;
}

// Request/Response messages
message CreateReplicationRequest {
  string tenant_id = 1;
  ConnectionInfo source = 2;
  ConnectionInfo target = 3;
  ReplicationConfig config = 4;
}

message GetReplicationRequest {
  string id = 1;
}

message ListReplicationsRequest {
  string tenant_id = 1;
  ReplicationStatus status = 2;
  QoSTier qos_tier = 3;
  int32 page = 4;
  int32 page_size = 5;
}

message ListReplicationsResponse {
  repeated Replication replications = 1;
  Pagination pagination = 2;
}

message Pagination {
  int32 page = 1;
  int32 page_size = 2;
  int32 total_pages = 3;
  int32 total_items = 4;
}

message StreamMetricsRequest {
  string replication_id = 1;
  int32 interval_seconds = 2;
}

message ReplicationMetrics {
  string replication_id = 1;
  google.protobuf.Timestamp timestamp = 2;
  double replication_lag_seconds = 3;
  int64 replication_lag_bytes = 4;
  double rows_per_second = 5;
  int64 bytes_per_second = 6;
  double compression_ratio = 7;
  double cpu_usage_percent = 8;
  int64 memory_usage_bytes = 9;
}

message ReplicationEvent {
  string replication_id = 1;
  google.protobuf.Timestamp timestamp = 2;
  string event_type = 3;
  string severity = 4;
  string message = 5;
  map<string, string> details = 6;
}

// Failover messages
message TriggerFailoverRequest {
  string replication_id = 1;
  string reason = 2;
  bool force = 3;
  bool update_routing = 4;
}

message FailoverResponse {
  string failover_id = 1;
  string replication_id = 2;
  string status = 3;
  google.protobuf.Timestamp started_at = 4;
  int32 estimated_duration_seconds = 5;
}

message FailoverStatus {
  string id = 1;
  string replication_id = 2;
  string status = 3;
  google.protobuf.Timestamp started_at = 4;
  google.protobuf.Timestamp completed_at = 5;
  int32 duration_seconds = 6;
  int64 downtime_ms = 7;
  repeated FailoverStep steps = 8;
}

message FailoverStep {
  string step = 1;
  string status = 2;
  int64 duration_ms = 3;
}

// Migration messages
message StartMigrationRequest {
  string tenant_id = 1;
  string source_region = 2;
  string target_region = 3;
  MigrationConfig config = 4;
}

message MigrationConfig {
  int32 bulk_copy_parallelism = 1;
  int32 cdc_lag_threshold_seconds = 2;
  string cutover_strategy = 3;
  bool cleanup_source = 4;
}

message Migration {
  string id = 1;
  string tenant_id = 2;
  string source_region = 3;
  string target_region = 4;
  string status = 5;
  string current_phase = 6;
  google.protobuf.Timestamp started_at = 7;
  google.protobuf.Timestamp estimated_completion = 8;
  google.protobuf.Timestamp completed_at = 9;
  int32 total_duration_seconds = 10;
  int64 downtime_ms = 11;
  double data_transferred_gb = 12;
  repeated MigrationPhase phases = 13;
}

message MigrationPhase {
  string phase = 1;
  string status = 2;
  double progress_percent = 3;
  int32 duration_seconds = 4;
}

3.2 gRPC Example Usage

Go Client:

package main

import (
    "context"
    "io"
    "log"

    pb "github.com/heliosdb/api/replication/v1"
    "google.golang.org/grpc"
    "google.golang.org/grpc/credentials"
)

func main() {
    // Connect to the gRPC server over TLS using the system root CAs
    creds := credentials.NewClientTLSFromCert(nil, "")
    conn, err := grpc.Dial("grpc.heliosdb.com:443", grpc.WithTransportCredentials(creds))
    if err != nil {
        log.Fatalf("Failed to connect: %v", err)
    }
    defer conn.Close()

    client := pb.NewReplicationServiceClient(conn)

    // Create replication
    req := &pb.CreateReplicationRequest{
        TenantId: "tenant-123",
        Source: &pb.ConnectionInfo{
            ConnectionId: "550e8400-e29b-41d4-a716-446655440001",
            Database:     "production_db",
        },
        Target: &pb.ConnectionInfo{
            ConnectionId: "550e8400-e29b-41d4-a716-446655440002",
            Database:     "replica_db",
        },
        Config: &pb.ReplicationConfig{
            QosTier:       pb.QoSTier_PREMIUM,
            MaxLagSeconds: 5,
            Priority:      90,
        },
    }

    replication, err := client.CreateReplication(context.Background(), req)
    if err != nil {
        log.Fatalf("Failed to create replication: %v", err)
    }

    log.Printf("Created replication: %s", replication.Id)

    // Stream metrics every 5 seconds
    stream, err := client.StreamReplicationMetrics(context.Background(), &pb.StreamMetricsRequest{
        ReplicationId:   replication.Id,
        IntervalSeconds: 5,
    })
    if err != nil {
        log.Fatalf("Failed to stream metrics: %v", err)
    }

    for {
        metrics, err := stream.Recv()
        if err == io.EOF {
            // The server closed the stream cleanly
            break
        }
        if err != nil {
            log.Fatalf("Stream error: %v", err)
        }

        log.Printf("Lag: %.2fs, Throughput: %.0f rows/s",
            metrics.ReplicationLagSeconds,
            metrics.RowsPerSecond)
    }
}

Python Client:

import grpc
from heliosdb.api.replication.v1 import replication_pb2, replication_pb2_grpc

# Connect
credentials = grpc.ssl_channel_credentials()
channel = grpc.secure_channel('grpc.heliosdb.com:443', credentials)
client = replication_pb2_grpc.ReplicationServiceStub(channel)

# Create replication
request = replication_pb2.CreateReplicationRequest(
    tenant_id="tenant-123",
    source=replication_pb2.ConnectionInfo(
        connection_id="550e8400-e29b-41d4-a716-446655440001",
        database="production_db"
    ),
    target=replication_pb2.ConnectionInfo(
        connection_id="550e8400-e29b-41d4-a716-446655440002",
        database="replica_db"
    ),
    config=replication_pb2.ReplicationConfig(
        qos_tier=replication_pb2.QoSTier.PREMIUM,
        max_lag_seconds=5,
        priority=90
    )
)

replication = client.CreateReplication(request)
print(f"Created replication: {replication.id}")

# Stream metrics
for metrics in client.StreamReplicationMetrics(
    replication_pb2.StreamMetricsRequest(
        replication_id=replication.id,
        interval_seconds=5
    )
):
    print(f"Lag: {metrics.replication_lag_seconds:.2f}s, "
          f"Throughput: {metrics.rows_per_second:.0f} rows/s")

4. WebSocket API

4.1 Connection

Endpoint: wss://api.heliosdb.com/v1/ws

Authentication: Query parameter token=<JWT_TOKEN>

Example:

const ws = new WebSocket('wss://api.heliosdb.com/v1/ws?token=' + token);

ws.onopen = () => {
  console.log('Connected');
};

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  console.log('Received:', data);
};

4.2 Subscribe to Metrics

Message:

{
  "action": "subscribe",
  "channel": "metrics",
  "filters": {
    "replication_id": "650e8400-e29b-41d4-a716-446655440003"
  }
}

Response Stream:

{
  "channel": "metrics",
  "replication_id": "650e8400-e29b-41d4-a716-446655440003",
  "timestamp": "2025-11-02T10:00:00Z",
  "data": {
    "replication_lag_seconds": 2.3,
    "rows_per_second": 5432.1,
    "compression_ratio": 4.2
  }
}

4.3 Subscribe to Events

Message:

{
  "action": "subscribe",
  "channel": "events",
  "filters": {
    "tenant_id": "tenant-123",
    "severity": ["Error", "Critical"]
  }
}

Response Stream:

{
  "channel": "events",
  "event_type": "ReplicationError",
  "severity": "Error",
  "replication_id": "650e8400-e29b-41d4-a716-446655440003",
  "timestamp": "2025-11-02T10:05:00Z",
  "message": "Connection timeout to target database",
  "details": {
    "error_code": "CONN_TIMEOUT",
    "retry_count": 3
  }
}
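
Since a single connection can carry several subscriptions, a client needs to build subscribe messages and route incoming messages by their `channel` field. The following is a minimal, transport-independent sketch; `subscribe_message` and `ChannelRouter` are hypothetical names, and only the message shapes are taken from the examples above.

```python
import json
from typing import Callable, Dict


def subscribe_message(channel: str, **filters) -> str:
    """Serialize a subscribe request for the given channel and filters."""
    return json.dumps({"action": "subscribe", "channel": channel, "filters": filters})


class ChannelRouter:
    """Route incoming WebSocket messages to a handler per channel."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[dict], None]] = {}

    def on(self, channel: str, handler: Callable[[dict], None]) -> None:
        self._handlers[channel] = handler

    def dispatch(self, raw: str) -> bool:
        """Decode one message; return False if no handler is registered for it."""
        message = json.loads(raw)
        handler = self._handlers.get(message.get("channel"))
        if handler is None:
            return False
        handler(message)
        return True


router = ChannelRouter()
lags = []
router.on("metrics", lambda m: lags.append(m["data"]["replication_lag_seconds"]))

handled = router.dispatch(json.dumps({
    "channel": "metrics",
    "replication_id": "650e8400-e29b-41d4-a716-446655440003",
    "timestamp": "2025-11-02T10:00:00Z",
    "data": {"replication_lag_seconds": 2.3, "rows_per_second": 5432.1},
}))
ignored = router.dispatch(json.dumps({"channel": "events", "severity": "Error"}))
events_request = subscribe_message("events", tenant_id="tenant-123",
                                   severity=["Error", "Critical"])
```

The string returned by `subscribe_message` would be sent over the open socket, and `dispatch` would be called from the socket's message callback.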

5. Authentication & Authorization

5.1 JWT Authentication

Header:

Authorization: Bearer <JWT_TOKEN>

JWT Payload:

{
  "sub": "user-12345",
  "iss": "heliosdb.com",
  "aud": "heliosdb-api",
  "exp": 1730638800,
  "iat": 1730552400,
  "roles": ["Admin"],
  "tenant_id": "tenant-123",
  "permissions": [
    "replication:create",
    "replication:read",
    "replication:update",
    "replication:delete",
    "failover:trigger"
  ]
}
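
To show how these claims might drive an authorization decision, the sketch below checks expiry, tenant and permission on an already decoded payload. It is an illustration only: it assumes the token signature, `iss` and `aud` have been verified beforehand by a JWT library, and the function name `is_authorized` and the exact rule applied by the server are assumptions.

```python
import time
from typing import Optional


def is_authorized(claims: dict, permission: str, tenant_id: str,
                  now: Optional[float] = None) -> bool:
    """Check decoded claims; the signature must already have been verified.

    Assumed rule: the token must be unexpired, scoped to the tenant, and
    list the required permission.
    """
    now = time.time() if now is None else now
    return (
        claims.get("exp", 0) > now
        and claims.get("tenant_id") == tenant_id
        and permission in claims.get("permissions", [])
    )


claims = {
    "sub": "user-12345",
    "exp": 1730638800,
    "iat": 1730552400,
    "tenant_id": "tenant-123",
    "permissions": ["replication:read", "failover:trigger"],
}
allowed = is_authorized(claims, "failover:trigger", "tenant-123", now=1730600000)
wrong_tenant = is_authorized(claims, "failover:trigger", "tenant-456", now=1730600000)
expired = is_authorized(claims, "failover:trigger", "tenant-123", now=1730638801)
```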

5.2 API Key Authentication

Header:

X-API-Key: <API_KEY>

Use Case: Service-to-service authentication

5.3 OAuth 2.0

Supported Flows:

Authorization Code Flow (for web apps)
Client Credentials Flow (for services)

Token Endpoint: POST /oauth/token

6. Error Handling

6.1 Error Response Format

{
  "error": {
    "code": "REPLICATION_LAG_EXCEEDED",
    "message": "Replication lag exceeded maximum threshold",
    "details": {
      "replication_id": "650e8400-e29b-41d4-a716-446655440003",
      "current_lag_seconds": 120,
      "max_lag_seconds": 30
    },
    "request_id": "req-950e8400-e29b-41d4-a716-446655440007",
    "timestamp": "2025-11-02T10:00:00Z"
  }
}
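
A client can turn this envelope into a typed exception so that callers branch on `code` rather than on message text. The following sketch assumes a hypothetical `ApiError` class and `parse_error` helper; the fallback code `"UNKNOWN"` for bodies that do not match the format is also an assumption.

```python
import json
from typing import Optional


class ApiError(Exception):
    """Client-side representation of the documented error envelope."""

    def __init__(self, status: int, code: str, message: str,
                 details: Optional[dict] = None,
                 request_id: Optional[str] = None) -> None:
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message
        self.details = details or {}
        self.request_id = request_id


def parse_error(status: int, body: str) -> ApiError:
    """Build an ApiError from a response body, tolerating malformed bodies."""
    try:
        error = json.loads(body)["error"]
        return ApiError(status, error["code"], error.get("message", ""),
                        error.get("details"), error.get("request_id"))
    except (ValueError, KeyError, TypeError, AttributeError):
        # Not the documented envelope (for example an HTML page from a proxy).
        return ApiError(status, "UNKNOWN", body)


body = json.dumps({
    "error": {
        "code": "REPLICATION_LAG_EXCEEDED",
        "message": "Replication lag exceeded maximum threshold",
        "details": {"current_lag_seconds": 120, "max_lag_seconds": 30},
        "request_id": "req-950e8400-e29b-41d4-a716-446655440007",
    }
})
err = parse_error(422, body)
fallback = parse_error(502, "<html>Bad Gateway</html>")
```

Keeping `request_id` on the exception makes it easy to include in logs and support requests.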

6.2 Error Codes

Code	HTTP Status	Description
`INVALID_REQUEST`	400	Malformed request
`VALIDATION_ERROR`	400	Validation failed
`UNAUTHORIZED`	401	Missing or invalid auth
`FORBIDDEN`	403	Insufficient permissions
`NOT_FOUND`	404	Resource not found
`CONFLICT`	409	Resource conflict
`REPLICATION_LAG_EXCEEDED`	422	Lag too high
`FAILOVER_IN_PROGRESS`	422	Cannot modify during failover
`RATE_LIMIT_EXCEEDED`	429	Too many requests
`INTERNAL_ERROR`	500	Internal server error
`SERVICE_UNAVAILABLE`	503	Service temporarily unavailable

6.3 Retry Strategy

Recommended Exponential Backoff:

retry_delay = min(max_delay, base_delay * 2^attempt)

base_delay = 1 second
max_delay = 60 seconds
max_attempts = 5

Retry on:

503 Service Unavailable
429 Too Many Requests (wait for the duration given in the Retry-After header)
Network errors

Do NOT retry on:

400 Bad Request
401 Unauthorized
403 Forbidden
404 Not Found
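
The strategy above can be sketched as follows. This is one possible reading, with assumptions stated: attempts are counted from 0, a server-provided Retry-After value takes precedence over the computed delay, and `send` is a hypothetical callable that returns `(status, retry_after, body)` or raises `ConnectionError` on a network failure.

```python
import time
from typing import Callable, Optional, Tuple

BASE_DELAY = 1.0     # seconds
MAX_DELAY = 60.0     # seconds
MAX_ATTEMPTS = 5
RETRYABLE_STATUSES = {429, 503}

# (status code, Retry-After in seconds or None, body)
Response = Tuple[int, Optional[float], str]


def retry_delay(attempt: int, retry_after: Optional[float] = None) -> float:
    """Delay before the next try; attempt is counted from 0 (an assumption)."""
    if retry_after is not None:
        return retry_after
    return min(MAX_DELAY, BASE_DELAY * 2 ** attempt)


def call_with_retry(send: Callable[[], Response],
                    sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call send() up to MAX_ATTEMPTS times, backing off between tries."""
    for attempt in range(MAX_ATTEMPTS):
        is_last = attempt == MAX_ATTEMPTS - 1
        try:
            response = send()
        except ConnectionError:
            if is_last:
                raise
            sleep(retry_delay(attempt))
            continue
        status, retry_after, _body = response
        if status not in RETRYABLE_STATUSES or is_last:
            return response
        sleep(retry_delay(attempt, retry_after))
    raise RuntimeError("unreachable: the final attempt always returns or raises")


# Scripted responses stand in for real HTTP calls.
scripted = iter([(503, None, ""), (429, 2.0, ""), (200, None, "ok")])
sleeps = []
result = call_with_retry(lambda: next(scripted), sleep=sleeps.append)
```

Non-retryable statuses such as 400, 401, 403 and 404 are returned to the caller immediately because they are not in `RETRYABLE_STATUSES`.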

7. Rate Limiting

7.1 Rate Limit Headers

Response Headers:

X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 995
X-RateLimit-Reset: 1730552500
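
A client can use these headers to pause before it runs into a 429. The sketch below assumes that `X-RateLimit-Reset` is a Unix timestamp in seconds (the example value suggests this, but the document does not state it) and that header values arrive as strings; `seconds_until_reset` is a hypothetical helper.

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str],
                        now: Optional[float] = None) -> float:
    """Return how long to wait before sending the next request.

    Returns 0.0 while requests remain in the current window. Assumes
    X-RateLimit-Reset is a Unix timestamp in seconds.
    """
    now = time.time() if now is None else now
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0
    reset_at = int(headers["X-RateLimit-Reset"])
    return max(0.0, float(reset_at - now))


exhausted = {
    "X-RateLimit-Limit": "1000",
    "X-RateLimit-Remaining": "0",
    "X-RateLimit-Reset": "1730552500",
}
available = dict(exhausted, **{"X-RateLimit-Remaining": "995"})

wait_when_exhausted = seconds_until_reset(exhausted, now=1730552470)
wait_when_available = seconds_until_reset(available, now=1730552470)
```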

7.2 Rate Limits by Tier

Tier	Requests/Minute	Burst
Free	60	10
Standard	600	50
Premium	6000	200
Enterprise	Custom	Custom

7.3 Rate Limit Response

429 Too Many Requests:

{
  "error": {
    "code": "RATE_LIMIT_EXCEEDED",
    "message": "Rate limit exceeded. Retry after 30 seconds.",
    "retry_after_seconds": 30
  }
}

8. Versioning

8.1 API Versioning Strategy

URL Versioning: /v1/, /v2/, etc.

Deprecation Policy:

6 months notice before deprecation
12 months support after deprecation announcement
Clear migration guide provided

8.2 Breaking Changes

What constitutes a breaking change:

Removing or renaming fields
Changing field types
Changing status codes
Removing endpoints

Non-breaking changes:

Adding new fields (ignored by old clients)
Adding new endpoints
Adding new optional parameters
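
Adding fields is only non-breaking if clients ignore fields they do not recognize. A minimal sketch of such a tolerant parser follows; the `ReplicationSummary` class and the extra field name in the example are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class ReplicationSummary:
    """Subset of the replication resource that this client cares about."""

    id: str
    tenant_id: str
    status: str

    @classmethod
    def from_json(cls, payload: dict) -> "ReplicationSummary":
        # Keep only the fields this client version knows; ignore the rest.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in payload.items() if k in known})


summary = ReplicationSummary.from_json({
    "id": "650e8400-e29b-41d4-a716-446655440003",
    "tenant_id": "tenant-123",
    "status": "Streaming",
    "field_added_in_a_later_release": {"anything": True},  # hypothetical new field
})
```

A client that passes the payload directly to the constructor would instead raise `TypeError` on the unknown key, turning a compatible server change into a client outage.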

9. SDK Examples

9.1 TypeScript SDK

import { HeliosDBReplicationClient } from '@heliosdb/replication-sdk';

const client = new HeliosDBReplicationClient({
  apiKey: process.env.HELIOSDB_API_KEY,
  baseUrl: 'https://api.heliosdb.com/v1'
});

// Create replication
const replication = await client.replications.create({
  tenantId: 'tenant-123',
  source: {
    connectionId: '550e8400-e29b-41d4-a716-446655440001',
    database: 'production_db'
  },
  target: {
    connectionId: '550e8400-e29b-41d4-a716-446655440002',
    database: 'replica_db'
  },
  config: {
    qosTier: 'Premium',
    maxLagSeconds: 5,
    priority: 90
  }
});

console.log(`Created replication: ${replication.id}`);

// Start replication
await client.replications.start(replication.id);

// Monitor metrics
const metricsStream = client.replications.streamMetrics(replication.id);
metricsStream.on('data', (metrics) => {
  console.log(`Lag: ${metrics.replicationLagSeconds}s`);
});

9.2 Python SDK

import os

from heliosdb_replication import ReplicationClient

client = ReplicationClient(
    api_key=os.environ['HELIOSDB_API_KEY'],
    base_url='https://api.heliosdb.com/v1'
)

# Create replication
replication = client.replications.create(
    tenant_id='tenant-123',
    source={'connection_id': '550e8400-e29b-41d4-a716-446655440001'},
    target={'connection_id': '550e8400-e29b-41d4-a716-446655440002'},
    config={'qos_tier': 'Premium', 'max_lag_seconds': 5}
)

print(f"Created replication: {replication.id}")

# Start replication
client.replications.start(replication.id)

# Monitor metrics
for metrics in client.replications.stream_metrics(replication.id):
    print(f"Lag: {metrics.replication_lag_seconds}s")

9.3 Rust SDK

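use std::env;

// Assumes the value returned by `stream_metrics` implements `futures::Stream`.
use futures::StreamExt;
use heliosdb_replication::{
    ConnectionInfo, CreateReplicationRequest, QosTier, ReplicationClient, ReplicationConfig,
};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {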
    let client = ReplicationClient::new(
        env::var("HELIOSDB_API_KEY")?,
        "https://api.heliosdb.com/v1"
    );

    // Create replication
    let replication = client.replications().create(CreateReplicationRequest {
        tenant_id: "tenant-123".into(),
        source: ConnectionInfo {
            connection_id: "550e8400-e29b-41d4-a716-446655440001".into(),
            ..Default::default()
        },
        target: ConnectionInfo {
            connection_id: "550e8400-e29b-41d4-a716-446655440002".into(),
            ..Default::default()
        },
        config: ReplicationConfig {
            qos_tier: QosTier::Premium,
            max_lag_seconds: 5,
            priority: 90,
            ..Default::default()
        },
    }).await?;

    println!("Created replication: {}", replication.id);

    // Start replication
    client.replications().start(&replication.id).await?;

    // Monitor metrics
    let mut stream = client.replications().stream_metrics(&replication.id).await?;
    while let Some(metrics) = stream.next().await {
        println!("Lag: {}s", metrics.replication_lag_seconds);
    }

    Ok(())
}

10. OpenAPI Specification

Full OpenAPI 3.1 specification available at: https://api.heliosdb.com/v1/openapi.yaml

Interactive Swagger UI: https://api.heliosdb.com/docs

Document Version: 1.0
Status: Draft for Review
Authors: API Design Team
Last Updated: November 2, 2025

HeliosDB Tenant Replication API - Comprehensive, type-safe, developer-friendly
