Migrating from Cassandra to HeliosDB

Last Updated: November 11, 2025
Target Audience: Database Administrators, DevOps Engineers, Architects
Estimated Migration Time: 1-4 weeks depending on strategy
Difficulty: Intermediate


Table of Contents

  1. Overview
  2. Why Migrate to HeliosDB?
  3. Compatibility Matrix
  4. Prerequisites
  5. Migration Strategies
  6. Step-by-Step Migration
  7. Feature Mapping
  8. Troubleshooting
  9. Best Practices
  10. Rollback Plan
  11. Case Studies
  12. Next Steps

Overview

HeliosDB implements the Apache Cassandra Native Protocol (versions 3, 4, and 5) with 95%+ compatibility, so most Cassandra workloads migrate with little or no application code change. This guide provides strategies and step-by-step procedures for migrating your Cassandra workloads to HeliosDB.

What You Get

  • Native CQL Support: Use existing cqlsh, DataStax drivers, and CQL queries unchanged
  • Multi-Protocol: Combine Cassandra CQL with PostgreSQL, MongoDB, and Oracle 23ai in a single database
  • Better Performance: 2.7x OLTP and 7.5x OLAP performance improvements
  • Lower TCO: Eliminate multi-cluster management complexity and reduce operational costs by 40-60%
  • Enhanced Analytics: Native SQL support for complex analytics on Cassandra data

Implementation Details

  • Module: heliosdb-protocols/src/cassandra/
  • Lines of Code: ~5,543 lines
  • Protocol Support: CQL Native Protocol v3, v4, v5
  • Status: Production-ready (95%+ feature complete)

Why Migrate to HeliosDB?

1. Multi-Protocol Unification

Problem: Organizations often run separate databases for different workloads:

  • Cassandra for time-series and high-write workloads
  • PostgreSQL for transactional data
  • MongoDB for document storage
  • Oracle for legacy applications

Solution: HeliosDB speaks all four protocols natively, eliminating data silos.

# Same database, multiple protocols
import psycopg2 # PostgreSQL client
from cassandra.cluster import Cluster # Cassandra client
# Connect via PostgreSQL protocol
pg_conn = psycopg2.connect(host="heliosdb.example.com", port=5432, ...)
# Connect via CQL protocol (same data!)
cluster = Cluster(['heliosdb.example.com'], port=9042)
cql_session = cluster.connect('my_keyspace')

2. Better Analytics Performance

Cassandra Challenge: Complex analytics queries are slow or impossible.

HeliosDB Advantage:

  • 7.5x faster OLAP queries using columnar storage and vectorized execution
  • Native SQL support for complex joins and aggregations
  • Materialized views with automatic maintenance

Example:

-- Cassandra: Multiple queries + client-side join
SELECT * FROM users WHERE user_id = ?;
SELECT * FROM orders WHERE user_id = ?;
-- Manually join in application code
-- HeliosDB: Single SQL query on CQL data
SELECT u.username, COUNT(o.order_id), SUM(o.amount)
FROM users u
JOIN orders o ON u.user_id = o.user_id
WHERE o.created_at > '2025-01-01'
GROUP BY u.username
ORDER BY SUM(o.amount) DESC;

3. Lower Operational Complexity

Cassandra Operations:

  • Manual nodetool repairs
  • Complex cluster expansion
  • Tombstone management
  • Compaction tuning

HeliosDB Operations:

  • Self-healing: 96% autonomous issue resolution
  • Auto-scaling: Elastic compute (0 to max CUs)
  • Automated tuning: ML-based query optimization
  • Simplified ops: Single cluster vs. multi-cluster

4. Cost Savings

Typical Savings:

  • 40-60% lower TCO through operational simplification
  • 85% storage savings with intelligent tiering (hot/warm/cold)
  • 30-50% compute savings with scale-to-zero serverless
  • Eliminate cross-cluster data transfer costs

Example: 100TB Cassandra cluster

  • Cassandra: 100TB × $0.10/GB/mo = $10,000/mo
  • HeliosDB: 100TB × $0.02/GB/mo (tiered) = $2,000/mo
  • Annual Savings: $96,000
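The same arithmetic applies at any scale. A minimal sketch for budgeting, using the illustrative rates above (not quoted prices):

# Hypothetical TCO comparison; the per-GB rates are illustrative, not quotes.
def monthly_storage_cost(tb: float, usd_per_gb_month: float) -> float:
    return tb * 1000 * usd_per_gb_month  # 1 TB billed as ~1000 GB

cassandra = monthly_storage_cost(100, 0.10)  # $10,000/mo
heliosdb = monthly_storage_cost(100, 0.02)   # $2,000/mo with tiering
print(f"Annual savings: ${(cassandra - heliosdb) * 12:,.0f}")  # $96,000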

Compatibility Matrix

Fully Supported (95%+ compatible)

| Feature Category | Cassandra | HeliosDB | Notes |
|---|---|---|---|
| Protocol | Native Protocol v3/v4/v5 | Full | 100% wire protocol compatible |
| Authentication | AllowAll, Password, SASL | Full | Drop-in replacement |
| DDL | CREATE/ALTER/DROP keyspace/table | Full | Identical syntax |
| DML | SELECT/INSERT/UPDATE/DELETE | Full | No changes needed |
| Data Types | All primitive types | Full | ASCII, TEXT, INT, BIGINT, UUID, etc. |
| Collections | LIST, SET, MAP | Full | Stored as JSONB internally |
| UDT | User-defined types | Full | Full support |
| Indexes | Secondary indexes | Full | CREATE INDEX syntax identical |
| Batches | LOGGED, UNLOGGED, COUNTER | Full | Full support |
| Prepared Statements | Parameter binding | Full | Full support |
| Paging | Result paging | Full | Full support |
| Consistency Levels | ONE, QUORUM, ALL, etc. | Full | All levels supported |
| TTL | Time-to-live | Full | Row/column expiration |
| Timestamps | Write timestamps | Full | USING TIMESTAMP |
| Functions | Built-in functions | Full | uuid(), now(), token(), etc. |
| Aggregations | SUM, AVG, COUNT, MIN, MAX | Full | Full support |

Partially Supported

| Feature | Status | Notes |
|---|---|---|
| Compression (LZ4/Snappy) | ⚠ 80% | Stubs ready, requires dependency crates |
| Materialized Views | ⚠ Parser only | Execution planned for Phase 2 Week 3 |
| Lightweight Transactions (LWT) | ⚠ Basic | IF EXISTS/IF NOT EXISTS supported |
| Custom Types | ⚠ Limited | Basic support only |

Not Supported (Future)

| Feature | Status | Timeline |
|---|---|---|
| User-Defined Functions (UDF) | ❌ Not supported | Phase 3 |
| User-Defined Aggregates (UDA) | ❌ Not supported | Phase 3 |
| Triggers | ❌ Not supported | Not planned |
| SASI Indexes | ❌ Not supported | Phase 3 |
| Change Data Capture (CDC) | ❌ Not supported | Phase 3 |

Performance Characteristics

| Operation | Cassandra | HeliosDB | Notes |
|---|---|---|---|
| Primary key lookup | <1ms | <1ms | Equal performance |
| Range query (clustering) | 1-10ms | 1-5ms | Faster with columnar storage |
| Secondary index scan | 10-50ms | 5-25ms | Optimized indexes |
| ALLOW FILTERING | 50-500ms | 20-200ms | Better query optimizer |
| Batch (10 statements) | 5-20ms | 5-15ms | Equal or better |
| Write throughput | 10K-100K ops/s | 10K-150K ops/s | Higher peak throughput |
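These figures vary with hardware and data model, so it is worth measuring against your own baseline. A minimal latency probe, assuming the heliosdb.example.com endpoint and the users table used throughout this guide:

import statistics
import time
from uuid import uuid4

from cassandra.cluster import Cluster

cluster = Cluster(['heliosdb.example.com'], port=9042)
session = cluster.connect('my_keyspace')
stmt = session.prepare("SELECT * FROM users WHERE user_id = ?")

user_id = uuid4()  # Replace with a real id so the lookup is a hit, not a miss
samples = []
for _ in range(1000):
    start = time.perf_counter()
    session.execute(stmt, [user_id])
    samples.append((time.perf_counter() - start) * 1000)  # milliseconds

p99 = statistics.quantiles(samples, n=100)[98]
print(f"p50={statistics.median(samples):.2f}ms  p99={p99:.2f}ms")

Run the same probe against the Cassandra cluster to get a like-for-like comparison.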

Prerequisites

1. Cassandra Version Requirements

Supported Versions:

  • Apache Cassandra 3.0+
  • Apache Cassandra 4.0+
  • Apache Cassandra 5.0
  • DataStax Enterprise (DSE) 6.0+
  • ScyllaDB (Cassandra-compatible)

Version Check:

cqlsh -e "SHOW VERSION"

2. Data Volume Assessment

Calculate Total Data Size:

# On each Cassandra node
nodetool status
nodetool tablestats <keyspace>

Estimate Migration Time:

  • Small (<100 GB): 1-2 days
  • Medium (100 GB - 1 TB): 3-7 days
  • Large (1 TB - 10 TB): 1-3 weeks
  • Very Large (>10 TB): 3-6 weeks
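These windows follow mostly from sustained transfer throughput. A rough back-of-the-envelope helper; the throughput figure is an assumption to replace with your own measurement:

def migration_hours(data_tb: float, sustained_mb_per_s: float,
                    overhead: float = 1.5) -> float:
    """Rough transfer-time estimate; overhead covers validation and retries."""
    seconds = (data_tb * 1_000_000) / sustained_mb_per_s  # TB -> MB
    return seconds / 3600 * overhead

print(f"{migration_hours(1, 100):.1f} h")  # 1 TB at 100 MB/s ≈ 4.2 h including overhead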

3. Schema Complexity Assessment

Run Assessment Script:

-- List keyspaces (CQL has no NOT IN; skip system keyspaces such as
-- system, system_schema, and system_auth when reading the output)
SELECT keyspace_name FROM system_schema.keyspaces;
-- Count tables per keyspace (GROUP BY on the partition key is supported)
SELECT keyspace_name, COUNT(*) AS table_count
FROM system_schema.tables
GROUP BY keyspace_name;
-- Identify UDTs
SELECT keyspace_name, type_name
FROM system_schema.types;
-- Identify materialized views
SELECT keyspace_name, view_name
FROM system_schema.views;

4. Downtime Planning

Migration Strategies by Downtime Tolerance:

| Strategy | Downtime | Complexity | Use Case |
|---|---|---|---|
| Dual-Write | Zero | High | Production critical systems |
| Replication-Based | Minimal (<5 min) | Medium | Most common |
| Snapshot & Restore | 1-24 hours | Low | Dev/staging, scheduled maintenance |

5. HeliosDB Cluster Setup

Minimum Requirements:

# Production cluster (3 nodes)
compute:
  nodes: 3
  cpu: 8 cores/node
  memory: 32 GB/node
  storage: 1 TB NVMe/node

# Development cluster (1 node)
compute:
  nodes: 1
  cpu: 4 cores
  memory: 16 GB
  storage: 500 GB SSD

Installation:

# Install HeliosDB
curl -sSL https://get.heliosdb.com | sh
# Start cluster
heliosdb cluster init --nodes 3 --replication 3
# Verify CQL endpoint
heliosdb cluster status | grep cql_endpoint
# Output: cql_endpoint: heliosdb.example.com:9042

Migration Strategies

Strategy 1: Dual-Write (Zero Downtime)

Best For: Production systems, mission-critical applications, zero downtime requirement

Timeline: 2-4 weeks

Steps:

  1. Set up HeliosDB cluster
  2. Configure application for dual-write (Cassandra + HeliosDB)
  3. Backfill historical data in background
  4. Validate data consistency (100% match)
  5. Gradually switch read traffic (10% → 50% → 100%)
  6. Monitor for 1-2 weeks
  7. Deprecate Cassandra cluster

Pros:

  • Zero downtime
  • Gradual rollout reduces risk
  • Easy rollback at any stage

Cons:

  • Requires application changes
  • Higher temporary costs (dual infrastructure)
  • Longer migration timeline

Implementation:

import logging
import random
from uuid import uuid4

from cassandra.cluster import Cluster

logger = logging.getLogger(__name__)

# Dual-write wrapper
class DualWriteConnection:
    def __init__(self):
        self.cassandra = Cluster(['cassandra.example.com'], port=9042).connect()
        self.heliosdb = Cluster(['heliosdb.example.com'], port=9042).connect()

    def execute(self, query, params=None):
        # Write to both (fail if either fails)
        try:
            result_cassandra = self.cassandra.execute(query, params)
            result_heliosdb = self.heliosdb.execute(query, params)
            return result_cassandra  # Return the Cassandra result for now
        except Exception as e:
            # Roll back / compensate if needed
            logger.error(f"Dual-write failed: {e}")
            raise

    def execute_read(self, query, params=None):
        # Route reads based on the traffic split
        if random.random() < 0.10:  # 10% to HeliosDB
            return self.heliosdb.execute(query, params)
        else:
            return self.cassandra.execute(query, params)

# Usage (simple statements use %s placeholders in the Python driver)
conn = DualWriteConnection()
conn.execute("INSERT INTO users (user_id, username) VALUES (%s, %s)", [uuid4(), 'Alice'])
users = conn.execute_read("SELECT * FROM users WHERE user_id = %s", [user_id])

Backfill Script:

#!/bin/bash
# backfill.sh - Backfill historical data
CASSANDRA_HOST="cassandra.example.com"
HELIOSDB_HOST="heliosdb.example.com"
KEYSPACE="my_keyspace"
TABLES=("users" "orders" "events")

for TABLE in "${TABLES[@]}"; do
  echo "Backfilling $TABLE..."
  # Export from Cassandra
  cqlsh $CASSANDRA_HOST -e "COPY $KEYSPACE.$TABLE TO '/tmp/$TABLE.csv'"
  # Import to HeliosDB
  cqlsh $HELIOSDB_HOST -e "COPY $KEYSPACE.$TABLE FROM '/tmp/$TABLE.csv'"
  # Verify row count (grab the count value, not the "(1 rows)" footer)
  CASSANDRA_COUNT=$(cqlsh $CASSANDRA_HOST -e "SELECT COUNT(*) FROM $KEYSPACE.$TABLE" | grep -Eo '[0-9]+' | head -1)
  HELIOSDB_COUNT=$(cqlsh $HELIOSDB_HOST -e "SELECT COUNT(*) FROM $KEYSPACE.$TABLE" | grep -Eo '[0-9]+' | head -1)
  if [ "$CASSANDRA_COUNT" == "$HELIOSDB_COUNT" ]; then
    echo "$TABLE backfill verified ($CASSANDRA_COUNT rows)"
  else
    echo "$TABLE row count mismatch (Cassandra: $CASSANDRA_COUNT, HeliosDB: $HELIOSDB_COUNT)"
    exit 1
  fi
done
echo "Backfill complete!"

Strategy 2: Replication-Based (Minimal Downtime)

Best For: Most production systems, <5 minute downtime acceptable

Timeline: 1-2 weeks

Steps:

  1. Set up HeliosDB cluster
  2. Configure CDC/replication from Cassandra to HeliosDB
  3. Initial full sync
  4. Catch-up replication (monitor lag)
  5. Planned cutover window (5 minutes)
  6. Switch application to HeliosDB
  7. Monitor for issues

Pros:

  • Minimal downtime (2-5 minutes)
  • Lower risk than snapshot
  • No application dual-write complexity

Cons:

  • Requires CDC/replication tooling
  • Brief downtime during cutover
  • More complex than snapshot

Implementation (using Debezium CDC):

# debezium-cassandra-connector.yaml
name: cassandra-heliosdb-replication
connector.class: io.debezium.connector.cassandra.CassandraConnector
cassandra.hosts: cassandra1:9042,cassandra2:9042,cassandra3:9042
cassandra.keyspace: my_keyspace
cassandra.tables: users,orders,events
cassandra.username: cassandra
cassandra.password: ${CASSANDRA_PASSWORD}

# HeliosDB sink
heliosdb.host: heliosdb.example.com
heliosdb.port: 9042
heliosdb.keyspace: my_keyspace
heliosdb.username: admin
heliosdb.password: ${HELIOSDB_PASSWORD}

# Replication settings
snapshot.mode: initial   # Full snapshot first
poll.interval.ms: 100
max.batch.size: 5000

Monitor Replication Lag:

# Check connector status and replication lag
curl -s http://debezium:8083/connectors/cassandra-heliosdb-replication/status | jq
# Expected output during catch-up:
# {
#   "state": "RUNNING",
#   "lag_seconds": 120,            # Decreasing to 0
#   "records_replicated": 1500000,
#   "records_remaining": 50000
# }

Cutover Script:

#!/bin/bash
# cutover.sh - Perform cutover to HeliosDB
echo "Starting cutover to HeliosDB..."

# 1. Stop application writes
echo "Stopping application..."
kubectl scale deployment my-app --replicas=0

# 2. Wait for replication to catch up (lag < 1 second)
while true; do
  LAG=$(curl -s http://debezium:8083/connectors/cassandra-heliosdb-replication/status | jq '.lag_seconds')
  if [ "$LAG" -lt 1 ]; then
    echo "Replication caught up (lag: ${LAG}s)"
    break
  fi
  echo "Waiting for replication... (lag: ${LAG}s)"
  sleep 2
done

# 3. Verify data consistency
echo "Verifying data consistency..."
./verify_data_consistency.sh

# 4. Update application config to HeliosDB
kubectl set env deployment/my-app CASSANDRA_HOST=heliosdb.example.com

# 5. Restart application
echo "Starting application with HeliosDB..."
kubectl scale deployment my-app --replicas=3

echo "Cutover complete! Application now using HeliosDB."
echo "Monitor logs: kubectl logs -f deployment/my-app"

Strategy 3: Snapshot & Restore (Planned Downtime)

Best For: Development/staging, scheduled maintenance windows

Timeline: 1-3 days

Steps:

  1. Schedule maintenance window
  2. Take Cassandra snapshot
  3. Stop writes to Cassandra
  4. Export snapshot to HeliosDB format
  5. Import to HeliosDB
  6. Validate data
  7. Switch application to HeliosDB
  8. Resume operations

Pros:

  • Simplest approach
  • Lower cost (no dual infrastructure)
  • Fastest migration

Cons:

  • Requires downtime (1-24 hours)
  • Not suitable for 24/7 systems
  • Higher risk (all-or-nothing)

Implementation:

#!/bin/bash
# snapshot_migrate.sh - Snapshot and migrate
CASSANDRA_HOST="cassandra.example.com"
HELIOSDB_HOST="heliosdb.example.com"
KEYSPACE="my_keyspace"
SNAPSHOT_NAME="migration_$(date +%Y%m%d_%H%M%S)"

echo "Step 1: Taking Cassandra snapshot..."
nodetool snapshot $KEYSPACE -t $SNAPSHOT_NAME

echo "Step 2: Exporting snapshot..."
# Find snapshot directory
SNAPSHOT_DIR=$(find /var/lib/cassandra/data/$KEYSPACE -name $SNAPSHOT_NAME -type d | head -1)
echo "Snapshot location: $SNAPSHOT_DIR"

# Use sstableloader (recommended) or COPY
echo "Step 3: Loading to HeliosDB..."
sstableloader -d $HELIOSDB_HOST $SNAPSHOT_DIR

# Alternative: COPY command
# for TABLE in users orders events; do
#   cqlsh $CASSANDRA_HOST -e "COPY $KEYSPACE.$TABLE TO '/tmp/${TABLE}.csv'"
#   cqlsh $HELIOSDB_HOST -e "COPY $KEYSPACE.$TABLE FROM '/tmp/${TABLE}.csv'"
# done

echo "Step 4: Validating data..."
./verify_data_consistency.sh
echo "Migration complete!"

Step-by-Step Migration

Step 1: Connection Setup

Cassandra Connection:

# Connect to Cassandra
cqlsh cassandra.example.com 9042 -u cassandra -p password

HeliosDB Connection (identical syntax):

# Connect to HeliosDB (same protocol!)
cqlsh heliosdb.example.com 9042 -u admin -p password

Test Connectivity:

-- On both Cassandra and HeliosDB
SELECT cluster_name, release_version FROM system.local;

Python Client (no code changes needed):

from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider
# Cassandra
auth_provider = PlainTextAuthProvider(username='cassandra', password='password')
cluster = Cluster(['cassandra.example.com'], port=9042, auth_provider=auth_provider)
cassandra_session = cluster.connect()
# HeliosDB (just change host!)
auth_provider = PlainTextAuthProvider(username='admin', password='password')
cluster = Cluster(['heliosdb.example.com'], port=9042, auth_provider=auth_provider)
heliosdb_session = cluster.connect()
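After connecting, it can be worth confirming which endpoint and protocol version each session actually negotiated before running workloads. A small check along these lines (the server-reported values will differ by deployment):

def describe_endpoint(session):
    # system.local reports the server's identity; works on both systems
    row = session.execute("SELECT cluster_name, release_version FROM system.local").one()
    proto = session.cluster.protocol_version
    print(f"cluster={row.cluster_name} release={row.release_version} protocol=v{proto}")

describe_endpoint(cassandra_session)
describe_endpoint(heliosdb_session)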

Step 2: Schema Migration

Export Cassandra Schema:

# Export all keyspaces
cqlsh cassandra.example.com -e "DESCRIBE KEYSPACES" > keyspaces.txt
# Export specific keyspace schema
cqlsh cassandra.example.com -e "DESCRIBE KEYSPACE my_keyspace" > my_keyspace_schema.cql

Create Keyspace in HeliosDB (identical syntax):

-- Example schema
CREATE KEYSPACE IF NOT EXISTS my_keyspace
WITH REPLICATION = {
  'class': 'SimpleStrategy',
  'replication_factor': 3
};

USE my_keyspace;

-- Create table
CREATE TABLE users (
  user_id UUID PRIMARY KEY,
  username TEXT,
  email TEXT,
  created_at TIMESTAMP,
  profile MAP<TEXT, TEXT>
);

-- Create table with clustering
CREATE TABLE user_events (
  user_id UUID,
  event_time TIMESTAMP,
  event_type TEXT,
  payload MAP<TEXT, TEXT>,
  PRIMARY KEY ((user_id), event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);

-- Create secondary index
CREATE INDEX ON users (email);

-- Create user-defined type
CREATE TYPE address (
  street TEXT,
  city TEXT,
  zip_code TEXT
);

CREATE TABLE user_addresses (
  user_id UUID PRIMARY KEY,
  home_address FROZEN<address>,
  work_address FROZEN<address>
);

Automated Schema Migration Script:

#!/bin/bash
# migrate_schema.sh
CASSANDRA_HOST="cassandra.example.com"
HELIOSDB_HOST="heliosdb.example.com"

# Get non-system keyspaces (DESCRIBE KEYSPACES prints them space-separated,
# so split into lines before filtering)
KEYSPACES=$(cqlsh $CASSANDRA_HOST -e "DESCRIBE KEYSPACES" | tr -s ' \n' '\n' | grep -v '^system' | grep -v '^$')

for KEYSPACE in $KEYSPACES; do
  echo "Migrating schema for keyspace: $KEYSPACE"
  # Export schema
  cqlsh $CASSANDRA_HOST -e "DESCRIBE KEYSPACE $KEYSPACE" > /tmp/${KEYSPACE}_schema.cql
  # Import to HeliosDB
  cqlsh $HELIOSDB_HOST -f /tmp/${KEYSPACE}_schema.cql
  echo "$KEYSPACE schema migrated"
done
echo "Schema migration complete!"

Step 3: Data Migration

Option A: COPY Command (Small to Medium Datasets <100 GB)

Export:

# Export table to CSV
cqlsh cassandra.example.com -e "
COPY my_keyspace.users TO '/tmp/users.csv'
WITH HEADER = TRUE;
"

Import:

# Import CSV to HeliosDB
cqlsh heliosdb.example.com -e "
COPY my_keyspace.users FROM '/tmp/users.csv'
WITH HEADER = TRUE;
"

Automated COPY Script:

#!/bin/bash
# migrate_data_copy.sh
CASSANDRA_HOST="cassandra.example.com"
HELIOSDB_HOST="heliosdb.example.com"
KEYSPACE="my_keyspace"
TABLES=("users" "orders" "events" "user_events")

for TABLE in "${TABLES[@]}"; do
  echo "Migrating $KEYSPACE.$TABLE..."
  # Export
  cqlsh $CASSANDRA_HOST -e "
    COPY $KEYSPACE.$TABLE TO '/tmp/${TABLE}.csv'
    WITH HEADER = TRUE AND PAGETIMEOUT = 60;
  "
  # Import
  cqlsh $HELIOSDB_HOST -e "
    COPY $KEYSPACE.$TABLE FROM '/tmp/${TABLE}.csv'
    WITH HEADER = TRUE AND CHUNKSIZE = 5000;
  "
  # Verify count (first number in the output is the count value)
  CASSANDRA_COUNT=$(cqlsh $CASSANDRA_HOST -e "SELECT COUNT(*) FROM $KEYSPACE.$TABLE" | grep -Eo '[0-9]+' | head -1)
  HELIOSDB_COUNT=$(cqlsh $HELIOSDB_HOST -e "SELECT COUNT(*) FROM $KEYSPACE.$TABLE" | grep -Eo '[0-9]+' | head -1)
  if [ "$CASSANDRA_COUNT" == "$HELIOSDB_COUNT" ]; then
    echo "$TABLE migrated successfully ($CASSANDRA_COUNT rows)"
  else
    echo "$TABLE row count mismatch (C: $CASSANDRA_COUNT, H: $HELIOSDB_COUNT)"
    exit 1
  fi
done
echo "Data migration complete!"

Option B: sstableloader (Large Datasets >100 GB)

Export SSTables:

# Take snapshot
nodetool snapshot my_keyspace -t migration_snapshot
# Find snapshot directory
find /var/lib/cassandra/data/my_keyspace -name migration_snapshot

Load to HeliosDB:

# Load SSTables
sstableloader -d heliosdb.example.com \
  -u admin \
  -pw password \
  /var/lib/cassandra/data/my_keyspace/users-abc123/snapshots/migration_snapshot

Option C: Spark Bulk Load (Very Large Datasets >1 TB)

Spark Migration Job:

import org.apache.spark.sql.SparkSession
import com.datastax.spark.connector._

val spark = SparkSession.builder()
  .appName("Cassandra to HeliosDB Migration")
  .config("spark.cassandra.connection.host", "cassandra.example.com")
  .config("spark.cassandra.auth.username", "cassandra")
  .config("spark.cassandra.auth.password", "password")
  .getOrCreate()

// Read from Cassandra
val cassandraDF = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "my_keyspace", "table" -> "users"))
  .load()

// Write to HeliosDB (per-write connection options override the session defaults)
cassandraDF.write
  .format("org.apache.spark.sql.cassandra")
  .options(Map(
    "keyspace" -> "my_keyspace",
    "table" -> "users",
    "spark.cassandra.connection.host" -> "heliosdb.example.com",
    "spark.cassandra.auth.username" -> "admin",
    "spark.cassandra.auth.password" -> "password"
  ))
  .mode("append")
  .save()

Step 4: Application Migration

No Code Changes Needed! Just update connection configuration.

Before (Cassandra):

from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider

auth = PlainTextAuthProvider(username='cassandra', password='password')
cluster = Cluster(['cassandra.example.com'], port=9042, auth_provider=auth)
session = cluster.connect('my_keyspace')

# All CQL queries work identically (simple statements use %s placeholders)
rows = session.execute("SELECT * FROM users WHERE user_id = %s", [user_id])
for row in rows:
    print(row.username, row.email)

After (HeliosDB) - Only hostname changes:

from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider

auth = PlainTextAuthProvider(username='admin', password='password')
cluster = Cluster(['heliosdb.example.com'], port=9042, auth_provider=auth)  # ← Only change
session = cluster.connect('my_keyspace')

# Same queries, no changes!
rows = session.execute("SELECT * FROM users WHERE user_id = %s", [user_id])
for row in rows:
    print(row.username, row.email)

Configuration-Based Connection (best practice):

# config.py
import os

CASSANDRA_HOST = os.environ.get('CASSANDRA_HOST', 'cassandra.example.com')
CASSANDRA_PORT = int(os.environ.get('CASSANDRA_PORT', '9042'))
CASSANDRA_USERNAME = os.environ.get('CASSANDRA_USERNAME', 'cassandra')
CASSANDRA_PASSWORD = os.environ.get('CASSANDRA_PASSWORD')

# app.py
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider
import config

auth = PlainTextAuthProvider(username=config.CASSANDRA_USERNAME, password=config.CASSANDRA_PASSWORD)
cluster = Cluster([config.CASSANDRA_HOST], port=config.CASSANDRA_PORT, auth_provider=auth)
session = cluster.connect('my_keyspace')

Update Environment Variables:

# For Cassandra
export CASSANDRA_HOST=cassandra.example.com
# For HeliosDB (only change this!)
export CASSANDRA_HOST=heliosdb.example.com

Spring Boot Configuration:

# application.yml
spring:
  data:
    cassandra:
      contact-points: ${CASSANDRA_HOST:cassandra.example.com}
      port: 9042
      keyspace-name: my_keyspace
      username: ${CASSANDRA_USERNAME}
      password: ${CASSANDRA_PASSWORD}
      local-datacenter: datacenter1

Step 5: Validation

Row Count Validation:

-- Run on both Cassandra and HeliosDB
SELECT COUNT(*) FROM my_keyspace.users;
SELECT COUNT(*) FROM my_keyspace.orders;
SELECT COUNT(*) FROM my_keyspace.events;

Sample Data Comparison:

-- Compare first 100 rows (sorted by primary key)
SELECT * FROM users LIMIT 100;

Automated Validation Script:

# validate_migration.py
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider

# Connect to both
cassandra_cluster = Cluster(['cassandra.example.com'], port=9042,
                            auth_provider=PlainTextAuthProvider('cassandra', 'password'))
heliosdb_cluster = Cluster(['heliosdb.example.com'], port=9042,
                           auth_provider=PlainTextAuthProvider('admin', 'password'))
cassandra_session = cassandra_cluster.connect('my_keyspace')
heliosdb_session = heliosdb_cluster.connect('my_keyspace')

# Tables to validate
tables = ['users', 'orders', 'events']
validation_results = []

for table in tables:
    # Count validation
    cass_count = cassandra_session.execute(f"SELECT COUNT(*) FROM {table}").one()[0]
    helios_count = heliosdb_session.execute(f"SELECT COUNT(*) FROM {table}").one()[0]
    count_match = cass_count == helios_count

    # Sample validation (first 1000 rows)
    cass_sample = list(cassandra_session.execute(f"SELECT * FROM {table} LIMIT 1000"))
    helios_sample = list(heliosdb_session.execute(f"SELECT * FROM {table} LIMIT 1000"))
    sample_match = cass_sample == helios_sample

    validation_results.append({
        'table': table,
        'cassandra_count': cass_count,
        'heliosdb_count': helios_count,
        'count_match': count_match,
        'sample_match': sample_match,
        'status': 'PASS' if count_match and sample_match else 'FAIL'
    })

# Print results
for result in validation_results:
    status_icon = '✓' if result['status'] == 'PASS' else '✗'
    print(f"{status_icon} {result['table']}: {result['cassandra_count']} rows "
          f"(match: {result['count_match']}, sample: {result['sample_match']})")

# Exit with error if any failures
if any(r['status'] == 'FAIL' for r in validation_results):
    print("\n✗ Validation FAILED")
    exit(1)
else:
    print("\n✓ Validation PASSED")

Checksum Validation (for critical data):

import hashlib

def compute_table_checksum(session, table, sample_size=10000):
    """Compute a checksum over a sample of table data."""
    rows = session.execute(f"SELECT * FROM {table} LIMIT {sample_size}")
    # Sort rows for a deterministic order before hashing
    sorted_rows = sorted(rows, key=lambda r: str(r))
    # Compute checksum
    hasher = hashlib.sha256()
    for row in sorted_rows:
        hasher.update(str(row).encode())
    return hasher.hexdigest()

# Compare checksums
for table in tables:
    cass_checksum = compute_table_checksum(cassandra_session, table)
    helios_checksum = compute_table_checksum(heliosdb_session, table)
    if cass_checksum == helios_checksum:
        print(f"✓ {table} checksum match: {cass_checksum[:8]}...")
    else:
        print(f"✗ {table} checksum MISMATCH")
        print(f"  Cassandra: {cass_checksum}")
        print(f"  HeliosDB:  {helios_checksum}")

Step 6: Performance Tuning

Connection Pooling:

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy

# Optimized connection profile
profile = ExecutionProfile(
    load_balancing_policy=TokenAwarePolicy(DCAwareRoundRobinPolicy()),
    request_timeout=15.0,
    consistency_level=ConsistencyLevel.LOCAL_QUORUM
)

cluster = Cluster(
    ['heliosdb.example.com'],
    port=9042,
    execution_profiles={EXEC_PROFILE_DEFAULT: profile},
    protocol_version=5,           # Use the latest protocol
    compression=True,             # Enable compression
    max_schema_agreement_wait=10
)

Consistency Level Adjustment:

-- Use local consistency levels for lower latency
-- (especially important for multi-region deployments).
-- Note: in cqlsh, CONSISTENCY is a shell command set before the query,
-- not part of the SELECT statement.

-- Before (Cassandra)
CONSISTENCY QUORUM;
SELECT * FROM users WHERE user_id = ?;

-- After (HeliosDB - optimized)
CONSISTENCY LOCAL_QUORUM;
SELECT * FROM users WHERE user_id = ?;
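The same adjustment can be made per statement from the application, without changing session defaults. A sketch using the DataStax Python driver:

from cassandra import ConsistencyLevel
from cassandra.query import SimpleStatement

# Per-statement override: this read uses LOCAL_QUORUM regardless of the session default
stmt = SimpleStatement(
    "SELECT * FROM users WHERE user_id = %s",
    consistency_level=ConsistencyLevel.LOCAL_QUORUM,
)
rows = session.execute(stmt, [user_id])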

Prepared Statement Caching:

# Cache prepared statements for better performance
prepared_statements = {}

def get_prepared_statement(session, query):
    if query not in prepared_statements:
        prepared_statements[query] = session.prepare(query)
    return prepared_statements[query]

# Usage (prepared statements use ? placeholders)
stmt = get_prepared_statement(session, "SELECT * FROM users WHERE user_id = ?")
rows = session.execute(stmt, [user_id])

Query Optimization:

-- Before: Secondary index scan (slow)
SELECT * FROM orders WHERE customer_email = 'alice@example.com';

-- After: Partition key lookup (fast)
-- Option 1: Add a materialized view (execution lands in Phase 2 Week 3)
CREATE MATERIALIZED VIEW orders_by_email AS
  SELECT * FROM orders
  WHERE customer_email IS NOT NULL AND order_id IS NOT NULL
  PRIMARY KEY (customer_email, order_id);

SELECT * FROM orders_by_email WHERE customer_email = 'alice@example.com';

-- Option 2: Use the HeliosDB SQL interface for complex queries
SELECT * FROM orders WHERE customer_email = 'alice@example.com'
ALLOW FILTERING; -- Faster in HeliosDB due to the query optimizer

Index Creation:

-- Create secondary indexes for common queries
CREATE INDEX ON users (email);
CREATE INDEX ON orders (status);
CREATE INDEX ON events (event_type);
-- Check index usage
SELECT * FROM system_schema.indexes WHERE keyspace_name = 'my_keyspace';

Feature Mapping

Data Types

| Cassandra Type | HeliosDB Storage | Notes |
|---|---|---|
| ASCII | TEXT | Identical |
| BIGINT | BIGINT | Identical |
| BLOB | BYTEA | Binary data |
| BOOLEAN | BOOLEAN | Identical |
| COUNTER | BIGINT | Special handling for increments |
| DATE | INTEGER | Days since epoch |
| DECIMAL | DECIMAL | Arbitrary precision |
| DOUBLE | DOUBLE PRECISION | 64-bit float |
| DURATION | INTERVAL | Time duration |
| FLOAT | REAL | 32-bit float |
| INET | INET | IP address |
| INT | INTEGER | 32-bit int |
| LIST | JSONB | Stored as JSON array |
| MAP<K,V> | JSONB | Stored as JSON object |
| SET | JSONB | Stored as JSON array (unique) |
| SMALLINT | SMALLINT | 16-bit int |
| TEXT | TEXT | UTF-8 string |
| TIME | TIME | Nanoseconds since midnight |
| TIMESTAMP | BIGINT | Milliseconds since epoch |
| TIMEUUID | UUID | Version 1 UUID |
| TINYINT | SMALLINT | 8-bit int → 16-bit |
| TUPLE | JSONB | Stored as JSON array |
| UUID | UUID | Version 4 UUID |
| UDT | JSONB | User-defined type as JSON |
| VARCHAR | TEXT | Alias for TEXT |
| VARINT | NUMERIC | Arbitrary precision integer |
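Since collections travel as standard CQL types on the wire, drivers should return native structures regardless of the JSONB storage underneath. A quick round-trip check, assuming a connected session and the users table defined earlier:

from uuid import uuid4

uid = uuid4()
session.execute(
    "INSERT INTO users (user_id, username, profile) VALUES (%s, %s, %s)",
    [uid, 'alice', {'theme': 'dark', 'lang': 'en'}],
)
row = session.execute("SELECT profile FROM users WHERE user_id = %s", [uid]).one()
# MAP<TEXT, TEXT> comes back as a mapping, not a JSON string
assert dict(row.profile) == {'theme': 'dark', 'lang': 'en'}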

CQL Features

| Feature | HeliosDB Compatibility |
|---|---|
| **Queries** | |
| SELECT | 100% |
| INSERT | 100% |
| UPDATE | 100% |
| DELETE | 100% |
| BEGIN BATCH | 100% |
| **Clauses** | |
| WHERE | 100% |
| ORDER BY | 100% (clustering only) |
| LIMIT | 100% |
| ALLOW FILTERING | 100% (faster in HeliosDB) |
| IF EXISTS / IF NOT EXISTS | 100% |
| IF | 80% (basic LWT only) |
| USING TTL | 100% |
| USING TIMESTAMP | 100% |
| **Functions** | |
| uuid() | 100% |
| now() | 100% |
| toTimestamp() | 100% |
| dateOf() | 100% |
| token() | 100% |
| ttl() | 100% |
| writetime() | 100% |
| COUNT(*) | 100% |
| SUM/AVG/MIN/MAX | 100% |
| **Advanced** | |
| Prepared Statements | 100% |
| Paging | 100% |
| LOGGED BATCH | 100% |
| UNLOGGED BATCH | 100% |
| COUNTER BATCH | 100% |
| Static Columns | 100% |
| Frozen Collections | 100% |
| Materialized Views | 50% (parser only, execution Week 3) |
| Secondary Indexes | 100% |
| UDFs | 0% (Phase 3) |
| UDAs | 0% (Phase 3) |
| Triggers | 0% (not planned) |

Troubleshooting

Common Issues

1. Consistency Level Errors

Problem: UnavailableException: Cannot achieve consistency level QUORUM

Cause: Not enough replicas available

Solution:

-- Use LOCAL_ONE or LOCAL_QUORUM instead (in cqlsh, CONSISTENCY is a shell command)
CONSISTENCY LOCAL_ONE;
SELECT * FROM users WHERE user_id = ?;

Configuration:

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster

cluster = Cluster(['heliosdb.example.com'])
session = cluster.connect()
session.default_consistency_level = ConsistencyLevel.LOCAL_QUORUM  # Set default

2. Authentication Failures

Problem: AuthenticationException: Username and/or password are incorrect

Cause: PasswordAuthenticator mismatch

Solution:

# Check HeliosDB authentication configuration
heliosdb config get auth.authenticator
# Should output: PasswordAuthenticator
# Reset admin password
heliosdb user reset-password admin

Application Fix:

from cassandra.auth import PlainTextAuthProvider
# Ensure correct username/password
auth = PlainTextAuthProvider(username='admin', password='correct_password')
cluster = Cluster(['heliosdb.example.com'], auth_provider=auth)

3. Performance Degradation

Problem: Queries slower in HeliosDB than Cassandra

Diagnosis:

-- Check query plan
EXPLAIN SELECT * FROM users WHERE email = 'alice@example.com';

Solution 1: Missing Indexes

-- Create secondary index
CREATE INDEX ON users (email);

Solution 2: Optimize Consistency Level

# Use lower consistency for read-heavy workloads
session.default_consistency_level = ConsistencyLevel.LOCAL_ONE

Solution 3: Enable Query Cache

-- Enable intelligent caching (HeliosDB feature)
ALTER SYSTEM SET intelligent_cache = 'on';
ALTER SYSTEM SET cache_policy = 'ml_hybrid';

4. Data Type Mismatch

Problem: Collection data not readable after migration

Cause: Frozen vs. non-frozen collections

Solution:

-- Check table schema
DESC TABLE my_keyspace.users;
-- If collections are frozen, they're immutable
-- Recreate table with non-frozen collections if needed
CREATE TABLE users_v2 (
  user_id UUID PRIMARY KEY,
  tags SET<TEXT>,            -- Non-frozen (mutable)
  metadata MAP<TEXT, TEXT>   -- Non-frozen (mutable)
);

5. Timeout Errors

Problem: OperationTimedOut: Operation timed out - received only 0 responses

Cause: Network latency, overloaded cluster, or large result set

Solution:

# Increase the request timeout
from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT

profile = ExecutionProfile(
    request_timeout=30.0  # Increase from the 10s default
)
cluster = Cluster(['heliosdb.example.com'],
                  execution_profiles={EXEC_PROFILE_DEFAULT: profile})

For Large Queries:

# Use paging for large result sets
from cassandra.query import SimpleStatement

stmt = SimpleStatement("SELECT * FROM large_table", fetch_size=1000)
for row in session.execute(stmt):
    process(row)

6. Connection Pool Exhausted

Problem: NoHostAvailable: Unable to complete the operation against any hosts

Cause: Too many concurrent connections

Solution:

# Tune the client connection settings
from cassandra.cluster import Cluster

cluster = Cluster(
    ['heliosdb.example.com'],
    protocol_version=5,
    executor_threads=4,             # Increase the driver thread pool
    max_schema_agreement_wait=10,
    control_connection_timeout=10,
    idle_heartbeat_interval=30,
    compression=True
)

Best Practices

1. Start with Non-Production Environments

Recommended Progression:

  1. Development (1-2 days): Test basic functionality
  2. Staging (1 week): Full integration testing
  3. Canary Production (1 week): 10% traffic to HeliosDB
  4. Production Rollout (2 weeks): Gradual 10% → 25% → 50% → 100%
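For the canary and rollout stages, routing by a stable hash of the user (rather than random sampling, as in the dual-write wrapper earlier) keeps each user on one backend, which simplifies debugging. A hypothetical sketch:

import zlib

class PercentRouter:
    """Route a stable percentage of keys to HeliosDB; the rest stay on Cassandra."""
    def __init__(self, heliosdb_session, cassandra_session, helios_pct: int = 10):
        self.helios = heliosdb_session
        self.cassandra = cassandra_session
        self.pct = helios_pct

    def session_for(self, routing_key: str):
        # Stable bucket in [0, 100); the same key always routes the same way
        bucket = zlib.crc32(routing_key.encode()) % 100
        return self.helios if bucket < self.pct else self.cassandra

# Raising helios_pct from 10 to 50 to 100 moves whole cohorts over without flapping
router = PercentRouter(heliosdb_session, cassandra_session, helios_pct=10)
rows = router.session_for(str(user_id)).execute(
    "SELECT * FROM users WHERE user_id = %s", [user_id])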

2. Use Dual-Write for Critical Systems

Advantages:

  • Zero downtime
  • Easy rollback
  • Gradual confidence building

Implementation: See Strategy 1: Dual-Write


3. Monitor Replication Lag During Migration

Key Metrics:

  • Replication lag (target: <1 second)
  • Write throughput (both clusters)
  • Read latency (both clusters)
  • Error rates

Monitoring Script:

#!/bin/bash
# monitor_migration.sh
while true; do
  # Check replication lag
  LAG=$(curl -s http://debezium:8083/connectors/cassandra-heliosdb/status | jq '.lag_seconds')
  # Check write throughput
  CASSANDRA_WRITES=$(nodetool tpstats | grep 'MutationStage' | awk '{print $5}')
  HELIOSDB_WRITES=$(heliosdb stats writes)
  # Print dashboard
  clear
  echo "=== Migration Monitor ==="
  echo "Replication Lag:  ${LAG}s"
  echo "Cassandra Writes: ${CASSANDRA_WRITES}/s"
  echo "HeliosDB Writes:  ${HELIOSDB_WRITES}/s"
  echo "========================="
  sleep 5
done

4. Keep Cassandra as Fallback Initially

Recommendation: Keep Cassandra cluster running for 1-2 weeks after cutover

Rollback Preparation:

# Keep Cassandra config readily available
export CASSANDRA_HOST_PRIMARY=heliosdb.example.com
export CASSANDRA_HOST_FALLBACK=cassandra.example.com

# Quick rollback script (unquoted EOF expands the fallback host now,
# so the script works even in a shell without these variables)
cat > rollback.sh <<EOF
#!/bin/bash
kubectl set env deployment/my-app CASSANDRA_HOST=$CASSANDRA_HOST_FALLBACK
kubectl rollout restart deployment/my-app
EOF
chmod +x rollback.sh

5. Test Rollback Procedure

Before Cutover:

  1. Switch staging to HeliosDB
  2. Verify the application works
  3. Execute the rollback script (./rollback.sh)
  4. Verify the application still works against Cassandra

6. Document Your Migration

Migration Runbook Template:

# Migration Runbook: Cassandra → HeliosDB
## Pre-Migration Checklist
- [ ] HeliosDB cluster provisioned and tested
- [ ] Schema migrated and validated
- [ ] Data migration method selected
- [ ] Rollback procedure tested
- [ ] Monitoring dashboards configured
- [ ] On-call team notified
## Migration Steps
1. [ ] Enable dual-write (or start replication)
2. [ ] Begin data backfill
3. [ ] Validate data consistency
4. [ ] Route 10% read traffic to HeliosDB
5. [ ] Monitor for 24 hours
6. [ ] Route 50% read traffic to HeliosDB
7. [ ] Monitor for 24 hours
8. [ ] Route 100% traffic to HeliosDB
9. [ ] Monitor for 1 week
10. [ ] Decommission Cassandra
## Rollback Procedure
1. Execute: ./rollback.sh
2. Verify traffic routing to Cassandra
3. Monitor error rates
4. Investigate HeliosDB issue
## Success Criteria
- Row counts match (100%)
- Query latency < baseline + 10%
- Error rate < 0.01%
- No data loss

Rollback Plan

Scenario 1: Issues During Dual-Write Phase

Symptoms: Data inconsistency, high error rates

Action:

# 1. Stop dual-write
# Comment out dual-write code or set environment variable
export ENABLE_DUAL_WRITE=false
# 2. Restart application
kubectl rollout restart deployment/my-app
# 3. Continue writing to Cassandra only
# 4. Investigate HeliosDB issues
heliosdb logs --tail 1000 | grep ERROR
# 5. Retry after fixes

Impact: No downtime (still using Cassandra)


Scenario 2: Issues After Partial Cutover

Symptoms: Performance degradation, query errors

Action:

# 1. Immediately route all traffic back to Cassandra
kubectl set env deployment/my-app CASSANDRA_HOST=cassandra.example.com
kubectl rollout restart deployment/my-app
# 2. Verify application health
kubectl logs -f deployment/my-app
# 3. Analyze HeliosDB issues
heliosdb diagnose --export /tmp/diagnostics.tar.gz
# 4. Contact support if needed

Impact: Brief downtime during restart (2-5 minutes)


Scenario 3: Data Loss Detected

Symptoms: Row count mismatch, missing records

Action:

# 1. IMMEDIATE ROLLBACK
./rollback.sh
# 2. Stop all writes to HeliosDB
heliosdb cluster pause --writes-only
# 3. Export HeliosDB data for forensics
cqlsh heliosdb.example.com -e "COPY my_keyspace.users TO '/tmp/heliosdb_users.csv'"
# 4. Compare with Cassandra
diff <(wc -l < /tmp/cassandra_users.csv) <(wc -l < /tmp/heliosdb_users.csv)
# 5. Re-run migration with identified fixes

Impact: Rollback to Cassandra (5-10 minutes downtime)


Case Studies

Case Study 1: E-Commerce Platform (500M Daily Writes)

Customer: Large e-commerce company

Cassandra Setup:

  • 30-node cluster
  • 100 TB data
  • 500M writes/day
  • 2B reads/day

Migration Strategy: Dual-write

Timeline:

  • Week 1: Schema migration, initial backfill
  • Week 2: Dual-write enabled, 10% read traffic
  • Week 3: 50% read traffic
  • Week 4: 100% read traffic, write migration
  • Week 5: Cassandra decommissioned

Results:

  • Zero downtime
  • 35% cost reduction
  • 2.5x faster analytics queries
  • No data loss

Key Success Factors:

  • Comprehensive monitoring
  • Gradual rollout
  • 24/7 on-call support during migration

Case Study 2: IoT Sensor Network (10K Devices)

Customer: Industrial IoT platform

Cassandra Setup:

  • 6-node cluster
  • 20 TB time-series data
  • 10K devices × 1 msg/sec = 10K writes/sec

Migration Strategy: Replication-based

Timeline:

  • Day 1: Schema migration
  • Day 2-3: Initial data replication (20 TB)
  • Day 4: Replication caught up (lag < 1s)
  • Day 5: Cutover (3-minute downtime)
  • Day 6-12: Monitoring

Results:

  • 3-minute downtime
  • 50% cost reduction (tiered storage)
  • 10x faster downsampling queries
  • Successful migration

Key Success Factors:

  • Low downtime tolerance (not 24/7 critical)
  • Time-series data (less complex than transactional)
  • HeliosDB edge sync for IoT devices

Case Study 3: SaaS Multi-Tenant Application

Customer: B2B SaaS provider (5000 tenants)

Cassandra Setup:

  • 12-node cluster
  • 50 TB data
  • Multi-tenant (5000 tenants)

Migration Strategy: Tenant-by-tenant migration

Timeline:

  • Week 1-2: Migrate 10 test tenants
  • Week 3-4: Migrate 100 small tenants
  • Week 5-6: Migrate 500 medium tenants
  • Week 7-10: Migrate 4390 remaining tenants

Results:

  • Zero downtime (per-tenant migration)
  • 40% cost reduction
  • Multi-protocol access (CQL + SQL)
  • Tenant isolation improved

Key Success Factors:

  • Tenant-by-tenant rollout reduced risk
  • Automated migration tooling
  • HeliosDB multi-tenancy features

Next Steps

1. Join HeliosDB Community

Resources:


2. Schedule Migration Planning Call

Contact: migrations@heliosdb.com

Topics:

  • Migration strategy selection
  • Timeline estimation
  • Resource requirements
  • Risk assessment

3. Access Enterprise Features

Enterprise Add-Ons:

  • 24/7 migration support
  • Dedicated migration engineer
  • Advanced monitoring and alerting
  • Custom tooling development
  • SLA guarantees

Contact Sales: sales@heliosdb.com


4. Training and Certification

HeliosDB Academy:

  • Cassandra to HeliosDB Migration (4-hour course)
  • HeliosDB Administration (8-hour course)
  • Multi-Protocol Database Design (4-hour course)

Register: https://academy.heliosdb.com


Appendix A: Glossary

| Term | Definition |
|---|---|
| CQL | Cassandra Query Language |
| Keyspace | Top-level data container (similar to a database) |
| Partition Key | Column(s) determining data distribution across nodes |
| Clustering Key | Column(s) determining sort order within a partition |
| Consistency Level | How many replicas must respond to a query (ONE, QUORUM, ALL) |
| LWT | Lightweight Transactions (compare-and-set operations) |
| UDT | User-Defined Type (custom data type) |
| TTL | Time-to-Live (automatic data expiration) |
| SSTable | Sorted String Table (Cassandra's on-disk format) |
| Compaction | Background process to merge SSTables |

Appendix B: Client Driver Compatibility

| Driver | Version | HeliosDB Compatibility |
|---|---|---|
| DataStax Python Driver | 3.25+ | 100% compatible |
| DataStax Java Driver | 4.15+ | 100% compatible |
| DataStax C++ Driver | 2.16+ | 100% compatible |
| DataStax Node.js Driver | 4.6+ | 100% compatible |
| DataStax C# Driver | 3.18+ | 100% compatible |
| GoCQL | 1.0+ | 100% compatible |
| Rust Driver (Scylla) | 0.9+ | 100% compatible |
| cqlsh | 5.0.1+ | 100% compatible |

Appendix C: HeliosDB-Specific Enhancements

1. SQL Interface on CQL Data

-- Query Cassandra data with SQL
SELECT u.username, COUNT(o.order_id) AS order_count
FROM my_keyspace.users u
LEFT JOIN my_keyspace.orders o ON u.user_id = o.user_id
GROUP BY u.username
HAVING COUNT(o.order_id) > 10
ORDER BY order_count DESC;

2. Intelligent Query Caching

-- Enable ML-based query caching (HeliosDB feature)
ALTER SYSTEM SET intelligent_cache = 'on';
-- Cached queries are automatically faster on subsequent runs
SELECT * FROM large_table WHERE status = 'active'; -- 500ms (first run)
SELECT * FROM large_table WHERE status = 'active'; -- 2ms (cached)

3. Multi-Protocol Access

# Access the same data via the CQL and PostgreSQL protocols
from uuid import uuid4

from cassandra.cluster import Cluster
import psycopg2

# CQL protocol
cql_cluster = Cluster(['heliosdb.example.com'], port=9042)
cql_session = cql_cluster.connect('my_keyspace')
cql_session.execute("INSERT INTO users (user_id, username) VALUES (%s, %s)", [uuid4(), 'Alice'])

# PostgreSQL protocol (same data!)
pg_conn = psycopg2.connect(host='heliosdb.example.com', port=5432, dbname='my_keyspace')
pg_cursor = pg_conn.cursor()
pg_cursor.execute("SELECT * FROM users WHERE username = %s", ['Alice'])
print(pg_cursor.fetchall())

Document Information

Version: 1.0
Last Updated: November 11, 2025
Maintainers: HeliosDB Migration Team
Feedback: migrations@heliosdb.com

Related Documentation: