Cassandra to HeliosDB Migration Guide
Version: 1.0 Last Updated: January 2026
Table of Contents
- Introduction
- Compatibility Overview
- Pre-Migration Assessment
- Conceptual Mapping
- Step-by-Step Migration
- CQL Query Compatibility
- Feature Mapping
- Consistency and Replication
- Post-Migration Validation
- Performance Tuning
- Common Issues and Troubleshooting
1. Introduction
Why Migrate from Cassandra to HeliosDB?
HeliosDB provides full CQL wire protocol compatibility while offering significant advantages over Apache Cassandra:
Performance Benefits
- Lower Latency: HeliosDB’s unified storage engine provides sub-millisecond query response times
- Higher Throughput: Optimized query execution delivers 100K+ operations per second
- Efficient Compression: Advanced compression algorithms (LZ4, Snappy) with better compression ratios
- Intelligent Caching: Automatic query and data caching without manual tuning
Operational Benefits
- Simplified Operations: No need for nodetool, repair operations, or complex tuning
- Zero-Downtime Scaling: Automatic rebalancing without manual intervention
- Unified Management: Single interface for all protocols (SQL, CQL, MongoDB, etc.)
- Built-in Monitoring: Comprehensive metrics without JMX configuration
Feature Enhancements
- Multi-Protocol Access: Access same data via SQL, CQL, REST, or GraphQL
- Advanced Analytics: Full SQL support including window functions and CTEs
- ACID Transactions: True ACID compliance beyond Cassandra’s lightweight transactions
- AI Integration: Built-in ML/AI capabilities and natural language queries
Cost Reduction
- Lower Hardware Requirements: More efficient resource utilization
- Reduced Operational Overhead: Fewer administrators needed
- No Licensing Fees: Open-source with enterprise features included
Migration Compatibility
HeliosDB supports Apache Cassandra Native Protocol v4/v5 with 100% CQL language compatibility. This means:
- Existing Cassandra drivers work without modification
- CQL queries execute without changes
- Application code requires minimal to no changes
- Schema can be migrated directly
2. Compatibility Overview
CQL Wire Protocol Support
| Feature | Compatibility | Notes |
|---|---|---|
| Native Protocol v4 | 100% | Full support |
| Native Protocol v5 | 100% | Full support |
| Frame Compression (LZ4) | 100% | Recommended |
| Frame Compression (Snappy) | 100% | Supported |
| SASL Authentication | 100% | Password, LDAP |
| Prepared Statements | 100% | Full caching |
| Batch Operations | 100% | All batch types |
| Paging | 100% | Cursor-based |
| Event Notifications | 100% | SCHEMA_CHANGE, etc. |
Driver Compatibility Matrix
| Language | Driver | Version | Status |
|---|---|---|---|
| Java | DataStax Java Driver | 4.x | Fully Compatible |
| Python | cassandra-driver | 3.x | Fully Compatible |
| Node.js | cassandra-driver | 4.x | Fully Compatible |
| Go | gocql | 1.x | Fully Compatible |
| C# | CassandraCSharpDriver | 3.x | Fully Compatible |
| C++ | cpp-driver | 2.x | Fully Compatible |
| Rust | scylla-rust-driver | 0.x | Fully Compatible |
cqlsh Compatibility
```bash
# Connect to HeliosDB using standard cqlsh
cqlsh localhost 9042

# All cqlsh commands work identically
DESCRIBE KEYSPACES;
DESCRIBE TABLES;
TRACING ON;
```
3. Pre-Migration Assessment
3.1 Pre-Migration Checklist
- Complete backup of Cassandra cluster
- Document current cluster topology
- Inventory all keyspaces and tables
- Analyze partition key distribution
- Estimate total data volume
- Identify data with TTL requirements
- Review materialized views
- Catalog user-defined types (UDTs)
- List secondary indexes
- Document consistency level requirements
- Inventory application connection configurations
- Plan migration window and rollback strategy
- Test migration in development environment
3.2 Keyspace and Table Analysis
Export Keyspace Schema
```bash
# Export all keyspace schemas
cqlsh -e "DESCRIBE KEYSPACES" > keyspace_list.txt

# Export detailed schema for each keyspace
for ks in $(cqlsh -e "DESCRIBE KEYSPACES" | tr ' ' '\n' | grep -v '^$'); do
  cqlsh -e "DESCRIBE KEYSPACE $ks" > schema_${ks}.cql
done

# Export full cluster schema
cqlsh -e "DESCRIBE CLUSTER" > cluster_info.txt
cqlsh -e "DESCRIBE SCHEMA" > full_schema.cql
```
Analyze Table Structure
```sql
-- List all tables in a keyspace
SELECT table_name, id
FROM system_schema.tables
WHERE keyspace_name = 'your_keyspace';

-- Get table details
SELECT *
FROM system_schema.columns
WHERE keyspace_name = 'your_keyspace'
AND table_name = 'your_table';

-- Check for clustering columns
SELECT column_name, clustering_order, kind
FROM system_schema.columns
WHERE keyspace_name = 'your_keyspace'
AND table_name = 'your_table'
AND kind IN ('partition_key', 'clustering');
```
3.3 Data Volume Estimation
Cassandra Data Size Analysis
```bash
# Check data directory size
du -sh /var/lib/cassandra/data/*

# Per-table statistics using nodetool
nodetool tablestats your_keyspace.your_table
```
Query Table Statistics
```sql
-- Estimate row counts (approximate)
SELECT count(*) FROM your_keyspace.your_table;

-- For large tables, use token ranges
SELECT count(*) FROM your_keyspace.your_table
WHERE token(partition_key) > token(0)
AND token(partition_key) <= token(1000000);
```
Storage Planning for HeliosDB
| Cassandra Size | Recommended HeliosDB Storage | Notes |
|---|---|---|
| < 100 GB | 1.2x Cassandra size | Compression efficiency |
| 100 GB - 1 TB | 1.0x - 1.1x | Index optimization |
| > 1 TB | 0.9x - 1.0x | Better compression ratio |
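The sizing tiers above translate into a quick planning calculation. A minimal sketch, assuming the upper-bound multiplier for each tier (the tier boundaries and factors come from the table; the function name is ours):

```python
def estimate_heliosdb_storage_gb(cassandra_gb: float) -> float:
    """Estimate HeliosDB storage from Cassandra data size,
    using the upper-bound multiplier for each tier above."""
    if cassandra_gb < 100:
        factor = 1.2   # < 100 GB tier
    elif cassandra_gb <= 1024:
        factor = 1.1   # 100 GB - 1 TB tier (upper bound of 1.0x-1.1x)
    else:
        factor = 1.0   # > 1 TB tier (upper bound of 0.9x-1.0x)
    return round(cassandra_gb * factor, 1)

print(estimate_heliosdb_storage_gb(50))    # small cluster
print(estimate_heliosdb_storage_gb(500))   # mid-size
print(estimate_heliosdb_storage_gb(5000))  # large
```

Using the upper bound keeps the estimate conservative for provisioning; add headroom separately for growth during the migration window.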
3.4 Partition Key Analysis
Understanding partition key distribution is critical for migration planning:
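Alongside the CQL queries in this section, once partition row counts have been exported (for example from the cardinality query or `nodetool tablehistograms`), a small offline check can rank hot-partition candidates. A sketch, assuming counts were collected into a dict; the 10,000-row threshold mirrors the hot-partition query below:

```python
def find_hot_partitions(partition_counts: dict, threshold: int = 10_000) -> list:
    """Return (partition_key, row_count) pairs above the threshold,
    largest first. `partition_counts` maps partition key -> row count."""
    hot = [(k, n) for k, n in partition_counts.items() if n > threshold]
    return sorted(hot, key=lambda kv: kv[1], reverse=True)

counts = {"user:1": 25_000, "user:2": 300, "user:3": 12_500}
print(find_hot_partitions(counts))
```

Hot partitions identified here are the ones most likely to need re-bucketing (e.g. adding a date component to the partition key) before or during migration.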
```sql
-- Analyze partition key cardinality
SELECT partition_key, count(*) as row_count
FROM your_keyspace.your_table
GROUP BY partition_key
LIMIT 1000;

-- Identify hot partitions
SELECT partition_key, count(*) as row_count
FROM your_keyspace.your_table
GROUP BY partition_key
HAVING count(*) > 10000
ALLOW FILTERING;
```
Identify Large Partitions
```bash
# Use nodetool to find large partitions
nodetool tablehistograms your_keyspace your_table

# Check for partition tombstones
nodetool cfstats your_keyspace.your_table | grep -E "Tombstones|Partition"
```
3.5 Feature Usage Inventory
Document usage of the following features:
| Feature | Used? | Tables Affected | Notes |
|---|---|---|---|
| Materialized Views | | | |
| Secondary Indexes | | | |
| User-Defined Types | | | |
| Counter Columns | | | |
| TTL (Time-To-Live) | | | |
| Lightweight Transactions | | | |
| Collections (List/Set/Map) | | | |
| Custom Types | | | |
| UDFs/UDAs | | | |
4. Conceptual Mapping
4.1 Keyspaces to Schemas
Cassandra keyspaces map directly to HeliosDB schemas:
| Cassandra | HeliosDB | Notes |
|---|---|---|
| Keyspace | Schema | 1:1 mapping |
| Replication Strategy | Replication Config | Simplified management |
| Durable Writes | Transaction Mode | Always durable |
Keyspace Migration Example
```sql
-- Cassandra keyspace definition
CREATE KEYSPACE production WITH REPLICATION = {
  'class': 'NetworkTopologyStrategy',
  'dc1': 3,
  'dc2': 2
} AND DURABLE_WRITES = true;

-- HeliosDB equivalent (same syntax works)
CREATE KEYSPACE production WITH REPLICATION = {
  'class': 'NetworkTopologyStrategy',
  'dc1': 3,
  'dc2': 2
} AND DURABLE_WRITES = true;

-- Note: HeliosDB manages replication automatically
-- Replication strategy is honored for compatibility
```
4.2 Tables with Clustering Columns
Cassandra’s wide-column model maps to HeliosDB’s relational storage:
| Cassandra Concept | HeliosDB Mapping | Description |
|---|---|---|
| Partition Key | Primary Key (first column) | Hash distribution |
| Clustering Columns | Composite Primary Key | Sort order within partition |
| Regular Columns | Standard Columns | Data columns |
| Static Columns | Indexed Columns | Shared within partition |
Table Migration Example
```sql
-- Cassandra table with clustering columns
CREATE TABLE user_events (
  user_id UUID,
  event_date DATE,
  event_time TIMESTAMP,
  event_type TEXT,
  event_data MAP<TEXT, TEXT>,
  PRIMARY KEY ((user_id, event_date), event_time)
) WITH CLUSTERING ORDER BY (event_time DESC)
  AND default_time_to_live = 86400
  AND gc_grace_seconds = 864000;

-- HeliosDB: Same DDL works identically
CREATE TABLE user_events (
  user_id UUID,
  event_date DATE,
  event_time TIMESTAMP,
  event_type TEXT,
  event_data MAP<TEXT, TEXT>,
  PRIMARY KEY ((user_id, event_date), event_time)
) WITH CLUSTERING ORDER BY (event_time DESC)
  AND default_time_to_live = 86400
  AND gc_grace_seconds = 864000;
```
4.3 Materialized Views Mapping
Cassandra materialized views are fully supported:
```sql
-- Cassandra materialized view
CREATE MATERIALIZED VIEW users_by_email AS
  SELECT * FROM users
  WHERE email IS NOT NULL AND user_id IS NOT NULL
  PRIMARY KEY (email, user_id);

-- HeliosDB: Identical syntax, automatic maintenance
CREATE MATERIALIZED VIEW users_by_email AS
  SELECT * FROM users
  WHERE email IS NOT NULL AND user_id IS NOT NULL
  PRIMARY KEY (email, user_id);

-- HeliosDB advantage: Views can also be created as SQL views
-- for more complex transformations
CREATE VIEW users_with_orders AS
  SELECT u.*, COUNT(o.order_id) as order_count
  FROM users u
  LEFT JOIN orders o ON u.user_id = o.user_id
  GROUP BY u.user_id;
```
4.4 Data Type Mapping
| Cassandra Type | HeliosDB Type | Notes |
|---|---|---|
| ASCII | TEXT | UTF-8 normalized |
| BIGINT | BIGINT | Identical |
| BLOB | BYTEA | Binary storage |
| BOOLEAN | BOOLEAN | Identical |
| COUNTER | BIGINT + trigger | Counter semantics |
| DATE | DATE | Identical |
| DECIMAL | DECIMAL/NUMERIC | Arbitrary precision |
| DOUBLE | DOUBLE PRECISION | 64-bit IEEE |
| DURATION | INTERVAL | Time intervals |
| FLOAT | REAL | 32-bit IEEE |
| INET | INET | IPv4/IPv6 |
| INT | INTEGER | 32-bit signed |
| SMALLINT | SMALLINT | 16-bit signed |
| TEXT | TEXT | UTF-8 string |
| TIME | TIME | Time of day |
| TIMESTAMP | TIMESTAMPTZ | With timezone |
| TIMEUUID | UUID | Time-based UUID |
| TINYINT | SMALLINT | Mapped to 16-bit |
| UUID | UUID | 128-bit UUID |
| VARCHAR | TEXT | UTF-8 string |
| VARINT | NUMERIC | Arbitrary integer |
| LIST | JSONB array | Ordered collection |
| SET | JSONB array | Unique values |
| MAP<K,V> | JSONB object | Key-value pairs |
| TUPLE<…> | JSONB array | Fixed-length |
| FROZEN | JSONB | Immutable |
| UDT | JSONB | User-defined type |
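For schema-translation scripts, the mapping table above can be captured directly in code. A sketch using the base-type mappings from the table (collection and frozen types all land on JSONB; the helper strips type parameters such as `MAP<TEXT, TEXT>` before lookup — helper name is ours):

```python
# Cassandra base type -> HeliosDB type, per the mapping table above
CASSANDRA_TO_HELIOS = {
    "ASCII": "TEXT", "BIGINT": "BIGINT", "BLOB": "BYTEA",
    "BOOLEAN": "BOOLEAN", "COUNTER": "BIGINT", "DATE": "DATE",
    "DECIMAL": "DECIMAL", "DOUBLE": "DOUBLE PRECISION",
    "DURATION": "INTERVAL", "FLOAT": "REAL", "INET": "INET",
    "INT": "INTEGER", "SMALLINT": "SMALLINT", "TEXT": "TEXT",
    "TIME": "TIME", "TIMESTAMP": "TIMESTAMPTZ", "TIMEUUID": "UUID",
    "TINYINT": "SMALLINT", "UUID": "UUID", "VARCHAR": "TEXT",
    "VARINT": "NUMERIC", "LIST": "JSONB", "SET": "JSONB",
    "MAP": "JSONB", "TUPLE": "JSONB", "FROZEN": "JSONB",
}

def helios_type(cassandra_type: str) -> str:
    """Look up the HeliosDB type for a CQL type, ignoring type parameters."""
    base = cassandra_type.strip().upper().split("<")[0]
    # UDTs and anything unrecognized fall through to JSONB, per the table
    return CASSANDRA_TO_HELIOS.get(base, "JSONB")

print(helios_type("map<text, text>"))
print(helios_type("TINYINT"))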
5. Step-by-Step Migration
5.1 Phase 1: Schema Translation
Export Cassandra Schema
```bash
# Export complete schema
cqlsh cassandra-host -e "DESCRIBE SCHEMA" > cassandra_schema.cql

# Export specific keyspace
cqlsh cassandra-host -e "DESCRIBE KEYSPACE production" > production_schema.cql
```
Schema Validation Script
```python
#!/usr/bin/env python3
"""Cassandra to HeliosDB Schema Validator

Validates CQL schema compatibility before migration.
"""

import sys

def validate_schema(schema_file):
    """Validate CQL schema for HeliosDB compatibility."""
    with open(schema_file, 'r') as f:
        schema = f.read()

    warnings = []

    # Check for unsupported features
    if 'CUSTOM INDEX' in schema.upper():
        warnings.append("CUSTOM INDEX: SASI indexes have partial support")

    if 'TINYINT' in schema.upper():
        warnings.append("TINYINT: Mapped to SMALLINT in HeliosDB")

    if 'COUNTER' in schema.upper():
        warnings.append("COUNTER: Uses trigger-based implementation")

    # Report findings
    if warnings:
        print("Schema Compatibility Warnings:")
        for w in warnings:
            print(f"  - {w}")
    else:
        print("Schema is fully compatible with HeliosDB")

    return len(warnings) == 0

if __name__ == "__main__":
    validate_schema(sys.argv[1])
```
Import Schema to HeliosDB
```bash
# Connect to HeliosDB and import schema
cqlsh heliosdb-host 9042 -f cassandra_schema.cql

# Verify schema import
cqlsh heliosdb-host 9042 -e "DESCRIBE KEYSPACES"
cqlsh heliosdb-host 9042 -e "DESCRIBE KEYSPACE production"
```
5.2 Phase 2: Data Export from Cassandra
Method 1: COPY TO (Recommended for < 100GB)
```sql
-- Export table data to CSV
COPY production.users TO '/data/export/users.csv'
WITH HEADER = true
AND DELIMITER = '|'
AND NULL = ''
AND MAXREQUESTS = 6
AND PAGESIZE = 5000;

-- Export with specific columns
COPY production.events (user_id, event_time, event_type, event_data)
TO '/data/export/events.csv'
WITH HEADER = true;
```
Method 2: Spark for Large Datasets
```scala
// Spark DataFrame export for large tables
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.cassandra._

val spark = SparkSession.builder()
  .appName("CassandraExport")
  .config("spark.cassandra.connection.host", "cassandra-host")
  .getOrCreate()

// Read from Cassandra
val df = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "production", "table" -> "users"))
  .load()

// Export to Parquet for efficient transfer
df.write
  .mode("overwrite")
  .parquet("/data/export/users_parquet")
```
Method 3: sstableloader (Advanced)
```bash
# Export SSTables (for very large datasets)
# Note: This requires direct access to Cassandra data directory

# First, flush memtables
nodetool flush production users

# Copy SSTable files
cp -r /var/lib/cassandra/data/production/users-*/* /data/export/sstables/

# For HeliosDB import, convert to CQL or CSV format
# HeliosDB provides conversion utility
heliosdb-tools convert-sstable \
  --input /data/export/sstables \
  --output /data/export/users.cql \
  --keyspace production \
  --table users
```
5.3 Phase 3: Data Import into HeliosDB
Method 1: COPY FROM
```sql
-- Import data from CSV
USE production;

COPY users FROM '/data/export/users.csv'
WITH HEADER = true
AND DELIMITER = '|'
AND NULL = ''
AND CHUNKSIZE = 1000;

-- Verify import
SELECT COUNT(*) FROM users;
```
Method 2: Bulk Insert with Python
```python
#!/usr/bin/env python3
"""Bulk data migration from Cassandra to HeliosDB."""

from cassandra.cluster import Cluster
from cassandra.query import BatchStatement, BatchType

# Source: Cassandra
source_cluster = Cluster(['cassandra-host'])
source_session = source_cluster.connect('production')

# Target: HeliosDB
target_cluster = Cluster(['heliosdb-host'])
target_session = target_cluster.connect('production')

def migrate_table(table_name, batch_size=100):
    """Migrate a single table with batching."""

    # Get all rows from source
    rows = source_session.execute(f"SELECT * FROM {table_name}")

    # Prepare insert statement
    columns = rows.column_names
    placeholders = ', '.join(['?' for _ in columns])
    insert_cql = f"INSERT INTO {table_name} ({', '.join(columns)}) VALUES ({placeholders})"
    prepared = target_session.prepare(insert_cql)

    # Batch insert
    batch = BatchStatement(batch_type=BatchType.UNLOGGED)
    count = 0
    total = 0

    for row in rows:
        batch.add(prepared, list(row))
        count += 1

        if count >= batch_size:
            target_session.execute(batch)
            total += count
            print(f"Migrated {total} rows...")
            batch = BatchStatement(batch_type=BatchType.UNLOGGED)
            count = 0

    # Final batch
    if count > 0:
        target_session.execute(batch)
        total += count

    print(f"Migration complete: {total} rows migrated to {table_name}")
    return total

# Migrate tables
tables = ['users', 'events', 'orders']
for table in tables:
    migrate_table(table)

source_cluster.shutdown()
target_cluster.shutdown()
```
Method 3: Parallel Migration for Large Datasets
```python
#!/usr/bin/env python3
"""Parallel token-range migration for large datasets."""

from cassandra.cluster import Cluster
from concurrent.futures import ThreadPoolExecutor, as_completed

def get_token_ranges(num_ranges=16):
    """Generate token ranges for parallel processing."""
    min_token = -2**63
    max_token = 2**63 - 1
    range_size = (max_token - min_token) // num_ranges

    ranges = []
    current = min_token
    for i in range(num_ranges):
        end = current + range_size if i < num_ranges - 1 else max_token
        ranges.append((current, end))
        current = end
    return ranges

def migrate_token_range(source_session, target_session, table,
                        partition_key_column, token_range, prepared):
    """Migrate a single token range."""
    start_token, end_token = token_range

    query = f"""
        SELECT * FROM {table}
        WHERE token({partition_key_column}) >= {start_token}
        AND token({partition_key_column}) < {end_token}
    """

    rows = source_session.execute(query)
    count = 0

    for row in rows:
        target_session.execute(prepared, list(row))
        count += 1

    return count

def parallel_migrate(table_name, partition_key_column, num_workers=8):
    """Migrate table using parallel token ranges."""
    source = Cluster(['cassandra-host']).connect('production')
    target = Cluster(['heliosdb-host']).connect('production')

    # Prepare insert
    # ... (similar to previous example)

    token_ranges = get_token_ranges(num_workers * 2)
    total_migrated = 0

    with ThreadPoolExecutor(max_workers=num_workers) as executor:
        futures = [
            executor.submit(migrate_token_range, source, target,
                            table_name, partition_key_column, tr, prepared)
            for tr in token_ranges
        ]

        for future in as_completed(futures):
            total_migrated += future.result()
            print(f"Progress: {total_migrated} rows migrated")

    return total_migrated
```
5.4 Phase 4: Application CQL Driver Configuration
Java Application Configuration
```java
// Before (Cassandra)
CqlSession session = CqlSession.builder()
    .addContactPoint(new InetSocketAddress("cassandra-host", 9042))
    .withLocalDatacenter("dc1")
    .withKeyspace("production")
    .build();

// After (HeliosDB) - Minimal changes
CqlSession session = CqlSession.builder()
    .addContactPoint(new InetSocketAddress("heliosdb-host", 9042))  // Change host
    .withLocalDatacenter("datacenter1")  // HeliosDB default datacenter
    .withKeyspace("production")
    .build();
```
Python Application Configuration
```python
# Before (Cassandra)
from cassandra.cluster import Cluster

cluster = Cluster(['cassandra-node1', 'cassandra-node2', 'cassandra-node3'])
session = cluster.connect('production')

# After (HeliosDB) - Change contact points only
cluster = Cluster(['heliosdb-host'])  # Single endpoint, HeliosDB handles routing
session = cluster.connect('production')
```
Node.js Application Configuration
```javascript
// Before (Cassandra)
const cassandra = require('cassandra-driver');

const client = new cassandra.Client({
  contactPoints: ['cassandra-node1', 'cassandra-node2'],
  localDataCenter: 'dc1',
  keyspace: 'production'
});

// After (HeliosDB)
const client = new cassandra.Client({
  contactPoints: ['heliosdb-host'],
  localDataCenter: 'datacenter1',
  keyspace: 'production'
});
```
Go Application Configuration
```go
// Before (Cassandra)
cluster := gocql.NewCluster("cassandra-node1", "cassandra-node2")
cluster.Keyspace = "production"
cluster.Consistency = gocql.Quorum

// After (HeliosDB)
cluster := gocql.NewCluster("heliosdb-host")
cluster.Keyspace = "production"
cluster.Consistency = gocql.Quorum // Consistency levels work identically
```
6. CQL Query Compatibility
6.1 Supported CQL Statements
Data Definition Language (DDL)
| Statement | Support | Example |
|---|---|---|
| CREATE KEYSPACE | Full | CREATE KEYSPACE ks WITH REPLICATION = {...} |
| ALTER KEYSPACE | Full | ALTER KEYSPACE ks WITH REPLICATION = {...} |
| DROP KEYSPACE | Full | DROP KEYSPACE IF EXISTS ks |
| CREATE TABLE | Full | CREATE TABLE t (id UUID PRIMARY KEY, ...) |
| ALTER TABLE | Full | ALTER TABLE t ADD column TEXT |
| DROP TABLE | Full | DROP TABLE IF EXISTS t |
| TRUNCATE | Full | TRUNCATE TABLE t |
| CREATE INDEX | Full | CREATE INDEX ON t (column) |
| DROP INDEX | Full | DROP INDEX idx_name |
| CREATE TYPE | Full | CREATE TYPE address (street TEXT, ...) |
| DROP TYPE | Full | DROP TYPE address |
| CREATE FUNCTION | Full | CREATE FUNCTION fn(...) ... |
| DROP FUNCTION | Full | DROP FUNCTION fn |
| CREATE MATERIALIZED VIEW | Full | CREATE MATERIALIZED VIEW mv AS ... |
| DROP MATERIALIZED VIEW | Full | DROP MATERIALIZED VIEW mv |
Data Manipulation Language (DML)
```sql
-- All standard DML operations are supported

-- SELECT with all clauses
SELECT * FROM users
WHERE user_id = ?
AND event_time >= ?
AND event_time < ?
ORDER BY event_time DESC
LIMIT 100;

-- SELECT with IN clause
SELECT * FROM users WHERE user_id IN (?, ?, ?);

-- SELECT with token range
SELECT * FROM users WHERE token(user_id) > ? LIMIT 1000;

-- SELECT with ALLOW FILTERING
SELECT * FROM users WHERE email = ? ALLOW FILTERING;

-- SELECT as JSON
SELECT JSON * FROM users WHERE user_id = ?;

-- INSERT with options
INSERT INTO users (user_id, name, email)
VALUES (?, ?, ?)
USING TTL 86400 AND TIMESTAMP 1234567890;

-- INSERT IF NOT EXISTS
INSERT INTO users (user_id, name) VALUES (?, ?)
IF NOT EXISTS;

-- UPDATE with conditions
UPDATE users SET email = ?
WHERE user_id = ?
IF name = ?;

-- DELETE with conditions
DELETE FROM users WHERE user_id = ? IF EXISTS;

-- BATCH operations
BEGIN BATCH
  INSERT INTO users (user_id, name) VALUES (?, ?);
  UPDATE user_stats SET login_count = login_count + 1 WHERE user_id = ?;
APPLY BATCH;
```
6.2 Lightweight Transactions (LWT)
HeliosDB fully supports Cassandra’s lightweight transactions:
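Conceptually, an LWT is a compare-and-set: the write applies only when the condition still holds against the current row, and the response reports whether it was applied. Before the CQL forms, a plain-Python sketch of these semantics (an illustration of the model, not driver code; the function name is ours):

```python
def lwt_update(table: dict, key, condition: dict, changes: dict):
    """Apply `changes` to table[key] only if every (column, value) in
    `condition` matches the current row. Returns (applied, current_row),
    mirroring the [applied] column an LWT response carries."""
    row = table.get(key)
    if row is None or any(row.get(c) != v for c, v in condition.items()):
        return False, row          # not applied; current values returned
    row.update(changes)
    return True, row

users = {"u1": {"email": "unverified@example.com"}}
# First CAS succeeds, second fails because the condition no longer holds
print(lwt_update(users, "u1",
                 {"email": "unverified@example.com"},
                 {"email": "verified@example.com"}))
print(lwt_update(users, "u1",
                 {"email": "unverified@example.com"},
                 {"email": "other@example.com"}))
```

The real implementation coordinates this check-and-write atomically across replicas; the dict version only shows why a failed LWT returns the current values.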
```sql
-- Insert if not exists
INSERT INTO users (user_id, username, email)
VALUES (uuid(), 'newuser', 'new@example.com')
IF NOT EXISTS;

-- Update with condition
UPDATE users SET email = 'verified@example.com'
WHERE user_id = 550e8400-e29b-41d4-a716-446655440000
IF email = 'unverified@example.com';

-- Multiple conditions
UPDATE users SET status = 'active'
WHERE user_id = ?
IF status = 'pending' AND created_at < ?;

-- Delete with existence check
DELETE FROM users
WHERE user_id = ?
IF EXISTS;

-- LWT with IN operator
UPDATE users SET verified = true
WHERE user_id = ?
IF status IN ('pending', 'unverified');

-- LWT returns applied status
-- [applied] | user_id | ...
-- True      | ...     | ...
-- False     | ...     | ... (shows current values if not applied)
```
6.3 Time-Series Data Handling
HeliosDB excels at time-series workloads common in Cassandra:
```sql
-- Time-series table design (identical to Cassandra best practices)
CREATE TABLE sensor_readings (
  sensor_id UUID,
  bucket_date DATE,
  reading_time TIMESTAMP,
  temperature DOUBLE,
  humidity DOUBLE,
  pressure DOUBLE,
  PRIMARY KEY ((sensor_id, bucket_date), reading_time)
) WITH CLUSTERING ORDER BY (reading_time DESC)
  AND default_time_to_live = 2592000;  -- 30 days

-- Efficient time-range queries
SELECT * FROM sensor_readings
WHERE sensor_id = ?
AND bucket_date = ?
AND reading_time >= ?
AND reading_time < ?
ORDER BY reading_time DESC;

-- Aggregations (HeliosDB enhancement: more aggregate functions)
SELECT sensor_id, bucket_date,
       AVG(temperature) as avg_temp,
       MAX(temperature) as max_temp,
       MIN(temperature) as min_temp,
       COUNT(*) as reading_count
FROM sensor_readings
WHERE sensor_id = ? AND bucket_date = ?;

-- HeliosDB bonus: Window functions for time-series analysis
-- (Via SQL interface, same data accessible)
SELECT reading_time, temperature,
       AVG(temperature) OVER (
         PARTITION BY sensor_id
         ORDER BY reading_time
         ROWS BETWEEN 5 PRECEDING AND CURRENT ROW
       ) as moving_avg
FROM sensor_readings
WHERE sensor_id = ? AND bucket_date = ?;
```
7. Feature Mapping
7.1 TTL Support
TTL (Time-To-Live) is fully supported at both row and column level:
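The TTL clock starts at write time, and `TTL(column)` reports the seconds remaining. Before the CQL examples, a quick sketch of the arithmetic, useful when planning expirations (function name is ours):

```python
from datetime import datetime, timedelta, timezone

def remaining_ttl(written_at: datetime, ttl_seconds: int, now: datetime) -> int:
    """Seconds of TTL left for a value written at `written_at`
    with USING TTL `ttl_seconds`; 0 means already expired."""
    expires_at = written_at + timedelta(seconds=ttl_seconds)
    return max(0, int((expires_at - now).total_seconds()))

wrote = datetime(2026, 1, 10, 12, 0, tzinfo=timezone.utc)
# 2 hours after a 1-hour (3600 s) write: already expired
print(remaining_ttl(wrote, 3600, wrote + timedelta(hours=2)))
# 30 minutes in: half the TTL remains
print(remaining_ttl(wrote, 3600, wrote + timedelta(minutes=30)))
```

Note that a later UPDATE with its own `USING TTL` restarts the clock for the columns it touches, which is exactly what the no-op-assignment extension below exploits.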
```sql
-- Table-level default TTL
CREATE TABLE sessions (
  session_id UUID PRIMARY KEY,
  user_id UUID,
  data BLOB
) WITH default_time_to_live = 3600;  -- 1 hour

-- Row-level TTL on INSERT
INSERT INTO sessions (session_id, user_id, data)
VALUES (uuid(), ?, ?)
USING TTL 7200;  -- 2 hours

-- Row-level TTL on UPDATE
UPDATE sessions USING TTL 1800
SET data = ?
WHERE session_id = ?;

-- Query TTL value
SELECT session_id, TTL(data) FROM sessions WHERE session_id = ?;

-- HeliosDB extension: Modify TTL without changing data
UPDATE sessions USING TTL 3600
SET data = data  -- No-op assignment
WHERE session_id = ?;
```
7.2 Collections (Sets, Lists, Maps)
All collection types are supported with full operation semantics:
SET Operations
```sql
-- Create table with SET
CREATE TABLE user_tags (
  user_id UUID PRIMARY KEY,
  tags SET<TEXT>
);

-- Add to set
UPDATE user_tags SET tags = tags + {'premium', 'verified'}
WHERE user_id = ?;

-- Remove from set
UPDATE user_tags SET tags = tags - {'trial'}
WHERE user_id = ?;

-- Replace entire set
UPDATE user_tags SET tags = {'member'}
WHERE user_id = ?;

-- Query set contents
SELECT tags FROM user_tags WHERE user_id = ?;
```
LIST Operations
```sql
-- Create table with LIST
CREATE TABLE user_history (
  user_id UUID PRIMARY KEY,
  recent_pages LIST<TEXT>
);

-- Append to list
UPDATE user_history SET recent_pages = recent_pages + ['/dashboard']
WHERE user_id = ?;

-- Prepend to list
UPDATE user_history SET recent_pages = ['/home'] + recent_pages
WHERE user_id = ?;

-- Remove by value
UPDATE user_history SET recent_pages = recent_pages - ['/old-page']
WHERE user_id = ?;

-- Update by index
UPDATE user_history SET recent_pages[0] = '/new-home'
WHERE user_id = ?;

-- Delete by index
DELETE recent_pages[0] FROM user_history WHERE user_id = ?;
```
MAP Operations
```sql
-- Create table with MAP
CREATE TABLE user_preferences (
  user_id UUID PRIMARY KEY,
  settings MAP<TEXT, TEXT>
);

-- Add/update map entries
UPDATE user_preferences SET settings['theme'] = 'dark'
WHERE user_id = ?;

UPDATE user_preferences SET settings = settings + {'lang': 'en', 'tz': 'UTC'}
WHERE user_id = ?;

-- Remove map entries
DELETE settings['old_setting'] FROM user_preferences WHERE user_id = ?;

UPDATE user_preferences SET settings = settings - {'deprecated'}
WHERE user_id = ?;

-- Query map
SELECT settings['theme'] FROM user_preferences WHERE user_id = ?;
```
7.3 User-Defined Types (UDT)
UDTs are fully supported and stored efficiently as JSONB:
```sql
-- Create UDT
CREATE TYPE address (
  street TEXT,
  city TEXT,
  state TEXT,
  zip_code TEXT,
  country TEXT
);

CREATE TYPE contact_info (
  email TEXT,
  phone TEXT,
  address FROZEN<address>
);

-- Use UDT in table
CREATE TABLE customers (
  customer_id UUID PRIMARY KEY,
  name TEXT,
  primary_contact FROZEN<contact_info>,
  shipping_addresses LIST<FROZEN<address>>
);

-- Insert with UDT
INSERT INTO customers (customer_id, name, primary_contact, shipping_addresses)
VALUES (
  uuid(),
  'John Doe',
  {
    email: 'john@example.com',
    phone: '+1-555-1234',
    address: {
      street: '123 Main St',
      city: 'Springfield',
      state: 'IL',
      zip_code: '62701',
      country: 'USA'
    }
  },
  [
    {street: '456 Oak Ave', city: 'Chicago', state: 'IL', zip_code: '60601', country: 'USA'}
  ]
);

-- Query UDT fields
SELECT name, primary_contact.email, primary_contact.address.city
FROM customers
WHERE customer_id = ?;

-- Update UDT field
UPDATE customers
SET primary_contact.phone = '+1-555-5678'
WHERE customer_id = ?;
```
7.4 Counter Columns
Counter columns are supported with atomic increment/decrement:
```sql
-- Counter table
CREATE TABLE page_views (
  page_id TEXT PRIMARY KEY,
  view_count COUNTER,
  unique_visitors COUNTER
);

-- Increment counters
UPDATE page_views SET view_count = view_count + 1 WHERE page_id = '/home';
UPDATE page_views SET unique_visitors = unique_visitors + 1 WHERE page_id = '/home';

-- Decrement counters
UPDATE page_views SET view_count = view_count - 1 WHERE page_id = '/home';

-- Counter batches
BEGIN COUNTER BATCH
  UPDATE page_views SET view_count = view_count + 1 WHERE page_id = '/home';
  UPDATE page_views SET view_count = view_count + 1 WHERE page_id = '/about';
APPLY BATCH;

-- Note: Counter limitations (same as Cassandra)
-- - Counter columns cannot be indexed
-- - Counter tables can only contain counter columns plus the primary key
-- - Cannot mix counter and non-counter columns
-- - Cannot use TTL with counters
```
7.5 Secondary Indexes
```sql
-- Create secondary index
CREATE INDEX users_by_email ON users (email);

-- Create index with custom name
CREATE INDEX idx_users_country ON users (country);

-- Index on collection elements
CREATE INDEX ON users (KEYS(preferences));     -- Index map keys
CREATE INDEX ON users (VALUES(tags));          -- Index set/list values
CREATE INDEX ON users (ENTRIES(preferences));  -- Index map entries

-- Query using index
SELECT * FROM users WHERE email = 'user@example.com';

-- Drop index
DROP INDEX IF EXISTS users_by_email;
```
8. Consistency and Replication
8.1 Consistency Level Mapping
All Cassandra consistency levels are supported:
| Consistency Level | HeliosDB Behavior | Recommendation |
|---|---|---|
| ANY | Write acknowledged | Development only |
| ONE | Single replica | Low-latency reads |
| TWO | Two replicas | Moderate durability |
| THREE | Three replicas | High durability |
| QUORUM | Majority replicas | Recommended default |
| ALL | All replicas | Maximum consistency |
| LOCAL_QUORUM | Local DC quorum | Multi-DC deployments |
| EACH_QUORUM | Quorum in each DC | Strong multi-DC |
| LOCAL_ONE | Single local replica | DC-aware reads |
| SERIAL | Linearizable (LWT) | LWT operations |
| LOCAL_SERIAL | Local linearizable | Local LWT |
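When choosing levels from the table, the usual rule of thumb applies: reads are guaranteed to overlap the latest acknowledged write when read replicas + write replicas exceed the replication factor. A sketch of that check (QUORUM resolves to floor(RF/2)+1; function names are ours, and LOCAL_* levels are evaluated against the local-DC replication factor):

```python
def replicas(level: str, rf: int) -> int:
    """Replica acknowledgements implied by a consistency level."""
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3, "ALL": rf}
    if level in fixed:
        return fixed[level]
    if level in ("QUORUM", "LOCAL_QUORUM"):  # pass the local-DC RF for LOCAL_QUORUM
        return rf // 2 + 1
    raise ValueError(f"unhandled level: {level}")

def overlapping(read_level: str, write_level: str, rf: int) -> bool:
    """True when every read is guaranteed to see the latest acknowledged write."""
    return replicas(read_level, rf) + replicas(write_level, rf) > rf

print(overlapping("QUORUM", "QUORUM", 3))  # quorum reads + quorum writes overlap
print(overlapping("ONE", "ONE", 3))        # ONE/ONE can miss recent writes
```

This is why QUORUM/QUORUM is the recommended default in the table: 2 + 2 > 3 for RF=3, while ONE/ONE (1 + 1) is not guaranteed to overlap.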
Setting Consistency Levels
```python
from cassandra import ConsistencyLevel
from cassandra.query import SimpleStatement

# Per-query consistency
# (Simple statements use %s placeholders in the Python driver)
query = SimpleStatement(
    "SELECT * FROM users WHERE user_id = %s",
    consistency_level=ConsistencyLevel.LOCAL_QUORUM
)
session.execute(query, [user_id])

# Session default consistency
from cassandra.cluster import Cluster, ExecutionProfile
from cassandra.policies import RoundRobinPolicy

profile = ExecutionProfile(
    load_balancing_policy=RoundRobinPolicy(),
    consistency_level=ConsistencyLevel.LOCAL_QUORUM,
    serial_consistency_level=ConsistencyLevel.LOCAL_SERIAL
)

cluster = Cluster(
    ['heliosdb-host'],
    execution_profiles={'default': profile}
)
```
8.2 Replication Strategy Configuration
```sql
-- SimpleStrategy (single datacenter)
CREATE KEYSPACE development WITH REPLICATION = {
  'class': 'SimpleStrategy',
  'replication_factor': 1
};

-- NetworkTopologyStrategy (multi-datacenter)
CREATE KEYSPACE production WITH REPLICATION = {
  'class': 'NetworkTopologyStrategy',
  'dc1': 3,
  'dc2': 3,
  'dc3': 2
};

-- Modify replication
ALTER KEYSPACE production WITH REPLICATION = {
  'class': 'NetworkTopologyStrategy',
  'dc1': 3,
  'dc2': 3,
  'dc3': 3
};
```
8.3 HeliosDB Replication Advantages
Unlike Cassandra, HeliosDB provides:
- Automatic Rebalancing: No manual repair operations needed
- Faster Replica Sync: Streaming replication with parallel recovery
- Cross-Protocol Replication: Data replicated across protocol interfaces
- Integrated Backup: Point-in-time recovery built-in
9. Post-Migration Validation
9.1 Data Integrity Validation
Row Count Verification
```python
#!/usr/bin/env python3
"""Validate data integrity after migration."""

from cassandra.cluster import Cluster

def validate_row_counts(keyspace, tables):
    """Compare row counts between Cassandra and HeliosDB."""
    source = Cluster(['cassandra-host']).connect(keyspace)
    target = Cluster(['heliosdb-host']).connect(keyspace)

    results = {}
    for table in tables:
        source_count = source.execute(f"SELECT COUNT(*) FROM {table}").one()[0]
        target_count = target.execute(f"SELECT COUNT(*) FROM {table}").one()[0]

        results[table] = {
            'source': source_count,
            'target': target_count,
            'match': source_count == target_count
        }

        status = "OK" if results[table]['match'] else "MISMATCH"
        print(f"{table}: Source={source_count}, Target={target_count} [{status}]")

    source.cluster.shutdown()
    target.cluster.shutdown()
    return results

# Validate all tables
tables = ['users', 'events', 'orders', 'products']
validate_row_counts('production', tables)
```
Data Sampling Validation
```python
def validate_sample_data(keyspace, table, primary_key_column, sample_size=100):
    """Compare sample rows between systems."""
    source = Cluster(['cassandra-host']).connect(keyspace)
    target = Cluster(['heliosdb-host']).connect(keyspace)

    # Get sample primary keys from source
    sample_query = f"SELECT {primary_key_column} FROM {table} LIMIT {sample_size}"
    sample_keys = [row[0] for row in source.execute(sample_query)]

    mismatches = []
    for key in sample_keys:
        # Simple statements use %s placeholders in the Python driver
        source_row = source.execute(
            f"SELECT * FROM {table} WHERE {primary_key_column} = %s", [key]
        ).one()
        target_row = target.execute(
            f"SELECT * FROM {table} WHERE {primary_key_column} = %s", [key]
        ).one()

        if source_row != target_row:
            mismatches.append({
                'key': key,
                'source': dict(source_row._asdict()) if source_row else None,
                'target': dict(target_row._asdict()) if target_row else None
            })

    if mismatches:
        print(f"Found {len(mismatches)} mismatches in {table}")
        for m in mismatches[:5]:  # Show first 5
            print(f"  Key: {m['key']}")
    else:
        print(f"{table}: All {sample_size} samples match")

    return mismatches
```
9.2 Schema Validation
```sql
-- Compare schema definitions
-- On Cassandra:
DESCRIBE KEYSPACE production;

-- On HeliosDB:
DESCRIBE KEYSPACE production;

-- Verify specific table
DESCRIBE TABLE production.users;

-- Check indexes
SELECT * FROM system_schema.indexes WHERE keyspace_name = 'production';

-- Check materialized views
SELECT * FROM system_schema.views WHERE keyspace_name = 'production';

-- Check UDTs
SELECT * FROM system_schema.types WHERE keyspace_name = 'production';
```
9.3 Application Validation Checklist
- All application endpoints responding correctly
- Query latencies within acceptable range
- No increase in error rates
- Authentication working for all users
- Batch operations completing successfully
- TTL expiration working correctly
- Counter increments accurate
- Collection operations (add/remove) working
- LWT operations returning expected results
- Prepared statements caching effectively
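Many of these checks reduce to comparing a metric collected from both systems (row counts, counter values, latencies, error rates). A minimal sketch of a pass/fail collector for such comparisons — the helper and its names are illustrative, not part of HeliosDB:

```python
def check(name, source_value, target_value, tolerance=0):
    """Compare one metric between Cassandra and HeliosDB."""
    ok = abs(source_value - target_value) <= tolerance
    return {"check": name, "source": source_value, "target": target_value, "ok": ok}

def summarize(results):
    """Return (passed, failed) counts for a list of check results."""
    passed = sum(1 for r in results if r["ok"])
    return passed, len(results) - passed

# Example: exact match for counts, ~10% drift allowed for latency
results = [
    check("users row count", 150000, 150000),
    check("page_views counter", 98231, 98231),
    check("p99 latency (ms)", 12.0, 12.9, tolerance=1.2),
]
passed, failed = summarize(results)  # → (3, 0)
```

A tolerance of zero is appropriate for row counts and counters; latency comparisons usually need slack to avoid flapping.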
9.4 Automated Validation Script
```bash
#!/bin/bash
# Post-migration validation script

CASSANDRA_HOST="cassandra-host"
HELIOSDB_HOST="heliosdb-host"
KEYSPACE="production"

echo "=== Post-Migration Validation ==="

# 1. Connectivity test
echo "Testing connectivity..."
cqlsh $HELIOSDB_HOST -e "SELECT now() FROM system.local" || exit 1

# 2. Schema comparison
echo "Comparing schemas..."
cqlsh $CASSANDRA_HOST -e "DESCRIBE KEYSPACE $KEYSPACE" > /tmp/cassandra_schema.cql
cqlsh $HELIOSDB_HOST -e "DESCRIBE KEYSPACE $KEYSPACE" > /tmp/heliosdb_schema.cql
diff /tmp/cassandra_schema.cql /tmp/heliosdb_schema.cql

# 3. Row count comparison
echo "Comparing row counts..."
for table in users events orders; do
    cassandra_count=$(cqlsh $CASSANDRA_HOST -e "SELECT COUNT(*) FROM $KEYSPACE.$table" | grep -E '^\s*[0-9]+')
    heliosdb_count=$(cqlsh $HELIOSDB_HOST -e "SELECT COUNT(*) FROM $KEYSPACE.$table" | grep -E '^\s*[0-9]+')
    echo "$table: Cassandra=$cassandra_count, HeliosDB=$heliosdb_count"
done

# 4. Query performance test
echo "Testing query performance..."
time cqlsh $HELIOSDB_HOST -e "SELECT * FROM $KEYSPACE.users LIMIT 1000"

echo "=== Validation Complete ==="
```

10. Performance Tuning
10.1 HeliosDB Configuration Optimization
```toml
# heliosdb.toml - Cassandra protocol optimization

[cassandra]
# Network settings
listen_address = "0.0.0.0"
port = 9042
max_connections = 2000

[cassandra.protocol]
# Enable compression for better throughput
compression_enabled = true
compression_algorithms = ["lz4"]

# Increase frame size for large batches
max_frame_size = 536870912  # 512MB

[cassandra.query]
# Optimize prepared statement cache
prepared_cache_enabled = true
prepared_cache_size = 5000

# Adjust page size for your workload
default_page_size = 10000
max_page_size = 100000

[cassandra.batch]
# Increase batch limits
max_batch_size = 100000
batch_timeout = 30000

[cassandra.performance]
# Enable query cache for repeated queries
query_cache_enabled = true
query_cache_size = 512  # MB

# Log slow queries for optimization
log_slow_queries = true
slow_query_threshold_ms = 100
```

10.2 Client-Side Optimization
Connection Pool Tuning (Java)
```java
import java.net.InetSocketAddress;
import java.time.Duration;
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.config.DefaultDriverOption;
import com.datastax.oss.driver.api.core.config.DriverConfigLoader;

CqlSession session = CqlSession.builder()
    .addContactPoint(new InetSocketAddress("heliosdb-host", 9042))
    .withLocalDatacenter("datacenter1")
    .withConfigLoader(
        DriverConfigLoader.programmaticBuilder()
            .withInt(DefaultDriverOption.CONNECTION_POOL_LOCAL_SIZE, 4)
            .withInt(DefaultDriverOption.CONNECTION_POOL_REMOTE_SIZE, 2)
            .withInt(DefaultDriverOption.CONNECTION_MAX_REQUESTS, 32768)
            .withDuration(DefaultDriverOption.REQUEST_TIMEOUT, Duration.ofSeconds(10))
            .build()
    )
    .build();
```

Connection Pool Tuning (Python)
```python
from cassandra.cluster import Cluster, ExecutionProfile
from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy
from cassandra.query import tuple_factory

# Optimize execution profile
profile = ExecutionProfile(
    load_balancing_policy=TokenAwarePolicy(
        DCAwareRoundRobinPolicy(local_dc='datacenter1')
    ),
    request_timeout=10.0,
    row_factory=tuple_factory  # More efficient than named tuples
)

cluster = Cluster(
    ['heliosdb-host'],
    execution_profiles={'default': profile},
    protocol_version=5,
    compression=True,
    idle_heartbeat_interval=30,
    # connection_class=ShardAwareConnection  # only if your driver build provides one
)
```

10.3 Query Optimization Tips
Use Prepared Statements
```python
# Always use prepared statements for repeated queries
prepared = session.prepare(
    "SELECT * FROM users WHERE user_id = ? AND event_time >= ? AND event_time < ?"
)

# Execute with parameters
for user_id in user_ids:
    result = session.execute(prepared, [user_id, start_time, end_time])
```

Optimize Batch Operations
```python
from cassandra.query import BatchStatement, BatchType

# Use UNLOGGED batches for better performance when atomicity is not required
batch = BatchStatement(batch_type=BatchType.UNLOGGED)

# Keep batches a reasonable size (100-500 statements)
pending = 0
for row in rows:
    batch.add(prepared, row)
    pending += 1
    if pending == 100:
        session.execute(batch)
        batch = BatchStatement(batch_type=BatchType.UNLOGGED)
        pending = 0

# Flush the final partial batch
if pending:
    session.execute(batch)
```

Avoid Anti-Patterns
```sql
-- AVOID: Full table scans
SELECT * FROM users;  -- Bad
SELECT * FROM users WHERE status = 'active' ALLOW FILTERING;  -- Bad

-- BETTER: Use partition keys
SELECT * FROM users WHERE user_id = ?;  -- Good
SELECT * FROM users WHERE user_id IN (?, ?, ?);  -- Good

-- AVOID: Large IN clauses
SELECT * FROM users WHERE user_id IN (/* 1000 values */);  -- Bad

-- BETTER: Multiple queries or token ranges
SELECT * FROM users WHERE token(user_id) >= ? AND token(user_id) < ?;
```

10.4 Monitoring Performance
```sql
-- HeliosDB performance views (accessible via SQL interface)
SELECT * FROM heliosdb_stats.cql_query_latency
ORDER BY avg_latency_ms DESC
LIMIT 10;

SELECT * FROM heliosdb_stats.cql_throughput
WHERE timestamp > now() - interval '1 hour';
```
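When tracking these latency figures over time, percentiles are more informative than averages, since tail latency is what applications notice. A small nearest-rank percentile helper for latency samples collected on the client side (pure-Python sketch, not a HeliosDB API):

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample covering pct% of the distribution."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

latencies_ms = [1.2, 0.8, 3.4, 0.9, 1.1, 7.5, 1.0, 1.3, 0.7, 2.2]
p50 = percentile(latencies_ms, 50)  # typical request
p99 = percentile(latencies_ms, 99)  # tail behavior
```

Comparing p99 before and after migration catches regressions that a healthy-looking average hides.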
```sql
-- Via CQL
SELECT * FROM system.local;  -- Connection info
SELECT * FROM system.peers;  -- Cluster topology
```

11. Common Issues and Troubleshooting
11.1 Connection Issues
Issue: Connection Refused
```text
NoHostAvailable: Unable to connect to any servers
```

Solutions:

```bash
# Check HeliosDB is running
systemctl status heliosdb

# Verify port is listening
netstat -tlnp | grep 9042

# Check firewall
firewall-cmd --list-ports | grep 9042

# Test connectivity
nc -zv heliosdb-host 9042
```

Issue: Authentication Failed
```text
AuthenticationFailed: Authentication failed
```

Solutions:

```python
# Ensure credentials are correct
from cassandra.auth import PlainTextAuthProvider
from cassandra.cluster import Cluster

auth = PlainTextAuthProvider(
    username='your_username',
    password='your_password'
)
cluster = Cluster(['heliosdb-host'], auth_provider=auth)
```

11.2 Schema Issues
Issue: Keyspace Not Found
```text
InvalidRequest: Keyspace 'xxx' does not exist
```

Solutions:

```sql
-- Create keyspace if missing
CREATE KEYSPACE IF NOT EXISTS xxx WITH REPLICATION = {
    'class': 'SimpleStrategy',
    'replication_factor': 1
};
```

```bash
# Or import the schema first
cqlsh heliosdb-host -f schema.cql
```

Issue: Type Mismatch
```text
InvalidRequest: Expected type X but got type Y
```

Solutions:

```python
# Ensure proper type conversion
import uuid
from datetime import datetime, timezone
from cassandra.util import uuid_from_time  # for TIMEUUID columns

# For UUID columns
user_id = uuid.UUID('550e8400-e29b-41d4-a716-446655440000')

# For TIMESTAMP columns
event_time = datetime.now(timezone.utc)

# For collections
tags = {'tag1', 'tag2'}     # Set
prefs = {'key': 'value'}    # Map
items = ['item1', 'item2']  # List
```

11.3 Query Issues
Issue: Timeout on Large Queries
```text
OperationTimedOut: errors={}, last_host=xxx
```

Solutions:

```python
# Increase the request timeout
from cassandra.cluster import Cluster, ExecutionProfile

profile = ExecutionProfile(request_timeout=30.0)  # 30 seconds

cluster = Cluster(
    ['heliosdb-host'],
    execution_profiles={'default': profile}
)

# Or use paging for large results
from cassandra.query import SimpleStatement

query = SimpleStatement(
    "SELECT * FROM large_table",
    fetch_size=1000  # Page size
)
for row in session.execute(query):
    process(row)
```

Issue: ALLOW FILTERING Warning
```text
Cannot execute this query as it might involve data filtering and
thus may have unpredictable performance
```

Solutions:

```sql
-- Add an index for frequently filtered columns
CREATE INDEX ON users (email);

-- Or redesign the schema with proper partition keys
-- Instead of:
SELECT * FROM users WHERE status = 'active' ALLOW FILTERING;

-- Create a table partitioned by status:
CREATE TABLE users_by_status (
    status TEXT,
    user_id UUID,
    name TEXT,
    PRIMARY KEY (status, user_id)
);
```

11.4 Data Migration Issues
Issue: Migration Performance Degradation
Solutions:
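Migration throughput usually improves by splitting the work across contiguous ranges of the token ring rather than driving everything from one reader. A hedged sketch of dividing the full Murmur3 ring into ranges (pure arithmetic; the function name is illustrative) that can feed the worker-pool pattern shown next:

```python
MIN_TOKEN = -2**63      # Murmur3 partitioner minimum token
MAX_TOKEN = 2**63 - 1   # Murmur3 partitioner maximum token

def split_token_ring(n):
    """Split the full token range into n contiguous (start, end) ranges."""
    total = (MAX_TOKEN - MIN_TOKEN) + 1
    step = total // n
    ranges = []
    start = MIN_TOKEN
    for i in range(n):
        # Last range absorbs any rounding remainder
        end = MAX_TOKEN if i == n - 1 else start + step - 1
        ranges.append((start, end))
        start = end + 1
    return ranges

# Each (start, end) pair maps to a bounded scan, e.g.:
#   SELECT * FROM users WHERE token(user_id) >= ? AND token(user_id) <= ?
ranges = split_token_ring(8)
```

Each range can then be migrated by an independent worker without overlap or gaps.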
```python
# Use parallel migration
from concurrent.futures import ThreadPoolExecutor

def migrate_partition(partition_key):
    # Migrate a single partition
    pass

with ThreadPoolExecutor(max_workers=8) as executor:
    executor.map(migrate_partition, partition_keys)
```

Issue: Out of Memory During Migration
Solutions:
```python
# Stream results instead of loading all data at once
rows = session.execute("SELECT * FROM large_table")

# Process in chunks
batch = []
for row in rows:
    batch.append(row)
    if len(batch) >= 1000:
        process_batch(batch)
        batch = []

# Process the final partial chunk
if batch:
    process_batch(batch)
```

11.5 Performance Issues
Issue: High Latency After Migration
Diagnosis:
```sql
-- Enable tracing
TRACING ON;
SELECT * FROM users WHERE user_id = ?;
TRACING OFF;
```

Solutions:
- Check if indexes are properly created
- Verify partition key access patterns
- Enable query caching
- Increase connection pool size
```toml
[cassandra.performance]
query_cache_enabled = true
query_cache_size = 512
```

Issue: Inconsistent Query Results
Solutions:
```python
# Use an appropriate consistency level
from cassandra import ConsistencyLevel
from cassandra.query import SimpleStatement

# Note: simple (non-prepared) statements use %s placeholders, not ?
query = SimpleStatement(
    "SELECT * FROM users WHERE user_id = %s",
    consistency_level=ConsistencyLevel.QUORUM
)
```

11.6 Collection Operation Issues
Issue: Collection Update Failures
```text
InvalidRequest: Invalid operation on frozen collection
```

Solutions:
```sql
-- Frozen collections are immutable; replace the whole value
-- Instead of:
UPDATE users SET address.street = 'New St' WHERE user_id = ?;  -- Fails if frozen

-- Replace the entire frozen field:
UPDATE users SET address = {street: 'New St', city: 'Chicago', ...}
WHERE user_id = ?;

-- Or recreate the column as a non-frozen UDT so individual fields
-- can be updated (note: dropping a column discards its existing data)
ALTER TABLE users DROP address;
ALTER TABLE users ADD address address;
```

11.7 TTL Issues
Issue: Data Expiring Unexpectedly
Diagnosis:
```sql
-- Check TTL values
SELECT user_id, TTL(name), TTL(email) FROM users WHERE user_id = ?;

-- Check the table's default TTL
DESCRIBE TABLE users;
```

Solutions:
```sql
-- Update TTL on existing data
UPDATE users USING TTL 86400
SET name = name, email = email
WHERE user_id = ?;

-- Remove TTL (set to 0)
UPDATE users USING TTL 0
SET name = name
WHERE user_id = ?;
```

Quick Reference Card
Essential Commands
| Task | Command |
|---|---|
| Connect | `cqlsh heliosdb-host 9042` |
| List keyspaces | `DESCRIBE KEYSPACES;` |
| Use keyspace | `USE keyspace_name;` |
| List tables | `DESCRIBE TABLES;` |
| Table schema | `DESCRIBE TABLE table_name;` |
| Enable tracing | `TRACING ON;` |
| Import schema | `cqlsh -f schema.cql` |
Migration Checklist Summary
- Pre-migration assessment complete
- Schema exported and validated
- Schema imported to HeliosDB
- Data exported from Cassandra
- Data imported to HeliosDB
- Row counts validated
- Sample data verified
- Application configuration updated
- Application tested end-to-end
- Performance benchmarked
- Monitoring configured
- Rollback plan documented
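For staged rollouts it can help to track this checklist in code so a deployment gate fails fast when a step is missing. A trivial hedged sketch (item names mirror the list above; nothing here is a HeliosDB API):

```python
CHECKLIST = [
    "Pre-migration assessment complete",
    "Schema exported and validated",
    "Schema imported to HeliosDB",
    "Data exported from Cassandra",
    "Data imported to HeliosDB",
    "Row counts validated",
    "Sample data verified",
    "Application configuration updated",
    "Application tested end-to-end",
    "Performance benchmarked",
    "Monitoring configured",
    "Rollback plan documented",
]

def remaining(done):
    """Return the checklist items not yet completed."""
    return [item for item in CHECKLIST if item not in done]

def ready_to_cut_over(done):
    """The cut-over gate passes only when every item is complete."""
    return not remaining(done)
```

A CI job could call `ready_to_cut_over` with the set of sign-offs recorded so far and block the traffic switch until it returns true.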
Related Documentation
- Cassandra Protocol README - Protocol overview
- Cassandra Compatibility Matrix - Feature compatibility
- Cassandra Configuration - Configuration reference
- Cassandra CQL Examples - Usage examples
- General Migration Guide - Migration best practices
- Protocol Compatibility Matrix - Cross-protocol comparison
Document Version: 1.0 Last Updated: January 2026 Compatibility: HeliosDB 7.0+, Apache Cassandra 3.x/4.x