HeliosDB Global Secondary Indexes
Global Secondary Indexes (GSI) module for HeliosDB, enabling efficient queries on non-sharding-key columns across all shards in a distributed database.
Features
- Global Secondary Indexes: Query data across all shards using non-sharding-key columns
- Multiple Index Types: B-Tree (range queries) and Hash (equality queries)
- Covering Indexes: Include additional columns to avoid base table lookups
- Partial Indexes: Index only rows matching a filter predicate
- Sparse Indexes: Index only non-null values
- Distributed Maintenance: Automatic cross-shard index updates
- Async Updates: Eventual consistency with async index maintenance
- Strong Consistency: Optional 2PC for synchronous updates
- Query Planning: Automatic index selection based on cost estimation
- Consistency Checking: Verify and repair index inconsistencies
Installation
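Add this to your Cargo.toml (the Quick Start example also uses tokio and serde_json):
[dependencies]
heliosdb-indexes = "3.0"
tokio = { version = "1", features = ["full"] }
serde_json = "1"
Quick Start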
use heliosdb_indexes::*;
use std::sync::Arc;

#[tokio::main]
async fn main() -> Result<()> {
    // Create the index coordinator
    let config = CoordinatorConfig::default();
    let coordinator = Arc::new(IndexCoordinator::new(config).await?);

    // Create a global secondary index on users.email
    let index_def = IndexDefinition::new(
        "idx_user_email",
        "users",
        vec!["email".to_string()],
    );
    coordinator.create_index(index_def).await?;

    // Build a row for the base table
    let mut row = Row::new();
    row.set("email".to_string(), serde_json::json!("alice@example.com"));
    row.set("name".to_string(), serde_json::json!("Alice"));

    // Report the write to the coordinator, which updates every affected index
    coordinator
        .on_data_change(
            "users",
            &Key::from_string("user1"),
            "shard1",
            None,       // previous row version (none: this is an insert)
            Some(&row), // new row version
        )
        .await?;

    // Query using the index
    let results = coordinator
        .execute_index_query(
            "idx_user_email",
            &Predicate::Eq(IndexValue::String("alice@example.com".to_string())),
        )
        .await?;

    println!("Found {} user(s)", results.len());
    Ok(())
}
Index Types
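The index kinds that follow differ in two ways: which rows receive an entry, and what each entry carries. Below is a minimal sketch of that decision using only the standard library. `IndexKind`, `IndexSpec`, `Row`, and `entry_for` are hypothetical names, not the crate's API. The sparse rule (skip a row when any indexed column is NULL) is also an assumption.

```rust
use std::collections::HashMap;

// Hypothetical row model: column name -> value, where None stands for NULL.
type Row = HashMap<String, Option<String>>;
// Index key (one slot per indexed column) and covering payload.
type Key = Vec<Option<String>>;
type Included = Vec<(String, Option<String>)>;

// Hypothetical index flavours mirroring the sections below.
enum IndexKind {
    Simple,
    Covering { include: Vec<String> },
    Partial { filter: fn(&Row) -> bool },
    Sparse,
}

struct IndexSpec {
    columns: Vec<String>,
    kind: IndexKind,
}

/// Build the entry this index would store for `row`, or None if the row is skipped.
fn entry_for(spec: &IndexSpec, row: &Row) -> Option<(Key, Included)> {
    // Missing columns and explicit NULLs both become None.
    let key: Key = spec
        .columns
        .iter()
        .map(|c| row.get(c).cloned().flatten())
        .collect();
    match &spec.kind {
        IndexKind::Simple => Some((key, Vec::new())),
        // Covering: copy the extra columns into the entry itself.
        IndexKind::Covering { include } => {
            let extra: Included = include
                .iter()
                .map(|c| (c.clone(), row.get(c).cloned().flatten()))
                .collect();
            Some((key, extra))
        }
        // Partial: only rows that pass the filter predicate are indexed.
        IndexKind::Partial { filter } => filter(row).then(|| (key, Vec::new())),
        // Sparse: skip rows where any indexed column is NULL (assumed rule).
        IndexKind::Sparse => key.iter().all(Option::is_some).then(|| (key, Vec::new())),
    }
}
```

In this model, a partial index with the filter `status = 'active'` and a sparse index on `phone_number` both just return `None` for rows they leave out, so those rows cost no index space.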
Simple Index
Basic index on one or more columns:
let index = IndexDefinition::new(
    "idx_email",
    "users",
    vec!["email".to_string()],
);
coordinator.create_index(index).await?;
Covering Index
Includes additional columns to satisfy queries without accessing the base table:
let covering_index = IndexDefinition::covering(
    "idx_email_profile",
    "users",
    vec!["email".to_string()],
    vec!["name".to_string(), "age".to_string()],
);
coordinator.create_index(covering_index).await?;
Partial Index
Only indexes rows matching a filter:
let partial_index = IndexDefinition::partial(
    "idx_active_users",
    "users",
    vec!["status".to_string()],
    "status = 'active'".to_string(),
);
coordinator.create_index(partial_index).await?;
Sparse Index
Only indexes non-null values:
let sparse_index = IndexDefinition::sparse(
    "idx_phone",
    "users",
    vec!["phone_number".to_string()],
);
coordinator.create_index(sparse_index).await?;
Query Operations
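The predicates in this section all reduce to lookups on an ordered index. Below is a minimal sketch in which an in-memory `BTreeMap` from an integer value to row keys stands in for a B-Tree index. The `Pred` enum and `lookup` function are hypothetical and are not the crate's `Predicate` API. Treating `Between` as inclusive on both ends is an assumption.

```rust
use std::collections::BTreeMap;

// Hypothetical predicate shape over integer values (not the crate's Predicate type).
enum Pred {
    Eq(i64),
    Lt(i64),
    Between(i64, i64), // assumed inclusive on both ends
    In(Vec<i64>),
}

/// Return the row keys whose indexed value satisfies `pred`, in value order
/// for Lt/Between and in list order for In.
fn lookup(index: &BTreeMap<i64, Vec<String>>, pred: &Pred) -> Vec<String> {
    let mut out = Vec::new();
    match pred {
        // Point lookup.
        Pred::Eq(v) => out.extend(index.get(v).into_iter().flatten().cloned()),
        // Ordered scan of every key below the bound.
        Pred::Lt(v) => {
            for (_, rows) in index.range(..*v) {
                out.extend(rows.iter().cloned());
            }
        }
        // BTreeMap::range panics if lo > hi, so treat that case as empty.
        Pred::Between(lo, hi) if lo <= hi => {
            for (_, rows) in index.range(*lo..=*hi) {
                out.extend(rows.iter().cloned());
            }
        }
        Pred::Between(..) => {}
        // One point lookup per listed value.
        Pred::In(values) => {
            for v in values {
                out.extend(index.get(v).into_iter().flatten().cloned());
            }
        }
    }
    out
}
```

A hash index keeps no key order, so it can serve only the `Eq` and `In` arms. That is why range predicates need a B-Tree index.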
Equality Query
let results = coordinator.execute_index_query(
    "idx_email",
    &Predicate::Eq(IndexValue::String("alice@example.com".to_string())),
).await?;
Range Query
// Ages between 25 and 40
let results = coordinator.execute_index_query(
    "idx_age",
    &Predicate::Between(IndexValue::Int64(25), IndexValue::Int64(40)),
).await?;

// Age less than 30
let results = coordinator.execute_index_query(
    "idx_age",
    &Predicate::Lt(IndexValue::Int64(30)),
).await?;
IN Query
let results = coordinator.execute_index_query(
    "idx_status",
    &Predicate::In(vec![
        IndexValue::String("active".to_string()),
        IndexValue::String("pending".to_string()),
    ]),
).await?;
Composite Index Query
// Create a composite index on (status, department)
let composite = IndexDefinition::new(
    "idx_status_dept",
    "employees",
    vec!["status".to_string(), "department".to_string()],
);
coordinator.create_index(composite).await?;

// Query with a composite predicate (values in index column order)
let results = coordinator.execute_index_query(
    "idx_status_dept",
    &Predicate::Eq(IndexValue::Composite(vec![
        IndexValue::String("active".to_string()),
        IndexValue::String("Engineering".to_string()),
    ])),
).await?;
Consistency Models
Eventual Consistency (Default)
Uses async updates for better write performance:
let mut config = CoordinatorConfig::default();
config.default_consistency = ConsistencyLevel::Eventual;
let coordinator = IndexCoordinator::new(config).await?;
Strong Consistency
Uses 2PC for synchronous updates:
let mut config = CoordinatorConfig::default();
config.default_consistency = ConsistencyLevel::Strong;
let coordinator = IndexCoordinator::new(config).await?;
Index Maintenance
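Rebuilding, checking, and repairing all come down to comparing what the index holds with what the base table says it should hold. Below is a minimal sketch using standard collections. `check`, `repair`, and `Report` are hypothetical, and shard IDs are omitted. Reporting stale entries alongside missing ones is an assumption; the crate's report is only shown exposing `is_consistent` and `missing_entries`.

```rust
use std::collections::{BTreeSet, HashMap};

// An index entry as (indexed value, row key); shard IDs omitted for brevity.
type Entry = (String, String);

// Hypothetical report: `missing` mirrors the crate's `missing_entries`,
// `stale` (entries with no matching base row) is an assumed extra.
#[derive(Debug, Default)]
struct Report {
    missing: Vec<Entry>,
    stale: Vec<Entry>,
}

impl Report {
    fn is_consistent(&self) -> bool {
        self.missing.is_empty() && self.stale.is_empty()
    }
}

/// Compare the index against the base table (row key -> indexed value).
fn check(base: &HashMap<String, String>, index: &BTreeSet<Entry>) -> Report {
    // What the index should contain if it were fully up to date.
    let expected: BTreeSet<Entry> = base
        .iter()
        .map(|(key, value)| (value.clone(), key.clone()))
        .collect();
    Report {
        missing: expected.difference(index).cloned().collect(),
        stale: index.difference(&expected).cloned().collect(),
    }
}

/// Bring the index back in line with the report: drop stale, add missing.
fn repair(index: &mut BTreeSet<Entry>, report: &Report) {
    for entry in &report.stale {
        index.remove(entry);
    }
    for entry in &report.missing {
        index.insert(entry.clone());
    }
}
```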
Rebuild Index
coordinator.rebuild_index("idx_email").await?;
Consistency Check
// (row key, shard ID, row) tuples from the base table;
// row1 and row2 are Row values built as in the Quick Start
let base_data = vec![
    (Key::from_string("user1"), "shard1".to_string(), row1),
    (Key::from_string("user2"), "shard1".to_string(), row2),
];

let report = coordinator.check_consistency("idx_email", &base_data).await?;

if !report.is_consistent {
    println!("Missing entries: {}", report.missing_entries.len());
    coordinator.repair_index("idx_email", &base_data).await?;
}
Get Statistics
let stats = coordinator.get_index_stats("idx_email").await?;
println!("Entries: {}", stats.entry_count);
println!("Unique values: {}", stats.unique_values);
println!("Size: {} bytes", stats.size_bytes);
Advanced Usage
With Maintenance Coordinator
Enable distributed maintenance for cross-shard updates:
let coordinator = IndexCoordinator::new(config)
    .await?
    .with_maintenance()
    .await?;
With Async Updates
Enable async update processing backed by a change log:
let change_log = Arc::new(ChangeLog::new());
let coordinator = IndexCoordinator::new(config)
    .await?
    .with_async_updates(change_log)
    .await?;
With Query Executor
Enable query planning and execution:
// MyShardFetcher stands for your application's type that reads rows from shards
let shard_fetcher = Arc::new(MyShardFetcher::new());
let coordinator = IndexCoordinator::new(config)
    .await?
    .with_query_executor(shard_fetcher);
DDL Syntax Support
The module supports SQL-like DDL for index creation:
-- Simple index
CREATE GLOBAL INDEX idx_email ON users(email);

-- Covering index
CREATE GLOBAL INDEX idx_user_profile ON users(email)
INCLUDE (name, age, city);

-- Partial index
CREATE GLOBAL INDEX idx_active_users ON users(status)
WHERE status = 'active';

-- Sparse index
CREATE GLOBAL INDEX idx_phone ON users(phone_number)
WHERE phone_number IS NOT NULL;

-- Composite index
CREATE GLOBAL INDEX idx_status_dept ON employees(status, department);

-- Hash index
CREATE GLOBAL INDEX idx_email_hash ON users(email) USING HASH;
Architecture
Index Entry Storage
Indexed Value → [(Shard ID, Row Key, Included Data)]
Each index entry maps an indexed value to a list of row locations across shards.
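The sketch below models that layout in memory and shows how one base-table change updates it: the row's location is removed from the old value's list and added under the new value. `GlobalIndex`, `Location`, and `on_change` are hypothetical names, and the sketch keeps the whole map in one process.

```rust
use std::collections::BTreeMap;

/// One row location: which shard holds the row, its key, and any covering
/// columns copied into the index (hypothetical field names).
#[derive(Debug, Clone, PartialEq)]
struct Location {
    shard_id: String,
    row_key: String,
    included: Vec<String>,
}

/// Indexed value -> every row location carrying that value, across all shards.
#[derive(Default)]
struct GlobalIndex {
    entries: BTreeMap<String, Vec<Location>>,
}

impl GlobalIndex {
    /// Apply one base-table change. `old`/`new` are the indexed value before
    /// and after the write: (None, Some) is an insert, (Some, None) a delete.
    fn on_change(
        &mut self,
        shard_id: &str,
        row_key: &str,
        old: Option<&str>,
        new: Option<&str>,
        included: Vec<String>,
    ) {
        if let Some(old_value) = old {
            // Remove this row's location from the old value's list.
            let now_empty = match self.entries.get_mut(old_value) {
                Some(locs) => {
                    locs.retain(|l| !(l.shard_id == shard_id && l.row_key == row_key));
                    locs.is_empty()
                }
                None => false,
            };
            // Drop values that no longer point at any row.
            if now_empty {
                self.entries.remove(old_value);
            }
        }
        if let Some(new_value) = new {
            self.entries
                .entry(new_value.to_string())
                .or_default()
                .push(Location {
                    shard_id: shard_id.to_string(),
                    row_key: row_key.to_string(),
                    included,
                });
        }
    }

    /// All locations for an exact value (empty if none).
    fn lookup(&self, value: &str) -> &[Location] {
        self.entries.get(value).map(Vec::as_slice).unwrap_or(&[])
    }
}
```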
Query Flow
- Query planner analyzes predicate and available indexes
- Cost estimator calculates query cost for each plan
- Best index is selected (or table scan if no suitable index)
- Index is queried to get row locations
- Rows are fetched from shards (unless covering index)
- Results are returned
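The last three steps can be sketched as follows. Index hits are grouped by shard so that each shard gets one batched request, and a covering index skips the fetch entirely. `Location`, `group_by_shard`, `resolve`, and the `fetch` callback are hypothetical stand-ins; the callback plays the role of a shard fetcher. Batching fetches per shard is an assumption about how the executor contacts shards.

```rust
use std::collections::BTreeMap;

/// A row location returned by the index; `included` holds covering data
/// (hypothetical: a single pre-rendered column for brevity).
struct Location {
    shard_id: String,
    row_key: String,
    included: String,
}

/// Group row keys by shard so each shard receives one batched request.
fn group_by_shard(locs: &[Location]) -> BTreeMap<String, Vec<String>> {
    let mut by_shard: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for loc in locs {
        by_shard
            .entry(loc.shard_id.clone())
            .or_default()
            .push(loc.row_key.clone());
    }
    by_shard
}

/// Turn index hits into result rows. `fetch(shard, keys)` stands in for a
/// shard fetcher; it is never called when the index is covering.
fn resolve<F>(locs: &[Location], covering: bool, mut fetch: F) -> Vec<String>
where
    F: FnMut(&str, &[String]) -> Vec<String>,
{
    if covering {
        // Everything the query needs is already in the index entries.
        return locs.iter().map(|l| l.included.clone()).collect();
    }
    let mut rows = Vec::new();
    for (shard, keys) in group_by_shard(locs) {
        rows.extend(fetch(shard.as_str(), keys.as_slice()));
    }
    rows
}
```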
Update Flow
Eventual Consistency
- Data change occurs in base table
- Change is logged to change log
- Update is queued for async processing
- Workers process updates in batches
- Index entries are updated
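The queue-and-batch part of this flow is sketched below with a standard-library channel. The crate itself presumably uses an async runtime, since the Quick Start runs under `tokio`. `Update` and `run_worker` are hypothetical. The worker blocks until one update arrives, then drains whatever else is already queued, up to `batch_size`.

```rust
use std::sync::mpsc;

/// Hypothetical queued index change: (indexed value, row key).
#[derive(Debug)]
enum Update {
    Put(String, String),
    Delete(String, String),
}

/// Apply queued updates in batches of at most `batch_size` until every
/// sender is dropped. Returns how many batches were applied.
fn run_worker<F>(rx: mpsc::Receiver<Update>, batch_size: usize, mut apply: F) -> usize
where
    F: FnMut(Vec<Update>),
{
    let mut batches = 0;
    // Block until at least one update arrives (Err means all senders are gone).
    while let Ok(first) = rx.recv() {
        let mut batch = vec![first];
        // Top the batch up with whatever is already waiting, without blocking.
        while batch.len() < batch_size {
            match rx.try_recv() {
                Ok(update) => batch.push(update),
                Err(_) => break,
            }
        }
        apply(batch);
        batches += 1;
    }
    batches
}
```

In practice the worker would run on its own thread or task, so writers return as soon as their update is queued. Readers may see stale index entries until the batch is applied, which is the eventual-consistency trade-off.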
Strong Consistency
- Data change occurs in base table
- 2PC prepare phase for all affected indexes
- All indexes are updated atomically
- 2PC commit phase
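Below is a minimal single-process sketch of the two phases. A change is committed to every index partition only if all of them vote yes in the prepare phase; otherwise every partition aborts. `IndexPartition` and `two_phase_commit` are hypothetical. The sketch omits the durable logging and timeout handling that a real 2PC coordinator needs.

```rust
/// Hypothetical index partition taking part in a 2PC round.
struct IndexPartition {
    healthy: bool,          // whether this partition can accept the change
    staged: Option<String>, // change held between prepare and commit
    applied: Vec<String>,   // changes made visible to readers
}

impl IndexPartition {
    fn new(healthy: bool) -> Self {
        IndexPartition { healthy, staged: None, applied: Vec::new() }
    }

    /// Phase 1: stage the change and vote yes, or vote no.
    fn prepare(&mut self, change: &str) -> bool {
        if self.healthy {
            self.staged = Some(change.to_string());
        }
        self.healthy
    }

    /// Phase 2 (all voted yes): make the staged change visible.
    fn commit(&mut self) {
        if let Some(change) = self.staged.take() {
            self.applied.push(change);
        }
    }

    /// Phase 2 (someone voted no): discard anything staged.
    fn abort(&mut self) {
        self.staged = None;
    }
}

/// Coordinator: commit everywhere only if every partition prepared successfully.
fn two_phase_commit(parts: &mut [IndexPartition], change: &str) -> bool {
    // Phase 1; `all` stops asking at the first "no" vote.
    let all_yes = parts.iter_mut().all(|p| p.prepare(change));
    // Phase 2: the same decision is sent to every partition.
    for p in parts.iter_mut() {
        if all_yes {
            p.commit();
        } else {
            p.abort();
        }
    }
    all_yes
}
```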
Performance Considerations
Index Selection
- Use Hash indexes for equality queries only
- Use B-Tree indexes for range queries and sorting
- Create covering indexes to avoid base table lookups
- Use partial indexes to reduce index size
- Use sparse indexes for columns with many nulls
Consistency Trade-offs
- Eventual consistency: Better write performance, possible stale reads
- Strong consistency: Guaranteed up-to-date reads, slower writes
Query Optimization
The query planner automatically selects the best index based on:
- Predicate type (equality, range, etc.)
- Index type (Hash, B-Tree)
- Selectivity estimation
- Whether the index is covering
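Below is a toy version of that selection logic. The predicate type rules out indexes that cannot serve it (a hash index for a range), selectivity and covering status drive the cost, and the planner falls back to a table scan when no index beats it. The cost formula, the `FETCH_COST` constant, and all names here are illustrative assumptions, not the crate's actual cost model.

```rust
#[derive(Clone, Copy, PartialEq)]
enum IndexType {
    Hash,
    BTree,
}

#[derive(Clone, Copy, PartialEq)]
enum PredKind {
    Equality,
    Range,
}

/// A usable index on the predicate's column (hypothetical shape).
struct Candidate {
    name: &'static str,
    kind: IndexType,
    covering: bool,
}

// Assumed relative cost of fetching one base row from a shard,
// compared with reading one index entry or scanning one row (cost 1).
const FETCH_COST: f64 = 4.0;

/// Pick the cheapest usable index, or None to fall back to a table scan.
/// `selectivity` is the estimated fraction of `rows` matching the predicate.
fn choose(cands: &[Candidate], pred: PredKind, selectivity: f64, rows: f64) -> Option<&'static str> {
    let scan_cost = rows;
    let mut best: Option<(&'static str, f64)> = None;
    for c in cands {
        // A hash index keeps no order, so it cannot serve range predicates.
        if pred == PredKind::Range && c.kind == IndexType::Hash {
            continue;
        }
        // Covering indexes avoid the per-row fetch from the base table.
        let per_hit = if c.covering { 1.0 } else { 1.0 + FETCH_COST };
        let cost = selectivity * rows * per_hit;
        if best.map_or(true, |(_, b)| cost < b) {
            best = Some((c.name, cost));
        }
    }
    match best {
        Some((name, cost)) if cost < scan_cost => Some(name),
        _ => None,
    }
}
```

With these weights, a non-covering index that matches half the table loses to a full scan, while the same range served by a covering B-Tree index still wins.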
Examples
See the examples/ directory for complete examples:
gsi_usage.rs - Comprehensive usage example
Run examples with:
cargo run --example gsi_usage
Testing
Run all tests:
cargo test
Run with logging:
RUST_LOG=debug cargo test
Run integration tests only:
cargo test --test integration_test
Benchmarks
Run benchmarks:
cargo bench
License
MIT OR Apache-2.0
Contributing
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.