HeliosDB Global Secondary Indexes

A Global Secondary Index (GSI) module for HeliosDB that enables efficient queries on non-sharding-key columns across all shards of a distributed database.

Features

  • Global Secondary Indexes: Query data across all shards using non-sharding-key columns
  • Multiple Index Types: B-Tree (range queries) and Hash (equality queries)
  • Covering Indexes: Include additional columns to avoid base table lookups
  • Partial Indexes: Index only rows matching a filter predicate
  • Sparse Indexes: Index only non-null values
  • Distributed Maintenance: Automatic cross-shard index updates
  • Async Updates: Eventual consistency with async index maintenance
  • Strong Consistency: Optional 2PC for synchronous updates
  • Query Planning: Automatic index selection based on cost estimation
  • Consistency Checking: Verify and repair index inconsistencies

Installation

Add this to your Cargo.toml:

[dependencies]
heliosdb-indexes = "3.0"

Quick Start

use heliosdb_indexes::*;
use std::sync::Arc;

#[tokio::main]
async fn main() -> Result<()> {
    // Create the index coordinator
    let config = CoordinatorConfig::default();
    let coordinator = Arc::new(IndexCoordinator::new(config).await?);

    // Create a global secondary index
    let index_def = IndexDefinition::new(
        "idx_user_email",
        "users",
        vec!["email".to_string()]
    );
    coordinator.create_index(index_def).await?;

    // Insert data (indexes are updated automatically)
    let mut row = Row::new();
    row.set("email".to_string(), serde_json::json!("alice@example.com"));
    row.set("name".to_string(), serde_json::json!("Alice"));
    coordinator.on_data_change(
        "users",
        &Key::from_string("user1"),
        "shard1",
        None,        // no previous row: this change is an insert
        Some(&row)
    ).await?;

    // Query using the index
    let results = coordinator.execute_index_query(
        "idx_user_email",
        &Predicate::Eq(IndexValue::String("alice@example.com".to_string()))
    ).await?;
    println!("Found {} user(s)", results.len());

    Ok(())
}

Index Types

Simple Index

A basic index on one or more columns:

let index = IndexDefinition::new(
    "idx_email",
    "users",
    vec!["email".to_string()]
);
coordinator.create_index(index).await?;

Covering Index

Includes additional columns to satisfy queries without accessing the base table:

let covering_index = IndexDefinition::covering(
    "idx_email_profile",
    "users",
    vec!["email".to_string()],                    // key columns
    vec!["name".to_string(), "age".to_string()]   // included (non-key) columns
);
coordinator.create_index(covering_index).await?;

Partial Index

Only indexes rows matching a filter:

let partial_index = IndexDefinition::partial(
    "idx_active_users",
    "users",
    vec!["status".to_string()],
    "status = 'active'".to_string()   // filter predicate
);
coordinator.create_index(partial_index).await?;

Sparse Index

Only indexes non-null values:

let sparse_index = IndexDefinition::sparse(
    "idx_phone",
    "users",
    vec!["phone_number".to_string()]
);
coordinator.create_index(sparse_index).await?;

Query Operations

Equality Query

let results = coordinator.execute_index_query(
    "idx_email",
    &Predicate::Eq(IndexValue::String("alice@example.com".to_string()))
).await?;

Range Query

// Ages between 25 and 40 (inclusive)
let results = coordinator.execute_index_query(
    "idx_age",
    &Predicate::Between(IndexValue::Int64(25), IndexValue::Int64(40))
).await?;

// Age less than 30
let results = coordinator.execute_index_query(
    "idx_age",
    &Predicate::Lt(IndexValue::Int64(30))
).await?;

IN Query

let results = coordinator.execute_index_query(
    "idx_status",
    &Predicate::In(vec![
        IndexValue::String("active".to_string()),
        IndexValue::String("pending".to_string())
    ])
).await?;

Composite Index Query

// Create a composite index
let composite = IndexDefinition::new(
    "idx_status_dept",
    "employees",
    vec!["status".to_string(), "department".to_string()]
);
coordinator.create_index(composite).await?;

// Query with a composite predicate (values in column order)
let results = coordinator.execute_index_query(
    "idx_status_dept",
    &Predicate::Eq(IndexValue::Composite(vec![
        IndexValue::String("active".to_string()),
        IndexValue::String("Engineering".to_string())
    ]))
).await?;

Consistency Models

Eventual Consistency (Default)

Uses async updates for better write performance:

let mut config = CoordinatorConfig::default();
config.default_consistency = ConsistencyLevel::Eventual;
let coordinator = IndexCoordinator::new(config).await?;

Strong Consistency

Uses 2PC for synchronous updates:

let mut config = CoordinatorConfig::default();
config.default_consistency = ConsistencyLevel::Strong;
let coordinator = IndexCoordinator::new(config).await?;

Index Maintenance

Rebuild Index

coordinator.rebuild_index("idx_email").await?;

Consistency Check

let base_data = vec![
    (Key::from_string("user1"), "shard1".to_string(), row1),
    (Key::from_string("user2"), "shard1".to_string(), row2)
];
let report = coordinator.check_consistency("idx_email", &base_data).await?;
if !report.is_consistent {
    println!("Missing entries: {}", report.missing_entries.len());
    coordinator.repair_index("idx_email", &base_data).await?;
}

Get Statistics

let stats = coordinator.get_index_stats("idx_email").await?;
println!("Entries: {}", stats.entry_count);
println!("Unique values: {}", stats.unique_values);
println!("Size: {} bytes", stats.size_bytes);

Advanced Usage

With Maintenance Coordinator

Enable distributed maintenance for cross-shard updates:

let coordinator = IndexCoordinator::new(config)
    .await?
    .with_maintenance()
    .await?;

With Async Updates

Enable async update processing with change log:

let change_log = Arc::new(ChangeLog::new());
let coordinator = IndexCoordinator::new(config)
    .await?
    .with_async_updates(change_log)
    .await?;

With Query Executor

Enable query planning and execution:

let shard_fetcher = Arc::new(MyShardFetcher::new());
let coordinator = IndexCoordinator::new(config)
    .await?
    .with_query_executor(shard_fetcher);

DDL Syntax Support

The module supports SQL-like DDL for index creation:

-- Simple index
CREATE GLOBAL INDEX idx_email ON users(email);
-- Covering index
CREATE GLOBAL INDEX idx_user_profile ON users(email)
INCLUDE (name, age, city);
-- Partial index
CREATE GLOBAL INDEX idx_active_users ON users(status)
WHERE status = 'active';
-- Sparse index
CREATE GLOBAL INDEX idx_phone ON users(phone_number)
WHERE phone_number IS NOT NULL;
-- Composite index
CREATE GLOBAL INDEX idx_status_dept ON employees(status, department);
-- Hash index
CREATE GLOBAL INDEX idx_email_hash ON users(email) USING HASH;
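To illustrate the shape of this DDL, here is a minimal, hypothetical parser for the simple-index form only ("CREATE GLOBAL INDEX name ON table(col, ...)"). It is not the module's actual parser, which also handles the INCLUDE, WHERE, and USING HASH clauses shown above.

```rust
// Hypothetical sketch: parse only the simple-index DDL form into
// (index name, table name, columns). Returns None on anything else.
fn parse_simple_index(ddl: &str) -> Option<(String, String, Vec<String>)> {
    let rest = ddl.trim().trim_end_matches(';').strip_prefix("CREATE GLOBAL INDEX ")?;
    let (name, rest) = rest.split_once(" ON ")?;
    let (table, cols) = rest.split_once('(')?;
    let cols = cols
        .strip_suffix(')')?
        .split(',')
        .map(|c| c.trim().to_string())
        .collect();
    Some((name.trim().to_string(), table.trim().to_string(), cols))
}

fn main() {
    let parsed = parse_simple_index("CREATE GLOBAL INDEX idx_email ON users(email);").unwrap();
    assert_eq!(parsed.0, "idx_email");
    assert_eq!(parsed.1, "users");
    assert_eq!(parsed.2, vec!["email".to_string()]);
}
```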

Architecture

Index Entry Storage

Indexed Value → [(Shard ID, Row Key, Included Data)]

Each index entry maps an indexed value to a list of row locations across shards.
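The mapping above can be sketched with plain standard-library types. This is a simplified, hypothetical mirror of the real layout: the actual heliosdb-indexes types (IndexValue, row keys, included columns) are richer.

```rust
use std::collections::HashMap;

// Simplified stand-ins for the module's real types (assumption, for illustration).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum IndexValue {
    String(String),
    Int64(i64),
}

#[derive(Clone, Debug, PartialEq)]
struct RowLocation {
    shard_id: String,
    row_key: String,
    included: Vec<(String, String)>, // covering-index payload, if any
}

// Indexed Value -> [(Shard ID, Row Key, Included Data)]
#[derive(Default)]
struct IndexEntries {
    map: HashMap<IndexValue, Vec<RowLocation>>,
}

impl IndexEntries {
    fn insert(&mut self, value: IndexValue, loc: RowLocation) {
        self.map.entry(value).or_default().push(loc);
    }
    fn lookup(&self, value: &IndexValue) -> &[RowLocation] {
        self.map.get(value).map(Vec::as_slice).unwrap_or(&[])
    }
}

fn main() {
    let mut idx = IndexEntries::default();
    idx.insert(
        IndexValue::String("alice@example.com".into()),
        RowLocation { shard_id: "shard1".into(), row_key: "user1".into(), included: vec![] },
    );
    let hits = idx.lookup(&IndexValue::String("alice@example.com".into()));
    assert_eq!(hits.len(), 1);
    assert_eq!(hits[0].shard_id, "shard1");
}
```

Because each value maps to a list of locations, one lookup resolves all matching rows across every shard before any base-table fetch happens.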

Query Flow

  1. Query planner analyzes predicate and available indexes
  2. Cost estimator calculates query cost for each plan
  3. Best index is selected (or table scan if no suitable index)
  4. Index is queried to get row locations
  5. Rows are fetched from shards (unless covering index)
  6. Results are returned
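Steps 1-3 can be sketched with a toy cost model. The names and the cost constants here are illustrative assumptions, not the module's actual estimator: an index scan pays per matched row plus a fetch penalty when the index is not covering, while a table scan pays for every row.

```rust
#[derive(Debug, PartialEq)]
enum Plan {
    IndexScan(&'static str),
    TableScan,
}

// Hypothetical per-index statistics the planner would consult.
struct Candidate {
    name: &'static str,
    selectivity: f64, // estimated fraction of rows matched
    covering: bool,   // covering indexes skip the base-table fetch
}

// Pick the cheapest plan under a toy cost model (illustrative constants).
fn choose_plan(table_rows: f64, candidates: &[Candidate]) -> Plan {
    let mut best = (table_rows, Plan::TableScan); // baseline: full scan
    for c in candidates {
        let rows = c.selectivity * table_rows;
        let fetch_penalty = if c.covering { 0.0 } else { rows * 2.0 };
        let cost = rows + fetch_penalty;
        if cost < best.0 {
            best = (cost, Plan::IndexScan(c.name));
        }
    }
    best.1
}

fn main() {
    let cands = [
        Candidate { name: "idx_email", selectivity: 0.001, covering: false },
        Candidate { name: "idx_status", selectivity: 0.6, covering: false },
    ];
    // The highly selective index wins over both the table scan and idx_status.
    assert_eq!(choose_plan(1_000_000.0, &cands), Plan::IndexScan("idx_email"));
    // A low-selectivity index loses to the table scan once fetch costs are counted.
    let weak = [Candidate { name: "idx_status", selectivity: 0.6, covering: false }];
    assert_eq!(choose_plan(1_000_000.0, &weak), Plan::TableScan);
}
```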

Update Flow

Eventual Consistency

  1. Data change occurs in base table
  2. Change is logged to change log
  3. Update is queued for async processing
  4. Workers process updates in batches
  5. Index entries are updated
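Steps 2-5 can be sketched as a queue drained in fixed-size batches. The Change type and batch logic here are simplified assumptions; the real ChangeLog carries full old/new rows. The first assertion shows the defining property of eventual consistency: between batches, the index lags the change log.

```rust
use std::collections::{HashMap, VecDeque};

// Simplified change-log entry (assumption): None means a delete.
#[derive(Clone)]
struct Change {
    key: String,
    new_value: Option<String>,
}

// Drain up to `batch_size` queued changes and apply them to the index.
fn process_batch(
    queue: &mut VecDeque<Change>,
    index: &mut HashMap<String, String>,
    batch_size: usize,
) {
    let n = batch_size.min(queue.len());
    for change in queue.drain(..n) {
        match change.new_value {
            Some(v) => { index.insert(change.key, v); }
            None => { index.remove(&change.key); }
        }
    }
}

fn main() {
    let mut queue = VecDeque::from(vec![
        Change { key: "user1".into(), new_value: Some("alice@example.com".into()) },
        Change { key: "user2".into(), new_value: Some("bob@example.com".into()) },
        Change { key: "user2".into(), new_value: None }, // later delete of user2
    ]);
    let mut index = HashMap::new();

    process_batch(&mut queue, &mut index, 2);
    // After the first batch the index is stale: user2's delete is still queued.
    assert_eq!(index.len(), 2);

    process_batch(&mut queue, &mut index, 2);
    // Once the log is fully drained, the index converges.
    assert!(!index.contains_key("user2"));
    assert_eq!(index.len(), 1);
}
```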

Strong Consistency

  1. Data change occurs in base table
  2. 2PC prepare phase for all affected indexes
  3. All indexes are updated atomically
  4. 2PC commit phase
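The protocol shape can be sketched with in-memory participants: every affected index must vote yes in the prepare phase before any commit happens, otherwise everything rolls back. This is an illustration of two-phase commit, not the module's actual coordinator, and the Participant type is a hypothetical stand-in.

```rust
// Hypothetical 2PC participant: one index replica that can stage an update.
#[derive(Default)]
struct Participant {
    staged: Option<(String, String)>,
    committed: Vec<(String, String)>,
    healthy: bool, // an unhealthy participant votes "no" in prepare
}

impl Participant {
    fn prepare(&mut self, key: &str, val: &str) -> bool {
        if self.healthy {
            self.staged = Some((key.into(), val.into()));
        }
        self.healthy
    }
    fn commit(&mut self) {
        if let Some(entry) = self.staged.take() {
            self.committed.push(entry);
        }
    }
    fn abort(&mut self) {
        self.staged = None;
    }
}

fn two_phase_update(parts: &mut [Participant], key: &str, val: &str) -> bool {
    // Phase 1: prepare on every affected index.
    let all_prepared = parts.iter_mut().all(|p| p.prepare(key, val));
    // Phase 2: commit everywhere, or roll back everywhere.
    if all_prepared {
        parts.iter_mut().for_each(Participant::commit);
    } else {
        parts.iter_mut().for_each(Participant::abort);
    }
    all_prepared
}

fn main() {
    let mut ok = vec![
        Participant { healthy: true, ..Default::default() },
        Participant { healthy: true, ..Default::default() },
    ];
    assert!(two_phase_update(&mut ok, "user1", "alice@example.com"));
    assert_eq!(ok[0].committed.len(), 1);

    // One failed prepare means nothing is applied anywhere.
    let mut mixed = vec![
        Participant { healthy: true, ..Default::default() },
        Participant { healthy: false, ..Default::default() },
    ];
    assert!(!two_phase_update(&mut mixed, "user1", "alice@example.com"));
    assert!(mixed[0].committed.is_empty());
}
```

The all-or-nothing outcome is what buys the guaranteed up-to-date reads of strong consistency, at the cost of the extra prepare round-trip on every write.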

Performance Considerations

Index Selection

  • Use Hash indexes for equality queries only
  • Use B-Tree indexes for range queries and sorting
  • Create covering indexes to avoid base table lookups
  • Use partial indexes to reduce index size
  • Use sparse indexes for columns with many nulls
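The B-Tree-versus-Hash advice above mirrors Rust's own standard collections: an ordered map makes a range query a contiguous walk, while a hash map offers fast equality lookups but no ordering. A minimal sketch:

```rust
use std::collections::{BTreeMap, HashMap};

fn main() {
    let mut btree: BTreeMap<i64, &str> = BTreeMap::new();
    let mut hash: HashMap<i64, &str> = HashMap::new();
    for (age, name) in [(25, "alice"), (31, "bob"), (45, "carol")] {
        btree.insert(age, name);
        hash.insert(age, name);
    }

    // A range query (ages 25..=40) is natural on the ordered B-Tree structure:
    let in_range: Vec<_> = btree.range(25..=40).map(|(_, name)| *name).collect();
    assert_eq!(in_range, vec!["alice", "bob"]);

    // Equality lookup is the hash structure's strength; ranges would need a full scan.
    assert_eq!(hash.get(&31), Some(&"bob"));
}
```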

Consistency Trade-offs

  • Eventual consistency: Better write performance, possible stale reads
  • Strong consistency: Guaranteed up-to-date reads, slower writes

Query Optimization

The query planner automatically selects the best index based on:

  • Predicate type (equality, range, etc.)
  • Index type (Hash, B-Tree)
  • Selectivity estimation
  • Whether the index is covering

Examples

See the examples/ directory for complete examples:

  • gsi_usage.rs - Comprehensive usage example

Run examples with:

cargo run --example gsi_usage

Testing

Run all tests:

cargo test

Run with logging:

RUST_LOG=debug cargo test

Run integration tests only:

cargo test --test integration_test

Benchmarks

Run benchmarks:

cargo bench

License

MIT OR Apache-2.0

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.