Getting Started with HeliosDB-Lite v2.0

Version: 2.0.0 Last Updated: November 21, 2025 Difficulty: Beginner Estimated Time: 30 minutes

Introduction
Installation
Quick Start (5-Minute Tutorial)
Core Concepts
Basic Operations
Vector Search
Time-Travel Queries
Database Branching
Materialized Views
Next Steps

Introduction

What is HeliosDB-Lite?

HeliosDB-Lite is a PostgreSQL-compatible embedded database that brings enterprise-grade features to your applications without the complexity of a traditional database server. Think of it as SQLite with superpowers.

Key Features:

Embedded Mode: Run directly in your application process (like SQLite)
Server Mode: Optional PostgreSQL-compatible network server
Vector Search: Built-in HNSW index for AI/ML embeddings (OpenAI, Cohere, etc.)
Product Quantization: 8-16x vector compression with minimal accuracy loss
Time-Travel Queries: Query historical data at any point in time
Database Branching: Git-like workflows for zero-downtime deployments
Built-in Encryption: AES-256-GCM with less than 3% overhead
Multi-Tenant Ready: Native tenant isolation with Row-Level Security

Who Should Use HeliosDB-Lite?

HeliosDB-Lite is perfect for:

AI/ML Applications: Need vector search for embeddings (RAG, semantic search)
Edge Computing: Embedded database with server capabilities
SaaS Applications: Multi-tenant isolation without complexity
Mobile/Desktop Apps: Rich SQL database without external dependencies
Prototyping: PostgreSQL compatibility means easy migration to production databases

What Makes It Different?

Feature	SQLite	PostgreSQL	DuckDB	HeliosDB-Lite
Embedded Mode	✅	❌	✅	✅
Server Mode	❌	✅	❌	✅
Vector Search	❌	Extension	❌	✅ Built-in
Product Quantization	❌	❌	❌	✅ 8-16x compression
Time-Travel Queries	❌	❌	❌	✅ AS OF TIMESTAMP
Database Branching	❌	❌	❌	✅ Git-like
Built-in Encryption	❌	Extension	❌	✅ TDE
PostgreSQL Compatible	❌	✅	❌	✅ 95%+

Installation

Prerequisites

Rust: Version 1.75 or higher
Operating System: Linux, macOS, or Windows
Memory: 512MB RAM minimum (2GB+ recommended for vector workloads)
Disk Space: 100MB for installation + your data

Option 1: Install with Cargo (Recommended)

# Install from crates.io
cargo install heliosdb-lite

# Verify installation
heliosdb-lite --version
# Output: heliosdb-lite 2.0.0

Option 2: Build from Source

# Clone repository
git clone https://github.com/dimensigon/HeliosDB-Lite.git
cd HeliosDB-Lite

# Build release binary
cargo build --release

# Binary location
./target/release/heliosdb-lite --version

Option 3: Add as Library Dependency

Add to your Cargo.toml:

[dependencies]
heliosdb-lite = "2.0.0"

Option 4: Docker (Coming Soon)

# Pull image
docker pull heliosdb/lite:2.0.0

# Run container
docker run -p 5432:5432 heliosdb/lite:2.0.0

Verify Installation

# Check version
heliosdb-lite --version

# Show help
heliosdb-lite --help

# Start REPL (interactive shell)
heliosdb-lite repl --memory

If you see the HeliosDB-Lite welcome banner, you’re ready to go!

Quick Start (5-Minute Tutorial)

Let’s create your first database and run some queries. This tutorial will take you from zero to querying data in 5 minutes.

Step 1: Start the Interactive REPL

# Start in-memory database (data lost on exit)
heliosdb-lite repl --memory

You’ll see:

 HeliosDB Lite v2.0.0
 PostgreSQL-compatible embedded database with Phase 3 features

 ✨ New in v2.0:
   • 8-16x vector compression (Product Quantization)
   • Database branching (git-like workflows)
   • Time-travel queries (AS OF TIMESTAMP)
   • Intelligent materialized views

 Type \h for help, \dS for system views, \q to quit

heliosdb>

Step 2: Create Your First Table

CREATE TABLE users (
    id INT PRIMARY KEY,
    name TEXT NOT NULL,
    email TEXT NOT NULL,
    age INT
);

Output:

Query OK (0.002s)

Step 3: Insert Data

INSERT INTO users VALUES (1, 'Alice Johnson', 'alice@example.com', 30);
INSERT INTO users VALUES (2, 'Bob Smith', 'bob@example.com', 25);
INSERT INTO users VALUES (3, 'Charlie Brown', 'charlie@example.com', 35);

Output:

Query OK, 1 row affected (0.001s)
Query OK, 1 row affected (0.001s)
Query OK, 1 row affected (0.001s)

Step 4: Query Data

SELECT * FROM users;

Output:

+----+----------------+---------------------+-----+
| id | name           | email               | age |
+----+----------------+---------------------+-----+
| 1  | Alice Johnson  | alice@example.com   | 30  |
| 2  | Bob Smith      | bob@example.com     | 25  |
| 3  | Charlie Brown  | charlie@example.com | 35  |
+----+----------------+---------------------+-----+
(3 rows) (0.001s)

Step 5: Filter and Aggregate

-- Find users older than 28
SELECT name, age FROM users WHERE age > 28;

-- Count total users
SELECT COUNT(*) as total_users FROM users;

-- Average age
SELECT AVG(age) as average_age FROM users;

Step 6: Update and Delete

-- Update Alice's age
UPDATE users SET age = 31 WHERE name = 'Alice Johnson';

-- Delete young users
DELETE FROM users WHERE age < 28;

-- Verify changes
SELECT * FROM users;

Step 7: Exit the REPL

\q

Congratulations! You’ve just created your first database, inserted data, and run queries. In the next sections, we’ll explore more advanced features.

Core Concepts

Databases and Tables

HeliosDB-Lite organizes data into tables within a database. Unlike PostgreSQL, which supports multiple databases per server, HeliosDB-Lite focuses on a single database per instance (similar to SQLite).

Table Structure:

CREATE TABLE table_name (
    column1 TYPE constraints,
    column2 TYPE constraints,
    ...
);

Supported Data Types:

Numeric: INT, BIGINT, FLOAT, DOUBLE
Text: TEXT, VARCHAR(n)
Binary: BLOB, BYTEA
JSON: JSONB (binary JSON for efficient storage)
Vector: VECTOR(n) (for embeddings, n = dimensions)
Temporal: TIMESTAMP, DATE, TIME
Boolean: BOOLEAN

Indexes

Indexes speed up queries by creating optimized data structures:

Standard B-Tree Index:

CREATE INDEX idx_users_email ON users(email);

Vector Index (HNSW):

CREATE INDEX idx_embeddings ON documents
USING hnsw (embedding vector_cosine_ops);

Composite Index:

CREATE INDEX idx_user_age_name ON users(age, name);

Transactions

Transactions ensure data integrity with ACID guarantees:

ACID Properties:

Atomicity: All operations succeed or all fail
Consistency: Database moves from one valid state to another
Isolation: Concurrent transactions don’t interfere
Durability: Committed changes survive crashes

Transaction Isolation: HeliosDB-Lite uses Snapshot Isolation (similar to PostgreSQL’s REPEATABLE READ):

Each transaction sees a consistent snapshot
No dirty reads or non-repeatable reads
Write conflicts are detected and prevented

Storage Modes

In-Memory Mode:

let db = EmbeddedDatabase::new_in_memory()?;

Data stored in RAM only
Fastest performance
Data lost on exit
Perfect for testing

Persistent Mode:

let db = EmbeddedDatabase::new("./my_database")?;

Data written to disk
Survives restarts
ACID guarantees
Production use

Encrypted Mode:

let config = Config {
    encryption_key: Some(key),
    ..Default::default()
};
let db = EmbeddedDatabase::with_config("./my_db", config)?;

AES-256-GCM encryption
Transparent encryption/decryption
Less than 3% overhead
Compliance-ready

Basic Operations

Creating Tables

Simple Table:

CREATE TABLE products (
    id INT PRIMARY KEY,
    name TEXT NOT NULL,
    price FLOAT,
    stock INT DEFAULT 0
);

Table with Constraints:

CREATE TABLE orders (
    order_id BIGINT PRIMARY KEY,
    customer_id INT NOT NULL,
    product_id INT NOT NULL,
    quantity INT CHECK (quantity > 0),
    total_price FLOAT NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (customer_id) REFERENCES users(id),
    FOREIGN KEY (product_id) REFERENCES products(id)
);

Viewing Table Schema:

-- In REPL
\d products

-- SQL standard
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'products';

Inserting Data

Single Row:

INSERT INTO products (id, name, price, stock)
VALUES (1, 'Laptop', 1299.99, 50);

Multiple Rows:

INSERT INTO products VALUES
    (2, 'Mouse', 29.99, 200),
    (3, 'Keyboard', 79.99, 150),
    (4, 'Monitor', 399.99, 75);

Insert with SELECT:

INSERT INTO high_value_products (id, name, price)
SELECT id, name, price FROM products WHERE price > 100;

Returning Inserted Data:

INSERT INTO products (id, name, price)
VALUES (5, 'Headphones', 199.99)
RETURNING id, name;

Querying Data

Basic SELECT:

SELECT * FROM products;
SELECT name, price FROM products;

Filtering with WHERE:

-- Comparison operators
SELECT * FROM products WHERE price > 100;
SELECT * FROM products WHERE stock <= 50;

-- Logical operators
SELECT * FROM products WHERE price > 50 AND stock > 0;
SELECT * FROM products WHERE name = 'Laptop' OR name = 'Monitor';

-- Pattern matching
SELECT * FROM products WHERE name LIKE '%top%';
SELECT * FROM products WHERE name LIKE 'Key%';

-- Range queries
SELECT * FROM products WHERE price BETWEEN 50 AND 500;
SELECT * FROM products WHERE id IN (1, 3, 5, 7);

-- NULL checks
SELECT * FROM products WHERE stock IS NULL;
SELECT * FROM products WHERE stock IS NOT NULL;

Sorting with ORDER BY:

-- Ascending (default)
SELECT * FROM products ORDER BY price;

-- Descending
SELECT * FROM products ORDER BY price DESC;

-- Multiple columns
SELECT * FROM products ORDER BY price DESC, name ASC;

Limiting Results:

-- Top 5 most expensive products
SELECT * FROM products ORDER BY price DESC LIMIT 5;

-- Skip first 10, get next 20
SELECT * FROM products LIMIT 20 OFFSET 10;

-- Pagination (page 3, 10 items per page)
SELECT * FROM products LIMIT 10 OFFSET 20;

Distinct Values:

-- Unique categories
SELECT DISTINCT category FROM products;

-- Count unique customers
SELECT COUNT(DISTINCT customer_id) FROM orders;

Updating Data

Update Single Column:

UPDATE products SET price = 1199.99 WHERE id = 1;

Update Multiple Columns:

UPDATE products
SET price = price * 0.9, stock = stock - 10
WHERE category = 'electronics';

Conditional Updates:

-- Mark out-of-stock products
UPDATE products SET stock = 0 WHERE stock < 0;

-- Update with calculation
UPDATE products SET price = price * 1.1 WHERE last_updated < '2025-01-01';

Deleting Data

Delete Specific Rows:

DELETE FROM products WHERE id = 5;
DELETE FROM products WHERE stock = 0 AND discontinued = true;

Delete All Rows (careful!):

DELETE FROM products;  -- Keeps table structure

Drop Table:

DROP TABLE products;  -- Removes table entirely
DROP TABLE IF EXISTS products;  -- No error if doesn't exist

Aggregations

Basic Aggregates:

SELECT COUNT(*) FROM products;
SELECT SUM(price) FROM products;
SELECT AVG(price) FROM products;
SELECT MIN(price) FROM products;
SELECT MAX(price) FROM products;

GROUP BY:

-- Products per category
SELECT category, COUNT(*) as product_count
FROM products
GROUP BY category;

-- Revenue per customer
SELECT customer_id, SUM(total_price) as total_revenue
FROM orders
GROUP BY customer_id;

-- Multiple aggregates
SELECT
    category,
    COUNT(*) as count,
    AVG(price) as avg_price,
    MIN(price) as min_price,
    MAX(price) as max_price
FROM products
GROUP BY category;

HAVING Clause:

-- Categories with more than 10 products
SELECT category, COUNT(*) as count
FROM products
GROUP BY category
HAVING COUNT(*) > 10;

-- High-value customers
SELECT customer_id, SUM(total_price) as revenue
FROM orders
GROUP BY customer_id
HAVING SUM(total_price) > 10000
ORDER BY revenue DESC;

Joins

INNER JOIN:

-- Orders with customer names
SELECT
    orders.order_id,
    users.name as customer_name,
    orders.total_price
FROM orders
INNER JOIN users ON orders.customer_id = users.id;

LEFT JOIN:

-- All customers, with or without orders
SELECT
    users.name,
    COUNT(orders.order_id) as order_count
FROM users
LEFT JOIN orders ON users.id = orders.customer_id
GROUP BY users.id, users.name;

Multiple Joins:

-- Full order details
SELECT
    orders.order_id,
    users.name as customer,
    products.name as product,
    orders.quantity,
    orders.total_price
FROM orders
INNER JOIN users ON orders.customer_id = users.id
INNER JOIN products ON orders.product_id = products.id;

Transactions

Automatic Transaction:

// Embedded API - auto-commit
db.execute("INSERT INTO users VALUES (1, 'Alice')")?;

Explicit Transaction:

// Begin transaction
let tx = db.begin_transaction()?;

// Multiple operations
tx.execute("INSERT INTO accounts VALUES (1, 1000)")?;
tx.execute("INSERT INTO accounts VALUES (2, 500)")?;
tx.execute("UPDATE accounts SET balance = balance - 100 WHERE id = 1")?;
tx.execute("UPDATE accounts SET balance = balance + 100 WHERE id = 2")?;

// Commit (or rollback on error)
tx.commit()?;

SQL Transactions:

BEGIN TRANSACTION;

UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;

-- If everything looks good
COMMIT;

-- Or if there's an issue
ROLLBACK;

Vector Search

Vector search enables semantic similarity searches for AI/ML applications. Instead of exact text matching, you can find items based on meaning.

Use Cases

RAG (Retrieval-Augmented Generation): Find relevant context for LLMs
Semantic Search: “Find products similar to this description”
Recommendation Systems: “Find users with similar preferences”
Image Search: Find visually similar images using embeddings
Duplicate Detection: Find near-duplicate documents

Creating a Vector Table

CREATE TABLE documents (
    id INT PRIMARY KEY,
    content TEXT,
    embedding VECTOR(384)  -- 384 dimensions (common for embeddings)
);

Common Dimensions:

384: sentence-transformers/all-MiniLM-L6-v2
768: BERT, OpenAI text-embedding-ada-002 (v1)
1536: OpenAI text-embedding-ada-002 (v2)
512: CLIP image embeddings

Creating a Vector Index

-- Cosine distance (most common for embeddings)
CREATE INDEX idx_doc_embeddings ON documents
USING hnsw (embedding vector_cosine_ops);

-- L2 distance (Euclidean)
CREATE INDEX idx_doc_embeddings ON documents
USING hnsw (embedding vector_l2_ops);

-- Inner product
CREATE INDEX idx_doc_embeddings ON documents
USING hnsw (embedding vector_ip_ops);

With Product Quantization (8-16x compression):

CREATE INDEX idx_doc_embeddings ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (
    quantization = 'product',
    pq_subquantizers = 8,
    pq_centroids = 256
);

Inserting Vectors

From Application Code:

use heliosdb_lite::EmbeddedDatabase;

let db = EmbeddedDatabase::new_in_memory()?;

// Create table
db.execute("
    CREATE TABLE documents (
        id INT PRIMARY KEY,
        content TEXT,
        embedding VECTOR(384)
    )
")?;

// Sample embedding (in real app, from OpenAI/Cohere/etc.)
let embedding = vec![0.1, 0.2, 0.3, /* ... 381 more values */];
let embedding_str = format!("[{}]", embedding.iter()
    .map(|v| v.to_string())
    .collect::<Vec<_>>()
    .join(", "));

// Insert with embedding
db.execute(&format!(
    "INSERT INTO documents (id, content, embedding) VALUES (1, 'Rust tutorial', '{}')",
    embedding_str
))?;

SQL with Literal Vector:

INSERT INTO documents (id, content, embedding) VALUES
    (1, 'Machine learning basics', '[0.12, 0.34, 0.56, ...]'),
    (2, 'Deep learning tutorial', '[0.15, 0.38, 0.52, ...]');

Searching with Vectors

Find Top-K Similar Items:

-- Find 10 most similar documents (cosine distance)
SELECT
    id,
    content,
    embedding <=> '[0.1, 0.2, 0.3, ...]' AS distance
FROM documents
ORDER BY distance
LIMIT 10;

Distance Operators:

<->: L2 distance (Euclidean)
<=>: Cosine distance (most common)
<#>: Inner product (dot product)

With Filters:

-- Semantic search within category
SELECT
    id,
    content,
    embedding <=> $1 AS distance
FROM documents
WHERE category = 'technology' AND published_date > '2025-01-01'
ORDER BY distance
LIMIT 10;

Distance Threshold:

-- Only return very similar items
SELECT id, content, embedding <=> $1 AS distance
FROM documents
WHERE (embedding <=> $1) < 0.3  -- Distance threshold
ORDER BY distance;

Real-World Example: RAG System

use heliosdb_lite::EmbeddedDatabase;

// Initialize database
let db = EmbeddedDatabase::new("./rag_db")?;

// Create knowledge base table
db.execute("
    CREATE TABLE knowledge_base (
        id BIGINT PRIMARY KEY,
        source TEXT,
        content TEXT,
        embedding VECTOR(1536)  -- OpenAI embeddings
    )
")?;

// Create index
db.execute("
    CREATE INDEX idx_kb_embeddings ON knowledge_base
    USING hnsw (embedding vector_cosine_ops)
    WITH (quantization = 'product', pq_subquantizers = 8)
")?;

// Function to get embedding from OpenAI
async fn get_embedding(text: &str) -> Vec<f32> {
    // Call OpenAI API
    // Return embedding
}

// Index a document
async fn index_document(db: &EmbeddedDatabase, doc: &str) -> Result<()> {
    let embedding = get_embedding(doc).await;
    let embedding_str = format!("[{}]", embedding.iter()
        .map(|v| v.to_string())
        .collect::<Vec<_>>()
        .join(", "));

    db.execute(&format!(
        "INSERT INTO knowledge_base (id, source, content, embedding)
         VALUES ({}, 'docs', '{}', '{}')",
        generate_id(), doc, embedding_str
    ))?;
    Ok(())
}

// Retrieve relevant context
async fn retrieve_context(db: &EmbeddedDatabase, query: &str, k: usize) -> Result<Vec<String>> {
    let query_embedding = get_embedding(query).await;
    let embedding_str = format!("[{}]", query_embedding.iter()
        .map(|v| v.to_string())
        .collect::<Vec<_>>()
        .join(", "));

    let results = db.query(&format!(
        "SELECT content FROM knowledge_base
         ORDER BY embedding <=> '{}'
         LIMIT {}",
        embedding_str, k
    ), &[])?;

    Ok(results.iter()
        .map(|row| row.get(0).unwrap().to_string())
        .collect())
}

// Usage in RAG pipeline
async fn generate_answer(query: &str) -> String {
    let db = EmbeddedDatabase::new("./rag_db").unwrap();

    // 1. Retrieve relevant context
    let context = retrieve_context(&db, query, 5).await.unwrap();

    // 2. Build prompt
    let prompt = format!(
        "Context: {}\n\nQuestion: {}\n\nAnswer:",
        context.join("\n\n"),
        query
    );

    // 3. Call LLM (GPT-4, Claude, etc.)
    call_llm(&prompt).await
}

Vector Performance Tips

1. Use Product Quantization for Large Datasets:

-- 100K+ vectors: Use PQ to save memory
CREATE INDEX idx WITH (quantization = 'product', pq_subquantizers = 8);

2. Batch Insert Vectors:

// Better: Batch insert
let mut values = Vec::new();
for (id, content, embedding) in documents {
    values.push(format!("({}, '{}', '{}')", id, content, embedding_str));
}
db.execute(&format!("INSERT INTO documents VALUES {}", values.join(", ")))?;

// Worse: Individual inserts
for doc in documents {
    db.execute(&format!("INSERT INTO documents VALUES ..."))?;
}

3. Choose Right Distance Metric:

Cosine: Normalized embeddings (OpenAI, most models) - USE THIS if unsure
L2: Absolute distances matter
Inner Product: Pre-normalized vectors, maximum similarity

4. Tune HNSW Parameters:

CREATE INDEX idx ON documents USING hnsw (embedding vector_cosine_ops)
WITH (
    m = 16,              -- Graph connectivity (higher = better recall, more memory)
    ef_construction = 200 -- Build quality (higher = better quality, slower build)
);

Time-Travel Queries

Time-travel queries let you view data as it existed at any point in time. This is powerful for auditing, debugging, and recovering from mistakes.

How It Works

HeliosDB-Lite uses Multi-Version Concurrency Control (MVCC) to keep historical versions of data. Every change creates a new version without overwriting the old one.

Storage Retention: By default, all versions are kept. Configure retention policy:

let config = Config {
    retention_days: Some(30),  // Keep 30 days of history
    ..Default::default()
};

AS OF TIMESTAMP

Query data as it existed at a specific timestamp:

-- View orders table as of yesterday
SELECT * FROM orders
AS OF TIMESTAMP '2025-11-20 00:00:00';

-- Count orders from last week
SELECT COUNT(*) FROM orders
AS OF TIMESTAMP '2025-11-14 00:00:00';

-- Compare current vs historical
SELECT
    current.product_id,
    current.price as current_price,
    historical.price as yesterday_price,
    current.price - historical.price as price_change
FROM products current
JOIN products AS OF TIMESTAMP '2025-11-20 00:00:00' historical
    ON current.product_id = historical.product_id
WHERE current.price != historical.price;

AS OF TRANSACTION

Query data as of a specific transaction ID:

-- View data at transaction 12345
SELECT * FROM orders AS OF TRANSACTION 12345;

-- Get transaction ID
SELECT pg_current_xact_id();  -- Returns current transaction ID

AS OF SCN

Query data as of a System Change Number:

-- View data at SCN 987654321
SELECT * FROM orders AS OF SCN 987654321;

Real-World Use Cases

1. Audit Trail:

-- Show what customer saw when they complained
SELECT * FROM product_catalog
AS OF TIMESTAMP '2025-11-15 14:23:00'
WHERE product_id = 'WIDGET-123';

-- Verify price at time of purchase
SELECT price FROM products
AS OF TIMESTAMP (SELECT created_at FROM orders WHERE order_id = 5678)
WHERE product_id = 42;

2. Data Recovery:

-- Find accidentally deleted rows
SELECT * FROM customers
AS OF TIMESTAMP '2025-11-20 23:00:00'  -- Before deletion
WHERE customer_id NOT IN (SELECT customer_id FROM customers);

-- Restore deleted data
INSERT INTO customers
SELECT * FROM customers AS OF TIMESTAMP '2025-11-20 23:00:00'
WHERE customer_id = 999;

3. Debugging:

-- What changed between two points in time?
SELECT
    before.order_id,
    before.status as old_status,
    after.status as new_status,
    before.total as old_total,
    after.total as new_total
FROM orders AS OF TIMESTAMP '2025-11-20 00:00:00' before
FULL OUTER JOIN orders AS OF TIMESTAMP '2025-11-21 00:00:00' after
    USING (order_id)
WHERE before.status != after.status OR before.total != after.total;

4. Trend Analysis:

-- Daily inventory levels over past week
SELECT
    '2025-11-14' as date, SUM(stock) FROM products AS OF TIMESTAMP '2025-11-14 23:59:59'
UNION ALL
SELECT
    '2025-11-15', SUM(stock) FROM products AS OF TIMESTAMP '2025-11-15 23:59:59'
UNION ALL
SELECT
    '2025-11-16', SUM(stock) FROM products AS OF TIMESTAMP '2025-11-16 23:59:59'
-- ... etc

5. Compliance and Reporting:

-- End-of-quarter snapshot for financial reporting
SELECT
    customer_id,
    SUM(total_price) as quarterly_revenue
FROM orders
AS OF TIMESTAMP '2025-09-30 23:59:59'
WHERE created_at BETWEEN '2025-07-01' AND '2025-09-30'
GROUP BY customer_id;

Performance Considerations

1. Recent Data is Faster:

Queries on recent timestamps are faster (fewer versions to traverse)
Very old timestamps may be slower

2. Index Usage:

-- Indexes work with time-travel queries
CREATE INDEX idx_orders_date ON orders(created_at);

SELECT * FROM orders
AS OF TIMESTAMP '2025-11-20 00:00:00'
WHERE created_at > '2025-11-01';  -- Uses index

3. Storage Management:

// Configure retention to manage storage
let config = Config {
    retention_days: Some(90),     // Keep 90 days
    compact_interval_hours: 24,   // Compact daily
    ..Default::default()
};

Database Branching

Database branching brings git-like workflows to your database. Create isolated copies, make changes, test thoroughly, then merge or discard.

Why Use Branching?

Traditional Problem:

Testing on production is risky
Staging databases drift from production
Rolling back is hard
Zero-downtime deployments are complex

Branching Solution:

Create instant branch from production
Test changes in isolation
Merge back when ready
Or discard if tests fail

Creating Branches

Basic Branch:

-- Create branch from current state
CREATE DATABASE BRANCH staging FROM CURRENT AS OF NOW;

Branch from Specific Time:

-- Create branch from yesterday
CREATE DATABASE BRANCH test FROM main
AS OF TIMESTAMP '2025-11-20 00:00:00';

Branch from Transaction:

-- Create branch from specific transaction
CREATE DATABASE BRANCH backup FROM CURRENT
AS OF TRANSACTION 987654;

With Options:

CREATE DATABASE BRANCH prod_copy FROM main AS OF NOW
WITH (
    replication_factor = 3,
    region = 'us-west-2'
);

Working with Branches

Switch Between Branches:

// In application code
let db = EmbeddedDatabase::new_branch("./mydb", "staging")?;

List Branches:

-- View all branches
SELECT * FROM pg_database_branches();

-- Output:
-- branch_name | parent_branch | created_at          | size_bytes
-- main        | NULL          | 2025-11-01 00:00:00 | 157286400
-- staging     | main          | 2025-11-20 10:00:00 | 157286400
-- dev         | main          | 2025-11-20 11:00:00 | 157286400

Branch Metadata:

-- Check branch details
SELECT
    branch_name,
    ROUND(size_mb::FLOAT / 1024, 2) as size_gb,
    created_at,
    status
FROM pg_database_branches()
WHERE status = 'active';

Merging Branches

Simple Merge:

-- Merge staging into main
MERGE DATABASE BRANCH staging INTO main;

Merge with Conflict Resolution:

MERGE DATABASE BRANCH staging INTO main
WITH (
    conflict_resolution = 'branch_wins',  -- Source wins conflicts
    delete_branch_after = true             -- Auto-cleanup
);

Conflict Resolution Options:

branch_wins: Source branch changes override target
target_wins: Target branch changes override source
fail: Fail on any conflict (manual resolution required)

Deleting Branches

-- Delete branch
DROP DATABASE BRANCH staging;

-- Delete if exists (no error)
DROP DATABASE BRANCH IF EXISTS staging;

Real-World Workflows

1. Development Workflow:

-- 1. Create dev branch
CREATE DATABASE BRANCH dev FROM main AS OF NOW;

-- 2. Developer works in dev branch
-- (make changes, test, iterate)

-- 3. Run tests
-- (application-level testing)

-- 4. Merge to staging for QA
CREATE DATABASE BRANCH staging FROM dev AS OF NOW;

-- 5. QA approves, merge to main
MERGE DATABASE BRANCH staging INTO main
WITH (conflict_resolution = 'branch_wins', delete_branch_after = true);

-- 6. Cleanup dev branch
DROP DATABASE BRANCH dev;

2. Blue-Green Deployment:

-- 1. Production runs on 'blue' branch
-- (current production environment)

-- 2. Create 'green' branch for new version
CREATE DATABASE BRANCH green FROM blue AS OF NOW;

-- 3. Deploy new application version to green
-- (application connects to green branch)

-- 4. Run tests on green
-- (smoke tests, integration tests)

-- 5. Switch traffic to green (at application layer)
-- (update connection string to use green)

-- 6. Monitor for issues
-- (if issues found, switch back to blue instantly)

-- 7. If successful, make green the new blue
MERGE DATABASE BRANCH green INTO blue;
DROP DATABASE BRANCH green;
CREATE DATABASE BRANCH green FROM blue AS OF NOW;  -- Prepare for next deployment

3. Testing Schema Changes:

-- 1. Create test branch
CREATE DATABASE BRANCH schema_test FROM main AS OF NOW;

-- 2. Apply schema changes in test branch
ALTER TABLE users ADD COLUMN preferences JSONB;
CREATE INDEX idx_user_prefs ON users(preferences);

-- 3. Test queries
SELECT * FROM users WHERE preferences->>'theme' = 'dark';

-- 4. If tests pass, apply to main
MERGE DATABASE BRANCH schema_test INTO main;

-- 5. Cleanup
DROP DATABASE BRANCH schema_test;

4. A/B Testing:

-- 1. Create two branches for A/B test
CREATE DATABASE BRANCH variant_a FROM main AS OF NOW;
CREATE DATABASE BRANCH variant_b FROM main AS OF NOW;

-- 2. Apply different changes to each
-- In variant_a:
UPDATE product_recommendations SET algorithm = 'collaborative_filtering';

-- In variant_b:
UPDATE product_recommendations SET algorithm = 'content_based';

-- 3. Run A/B test (application routes users to branches)
-- (collect metrics, analyze results)

-- 4. Merge winning variant
MERGE DATABASE BRANCH variant_a INTO main;

-- 5. Cleanup
DROP DATABASE BRANCH variant_a;
DROP DATABASE BRANCH variant_b;

5. Disaster Recovery:

-- Regular backups as branches
CREATE DATABASE BRANCH backup_daily FROM main AS OF NOW;

-- If disaster strikes:
-- 1. Create new main from backup
CREATE DATABASE BRANCH main_new FROM backup_daily AS OF NOW;

-- 2. Verify data integrity
SELECT COUNT(*) FROM users;

-- 3. Replace main
DROP DATABASE BRANCH main;
ALTER DATABASE BRANCH main_new RENAME TO main;

Branch Management Best Practices

1. Naming Convention:

<purpose>-<timestamp>
<purpose>-<version>
<team>-<feature>

Examples:
- dev-20251121
- staging-v2.1
- data-team-analytics
- backup-daily

2. Lifecycle:

Create → Work → Test → Merge → Delete

3. Monitor Size:

-- Find large branches
SELECT branch_name, size_mb
FROM pg_database_branches()
WHERE size_mb > 1000
ORDER BY size_mb DESC;

4. Automated Cleanup:

-- Drop old test branches (pseudocode for scheduled job)
DO $$
DECLARE
  branch_rec RECORD;
BEGIN
  FOR branch_rec IN
    SELECT branch_name FROM pg_database_branches()
    WHERE branch_name LIKE 'test-%'
      AND created_at < NOW() - INTERVAL '7 days'
  LOOP
    EXECUTE 'DROP DATABASE BRANCH ' || branch_rec.branch_name;
  END LOOP;
END $$;

Materialized Views

Materialized Views (MVs) are precomputed query results stored as tables. They speed up complex queries at the cost of some staleness.

Why Use Materialized Views?

Problem: Complex aggregate queries are slow

-- This query is slow on millions of rows
SELECT
    customer_id,
    COUNT(*) as order_count,
    SUM(total) as revenue,
    AVG(total) as avg_order_value
FROM orders
GROUP BY customer_id;

Solution: Precompute and cache the results

CREATE MATERIALIZED VIEW customer_stats AS
SELECT
    customer_id,
    COUNT(*) as order_count,
    SUM(total) as revenue,
    AVG(total) as avg_order_value
FROM orders
GROUP BY customer_id;

-- Now this is instant
SELECT * FROM customer_stats WHERE customer_id = 123;

Creating Materialized Views

Basic MV:

CREATE MATERIALIZED VIEW daily_sales AS
SELECT
    DATE(created_at) as date,
    COUNT(*) as orders,
    SUM(total) as revenue
FROM orders
GROUP BY DATE(created_at);

MV with Indexes:

-- Create MV
CREATE MATERIALIZED VIEW user_stats AS
SELECT user_id, COUNT(*) as order_count, SUM(total) as revenue
FROM orders
GROUP BY user_id;

-- Add index for fast lookups
CREATE INDEX idx_user_stats_revenue ON user_stats(revenue DESC);

Auto-Refresh MV:

CREATE MATERIALIZED VIEW product_popularity AS
SELECT
    product_id,
    COUNT(*) as view_count,
    COUNT(DISTINCT user_id) as unique_viewers
FROM product_views
GROUP BY product_id
WITH (
    auto_refresh = true,           -- Enable automatic refresh
    max_cpu_percent = 15,           -- Limit CPU usage to 15%
    threshold_dml_rate = 100,       -- Refresh when 100+ DMLs/sec
    lazy_update = true,             -- Defer during high load
    lazy_catchup_window = '1 hour' -- Max staleness allowed
);

Configuration Options

Option	Type	Default	Description
`auto_refresh`	boolean	false	Enable background auto-refresh
`max_cpu_percent`	float	15.0	Maximum CPU usage (0-100)
`threshold_table_size`	string	’0’	Min base table size to trigger refresh
`threshold_dml_rate`	int	0	DMLs/sec to trigger refresh
`lazy_update`	boolean	false	Defer refresh during high load
`lazy_catchup_window`	string	’1h’	Maximum staleness allowed

Refreshing Materialized Views

Manual Refresh:

-- Blocking refresh (locks the MV)
REFRESH MATERIALIZED VIEW user_stats;

-- Concurrent refresh (non-blocking, users can still query)
REFRESH MATERIALIZED VIEW CONCURRENTLY user_stats;

Automatic Refresh:

-- HeliosDB-Lite automatically refreshes based on:
-- 1. DML rate (inserts/updates/deletes per second)
-- 2. CPU availability (respects max_cpu_percent)
-- 3. Staleness (lazy_catchup_window)

-- Check auto-refresh status
SELECT * FROM pg_mv_staleness();

Monitoring Materialized Views

Check Staleness:

SELECT
    view_name,
    staleness_sec / 60 as staleness_minutes,
    pending_changes,
    status
FROM pg_mv_staleness()
ORDER BY staleness_sec DESC;

Output:

view_name        | staleness_minutes | pending_changes | status
-----------------+-------------------+-----------------+-----------
user_stats       | 12.5              | 1250            | stale
product_stats    | 0.5               | 50              | fresh
daily_sales      | 0.0               | 0               | fresh

Check CPU Usage:

SELECT
    view_name,
    max_cpu_percent,
    current_cpu_percent,
    status
FROM pg_mv_cpu_usage();

Dropping Materialized Views

-- Drop MV
DROP MATERIALIZED VIEW user_stats;

-- Drop if exists
DROP MATERIALIZED VIEW IF EXISTS user_stats;

Real-World Use Cases

1. Analytics Dashboard:

-- Instead of slow live queries
CREATE MATERIALIZED VIEW dashboard_metrics AS
SELECT
    DATE(created_at) as date,
    COUNT(*) as total_orders,
    SUM(total) as revenue,
    COUNT(DISTINCT customer_id) as unique_customers,
    AVG(total) as avg_order_value
FROM orders
GROUP BY DATE(created_at)
WITH (
    auto_refresh = true,
    max_cpu_percent = 10,
    threshold_dml_rate = 50
);

-- Dashboard queries are now instant
SELECT * FROM dashboard_metrics
WHERE date >= CURRENT_DATE - INTERVAL '30 days'
ORDER BY date DESC;

2. Leaderboard:

CREATE MATERIALIZED VIEW user_leaderboard AS
SELECT
    user_id,
    users.username,
    COUNT(*) as games_played,
    SUM(score) as total_score,
    AVG(score) as avg_score,
    MAX(score) as high_score,
    RANK() OVER (ORDER BY SUM(score) DESC) as rank
FROM game_results
JOIN users ON game_results.user_id = users.id
GROUP BY user_id, users.username
WITH (
    auto_refresh = true,
    max_cpu_percent = 20,
    lazy_catchup_window = '5 minutes'
);

-- Leaderboard API is fast
SELECT * FROM user_leaderboard ORDER BY rank LIMIT 100;

3. E-commerce Product Stats:

CREATE MATERIALIZED VIEW product_stats AS
SELECT
    products.product_id,
    products.name,
    COUNT(DISTINCT orders.customer_id) as buyers,
    COUNT(orders.order_id) as times_purchased,
    SUM(orders.quantity) as units_sold,
    SUM(orders.total) as revenue,
    AVG(reviews.rating) as avg_rating,
    COUNT(reviews.review_id) as review_count
FROM products
LEFT JOIN orders ON products.product_id = orders.product_id
LEFT JOIN reviews ON products.product_id = reviews.product_id
GROUP BY products.product_id, products.name
WITH (auto_refresh = true, max_cpu_percent = 15);

-- Product page loads instantly
SELECT * FROM product_stats WHERE product_id = 123;

4. Real-Time Inventory:

CREATE MATERIALIZED VIEW inventory_status AS
SELECT
    warehouse_id,
    product_id,
    SUM(quantity) as on_hand,
    SUM(CASE WHEN status = 'reserved' THEN quantity ELSE 0 END) as reserved,
    SUM(quantity) - SUM(CASE WHEN status = 'reserved' THEN quantity ELSE 0 END) as available
FROM inventory
GROUP BY warehouse_id, product_id
WITH (
    auto_refresh = true,
    max_cpu_percent = 25,
    threshold_dml_rate = 200,  -- High rate for inventory updates
    lazy_catchup_window = '30 seconds'  -- Near real-time
);

Performance Tuning

1. Start Conservative:

-- Begin with safe settings
WITH (
    max_cpu_percent = 10,
    lazy_update = true,
    lazy_catchup_window = '1 hour'
)

-- Gradually increase based on monitoring

2. Index the MV:

-- Create indexes for common queries
CREATE INDEX idx_user_stats_revenue ON user_stats(revenue DESC);
CREATE INDEX idx_daily_sales_date ON daily_sales(date DESC);

3. Partition Large MVs:

-- Instead of one huge MV
CREATE MATERIALIZED VIEW sales_2025 AS
SELECT * FROM orders WHERE YEAR(created_at) = 2025;

CREATE MATERIALIZED VIEW sales_2024 AS
SELECT * FROM orders WHERE YEAR(created_at) = 2024;

4. Choose Right Refresh Strategy:

-- Low-traffic: Aggressive refresh
WITH (max_cpu_percent = 30, threshold_dml_rate = 10)

-- High-traffic: Conservative refresh
WITH (max_cpu_percent = 10, lazy_update = true)

-- Critical: Manual refresh during maintenance window
WITH (auto_refresh = false)
-- Then: REFRESH MATERIALIZED VIEW CONCURRENTLY during off-peak

Next Steps

Congratulations! You’ve learned the fundamentals of HeliosDB-Lite. Here’s what to explore next:

Advanced Features

1. Multi-Tenancy:

-- Row-Level Security for SaaS applications
CREATE TABLE documents (
    id BIGINT PRIMARY KEY,
    tenant_id INT NOT NULL,
    content TEXT
);

CREATE POLICY tenant_isolation ON documents
    USING (tenant_id = current_tenant_id());

2. Full-Text Search:

-- Coming in v2.1
CREATE INDEX idx_fts ON documents USING gin(to_tsvector('english', content));
SELECT * FROM documents WHERE to_tsvector(content) @@ to_tsquery('database & vector');

3. JSONB Operations:

-- Store and query JSON efficiently
CREATE TABLE users (
    id INT PRIMARY KEY,
    metadata JSONB
);

INSERT INTO users VALUES (1, '{"name": "Alice", "age": 30, "tags": ["vip", "premium"]}');

SELECT * FROM users WHERE metadata->>'age' > '28';
SELECT * FROM users WHERE metadata->'tags' ? 'vip';

4. Encryption:

use heliosdb_lite::{EmbeddedDatabase, Config};
use heliosdb_lite::crypto::EncryptionKey;

// Generate encryption key
let key = EncryptionKey::generate()?;

// Create encrypted database
let config = Config {
    encryption_key: Some(key),
    ..Default::default()
};
let db = EmbeddedDatabase::with_config("./secure_db", config)?;

// All data is transparently encrypted

Learning Resources

Documentation:

SQL Reference - Complete SQL syntax
Vector Search Guide - Deep dive into embeddings
Phase 3 User Guide - Advanced v2.0 features
System Catalog - System views reference
API Documentation - Rust API reference

Examples:

# Browse example code
ls examples/
# - quickstart.rs (basics)
# - vector_search_demo.rs (semantic search)
# - encryption_demo.rs (TDE)
# - concurrent_mv_refresh_demo.rs (materialized views)

Configuration:

Integration Guides

ORM Support:

# Read ORM integration guide
cat docs/ORM_SUPPORT.md

Server Mode:

# Start PostgreSQL-compatible server
heliosdb-lite server --port 5432 --data ./mydb

# Connect with any PostgreSQL client
psql postgresql://localhost:5432/mydb

Migration:

# Migrate from PostgreSQL/SQLite
cat docs/MIGRATION.md

Community and Support

Get Help:

GitHub Issues: https://github.com/dimensigon/HeliosDB-Lite/issues
Documentation: https://docs.heliosdb.com/lite
Community Forum: [Coming Soon]

Contributing:

Read CONTRIBUTING.md
Check open issues
Submit pull requests

Stay Updated:

Watch the repository
Follow release notes
Join mailing list [Coming Soon]

Project Ideas

1. Build a RAG System:

Index documentation with embeddings
Semantic search for relevant context
Feed to LLM for accurate answers

2. E-commerce Backend:

Multi-tenant product catalog
Vector search for product recommendations
Materialized views for analytics

3. Git-like Version Control:

Use database branching for workflows
Time-travel for audit trails
Blue-green deployments

4. Real-Time Dashboard:

Materialized views for aggregates
Auto-refresh for near real-time data
Vector clustering for insights

5. Mobile App Backend:

Embedded mode for offline-first
Server mode for sync
Encryption for security

Quick Reference Card

Common Commands:

-- Tables
CREATE TABLE name (col TYPE);
DROP TABLE name;
\d name  -- Describe table (REPL)

-- CRUD
INSERT INTO table VALUES (...);
SELECT * FROM table WHERE condition;
UPDATE table SET col = val WHERE condition;
DELETE FROM table WHERE condition;

-- Indexes
CREATE INDEX idx_name ON table(col);
CREATE INDEX idx_vec ON table USING hnsw (vec vector_cosine_ops);

-- Transactions
BEGIN; ... COMMIT;
BEGIN; ... ROLLBACK;

-- Vector Search
SELECT * FROM docs ORDER BY embedding <=> '[...]' LIMIT 10;

-- Time-Travel
SELECT * FROM table AS OF TIMESTAMP '2025-11-20 00:00:00';

-- Branching
CREATE DATABASE BRANCH dev FROM main AS OF NOW;
MERGE DATABASE BRANCH dev INTO main;
DROP DATABASE BRANCH dev;

-- Materialized Views
CREATE MATERIALIZED VIEW mv AS SELECT ...;
REFRESH MATERIALIZED VIEW mv;
DROP MATERIALIZED VIEW mv;

-- System Views
SELECT * FROM pg_database_branches();
SELECT * FROM pg_mv_staleness();
SELECT * FROM pg_vector_index_stats();

REPL Meta-Commands:

\h           Help
\d           List tables
\d table     Describe table
\dS          List system views
\dS view     Describe system view
\q           Quit

Rust API:

// Create database
let db = EmbeddedDatabase::new("./mydb")?;
let db = EmbeddedDatabase::new_in_memory()?;

// Execute SQL
db.execute("CREATE TABLE ...")?;
let count = db.execute("INSERT ...")?;

// Query
let results = db.query("SELECT * FROM users", &[])?;

// Transaction
let tx = db.begin_transaction()?;
tx.execute("INSERT ...")?;
tx.commit()?;

Troubleshooting

Problem: “Table already exists” error

-- Solution: Use IF NOT EXISTS
CREATE TABLE IF NOT EXISTS users (...);

Problem: Vector dimensions mismatch

-- Solution: Ensure all vectors have same dimension
CREATE TABLE docs (embedding VECTOR(384));  -- All must be 384-dim

Problem: Slow vector search

-- Solution: Create HNSW index
CREATE INDEX idx ON docs USING hnsw (embedding vector_cosine_ops);

Problem: Materialized view is stale

-- Solution: Manual refresh
REFRESH MATERIALIZED VIEW mv;

-- Or check auto-refresh settings
SELECT * FROM pg_mv_staleness();

Problem: Out of memory with large vectors

-- Solution: Use Product Quantization
CREATE INDEX idx ON docs USING hnsw (embedding vector_cosine_ops)
WITH (quantization = 'product', pq_subquantizers = 8);

Glossary

ACID: Atomicity, Consistency, Isolation, Durability (transaction guarantees)
Embedding: Vector representation of data (text, image, etc.)
HNSW: Hierarchical Navigable Small World (vector index algorithm)
MVCC: Multi-Version Concurrency Control (enables time-travel)
MV: Materialized View (cached query result)
PQ: Product Quantization (vector compression technique)
RAG: Retrieval-Augmented Generation (LLM + knowledge base)
Semantic Search: Search by meaning, not keywords
TDE: Transparent Data Encryption

Document Version: 1.0 Last Updated: November 21, 2025 Applies To: HeliosDB-Lite v2.0.0 and later

For questions or feedback, please open an issue on GitHub.

Happy coding with HeliosDB-Lite! 🚀

Getting Started with HeliosDB-Lite v2.0

Getting Started with HeliosDB-Lite v2.0

Table of Contents

Introduction

What is HeliosDB-Lite?

Who Should Use HeliosDB-Lite?

What Makes It Different?

Installation

Prerequisites

Option 1: Install with Cargo (Recommended)

Option 2: Build from Source

Option 3: Add as Library Dependency

Option 4: Docker (Coming Soon)

Verify Installation

Quick Start (5-Minute Tutorial)

Step 1: Start the Interactive REPL

Step 2: Create Your First Table

Step 3: Insert Data

Step 4: Query Data

Step 5: Filter and Aggregate

Step 6: Update and Delete

Step 7: Exit the REPL

Core Concepts

Databases and Tables

Indexes

Transactions

Storage Modes

Basic Operations

Creating Tables

Inserting Data

Querying Data

Updating Data

Deleting Data

Aggregations

Joins

Transactions

Vector Search

Use Cases

Creating a Vector Table

Creating a Vector Index

Inserting Vectors

Searching with Vectors

Real-World Example: RAG System

Vector Performance Tips

Time-Travel Queries

How It Works

AS OF TIMESTAMP

AS OF TRANSACTION

AS OF SCN

Real-World Use Cases

Performance Considerations

Database Branching

Why Use Branching?

Creating Branches

Working with Branches

Merging Branches

Deleting Branches

Real-World Workflows

Branch Management Best Practices

Materialized Views

Why Use Materialized Views?

Creating Materialized Views

Configuration Options

Refreshing Materialized Views

Monitoring Materialized Views

Dropping Materialized Views

Real-World Use Cases

Performance Tuning

Next Steps

Advanced Features

Learning Resources

Integration Guides

Community and Support

Project Ideas

Quick Reference Card

Troubleshooting

Glossary