HeliosDB Nano SQLite Layer Architecture - User Guide

A user-friendly explanation of how HeliosDB Nano works under the hood, designed for developers who want to understand the system without diving into the source code.

Overview: What Makes HeliosDB Nano Different?

HeliosDB Nano is built from the ground up in Rust for performance, safety, and advanced features - but it’s designed to look and feel exactly like SQLite from a Python perspective.

The magic: HeliosDB Nano implements the Python DB-API 2.0 interface (same as SQLite) while using a completely different engine underneath.

┌─────────────────────────────────────────────────────────────┐
│                    Your Python Application                  │
│                  import heliosdb_sqlite as sqlite3          │
└──────────────────────────────┬──────────────────────────────┘
                               │
                               │ Python DB-API 2.0
                               │ (Same interface as SQLite)
                               │
┌──────────────────────────────▼──────────────────────────────┐
│              HeliosDB Nano SQLite Compatibility Layer       │
│                         (Python bindings)                    │
└──────────────────────────────┬──────────────────────────────┘
                               │
                               │ PyO3 (Python ↔ Rust bridge)
                               │
┌──────────────────────────────▼──────────────────────────────┐
│                   HeliosDB Nano Core Engine                 │
│                      (Written in Rust)                      │
│                                                             │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐     │
│  │ SQL Parser & │  │ Query Exec   │  │  Storage     │     │
│  │   Planner    │  │   Engine     │  │  Engine      │     │
│  └──────────────┘  └──────────────┘  └──────────────┘     │
│                                                             │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐     │
│  │ Concurrency  │  │ Vector Search│  │ Time-Travel  │     │
│  │   Control    │  │   (HNSW)     │  │   (MVCC)     │     │
│  └──────────────┘  └──────────────┘  └──────────────┘     │
└──────────────────────────────────────────────────────────┘

How HeliosDB Nano SQLite Layer Works

When You Import

import heliosdb_sqlite as sqlite3

What happens:

Python loads the HeliosDB Nano module (compiled Rust library with Python bindings)
The module registers itself as a SQLite-compatible database driver
All standard DB-API 2.0 methods (connect, execute, etc.) become available
No configuration required - it just works

Behind the scenes:

The Python module is a thin wrapper around the Rust core
PyO3 (Python-Rust bridge) handles type conversions automatically
Function calls are fast (near-native performance)

When You Connect to a Database

conn = sqlite3.connect('myapp.db')

What happens:

File detection:
- HeliosDB Nano checks if the file exists
- If it’s an existing SQLite database, it can read it directly
- If it’s a new file, it creates a HeliosDB Nano database
Engine initialization:
- Opens or creates the database file
- Initializes the storage engine
- Sets up transaction management
- Prepares connection handle
Connection object returned:
- You get a Python connection object
- It implements all standard SQLite connection methods
- Plus optional HeliosDB Nano advanced features

File format:

HeliosDB Nano uses a compatible format for basic data
SQLite can read HeliosDB Nano databases (for basic tables)
HeliosDB Nano can read SQLite databases
Advanced features (vectors, branches) stored in extended sections

When You Execute a Query

cursor = conn.cursor()
cursor.execute('SELECT * FROM users WHERE age > ?', (18,))
results = cursor.fetchall()

What happens:

1. Python → Rust Bridge
   ┌─────────────────────────────────────────────────────┐
   │ cursor.execute(sql, params)                         │
   │   ↓                                                 │
   │ PyO3 converts Python types to Rust types            │
   │   ↓                                                 │
   │ Rust receives: &str (SQL) + Vec<Value> (params)    │
   └─────────────────────────────────────────────────────┘

2. SQL Parsing
   ┌─────────────────────────────────────────────────────┐
   │ Parse SQL into Abstract Syntax Tree (AST)           │
   │ Validate syntax                                     │
   │ Extract table references, columns, conditions       │
   └─────────────────────────────────────────────────────┘

3. Query Planning
   ┌─────────────────────────────────────────────────────┐
   │ Analyze query and data statistics                   │
   │ Choose optimal execution strategy                   │
   │ Determine index usage                               │
   │ Optimize joins and filters                          │
   └─────────────────────────────────────────────────────┘

4. Execution
   ┌─────────────────────────────────────────────────────┐
   │ Access storage engine for table data                │
   │ Apply filters (WHERE age > 18)                      │
   │ Use indexes if available                            │
   │ Collect matching rows                               │
   └─────────────────────────────────────────────────────┘

5. Result Conversion
   ┌─────────────────────────────────────────────────────┐
   │ Convert Rust data structures to Python types        │
   │ List of tuples/rows returned to Python              │
   │ Cursor stores results for fetchall()/fetchone()     │
   └─────────────────────────────────────────────────────┘

Performance optimization:

Prepared statement caching
Query plan reuse
Zero-copy operations where possible
Parallel execution for complex queries

The Three Modes Explained

HeliosDB Nano can run in three different modes, giving you flexibility in how you deploy and use it.

Mode 1: Embedded Mode (Default)

How it works:

conn = sqlite3.connect('app.db')  # Default mode

┌──────────────────────────────────────────────┐
│          Your Python Application             │
│                                              │
│   ┌────────────────────────────────────┐    │
│   │   HeliosDB Nano Engine             │    │
│   │   (runs in same process)           │    │
│   └────────────────────────────────────┘    │
│                    ↓                         │
│              Database File                   │
│              (app.db)                        │
└──────────────────────────────────────────────┘

Characteristics:

Database engine runs inside your application process
Direct memory access (fastest)
No network overhead
Similar to how SQLite works
Perfect for single-application deployments

Use when:

Single application accessing database
Maximum performance needed
Embedded/desktop applications
Simple deployments

Mode 2: Server Mode (PostgreSQL-Compatible)

How it works:

server = sqlite3.start_server('app.db', port=5432)

┌─────────────────┐         ┌─────────────────┐
│   Application 1 │         │  Application 2  │
│   (Python)      │         │  (Node.js)      │
└────────┬────────┘         └────────┬────────┘
         │                           │
         │ PostgreSQL Protocol       │
         │                           │
         └────────────┬──────────────┘
                      │
         ┌────────────▼─────────────┐
         │  HeliosDB Nano Server    │
         │  (Network daemon)        │
         │                          │
         │    ┌──────────────┐      │
         │    │ Query Router │      │
         │    └──────┬───────┘      │
         │           │              │
         │    ┌──────▼────────┐     │
         │    │ Core Engine   │     │
         │    └──────┬────────┘     │
         └───────────┼──────────────┘
                     │
              ┌──────▼──────┐
              │ Database    │
              │ (app.db)    │
              └─────────────┘

Characteristics:

Database runs as separate server process
Multiple clients can connect
PostgreSQL wire protocol (use psycopg2, pg, etc.)
Network overhead, but enables multi-client access
Horizontal scaling possible

Use when:

Multiple applications need database access
Microservices architecture
Remote database access needed
Team development (shared database)

Mode 3: Hybrid Mode (Best of Both Worlds)

How it works:

conn = sqlite3.connect('app.db', heliosdb_mode='hybrid')

┌─────────────────────────────────────────────┐
│         Your Python Application             │
│                                             │
│  ┌──────────────────────────────────────┐   │
│  │  Direct Embedded Access (Fast)       │   │
│  │  ↓                                   │   │
│  │  HeliosDB Nano Core Engine           │   │
│  └──────────────┬───────────────────────┘   │
│                 │                           │
└─────────────────┼───────────────────────────┘
                  │
         ┌────────▼─────────┐
         │                  │
         │  Database File   │
         │                  │
         └────────▲─────────┘
                  │
┌─────────────────┼───────────────────────────┐
│  Server Thread  │                           │
│  ↓              │                           │
│  PostgreSQL Protocol Server                │
│  (Accepts remote connections)              │
└────────────────────────────────────────────┘
         ↑
         │ PostgreSQL Protocol
         │
┌────────┴─────────┐
│  Remote Client   │
│  (Dashboard,     │
│   Monitoring,    │
│   Admin tools)   │
└──────────────────┘

Characteristics:

Your application gets direct embedded access (fast!)
Server thread runs in background for remote access
Same database, different access methods
No performance penalty for local access
Enables remote monitoring/management

Use when:

Main application needs fast local access
Also need remote admin/monitoring
Want flexibility without choosing
Development/production parity

How Advanced Features Work

Vector Search

Architecture:

cursor.execute('''
    SELECT id, title,
           vector_distance(embedding, ?, 'cosine') as similarity
    FROM articles
    ORDER BY similarity
    LIMIT 10
''', (query_embedding,))

What happens:

Vector column stored as special data type (VECTOR(n))
HNSW (Hierarchical Navigable Small World) index created automatically
Query planner recognizes vector_distance function
Uses specialized vector search algorithm (not full table scan)
Results ranked by similarity

Storage:

Regular Table Storage:
┌──────┬────────┬──────────────────────────────────┐
│  id  │ title  │  content                         │
├──────┼────────┼──────────────────────────────────┤
│  1   │ "AI"   │ "Machine learning introduction"  │
└──────┴────────┴──────────────────────────────────┘

Vector Extension Storage:
┌──────┬──────────────────────────────────────────┐
│  id  │  embedding (384-dim vector)              │
├──────┼──────────────────────────────────────────┤
│  1   │  [0.23, 0.45, 0.12, ..., 0.67]          │
└──────┴──────────────────────────────────────────┘
          ↓
    HNSW Index (graph structure for fast search)

Performance:

Approximate nearest neighbor search: O(log n)
Much faster than brute force: O(n)
Trade-off: slight accuracy for massive speed

Time-Travel Queries

Architecture:

cursor.execute('''
    SELECT * FROM users
    AS OF TIMESTAMP '2024-01-01 10:00:00'
''')

How it works:

MVCC (Multi-Version Concurrency Control)

Current Data View:
┌──────┬─────────┬──────────────────────┐
│  id  │  name   │  version/timestamp   │
├──────┼─────────┼──────────────────────┤
│  1   │ "Alice" │  2024-06-01 15:00    │
└──────┴─────────┴──────────────────────┘

Historical Versions (stored separately):
┌──────┬─────────┬──────────────────────┐
│  id  │  name   │  version/timestamp   │
├──────┼─────────┼──────────────────────┤
│  1   │ "A"     │  2024-01-01 09:00    │
│  1   │ "Al"    │  2024-01-01 10:30    │
│  1   │ "Alice" │  2024-06-01 15:00    │ ← Current
└──────┴─────────┴──────────────────────┘

Query with AS OF TIMESTAMP:
- Looks up version <= specified timestamp
- Returns historical state
- No impact on current data

Storage overhead:

Only changed values stored (delta compression)
Configurable retention period
Automatic cleanup of old versions

Database Branching

Architecture:

branch = conn.create_branch('experiment')
branch_conn = sqlite3.connect('app.db', branch='experiment')

How it works:

Main Database:
┌────────────────────────────────────┐
│  Main branch (active)              │
│                                    │
│  ┌──────────────────────────────┐  │
│  │ Table: users                 │  │
│  │ 1000 rows                    │  │
│  └──────────────────────────────┘  │
└────────────────────────────────────┘
         │
         │ create_branch('experiment')
         │
         ▼
┌────────────────────────────────────┐
│  Experiment branch (copy-on-write) │
│                                    │
│  ┌──────────────────────────────┐  │
│  │ Only stores CHANGES           │  │
│  │ from main branch             │  │
│  │                              │  │
│  │ Modified rows: 50            │  │
│  │ Deleted rows: 10             │  │
│  │ New rows: 5                  │  │
│  └──────────────────────────────┘  │
└────────────────────────────────────┘

Reading from branch:
1. Check if row modified in branch → use branch version
2. If not modified → use main branch version
3. Apply deletes from branch
4. Add new rows from branch

Result: Full database view without duplicating unchanged data

Benefits:

Instant branch creation (no copying)
Minimal storage overhead
Safe experimentation
Easy merge or discard

Data Storage Explanation

File Structure

myapp.db (HeliosDB Nano database file)
│
├─ Header (metadata)
│  ├─ Format version
│  ├─ Page size
│  ├─ Schema version
│  └─ Feature flags
│
├─ Schema section
│  ├─ Table definitions
│  ├─ Index definitions
│  ├─ View definitions
│  └─ Constraints
│
├─ Data pages (table data)
│  ├─ B-tree structure (like SQLite)
│  ├─ Row storage
│  └─ Index structures
│
├─ Advanced features section (optional)
│  ├─ Vector indexes (HNSW graphs)
│  ├─ Time-travel versions (MVCC)
│  ├─ Branch metadata
│  └─ Encryption metadata
│
└─ Transaction log (WAL)
   ├─ Uncommitted changes
   └─ Recovery information

Compatibility:

Basic sections compatible with SQLite
Advanced sections ignored by SQLite (safe)
HeliosDB Nano reads both formats seamlessly

Performance Characteristics

Memory Usage

Per Connection:
- Base overhead: ~100 KB
- Query cache: ~10 MB (configurable)
- Connection buffers: ~1 MB

Per Table:
- Schema metadata: ~1-10 KB
- Index overhead: Varies (5-20% of table size)

Vector Search:
- HNSW index: ~20-40% of vector data size
- In-memory: Optional, speeds up queries 10-100x

Time-Travel:
- Historical versions: Configurable retention
- Delta compression reduces overhead to ~5-15%

Query Performance

Operation             | SQLite      | HeliosDB Nano
----------------------|-------------|---------------
Single row read       | 10 μs       | 12 μs         (comparable)
Indexed search        | 100 μs      | 105 μs        (comparable)
Full table scan       | 10 ms       | 11 ms         (comparable)
Single write          | 50 μs       | 75 μs         (slight overhead)
Concurrent reads      | Excellent   | Excellent     (same)
Concurrent writes     | Blocked     | Parallel!     (major win)
Vector search (10k)   | N/A         | 2 ms          (exclusive)
Time-travel query     | N/A         | 15 ms         (exclusive)

Key takeaway:

Similar performance for standard operations
Much better with concurrent writes
Advanced features add new capabilities (not slower alternatives)

Concurrency Model

How HeliosDB Nano Handles Concurrent Access

SQLite approach (locks):

Thread 1: BEGIN → Write → [Lock database] → COMMIT → [Unlock]
Thread 2:                    [WAIT...]                  → BEGIN → Write
Thread 3:                    [WAIT...]                           → BEGIN

HeliosDB Nano approach (MVCC):

Thread 1: BEGIN → Write to version 1 → COMMIT
Thread 2: BEGIN → Write to version 2 → COMMIT  (simultaneous!)
Thread 3: BEGIN → Read version 1    → done     (doesn't block writes)

Conflict resolution:
- Non-conflicting writes: Both succeed
- Conflicting writes: Last commit wins (configurable)
- Readers never block writers
- Writers never block readers

Implementation:

Transaction Log:

Time ─────────────────────────────▶

T1: [BEGIN] ────── [Write row A] ────── [COMMIT v1]
T2:         [BEGIN] ────── [Write row B] ────── [COMMIT v2]
T3:                 [BEGIN] [Read] [COMMIT]

All transactions see consistent snapshot
No locks needed!

What Happens When You Import

Detailed Import Process

import heliosdb_sqlite as sqlite3

Step-by-step:

Python loads.so/.dll (compiled Rust library)

HeliosDB Nano module ────▶ [OS loads shared library]

Module initialization (happens once)

// Rust code (simplified)
#[pymodule]
fn heliosdb_sqlite(py: Python, m: &PyModule) -> PyResult<()> {
    m.add_class::<Connection>()?;
    m.add_class::<Cursor>()?;
    m.add_function(wrap_pyfunction!(connect, m)?)?;
    //... register all DB-API 2.0 functions
    Ok(())
}

Python receives module with all methods

# Now available:
sqlite3.connect(...)
sqlite3.Connection
sqlite3.Cursor
sqlite3.Row
# etc. - full DB-API 2.0 interface

No configuration needed - ready to use!

Threading and Async Support

Thread Safety

HeliosDB Nano is thread-safe by default (unlike SQLite):

# This works safely in HeliosDB Nano
conn = sqlite3.connect('app.db', check_same_thread=False)

def worker(thread_id):
    cursor = conn.cursor()  # Safe from multiple threads
    cursor.execute('INSERT INTO logs VALUES (?, ?)', (thread_id, 'message'))
    conn.commit()

threads = [Thread(target=worker, args=(i,)) for i in range(100)]
# All threads can use same connection safely!

How it’s safe:

Rust’s ownership model prevents data races
Internal synchronization (lock-free where possible)
MVCC eliminates most contention

Async/Await Support

import asyncio
import heliosdb_sqlite as sqlite3

async def async_query():
    loop = asyncio.get_event_loop()

    # Run blocking operation in thread pool
    def blocking_query():
        conn = sqlite3.connect('app.db')
        cursor = conn.cursor()
        cursor.execute('SELECT * FROM users')
        return cursor.fetchall()

    results = await loop.run_in_executor(None, blocking_query)
    return results

# Or use async connection pool
from heliosdb_sqlite.asyncio import AsyncConnectionPool

pool = AsyncConnectionPool('app.db')
async with pool.connection() as conn:
    cursor = conn.cursor()
    results = await cursor.execute('SELECT * FROM users')

Summary: Why This Architecture Matters

For Drop-In Compatibility:

Implements exact same Python interface as SQLite
You change one import line, everything works
Existing code, queries, databases all compatible

For Performance:

Rust core = fast and safe
MVCC = no write locks
Parallel execution where possible
Optimized for modern workloads

For Advanced Features:

Vector search: Built-in HNSW indexes
Time-travel: MVCC enables historical queries
Branching: Copy-on-write for safe experimentation
Encryption: Transparent at storage layer

For Flexibility:

Three modes: embedded, server, hybrid
Start simple, scale when needed
No architectural changes required

What’s Next?

Now that you understand how HeliosDB Nano works:

Try it out: HELIOSDB_SQLITE_DROP_IN_GUIDE.md
Migrate your app: HELIOSDB_SQLITE_MIGRATION_PATTERNS.md
Explore advanced features: HELIOSDB_SQLITE_ADVANCED_FEATURES.md
Troubleshoot issues: HELIOSDB_SQLITE_TROUBLESHOOTING.md
Read FAQ: HELIOSDB_SQLITE_FAQ.md

The architecture is designed to be invisible when you don’t need it, and powerful when you do.

Happy building with HeliosDB Nano!