Skip to content

HeliosDB-Lite AI Agent & Automation Technical Guide

HeliosDB-Lite AI Agent & Automation Technical Guide

Overview

This guide is designed for AI agents, automation systems, and programmatic access to HeliosDB-Lite. It provides structured information optimized for parsing and integration by intelligent systems.


1. System Specifications

Database Engine

  • Type: PostgreSQL-compatible embedded database
  • Storage: RocksDB with LSM-tree architecture
  • Transaction Model: MVCC with snapshot isolation
  • Concurrency: ACID transactions with row-level locking
  • Query Language: SQL (95%+ PostgreSQL compatibility)

Supported Deployment Modes

ModeLocationConcurrencyUse Case
embeddedIn-processSingle processDesktop apps, services
serverNetworkMultiple clientsWeb apps, APIs
in_memoryRAM onlySingle processTests, analytics
hybridDisk + cacheMultiple clientsProduction systems

Resource Requirements

ComponentMinimumRecommended
Memory50 MB512 MB - 4 GB
Disk10 MB100 MB - 10 GB
CPU1 core2+ cores

2. API Integration Points

Client Library APIs

Rust API (Native)

use heliosdb_lite::EmbeddedDatabase;
// Core operations
db.execute(sql: &str, params: &[Value]) -> Result<()>;
db.query(sql: &str, params: &[Value]) -> Result<Vec<Row>>;
db.begin_transaction() -> Result<Transaction>;
// Vector operations
db.create_vector_index(table: &str, column: &str, config: IndexConfig) -> Result<()>;
db.vector_search(store: &str, query: Vec<f32>, top_k: usize) -> Result<Vec<SearchResult>>;
// Branching
db.create_branch(name: &str, from: Option<&str>) -> Result<()>;
db.merge_branch(source: &str, target: &str) -> Result<()>;
// Time-travel
db.query_as_of_timestamp(sql: &str, params: &[Value], timestamp: &str) -> Result<Vec<Row>>;

PostgreSQL Wire Protocol

Supported for server mode (port 5432 by default):

postgresql://[user[:password]@][host][:port]/[database]
Example:
postgresql://localhost:5432/mydb
postgresql://user:pass@db.example.com:5432/production

Compatible clients:

  • psql (PostgreSQL CLI)
  • Python: psycopg2, asyncpg, sqlalchemy
  • JavaScript/Node.js: pg, node-postgres
  • Rust: sqlx, tokio-postgres, diesel
  • Java: jdbc
  • Go: database/sql, pgx
  • .NET: Npgsql

SDK Integrations

Official SDKs Available:

LanguageStatusLocation
Python✅ v3.0sdks/python/
TypeScript✅ v3.0sdks/typescript/
Go✅ v3.0sdks/go/
Rust✅ v3.0sdks/rust/

3. Query Specification

SQL Dialect

Supported Features:

  • DML: SELECT, INSERT, UPDATE, DELETE
  • DDL: CREATE/DROP/ALTER TABLE, INDEX, VIEW
  • Transactions: BEGIN, COMMIT, ROLLBACK
  • Temporal: AS OF TIMESTAMP/TRANSACTION/SCN
  • Vector operations: Vector types, HNSW indexes
  • Functions: Aggregate, scalar, window functions
  • Joins: INNER, LEFT, RIGHT, FULL OUTER, CROSS

Parameterized Query Format:

SELECT * FROM table WHERE column = $1 AND other = $2;
-- Parameter indices: $1, $2, $3, ... (1-indexed)

Data Types

TypeSizeExample
BOOLEAN1 bytetrue, false
INT22 bytes-32768 to 32767
INT44 bytes-2147483648 to 2147483647
INT88 bytes-2^63 to 2^63-1
FLOAT44 bytesSingle precision
FLOAT88 bytesDouble precision
DECIMAL(p,s)Variable123.45
TEXTVariable’string’
VARCHAR(n)Variable’string’
BYTEAVariableBinary data
TIMESTAMP8 bytes2025-01-15 12:00:00
DATE4 bytes2025-01-15
JSONVariable{“key”: “value”}
JSONBVariable{“key”: “value”}
VECTOR(n)Variable[0.1, 0.2, …]
UUID16 bytes550e8400-e29b-41d4-a716-446655440000
ARRAY[T]VariableARRAY[1,2,3]

Vector Operations

Vector Type Definition:

-- Define columns with vector type
column_name VECTOR(dimensions)
-- Example: 1536-dimensional embedding (OpenAI)
embedding VECTOR(1536)

Distance Operators:

OperatorMetricDistance
<=>Cosine0 to 2
<#>Euclidean≥ 0
<+>Manhattan≥ 0
<~>Dot Product-∞ to ∞

Index Creation:

CREATE INDEX idx_name ON table USING hnsw (vector_column distance_ops)
WITH (
m = 16, -- HNSW connection count
ef_construction = 200, -- Build quality
quantization = 'pq', -- Optional compression
pq_subquantizers = 8 -- PQ settings
);

4. System Views & Monitoring

Information Schema Views

Available system views for introspection:

-- List all tables
SELECT * FROM information_schema.tables WHERE table_schema = 'public';
-- List all columns
SELECT * FROM information_schema.columns WHERE table_name = 'table_name';
-- List all indexes
SELECT * FROM pg_indexes;
-- List all views
SELECT * FROM information_schema.views WHERE table_schema = 'public';
-- Database statistics
SELECT * FROM pg_stat_user_tables;
-- Index statistics
SELECT * FROM pg_stat_user_indexes;
-- Vector index statistics
SELECT * FROM pg_vector_index_stats;
-- Branch information
SELECT * FROM pg_database_branches();
-- Materialized view staleness
SELECT * FROM pg_mv_staleness();
-- Current database size
SELECT pg_database_size(current_database());

5. Configuration Parameters

Configuration File (heliosdb.toml)

[storage]
path = "./heliosdb-data" # Data directory
cache_size = 536870912 # 512 MB
compression = "zstd" # zstd, lz4, none
in_memory = false # Disable persistence
[server]
listen_addr = "0.0.0.0" # Listen address
port = 5432 # PostgreSQL port
max_connections = 100 # Connection pool size
connection_timeout_sec = 30 # Connection timeout
[performance]
enable_simd = true # SIMD acceleration
worker_threads = 4 # Query worker threads
parallel_query = true # Parallel query execution
buffer_pool_size = 268435456 # 256 MB
[encryption]
enabled = false # Encryption at rest
algorithm = "aes256-gcm" # AES-256-GCM
key_source = { environment = "HELIOSDB_KEY" }
[vector]
enabled = true # Vector search support
pq_enabled = true # Product Quantization
pq_auto_compress = true # Auto-compress vectors
pq_training_size = 10000 # PQ training sample size
[temporal]
time_travel_enabled = true # Time-travel queries
snapshot_retention_days = 30 # Keep snapshots for N days

Environment Variables

Terminal window
# Server mode
HELIOSDB_PORT=5432
HELIOSDB_LISTEN_ADDR=0.0.0.0
HELIOSDB_DATA_DIR=./data
# Encryption
HELIOSDB_ENCRYPTION_KEY=your-32-byte-key
# Performance
HELIOSDB_WORKER_THREADS=4
HELIOSDB_CACHE_SIZE=536870912
# Logging
HELIOSDB_LOG_LEVEL=info # trace, debug, info, warn, error
RUST_LOG=heliosdb_lite=debug

6. Error Handling & Status Codes

PostgreSQL Error Codes

Standard PostgreSQL SQLSTATE codes are used:

CodeClassMeaning
00000SUCCESSCommand successful
01000WARNINGWarning (non-fatal)
08000CONNECTIONConnection failure
42P01UNDEFINED_TABLETable does not exist
42P09UNDEFINED_OBJECTObject does not exist
23505UNIQUE_VIOLATIONUnique constraint violation
23503FOREIGN_KEYForeign key constraint violation
25001IN_FAILED_SQL_TRANSACTIONTransaction in failed state

Structured Error Response

{
"error": {
"code": "42P01",
"message": "relation \"table_name\" does not exist",
"detail": "Table 'table_name' was not found in schema 'public'",
"severity": "ERROR",
"sqlstate": "42P01"
}
}

7. Performance Characteristics

Query Performance

OperationThroughputLatency
Single row lookup100K+ QPS<1ms
Full table scan500K-1M rows/secVariable
Vector search (k=10)1K-5K QPS<100ms
Aggregation500K rows/secVariable
Join100K-500K rows/secVariable

Compression Ratios

Data TypeMethodRatio
NumericALP2-10x
StringFSST2-5x
Vector (768-dim)PQ (u8)384x
Vector (1536-dim)PQ (u8)768x

Memory Usage

Base system: ~50 MB
Per connection: ~1-2 MB
Cache (configurable): 50 MB to 4 GB
Index overhead: ~10-20% of data size

8. Integration Patterns

Pattern 1: Embedded Database

Use Case: Local application storage

// Initialization
let db = EmbeddedDatabase::new("./app.helio")?;
// Execute query
let result = db.query("SELECT * FROM users", &[])?;
// Proper cleanup
drop(db); // Implicit SYNC

Lifecycle:

  1. Open database file
  2. Load schema and indexes
  3. Execute queries
  4. Sync data on drop

Concurrency: Single process with multi-threaded internal handling

Pattern 2: Server Mode

Use Case: Network accessible database

Terminal window
# Start server
heliosdb-lite start --port 5432 --data ./mydb
# Client connection (any language)
Connection -> PostgreSQL wire protocol -> Server -> Database

Lifecycle:

  1. Start server process
  2. Accept client connections
  3. Maintain connection pool
  4. Replicate transactions to disk

Concurrency: Multiple client connections (configurable max)

Pattern 3: Vector Search Integration

Use Case: Semantic search in applications

-- Storage
CREATE TABLE documents (
id SERIAL,
content TEXT,
embedding VECTOR(1536)
);
CREATE INDEX idx_embedding ON documents
USING hnsw (embedding vector_cosine_ops);
-- Insert with embedding
INSERT INTO documents VALUES
(1, 'text', array[0.1, 0.2, ...]);
-- Search
SELECT id, content,
embedding <=> ARRAY[0.15, 0.25, ...] AS distance
FROM documents
ORDER BY distance
LIMIT 10;

Pattern 4: Time-Travel Queries

Use Case: Historical data analysis

-- Current data
SELECT * FROM table;
-- Historical data
SELECT * FROM table AS OF TIMESTAMP '2025-01-01 00:00:00';
-- Audit trail
SELECT id, action, timestamp
FROM audit_log
WHERE occurred_at >= '2025-01-01';

9. Compatibility Matrix

PostgreSQL Feature Support

FeatureSupportNotes
SQL (DML/DDL)95%+Full support for common operations
TransactionsMVCC, snapshot isolation
IndexesB-tree, HNSW
Extensions⚠️No PostGIS, pgvector
Replication⚠️Experimental
Procedures⚠️Limited
TriggersNot supported
Foreign DataNot supported

ORM Compatibility

FrameworkLanguageStatus
SQLAlchemyPython✅ Full
DieselRust✅ Full
PrismaTypeScript✅ Full
SQLxRust✅ Full
Entity Framework.NET✅ Full

10. Operational Parameters

Health Checks

-- Basic connectivity
SELECT 1;
-- Database availability
SELECT pg_database_size(current_database());
-- Index health
SELECT * FROM pg_stat_user_indexes;
-- Transaction activity
SELECT * FROM pg_stat_activity;

Maintenance Operations

-- VACUUM (cleanup dead tuples)
VACUUM ANALYZE;
-- REINDEX (rebuild indexes)
REINDEX DATABASE;
-- Backup
BACKUP TO 's3://bucket/backup.sql';
-- Restore
RESTORE FROM 's3://bucket/backup.sql';

Diagnostic Queries

-- Slow queries (>1 second)
SELECT query, mean_time FROM pg_stat_statements
WHERE mean_time > 1000
ORDER BY mean_time DESC;
-- Table bloat
SELECT schemaname, tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) as size
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;
-- Connection status
SELECT * FROM pg_stat_activity;

11. SDK Usage Patterns

Python

import heliosdb_client
# Connect
db = heliosdb_client.Client(
base_url="http://localhost:8080",
api_key="key"
)
# Query
result = await db.query("SELECT * FROM users WHERE id = $1", [1])
# Vector search
results = await db.vector_search("documents", "hello world", top_k=5)
# Memory
memory = db.memory("session_id")
await memory.add("user", "Hello")

TypeScript

import { HeliosDBClient } from 'heliosdb-client';
const db = new HeliosDBClient({
baseUrl: 'http://localhost:8080',
apiKey: 'key'
});
const result = await db.query(
'SELECT * FROM users WHERE id = $1',
[1]
);
const vectors = await db.vectorSearch(
'documents',
'hello world',
{ topK: 5 }
);

Go

import "github.com/heliosdb/go-client"
db := heliosdb.NewClient(
"http://localhost:8080",
"api-key",
)
result, err := db.Query(ctx, "SELECT * FROM users", []interface{}{})
vectors, err := db.VectorSearch(ctx, "documents", "hello", &heliosdb.SearchOptions{
TopK: 5,
})

12. Schema Examples

Example 1: E-commerce

CREATE TABLE customers (
id SERIAL PRIMARY KEY,
email TEXT UNIQUE,
name TEXT,
created_at TIMESTAMP DEFAULT NOW()
);
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name TEXT,
description TEXT,
price DECIMAL(10,2),
embedding VECTOR(1536)
);
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer_id INT REFERENCES customers,
total DECIMAL(10,2),
status TEXT,
created_at TIMESTAMP DEFAULT NOW()
);
-- Indexes
CREATE INDEX idx_customer_email ON customers(email);
CREATE INDEX idx_order_customer ON orders(customer_id);
CREATE INDEX idx_product_embedding ON products
USING hnsw (embedding vector_cosine_ops);
-- Materialized view
CREATE MATERIALIZED VIEW customer_stats AS
SELECT customer_id, COUNT(*) as order_count, SUM(total) as lifetime_value
FROM orders GROUP BY customer_id;

Example 2: Time-Series Analytics

CREATE TABLE metrics (
id SERIAL PRIMARY KEY,
sensor_id INT,
metric_name TEXT,
value FLOAT8,
timestamp TIMESTAMP DEFAULT NOW(),
metadata JSONB
);
CREATE INDEX idx_sensor_time ON metrics(sensor_id, timestamp DESC);
CREATE MATERIALIZED VIEW hourly_avg AS
SELECT
sensor_id,
metric_name,
DATE_TRUNC('hour', timestamp) as hour,
AVG(value) as avg_value,
MAX(value) as max_value,
MIN(value) as min_value
FROM metrics
GROUP BY sensor_id, metric_name, DATE_TRUNC('hour', timestamp);
ALTER MATERIALIZED VIEW hourly_avg SET (
auto_refresh = true,
staleness_threshold_sec = 300
);

13. Troubleshooting Guide for Automation

Connectivity Issues

Problem: Cannot connect to database

Diagnostic Steps:

-- 1. Test local connectivity
SELECT 1;
-- 2. Check server status
SELECT version();
-- 3. Check active connections
SELECT count(*) FROM pg_stat_activity;
-- 4. Check configuration
SHOW listen_addresses;
SHOW port;

Performance Issues

Problem: Slow queries

Analysis:

-- Enable query logging
SET log_min_duration_statement = 1000;
-- Analyze query plan
EXPLAIN ANALYZE SELECT ...;
-- Check statistics
ANALYZE table_name;
-- Check missing indexes
SELECT schemaname, tablename, indexname
FROM pg_indexes
WHERE schemaname NOT IN ('pg_catalog', 'information_schema');

Resource Issues

Problem: Out of memory or disk space

Diagnosis:

-- Database size
SELECT pg_database_size(current_database());
-- Table sizes
SELECT tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename))
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;
-- Clear old data
DELETE FROM table_name WHERE created_at < '2024-01-01';
VACUUM ANALYZE;

References