HeliosDB Edge Deployment Guide

Version: 1.0 Last Updated: 2025-11-30 Status: Complete


Table of Contents

  1. Overview
  2. Edge Architecture
  3. Installation & Setup
  4. Embedded Database
  5. Cloud Synchronization
  6. Edge AI Processing
  7. Data Management
  8. Performance Tuning
  9. Best Practices
  10. Troubleshooting

Overview

HeliosDB Edge provides a unified embedded+cloud database system for IoT, mobile, and edge computing scenarios. Run DuckDB-compatible local databases on devices, with seamless synchronization to HeliosDB Cloud.

Key Use Cases

  • IoT Data Collection - Collect sensor data locally, sync to cloud
  • Mobile Applications - Offline-first mobile apps with cloud backup
  • Remote Monitoring - Connected devices with local processing
  • Edge AI - ML inference on edge with cloud training
  • Disconnected Operations - Work offline, sync when connected
  • Hybrid Deployment - Local + cloud for compliance and performance

Architecture Benefits

Aspect       | Benefit
-------------|-----------------------------------------------------
Performance  | No network latency for local queries
Availability | Works offline, survives network interruptions
Scalability  | Distribute data to millions of edge devices
Compliance   | Keep sensitive data local, sync metadata to cloud
Cost         | Reduce cloud compute for simple edge operations

Edge Architecture

High-Level Architecture

┌─────────────────────────────────────────────────────┐
│ HeliosDB Cloud (Central) │
│ ├─ Primary Data Store │
│ ├─ Cloud Analytics │
│ ├─ Master Catalog │
│ └─ Sync Coordination │
└──────────────┬──────────────────────────────────────┘
┌──────┴──────┬───────────────┐
│ │ │
┌────▼─────┐ ┌────▼──────┐ ┌─────▼─────┐
│ Mobile │ │ IoT │ │ Gateway │
│ Edge │ │ Sensor │ │ Server │
│ Device │ │ Network │ │ (Local) │
└──────────┘ └───────────┘ └───────────┘

Edge Device Stack

Application Layer
HeliosDB Edge SDK
Embedded Database Engine (DuckDB-compatible)
Local Storage (RocksDB / SQLite)
Sync Manager
Network Transport
Cloud Connection
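The offline-first pattern that the Sync Manager layer implements can be sketched in plain Python. This is an illustrative sketch only, using the stdlib sqlite3 module in place of the HeliosDB engine; the table, `synced` flag, and `sync_pending` helper are hypothetical, not part of the SDK:

```python
import sqlite3

# Minimal sketch of the Sync Manager pattern: writes always land in local
# storage first, flagged as pending, and are drained to the cloud later.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sensor_readings (
        id INTEGER PRIMARY KEY,
        device_id TEXT,
        temperature REAL,
        synced INTEGER DEFAULT 0  -- 0 = pending upload, 1 = acknowledged
    )
""")

# A write while offline: committed locally, marked pending
conn.execute(
    "INSERT INTO sensor_readings (device_id, temperature) VALUES (?, ?)",
    ("DEVICE-001", 23.5),
)
conn.commit()

def sync_pending(conn, upload):
    """Push pending rows through the transport layer, marking each on ack."""
    rows = conn.execute(
        "SELECT id, device_id, temperature FROM sensor_readings WHERE synced = 0"
    ).fetchall()
    for row in rows:
        if upload(row):  # stand-in for the network transport; True = cloud ack
            conn.execute(
                "UPDATE sensor_readings SET synced = 1 WHERE id = ?", (row[0],)
            )
    conn.commit()
    return len(rows)

# Once connectivity returns, the pending row is drained
sent = sync_pending(conn, upload=lambda row: True)
```

Because the pending flag lives in the same transactional store as the data, a crash between write and sync loses nothing: the row simply stays pending until the next drain.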

Installation & Setup

Prerequisites

  • Device Storage: 100MB minimum (varies by dataset)
  • Memory: 64MB for embedded database, 256MB+ recommended
  • Network: Periodic connectivity (can work offline)
  • OS: Linux, macOS, Windows, iOS, Android, RTOS

Step 1: Install HeliosDB Edge SDK

Python:

pip install heliosdb-edge

JavaScript/Node.js:

npm install @heliosdb/edge

Rust:

[dependencies]
heliosdb-edge = "7.0"

Go:

go get github.com/heliosdb/edge-go

Step 2: Initialize Edge Database

Python:

from heliosdb_edge import EdgeDatabase

# Create local database
db = EdgeDatabase(
    name="local_data",
    path="/data/heliosdb.db",
    schema_sync_interval=3600,  # Sync schema every hour
    data_sync_interval=300      # Sync data every 5 minutes
)

# Connect to cloud
db.cloud_connect(
    endpoint="heliosdb.cloud.example.com",
    api_key="your-api-key",
    device_id="device-001"
)

JavaScript:

const { EdgeDatabase } = require('@heliosdb/edge');

const db = new EdgeDatabase({
  name: 'local_data',
  path: '/data/heliosdb.db',
  schemaSyncInterval: 3600,
  dataSyncInterval: 300
});

await db.cloudConnect({
  endpoint: 'heliosdb.cloud.example.com',
  apiKey: 'your-api-key',
  deviceId: 'device-001'
});

Rust:

use heliosdb_edge::EdgeDatabase;

let db = EdgeDatabase::new(
    "local_data",
    "/data/heliosdb.db",
    Some(3600), // schema sync interval
    Some(300),  // data sync interval
)?;

db.cloud_connect(
    "heliosdb.cloud.example.com",
    "your-api-key",
    "device-001",
).await?;

Step 3: Configure Sync Settings

-- Configure sync parameters
ALTER EDGE DATABASE SET (
    sync_interval = 300,                 -- Sync every 5 minutes
    batch_size = 1000,                   -- Sync 1000 rows per batch
    compression = 'ZSTD',                -- Use compression
    conflict_resolution = 'CLOUD_WINS',  -- Conflict strategy
    selective_sync = true                -- Only sync needed tables
);

-- Mark tables for cloud sync
ALTER TABLE sensor_data ENABLE CLOUD_SYNC;
ALTER TABLE metadata ENABLE CLOUD_SYNC WITH RETENTION = 30;  -- Keep 30 days

-- Keep tables local only (do not sync)
ALTER TABLE cache DISABLE CLOUD_SYNC;

Embedded Database

Creating Tables

-- Create local tables
CREATE TABLE sensor_readings (
    id SERIAL PRIMARY KEY,
    device_id VARCHAR(50),
    temperature DECIMAL(5,2),
    humidity DECIMAL(5,2),
    timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Create index for fast queries
CREATE INDEX idx_sensor_device_time ON sensor_readings(device_id, timestamp);

Local Queries

-- All queries work offline
SELECT AVG(temperature) AS avg_temp
FROM sensor_readings
WHERE device_id = 'DEVICE-001'
  AND timestamp >= NOW() - INTERVAL '1 hour';

-- Aggregate data locally
SELECT
    DATE(timestamp) AS date,
    COUNT(*) AS reading_count,
    AVG(temperature) AS avg_temp,
    MAX(temperature) AS max_temp,
    MIN(temperature) AS min_temp
FROM sensor_readings
GROUP BY DATE(timestamp)
ORDER BY date DESC;

Inserting Data While Offline

# Insert data while offline
db.execute("""
    INSERT INTO sensor_readings (device_id, temperature, humidity)
    VALUES (?, ?, ?)
""", ['DEVICE-001', 23.5, 45.2])

# Batch insert
for reading in readings_batch:
    db.execute("""
        INSERT INTO sensor_readings (device_id, temperature, humidity)
        VALUES (?, ?, ?)
    """, [reading['device'], reading['temp'], reading['humidity']])

# Commit locally; data will sync when the network is available
db.commit()

Storage Management

# Check database size
size_mb = db.get_size()
print(f"Database size: {size_mb}MB")

# Clean up old data
db.execute("""
    DELETE FROM sensor_readings
    WHERE timestamp < NOW() - INTERVAL '90 days'
""")

# Vacuum to reclaim space
db.vacuum()

# Set a data retention policy
db.set_retention_policy(
    table='sensor_readings',
    retention_days=90,
    cleanup_interval=3600  # Run cleanup hourly
)

Cloud Synchronization

Sync Strategies

1. Full Sync (Default)

# Sync all data to cloud
db.sync(mode='FULL')

# With progress tracking
def on_sync_progress(synced, total):
    print(f"Synced {synced}/{total} records")

db.sync(mode='FULL', progress_callback=on_sync_progress)

2. Selective Sync

# Sync only specific tables
db.sync(
    tables=['sensor_readings', 'device_status'],
    mode='INCREMENTAL'
)

# Sync with conditions
db.sync(
    tables={'sensor_readings': 'temperature > 25'},
    mode='SELECTIVE'
)

3. Background Sync

# Automatic sync in background
db.start_background_sync(
    interval=300,       # Every 5 minutes
    batch_size=1000,    # 1000 rows per batch
    network_aware=True  # Pause if no connectivity
)

# Stop background sync
db.stop_background_sync()

Conflict Resolution

# Configure conflict resolution strategy
db.set_conflict_resolution(
    strategy='CLOUD_WINS',        # Cloud data takes precedence
    timestamp_field='updated_at'  # Use timestamp for resolution
)

# Or use custom resolution
def resolve_conflict(local_row, cloud_row):
    # Keep the version with the higher temperature reading
    if local_row['temperature'] > cloud_row['temperature']:
        return local_row
    return cloud_row

db.set_conflict_resolution(
    strategy='CUSTOM',
    resolver=resolve_conflict
)

Monitoring Sync Status

# Check sync status
status = db.get_sync_status()
print(f"Last sync: {status['last_sync_time']}")
print(f"Synced records: {status['synced_records']}")
print(f"Pending records: {status['pending_records']}")
print(f"Failed records: {status['failed_records']}")

# Get sync errors
errors = db.get_sync_errors()
for error in errors:
    print(f"Table: {error['table']}, Error: {error['message']}")

Bandwidth Management

# Limit sync bandwidth
db.set_sync_parameters(
    max_bandwidth_mbps=1.0,  # 1 Mbps limit
    compression='ZSTD',      # Compress data
    compression_level=6      # Balance speed/ratio
)

# Prioritize tables
db.set_sync_priority(
    high=['critical_data'],
    medium=['sensor_readings'],
    low=['cache_data']
)

# Sync only when on WiFi
db.sync_when(
    network_type='WIFI',
    battery_level_min=20  # At least 20% battery
)

Edge AI Processing

Local ML Inference

from heliosdb_edge import MLModel

# Load model locally
model = MLModel.load('anomaly_detector.onnx')

# Run inference on edge data
results = db.execute("""
    SELECT id, temperature, humidity FROM sensor_readings
    WHERE timestamp > NOW() - INTERVAL '1 hour'
""")

for row in results:
    prediction = model.predict([row['temperature'], row['humidity']])
    if prediction['is_anomaly']:
        # Insert alert locally
        db.execute("""
            INSERT INTO alerts (reading_id, alert_type, severity)
            VALUES (?, ?, ?)
        """, [row['id'], 'ANOMALY', 'HIGH'])

Model Updates from Cloud

# Check for model updates periodically
current_version = '7.0'  # Version currently deployed on this device

def check_model_updates():
    updates = db.cloud.get_model_updates(
        models=['anomaly_detector', 'forecast_model'],
        since_version=current_version
    )
    for model_info in updates:
        if model_info['version'] > current_version:
            # Download and verify
            db.download_model(
                name=model_info['name'],
                version=model_info['version'],
                verify_signature=True
            )

# Run the check hourly
db.schedule_task(check_model_updates, interval=3600)

Training Data Collection

# Collect data for cloud training
db.execute("""
    INSERT INTO training_data
    SELECT id, temperature, humidity, actual_category
    FROM sensor_readings
    WHERE timestamp > NOW() - INTERVAL '7 days'
      AND labeled = true
""")

# Send training data to cloud
db.cloud.upload_training_data(
    table='training_data',
    model_name='anomaly_detector',
    max_rows=10000
)

Data Management

Backup & Recovery

# Create local backup
backup_file = db.backup(
    path='/backups/heliosdb_backup.zip',
    compression=True,
    timestamp=True
)

# Restore from backup
db.restore(backup_file)

# Scheduled backups
db.schedule_backup(
    path='/backups',
    interval=86400,  # Daily
    retention_days=30
)

Offline-First Development

# Enable offline mode
db.set_mode('OFFLINE')

# All operations work locally
db.execute("INSERT INTO events ...")
db.execute("UPDATE metrics ...")
db.commit()

# When the network returns, sync automatically
db.set_mode('AUTO')  # Sync when a connection is available

Data Validation

# Validate data before sync
db.validate_sync_data(
    table='sensor_readings',
    rules=[
        'temperature BETWEEN -50 AND 150',
        'humidity BETWEEN 0 AND 100',
        'timestamp NOT NULL'
    ]
)

# Quarantine invalid records
db.validate_and_quarantine(
    table='sensor_readings',
    quarantine_table='sensor_readings_invalid'
)

Performance Tuning

Optimize Storage

-- Compress data
ALTER TABLE sensor_readings SET (
    compression = 'ZSTD',
    compression_level = 6
);

-- Remove old data
DELETE FROM sensor_readings
WHERE timestamp < NOW() - INTERVAL '90 days';

-- Vacuum
VACUUM ANALYZE sensor_readings;

Query Optimization

-- Use indexes
CREATE INDEX idx_device_time ON sensor_readings(device_id, timestamp);

-- Aggregate locally
SELECT
    device_id,
    DATE(timestamp) AS date,
    AVG(temperature) AS avg_temp
FROM sensor_readings
WHERE timestamp >= NOW() - INTERVAL '30 days'
GROUP BY device_id, DATE(timestamp);

Memory Management

# Configure memory limits
db.set_memory_limit(
    max_memory_mb=256,       # Max 256MB
    buffer_pool_size_mb=128  # Cache size
)

# Monitor memory usage
stats = db.get_memory_stats()
print(f"Memory used: {stats['used_mb']}MB")
print(f"Cache hit rate: {stats['cache_hit_rate']:.1%}")

Best Practices

1. Design for Offline

# GOOD: Design queries that work offline
def get_local_summary():
    return db.execute("""
        SELECT COUNT(*) AS count, AVG(value) AS avg
        FROM local_data
        WHERE timestamp > NOW() - INTERVAL '24 hours'
    """)

# BAD: Depending on cloud data for routine reads
def get_summary():
    return cloud_db.execute("SELECT * FROM large_table")

2. Manage Data Volume

# Implement retention policies
db.set_retention_policy(
    table='sensor_readings',
    retention_days=90,
    aggregate_older_than_days=30,  # Roll raw rows up into aggregates after 30 days
    delete_older_than_days=90      # Delete entirely after 90 days
)

3. Secure Sync

# Enable encryption
db.set_sync_encryption(
    enabled=True,
    algorithm='AES-256-GCM',
    key_derivation='ARGON2ID'
)

# Verify the cloud certificate
db.set_cloud_cert_verification(
    verify=True,
    ca_cert_path='/etc/ssl/certs/ca-bundle.crt'
)

4. Monitor Health

# Set up health monitoring
db.start_health_monitor(
    check_interval=300,
    metrics=['disk_usage', 'memory_usage', 'sync_status']
)

# Define alerts
db.set_alert_threshold(
    metric='disk_usage_pct',
    warning=80,
    critical=95
)

Troubleshooting

Issue 1: Sync Failures

Symptoms:

  • Data not syncing to cloud
  • Pending records stuck

Solution:

# Check network connectivity
if db.is_connected():
    print("Connected to cloud")
else:
    print("No connection, will retry when available")

# Force a sync
db.sync(force=True)

# Check sync errors and retry failed records
errors = db.get_sync_errors()
for error in errors:
    print(f"Error: {error['message']}")
    db.retry_sync_record(error['id'])

Issue 2: Storage Full

Symptoms:

  • Sync stops
  • Queries fail with “disk full” errors

Solution:

# Check disk usage
usage = db.get_disk_usage()
print(f"Used: {usage['used_mb']}MB, Free: {usage['free_mb']}MB")

# Delete old data and reclaim the space
db.execute("""
    DELETE FROM sensor_readings
    WHERE timestamp < NOW() - INTERVAL '60 days'
""")
db.vacuum()

# Or migrate to a larger storage location
db.migrate_storage('/data/larger_location')

Issue 3: High Memory Usage

Symptoms:

  • Application crashes
  • Out of memory errors

Solution:

# Reduce the cache size
db.set_memory_limit(max_memory_mb=128)

# Reduce the sync batch size
db.set_sync_parameters(batch_size=100)  # Smaller batches

# Monitor and debug
stats = db.get_memory_stats()
print(f"Peak memory: {stats['peak_mb']}MB")

Summary

HeliosDB Edge enables:

  • Offline-First Applications - Full database on device, sync when available
  • IoT Data Collection - Collect data locally, aggregate and sync
  • Edge AI - Local ML inference with cloud training
  • Hybrid Deployment - Combine edge and cloud for optimal performance

Start with a simple offline-first setup, then add cloud sync and edge AI as needed.


Related Documentation: