HeliosDB Edge Deployment Guide

Version: 1.0 Last Updated: 2025-11-30 Status: Complete


Table of Contents

  1. Overview
  2. Edge Architecture
  3. Installation & Setup
  4. Embedded Database
  5. Cloud Synchronization
  6. Edge AI Processing
  7. Data Management
  8. Performance Tuning
  9. Best Practices
  10. Troubleshooting

Overview

HeliosDB Edge provides a unified embedded+cloud database system for IoT, mobile, and edge computing scenarios. Run DuckDB-compatible local databases on devices, with seamless synchronization to HeliosDB Cloud.

Key Use Cases

  • IoT Data Collection - Collect sensor data locally, sync to cloud
  • Mobile Applications - Offline-first mobile apps with cloud backup
  • Remote Monitoring - Connected devices with local processing
  • Edge AI - ML inference on edge with cloud training
  • Disconnected Operations - Work offline, sync when connected
  • Hybrid Deployment - Local + cloud for compliance and performance

Architecture Benefits

Aspect       | Benefit
-------------|-----------------------------------------------------
Performance  | No network latency for local queries
Availability | Works offline, survives network interruptions
Scalability  | Distribute data to millions of edge devices
Compliance   | Keep sensitive data local, sync metadata to cloud
Cost         | Reduce cloud compute for simple edge operations

Edge Architecture

High-Level Architecture

┌─────────────────────────────────────────────────────┐
│ HeliosDB Cloud (Central) │
│ ├─ Primary Data Store │
│ ├─ Cloud Analytics │
│ ├─ Master Catalog │
│ └─ Sync Coordination │
└──────────────┬──────────────────────────────────────┘
┌──────┴──────┬───────────────┐
│ │ │
┌────▼─────┐ ┌────▼──────┐ ┌─────▼─────┐
│ Mobile │ │ IoT │ │ Gateway │
│ Edge │ │ Sensor │ │ Server │
│ Device │ │ Network │ │ (Local) │
└──────────┘ └───────────┘ └───────────┘

Edge Device Stack

Application Layer
HeliosDB Edge SDK
Embedded Database Engine (DuckDB-compatible)
Local Storage (RocksDB / SQLite)
Sync Manager
Network Transport
Cloud Connection
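The offline-first pattern that the Sync Manager layer implements can be sketched in plain Python. This is an illustrative sketch only, using the stdlib sqlite3 module in place of the HeliosDB engine; the table, `synced` flag, and `sync_pending` helper are hypothetical, not part of the SDK:

```python
import sqlite3

# Minimal sketch of the Sync Manager pattern: writes always land in local
# storage first, flagged as pending, and are drained to the cloud later.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sensor_readings (
        id INTEGER PRIMARY KEY,
        device_id TEXT,
        temperature REAL,
        synced INTEGER DEFAULT 0  -- 0 = pending upload, 1 = acknowledged
    )
""")

# A write while offline: committed locally, marked pending
conn.execute(
    "INSERT INTO sensor_readings (device_id, temperature) VALUES (?, ?)",
    ("DEVICE-001", 23.5),
)
conn.commit()

def sync_pending(conn, upload):
    """Push pending rows through the transport layer, marking each on ack."""
    rows = conn.execute(
        "SELECT id, device_id, temperature FROM sensor_readings WHERE synced = 0"
    ).fetchall()
    for row in rows:
        if upload(row):  # stand-in for the network transport; True = cloud ack
            conn.execute(
                "UPDATE sensor_readings SET synced = 1 WHERE id = ?", (row[0],)
            )
    conn.commit()
    return len(rows)

# Once connectivity returns, the pending row is drained
sent = sync_pending(conn, upload=lambda row: True)
```

Because the pending flag lives in the same transactional store as the data, a crash between write and sync loses nothing: the row simply stays pending until the next drain.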

Installation & Setup

Prerequisites

  • Device Storage: 100MB minimum (varies by dataset)
  • Memory: 64MB for embedded database, 256MB+ recommended
  • Network: Periodic connectivity (can work offline)
  • OS: Linux, macOS, Windows, iOS, Android, RTOS

Step 1: Install HeliosDB Edge SDK

Python:

pip install heliosdb-edge

JavaScript/Node.js:

npm install @heliosdb/edge

Rust:

[dependencies]
heliosdb-edge = "7.0"

Go:

go get github.com/heliosdb/edge-go

Step 2: Initialize Edge Database

Python:

from heliosdb_edge import EdgeDatabase

# Create local database
db = EdgeDatabase(
    name="local_data",
    path="/data/heliosdb.db",
    schema_sync_interval=3600,  # Sync schema every hour
    data_sync_interval=300      # Sync data every 5 minutes
)

# Connect to cloud
db.cloud_connect(
    endpoint="heliosdb.cloud.example.com",
    api_key="your-api-key",
    device_id="device-001"
)

JavaScript:

const { EdgeDatabase } = require('@heliosdb/edge');

const db = new EdgeDatabase({
  name: 'local_data',
  path: '/data/heliosdb.db',
  schemaSyncInterval: 3600,
  dataSyncInterval: 300
});

await db.cloudConnect({
  endpoint: 'heliosdb.cloud.example.com',
  apiKey: 'your-api-key',
  deviceId: 'device-001'
});

Rust:

use heliosdb_edge::EdgeDatabase;

let db = EdgeDatabase::new(
    "local_data",
    "/data/heliosdb.db",
    Some(3600), // schema sync interval
    Some(300),  // data sync interval
)?;

db.cloud_connect(
    "heliosdb.cloud.example.com",
    "your-api-key",
    "device-001",
).await?;

Step 3: Configure Sync Settings

-- Configure sync parameters
ALTER EDGE DATABASE SET (
    sync_interval = 300,                 -- Sync every 5 minutes
    batch_size = 1000,                   -- Sync 1000 rows per batch
    compression = 'ZSTD',                -- Use compression
    conflict_resolution = 'CLOUD_WINS',  -- Conflict strategy
    selective_sync = true                -- Only sync needed tables
);

-- Mark tables for cloud sync
ALTER TABLE sensor_data ENABLE CLOUD_SYNC;
ALTER TABLE metadata ENABLE CLOUD_SYNC WITH RETENTION = 30;  -- Keep 30 days

-- Keep tables local only (do not sync)
ALTER TABLE cache DISABLE CLOUD_SYNC;

Embedded Database

Creating Tables

-- Create local tables
CREATE TABLE sensor_readings (
    id SERIAL PRIMARY KEY,
    device_id VARCHAR(50),
    temperature DECIMAL(5,2),
    humidity DECIMAL(5,2),
    timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Create index for fast queries
CREATE INDEX idx_sensor_device_time ON sensor_readings(device_id, timestamp);

Local Queries

-- All queries work offline
SELECT AVG(temperature) AS avg_temp
FROM sensor_readings
WHERE device_id = 'DEVICE-001'
  AND timestamp >= NOW() - INTERVAL '1 hour';

-- Aggregate data locally
SELECT
    DATE(timestamp) AS date,
    COUNT(*) AS reading_count,
    AVG(temperature) AS avg_temp,
    MAX(temperature) AS max_temp,
    MIN(temperature) AS min_temp
FROM sensor_readings
GROUP BY DATE(timestamp)
ORDER BY date DESC;

Inserting Data While Offline

# Insert data while offline
db.execute("""
    INSERT INTO sensor_readings (device_id, temperature, humidity)
    VALUES (?, ?, ?)
""", ['DEVICE-001', 23.5, 45.2])

# Batch insert
for reading in readings_batch:
    db.execute("""
        INSERT INTO sensor_readings (device_id, temperature, humidity)
        VALUES (?, ?, ?)
    """, [reading['device'], reading['temp'], reading['humidity']])

# Commit locally; data will sync when the network is available
db.commit()

Storage Management

# Check database size
size_mb = db.get_size()
print(f"Database size: {size_mb}MB")

# Clean up old data
db.execute("""
    DELETE FROM sensor_readings
    WHERE timestamp < NOW() - INTERVAL '90 days'
""")

# Vacuum to reclaim space
db.vacuum()

# Set a data retention policy
db.set_retention_policy(
    table='sensor_readings',
    retention_days=90,
    cleanup_interval=3600  # Run cleanup hourly
)

Cloud Synchronization

Sync Strategies

1. Full Sync (Default)

# Sync all data to cloud
db.sync(mode='FULL')

# With progress tracking
def on_sync_progress(synced, total):
    print(f"Synced {synced}/{total} records")

db.sync(mode='FULL', progress_callback=on_sync_progress)

2. Selective Sync

# Sync only specific tables
db.sync(
    tables=['sensor_readings', 'device_status'],
    mode='INCREMENTAL'
)

# Sync with conditions
db.sync(
    tables={'sensor_readings': 'temperature > 25'},
    mode='SELECTIVE'
)

3. Background Sync

# Automatic sync in background
db.start_background_sync(
    interval=300,       # Every 5 minutes
    batch_size=1000,    # 1000 rows per batch
    network_aware=True  # Pause if no connectivity
)

# Stop background sync
db.stop_background_sync()

Conflict Resolution

# Configure conflict resolution strategy
db.set_conflict_resolution(
    strategy='CLOUD_WINS',        # Cloud data takes precedence
    timestamp_field='updated_at'  # Use timestamp for resolution
)

# Or use custom resolution
def resolve_conflict(local_row, cloud_row):
    # Keep the version with the higher temperature reading
    if local_row['temperature'] > cloud_row['temperature']:
        return local_row
    return cloud_row

db.set_conflict_resolution(
    strategy='CUSTOM',
    resolver=resolve_conflict
)

Monitoring Sync Status

# Check sync status
status = db.get_sync_status()
print(f"Last sync: {status['last_sync_time']}")
print(f"Synced records: {status['synced_records']}")
print(f"Pending records: {status['pending_records']}")
print(f"Failed records: {status['failed_records']}")

# Get sync errors
errors = db.get_sync_errors()
for error in errors:
    print(f"Table: {error['table']}, Error: {error['message']}")

Bandwidth Management

# Limit sync bandwidth
db.set_sync_parameters(
    max_bandwidth_mbps=1.0,  # 1 Mbps limit
    compression='ZSTD',      # Compress data
    compression_level=6      # Balance speed/ratio
)

# Prioritize tables
db.set_sync_priority(
    high=['critical_data'],
    medium=['sensor_readings'],
    low=['cache_data']
)

# Sync only when on WiFi
db.sync_when(
    network_type='WIFI',
    battery_level_min=20  # At least 20% battery
)

Edge AI Processing

Local ML Inference

from heliosdb_edge import MLModel

# Load model locally
model = MLModel.load('anomaly_detector.onnx')

# Run inference on edge data
results = db.execute("""
    SELECT id, temperature, humidity FROM sensor_readings
    WHERE timestamp > NOW() - INTERVAL '1 hour'
""")

for row in results:
    prediction = model.predict([row['temperature'], row['humidity']])
    if prediction['is_anomaly']:
        # Insert alert locally
        db.execute("""
            INSERT INTO alerts (reading_id, alert_type, severity)
            VALUES (?, ?, ?)
        """, [row['id'], 'ANOMALY', 'HIGH'])

Model Updates from Cloud

# Check for model updates periodically
current_version = '7.0'  # Version currently deployed on this device

def check_model_updates():
    updates = db.cloud.get_model_updates(
        models=['anomaly_detector', 'forecast_model'],
        since_version=current_version
    )
    for model_info in updates:
        if model_info['version'] > current_version:
            # Download and verify
            db.download_model(
                name=model_info['name'],
                version=model_info['version'],
                verify_signature=True
            )

# Run the check hourly
db.schedule_task(check_model_updates, interval=3600)

Training Data Collection

# Collect data for cloud training
db.execute("""
    INSERT INTO training_data
    SELECT id, temperature, humidity, actual_category
    FROM sensor_readings
    WHERE timestamp > NOW() - INTERVAL '7 days'
      AND labeled = true
""")

# Send training data to cloud
db.cloud.upload_training_data(
    table='training_data',
    model_name='anomaly_detector',
    max_rows=10000
)

Data Management

Backup & Recovery

# Create local backup
backup_file = db.backup(
    path='/backups/heliosdb_backup.zip',
    compression=True,
    timestamp=True
)

# Restore from backup
db.restore(backup_file)

# Scheduled backups
db.schedule_backup(
    path='/backups',
    interval=86400,  # Daily
    retention_days=30
)

Offline-First Development

# Enable offline mode
db.set_mode('OFFLINE')

# All operations work locally
db.execute("INSERT INTO events ...")
db.execute("UPDATE metrics ...")
db.commit()

# When the network returns, sync automatically
db.set_mode('AUTO')  # Sync when a connection is available

Data Validation

# Validate data before sync
db.validate_sync_data(
    table='sensor_readings',
    rules=[
        'temperature BETWEEN -50 AND 150',
        'humidity BETWEEN 0 AND 100',
        'timestamp NOT NULL'
    ]
)

# Quarantine invalid records
db.validate_and_quarantine(
    table='sensor_readings',
    quarantine_table='sensor_readings_invalid'
)

Performance Tuning

Optimize Storage

-- Compress data
ALTER TABLE sensor_readings SET (
    compression = 'ZSTD',
    compression_level = 6
);

-- Remove old data
DELETE FROM sensor_readings
WHERE timestamp < NOW() - INTERVAL '90 days';

-- Vacuum
VACUUM ANALYZE sensor_readings;

Query Optimization

-- Use indexes
CREATE INDEX idx_device_time ON sensor_readings(device_id, timestamp);

-- Aggregate locally
SELECT
    device_id,
    DATE(timestamp) AS date,
    AVG(temperature) AS avg_temp
FROM sensor_readings
WHERE timestamp >= NOW() - INTERVAL '30 days'
GROUP BY device_id, DATE(timestamp);

Memory Management

# Configure memory limits
db.set_memory_limit(
    max_memory_mb=256,       # Max 256MB
    buffer_pool_size_mb=128  # Cache size
)

# Monitor memory usage
stats = db.get_memory_stats()
print(f"Memory used: {stats['used_mb']}MB")
print(f"Cache hit rate: {stats['cache_hit_rate']:.1%}")

Best Practices

1. Design for Offline

# GOOD: Design queries that work offline
def get_local_summary():
    return db.execute("""
        SELECT COUNT(*) AS count, AVG(value) AS avg
        FROM local_data
        WHERE timestamp > NOW() - INTERVAL '24 hours'
    """)

# BAD: Depending on cloud data for routine reads
def get_summary():
    return cloud_db.execute("SELECT * FROM large_table")

2. Manage Data Volume

# Implement retention policies
db.set_retention_policy(
    table='sensor_readings',
    retention_days=90,
    aggregate_older_than_days=30,  # Roll raw rows up into aggregates after 30 days
    delete_older_than_days=90      # Delete entirely after 90 days
)

3. Secure Sync

# Enable encryption
db.set_sync_encryption(
    enabled=True,
    algorithm='AES-256-GCM',
    key_derivation='ARGON2ID'
)

# Verify the cloud certificate
db.set_cloud_cert_verification(
    verify=True,
    ca_cert_path='/etc/ssl/certs/ca-bundle.crt'
)

4. Monitor Health

# Set up health monitoring
db.start_health_monitor(
    check_interval=300,
    metrics=['disk_usage', 'memory_usage', 'sync_status']
)

# Define alerts
db.set_alert_threshold(
    metric='disk_usage_pct',
    warning=80,
    critical=95
)

Troubleshooting

Issue 1: Sync Failures

Symptoms:

  • Data not syncing to cloud
  • Pending records stuck

Solution:

# Check network connectivity
if db.is_connected():
    print("Connected to cloud")
else:
    print("No connection, will retry when available")

# Force a sync
db.sync(force=True)

# Check sync errors and retry failed records
errors = db.get_sync_errors()
for error in errors:
    print(f"Error: {error['message']}")
    db.retry_sync_record(error['id'])

Issue 2: Storage Full

Symptoms:

  • Sync stops
  • Queries fail with “disk full” errors

Solution:

# Check disk usage
usage = db.get_disk_usage()
print(f"Used: {usage['used_mb']}MB, Free: {usage['free_mb']}MB")

# Delete old data and reclaim the space
db.execute("""
    DELETE FROM sensor_readings
    WHERE timestamp < NOW() - INTERVAL '60 days'
""")
db.vacuum()

# Or migrate to a larger storage location
db.migrate_storage('/data/larger_location')

Issue 3: High Memory Usage

Symptoms:

  • Application crashes
  • Out of memory errors

Solution:

# Reduce the cache size
db.set_memory_limit(max_memory_mb=128)

# Reduce the sync batch size
db.set_sync_parameters(batch_size=100)  # Smaller batches

# Monitor and debug
stats = db.get_memory_stats()
print(f"Peak memory: {stats['peak_mb']}MB")

Summary

HeliosDB Edge enables:

  • Offline-First Applications - Full database on device, sync when available
  • IoT Data Collection - Collect data locally, aggregate and sync
  • Edge AI - Local ML inference with cloud training
  • Hybrid Deployment - Combine edge and cloud for optimal performance

Start with a simple offline-first setup, then add cloud sync and edge AI as needed.


Related Documentation: