HeliosDB Edge Deployment Guide
Version: 1.0 | Last Updated: 2025-11-30 | Status: Complete
Table of Contents
- Overview
- Edge Architecture
- Installation & Setup
- Embedded Database
- Cloud Synchronization
- Edge AI Processing
- Data Management
- Performance Tuning
- Best Practices
- Troubleshooting
Overview
HeliosDB Edge provides a unified embedded+cloud database system for IoT, mobile, and edge computing scenarios. Run DuckDB-compatible local databases on devices, with seamless synchronization to HeliosDB Cloud.
Key Use Cases
- IoT Data Collection - Collect sensor data locally, sync to cloud
- Mobile Applications - Offline-first mobile apps with cloud backup
- Remote Monitoring - Connected devices with local processing
- Edge AI - ML inference on edge with cloud training
- Disconnected Operations - Work offline, sync when connected
- Hybrid Deployment - Local + cloud for compliance and performance
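The offline-first pattern behind most of these use cases is simple: write locally, mark rows as pending, and flush to the cloud when a connection returns. A minimal stdlib-only sketch of the idea (the `readings` table and the `record`/`flush_pending` helpers are illustrative, not part of the Edge SDK):

```python
import sqlite3

# Minimal offline-first pattern: write locally, mark rows pending,
# flush to the cloud when a connection returns.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE readings (
        id INTEGER PRIMARY KEY,
        device_id TEXT,
        temperature REAL,
        synced INTEGER DEFAULT 0   -- 0 = pending upload, 1 = on the cloud
    )
""")

def record(device_id, temperature):
    # Always succeeds, even with no network: data lands in local storage.
    conn.execute(
        "INSERT INTO readings (device_id, temperature) VALUES (?, ?)",
        (device_id, temperature),
    )
    conn.commit()

def flush_pending(upload):
    # Called whenever connectivity is detected; `upload` pushes one batch.
    rows = conn.execute(
        "SELECT id, device_id, temperature FROM readings WHERE synced = 0"
    ).fetchall()
    if rows and upload(rows):
        ids = [(r[0],) for r in rows]
        conn.executemany("UPDATE readings SET synced = 1 WHERE id = ?", ids)
        conn.commit()
    return len(rows)

record("DEVICE-001", 23.5)
record("DEVICE-001", 24.1)
sent = flush_pending(lambda batch: True)  # pretend the upload succeeded
```

The SDK's sync manager builds on the same idea with batching, retries, compression, and conflict handling.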
Architecture Benefits
| Aspect | Benefit |
|---|---|
| Performance | No network latency for local queries |
| Availability | Works offline, survives network interruptions |
| Scalability | Distribute data to millions of edge devices |
| Compliance | Keep sensitive data local, sync metadata to cloud |
| Cost | Reduce cloud compute for simple edge operations |
Edge Architecture
High-Level Architecture
```
┌─────────────────────────────────────────────────────┐
│ HeliosDB Cloud (Central)                            │
│ ├─ Primary Data Store                               │
│ ├─ Cloud Analytics                                  │
│ ├─ Master Catalog                                   │
│ └─ Sync Coordination                                │
└──────────────┬──────────────────────────────────────┘
               │
        ┌──────┴──────┬───────────────┐
        │             │               │
   ┌────▼─────┐  ┌────▼──────┐  ┌─────▼─────┐
   │  Mobile  │  │    IoT    │  │  Gateway  │
   │   Edge   │  │  Sensor   │  │  Server   │
   │  Device  │  │  Network  │  │  (Local)  │
   └──────────┘  └───────────┘  └───────────┘
```

Edge Device Stack

```
Application Layer
        ↓
HeliosDB Edge SDK
        ↓
Embedded Database Engine (DuckDB-compatible)
        ↓
Local Storage (RocksDB / SQLite)
        ↓
Sync Manager
        ↓
Network Transport
        ↓
Cloud Connection
```

Installation & Setup
Prerequisites
- Device Storage: 100MB minimum (varies by dataset)
- Memory: 64MB for embedded database, 256MB+ recommended
- Network: Periodic connectivity (can work offline)
- OS: Linux, macOS, Windows, iOS, Android, RTOS
Step 1: Install HeliosDB Edge SDK
Python:

```shell
pip install heliosdb-edge
```

JavaScript/Node.js:

```shell
npm install @heliosdb/edge
```

Rust:

```toml
[dependencies]
heliosdb-edge = "7.0"
```

Go:

```shell
go get github.com/heliosdb/edge-go
```

Step 2: Initialize Edge Database
Python:

```python
from heliosdb_edge import EdgeDatabase

# Create local database
db = EdgeDatabase(
    name="local_data",
    path="/data/heliosdb.db",
    schema_sync_interval=3600,  # Sync schema every hour
    data_sync_interval=300      # Sync data every 5 minutes
)

# Connect to cloud
db.cloud_connect(
    endpoint="heliosdb.cloud.example.com",
    api_key="your-api-key",
    device_id="device-001"
)
```

JavaScript:

```javascript
const { EdgeDatabase } = require('@heliosdb/edge');

const db = new EdgeDatabase({
  name: 'local_data',
  path: '/data/heliosdb.db',
  schemaSyncInterval: 3600,
  dataSyncInterval: 300
});

await db.cloudConnect({
  endpoint: 'heliosdb.cloud.example.com',
  apiKey: 'your-api-key',
  deviceId: 'device-001'
});
```

Rust:

```rust
use heliosdb_edge::EdgeDatabase;

let db = EdgeDatabase::new(
    "local_data",
    "/data/heliosdb.db",
    Some(3600), // schema sync interval
    Some(300)   // data sync interval
)?;

db.cloud_connect(
    "heliosdb.cloud.example.com",
    "your-api-key",
    "device-001"
).await?;
```

Step 3: Configure Sync Settings
```sql
-- Configure sync parameters
ALTER EDGE DATABASE SET (
    sync_interval = 300,                 -- Sync every 5 minutes
    batch_size = 1000,                   -- Sync 1000 rows per batch
    compression = 'ZSTD',                -- Use compression
    conflict_resolution = 'CLOUD_WINS',  -- Conflict strategy
    selective_sync = true                -- Only sync needed tables
);

-- Mark tables for cloud sync
ALTER TABLE sensor_data ENABLE CLOUD_SYNC;
ALTER TABLE metadata ENABLE CLOUD_SYNC WITH RETENTION = 30;  -- Keep 30 days

-- Keep tables local only (don't sync)
ALTER TABLE cache DISABLE CLOUD_SYNC;
```

Embedded Database
Creating Tables
```sql
-- Create local tables
CREATE TABLE sensor_readings (
    id SERIAL PRIMARY KEY,
    device_id VARCHAR(50),
    temperature DECIMAL(5,2),
    humidity DECIMAL(5,2),
    timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Create index for fast queries
CREATE INDEX idx_sensor_device_time
ON sensor_readings(device_id, timestamp);
```

Local Queries
```sql
-- All queries work offline
SELECT AVG(temperature) as avg_temp
FROM sensor_readings
WHERE device_id = 'DEVICE-001'
  AND timestamp >= NOW() - INTERVAL '1 hour';

-- Aggregate data locally
SELECT
    DATE(timestamp) as date,
    COUNT(*) as reading_count,
    AVG(temperature) as avg_temp,
    MAX(temperature) as max_temp,
    MIN(temperature) as min_temp
FROM sensor_readings
GROUP BY DATE(timestamp)
ORDER BY date DESC;
```

Data Insertion While Offline
```python
# Insert data while offline
db.execute("""
    INSERT INTO sensor_readings (device_id, temperature, humidity)
    VALUES (?, ?, ?)
""", ['DEVICE-001', 23.5, 45.2])

# Batch insert
for reading in readings_batch:
    db.execute("""
        INSERT INTO sensor_readings (device_id, temperature, humidity)
        VALUES (?, ?, ?)
    """, [reading['device'], reading['temp'], reading['humidity']])

# Commit locally
db.commit()

# Data will sync when the network is available
```

Storage Management
```python
# Check database size
size_mb = db.get_size()
print(f"Database size: {size_mb}MB")

# Clean up old data
db.execute("""
    DELETE FROM sensor_readings
    WHERE timestamp < NOW() - INTERVAL '90 days'
""")

# Vacuum to reclaim space
db.vacuum()
```
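A retention policy boils down to exactly these two steps run on a schedule: delete rows past the cutoff, then vacuum. A stdlib `sqlite3` sketch of one cleanup pass (the `cleanup` helper is illustrative, not an SDK call):

```python
import sqlite3
import time

# What a retention policy does under the hood: periodically delete rows
# older than the cutoff, then vacuum to reclaim the freed pages.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sensor_readings (id INTEGER PRIMARY KEY, ts REAL)")

def cleanup(retention_days=90):
    cutoff = time.time() - retention_days * 86400
    cur = conn.execute("DELETE FROM sensor_readings WHERE ts < ?", (cutoff,))
    conn.commit()
    conn.execute("VACUUM")   # reclaim space on disk
    return cur.rowcount      # rows removed in this pass

now = time.time()
conn.execute("INSERT INTO sensor_readings (ts) VALUES (?)", (now,))
conn.execute("INSERT INTO sensor_readings (ts) VALUES (?)", (now - 100 * 86400,))
removed = cleanup(retention_days=90)
```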
```python
# Set data retention policy
db.set_retention_policy(
    table='sensor_readings',
    retention_days=90,
    cleanup_interval=3600  # Run cleanup hourly
)
```

Cloud Synchronization
Sync Strategies
1. Full Sync (Default)
```python
# Sync all data to cloud
db.sync(mode='FULL')

# With progress tracking
def on_sync_progress(synced, total):
    print(f"Synced {synced}/{total} records")

db.sync(mode='FULL', progress_callback=on_sync_progress)
```

2. Selective Sync
```python
# Sync only specific tables
db.sync(
    tables=['sensor_readings', 'device_status'],
    mode='INCREMENTAL'
)
```
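Incremental mode is typically implemented with a per-table high-water mark: remember the last row id (or `updated_at`) that reached the cloud and ship only newer rows. An illustrative sketch of that bookkeeping, independent of the SDK:

```python
import sqlite3

# Per-table high-water mark: ship only rows above the last synced id,
# and advance the mark after a successful upload.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sensor_readings (id INTEGER PRIMARY KEY, v REAL)")
conn.executemany("INSERT INTO sensor_readings (v) VALUES (?)", [(1.0,), (2.0,)])
watermark = {"sensor_readings": 0}

def incremental_batch(table):
    rows = conn.execute(
        f"SELECT id, v FROM {table} WHERE id > ?", (watermark[table],)
    ).fetchall()
    if rows:
        watermark[table] = rows[-1][0]  # advance after a successful upload
    return rows

first = incremental_batch("sensor_readings")   # ships both rows
second = incremental_batch("sensor_readings")  # nothing new to ship
```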
```python
# Sync with conditions
db.sync(
    tables={'sensor_readings': 'temperature > 25'},
    mode='SELECTIVE'
)
```

3. Background Sync
```python
# Automatic sync in background
db.start_background_sync(
    interval=300,        # Every 5 minutes
    batch_size=1000,     # 1000 rows per batch
    network_aware=True   # Pause if no connectivity
)

# Stop background sync
db.stop_background_sync()
```

Conflict Resolution
```python
# Configure conflict resolution strategy
db.set_conflict_resolution(
    strategy='CLOUD_WINS',        # Cloud data takes precedence
    timestamp_field='updated_at'  # Use timestamp for resolution
)

# Or use custom resolution
def resolve_conflict(local_row, cloud_row):
    # Keep the version with the higher temperature reading
    if local_row['temperature'] > cloud_row['temperature']:
        return local_row
    else:
        return cloud_row

db.set_conflict_resolution(
    strategy='CUSTOM',
    resolver=resolve_conflict
)
```

Monitoring Sync Status
```python
# Check sync status
status = db.get_sync_status()
print(f"Last sync: {status['last_sync_time']}")
print(f"Synced records: {status['synced_records']}")
print(f"Pending records: {status['pending_records']}")
print(f"Failed records: {status['failed_records']}")

# Get sync errors
errors = db.get_sync_errors()
for error in errors:
    print(f"Table: {error['table']}, Error: {error['message']}")
```

Bandwidth Management
```python
# Limit sync bandwidth
db.set_sync_parameters(
    max_bandwidth_mbps=1.0,  # 1 Mbps limit
    compression='ZSTD',      # Compress data
    compression_level=6      # Balance speed/ratio
)

# Prioritize tables
db.set_sync_priority(
    high=['critical_data'],
    medium=['sensor_readings'],
    low=['cache_data']
)
```
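A cap like `max_bandwidth_mbps=1.0` is commonly enforced with a token bucket: tokens refill at the target rate and each outgoing batch spends its size in bits. A self-contained sketch of the idea (not the SDK's actual throttle):

```python
import time

# Token bucket: tokens refill at the target bit rate; each payload spends
# its size in bits, and an overdrawn bucket tells the sender how long to
# sleep before transmitting.
class TokenBucket:
    def __init__(self, rate_mbps, burst_bits=None):
        self.rate = rate_mbps * 1_000_000        # bits per second
        self.capacity = burst_bits or self.rate  # allow ~1s of burst
        self.tokens = self.capacity
        self.last = time.monotonic()

    def delay_for(self, payload_bits):
        # Refill, spend, and report how long the caller should sleep.
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        self.tokens -= payload_bits
        return max(0.0, -self.tokens / self.rate)

bucket = TokenBucket(rate_mbps=1.0)
no_wait = bucket.delay_for(500_000)      # fits in the burst allowance
must_wait = bucket.delay_for(1_000_000)  # bucket is now overdrawn
```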
```python
# Sync only when on WiFi
db.sync_when(
    network_type='WIFI',
    battery_level_min=20  # At least 20% battery
)
```

Edge AI Processing
Local ML Inference
```python
from heliosdb_edge import MLModel

# Load model locally
model = MLModel.load('anomaly_detector.onnx')

# Run inference on edge data
results = db.execute("""
    SELECT id, temperature, humidity
    FROM sensor_readings
    WHERE timestamp > NOW() - INTERVAL '1 hour'
""")

for row in results:
    prediction = model.predict([row['temperature'], row['humidity']])
    if prediction['is_anomaly']:
        # Insert alert locally
        db.execute("""
            INSERT INTO alerts (reading_id, alert_type, severity)
            VALUES (?, ?, ?)
        """, [row['id'], 'ANOMALY', 'HIGH'])
```

Model Updates from Cloud
```python
# Check for model updates periodically
def check_model_updates():
    updates = db.cloud.get_model_updates(
        models=['anomaly_detector', 'forecast_model'],
        since_version='7.0'
    )

    for model_info in updates:
        if model_info['version'] > current_version:
            # Download and verify
            db.download_model(
                name=model_info['name'],
                version=model_info['version'],
                verify_signature=True
            )

# Run check hourly
db.schedule_task(check_model_updates, interval=3600)
```

Training Data Collection
```python
# Collect data for cloud training
db.execute("""
    INSERT INTO training_data
    SELECT id, temperature, humidity, actual_category
    FROM sensor_readings
    WHERE timestamp > NOW() - INTERVAL '7 days'
      AND labeled = true
""")

# Send training data to cloud
db.cloud.upload_training_data(
    table='training_data',
    model_name='anomaly_detector',
    max_rows=10000
)
```

Data Management
Backup & Recovery
```python
# Create local backup
backup_file = db.backup(
    path='/backups/heliosdb_backup.zip',
    compression=True,
    timestamp=True
)

# Restore from backup
db.restore(backup_file)
```
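For a SQLite-backed local store, a consistent backup can be taken while the application keeps writing, using the standard library's online backup API. This sketch shows the underlying mechanism; it is not the SDK's `backup()` call:

```python
import sqlite3

# Online backup of a SQLite database using the stdlib backup API; the
# destination would normally be a file under /backups rather than :memory:.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE sensor_readings (id INTEGER PRIMARY KEY, v REAL)")
src.execute("INSERT INTO sensor_readings (v) VALUES (23.5)")
src.commit()

dst = sqlite3.connect(":memory:")  # in practice: a file path
src.backup(dst, pages=100)         # copy in 100-page chunks

restored = dst.execute("SELECT COUNT(*) FROM sensor_readings").fetchone()[0]
```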
```python
# Scheduled backups
db.schedule_backup(
    path='/backups',
    interval=86400,  # Daily
    retention_days=30
)
```

Offline-First Development
```python
# Enable offline mode
db.set_mode('OFFLINE')

# All operations work locally
db.execute("INSERT INTO events ...")
db.execute("UPDATE metrics ...")
db.commit()

# When network returns, sync automatically
db.set_mode('AUTO')  # Sync when connection available
```

Data Validation
```python
# Validate data before sync
db.validate_sync_data(
    table='sensor_readings',
    rules=[
        'temperature BETWEEN -50 AND 150',
        'humidity BETWEEN 0 AND 100',
        'timestamp NOT NULL'
    ]
)
```
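Rules like these are plain SQL predicates, so each one can be checked locally by counting the rows that violate it before they are queued for sync. A stdlib sketch of that check (the `violations` helper is illustrative, not the SDK API):

```python
import sqlite3

# Count rows that violate each rule by negating the predicate in a WHERE
# clause; anything counted here would be held back or quarantined.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sensor_readings (temperature REAL, humidity REAL)")
conn.executemany(
    "INSERT INTO sensor_readings VALUES (?, ?)",
    [(23.5, 45.2), (999.0, 45.2), (22.0, 140.0)],  # last two rows are bad
)

rules = [
    "temperature BETWEEN -50 AND 150",
    "humidity BETWEEN 0 AND 100",
]

def violations(table, rules):
    counts = {}
    for rule in rules:
        counts[rule] = conn.execute(
            f"SELECT COUNT(*) FROM {table} WHERE NOT ({rule})"
        ).fetchone()[0]
    return counts

bad = violations("sensor_readings", rules)
```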
```python
# Quarantine invalid records
db.validate_and_quarantine(
    table='sensor_readings',
    quarantine_table='sensor_readings_invalid'
)
```

Performance Tuning
Optimize Storage
```sql
-- Compress data
ALTER TABLE sensor_readings SET (
    compression = 'ZSTD',
    compression_level = 6
);

-- Remove old data
DELETE FROM sensor_readings
WHERE timestamp < NOW() - INTERVAL '90 days';

-- Vacuum
VACUUM ANALYZE sensor_readings;
```

Query Optimization
```sql
-- Use indexes
CREATE INDEX idx_device_time
ON sensor_readings(device_id, timestamp);

-- Aggregate locally
SELECT
    device_id,
    DATE(timestamp) as date,
    AVG(temperature) as avg_temp
FROM sensor_readings
WHERE timestamp >= NOW() - INTERVAL '30 days'
GROUP BY device_id, DATE(timestamp);
```

Memory Management
```python
# Configure memory limits
db.set_memory_limit(
    max_memory_mb=256,        # Max 256MB
    buffer_pool_size_mb=128   # Cache size
)

# Monitor memory usage
stats = db.get_memory_stats()
print(f"Memory used: {stats['used_mb']}MB")
print(f"Cache hit rate: {stats['cache_hit_rate']:.1%}")
```

Best Practices
1. Design for Offline
```python
# GOOD: Design queries that work offline
def get_local_summary():
    return db.execute("""
        SELECT COUNT(*) as count, AVG(value) as avg
        FROM local_data
        WHERE timestamp > NOW() - INTERVAL '24 hours'
    """)

# BAD: Depending on cloud data
def get_summary():
    return cloud_db.execute("SELECT * FROM large_table")
```

2. Manage Data Volume
```python
# Implement retention policies
db.set_retention_policy(
    table='sensor_readings',
    retention_days=90,
    aggregate_older_than_days=30,  # Aggregate monthly after 30 days
    delete_older_than_days=90      # Delete after 90 days
)
```

3. Secure Sync
```python
# Enable encryption
db.set_sync_encryption(
    enabled=True,
    algorithm='AES-256-GCM',
    key_derivation='ARGON2ID'
)

# Verify cloud certificate
db.set_cloud_cert_verification(
    verify=True,
    ca_cert_path='/etc/ssl/certs/ca-bundle.crt'
)
```

4. Monitor Health
```python
# Set up health monitoring
db.start_health_monitor(
    check_interval=300,
    metrics=['disk_usage', 'memory_usage', 'sync_status']
)
```
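The `disk_usage` metric can be computed directly from the filesystem that holds the database; stdlib `shutil` is enough. A sketch using the same warning/critical thresholds as the alert configuration (the helper names are illustrative):

```python
import shutil

# Compute percent disk usage for the filesystem holding the database,
# then classify it against warning/critical thresholds.
def disk_usage_pct(path="/"):
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total

def disk_status(pct, warning=80, critical=95):
    if pct >= critical:
        return "CRITICAL"
    if pct >= warning:
        return "WARNING"
    return "OK"

status = disk_status(disk_usage_pct("/"))
```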
```python
# Define alerts
db.set_alert_threshold(
    metric='disk_usage_pct',
    warning=80,
    critical=95
)
```

Troubleshooting
Issue 1: Sync Failures
Symptoms:
- Data not syncing to cloud
- Pending records stuck
Solution:
```python
# Check network connectivity
if db.is_connected():
    print("Connected to cloud")
else:
    print("No connection, will retry when available")

# Force sync
db.sync(force=True)

# Check sync errors
errors = db.get_sync_errors()
for error in errors:
    print(f"Error: {error['message']}")
    db.retry_sync_record(error['id'])
```

Issue 2: Storage Full
Symptoms:
- Sync stops
- Queries fail with "disk full" errors
Solution:
```python
# Check disk usage
usage = db.get_disk_usage()
print(f"Used: {usage['used_mb']}MB, Free: {usage['free_mb']}MB")

# Delete old data to free space
db.execute("""
    DELETE FROM sensor_readings
    WHERE timestamp < NOW() - INTERVAL '60 days'
""")
db.vacuum()

# Or migrate to larger storage
db.migrate_storage('/data/larger_location')
```

Issue 3: High Memory Usage
Symptoms:
- Application crashes
- Out of memory errors
Solution:
```python
# Reduce cache size
db.set_memory_limit(max_memory_mb=128)

# Reduce sync batch size
db.set_sync_parameters(batch_size=100)  # Smaller batches

# Monitor and debug
stats = db.get_memory_stats()
print(f"Peak memory: {stats['peak_mb']}MB")
```

Summary
HeliosDB Edge enables:
- Offline-First Applications - Full database on device, sync when available
- IoT Data Collection - Collect data locally, aggregate and sync
- Edge AI - Local ML inference with cloud training
- Hybrid Deployment - Combine edge and cloud for optimal performance
Start with a simple offline-first setup, then add cloud sync and edge AI as needed.