HeliosDB Nano Storage Modes Tutorial
HeliosDB Nano Storage Modes Tutorial
A hands-on guide to per-column storage optimization with real-world applications.
Prerequisites: HeliosDB Nano REPL running (./heliosdb-nano repl -m for in-memory mode)
Table of Contents
- Understanding Storage Modes
- Dictionary Encoding: E-Commerce Orders
- Content-Addressed Storage: Document Deduplication
- Columnar Storage: Time-Series Analytics
- Mixed Modes: Event Tracking System
- Migration Strategies
See also: Advanced Features Tutorial for EXPLAIN, Vector Search, and Materialized Views.
1. Understanding Storage Modes
What Are Storage Modes?
HeliosDB Nano supports four per-column storage strategies:
| Mode | Best For | Compression | Use Case |
|---|---|---|---|
DEFAULT | General data | RocksDB LZ4 (3-7x) | OLTP, mixed workloads |
DICTIONARY | Low-cardinality strings | 50-95% | Status codes, categories |
CONTENT_ADDRESSED | Large duplicate values | ~100% dedup | Templates, JSON payloads |
COLUMNAR | Numeric sequences | 20-50% + faster aggregates | Time-series, analytics |
Your First Storage Mode
CREATE TABLE demo (id INT PRIMARY KEY, category TEXT, value FLOAT8);
INSERT INTO demo VALUES (1, 'electronics', 99.99);INSERT INTO demo VALUES (2, 'electronics', 149.99);INSERT INTO demo VALUES (3, 'clothing', 29.99);INSERT INTO demo VALUES (4, 'electronics', 199.99);INSERT INTO demo VALUES (5, 'clothing', 49.99);
SELECT * FROM demo ORDER BY id;Output:
┌────┬─────────────┬────────┐│ id │ category │ value │├────┼─────────────┼────────┤│ 1 │ electronics │ 99.99 ││ 2 │ electronics │ 149.99 ││ 3 │ clothing │ 29.99 ││ 4 │ electronics │ 199.99 ││ 5 │ clothing │ 49.99 │└────┴─────────────┴────────┘Now migrate to dictionary encoding:
ALTER TABLE demo ALTER COLUMN category SET STORAGE DICTIONARY;
SELECT * FROM demo ORDER BY id;Data is preserved - queries work identically:
INSERT INTO demo VALUES (6, 'electronics', 79.99);INSERT INTO demo VALUES (7, 'home', 299.99);
SELECT category, COUNT(*), AVG(value) FROM demo GROUP BY category;Output:
┌─────────────┬───────┬────────┐│ group_0 │ agg_0 │ agg_1 │├─────────────┼───────┼────────┤│ clothing │ 2 │ 39.99 ││ electronics │ 4 │ 132.49 ││ home │ 1 │ 299.99 │└─────────────┴───────┴────────┘DROP TABLE demo;2. Dictionary Encoding: E-Commerce Orders
Scenario
E-commerce platform with order status and shipping country having limited unique values but appearing millions of times.
Create the Schema
CREATE TABLE orders ( order_id INT PRIMARY KEY, customer_id INT, status TEXT STORAGE DICTIONARY, shipping_country TEXT STORAGE DICTIONARY, payment_method TEXT STORAGE DICTIONARY, order_total FLOAT8, created_at INT8);Populate Data
INSERT INTO orders VALUES (1001, 501, 'pending', 'USA', 'credit_card', 125.50, 1705320000);INSERT INTO orders VALUES (1002, 502, 'pending', 'USA', 'paypal', 89.99, 1705320100);INSERT INTO orders VALUES (1003, 503, 'shipped', 'Canada', 'credit_card', 245.00, 1705320200);INSERT INTO orders VALUES (1004, 501, 'delivered', 'USA', 'credit_card', 67.25, 1705320300);INSERT INTO orders VALUES (1005, 504, 'pending', 'UK', 'paypal', 189.99, 1705320400);INSERT INTO orders VALUES (1006, 505, 'shipped', 'Germany', 'credit_card', 312.50, 1705320500);INSERT INTO orders VALUES (1007, 502, 'delivered', 'USA', 'apple_pay', 45.00, 1705320600);INSERT INTO orders VALUES (1008, 506, 'cancelled', 'France', 'credit_card', 78.99, 1705320700);INSERT INTO orders VALUES (1009, 503, 'pending', 'Canada', 'paypal', 156.75, 1705320800);INSERT INTO orders VALUES (1010, 507, 'shipped', 'USA', 'credit_card', 423.00, 1705320900);Analyze Order Patterns
SELECT status, COUNT(*) as order_count, SUM(order_total) as total_revenueFROM ordersGROUP BY statusORDER BY order_count DESC;Output:
┌───────────┬───────┬────────┐│ group_0 │ agg_0 │ agg_1 │├───────────┼───────┼────────┤│ pending │ 4 │ 562.23 ││ shipped │ 3 │ 980.5 ││ delivered │ 2 │ 112.25 ││ cancelled │ 1 │ 78.99 │└───────────┴───────┴────────┘Customer Analysis with Joins
CREATE TABLE customers ( customer_id INT PRIMARY KEY, name TEXT, email TEXT, tier TEXT STORAGE DICTIONARY);
INSERT INTO customers VALUES (501, 'Alice Johnson', 'alice@email.com', 'gold');INSERT INTO customers VALUES (502, 'Bob Smith', 'bob@email.com', 'silver');INSERT INTO customers VALUES (503, 'Carol White', 'carol@email.com', 'gold');INSERT INTO customers VALUES (504, 'David Brown', 'david@email.com', 'bronze');INSERT INTO customers VALUES (505, 'Eva Martinez', 'eva@email.com', 'silver');
SELECT c.name, c.tier, COUNT(*) as total_orders, SUM(o.order_total) as lifetime_valueFROM customers cINNER JOIN orders o ON c.customer_id = o.customer_idGROUP BY c.customer_id, c.name, c.tierORDER BY lifetime_value DESC;Output:
┌───────────────┬────────┬───────┬────────┐│ group_1 │ group_2│ agg_0 │ agg_1 │├───────────────┼────────┼───────┼────────┤│ Carol White │ gold │ 2 │ 401.75 ││ Eva Martinez │ silver │ 1 │ 312.5 ││ Alice Johnson │ gold │ 2 │ 192.75 ││ David Brown │ bronze │ 1 │ 189.99 ││ Bob Smith │ silver │ 2 │ 134.99 │└───────────────┴────────┴───────┴────────┘Filter by Dictionary-Encoded Columns
SELECT o.order_id, c.name, o.status, o.order_totalFROM orders oINNER JOIN customers c ON o.customer_id = c.customer_idWHERE o.status = 'pending' AND o.shipping_country = 'USA'ORDER BY o.order_total DESC;DROP TABLE orders;DROP TABLE customers;3. Content-Addressed Storage: Document Deduplication
Scenario
Notification system where many users receive identical email templates. Store once, reference everywhere.
Create Tables
CREATE TABLE email_templates ( template_id INT PRIMARY KEY, template_name TEXT, html_content TEXT STORAGE CONTENT_ADDRESSED, category TEXT STORAGE DICTIONARY);
CREATE TABLE notifications ( notification_id INT PRIMARY KEY, user_id INT, template_id INT, sent_at INT8, status TEXT STORAGE DICTIONARY);Insert Templates
INSERT INTO email_templates VALUES ( 1, 'Welcome Email', '<html><head><style>body{font-family:Arial,sans-serif;margin:0;padding:20px;background:#f5f5f5}.container{max-width:600px;margin:0 auto;background:white;border-radius:8px;padding:40px}.header{text-align:center;border-bottom:2px solid #007bff;padding-bottom:20px}.title{color:#333;font-size:28px;margin:20px 0}.content{color:#666;line-height:1.6;padding:20px 0}.button{display:inline-block;background:#007bff;color:white;padding:12px 30px;text-decoration:none;border-radius:5px;margin:20px 0}.footer{text-align:center;color:#999;font-size:12px;padding-top:20px;border-top:1px solid #eee}</style></head><body><div class="container"><div class="header"><h1 class="title">Welcome!</h1></div></div></body></html>', 'onboarding');
INSERT INTO email_templates VALUES ( 2, 'Order Confirmation', '<html><body>Your order has been confirmed!</body></html>', 'transactional');
INSERT INTO email_templates VALUES ( 3, 'Password Reset', '<html><body>Click here to reset your password.</body></html>', 'security');
SELECT template_id, template_name, category FROM email_templates ORDER BY template_id;Log Notifications
INSERT INTO notifications VALUES (1, 1001, 1, 1705400000, 'delivered');INSERT INTO notifications VALUES (2, 1002, 1, 1705400100, 'delivered');INSERT INTO notifications VALUES (3, 1003, 1, 1705400200, 'delivered');INSERT INTO notifications VALUES (4, 1004, 2, 1705400300, 'delivered');INSERT INTO notifications VALUES (5, 1005, 2, 1705400400, 'delivered');INSERT INTO notifications VALUES (6, 1001, 2, 1705400500, 'delivered');INSERT INTO notifications VALUES (7, 1006, 3, 1705400600, 'delivered');INSERT INTO notifications VALUES (8, 1007, 1, 1705400700, 'bounced');Analyze Usage
SELECT t.template_name, t.category, COUNT(*) as times_sentFROM email_templates tINNER JOIN notifications n ON t.template_id = n.template_idGROUP BY t.template_id, t.template_name, t.categoryORDER BY times_sent DESC;Output:
┌────────────────────┬───────────────┬───────┐│ group_1 │ group_2 │ agg_0 │├────────────────────┼───────────────┼───────┤│ Welcome Email │ onboarding │ 4 ││ Order Confirmation │ transactional │ 3 ││ Password Reset │ security │ 1 │└────────────────────┴───────────────┴───────┘Storage benefit: With CAS, 100K notifications using 3 templates stores template content only 3 times instead of 100K times.
DROP TABLE notifications;DROP TABLE email_templates;4. Columnar Storage: Time-Series Analytics
Scenario
IoT monitoring system with sensors reporting metrics every second. Aggregations over time ranges are critical.
Create Schema
CREATE TABLE sensor_readings ( reading_id INT PRIMARY KEY, sensor_id INT, sensor_type TEXT STORAGE DICTIONARY, location TEXT STORAGE DICTIONARY, timestamp INT8 STORAGE COLUMNAR, temperature FLOAT8 STORAGE COLUMNAR, humidity FLOAT8 STORAGE COLUMNAR, pressure FLOAT8 STORAGE COLUMNAR);Insert Sensor Data
INSERT INTO sensor_readings VALUES (1, 101, 'environmental', 'building_a_floor1', 1705500000, 22.5, 45.2, 1013.25);INSERT INTO sensor_readings VALUES (2, 101, 'environmental', 'building_a_floor1', 1705500060, 22.6, 45.0, 1013.20);INSERT INTO sensor_readings VALUES (3, 101, 'environmental', 'building_a_floor1', 1705500120, 22.7, 44.8, 1013.15);INSERT INTO sensor_readings VALUES (4, 102, 'environmental', 'building_a_floor2', 1705500000, 21.0, 48.5, 1013.30);INSERT INTO sensor_readings VALUES (5, 102, 'environmental', 'building_a_floor2', 1705500060, 21.2, 48.2, 1013.25);INSERT INTO sensor_readings VALUES (6, 201, 'industrial', 'building_b_warehouse', 1705500000, 18.0, 35.0, 1012.00);INSERT INTO sensor_readings VALUES (7, 201, 'industrial', 'building_b_warehouse', 1705500060, 18.2, 35.5, 1012.05);INSERT INTO sensor_readings VALUES (8, 301, 'outdoor', 'campus_entrance', 1705500000, 5.2, 78.0, 1015.50);INSERT INTO sensor_readings VALUES (9, 301, 'outdoor', 'campus_entrance', 1705500060, 5.0, 79.0, 1015.55);Aggregations (Columnar Advantage)
SELECT sensor_type, COUNT(*) as readings, AVG(temperature) as avg_temp, AVG(humidity) as avg_humidityFROM sensor_readingsGROUP BY sensor_typeORDER BY avg_temp DESC;Output:
┌───────────────┬───────┬───────┬────────┐│ group_0 │ agg_0 │ agg_1 │ agg_2 │├───────────────┼───────┼───────┼────────┤│ environmental │ 5 │ 22.0 │ 46.34 ││ industrial │ 2 │ 18.1 │ 35.25 ││ outdoor │ 2 │ 5.1 │ 78.5 │└───────────────┴───────┴───────┴────────┘Min/Max Analysis
SELECT location, MIN(temperature) as min_temp, MAX(temperature) as max_tempFROM sensor_readingsGROUP BY locationORDER BY location;Output:
┌──────────────────────┬───────┬───────┐│ group_0 │ agg_0 │ agg_1 │├──────────────────────┼───────┼───────┤│ building_a_floor1 │ 22.5 │ 22.7 ││ building_a_floor2 │ 21.0 │ 21.2 ││ building_b_warehouse │ 18.0 │ 18.2 ││ campus_entrance │ 5.0 │ 5.2 │└──────────────────────┴───────┴───────┘DROP TABLE sensor_readings;5. Mixed Modes: Event Tracking System
Scenario
SaaS application event tracking combining all storage modes for optimal performance.
Create Schema
CREATE TABLE events ( event_id INT PRIMARY KEY, event_type TEXT STORAGE DICTIONARY, source_app TEXT STORAGE DICTIONARY, user_country TEXT STORAGE DICTIONARY, device_type TEXT STORAGE DICTIONARY, timestamp INT8 STORAGE COLUMNAR, response_time_ms INT STORAGE COLUMNAR, payload_size INT STORAGE COLUMNAR, error_stack TEXT STORAGE CONTENT_ADDRESSED, user_id INT, session_id TEXT, request_path TEXT);Insert Events
INSERT INTO events VALUES (1, 'page_view', 'web_app', 'USA', 'desktop', 1705600000, 45, 1024, NULL, 1001, 'sess_abc', '/dashboard');INSERT INTO events VALUES (2, 'page_view', 'web_app', 'USA', 'mobile', 1705600010, 120, 512, NULL, 1002, 'sess_def', '/products');INSERT INTO events VALUES (3, 'page_view', 'web_app', 'UK', 'desktop', 1705600020, 52, 1024, NULL, 1003, 'sess_ghi', '/dashboard');INSERT INTO events VALUES (4, 'api_call', 'web_app', 'USA', 'desktop', 1705600050, 23, 256, NULL, 1001, 'sess_abc', '/api/users');INSERT INTO events VALUES (5, 'api_call', 'mobile_app', 'USA', 'mobile', 1705600060, 156, 512, NULL, 1002, 'sess_def', '/api/products');INSERT INTO events VALUES (6, 'error', 'web_app', 'USA', 'desktop', 1705600080, 5000, 2048, 'Error: Connection timeout at db.js:142', 1007, 'sess_stu', '/api/users/123');INSERT INTO events VALUES (7, 'error', 'web_app', 'UK', 'mobile', 1705600090, 5200, 2048, 'Error: Connection timeout at db.js:142', 1008, 'sess_vwx', '/api/users/456');INSERT INTO events VALUES (8, 'transaction', 'web_app', 'USA', 'desktop', 1705600150, 234, 512, NULL, 1001, 'sess_abc', '/checkout');INSERT INTO events VALUES (9, 'login', 'mobile_app', 'France', 'mobile', 1705600190, 203, 192, NULL, 1014, 'sess_nop', '/auth/login');Analytics Queries
SELECT event_type, COUNT(*) as count, AVG(response_time_ms) as avg_response_msFROM eventsGROUP BY event_typeORDER BY count DESC;Output:
┌─────────────┬───────┬───────┐│ group_0 │ agg_0 │ agg_1 │├─────────────┼───────┼───────┤│ page_view │ 3 │ 72.33 ││ api_call │ 2 │ 89.5 ││ error │ 2 │ 5100 ││ transaction │ 1 │ 234 ││ login │ 1 │ 203 │└─────────────┴───────┴───────┘SELECT device_type, COUNT(*) as events, SUM(payload_size) as total_bytesFROM eventsGROUP BY device_typeORDER BY events DESC;Output:
┌─────────┬───────┬───────┐│ group_0 │ agg_0 │ agg_1 │├─────────┼───────┼───────┤│ desktop │ 5 │ 4864 ││ mobile │ 4 │ 3264 │└─────────┴───────┴───────┘Error Analysis (CAS Benefit)
SELECT error_stack, COUNT(*) as occurrencesFROM eventsWHERE event_type = 'error' AND error_stack IS NOT NULLGROUP BY error_stackORDER BY occurrences DESC;Output:
┌────────────────────────────────────────┬───────┐│ group_0 │ agg_0 │├────────────────────────────────────────┼───────┤│ Error: Connection timeout at db.js:142 │ 2 │└────────────────────────────────────────┴───────┘Same error stack stored once, referenced twice.
DROP TABLE events;6. Migration Strategies
Online Schema Evolution
Migrate existing columns to new storage modes without downtime.
CREATE TABLE products ( id INT PRIMARY KEY, name TEXT, category TEXT, brand TEXT, price FLOAT8);
INSERT INTO products VALUES (1, 'Laptop Pro', 'electronics', 'TechCorp', 1299.99);INSERT INTO products VALUES (2, 'Wireless Mouse', 'electronics', 'TechCorp', 49.99);INSERT INTO products VALUES (3, 'USB Cable', 'electronics', 'CableCo', 9.99);INSERT INTO products VALUES (4, 'Office Chair', 'furniture', 'ComfortPlus', 299.99);Migrate to Dictionary
SELECT category, brand, COUNT(*), AVG(price)FROM productsGROUP BY category, brandORDER BY COUNT(*) DESC;
ALTER TABLE products ALTER COLUMN category SET STORAGE DICTIONARY;ALTER TABLE products ALTER COLUMN brand SET STORAGE DICTIONARY;
SELECT category, brand, COUNT(*), AVG(price)FROM productsGROUP BY category, brandORDER BY COUNT(*) DESC;Results are identical - migration is transparent.
Migrate to Columnar
ALTER TABLE products ALTER COLUMN price SET STORAGE COLUMNAR;
SELECT category, COUNT(*), MIN(price), MAX(price), SUM(price)FROM productsGROUP BY category;Output:
┌─────────────┬───────┬────────┬─────────┬─────────┐│ group_0 │ agg_0 │ agg_1 │ agg_2 │ agg_3 │├─────────────┼───────┼────────┼─────────┼─────────┤│ electronics │ 3 │ 9.99 │ 1299.99 │ 1359.97 ││ furniture │ 1 │ 299.99 │ 299.99 │ 299.99 │└─────────────┴───────┴────────┴─────────┴─────────┘Revert to Default
ALTER TABLE products ALTER COLUMN category SET STORAGE DEFAULT;ALTER TABLE products ALTER COLUMN brand SET STORAGE DEFAULT;ALTER TABLE products ALTER COLUMN price SET STORAGE DEFAULT;
SELECT * FROM products ORDER BY id;All data preserved:
┌────┬────────────────┬─────────────┬─────────────┬─────────┐│ id │ name │ category │ brand │ price │├────┼────────────────┼─────────────┼─────────────┼─────────┤│ 1 │ Laptop Pro │ electronics │ TechCorp │ 1299.99 ││ 2 │ Wireless Mouse │ electronics │ TechCorp │ 49.99 ││ 3 │ USB Cable │ electronics │ CableCo │ 9.99 ││ 4 │ Office Chair │ furniture │ ComfortPlus │ 299.99 │└────┴────────────────┴─────────────┴─────────────┴─────────┘DROP TABLE products;Quick Reference
Choosing the Right Mode
| Column Characteristics | Recommended Mode | Example |
|---|---|---|
| < 64K unique strings | DICTIONARY | status, country, category |
| Large values (>1KB) with duplicates | CONTENT_ADDRESSED | templates, JSON payloads |
| Numeric sequences for aggregations | COLUMNAR | timestamps, metrics, prices |
| Unique text, mixed queries | DEFAULT | names, descriptions, IDs |
SQL Syntax
-- At table creationCREATE TABLE t ( col1 TEXT STORAGE DICTIONARY, col2 TEXT STORAGE CONTENT_ADDRESSED, col3 FLOAT8 STORAGE COLUMNAR);
-- Online migrationALTER TABLE t ALTER COLUMN col SET STORAGE DICTIONARY;ALTER TABLE t ALTER COLUMN col SET STORAGE CONTENT_ADDRESSED;ALTER TABLE t ALTER COLUMN col SET STORAGE COLUMNAR;ALTER TABLE t ALTER COLUMN col SET STORAGE DEFAULT;Performance Guidelines
- Dictionary: Best when same values repeat thousands of times
- Content-Addressed: Best for values >1KB that duplicate across rows
- Columnar: Best for numeric columns with SUM/AVG/COUNT queries
- Mixed Modes: Real-world tables often combine multiple modes
See Advanced Features Tutorial for EXPLAIN, Vector Search, Materialized Views, and Bulk Loading.