Skip to content

HTAP & Columnar Storage

HTAP & Columnar Storage

HeliosDB-Lite supports Hybrid Transactional/Analytical Processing (HTAP) with per-column storage modes. Columnar storage groups values from the same column together in batches, improving compression and analytical query performance.

Prerequisites

  • HeliosDB-Lite v3.5 or later
  • Access to the SQL shell (REPL, PostgreSQL wire protocol, or REST API)

Step 1 — Create a Table with Columnar Columns

Mark individual columns as STORAGE COLUMNAR when the column will be used primarily for analytical reads (aggregations, scans, range filters).

CREATE TABLE metrics (
id INT PRIMARY KEY,
sensor_id INT,
timestamp INT8 STORAGE COLUMNAR,
value FLOAT8 STORAGE COLUMNAR,
label TEXT
);

The id, sensor_id, and label columns use the default row-oriented storage. The timestamp and value columns are stored in columnar batches of 1024 values each, optimized for sequential reads.

Step 2 — Insert Data

Insert data exactly as you would with a row table. HeliosDB handles the columnar layout transparently.

INSERT INTO metrics VALUES (1, 10, 1711900000, 23.5, 'temperature');
INSERT INTO metrics VALUES (2, 10, 1711900060, 23.7, 'temperature');
INSERT INTO metrics VALUES (3, 20, 1711900000, 55.0, 'humidity');
INSERT INTO metrics VALUES (4, 20, 1711900060, 54.8, 'humidity');

Step 3 — Run an Analytical Query

Columnar columns shine during aggregations that touch many rows but few columns.

SELECT sensor_id, AVG(value) AS avg_value, COUNT(*) AS readings
FROM metrics
GROUP BY sensor_id;

Expected output:

sensor_id | avg_value | readings
-----------+-----------+---------
10 | 23.60 | 2
20 | 54.90 | 2

Step 4 — Inspect the Query Plan

Use EXPLAIN to confirm HeliosDB uses a Columnar Scan for columns stored in columnar mode.

EXPLAIN SELECT AVG(value) FROM metrics;

Expected output (excerpt):

Aggregate
-> Columnar Scan on metrics (columns: value)

The plan shows Columnar Scan instead of a full row scan, meaning only the value column data is read from storage.

Step 5 — Workload Classification

EXPLAIN output includes workload classification, scoring the query as OLTP or OLAP. Analytical queries score higher on the OLAP axis.

EXPLAIN SELECT sensor_id, AVG(value) FROM metrics GROUP BY sensor_id;

Classification section in output:

Workload Classification:
Type : OLAP
OLTP Score : 0.20
OLAP Score : 0.80

Point lookups and small updates will classify as OLTP:

EXPLAIN SELECT * FROM metrics WHERE id = 1;
Workload Classification:
Type : OLTP
OLTP Score : 0.90
OLAP Score : 0.10

Step 6 — Alter an Existing Column to Columnar

You can change the storage mode of an existing column without rewriting the table.

ALTER TABLE metrics ALTER COLUMN label SET STORAGE COLUMNAR;

After this DDL, new writes to label will use columnar batches. Existing rows are migrated lazily on the next compaction or vacuum.

Step 7 — Compression

HeliosDB uses Zstd compression by default for all stored data. You can configure compression at the storage level in heliosdb.toml:

[storage]
compression = "Zstd" # Options: Zstd, Lz4, None

Columnar data compresses especially well because similar values (same type, same domain) are grouped together, giving Zstd and LZ4 better compression ratios than row-oriented layouts.

Tips and Troubleshooting

  • When to use columnar: Columns that are frequently aggregated, filtered in range scans, or rarely updated individually. Metrics, logs, and time-series data are ideal candidates.

  • When NOT to use columnar: Columns that are updated in-place frequently or accessed in single-row lookups. Keep those in default row storage.

  • Mixed workloads: HeliosDB’s HTAP design lets you mix columnar and row columns in the same table. OLTP-style point lookups use row-stored columns while OLAP-style aggregations benefit from columnar columns.

  • Batch size: Columnar batches hold 1024 values. Tables with fewer than 1024 rows in a column will still use columnar format but with partially filled batches.

  • EXPLAIN ANALYZE: Use EXPLAIN ANALYZE to see per-operator timing and row counts, confirming that columnar scans are reading only the columns needed.

EXPLAIN ANALYZE SELECT AVG(value) FROM metrics;
  • Resource management: The HTAP resource manager assigns separate quotas for OLTP and OLAP workloads, preventing analytical queries from starving transactional ones.