ClickHouse Quick Start Guide
Get started with HeliosDB’s ClickHouse protocol support for high-performance analytical queries. Connect using standard ClickHouse clients with full native protocol compatibility.
Prerequisites
- HeliosDB server running (default native port: 9000, HTTP port: 8123)
- ClickHouse client library (Python, Go, Node.js, or clickhouse-client)
- Familiarity with SQL and analytical workloads
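Before continuing, you can confirm that both default ports are reachable. A minimal sketch using only the standard library (`can_connect` is a hypothetical helper written for this guide, not part of any client library):

```python
import socket

def can_connect(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Check HeliosDB's default ClickHouse ports (native and HTTP).
for name, port in [("native", 9000), ("http", 8123)]:
    status = "reachable" if can_connect("localhost", port) else "unreachable"
    print(f"{name} port {port}: {status}")
```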
Connection Details
| Parameter | Default Value |
|---|---|
| Host | localhost |
| Native Port | 9000 |
| HTTP Port | 8123 |
| User | default |
| Database | default |
| Compression | LZ4, ZSTD |
5-Minute Getting Started
Step 1: Connect with Python Client
```python
from clickhouse_driver import Client

client = Client(
    host='localhost',
    port=9000,
    user='default',
    password='',
    database='default'
)

# Test connection
result = client.execute('SELECT 1')
print(f"Connected: {result}")
```
Step 2: Create a MergeTree Table
```sql
CREATE TABLE events (
    timestamp DateTime,
    event_type String,
    user_id UInt32,
    value Float32
) ENGINE = MergeTree()
ORDER BY (timestamp, user_id)
PARTITION BY toYYYYMM(timestamp);
```
Step 3: Insert and Query Data
```python
# Insert data
client.execute('''
    INSERT INTO events (timestamp, event_type, user_id, value) VALUES
    (now(), 'click', 1, 10.5),
    (now(), 'view', 2, 5.0),
    (now(), 'purchase', 1, 99.99)
''')

# Query aggregations
result = client.execute('''
    SELECT event_type, count() as cnt, sum(value) as total
    FROM events
    GROUP BY event_type
''')
for row in result:
    print(f"{row[0]}: {row[1]} events, total: {row[2]}")
```
Connecting with ClickHouse Clients/Drivers
Python (clickhouse-driver)
```python
from clickhouse_driver import Client

client = Client(
    host='localhost',
    port=9000,
    user='default',
    password='',
    database='default',
    compression=True,  # Enable LZ4 compression
    settings={'max_threads': 4}
)

result = client.execute('SELECT count() FROM events')
print(f"Total events: {result[0][0]}")
```
Go (clickhouse-go)
```go
import "github.com/ClickHouse/clickhouse-go/v2"

conn, err := clickhouse.Open(&clickhouse.Options{
    Addr: []string{"localhost:9000"},
    Auth: clickhouse.Auth{
        Database: "default",
        Username: "default",
        Password: "",
    },
    Compression: &clickhouse.Compression{
        Method: clickhouse.CompressionLZ4,
    },
})

ctx := context.Background()
var count uint64
err = conn.QueryRow(ctx, "SELECT count() FROM events").Scan(&count)
fmt.Printf("Total events: %d\n", count)
```
Node.js (@clickhouse/client)
```javascript
const { createClient } = require('@clickhouse/client');

const client = createClient({
    host: 'http://localhost:8123',
    database: 'default',
    username: 'default',
    password: '',
});

const result = await client.query({
    query: 'SELECT count() FROM events',
    format: 'JSONEachRow',
});
const data = await result.json();
console.log(`Total events: ${data[0]['count()']}`);
```
HTTP API (curl)
```bash
curl 'http://localhost:8123/' \
    --data-binary "SELECT count() FROM events"
```
Creating MergeTree Tables
Standard MergeTree
```sql
CREATE TABLE page_views (
    timestamp DateTime,
    session_id UUID,
    user_id UInt32,
    page_path String,
    duration_ms UInt32
) ENGINE = MergeTree()
ORDER BY (timestamp, user_id)
PARTITION BY toYYYYMMDD(timestamp);
```
ReplacingMergeTree (for Upserts)
```sql
CREATE TABLE user_profiles (
    user_id UInt32,
    version UInt32,
    name String,
    email String
) ENGINE = ReplacingMergeTree(version)
ORDER BY user_id;
```
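ReplacingMergeTree keeps, for each sorting-key value, the row with the highest version; `FINAL` forces that deduplication at query time. The semantics can be sketched in plain Python (a model of the observable behavior, not of how the engine is implemented):

```python
def dedupe_by_version(rows):
    """Keep the row with the highest version per user_id,
    mimicking ReplacingMergeTree(version) with ORDER BY user_id."""
    latest = {}
    for row in rows:  # row: (user_id, version, name, email)
        user_id, version = row[0], row[1]
        if user_id not in latest or version > latest[user_id][1]:
            latest[user_id] = row
    return sorted(latest.values())

rows = [
    (123, 1, "Ada", "ada@old.example"),
    (123, 2, "Ada", "ada@new.example"),  # higher version wins
    (456, 1, "Grace", "grace@example"),
]
print(dedupe_by_version(rows))
```

Note that the engine performs this merge lazily in the background, which is why the query below needs `FINAL` to guarantee deduplicated results.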
```sql
-- Query with deduplication
SELECT * FROM user_profiles FINAL WHERE user_id = 123;
```
AggregatingMergeTree (Pre-aggregation)
```sql
CREATE TABLE metrics_hourly (
    hour DateTime,
    metric_name String,
    total AggregateFunction(sum, Float64),
    count AggregateFunction(count, UInt64)
) ENGINE = AggregatingMergeTree()
ORDER BY (hour, metric_name);
```
Basic Analytical Queries
Aggregations
```sql
SELECT
    event_type,
    count() AS events,
    count(DISTINCT user_id) AS unique_users,
    sum(value) AS total_value,
    avg(value) AS avg_value,
    quantile(0.95)(value) AS p95_value
FROM events
WHERE timestamp > now() - INTERVAL 24 HOUR
GROUP BY event_type
ORDER BY events DESC;
```
PREWHERE for Optimization
```sql
-- Early filtering before reading heavy columns
SELECT user_id, event_type, value
FROM events
PREWHERE timestamp > '2025-01-01'
WHERE event_type = 'purchase';
```
Window Functions
```sql
SELECT
    timestamp,
    user_id,
    value,
    sum(value) OVER (PARTITION BY user_id ORDER BY timestamp) AS running_total,
    row_number() OVER (PARTITION BY user_id ORDER BY timestamp) AS event_seq
FROM events
ORDER BY user_id, timestamp;
```
Inserting Batch Data
Python Bulk Insert
```python
import datetime
import random

# Generate sample data
events = [
    (
        datetime.datetime.now() - datetime.timedelta(hours=random.randint(0, 24)),
        random.choice(['click', 'view', 'purchase']),
        random.randint(1, 10000),
        random.uniform(0, 100)
    )
    for _ in range(100000)
]

# Bulk insert
client.execute(
    'INSERT INTO events (timestamp, event_type, user_id, value) VALUES',
    events,
    types_check=True
)
print(f"Inserted {len(events)} events")
```
Inserting in Smaller Batches
```python
def bulk_insert(client, data, batch_size=50000):
    for i in range(0, len(data), batch_size):
        batch = data[i:i + batch_size]
        client.execute(
            'INSERT INTO events (timestamp, event_type, user_id, value) VALUES',
            batch
        )
        print(f"Inserted batch {i // batch_size + 1}")

bulk_insert(client, events, batch_size=25000)
```
Performance Tips
| Tip | Description |
|---|---|
| Use PREWHERE | Filter early on partition/sort key columns |
| Batch Inserts | Insert 10K-100K rows per batch |
| Compression | Enable LZ4 for network efficiency |
| Partitioning | Partition by time (day/month) for pruning |
| ORDER BY | Match common query filters to ORDER BY columns |
| SAMPLE | Use sampling for exploratory queries |
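The partitioning tip works because the engine can skip entire partitions whose key cannot match the query's filter. A toy Python model of daily partition pruning (illustrative only; the `partition_key` helper hand-computes a `toYYYYMMDD`-style key):

```python
from collections import defaultdict
from datetime import date

def partition_key(d: date) -> int:
    """Daily partition key, analogous to toYYYYMMDD(timestamp)."""
    return d.year * 10000 + d.month * 100 + d.day

# Build partitions: key -> rows in that partition
partitions = defaultdict(list)
for d, value in [(date(2025, 1, 1), 10), (date(2025, 1, 1), 5),
                 (date(2025, 1, 2), 7), (date(2025, 2, 1), 3)]:
    partitions[partition_key(d)].append(value)

# A filter on the partition key touches only the matching partition;
# the other partitions are never read at all.
scanned = partitions[20250101]
print(sum(scanned))  # 15
```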
Sampling for Large Datasets
Note that `SAMPLE` requires the table to declare a `SAMPLE BY` clause in its MergeTree definition.
```sql
-- Analyze 10% of data quickly
SELECT event_type, count() as cnt
FROM events
SAMPLE 0.1
GROUP BY event_type;
```
Materialized Views for Pre-aggregation
```sql
CREATE MATERIALIZED VIEW events_hourly
ENGINE = AggregatingMergeTree()
ORDER BY (hour, event_type)
AS SELECT
    toStartOfHour(timestamp) AS hour,
    event_type,
    count() AS event_count,
    sum(value) AS total_value
FROM events
GROUP BY hour, event_type;
```
Key Features
| Feature | Status | Notes |
|---|---|---|
| Native Protocol | 100% | Port 9000 |
| HTTP Protocol | 100% | Port 8123 |
| MergeTree Engines | 100% | All variants |
| PREWHERE | Supported | Early filtering |
| Window Functions | Supported | Analytics |
| Materialized Views | Supported | Auto-aggregation |
Performance Benchmarks
| Operation | Speed | Notes |
|---|---|---|
| count() | 11.7M rows/ms | 1B rows in 85ms |
| GROUP BY | 4.1M rows/ms | 100K groups |
| Bulk Insert | 1.5M rows/sec | 100K-row batches |
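The throughput figures follow directly from the row counts and timings; for instance, 1B rows scanned in 85 ms works out to roughly 11.76M rows per millisecond:

```python
rows = 1_000_000_000
elapsed_ms = 85
rows_per_ms = rows / elapsed_ms
print(f"{rows_per_ms / 1e6:.2f}M rows/ms")  # 11.76M rows/ms
```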
Next Steps
- ClickHouse Configuration - Connection configuration
- ClickHouse Compatibility - Feature compatibility
- ClickHouse Examples - Advanced usage patterns
- Protocol Compatibility Matrix - Cross-protocol comparison
Last Updated: January 2026