
ClickHouse Quick Start Guide

Get started with HeliosDB’s ClickHouse protocol support for high-performance analytical queries. Connect using standard ClickHouse clients with full native protocol compatibility.

Prerequisites

  • HeliosDB server running (default native port: 9000, HTTP port: 8123)
  • ClickHouse client library (Python, Go, Node.js, or clickhouse-client)
  • Familiarity with SQL and analytical workloads

Connection Details

| Parameter   | Default Value |
| ----------- | ------------- |
| Host        | localhost     |
| Native Port | 9000          |
| HTTP Port   | 8123          |
| User        | default       |
| Database    | default       |
| Compression | LZ4, ZSTD     |

5-Minute Getting Started

Step 1: Connect with Python Client

```python
from clickhouse_driver import Client

client = Client(
    host='localhost',
    port=9000,
    user='default',
    password='',
    database='default'
)

# Test connection
result = client.execute('SELECT 1')
print(f"Connected: {result}")
```

Step 2: Create a MergeTree Table

```sql
CREATE TABLE events (
    timestamp DateTime,
    event_type String,
    user_id UInt32,
    value Float32
) ENGINE = MergeTree()
ORDER BY (timestamp, user_id)
PARTITION BY toYYYYMM(timestamp);
```
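`PARTITION BY toYYYYMM(timestamp)` stores each calendar month in its own partition, so time-range queries can skip whole partitions. A quick sketch of the mapping in plain Python (the helper name is ours, for illustration only, not part of any client API):

```python
from datetime import datetime

def to_yyyymm(ts: datetime) -> int:
    """Mimic ClickHouse's toYYYYMM: map a timestamp to its month partition key."""
    return ts.year * 100 + ts.month

# Rows from the same month share a partition and can be pruned together.
print(to_yyyymm(datetime(2025, 1, 15)))  # 202501
print(to_yyyymm(datetime(2025, 2, 3)))   # 202502
```

A query filtered to January 2025 then only ever touches the `202501` partition.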

Step 3: Insert and Query Data

```python
# Insert data
client.execute('''
    INSERT INTO events (timestamp, event_type, user_id, value)
    VALUES
        (now(), 'click', 1, 10.5),
        (now(), 'view', 2, 5.0),
        (now(), 'purchase', 1, 99.99)
''')

# Query aggregations
result = client.execute('''
    SELECT event_type, count() AS cnt, sum(value) AS total
    FROM events
    GROUP BY event_type
''')
for row in result:
    print(f"{row[0]}: {row[1]} events, total: {row[2]}")
```

Connecting with ClickHouse Clients/Drivers

Python (clickhouse-driver)

```python
from clickhouse_driver import Client

client = Client(
    host='localhost',
    port=9000,
    user='default',
    password='',
    database='default',
    compression=True,  # Enable LZ4 compression
    settings={'max_threads': 4}
)

result = client.execute('SELECT count() FROM events')
print(f"Total events: {result[0][0]}")
```

Go (clickhouse-go)

```go
import (
    "context"
    "fmt"

    "github.com/ClickHouse/clickhouse-go/v2"
)

conn, err := clickhouse.Open(&clickhouse.Options{
    Addr: []string{"localhost:9000"},
    Auth: clickhouse.Auth{
        Database: "default",
        Username: "default",
        Password: "",
    },
    Compression: &clickhouse.Compression{
        Method: clickhouse.CompressionLZ4,
    },
})
if err != nil {
    panic(err)
}

ctx := context.Background()
var count uint64
if err = conn.QueryRow(ctx, "SELECT count() FROM events").Scan(&count); err != nil {
    panic(err)
}
fmt.Printf("Total events: %d\n", count)
```

Node.js (@clickhouse/client)

```javascript
const { createClient } = require('@clickhouse/client');

const client = createClient({
    url: 'http://localhost:8123',  // `host` in older client versions
    database: 'default',
    username: 'default',
    password: '',
});

// `await` is only valid inside an async function in CommonJS modules
async function main() {
    const result = await client.query({
        query: 'SELECT count() FROM events',
        format: 'JSONEachRow',
    });
    const data = await result.json();
    console.log(`Total events: ${data[0]['count()']}`);
}

main();
```

HTTP API (curl)

```sh
curl 'http://localhost:8123/' \
    --data-binary "SELECT count() FROM events"
```
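The same HTTP endpoint works from any HTTP library: the query simply goes in the POST body, exactly like `--data-binary`. A minimal sketch with Python's standard library, assuming the default server from the table above (a missing server is caught and reported, so the snippet runs either way):

```python
import urllib.request
import urllib.error

query = "SELECT count() FROM events"
req = urllib.request.Request(
    "http://localhost:8123/",
    data=query.encode("utf-8"),  # the query is the POST body, like --data-binary
    method="POST",
)

try:
    with urllib.request.urlopen(req, timeout=5) as resp:
        print(resp.read().decode().strip())
except urllib.error.URLError as err:
    print(f"server not reachable: {err}")
```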

Creating MergeTree Tables

Standard MergeTree

```sql
CREATE TABLE page_views (
    timestamp DateTime,
    session_id UUID,
    user_id UInt32,
    page_path String,
    duration_ms UInt32
) ENGINE = MergeTree()
ORDER BY (timestamp, user_id)
PARTITION BY toYYYYMMDD(timestamp);
```

ReplacingMergeTree (for Upserts)

```sql
CREATE TABLE user_profiles (
    user_id UInt32,
    version UInt32,
    name String,
    email String
) ENGINE = ReplacingMergeTree(version)
ORDER BY user_id;

-- Query with deduplication
SELECT * FROM user_profiles FINAL WHERE user_id = 123;
```
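`FINAL` collapses rows that share the same ORDER BY key, keeping the row with the highest `version`. Conceptually the deduplication rule looks like this (a plain-Python sketch with made-up sample rows, not driver code):

```python
rows = [
    # (user_id, version, name, email)
    (123, 1, "Ada", "ada@old.example"),
    (123, 2, "Ada", "ada@new.example"),
    (456, 1, "Grace", "grace@example"),
]

# Keep the highest-version row per user_id, like SELECT ... FINAL.
latest = {}
for user_id, version, name, email in rows:
    if user_id not in latest or version > latest[user_id][0]:
        latest[user_id] = (version, name, email)

print(latest[123])  # (2, 'Ada', 'ada@new.example')
```

Without `FINAL`, both versions of user 123 may still be visible until background merges run.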

AggregatingMergeTree (Pre-aggregation)

```sql
CREATE TABLE metrics_hourly (
    hour DateTime,
    metric_name String,
    total AggregateFunction(sum, Float64),
    count AggregateFunction(count, UInt64)
) ENGINE = AggregatingMergeTree()
ORDER BY (hour, metric_name);
```
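`AggregateFunction` columns hold partial aggregation states rather than finished numbers: you write them with the `-State` combinator (e.g. `sumState(value)`) and read them with `-Merge` (e.g. `sumMerge(total)`). The value of states is that they merge associatively across data parts. A plain-Python sketch of a sum/count state (a toy class of ours, not the ClickHouse wire format):

```python
class SumCountState:
    """Toy partial-aggregate state: mergeable, like ClickHouse -State values."""
    def __init__(self, total=0.0, count=0):
        self.total, self.count = total, count

    def add(self, value):
        self.total += value
        self.count += 1

    def merge(self, other):
        # Merging two states gives the same answer as aggregating all rows at once.
        return SumCountState(self.total + other.total, self.count + other.count)

# Two data parts aggregated independently...
part_a, part_b = SumCountState(), SumCountState()
for v in [10.5, 5.0]:
    part_a.add(v)
part_b.add(99.99)

# ...then combined at merge (or query) time.
merged = part_a.merge(part_b)
print(round(merged.total, 2), merged.count)  # 115.49 3
```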

Basic Analytical Queries

Aggregations

```sql
SELECT
    event_type,
    count() AS events,
    count(DISTINCT user_id) AS unique_users,
    sum(value) AS total_value,
    avg(value) AS avg_value,
    quantile(0.95)(value) AS p95_value
FROM events
WHERE timestamp > now() - INTERVAL 24 HOUR
GROUP BY event_type
ORDER BY events DESC;
```
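`quantile(0.95)(value)` returns an approximate 95th percentile, which is much cheaper than an exact sort at billions of rows. On a small list the exact version is just an index into the sorted values; a sketch using the nearest-rank convention (one of several quantile conventions, with made-up sample data):

```python
def quantile(values, q):
    """Exact nearest-rank quantile; ClickHouse's quantile() approximates this."""
    ordered = sorted(values)
    idx = min(int(q * len(ordered)), len(ordered) - 1)
    return ordered[idx]

latencies = [12, 7, 30, 5, 22, 18, 9, 41, 15, 27]
print(quantile(latencies, 0.95))  # 41
```

ClickHouse also offers `quantileExact` when the approximation error matters.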

PREWHERE for Optimization

-- Early filtering before reading heavy columns
SELECT user_id, event_type, value
FROM events
PREWHERE timestamp > '2025-01-01'
WHERE event_type = 'purchase';
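PREWHERE reads only the columns named in its condition first, then fetches the remaining columns solely for rows that passed. A plain-Python sketch of that two-phase read over columnar data (lists stand in for column files; the data is made up):

```python
# Columnar layout: each column is stored and read separately.
timestamps  = ["2024-12-30", "2025-01-02", "2025-01-05"]
event_types = ["purchase", "view", "purchase"]
values      = [99.99, 5.0, 49.5]

# Phase 1: evaluate the PREWHERE condition on the cheap column only.
passing = [i for i, ts in enumerate(timestamps) if ts > "2025-01-01"]

# Phase 2: read the remaining columns just for surviving rows, then apply WHERE.
rows = [(event_types[i], values[i]) for i in passing
        if event_types[i] == "purchase"]
print(rows)  # [('purchase', 49.5)]
```

The heavy columns are never materialized for rows the timestamp filter already rejected.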

Window Functions

```sql
SELECT
    timestamp,
    user_id,
    value,
    sum(value) OVER (PARTITION BY user_id ORDER BY timestamp) AS running_total,
    row_number() OVER (PARTITION BY user_id ORDER BY timestamp) AS event_seq
FROM events
ORDER BY user_id, timestamp;
```
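The `running_total` column accumulates per user in timestamp order. The equivalent logic in plain Python, for intuition (made-up rows; timestamps simplified to integers):

```python
from collections import defaultdict

events = [
    # (timestamp, user_id, value) — already sorted by (user_id, timestamp)
    (1, 1, 10.5),
    (2, 1, 99.99),
    (1, 2, 5.0),
]

# PARTITION BY user_id: one accumulator per user.
# ORDER BY timestamp: add values in time order.
running = defaultdict(float)
for ts, user_id, value in events:
    running[user_id] += value
    print(user_id, ts, round(running[user_id], 2))
```

Each partition restarts the sum, which is why user 2's total begins again at 5.0.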

Inserting Batch Data

Python Bulk Insert

```python
import datetime
import random

# Generate sample data
events = [
    (
        datetime.datetime.now() - datetime.timedelta(hours=random.randint(0, 24)),
        random.choice(['click', 'view', 'purchase']),
        random.randint(1, 10000),
        random.uniform(0, 100)
    )
    for _ in range(100000)
]

# Bulk insert
client.execute(
    'INSERT INTO events (timestamp, event_type, user_id, value) VALUES',
    events,
    types_check=True
)
print(f"Inserted {len(events)} events")
```

Inserting in Chunks

```python
def bulk_insert(client, data, batch_size=50000):
    for i in range(0, len(data), batch_size):
        batch = data[i:i + batch_size]
        client.execute(
            'INSERT INTO events (timestamp, event_type, user_id, value) VALUES',
            batch
        )
        print(f"Inserted batch {i // batch_size + 1}")

bulk_insert(client, events, batch_size=25000)
```

Performance Tips

| Tip           | Description                                    |
| ------------- | ---------------------------------------------- |
| Use PREWHERE  | Filter early on partition/sort key columns     |
| Batch Inserts | Insert 10K-100K rows per batch                 |
| Compression   | Enable LZ4 for network efficiency              |
| Partitioning  | Partition by time (day/month) for pruning      |
| ORDER BY      | Match common query filters to ORDER BY columns |
| SAMPLE        | Use sampling for exploratory queries           |

Sampling for Large Datasets

```sql
-- Analyze ~10% of data quickly
-- (requires a SAMPLE BY clause in the table definition)
SELECT event_type, count() AS cnt
FROM events
SAMPLE 0.1
GROUP BY event_type;
```
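`SAMPLE 0.1` reads roughly 10% of the rows, and plain `count()` is not scaled automatically: to estimate the full-table total you multiply by the inverse of the sampling fraction (ClickHouse exposes this multiplier as the `_sample_factor` virtual column). The arithmetic, with a made-up sampled count:

```python
sample_fraction = 0.1
sampled_count = 9_870  # count() observed over the 10% sample (illustrative number)

# Scale by 1 / fraction — the role _sample_factor plays in ClickHouse.
estimated_total = round(sampled_count / sample_fraction)
print(estimated_total)  # 98700
```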

Materialized Views for Pre-aggregation

An AggregatingMergeTree target stores partial states, so the view writes with `-State` combinators and readers finish the aggregation with the matching `-Merge` combinators.

```sql
CREATE MATERIALIZED VIEW events_hourly
ENGINE = AggregatingMergeTree()
ORDER BY (hour, event_type)
AS SELECT
    toStartOfHour(timestamp) AS hour,
    event_type,
    countState() AS event_count,
    sumState(value) AS total_value
FROM events
GROUP BY hour, event_type;

-- Read back with the matching -Merge combinators
SELECT hour, event_type,
       countMerge(event_count) AS event_count,
       sumMerge(total_value) AS total_value
FROM events_hourly
GROUP BY hour, event_type;
```

Key Features

| Feature            | Status    | Notes            |
| ------------------ | --------- | ---------------- |
| Native Protocol    | 100%      | Port 9000        |
| HTTP Protocol      | 100%      | Port 8123        |
| MergeTree Engines  | 100%      | All variants     |
| PREWHERE           | Supported | Early filtering  |
| Window Functions   | Supported | Analytics        |
| Materialized Views | Supported | Auto-aggregation |

Performance Benchmarks

| Operation   | Speed         | Notes           |
| ----------- | ------------- | --------------- |
| count()     | 11.7M rows/ms | 1B rows in 85ms |
| GROUP BY    | 4.1M rows/ms  | 100K groups     |
| Bulk Insert | 1.5M rows/sec | 100K batches    |

Next Steps


Last Updated: January 2026