
ClickHouse Quick Start Guide

Get started with HeliosDB’s ClickHouse protocol support for high-performance analytical queries. Connect using standard ClickHouse clients with full native protocol compatibility.

Prerequisites

  • HeliosDB server running (default native port: 9000, HTTP port: 8123)
  • ClickHouse client library (Python, Go, Node.js, or clickhouse-client)
  • Familiarity with SQL and analytical workloads

Connection Details

| Parameter   | Default Value |
| ----------- | ------------- |
| Host        | localhost     |
| Native Port | 9000          |
| HTTP Port   | 8123          |
| User        | default       |
| Database    | default       |
| Compression | LZ4, ZSTD     |

5-Minute Getting Started

Step 1: Connect with Python Client

```python
from clickhouse_driver import Client

client = Client(
    host='localhost',
    port=9000,
    user='default',
    password='',
    database='default'
)

# Test connection
result = client.execute('SELECT 1')
print(f"Connected: {result}")
```

Step 2: Create a MergeTree Table

```sql
CREATE TABLE events (
    timestamp DateTime,
    event_type String,
    user_id UInt32,
    value Float32
) ENGINE = MergeTree()
ORDER BY (timestamp, user_id)
PARTITION BY toYYYYMM(timestamp);
```
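`PARTITION BY toYYYYMM(timestamp)` stores each calendar month in its own partition, so time-range queries can skip whole partitions. A quick sketch of the mapping in plain Python (the helper name is ours, for illustration only, not part of any client API):

```python
from datetime import datetime

def to_yyyymm(ts: datetime) -> int:
    """Mimic ClickHouse's toYYYYMM: map a timestamp to its month partition key."""
    return ts.year * 100 + ts.month

# Rows from the same month share a partition and can be pruned together.
print(to_yyyymm(datetime(2025, 1, 15)))  # 202501
print(to_yyyymm(datetime(2025, 2, 3)))   # 202502
```

A query filtered to January 2025 then only ever touches the `202501` partition.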

Step 3: Insert and Query Data

```python
# Insert data
client.execute('''
    INSERT INTO events (timestamp, event_type, user_id, value)
    VALUES
        (now(), 'click', 1, 10.5),
        (now(), 'view', 2, 5.0),
        (now(), 'purchase', 1, 99.99)
''')

# Query aggregations
result = client.execute('''
    SELECT event_type, count() AS cnt, sum(value) AS total
    FROM events
    GROUP BY event_type
''')
for row in result:
    print(f"{row[0]}: {row[1]} events, total: {row[2]}")
```

Connecting with ClickHouse Clients/Drivers

Python (clickhouse-driver)

```python
from clickhouse_driver import Client

client = Client(
    host='localhost',
    port=9000,
    user='default',
    password='',
    database='default',
    compression=True,  # Enable LZ4 compression
    settings={'max_threads': 4}
)

result = client.execute('SELECT count() FROM events')
print(f"Total events: {result[0][0]}")
```

Go (clickhouse-go)

```go
import (
    "context"
    "fmt"

    "github.com/ClickHouse/clickhouse-go/v2"
)

conn, err := clickhouse.Open(&clickhouse.Options{
    Addr: []string{"localhost:9000"},
    Auth: clickhouse.Auth{
        Database: "default",
        Username: "default",
        Password: "",
    },
    Compression: &clickhouse.Compression{
        Method: clickhouse.CompressionLZ4,
    },
})
if err != nil {
    panic(err)
}

ctx := context.Background()
var count uint64
if err = conn.QueryRow(ctx, "SELECT count() FROM events").Scan(&count); err != nil {
    panic(err)
}
fmt.Printf("Total events: %d\n", count)
```

Node.js (@clickhouse/client)

```javascript
const { createClient } = require('@clickhouse/client');

const client = createClient({
    url: 'http://localhost:8123',  // `host` in older client versions
    database: 'default',
    username: 'default',
    password: '',
});

// `await` is only valid inside an async function in CommonJS modules
async function main() {
    const result = await client.query({
        query: 'SELECT count() FROM events',
        format: 'JSONEachRow',
    });
    const data = await result.json();
    console.log(`Total events: ${data[0]['count()']}`);
}

main();
```

HTTP API (curl)

```sh
curl 'http://localhost:8123/' \
    --data-binary "SELECT count() FROM events"
```
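The same HTTP endpoint works from any HTTP library: the query simply goes in the POST body, exactly like `--data-binary`. A minimal sketch with Python's standard library, assuming the default server from the table above (a missing server is caught and reported, so the snippet runs either way):

```python
import urllib.request
import urllib.error

query = "SELECT count() FROM events"
req = urllib.request.Request(
    "http://localhost:8123/",
    data=query.encode("utf-8"),  # the query is the POST body, like --data-binary
    method="POST",
)

try:
    with urllib.request.urlopen(req, timeout=5) as resp:
        print(resp.read().decode().strip())
except urllib.error.URLError as err:
    print(f"server not reachable: {err}")
```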

Creating MergeTree Tables

Standard MergeTree

```sql
CREATE TABLE page_views (
    timestamp DateTime,
    session_id UUID,
    user_id UInt32,
    page_path String,
    duration_ms UInt32
) ENGINE = MergeTree()
ORDER BY (timestamp, user_id)
PARTITION BY toYYYYMMDD(timestamp);
```

ReplacingMergeTree (for Upserts)

```sql
CREATE TABLE user_profiles (
    user_id UInt32,
    version UInt32,
    name String,
    email String
) ENGINE = ReplacingMergeTree(version)
ORDER BY user_id;

-- Query with deduplication
SELECT * FROM user_profiles FINAL WHERE user_id = 123;
```
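`FINAL` collapses rows that share the same ORDER BY key, keeping the row with the highest `version`. Conceptually the deduplication rule looks like this (a plain-Python sketch with made-up sample rows, not driver code):

```python
rows = [
    # (user_id, version, name, email)
    (123, 1, "Ada", "ada@old.example"),
    (123, 2, "Ada", "ada@new.example"),
    (456, 1, "Grace", "grace@example"),
]

# Keep the highest-version row per user_id, like SELECT ... FINAL.
latest = {}
for user_id, version, name, email in rows:
    if user_id not in latest or version > latest[user_id][0]:
        latest[user_id] = (version, name, email)

print(latest[123])  # (2, 'Ada', 'ada@new.example')
```

Without `FINAL`, both versions of user 123 may still be visible until background merges run.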

AggregatingMergeTree (Pre-aggregation)

```sql
CREATE TABLE metrics_hourly (
    hour DateTime,
    metric_name String,
    total AggregateFunction(sum, Float64),
    count AggregateFunction(count, UInt64)
) ENGINE = AggregatingMergeTree()
ORDER BY (hour, metric_name);
```
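`AggregateFunction` columns hold partial aggregation states rather than finished numbers: you write them with the `-State` combinator (e.g. `sumState(value)`) and read them with `-Merge` (e.g. `sumMerge(total)`). The value of states is that they merge associatively across data parts. A plain-Python sketch of a sum/count state (a toy class of ours, not the ClickHouse wire format):

```python
class SumCountState:
    """Toy partial-aggregate state: mergeable, like ClickHouse -State values."""
    def __init__(self, total=0.0, count=0):
        self.total, self.count = total, count

    def add(self, value):
        self.total += value
        self.count += 1

    def merge(self, other):
        # Merging two states gives the same answer as aggregating all rows at once.
        return SumCountState(self.total + other.total, self.count + other.count)

# Two data parts aggregated independently...
part_a, part_b = SumCountState(), SumCountState()
for v in [10.5, 5.0]:
    part_a.add(v)
part_b.add(99.99)

# ...then combined at merge (or query) time.
merged = part_a.merge(part_b)
print(round(merged.total, 2), merged.count)  # 115.49 3
```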

Basic Analytical Queries

Aggregations

```sql
SELECT
    event_type,
    count() AS events,
    count(DISTINCT user_id) AS unique_users,
    sum(value) AS total_value,
    avg(value) AS avg_value,
    quantile(0.95)(value) AS p95_value
FROM events
WHERE timestamp > now() - INTERVAL 24 HOUR
GROUP BY event_type
ORDER BY events DESC;
```
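`quantile(0.95)(value)` returns an approximate 95th percentile, which is much cheaper than an exact sort at billions of rows. On a small list the exact version is just an index into the sorted values; a sketch using the nearest-rank convention (one of several quantile conventions, with made-up sample data):

```python
def quantile(values, q):
    """Exact nearest-rank quantile; ClickHouse's quantile() approximates this."""
    ordered = sorted(values)
    idx = min(int(q * len(ordered)), len(ordered) - 1)
    return ordered[idx]

latencies = [12, 7, 30, 5, 22, 18, 9, 41, 15, 27]
print(quantile(latencies, 0.95))  # 41
```

ClickHouse also offers `quantileExact` when the approximation error matters.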

PREWHERE for Optimization

-- Early filtering before reading heavy columns
SELECT user_id, event_type, value
FROM events
PREWHERE timestamp > '2025-01-01'
WHERE event_type = 'purchase';
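PREWHERE reads only the columns named in its condition first, then fetches the remaining columns solely for rows that passed. A plain-Python sketch of that two-phase read over columnar data (lists stand in for column files; the data is made up):

```python
# Columnar layout: each column is stored and read separately.
timestamps  = ["2024-12-30", "2025-01-02", "2025-01-05"]
event_types = ["purchase", "view", "purchase"]
values      = [99.99, 5.0, 49.5]

# Phase 1: evaluate the PREWHERE condition on the cheap column only.
passing = [i for i, ts in enumerate(timestamps) if ts > "2025-01-01"]

# Phase 2: read the remaining columns just for surviving rows, then apply WHERE.
rows = [(event_types[i], values[i]) for i in passing
        if event_types[i] == "purchase"]
print(rows)  # [('purchase', 49.5)]
```

The heavy columns are never materialized for rows the timestamp filter already rejected.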

Window Functions

```sql
SELECT
    timestamp,
    user_id,
    value,
    sum(value) OVER (PARTITION BY user_id ORDER BY timestamp) AS running_total,
    row_number() OVER (PARTITION BY user_id ORDER BY timestamp) AS event_seq
FROM events
ORDER BY user_id, timestamp;
```
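The `running_total` column accumulates per user in timestamp order. The equivalent logic in plain Python, for intuition (made-up rows; timestamps simplified to integers):

```python
from collections import defaultdict

events = [
    # (timestamp, user_id, value) — already sorted by (user_id, timestamp)
    (1, 1, 10.5),
    (2, 1, 99.99),
    (1, 2, 5.0),
]

# PARTITION BY user_id: one accumulator per user.
# ORDER BY timestamp: add values in time order.
running = defaultdict(float)
for ts, user_id, value in events:
    running[user_id] += value
    print(user_id, ts, round(running[user_id], 2))
```

Each partition restarts the sum, which is why user 2's total begins again at 5.0.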

Inserting Batch Data

Python Bulk Insert

```python
import datetime
import random

# Generate sample data
events = [
    (
        datetime.datetime.now() - datetime.timedelta(hours=random.randint(0, 24)),
        random.choice(['click', 'view', 'purchase']),
        random.randint(1, 10000),
        random.uniform(0, 100)
    )
    for _ in range(100000)
]

# Bulk insert
client.execute(
    'INSERT INTO events (timestamp, event_type, user_id, value) VALUES',
    events,
    types_check=True
)
print(f"Inserted {len(events)} events")
```

Inserting in Chunks

```python
def bulk_insert(client, data, batch_size=50000):
    for i in range(0, len(data), batch_size):
        batch = data[i:i + batch_size]
        client.execute(
            'INSERT INTO events (timestamp, event_type, user_id, value) VALUES',
            batch
        )
        print(f"Inserted batch {i // batch_size + 1}")

bulk_insert(client, events, batch_size=25000)
```

Performance Tips

| Tip           | Description                                    |
| ------------- | ---------------------------------------------- |
| Use PREWHERE  | Filter early on partition/sort key columns     |
| Batch Inserts | Insert 10K-100K rows per batch                 |
| Compression   | Enable LZ4 for network efficiency              |
| Partitioning  | Partition by time (day/month) for pruning      |
| ORDER BY      | Match common query filters to ORDER BY columns |
| SAMPLE        | Use sampling for exploratory queries           |

Sampling for Large Datasets

```sql
-- Analyze ~10% of data quickly
-- (requires a SAMPLE BY clause in the table definition)
SELECT event_type, count() AS cnt
FROM events
SAMPLE 0.1
GROUP BY event_type;
```
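`SAMPLE 0.1` reads roughly 10% of the rows, and plain `count()` is not scaled automatically: to estimate the full-table total you multiply by the inverse of the sampling fraction (ClickHouse exposes this multiplier as the `_sample_factor` virtual column). The arithmetic, with a made-up sampled count:

```python
sample_fraction = 0.1
sampled_count = 9_870  # count() observed over the 10% sample (illustrative number)

# Scale by 1 / fraction — the role _sample_factor plays in ClickHouse.
estimated_total = round(sampled_count / sample_fraction)
print(estimated_total)  # 98700
```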

Materialized Views for Pre-aggregation

An AggregatingMergeTree target stores partial states, so the view writes with `-State` combinators and readers finish the aggregation with the matching `-Merge` combinators.

```sql
CREATE MATERIALIZED VIEW events_hourly
ENGINE = AggregatingMergeTree()
ORDER BY (hour, event_type)
AS SELECT
    toStartOfHour(timestamp) AS hour,
    event_type,
    countState() AS event_count,
    sumState(value) AS total_value
FROM events
GROUP BY hour, event_type;

-- Read back with the matching -Merge combinators
SELECT hour, event_type,
       countMerge(event_count) AS event_count,
       sumMerge(total_value) AS total_value
FROM events_hourly
GROUP BY hour, event_type;
```

Key Features

| Feature            | Status    | Notes            |
| ------------------ | --------- | ---------------- |
| Native Protocol    | 100%      | Port 9000        |
| HTTP Protocol      | 100%      | Port 8123        |
| MergeTree Engines  | 100%      | All variants     |
| PREWHERE           | Supported | Early filtering  |
| Window Functions   | Supported | Analytics        |
| Materialized Views | Supported | Auto-aggregation |

Performance Benchmarks

| Operation   | Speed         | Notes           |
| ----------- | ------------- | --------------- |
| count()     | 11.7M rows/ms | 1B rows in 85ms |
| GROUP BY    | 4.1M rows/ms  | 100K groups     |
| Bulk Insert | 1.5M rows/sec | 100K batches    |

Next Steps


Last Updated: January 2026