Python Vector App — Semantic Search End-to-End

Available since: v3.13.0 (vector + hybrid surface stable)
Stack: psycopg2-binary + sentence-transformers (all-MiniLM-L6-v2, 384-dim)
HeliosDB build: default — no feature flag required (vector search is on by default)


UVP

Most “build a semantic search app” tutorials force you to bolt three services together: PostgreSQL for metadata, Pinecone/Qdrant for the vector index, and Elasticsearch for keyword fallback. HeliosDB Nano collapses all three into one binary. One connection string, one transaction, one cursor. This walkthrough takes you from pip install to a working hybrid (BM25 + vector) search service in ~150 lines of Python — schema, embedding generation, k-NN retrieval, and the keyword-blended scorer all live in the same database.


Prerequisites

Terminal window
pip install psycopg2-binary sentence-transformers numpy

A running Nano server:

Terminal window
heliosdb-nano start --memory --auth scram-sha-256 --password s3cret

(Or --data-dir ./mydata if you want persistence — the rest of the tutorial is identical.)


1. Connect

import psycopg2
conn = psycopg2.connect(
"postgresql://postgres:s3cret@127.0.0.1:5432/postgres"
)
conn.autocommit = True
cur = conn.cursor()

psycopg2 talks the PostgreSQL v3 wire protocol; HeliosDB Nano implements it natively, so every psycopg2 feature (execute, executemany, fetchall, server-side cursors, COPY FROM) works unchanged.
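
Before creating any schema, a one-line round trip is a cheap way to confirm the handshake (a minimal sketch, nothing HeliosDB-specific):

# Simple round trip over the wire protocol; should return a single row (1,).
cur.execute("SELECT 1;")
assert cur.fetchone() == (1,)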


2. Schema — Vector + Text + Index

cur.execute("""
CREATE TABLE IF NOT EXISTS docs (
id SERIAL PRIMARY KEY,
title TEXT NOT NULL,
body TEXT NOT NULL,
category TEXT,
embedding VECTOR(384)
);
""")
# HNSW index for k-NN
cur.execute("""
CREATE INDEX IF NOT EXISTS docs_embedding_idx
ON docs USING hnsw (embedding vector_cosine_ops);
""")

VECTOR(384) matches the dimensionality of all-MiniLM-L6-v2. The vector_cosine_ops opclass tells the HNSW index to use cosine distance (the <=> operator at query time).
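
If you want to catch a dimension mismatch before it surfaces as an insert error, a quick sanity check against the model (loaded here slightly ahead of the next section) is enough; a sketch using the standard sentence-transformers API:

from sentence_transformers import SentenceTransformer

# The model's output width must match the VECTOR(384) column declared above.
model = SentenceTransformer("all-MiniLM-L6-v2")
assert model.get_sentence_embedding_dimension() == 384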


3. Generate and Store Embeddings

from sentence_transformers import SentenceTransformer

# 80 MB download on first run
model = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    ("Intro to Rust", "Rust is a systems programming language", "programming"),
    ("Postgres for analysts", "PostgreSQL is a powerful relational DB", "databases"),
    ("Helios Nano release", "Embedded DB with vector search built-in", "databases"),
    ("Sentence transformers", "BERT models for semantic embeddings", "ml"),
    ("FIPS 140-3 compliance", "Federal cryptographic standards", "security"),
    ("Hybrid retrieval", "Combining BM25 with dense vectors", "ml"),
]

# Embed in one batch (much faster than per-row encode)
embeddings = model.encode([d[1] for d in documents], normalize_embeddings=True)

# Store — serialize each vector as a bracketed vector literal for psycopg2
def vec_lit(v):
    return "[" + ",".join(f"{x:.6f}" for x in v) + "]"

for (title, body, cat), vec in zip(documents, embeddings):
    cur.execute(
        "INSERT INTO docs (title, body, category, embedding) "
        "VALUES (%s, %s, %s, %s::vector)",
        (title, body, cat, vec_lit(vec))
    )

normalize_embeddings=True puts every vector on the unit sphere so cosine distance equals 1 − dot_product — the HNSW index’s preferred regime.
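
As a standalone sanity check (not part of the app), you can verify numerically that, for unit-length vectors, full cosine distance and 1 − dot product agree:

import numpy as np

a, b = model.encode(
    ["encrypt data at rest", "federal cryptographic standards"],
    normalize_embeddings=True,
)
# With unit-length vectors, cosine distance collapses to 1 - dot product.
full_cosine = 1 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
assert abs(full_cosine - (1 - np.dot(a, b))) < 1e-5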


4. Vector Search

def vector_search(query, k=5, category=None):
    qvec = model.encode([query], normalize_embeddings=True)[0]
    if category:
        cur.execute(
            "SELECT id, title, body, embedding <=> %s::vector AS distance "
            "FROM docs "
            "WHERE category = %s "
            "ORDER BY distance "
            "LIMIT %s",
            (vec_lit(qvec), category, k)
        )
    else:
        cur.execute(
            "SELECT id, title, body, embedding <=> %s::vector AS distance "
            "FROM docs "
            "ORDER BY distance "
            "LIMIT %s",
            (vec_lit(qvec), k)
        )
    return cur.fetchall()

for row in vector_search("how do I encrypt data at rest?", k=3):
    print(f"  dist={row[3]:.3f}  {row[1]}")

<=> is cosine distance. Other operators:

Operator | Distance               | Best for
<=>      | Cosine                 | Normalized text embeddings
<->      | L2 (Euclidean)         | Image embeddings, raw vectors
<#>      | Negative inner product | Already-normalized vectors + speed

The HNSW index kicks in automatically when the query has an ORDER BY <distance> LIMIT k shape. EXPLAIN will show Index Scan using docs_embedding_idx (HNSW).
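
To confirm the index is actually picked up, you can run EXPLAIN through the same cursor; a sketch, with the exact plan text depending on your HeliosDB build:

# Print the plan for a typical k-NN query; expect a line mentioning
# docs_embedding_idx if the HNSW index is used.
qvec = model.encode(["encrypt data at rest"], normalize_embeddings=True)[0]
cur.execute(
    "EXPLAIN SELECT id, title FROM docs "
    "ORDER BY embedding <=> %s::vector LIMIT 3",
    (vec_lit(qvec),)
)
for (plan_line,) in cur.fetchall():
    print(plan_line)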


5. Hybrid Search — BM25 + Vector

A blend of vector similarity and keyword score, commonly weighted around 70/30, is a reliable retrieval recipe in production. HeliosDB exposes both in the same query:

def hybrid_search(query, k=5, alpha=0.7):
    """alpha * vector_score + (1 - alpha) * bm25_score"""
    qvec = model.encode([query], normalize_embeddings=True)[0]
    cur.execute("""
        SELECT id, title, body,
               %(alpha)s * (1.0 - (embedding <=> %(qv)s::vector))
             + (1.0 - %(alpha)s) * ts_rank_cd(to_tsvector(body), plainto_tsquery(%(q)s))
               AS score
        FROM docs
        WHERE to_tsvector(body) @@ plainto_tsquery(%(q)s)
           OR embedding <=> %(qv)s::vector < 0.5
        ORDER BY score DESC
        LIMIT %(k)s
    """, {"alpha": alpha, "qv": vec_lit(qvec), "q": query, "k": k})
    return cur.fetchall()

for row in hybrid_search("vector search compliance", k=5):
    print(f"  score={row[3]:.3f}  {row[1]}")

The WHERE clause is a recall floor — return docs that match either keyword (BM25) or are within cosine distance 0.5 (vector). The ORDER BY then re-ranks the union by the blended score.
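
If you want to see how the weight changes the ranking, a small sweep over alpha (purely illustrative values) makes the trade-off visible:

# Compare rankings for a few blend weights; 0.7 is the default used above.
for alpha in (0.5, 0.7, 0.9):
    hits = hybrid_search("vector search compliance", k=3, alpha=alpha)
    print(f"alpha={alpha}: " + " | ".join(hit[1] for hit in hits))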


6. RLS-Aware Search (multi-tenant)

If you’ve enabled Row-Level Security on docs, every query above respects the active JWT claims automatically — no client-side filter needed. Set the session role and the policies enforce themselves:

cur.execute("SET ROLE tenant_user;")
cur.execute("SET LOCAL helios.jwt.tenant_id = 'acme';")
# Same vector_search() call — but policies on `docs` clip results to acme's rows
results = vector_search("encrypt data", k=10)
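
One caveat to verify in your deployment: the connection above has autocommit enabled, and in stock PostgreSQL SET LOCAL only takes effect inside an explicit transaction. Assuming HeliosDB follows the same semantics, a sketch like this keeps the tenant claim scoped to one transaction:

# Scope the tenant claim to a single transaction; BEGIN/COMMIT are sent
# explicitly because conn.autocommit = True was set earlier.
cur.execute("BEGIN;")
cur.execute("SET LOCAL helios.jwt.tenant_id = 'acme';")
results = vector_search("encrypt data", k=10)   # policies clip results to acme's rows
cur.execute("COMMIT;")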

See RLS_POLICY_MANAGEMENT_TUTORIAL.


7. Putting It All Together

semantic_search_app.py
import psycopg2
from sentence_transformers import SentenceTransformer

CONN = "postgresql://postgres:s3cret@127.0.0.1:5432/postgres"
MODEL = SentenceTransformer("all-MiniLM-L6-v2")

def vec_lit(v):
    return "[" + ",".join(f"{x:.6f}" for x in v) + "]"

def setup(cur):
    cur.execute("CREATE TABLE IF NOT EXISTS docs ("
                "id SERIAL PRIMARY KEY, title TEXT, body TEXT, "
                "category TEXT, embedding VECTOR(384))")
    cur.execute("CREATE INDEX IF NOT EXISTS docs_embedding_idx "
                "ON docs USING hnsw (embedding vector_cosine_ops)")

def index(cur, docs):
    embs = MODEL.encode([d['body'] for d in docs], normalize_embeddings=True)
    for d, v in zip(docs, embs):
        cur.execute("INSERT INTO docs (title, body, category, embedding) "
                    "VALUES (%s, %s, %s, %s::vector)",
                    (d['title'], d['body'], d['category'], vec_lit(v)))

def search(cur, q, k=5):
    qv = MODEL.encode([q], normalize_embeddings=True)[0]
    cur.execute("SELECT id, title, embedding <=> %s::vector AS d "
                "FROM docs ORDER BY d LIMIT %s", (vec_lit(qv), k))
    return cur.fetchall()

if __name__ == "__main__":
    conn = psycopg2.connect(CONN); conn.autocommit = True
    cur = conn.cursor()
    setup(cur)
    index(cur, [
        {"title": "Intro to Rust", "body": "Rust is a systems language", "category": "programming"},
        {"title": "Helios Nano", "body": "Embedded DB with vector search", "category": "databases"},
    ])
    for hit in search(cur, "fast database for AI workloads"):
        print(hit)

Run it:

Terminal window
python semantic_search_app.py
# (2, 'Helios Nano', 0.3421)
# (1, 'Intro to Rust', 0.7104)

Performance Notes

  • Batch your inserts. executemany or COPY for 100+ rows is 10–50× faster than row-at-a-time; a bulk-load sketch follows this list.
  • Index after bulk load. For an empty table, insert all rows first then CREATE INDEX … USING hnsw — HNSW build is faster than incremental insert at the same density.
  • Tune HNSW with WITH (m = 16, ef_construction = 200); defaults are fine up to ~1 M vectors.
  • Pre-warm the schema cache by issuing a SELECT 1 FROM docs LIMIT 1 on every new connection; subsequent queries hit the in-memory schema cache.
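
As a concrete version of the first two bullets, here is a sketch of a bulk load using psycopg2's execute_values followed by the index build (reusing documents, embeddings, and vec_lit from section 3; the WITH parameters are the defaults quoted above):

from psycopg2.extras import execute_values

# Bulk-insert all rows, a page at a time, before any index exists.
rows = [(t, b, c, vec_lit(v)) for (t, b, c), v in zip(documents, embeddings)]
execute_values(
    cur,
    "INSERT INTO docs (title, body, category, embedding) VALUES %s",
    rows,
    template="(%s, %s, %s, %s::vector)",
    page_size=500,
)

# Build the HNSW index after the bulk load.
cur.execute(
    "CREATE INDEX IF NOT EXISTS docs_embedding_idx "
    "ON docs USING hnsw (embedding vector_cosine_ops) "
    "WITH (m = 16, ef_construction = 200)"
)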

Troubleshooting

Symptom | Cause | Fix
column embedding has type vector but expression is of type text | Forgot the ::vector cast on the literal | Always cast literals (%s::vector) or use the pgvector Python type
Slow ANN search | HNSW index missing or different opclass | EXPLAIN the query; ensure the index opclass matches the operator (<=> → vector_cosine_ops)
OperationalError: SCRAM authentication failed | Wrong password / auth mode | Check that the --auth flag matches what the client expects
Memory grows unbounded | psycopg2 not closing cursors | Use context managers: with conn.cursor() as cur:
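
For the last row, a minimal sketch of cursor hygiene with context managers (reusing MODEL and vec_lit from the full script in section 7):

# Each call opens a cursor, uses it, and closes it when the block exits,
# so long-running processes don't accumulate open cursors.
def search_once(conn, query, k=5):
    qv = vec_lit(MODEL.encode([query], normalize_embeddings=True)[0])
    with conn.cursor() as cur:
        cur.execute(
            "SELECT id, title FROM docs ORDER BY embedding <=> %s::vector LIMIT %s",
            (qv, k),
        )
        return cur.fetchall()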

Where Next