Python Vector App — Semantic Search End-to-End

Available since: v3.13.0 (vector + hybrid surface stable)
Stack: psycopg2-binary + sentence-transformers (all-MiniLM-L6-v2, 384-dim)
HeliosDB build: default — no feature flag required (vector search is on by default)


UVP

Most “build a semantic search app” tutorials force you to bolt three services together: PostgreSQL for metadata, Pinecone/Qdrant for the vector index, and Elasticsearch for keyword fallback. HeliosDB Nano collapses all three into one binary. One connection string, one transaction, one cursor. This walkthrough takes you from pip install to a working hybrid (BM25 + vector) search service in ~150 lines of Python — schema, embedding generation, k-NN retrieval, and the keyword-blended scorer all live in the same database.


Prerequisites

Terminal window
pip install psycopg2-binary sentence-transformers numpy

A running Nano server:

Terminal window
heliosdb-nano start --memory --auth scram-sha-256 --password s3cret

(Or --data-dir ./mydata if you want persistence — the rest of the tutorial is identical.)


1. Connect

import psycopg2
conn = psycopg2.connect(
"postgresql://postgres:s3cret@127.0.0.1:5432/postgres"
)
conn.autocommit = True
cur = conn.cursor()

psycopg2 talks the PostgreSQL v3 wire protocol; HeliosDB Nano implements it natively, so every psycopg2 feature (execute, executemany, fetchall, server-side cursors, COPY FROM) works unchanged.
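
Before creating any schema, a one-line round trip is a cheap way to confirm the handshake (a minimal sketch, nothing HeliosDB-specific):

# Simple round trip over the wire protocol; should return a single row (1,).
cur.execute("SELECT 1;")
assert cur.fetchone() == (1,)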


2. Schema — Vector + Text + Index

cur.execute("""
CREATE TABLE IF NOT EXISTS docs (
id SERIAL PRIMARY KEY,
title TEXT NOT NULL,
body TEXT NOT NULL,
category TEXT,
embedding VECTOR(384)
);
""")
# HNSW index for k-NN
cur.execute("""
CREATE INDEX IF NOT EXISTS docs_embedding_idx
ON docs USING hnsw (embedding vector_cosine_ops);
""")

VECTOR(384) matches the dimensionality of all-MiniLM-L6-v2. The vector_cosine_ops opclass tells the HNSW index to use cosine distance (the <=> operator at query time).
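
If you want to catch a dimension mismatch before it surfaces as an insert error, a quick sanity check against the model (loaded here slightly ahead of the next section) is enough; a sketch using the standard sentence-transformers API:

from sentence_transformers import SentenceTransformer

# The model's output width must match the VECTOR(384) column declared above.
model = SentenceTransformer("all-MiniLM-L6-v2")
assert model.get_sentence_embedding_dimension() == 384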


3. Generate and Store Embeddings

from sentence_transformers import SentenceTransformer

# 80 MB download on first run
model = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    ("Intro to Rust", "Rust is a systems programming language", "programming"),
    ("Postgres for analysts", "PostgreSQL is a powerful relational DB", "databases"),
    ("Helios Nano release", "Embedded DB with vector search built-in", "databases"),
    ("Sentence transformers", "BERT models for semantic embeddings", "ml"),
    ("FIPS 140-3 compliance", "Federal cryptographic standards", "security"),
    ("Hybrid retrieval", "Combining BM25 with dense vectors", "ml"),
]

# Embed in one batch (much faster than per-row encode)
embeddings = model.encode([d[1] for d in documents], normalize_embeddings=True)

# Store — serialize each vector as a bracketed vector literal for psycopg2
def vec_lit(v):
    return "[" + ",".join(f"{x:.6f}" for x in v) + "]"

for (title, body, cat), vec in zip(documents, embeddings):
    cur.execute(
        "INSERT INTO docs (title, body, category, embedding) "
        "VALUES (%s, %s, %s, %s::vector)",
        (title, body, cat, vec_lit(vec))
    )

normalize_embeddings=True puts every vector on the unit sphere so cosine distance equals 1 − dot_product — the HNSW index’s preferred regime.
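
As a standalone sanity check (not part of the app), you can verify numerically that, for unit-length vectors, full cosine distance and 1 − dot product agree:

import numpy as np

a, b = model.encode(
    ["encrypt data at rest", "federal cryptographic standards"],
    normalize_embeddings=True,
)
# With unit-length vectors, cosine distance collapses to 1 - dot product.
full_cosine = 1 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
assert abs(full_cosine - (1 - np.dot(a, b))) < 1e-5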


4. Vector Search

def vector_search(query, k=5, category=None):
    qvec = model.encode([query], normalize_embeddings=True)[0]
    if category:
        cur.execute(
            "SELECT id, title, body, embedding <=> %s::vector AS distance "
            "FROM docs "
            "WHERE category = %s "
            "ORDER BY distance "
            "LIMIT %s",
            (vec_lit(qvec), category, k)
        )
    else:
        cur.execute(
            "SELECT id, title, body, embedding <=> %s::vector AS distance "
            "FROM docs "
            "ORDER BY distance "
            "LIMIT %s",
            (vec_lit(qvec), k)
        )
    return cur.fetchall()

for row in vector_search("how do I encrypt data at rest?", k=3):
    print(f"  dist={row[3]:.3f}  {row[1]}")

<=> is cosine distance. Other operators:

Operator | Distance               | Best for
<=>      | Cosine                 | Normalized text embeddings
<->      | L2 (Euclidean)         | Image embeddings, raw vectors
<#>      | Negative inner product | Already-normalized vectors + speed

The HNSW index kicks in automatically when the query has an ORDER BY <distance> LIMIT k shape. EXPLAIN will show Index Scan using docs_embedding_idx (HNSW).
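
To confirm the index is actually picked up, you can run EXPLAIN through the same cursor; a sketch, with the exact plan text depending on your HeliosDB build:

# Print the plan for a typical k-NN query; expect a line mentioning
# docs_embedding_idx if the HNSW index is used.
qvec = model.encode(["encrypt data at rest"], normalize_embeddings=True)[0]
cur.execute(
    "EXPLAIN SELECT id, title FROM docs "
    "ORDER BY embedding <=> %s::vector LIMIT 3",
    (vec_lit(qvec),)
)
for (plan_line,) in cur.fetchall():
    print(plan_line)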


5. Hybrid Search — BM25 + Vector

A blend of vector similarity and keyword score, commonly weighted around 70/30, is a reliable retrieval recipe in production. HeliosDB exposes both in the same query:

def hybrid_search(query, k=5, alpha=0.7):
    """alpha * vector_score + (1 - alpha) * bm25_score"""
    qvec = model.encode([query], normalize_embeddings=True)[0]
    cur.execute("""
        SELECT id, title, body,
               %(alpha)s * (1.0 - (embedding <=> %(qv)s::vector))
             + (1.0 - %(alpha)s) * ts_rank_cd(to_tsvector(body), plainto_tsquery(%(q)s))
               AS score
        FROM docs
        WHERE to_tsvector(body) @@ plainto_tsquery(%(q)s)
           OR embedding <=> %(qv)s::vector < 0.5
        ORDER BY score DESC
        LIMIT %(k)s
    """, {"alpha": alpha, "qv": vec_lit(qvec), "q": query, "k": k})
    return cur.fetchall()

for row in hybrid_search("vector search compliance", k=5):
    print(f"  score={row[3]:.3f}  {row[1]}")

The WHERE clause is a recall floor — return docs that match either keyword (BM25) or are within cosine distance 0.5 (vector). The ORDER BY then re-ranks the union by the blended score.
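
If you want to see how the weight changes the ranking, a small sweep over alpha (purely illustrative values) makes the trade-off visible:

# Compare rankings for a few blend weights; 0.7 is the default used above.
for alpha in (0.5, 0.7, 0.9):
    hits = hybrid_search("vector search compliance", k=3, alpha=alpha)
    print(f"alpha={alpha}: " + " | ".join(hit[1] for hit in hits))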


6. RLS-Aware Search (multi-tenant)

If you’ve enabled Row-Level Security on docs, every query above respects the active JWT claims automatically — no client-side filter needed. Set the session role and the policies enforce themselves:

cur.execute("SET ROLE tenant_user;")
cur.execute("SET LOCAL helios.jwt.tenant_id = 'acme';")
# Same vector_search() call — but policies on `docs` clip results to acme's rows
results = vector_search("encrypt data", k=10)
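
One caveat to verify in your deployment: the connection above has autocommit enabled, and in stock PostgreSQL SET LOCAL only takes effect inside an explicit transaction. Assuming HeliosDB follows the same semantics, a sketch like this keeps the tenant claim scoped to one transaction:

# Scope the tenant claim to a single transaction; BEGIN/COMMIT are sent
# explicitly because conn.autocommit = True was set earlier.
cur.execute("BEGIN;")
cur.execute("SET LOCAL helios.jwt.tenant_id = 'acme';")
results = vector_search("encrypt data", k=10)   # policies clip results to acme's rows
cur.execute("COMMIT;")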

See RLS_POLICY_MANAGEMENT_TUTORIAL.


7. Putting It All Together

semantic_search_app.py
import psycopg2
from sentence_transformers import SentenceTransformer

CONN = "postgresql://postgres:s3cret@127.0.0.1:5432/postgres"
MODEL = SentenceTransformer("all-MiniLM-L6-v2")

def vec_lit(v):
    return "[" + ",".join(f"{x:.6f}" for x in v) + "]"

def setup(cur):
    cur.execute("CREATE TABLE IF NOT EXISTS docs ("
                "id SERIAL PRIMARY KEY, title TEXT, body TEXT, "
                "category TEXT, embedding VECTOR(384))")
    cur.execute("CREATE INDEX IF NOT EXISTS docs_embedding_idx "
                "ON docs USING hnsw (embedding vector_cosine_ops)")

def index(cur, docs):
    embs = MODEL.encode([d['body'] for d in docs], normalize_embeddings=True)
    for d, v in zip(docs, embs):
        cur.execute("INSERT INTO docs (title, body, category, embedding) "
                    "VALUES (%s, %s, %s, %s::vector)",
                    (d['title'], d['body'], d['category'], vec_lit(v)))

def search(cur, q, k=5):
    qv = MODEL.encode([q], normalize_embeddings=True)[0]
    cur.execute("SELECT id, title, embedding <=> %s::vector AS d "
                "FROM docs ORDER BY d LIMIT %s", (vec_lit(qv), k))
    return cur.fetchall()

if __name__ == "__main__":
    conn = psycopg2.connect(CONN); conn.autocommit = True
    cur = conn.cursor()
    setup(cur)
    index(cur, [
        {"title": "Intro to Rust", "body": "Rust is a systems language", "category": "programming"},
        {"title": "Helios Nano", "body": "Embedded DB with vector search", "category": "databases"},
    ])
    for hit in search(cur, "fast database for AI workloads"):
        print(hit)

Run it:

Terminal window
python semantic_search_app.py
# (2, 'Helios Nano', 0.3421)
# (1, 'Intro to Rust', 0.7104)

Performance Notes

  • Batch your inserts. executemany or COPY for 100+ rows is 10–50× faster than row-at-a-time; a bulk-load sketch follows this list.
  • Index after bulk load. For an empty table, insert all rows first then CREATE INDEX … USING hnsw — HNSW build is faster than incremental insert at the same density.
  • Tune HNSW with WITH (m = 16, ef_construction = 200); defaults are fine up to ~1 M vectors.
  • Pre-warm the schema cache by issuing a SELECT 1 FROM docs LIMIT 1 on every new connection; subsequent queries hit the in-memory schema cache.
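
As a concrete version of the first two bullets, here is a sketch of a bulk load using psycopg2's execute_values followed by the index build (reusing documents, embeddings, and vec_lit from section 3; the WITH parameters are the defaults quoted above):

from psycopg2.extras import execute_values

# Bulk-insert all rows, a page at a time, before any index exists.
rows = [(t, b, c, vec_lit(v)) for (t, b, c), v in zip(documents, embeddings)]
execute_values(
    cur,
    "INSERT INTO docs (title, body, category, embedding) VALUES %s",
    rows,
    template="(%s, %s, %s, %s::vector)",
    page_size=500,
)

# Build the HNSW index after the bulk load.
cur.execute(
    "CREATE INDEX IF NOT EXISTS docs_embedding_idx "
    "ON docs USING hnsw (embedding vector_cosine_ops) "
    "WITH (m = 16, ef_construction = 200)"
)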

Troubleshooting

Symptom | Cause | Fix
column embedding has type vector but expression is of type text | Forgot the ::vector cast on the literal | Always cast literals (%s::vector) or use the pgvector Python type
Slow ANN search | HNSW index missing or different opclass | EXPLAIN the query; ensure the index opclass matches the operator (<=> → vector_cosine_ops)
OperationalError: SCRAM authentication failed | Wrong password / auth mode | Check that the --auth flag matches what the client expects
Memory grows unbounded | psycopg2 not closing cursors | Use context managers: with conn.cursor() as cur:
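
For the last row, a minimal sketch of cursor hygiene with context managers (reusing MODEL and vec_lit from the full script in section 7):

# Each call opens a cursor, uses it, and closes it when the block exits,
# so long-running processes don't accumulate open cursors.
def search_once(conn, query, k=5):
    qv = vec_lit(MODEL.encode([query], normalize_embeddings=True)[0])
    with conn.cursor() as cur:
        cur.execute(
            "SELECT id, title FROM docs ORDER BY embedding <=> %s::vector LIMIT %s",
            (qv, k),
        )
        return cur.fetchall()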

Where Next