Python Vector App — Semantic Search End-to-End
Available since: v3.13.0 (vector + hybrid surface stable)
Stack: psycopg2-binary + sentence-transformers (all-MiniLM-L6-v2, 384-dim)
HeliosDB build: default — no feature flag required (vector-search is on by default)
Why HeliosDB Nano
Most “build a semantic search app” tutorials force you to bolt three services together: PostgreSQL for metadata, Pinecone/Qdrant for the vector index, and Elasticsearch for keyword fallback. HeliosDB Nano collapses all three into one binary. One connection string, one transaction, one cursor. This walkthrough takes you from pip install to a working hybrid (BM25 + vector) search service in ~150 lines of Python — schema, embedding generation, k-NN retrieval, and the keyword-blended scorer all live in the same database.
Prerequisites
```
pip install psycopg2-binary sentence-transformers numpy
```

A running Nano server:

```
heliosdb-nano start --memory --auth scram-sha-256 --password s3cret
```

(Or `--data-dir ./mydata` if you want persistence — the rest of the tutorial is identical.)
1. Connect
```python
import psycopg2

conn = psycopg2.connect(
    "postgresql://postgres:s3cret@127.0.0.1:5432/postgres"
)
conn.autocommit = True
cur = conn.cursor()
```

psycopg2 talks the PostgreSQL v3 wire protocol; HeliosDB Nano implements it natively, so every psycopg2 feature (`execute`, `executemany`, `fetchall`, server-side cursors, `COPY FROM`) works unchanged.
2. Schema — Vector + Text + Index
```python
cur.execute("""
    CREATE TABLE IF NOT EXISTS docs (
        id        SERIAL PRIMARY KEY,
        title     TEXT NOT NULL,
        body      TEXT NOT NULL,
        category  TEXT,
        embedding VECTOR(384)
    );
""")

# HNSW index for k-NN
cur.execute("""
    CREATE INDEX IF NOT EXISTS docs_embedding_idx
    ON docs USING hnsw (embedding vector_cosine_ops);
""")
```

`VECTOR(384)` matches the dimensionality of all-MiniLM-L6-v2. The `vector_cosine_ops` opclass tells the HNSW index to use cosine distance (the `<=>` operator at query time).
3. Generate and Store Embeddings
```python
from sentence_transformers import SentenceTransformer

# ~80 MB download on first run
model = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    ("Intro to Rust", "Rust is a systems programming language", "programming"),
    ("Postgres for analysts", "PostgreSQL is a powerful relational DB", "databases"),
    ("Helios Nano release", "Embedded DB with vector search built-in", "databases"),
    ("Sentence transformers", "BERT models for semantic embeddings", "ml"),
    ("FIPS 140-3 compliance", "Federal cryptographic standards", "security"),
    ("Hybrid retrieval", "Combining BM25 with dense vectors", "ml"),
]

# Embed in one batch (much faster than per-row encode)
embeddings = model.encode([d[1] for d in documents], normalize_embeddings=True)

# Store — psycopg2 needs the vector serialized as a Postgres array literal
def vec_lit(v):
    return "[" + ",".join(f"{x:.6f}" for x in v) + "]"

for (title, body, cat), vec in zip(documents, embeddings):
    cur.execute(
        "INSERT INTO docs (title, body, category, embedding) "
        "VALUES (%s, %s, %s, %s)",
        (title, body, cat, vec_lit(vec)),
    )
```

`normalize_embeddings=True` puts every vector on the unit sphere, so cosine distance equals 1 − dot_product — the HNSW index's preferred regime.
4. k-NN Vector Search
```python
def vector_search(query, k=5, category=None):
    qvec = model.encode([query], normalize_embeddings=True)[0]
    if category:
        cur.execute(
            "SELECT id, title, body, embedding <=> %s::vector AS distance "
            "FROM docs "
            "WHERE category = %s "
            "ORDER BY distance "
            "LIMIT %s",
            (vec_lit(qvec), category, k),
        )
    else:
        cur.execute(
            "SELECT id, title, body, embedding <=> %s::vector AS distance "
            "FROM docs "
            "ORDER BY distance "
            "LIMIT %s",
            (vec_lit(qvec), k),
        )
    return cur.fetchall()

for row in vector_search("how do I encrypt data at rest?", k=3):
    print(f"  dist={row[3]:.3f}  {row[1]}")
```

`<=>` is cosine distance. Other operators:
| Operator | Distance | Best for |
|---|---|---|
| `<=>` | Cosine | Normalized text embeddings |
| `<->` | L2 (Euclidean) | Image embeddings, raw vectors |
| `<#>` | Negative inner product | Already-normalized vectors, maximum speed |
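The relationships between these operators are easy to verify offline. A small pure-Python sketch (no database needed; plain floats stand in for real embeddings) showing that, on the unit sphere, cosine distance equals 1 plus the negative inner product, and squared L2 distance is exactly twice the cosine distance:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def cosine_distance(a, b):    # what <=> computes
    return 1.0 - dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def l2_distance(a, b):        # what <-> computes
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def neg_inner_product(a, b):  # what <#> computes
    return -dot(a, b)

a = normalize([1.0, 2.0, 3.0])
b = normalize([2.0, 1.0, 0.5])

# For unit vectors:
#   cosine distance == 1 + (negative inner product)
#   l2 distance ** 2 == 2 * cosine distance
# so all three operators produce the same *ranking* on normalized data.
print(cosine_distance(a, b))
```

This is why the table recommends `<#>` only for already-normalized vectors: it skips the norm computation but ranks identically to `<=>` there.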
The HNSW index kicks in automatically when the query has an `ORDER BY <distance> LIMIT k` shape. `EXPLAIN` will show `Index Scan using docs_embedding_idx (HNSW)`.
5. Hybrid Search — BM25 + Vector
A 70/30 blend of vector similarity and keyword score is a robust default for production retrieval. HeliosDB exposes both in the same query:
```python
def hybrid_search(query, k=5, alpha=0.7):
    """alpha * vector_score + (1 - alpha) * bm25_score"""
    qvec = model.encode([query], normalize_embeddings=True)[0]
    cur.execute("""
        SELECT id, title, body,
               %(alpha)s * (1.0 - (embedding <=> %(qv)s::vector))
             + (1.0 - %(alpha)s) * ts_rank_cd(to_tsvector(body),
                                              plainto_tsquery(%(q)s)) AS score
        FROM docs
        WHERE to_tsvector(body) @@ plainto_tsquery(%(q)s)
           OR embedding <=> %(qv)s::vector < 0.5
        ORDER BY score DESC
        LIMIT %(k)s
    """, {"alpha": alpha, "qv": vec_lit(qvec), "q": query, "k": k})
    return cur.fetchall()

for row in hybrid_search("vector search compliance", k=5):
    print(f"  score={row[3]:.3f}  {row[1]}")
```

The WHERE clause is a recall floor — return docs that either match a keyword (BM25) or fall within cosine distance 0.5 (vector). The ORDER BY then re-ranks that union by the blended score.
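The blending arithmetic itself can be sanity-checked without a database. A toy sketch (made-up scores, not real BM25 or embedding output) showing how `alpha` shifts the ranking between keyword-heavy and vector-heavy results:

```python
def blend(vector_score, bm25_score, alpha=0.7):
    """Same formula as the SQL: alpha * vector + (1 - alpha) * bm25."""
    return alpha * vector_score + (1 - alpha) * bm25_score

# Toy candidates: (title, vector similarity in [0, 1], keyword score)
candidates = [
    ("keyword-exact match", 0.40, 0.90),
    ("semantic paraphrase", 0.85, 0.05),
]

for alpha in (0.3, 0.7):
    ranked = sorted(candidates, key=lambda c: blend(c[1], c[2], alpha),
                    reverse=True)
    print(f"alpha={alpha}: top hit is {ranked[0][0]!r}")
# alpha=0.3: top hit is 'keyword-exact match'
# alpha=0.7: top hit is 'semantic paraphrase'
```

At `alpha=0.7` the paraphrase wins (0.61 vs 0.55); at `alpha=0.3` the exact keyword match does (0.75 vs 0.29) — which is why the default leans vector but keeps a keyword floor.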
6. RLS-Aware Search (multi-tenant)
If you’ve enabled Row-Level Security on docs, every query above respects the active JWT claims automatically — no client-side filter needed. Set the session role and the policies enforce themselves:
```python
cur.execute("SET ROLE tenant_user;")
cur.execute("SET LOCAL helios.jwt.tenant_id = 'acme';")

# Same vector_search() call — but policies on `docs` clip results to acme's rows
results = vector_search("encrypt data", k=10)
```

See RLS_POLICY_MANAGEMENT_TUTORIAL.
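The policy definitions themselves live in the linked tutorial. Purely as a hypothetical sketch — assuming HeliosDB follows PostgreSQL's `CREATE POLICY` syntax, that `docs` carries a `tenant_id` column (not part of the schema above), and that the JWT claim is readable via `current_setting` — a tenant-isolation policy could look like:

```python
# Hypothetical DDL: assumes a tenant_id column on docs and PostgreSQL-style
# RLS. Check RLS_POLICY_MANAGEMENT_TUTORIAL for the actual HeliosDB syntax.
TENANT_POLICY = """
    ALTER TABLE docs ENABLE ROW LEVEL SECURITY;

    CREATE POLICY tenant_isolation ON docs
        USING (tenant_id = current_setting('helios.jwt.tenant_id'));
"""

def apply_tenant_policy(cur):
    """Run once as an admin role; tenant sessions then see only their rows."""
    cur.execute(TENANT_POLICY)
```

Once a policy like this is in place, `vector_search()` needs no code changes: the index scan simply never surfaces rows outside the active tenant.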
7. Putting It All Together
```python
import psycopg2
from sentence_transformers import SentenceTransformer

CONN = "postgresql://postgres:s3cret@127.0.0.1:5432/postgres"
MODEL = SentenceTransformer("all-MiniLM-L6-v2")

def vec_lit(v):
    return "[" + ",".join(f"{x:.6f}" for x in v) + "]"

def setup(cur):
    cur.execute("CREATE TABLE IF NOT EXISTS docs ("
                "id SERIAL PRIMARY KEY, title TEXT, body TEXT, "
                "category TEXT, embedding VECTOR(384))")
    cur.execute("CREATE INDEX IF NOT EXISTS docs_embedding_idx "
                "ON docs USING hnsw (embedding vector_cosine_ops)")

def index(cur, docs):
    embs = MODEL.encode([d['body'] for d in docs], normalize_embeddings=True)
    for d, v in zip(docs, embs):
        cur.execute("INSERT INTO docs (title, body, category, embedding) "
                    "VALUES (%s, %s, %s, %s)",
                    (d['title'], d['body'], d['category'], vec_lit(v)))

def search(cur, q, k=5):
    qv = MODEL.encode([q], normalize_embeddings=True)[0]
    cur.execute("SELECT id, title, embedding <=> %s::vector AS d "
                "FROM docs ORDER BY d LIMIT %s", (vec_lit(qv), k))
    return cur.fetchall()

if __name__ == "__main__":
    conn = psycopg2.connect(CONN)
    conn.autocommit = True
    cur = conn.cursor()
    setup(cur)
    index(cur, [
        {"title": "Intro to Rust", "body": "Rust is a systems language",
         "category": "programming"},
        {"title": "Helios Nano", "body": "Embedded DB with vector search",
         "category": "databases"},
    ])
    for hit in search(cur, "fast database for AI workloads"):
        print(hit)
```

Run it:

```
python semantic_search_app.py
# (1, 'Helios Nano', 0.3421)
# (2, 'Intro to Rust', 0.7104)
```

Performance Notes
- Batch your inserts. `executemany` or `COPY` for 100+ rows is 10–50× faster than row-at-a-time.
- Index after bulk load. For an empty table, insert all rows first, then `CREATE INDEX … USING hnsw` — HNSW build is faster than incremental insert at the same density.
- Tune HNSW with `WITH (m = 16, ef_construction = 200)`; defaults are fine up to ~1 M vectors.
- Pre-warm the schema cache by issuing a `SELECT 1 FROM docs LIMIT 1` on every new connection; subsequent queries hit the in-memory schema cache.
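The batch-insert bullet can be made concrete. A minimal sketch (reusing the `vec_lit` serializer from section 3; `bulk_insert` needs a live cursor, so only the row preparation runs standalone) that swaps the per-row loop for one `executemany` call:

```python
def vec_lit(v):
    return "[" + ",".join(f"{x:.6f}" for x in v) + "]"

def prepare_rows(documents, embeddings):
    """Serialize (title, body, category) tuples + vectors into executemany args."""
    return [(t, b, c, vec_lit(v)) for (t, b, c), v in zip(documents, embeddings)]

def bulk_insert(cur, documents, embeddings):
    # One batched call instead of len(documents) separate execute() calls
    cur.executemany(
        "INSERT INTO docs (title, body, category, embedding) "
        "VALUES (%s, %s, %s, %s)",
        prepare_rows(documents, embeddings),
    )

rows = prepare_rows(
    [("Intro to Rust", "Rust is a systems language", "programming")],
    [[0.1, 0.2]],
)
print(rows[0][3])  # [0.100000,0.200000]
```

For truly large loads, the same `prepare_rows` output can feed `COPY FROM` via a `StringIO` buffer instead.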
Troubleshooting
| Symptom | Cause | Fix |
|---|---|---|
| `column embedding has type vector but expression is of type text` | Forgot the `::vector` cast on the literal | Always cast literals (`%s::vector`) or use the pgvector Python type |
| Slow ANN search | HNSW index missing or different opclass | `EXPLAIN` the query; ensure the index opclass matches the operator (`<=>` ↔ `vector_cosine_ops`) |
| `OperationalError: SCRAM authentication failed` | Wrong password / auth mode | Check the `--auth` flag matches what the client expects |
| Memory grows unbounded | psycopg2 not closing cursors | Use context managers: `with conn.cursor() as cur:` |
Where Next
- VECTOR_SEARCH_TUTORIAL — distance operator deep dive, HNSW tuning.
- Python SDK reference — non-PG-wire client API.
- BAAS_REST_API_TUTORIAL — same data, REST surface.
- NODEJS_BAAS_APP — equivalent app in TypeScript.