Full-Text Search Tutorial
Full-Text Search Tutorial
Available since: v3.13.0 (2026-04-19)
Build: default — no feature flag required
Surface: PostgreSQL-compatible: to_tsvector, to_tsquery, plainto_tsquery, phraseto_tsquery, ts_rank, ts_rank_cd, the @@ operator, TSVECTOR / TSQUERY column types, CREATE INDEX … USING gin | gist.
UVP
If your stack speaks PostgreSQL FTS, point it at HeliosDB Nano and it works. The same to_tsvector(body) @@ to_tsquery('helios') query runs unchanged. Under the hood it’s not a Postgres GIN index — it’s an in-process BM25 engine (src/search/bm25.rs) shared with the hybrid-search path. That means one in-binary scoring engine for FTS, vector, and hybrid retrieval, with ts_rank / ts_rank_cd returning real BM25 scores you can blend with cosine distance in a single SQL statement. No external search service. No client-side reranking. No drift between your DB schema and your search index.
The EasyRAG team had a Python adapter doing client-side reranking with rank_bm25.BM25Okapi to compensate for missing scalar FTS — v3.13.0 deletes that workaround.
Prerequisites
- HeliosDB Nano v3.13+ binary (
heliosdb-nano --version) - A PostgreSQL client:
psql,psycopg,pgx, SQLAlchemy, anything wire-compatible - ~15 minutes
1. Start the Server
heliosdb-nano start --memory --auth scram-sha-256 --password s3cretConnect with psql:
psql "postgresql://postgres:s3cret@127.0.0.1:5432/postgres"2. The Surface in 60 Seconds
| Feature | Syntax | What it does |
|---|---|---|
| Tokenize document | to_tsvector(body) | Returns Unicode-normalised token list as tsvector |
| With config | to_tsvector('english', body) | Config arg accepted, ignored at runtime |
| Build query | to_tsquery('helios') | Same encoding as tsvector |
| Plain query | plainto_tsquery('hello helios') | Alias of to_tsquery |
| Phrase query | phraseto_tsquery('embedded db') | Alias — does NOT do phrase matching |
| Match | vec @@ query | True iff any query term appears in doc |
| BM25 score | ts_rank(vec, query) | Real BM25 score against ephemeral 1-doc index |
Same as _cd | ts_rank_cd(vec, query) | Alias — no cover-density distinction |
| Persistent col | body_tsv TSVECTOR | Stored as JSON token array |
| Index DDL | CREATE INDEX … USING gin (col) | Accepted (ORM-compat); see “Limitations” |
tsvector and tsquery round-trip as Value::Json arrays — they cross the PostgreSQL wire protocol unchanged and render as JSON to introspection tools.
3. Basic Match — @@ operator
CREATE TABLE articles ( id SERIAL PRIMARY KEY, title TEXT, body TEXT);
INSERT INTO articles (title, body) VALUES ('HeliosDB Nano', 'embedded database with native vector search'), ('PostgreSQL guide', 'tuning the postgres optimiser for olap'), ('Hybrid retrieval', 'combine BM25 lexical scores with cosine vector distance'), ('Time-travel queries', 'snapshot isolation with AS OF SYSTEM TIME'), ('SQL FTS overview', 'tsvector tsquery and the @@ match operator');SELECT id, titleFROM articlesWHERE to_tsvector(body) @@ to_tsquery('vector'); id | title----+---------------------- 1 | HeliosDB Nano 3 | Hybrid retrieval4. Ranked Retrieval — ts_rank / ts_rank_cd
SELECT id, title, ts_rank_cd(to_tsvector(body), to_tsquery('vector')) AS scoreFROM articlesWHERE to_tsvector(body) @@ to_tsquery('vector')ORDER BY score DESCLIMIT 10;Both ts_rank and ts_rank_cd are powered by the same BM25 engine (search::Bm25Index, landed in v3.11.0 and surfaced through SQL in v3.13.0). The cover-density variant _cd exists for signature compatibility but produces the same score — positional information isn’t in our tsvector encoding.
The optional weights / normalisation arguments are accepted and ignored:
-- All four signatures parse and run; the weights/norm are no-ops.SELECT ts_rank(to_tsvector(body), to_tsquery('vector'));SELECT ts_rank('{0.1, 0.2, 0.4, 1.0}'::float[], to_tsvector(body), to_tsquery('vector'));SELECT ts_rank(to_tsvector(body), to_tsquery('vector'), 32);SELECT ts_rank('{0.1, 0.2, 0.4, 1.0}'::float[], to_tsvector(body), to_tsquery('vector'), 32);5. Persistent tsvector Columns
Pre-tokenise on write to keep query-time CPU minimal:
CREATE TABLE docs ( id SERIAL PRIMARY KEY, body TEXT, body_tsv TSVECTOR);
CREATE INDEX docs_body_fts ON docs USING gin (body_tsv);
INSERT INTO docs (body, body_tsv) VALUES ('quick brown fox', to_tsvector('quick brown fox')), ('lazy dog sleeps', to_tsvector('lazy dog sleeps')), ('the fox jumps', to_tsvector('the fox jumps'));
SELECT id, body, ts_rank_cd(body_tsv, to_tsquery('fox')) AS rankFROM docsWHERE body_tsv @@ to_tsquery('fox')ORDER BY rank DESC;The GIN DDL is accepted and persisted but does not build a runtime inverted index — @@ walks matching rows and matches in the evaluator. Rationale and migration path: see docs/compatibility/fts.md in the source tree.
6. Hybrid Search (BM25 + Vector via Linear Blend)
The same query can compose lexical and semantic signals. This pattern is what most RAG pipelines actually want:
CREATE TABLE chunks ( id SERIAL PRIMARY KEY, tenant_id INTEGER, text TEXT, embedding VECTOR(384));
CREATE INDEX chunks_emb_hnsw ON chunks USING hnsw (embedding vector_cosine_ops);SELECT id, text, 1.0 - (embedding <=> $1::vector) AS vec_score, ts_rank_cd(to_tsvector(text), plainto_tsquery($2)) AS bm25_score, 0.7 * (1.0 - (embedding <=> $1::vector)) + 0.3 * ts_rank_cd(to_tsvector(text), plainto_tsquery($2)) AS hybrid_scoreFROM chunksWHERE tenant_id = $3 AND (embedding <=> $1::vector) < 0.8ORDER BY hybrid_score DESCLIMIT 10;For RRF (Reciprocal Rank Fusion) and MMR (Maximal Marginal Relevance), call the in-process search::hybrid_search orchestrator from a Rust embedder, or use the MCP tool heliosdb_hybrid_search (added v3.11.0).
7. The EasyRAG Simplification (Real-World Adoption)
Pre-3.13, the EasyRAG adapter
(backend/app/services/vectordb/adapters/heliosdb_nano_adapter.py) had to:
# OLD: client-side rerankingfrom rank_bm25 import BM25Okapibm25 = BM25Okapi([doc.split() for doc in docs])scores = bm25.get_scores(query.split())After v3.13.0 the entire BM25Okapi block deletes — replaced with one SQL query that uses ts_rank_cd server-side. Migration guide: easyrag/docs/heliosdb_nano_adapter_simplification.md in that project.
8. Limitations — Read This Before You Migrate
These are the cases where Postgres FTS and HeliosDB FTS differ. All come from the same source: we ship one Unicode-word tokenizer and a token-set BM25 — no positions, no stemmers, no persistent inverted index.
| Postgres has | We have | Workaround |
|---|---|---|
Language stemmers ('english', 'spanish', …) | Tokenizer normalises but does NOT stem | Stem at ingest time, store the stemmed tokens in a TSVECTOR column |
Phrase queries ('quick <-> fox') | Operator parses, proximity discarded | Post-filter in app code, or use phraseto_tsquery for “any term match” |
setweight(tsv, 'A') | Accepted at the API surface, ignored at runtime | Bias scores manually with multiplied terms in ts_rank |
| Persistent GIN index walks | USING gin DDL accepted, runtime is sequential @@ walk | Prefilter (tenant_id, time range, vector cut) before @@ to bound the walk |
| Multi-column `setweight(…) | setweight(…)` |
In practice, ~100k rows of moderate-length text scan in <50 ms unfiltered. Beyond that, prefilter — typical RAG queries already filter by tenant_id or a vector proximity cut, which keeps the walk bounded.
The full scope-of-support contract is docs/compatibility/fts.md in the source repo.
9. Verification Checklist
-- Sanity: scalar functionsSELECT to_tsvector('hello heliosdb'); -- ["hello","heliosdb"]SELECT to_tsquery('helios'); -- ["helios"]
-- Sanity: match operatorSELECT 'hello world'::TEXT::TSVECTOR @@ 'world'::TEXT::TSQUERY;-- (Use the function form to avoid the cast surprise.)
-- Sanity: rank orderingSELECT ts_rank_cd(to_tsvector('helios is fast'), to_tsquery('fast')); -- > 0.0
-- DDL acceptance (no-op at runtime)CREATE INDEX IF NOT EXISTS t_fts ON articles USING gin (body);Troubleshooting
| Symptom | Cause | Fix |
|---|---|---|
to_tsvector('foxes') ≠ to_tsvector('fox') | No stemmer | Stem at ingest, store the stemmed TSVECTOR |
ts_rank_cd returns same score as ts_rank | Aliased to BM25 — no cover-density encoding | Both are correct BM25; the _cd is API-compat only |
USING gin “works” but query is slow | DDL accepted, runtime walks rows | Add a prefilter predicate; index is no-op |
@@ returns NULL | Three-valued logic — one operand is NULL | COALESCE(body_tsv, to_tsvector('')) |
phraseto_tsquery('embedded db') returns rows missing the phrase | We treat phrase queries as bag-of-terms | Post-filter in app code if exact phrase matters |
Where Next
- VECTOR_SEARCH_TUTORIAL — HNSW, cosine, product quantization.
- EXPLAIN_AND_OPTIMIZATION_TUTORIAL — see how the planner walks an
@@query. - BAAS_REST_API_TUTORIAL — the REST
fts/plfts/phftsfilter operators surface this engine over HTTP. docs/compatibility/fts.md(source repo) — exact scope of support, what works, what doesn’t.
CHANGELOG references: v3.13.0 (FTS surface), v3.11.0 (BM25 engine, hybrid orchestrator), v3.19.1 (current verified release).