Vector Search Tutorial
Version: 1.0
Created: 2025-12-01
Target Audience: Developers building AI/ML applications
Prerequisites: Basic SQL knowledge, understanding of embeddings
Estimated Time: 45-60 minutes
Table of Contents
- Introduction
- Vector Basics
- Creating Vector Tables
- Generating Embeddings
- HNSW Indexing
- Similarity Search
- Product Quantization
- Real-World Examples
- Performance Tuning
- Monitoring
1. Introduction
What is Vector Search?
Vector search (also known as semantic search or similarity search) enables AI applications to find similar items based on meaning rather than exact keyword matches. Instead of searching for exact text, vector search compares mathematical representations (embeddings) of content to find semantically similar items.
Traditional Keyword Search vs Vector Search:
Query: "automobile repair"
Keyword Search Results:
✓ "automobile repair shop"
✗ "car maintenance service" (missed - different words)
✗ "vehicle fixing center" (missed - different words)

Vector Search Results:
✓ "automobile repair shop" (exact match)
✓ "car maintenance service" (semantically similar)
✓ "vehicle fixing center" (semantically similar)

Use Cases
| Use Case | Description | Typical Dimensions |
|---|---|---|
| Semantic Search | Find documents by meaning, not keywords | 384-768 |
| Recommendations | Suggest similar products, articles, or content | 128-512 |
| RAG (Retrieval Augmented Generation) | Retrieve relevant context for LLM prompts | 384-1536 |
| Image Similarity | Find visually similar images | 512-2048 |
| Anomaly Detection | Identify outliers in time-series or sensor data | 64-256 |
| Duplicate Detection | Find near-duplicate content | 256-768 |
How HeliosDB Nano Implements It
HeliosDB Nano provides native vector search capabilities with:
```
┌──────────────────────────────────────────────────────────────┐
│                 HeliosDB Nano Vector Search                   │
├──────────────────────────────────────────────────────────────┤
│ SQL Interface: VECTOR type, distance operators, indexes       │
├──────────────────────────────────────────────────────────────┤
│ HNSW Algorithm: Graph-based approximate nearest neighbors     │
├──────────────────────────────────────────────────────────────┤
│ SIMD Acceleration: AVX2/NEON for 2-6x faster distances        │
├──────────────────────────────────────────────────────────────┤
│ Product Quantization: 384x memory compression                 │
├──────────────────────────────────────────────────────────────┤
│ RocksDB Storage: Persistent, crash-safe index storage         │
└──────────────────────────────────────────────────────────────┘
```

Key Benefits:
- Embedded (no separate server required)
- Sub-millisecond query latency for millions of vectors
- SIMD-accelerated distance calculations
- Memory-efficient with Product Quantization
- Compatible with pgvector syntax
2. Vector Basics
Understanding Embeddings
Embeddings are mathematical representations of content (text, images, audio) as fixed-length arrays of floating-point numbers. Each dimension captures semantic information.
Text: "The cat sat on the mat" ↓ (embedding model)Vector: [0.12, -0.45, 0.78, 0.34, -0.12, ...] (384 dimensions)
Text: "A feline rested on the rug" ↓ (embedding model)Vector: [0.15, -0.42, 0.81, 0.36, -0.09, ...] (similar values!)Properties of Good Embeddings:
- Similar content → similar vectors (small distance)
- Different content → different vectors (large distance)
- Fixed dimensionality (all vectors same length)
- Typically normalized (unit length) for cosine similarity
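To see these properties concretely, here is a minimal sketch (assuming the sentence-transformers library and the all-MiniLM-L6-v2 model used later in this tutorial) that compares a paraphrase pair against an unrelated sentence:

```python
# Sketch: similar sentences should produce nearby vectors.
# Assumes sentence-transformers is installed: pip install sentence-transformers
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim embeddings

a = model.encode("The cat sat on the mat")
b = model.encode("A feline rested on the rug")
c = model.encode("Quarterly revenue grew by 4 percent")

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

print(f"cat vs. feline similarity:  {cosine(a, b):.3f}")  # high (paraphrase)
print(f"cat vs. revenue similarity: {cosine(a, c):.3f}")  # low (unrelated)
```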
Dimension Considerations
Choosing the right dimension depends on your use case and embedding model:
| Model | Dimension | Use Case | Speed | Accuracy |
|---|---|---|---|---|
| all-MiniLM-L6-v2 | 384 | Fast semantic search | Fastest | Good |
| all-mpnet-base-v2 | 768 | Balanced performance | Fast | Better |
| text-embedding-ada-002 (OpenAI) | 1536 | High accuracy | Medium | Best |
| ResNet-50 (images) | 2048 | Image features | Medium | Excellent |
Rules of Thumb:
- Higher dimensions: Better accuracy, more memory, slower queries
- Lower dimensions: Faster queries, less memory, may lose nuance
- Common choices: 384 (fast), 768 (balanced), 1536 (accurate)
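The memory cost behind these rules of thumb is easy to estimate: raw float32 storage is dimension × 4 bytes per vector, before any index overhead. A quick back-of-the-envelope sketch:

```python
# Raw float32 storage per vector and per 1M vectors (index overhead excluded).
for dim in (384, 768, 1536):
    bytes_per_vec = dim * 4  # float32 = 4 bytes per dimension
    total_gb = bytes_per_vec * 1_000_000 / 1e9
    print(f"{dim:>4} dims: {bytes_per_vec} bytes/vector, ~{total_gb:.2f} GB per 1M vectors")

#  384 dims: 1536 bytes/vector, ~1.54 GB per 1M vectors
#  768 dims: 3072 bytes/vector, ~3.07 GB per 1M vectors
# 1536 dims: 6144 bytes/vector, ~6.14 GB per 1M vectors
```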
Distance Metrics
HeliosDB Nano supports three distance metrics to measure vector similarity:
L2 Distance (Euclidean)
Measures straight-line distance in N-dimensional space.
Formula: distance = √(Σ(a[i] - b[i])²)
Example:

```
Vector A: [1.0, 0.0, 0.0]
Vector B: [0.0, 1.0, 0.0]
L2 Distance: √((1-0)² + (0-1)² + (0-0)²) = √2 ≈ 1.414
```

Use when: General-purpose similarity, raw feature vectors
Operator: <->
Range: [0, ∞) (0 = identical, larger = more different)

Cosine Similarity (Cosine Distance)
Measures angle between vectors (ignores magnitude).
Formula: distance = 1 - (A·B / (||A|| × ||B||))
Example:

```
Vector A: [1.0, 0.0, 0.0]    (normalized)
Vector B: [0.707, 0.707, 0.0] (normalized)
Cosine Distance: 1 - 0.707 ≈ 0.293
```

Use when: Text embeddings, normalized vectors
Operator: <=>
Range: [0, 2] (0 = identical, 2 = opposite direction)

Why Cosine for Text?
- Text embeddings are typically normalized (unit length)
- Cosine ignores magnitude, focuses on semantic direction
- Most NLP models optimize for cosine similarity
Inner Product (Dot Product)
Measures alignment between vectors (considers both angle and magnitude).
Formula: distance = -(A·B) = -Σ(a[i] × b[i]) (negative because smaller = better in HeliosDB Nano)
Example:

```
Vector A: [1.0, 2.0, 3.0]
Vector B: [4.0, 5.0, 6.0]
Inner Product Distance: -(4 + 10 + 18) = -32
```

Use when: Recommendation systems, non-normalized vectors
Operator: <#>
Range: (-∞, 0] (more negative = more similar)

Comparison Table:
| Metric | When to Use | Normalized? | SQL Operator |
|---|---|---|---|
| L2 | General similarity, image features | No | <-> |
| Cosine | Text embeddings, semantic search | Yes (recommended) | <=> |
| Inner Product | Recommendations, scoring | No | <#> |
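The three metrics are easy to verify by hand. The sketch below reproduces the worked examples above with plain NumPy; the `<->`, `<=>`, and `<#>` operators compute the same quantities inside the database.

```python
import numpy as np

def l2(a, b):
    # Euclidean distance, matching the <-> operator
    return float(np.sqrt(np.sum((a - b) ** 2)))

def cosine_distance(a, b):
    # 1 - cosine similarity, matching the <=> operator
    return float(1 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def neg_inner_product(a, b):
    # Negative dot product, matching the <#> operator
    return float(-np.dot(a, b))

print(l2(np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])))                    # ~1.414
print(cosine_distance(np.array([1.0, 0.0, 0.0]), np.array([0.707, 0.707, 0.0])))   # ~0.293
print(neg_inner_product(np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0, 6.0])))     # -32.0
```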
3. Creating Vector Tables
VECTOR Data Type Syntax
The VECTOR(n) data type stores fixed-dimension floating-point arrays.
```sql
-- Basic syntax
VECTOR(dimension)

-- Examples
VECTOR(3)     -- 3-dimensional vector
VECTOR(384)   -- Common for MiniLM embeddings
VECTOR(768)   -- Common for BERT embeddings
VECTOR(1536)  -- OpenAI ada-002 embeddings
```

Example Table Creation
Example 1: Simple Document Embeddings
```sql
-- Semantic search for documents
CREATE TABLE documents (
    id INT PRIMARY KEY,
    title TEXT NOT NULL,
    content TEXT NOT NULL,
    embedding VECTOR(384)  -- all-MiniLM-L6-v2 embeddings
);
```

Example 2: Multi-Column Vectors
```sql
-- Product catalog with multiple embedding types
CREATE TABLE products (
    product_id INT PRIMARY KEY,
    name TEXT NOT NULL,
    description TEXT,
    price DECIMAL(10, 2),
    category TEXT,

    -- Text embedding for name + description
    text_embedding VECTOR(768),

    -- Image embedding for product photos
    image_embedding VECTOR(512),

    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```

Example 3: User Preferences for Recommendations
```sql
-- User preference vectors for collaborative filtering
CREATE TABLE user_profiles (
    user_id INT PRIMARY KEY,
    username TEXT NOT NULL,
    preference_vector VECTOR(128),
    last_updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Item vectors for recommendation matching
CREATE TABLE items (
    item_id INT PRIMARY KEY,
    title TEXT NOT NULL,
    item_vector VECTOR(128),
    popularity_score FLOAT DEFAULT 0.0
);
```

Example 4: Time-Series Anomaly Detection
```sql
-- Sensor readings with feature vectors
CREATE TABLE sensor_readings (
    reading_id INT PRIMARY KEY,
    sensor_id TEXT NOT NULL,
    timestamp TIMESTAMP NOT NULL,
    raw_values TEXT,             -- JSON or serialized data
    feature_vector VECTOR(64),   -- Extracted features
    is_anomaly BOOLEAN DEFAULT FALSE
);
```

Example 5: Image Similarity Search
```sql
-- Image database with visual features
CREATE TABLE images (
    image_id INT PRIMARY KEY,
    filename TEXT NOT NULL,
    upload_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    tags TEXT,  -- Comma-separated tags

    -- ResNet-50 feature vector
    features VECTOR(2048),

    -- Metadata
    width INT,
    height INT,
    file_size INT
);
```

Table Design Best Practices
1. Choose dimension based on your embedding model

   ```sql
   -- Match the model output dimension exactly
   -- sentence-transformers/all-MiniLM-L6-v2 → 384
   CREATE TABLE docs (id INT, vec VECTOR(384));
   ```

2. Add metadata columns for filtering

   ```sql
   -- Combine vector search with WHERE filters
   CREATE TABLE articles (
       id INT PRIMARY KEY,
       category TEXT,      -- For filtering by category
       publish_date DATE,  -- For filtering by date
       author TEXT,        -- For filtering by author
       embedding VECTOR(768)
   );
   ```

3. Use appropriate primary keys

   ```sql
   -- Auto-incrementing ID for new data
   CREATE TABLE logs (
       id SERIAL PRIMARY KEY,
       log_text TEXT,
       embedding VECTOR(384)
   );
   ```

4. Consider NULL handling (see the backfill sketch below)

   ```sql
   -- Allow NULL if embeddings are generated asynchronously
   CREATE TABLE pending_docs (
       id INT PRIMARY KEY,
       content TEXT NOT NULL,
       embedding VECTOR(384) NULL  -- Generated after insertion
   );
   ```
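When embeddings are generated asynchronously, as in the NULL-allowing schema above, a small backfill job can fill them in later. A sketch, assuming the same hypothetical heliosdb Python client and all-MiniLM-L6-v2 model used elsewhere in this tutorial:

```python
# Sketch: backfill missing embeddings for rows inserted without one.
# Assumes a hypothetical heliosdb client and the all-MiniLM-L6-v2 model (384 dims).
from sentence_transformers import SentenceTransformer
import heliosdb  # hypothetical Python client

model = SentenceTransformer("all-MiniLM-L6-v2")
db = heliosdb.connect("mydb.db")

# Fetch rows whose embedding has not been generated yet
rows = db.execute(
    "SELECT id, content FROM pending_docs WHERE embedding IS NULL LIMIT 1000"
).fetchall()

if rows:
    embeddings = model.encode([content for _, content in rows], batch_size=32)
    for (doc_id, _), emb in zip(rows, embeddings):
        emb_str = "[" + ",".join(map(str, emb)) + "]"
        db.execute(
            "UPDATE pending_docs SET embedding = ? WHERE id = ?",
            (emb_str, doc_id),
        )
    db.commit()
    print(f"Backfilled {len(rows)} embeddings")
```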
4. Generating Embeddings
Using OpenAI API (Python)
The OpenAI API provides high-quality text embeddings via text-embedding-ada-002 (1536 dimensions).
Installation
```bash
pip install openai
```

Basic Usage
```python
import openai
import json

# Configure API key
openai.api_key = "your-api-key-here"

def get_embedding(text, model="text-embedding-ada-002"):
    """Generate embedding for a single text string."""
    response = openai.Embedding.create(
        input=text,
        model=model
    )
    return response['data'][0]['embedding']

# Example
text = "The quick brown fox jumps over the lazy dog"
embedding = get_embedding(text)

print(f"Dimension: {len(embedding)}")       # 1536
print(f"First 5 values: {embedding[:5]}")
# Output: [0.0234, -0.0156, 0.0789, -0.0412, 0.0567]
```

Batch Processing
```python
def get_embeddings_batch(texts, model="text-embedding-ada-002"):
    """Generate embeddings for multiple texts (up to 2048 per request)."""
    response = openai.Embedding.create(
        input=texts,
        model=model
    )
    return [item['embedding'] for item in response['data']]

# Process multiple documents
documents = [
    "First document text",
    "Second document text",
    "Third document text"
]

embeddings = get_embeddings_batch(documents)
print(f"Generated {len(embeddings)} embeddings")
```

Storing in HeliosDB Nano
```python
import heliosdb  # Hypothetical Python client

# Connect to database
db = heliosdb.connect("mydb.db")

# Generate embedding
text = "Sample document content"
embedding = get_embedding(text)

# Convert to SQL array format
embedding_str = "[" + ",".join(map(str, embedding)) + "]"

# Insert into database
db.execute("""
    INSERT INTO documents (id, content, embedding)
    VALUES (?, ?, ?)
""", (1, text, embedding_str))

db.commit()
```

Using sentence-transformers (Python)
The sentence-transformers library provides efficient local embedding generation (no API calls).
Installation
```bash
pip install sentence-transformers
```

Basic Usage
```python
from sentence_transformers import SentenceTransformer

# Load model (downloads on first use, cached afterward)
model = SentenceTransformer('all-MiniLM-L6-v2')

# Generate single embedding
text = "This is a sample sentence"
embedding = model.encode(text)

print(f"Dimension: {len(embedding)}")       # 384
print(f"Type: {type(embedding)}")           # numpy.ndarray
print(f"First 5 values: {embedding[:5]}")
# Output: [0.0234, -0.0156, 0.0789, -0.0412, 0.0567]
```

Batch Processing (Efficient)
```python
# Process multiple texts in parallel
texts = [
    "First sentence",
    "Second sentence",
    "Third sentence"
]

# Batch encoding (much faster than a loop)
embeddings = model.encode(texts, batch_size=32)

print(f"Shape: {embeddings.shape}")  # (3, 384)

# Iterate over results
for i, emb in enumerate(embeddings):
    print(f"Text {i}: dimension={len(emb)}")
```

Complete Example: Indexing Documents
```python
from sentence_transformers import SentenceTransformer
import sqlite3  # or heliosdb client

# Initialize model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Sample documents
documents = [
    {"id": 1, "text": "Python is a programming language"},
    {"id": 2, "text": "Machine learning uses algorithms"},
    {"id": 3, "text": "Databases store structured data"},
    {"id": 4, "text": "Neural networks process information"},
]

# Generate embeddings
texts = [doc["text"] for doc in documents]
embeddings = model.encode(texts, show_progress_bar=True)

# Connect to HeliosDB Nano
conn = sqlite3.connect("mydb.db")  # Use HeliosDB Nano client in production
cursor = conn.cursor()

# Create table
cursor.execute("""
    CREATE TABLE IF NOT EXISTS documents (
        id INT PRIMARY KEY,
        text TEXT,
        embedding VECTOR(384)
    )
""")

# Insert documents with embeddings
for doc, emb in zip(documents, embeddings):
    emb_str = "[" + ",".join(map(str, emb)) + "]"
    cursor.execute("""
        INSERT INTO documents (id, text, embedding)
        VALUES (?, ?, ?)
    """, (doc["id"], doc["text"], emb_str))

conn.commit()
print(f"Indexed {len(documents)} documents")
```

Popular Models Comparison
| Model | Dimensions | Speed | Quality | Use Case |
|---|---|---|---|---|
| all-MiniLM-L6-v2 | 384 | Fastest | Good | General semantic search |
| all-mpnet-base-v2 | 768 | Fast | Better | Balanced quality/speed |
| all-MiniLM-L12-v2 | 384 | Fast | Good | Slightly better than L6 |
| multi-qa-mpnet-base-dot-v1 | 768 | Medium | Best | Q&A and retrieval |
| paraphrase-multilingual-mpnet-base-v2 | 768 | Medium | Best | Multilingual support |
Custom Model Fine-Tuning
```python
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

# Load base model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Create training data (text pairs with similarity scores)
train_examples = [
    InputExample(texts=["query 1", "relevant doc 1"], label=1.0),
    InputExample(texts=["query 1", "irrelevant doc"], label=0.0),
    # Add more examples...
]

# Create data loader
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)

# Define loss function
train_loss = losses.CosineSimilarityLoss(model)

# Fine-tune
model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    warmup_steps=100
)

# Save fine-tuned model
model.save("./my-finetuned-model")

# Use fine-tuned model
custom_model = SentenceTransformer("./my-finetuned-model")
```

Using Other Embedding Providers
Cohere API
```python
import cohere

co = cohere.Client("your-api-key")

response = co.embed(
    texts=["Text to embed"],
    model="embed-english-v3.0"
)

embeddings = response.embeddings  # List of vectors
```

Hugging Face Transformers
```python
from transformers import AutoTokenizer, AutoModel
import torch

# Load model
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# Generate embedding
text = "Sample text"
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)

with torch.no_grad():
    outputs = model(**inputs)
    # Use [CLS] token embedding or mean pooling
    embedding = outputs.last_hidden_state[:, 0, :].numpy()[0]

print(f"Dimension: {len(embedding)}")  # 768 for BERT-base
```

Best Practices for Embedding Generation
1. Batch processing for efficiency

   ```python
   # Good: Process in batches
   embeddings = model.encode(texts, batch_size=32)

   # Bad: One at a time
   embeddings = [model.encode(text) for text in texts]
   ```

2. Normalize vectors for cosine similarity

   ```python
   import numpy as np

   # Normalize to unit length
   embedding = model.encode(text)
   embedding = embedding / np.linalg.norm(embedding)
   ```

3. Cache embeddings to avoid recomputation

   ```python
   # Store embeddings in the database, don't regenerate
   # Check whether an embedding exists before generating
   ```

4. Handle long texts appropriately

   ```python
   # Truncate or chunk long documents
   max_length = 512  # Model's max tokens

   # Option 1: Truncate
   text = text[:max_length]

   # Option 2: Chunk and average
   chunks = [text[i:i+max_length] for i in range(0, len(text), max_length)]
   chunk_embeddings = model.encode(chunks)
   avg_embedding = np.mean(chunk_embeddings, axis=0)
   ```
5. HNSW Indexing
What is HNSW?
HNSW (Hierarchical Navigable Small World) is a graph-based algorithm for approximate nearest neighbor search. It constructs a multi-layer graph where each layer contains progressively fewer nodes, enabling fast logarithmic-time searches.
How HNSW Works
```
Layer 2:  A ←─────────────→ D                     (Entry point, sparse)
          ↓                 ↓
Layer 1:  A ←→ B ←───────→ D ←→ E                 (Medium density)
          ↓    ↓           ↓    ↓
Layer 0:  A ←→ B ←→ C ←→ D ←→ E ←→ F              (All nodes, densest layer)
```

Search Process:
1. Start at the entry point in the top layer (Layer 2: node A)
2. Greedily navigate to the closest node in the current layer (A → D)
3. Drop down to the next layer and continue the greedy search (D → E)
4. Repeat until the bottom layer (Layer 0)
5. Perform a local search in the bottom layer for the k neighbors

Key Properties:
- Approximate: Finds ~95-99% of true nearest neighbors (configurable)
- Fast: O(log N) query time vs O(N) for brute force
- Scalable: Handles millions to billions of vectors
- Memory-efficient: ~50-100 bytes per vector
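For reference, the exact search that HNSW approximates is a brute-force scan over every vector. A minimal NumPy sketch of that O(N) baseline makes the speed/recall trade-off concrete; HNSW returns (approximately) the same top-k without visiting every row.

```python
import numpy as np

def brute_force_knn(query, vectors, k=10):
    """Exact k-NN by L2 distance: O(N) per query, the baseline HNSW approximates."""
    dists = np.linalg.norm(vectors - query, axis=1)  # distance to every stored vector
    nearest = np.argsort(dists)[:k]                  # indices of the k smallest distances
    return nearest, dists[nearest]

# 100K random 384-dim vectors: exact results, but every query touches all of them
vectors = np.random.rand(100_000, 384).astype(np.float32)
query = np.random.rand(384).astype(np.float32)
ids, dists = brute_force_knn(query, vectors, k=5)
print(ids, dists)
```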
Creating HNSW Indexes
Basic Index Creation
```sql
-- Create HNSW index on vector column
CREATE INDEX embedding_idx ON documents USING hnsw (embedding);
```

With Custom Table
```sql
-- Full example
CREATE TABLE articles (
    id INT PRIMARY KEY,
    title TEXT,
    content TEXT,
    embedding VECTOR(768)
);

-- Create HNSW index
CREATE INDEX articles_embedding_idx ON articles USING hnsw (embedding);
```

Multiple Indexes
```sql
-- Table with multiple vector columns
CREATE TABLE products (
    product_id INT PRIMARY KEY,
    text_embedding VECTOR(768),
    image_embedding VECTOR(512)
);

-- Create separate indexes for each vector column
CREATE INDEX products_text_idx ON products USING hnsw (text_embedding);
CREATE INDEX products_image_idx ON products USING hnsw (image_embedding);
```

Index Parameters
HNSW indexes have two critical parameters that control the accuracy/speed tradeoff:
Parameter: m (Max Connections)
Number of bidirectional links per node in the graph.
```sql
-- Default: m = 16
CREATE INDEX idx ON docs USING hnsw (embedding);

-- Custom m parameter (if supported in future versions)
-- CREATE INDEX idx ON docs USING hnsw (embedding) WITH (m = 32);
```

Rules of Thumb:
- m = 16: Default, good balance (recommended)
- m = 8: Faster build, less memory, lower accuracy
- m = 32: Slower build, more memory, higher accuracy
- m = 64: Very accurate but memory-intensive
Memory Impact:
- Memory per vector ≈ m × 8 bytes (for pointers/IDs)
- m=16: ~128 bytes per vector
- m=32: ~256 bytes per vector
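These per-vector figures can be turned into a rough capacity estimate. A sketch, assuming float32 vectors plus roughly m × 8 bytes of graph links per vector as described above (actual overhead varies by implementation):

```python
# Rough HNSW memory estimate: raw vectors + graph links (~m * 8 bytes per vector).
# The 8-bytes-per-link figure follows the rule of thumb above; real overhead varies.
def hnsw_memory_gb(num_vectors, dim, m=16):
    vector_bytes = dim * 4   # float32 storage
    link_bytes = m * 8       # approximate graph overhead per vector
    return num_vectors * (vector_bytes + link_bytes) / 1e9

for n in (100_000, 1_000_000, 10_000_000):
    print(f"{n:>10,} x 384-dim vectors, m=16: ~{hnsw_memory_gb(n, 384, 16):.2f} GB")
```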
Parameter: ef_construction (Construction Search Depth)
Size of dynamic candidate list during index construction.
```sql
-- Default: ef_construction = 200
CREATE INDEX idx ON docs USING hnsw (embedding);

-- Custom ef_construction (if supported in future versions)
-- CREATE INDEX idx ON docs USING hnsw (embedding) WITH (ef_construction = 400);
```

Rules of Thumb:
- ef_construction = 100: Fast build, lower quality graph
- ef_construction = 200: Default, good balance (recommended)
- ef_construction = 400: Slower build, higher quality graph
- ef_construction = 800: Very slow but maximum quality
Build Time Impact:
- Higher ef_construction = slower index builds (linear increase)
- But better recall at query time
Recommendations by Dataset Size
| Dataset Size | m | ef_construction | Expected Recall@10 |
|---|---|---|---|
| <10K vectors | 16 | 200 | >99% |
| 10K-100K | 16 | 200 | >98% |
| 100K-1M | 16 | 200 | >95% |
| 1M-10M | 32 | 400 | >95% |
| >10M | 32-64 | 400-800 | >93% |
When to Create Indexes
Strategy 1: Index Before Insertion (Empty Table)
```sql
-- Create index first, then insert data
CREATE TABLE docs (id INT, vec VECTOR(384));
CREATE INDEX docs_idx ON docs USING hnsw (vec);

-- Insert data (index updates automatically)
INSERT INTO docs VALUES (1, '[0.1, 0.2, ...]');
INSERT INTO docs VALUES (2, '[0.3, 0.4, ...]');
```

Pros: Incremental index updates
Cons: Slower individual inserts
Strategy 2: Bulk Load Then Index
```sql
-- Insert data first
CREATE TABLE docs (id INT, vec VECTOR(384));

-- Bulk insert (fast, no index overhead)
INSERT INTO docs VALUES (1, '[0.1, 0.2, ...]');
INSERT INTO docs VALUES (2, '[0.3, 0.4, ...]');
-- ... (thousands or millions of rows)

-- Create index after all data is loaded
CREATE INDEX docs_idx ON docs USING hnsw (vec);
```

Pros: Faster bulk loading
Cons: Index build at the end can be slow for large datasets
Recommendation: Use Strategy 2 for initial bulk loads, Strategy 1 for incremental updates.
Index Build Time Estimates
| Dataset Size | Dimension | Expected Build Time |
|---|---|---|
| 1,000 vectors | 384 | <1 second |
| 10,000 vectors | 384 | 1-3 seconds |
| 100,000 vectors | 384 | 10-30 seconds |
| 1,000,000 vectors | 384 | 2-5 minutes |
| 10,000,000 vectors | 768 | 30-60 minutes |
Times on commodity hardware (4-core CPU, 16GB RAM)
Verifying Index Creation
```sql
-- Check indexes on a table
\d documents

-- Expected output:
-- Indexes:
--   "documents_embedding_idx" hnsw (embedding)
```

6. Similarity Search
KNN Queries with <-> Operator (L2 Distance)
The <-> operator performs L2 (Euclidean) distance calculation.
Basic KNN Query
```sql
-- Find 5 nearest neighbors using L2 distance
SELECT id, title,
       embedding <-> '[0.1, 0.2, 0.3, ...]' AS distance
FROM documents
ORDER BY embedding <-> '[0.1, 0.2, 0.3, ...]'
LIMIT 5;
```

Output:

```
id  | title               | distance
----|---------------------|----------
42  | "Machine Learning"  | 0.234
17  | "Neural Networks"   | 0.456
89  | "Deep Learning"     | 0.678
23  | "AI Algorithms"     | 0.789
56  | "Data Science"      | 0.901
```

With Parameterized Queries
```python
# Python example with parameter binding
query_vector = model.encode("machine learning tutorial")
query_str = "[" + ",".join(map(str, query_vector)) + "]"

results = db.execute("""
    SELECT id, title, embedding <-> ? AS distance
    FROM documents
    ORDER BY embedding <-> ?
    LIMIT 10
""", (query_str, query_str))
```

Practical Example: Document Search
```sql
-- Setup
CREATE TABLE articles (
    id INT PRIMARY KEY,
    title TEXT,
    author TEXT,
    content TEXT,
    embedding VECTOR(384)
);

CREATE INDEX articles_idx ON articles USING hnsw (embedding);

-- Insert sample data (embeddings pre-generated)
INSERT INTO articles VALUES
    (1, 'Intro to Python', 'Alice', '...', '[0.12, -0.34, ...]'),
    (2, 'Advanced ML', 'Bob', '...', '[0.45, 0.67, ...]'),
    (3, 'Database Design', 'Carol', '...', '[-0.23, 0.56, ...]');

-- Search: Find articles similar to "python programming"
-- (query embedding generated from "python programming")
SELECT id, title, author,
       embedding <-> '[0.15, -0.30, ...]' AS distance
FROM articles
ORDER BY embedding <-> '[0.15, -0.30, ...]'
LIMIT 5;
```

Cosine Similarity with <=> Operator
The <=> operator calculates cosine distance (1 - cosine similarity).
Basic Cosine Search
```sql
-- Find 10 most similar documents using cosine distance
SELECT id, content,
       embedding <=> '[0.1, 0.2, ...]' AS cosine_distance
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]'
LIMIT 10;
```

Why Cosine for Text?
- Ignores vector magnitude (length), focuses on direction
- Most text embedding models produce normalized vectors
- Better semantic similarity for text than L2 distance
Converting Distance to Similarity Score
```sql
-- Cosine similarity = 1 - cosine distance
SELECT id, title,
       1 - (embedding <=> '[0.1, 0.2, ...]') AS similarity_score
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]'
LIMIT 10;
```

Output:

```
id  | title               | similarity_score
----|---------------------|------------------
42  | "Machine Learning"  | 0.987 (very similar)
17  | "Neural Networks"   | 0.923
89  | "Deep Learning"     | 0.876
23  | "AI Algorithms"     | 0.834
56  | "Data Science"      | 0.801
```

Practical Example: Semantic Q&A
```sql
-- FAQ database with questions and answers
CREATE TABLE faqs (
    faq_id INT PRIMARY KEY,
    question TEXT,
    answer TEXT,
    question_embedding VECTOR(384)
);

CREATE INDEX faqs_idx ON faqs USING hnsw (question_embedding);

-- Find the most relevant FAQs for a user query
-- Query: "How do I reset my password?"
-- (query embedding pre-generated)
SELECT question, answer,
       (1 - (question_embedding <=> '[0.23, -0.45, ...]')) AS relevance
FROM faqs
ORDER BY question_embedding <=> '[0.23, -0.45, ...]'
LIMIT 3;
```

Output:

```
question                       | answer                     | relevance
-------------------------------|----------------------------|----------
"Password reset instructions"  | "Go to Settings > Reset"   | 0.95
"Forgot password recovery"     | "Click 'Forgot Password'"  | 0.89
"Change account credentials"   | "Visit Account Settings"   | 0.76
```

Inner Product with <#> Operator
The <#> operator calculates negative inner product (negative because smaller = better).
Basic Inner Product Search
```sql
-- Find top matches using inner product
SELECT id, name,
       embedding <#> '[1.0, 0.5, 0.3, ...]' AS score
FROM items
ORDER BY embedding <#> '[1.0, 0.5, 0.3, ...]'
LIMIT 10;
```

When to Use Inner Product:
- Recommendation systems (user-item matching)
- Non-normalized vectors
- When magnitude matters (popularity, confidence scores)
Practical Example: Product Recommendations
```sql
-- User preference vectors
CREATE TABLE users (
    user_id INT PRIMARY KEY,
    username TEXT,
    preference_vector VECTOR(128)
);

-- Product feature vectors
CREATE TABLE products (
    product_id INT PRIMARY KEY,
    name TEXT,
    price DECIMAL(10, 2),
    product_vector VECTOR(128)
);

CREATE INDEX products_idx ON products USING hnsw (product_vector);

-- Recommend products for user 42
-- (combining user preference with product features)
SELECT p.product_id, p.name, p.price,
       p.product_vector <#> u.preference_vector AS match_score
FROM products p, users u
WHERE u.user_id = 42
ORDER BY p.product_vector <#> u.preference_vector
LIMIT 20;
```

Combining with WHERE Filters
Combine vector similarity with traditional SQL filters for powerful hybrid search.
Filter Before Vector Search
```sql
-- Find similar documents in a specific category
SELECT id, title,
       embedding <=> '[0.1, ...]' AS distance
FROM documents
WHERE category = 'technology'        -- Pre-filter
  AND publish_date > '2024-01-01'
ORDER BY embedding <=> '[0.1, ...]'
LIMIT 10;
```

Multi-Condition Filtering
```sql
-- Complex filtering + vector search
SELECT id, title, author, price,
       embedding <-> '[0.2, ...]' AS distance
FROM books
WHERE category IN ('science', 'technology')
  AND price < 50.00
  AND rating >= 4.0
  AND in_stock = TRUE
ORDER BY embedding <-> '[0.2, ...]'
LIMIT 5;
```

Distance Threshold Filtering
```sql
-- Only return results within a distance threshold
SELECT id, title,
       embedding <=> '[0.1, ...]' AS distance
FROM documents
WHERE (embedding <=> '[0.1, ...]') < 0.5   -- Similarity threshold
ORDER BY embedding <=> '[0.1, ...]'
LIMIT 100;
```

Practical Example: Location + Semantic Search
```sql
CREATE TABLE restaurants (
    restaurant_id INT PRIMARY KEY,
    name TEXT,
    cuisine TEXT,
    latitude FLOAT,
    longitude FLOAT,
    description_embedding VECTOR(384)
);

-- Find restaurants matching "cozy italian place"
-- within 5 miles of the user location (40.7128, -74.0060)
SELECT name, cuisine,
       description_embedding <=> '[0.12, -0.34, ...]' AS relevance,
       SQRT(POW(latitude - 40.7128, 2) + POW(longitude - (-74.0060), 2)) * 69 AS miles
FROM restaurants
WHERE cuisine = 'Italian'
  AND (SQRT(POW(latitude - 40.7128, 2) + POW(longitude - (-74.0060), 2)) * 69) < 5
ORDER BY description_embedding <=> '[0.12, -0.34, ...]'
LIMIT 10;
```

Advanced Search Patterns
Hybrid Search: Keyword + Vector
```sql
-- Combine full-text search with vector similarity
SELECT id, title,
       embedding <=> '[0.1, ...]' AS vector_score,
       CASE WHEN title LIKE '%machine learning%' THEN 0.5 ELSE 1.0 END AS keyword_penalty
FROM documents
WHERE content LIKE '%neural network%'   -- Keyword filter
ORDER BY (embedding <=> '[0.1, ...]') * keyword_penalty
LIMIT 10;
```

Multi-Vector Search
```sql
-- Search across multiple vector types
SELECT product_id, name,
       text_embedding <=> '[0.1, ...]' AS text_score,
       image_embedding <-> '[0.2, ...]' AS image_score,
       (text_embedding <=> '[0.1, ...]') + (image_embedding <-> '[0.2, ...]') AS combined_score
FROM products
ORDER BY combined_score
LIMIT 10;
```

Weighted Multi-Vector Search

```sql
-- Weight different embedding types
SELECT product_id, name,
       (0.7 * (text_embedding <=> '[0.1, ...]')) +
       (0.3 * (image_embedding <-> '[0.2, ...]')) AS weighted_score
FROM products
ORDER BY weighted_score
LIMIT 10;
```

7. Product Quantization
What is Product Quantization?
Product Quantization (PQ) is a compression technique that reduces vector storage by 8-384x with minimal accuracy loss (<5% typically).
How PQ Works
```
Original Vector (768 dimensions):
[0.123, -0.456, 0.789, ..., 0.234]  →  768 × 4 bytes = 3,072 bytes

Product Quantization:
1. Split into M=8 sub-vectors (96 dimensions each)
2. Map each sub-vector to its nearest centroid (1-byte code)
3. Store 8 codes instead of 768 floats

Quantized: [42, 17, 89, 123, 5, 67, 201, 34]  →  8 bytes

Compression: 3,072 bytes → 8 bytes = 384x reduction
```

Key Concepts:
- M (sub-quantizers): Number of chunks (typical: 8-64)
- K (centroids): Codebook size per chunk (typical: 256 for 1-byte codes)
- Codebook: Learned lookup table mapping codes to vectors
- ADC (Asymmetric Distance Computation): Fast approximate distance
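The mechanics are easier to see in code. The sketch below implements the encode step by hand with NumPy: split the vector into M sub-vectors, snap each to its nearest codebook centroid, and keep only the centroid indices. (The codebook here is random for illustration; a real index trains it with k-means on your data.)

```python
import numpy as np

dim, M, K = 768, 8, 256   # vector dimension, sub-quantizers, centroids per sub-space
sub_dim = dim // M        # 96 dimensions per sub-vector

# Codebook: M sub-spaces x K centroids x sub_dim floats.
# Random here for illustration; real PQ learns these with k-means over training vectors.
rng = np.random.default_rng(0)
codebook = rng.standard_normal((M, K, sub_dim)).astype(np.float32)

def pq_encode(vec):
    """Compress a float vector to M one-byte centroid indices."""
    codes = np.empty(M, dtype=np.uint8)
    for m in range(M):
        sub = vec[m * sub_dim:(m + 1) * sub_dim]
        dists = np.linalg.norm(codebook[m] - sub, axis=1)  # distance to each centroid
        codes[m] = np.argmin(dists)
    return codes

def pq_decode(codes):
    """Reconstruct an approximate vector from its codes (lossy)."""
    return np.concatenate([codebook[m, codes[m]] for m in range(M)])

vec = rng.standard_normal(dim).astype(np.float32)
codes = pq_encode(vec)    # 8 bytes instead of 768 * 4 = 3,072 bytes
print(codes, f"compression: {dim * 4 // M}x")   # 384x
```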
Memory Optimization Benefits
| Vector Dim | Original Size | PQ Size (M=8) | Compression | 1M Vectors |
|---|---|---|---|---|
| 384 | 1,536 bytes | 8 bytes | 192x | 1.5 GB → 8 MB |
| 768 | 3,072 bytes | 8 bytes | 384x | 3 GB → 8 MB |
| 1536 | 6,144 bytes | 8 bytes | 768x | 6 GB → 8 MB |
When to Use PQ:
- Large datasets (>100K vectors) with memory constraints
- Edge deployments (IoT, mobile)
- Cost optimization (smaller instances)
- Can tolerate 2-5% accuracy loss
Creating PQ Indexes
Basic PQ Index
```sql
-- Create HNSW index with Product Quantization
CREATE INDEX products_idx
ON products(embedding)
USING hnsw
WITH (quantization='product');
```

With Custom Sub-Quantizers
```sql
-- Control compression ratio via sub-quantizers
-- More sub-quantizers = better accuracy, less compression

-- High compression (M=8, dimension must be divisible by 8)
CREATE INDEX idx_compress
ON docs(embedding)
USING hnsw
WITH (quantization='product', pq_subquantizers=8);

-- Balanced (M=16, dimension must be divisible by 16)
CREATE INDEX idx_balanced
ON docs(embedding)
USING hnsw
WITH (quantization='product', pq_subquantizers=16);

-- High accuracy (M=32, dimension must be divisible by 32)
CREATE INDEX idx_accurate
ON docs(embedding)
USING hnsw
WITH (quantization='product', pq_subquantizers=32);
```

With Custom Centroids
```sql
-- Control codebook size (K parameter)
-- More centroids = better accuracy, larger codebook

-- Standard (K=256, 1-byte codes)
CREATE INDEX idx
ON docs(embedding)
USING hnsw
WITH (quantization='product', pq_centroids=256);

-- Smaller codebook (K=128, faster but less accurate)
CREATE INDEX idx
ON docs(embedding)
USING hnsw
WITH (quantization='product', pq_centroids=128);
```

PQ Configuration Guidelines
Choosing M (Sub-Quantizers)
| Dimension | Recommended M | Compression | Accuracy Loss |
|---|---|---|---|
| 128 | 8 | 64x | <3% |
| 256 | 8 | 128x | <3% |
| 384 | 8 | 192x | <5% |
| 768 | 8-16 | 384x-192x | <5% |
| 1536 | 16-32 | 384x-192x | <5% |
Rules:
- Dimension must be divisible by M
- Start with M=8 for most cases
- Increase M if accuracy is critical
- Lower M for maximum compression
Training Requirements
PQ requires training data to build codebooks:
```sql
-- Minimum vectors needed for training
-- Rule of thumb: 10,000+ vectors for quality codebooks

-- Small dataset (<10K):      PQ may not be worth it
-- Medium dataset (10K-100K): PQ beneficial, use M=8
-- Large dataset (>100K):     PQ highly recommended, M=8-16
```

Trade-offs
| Aspect | Without PQ | With PQ (M=8, K=256) |
|---|---|---|
| Memory | 3,072 bytes (768-dim) | 8 bytes |
| Compression | 1x | 384x |
| Query Speed | Baseline | 1.2-1.5x faster (less memory bandwidth) |
| Accuracy | 100% | 95-98% (Recall@10) |
| Index Build | Faster | Slower (codebook training) |
| Disk Storage | Larger | Much smaller |
Practical Example: Large-Scale Document Search
```sql
-- Scenario: 10 million documents, 768-dim embeddings
-- Without PQ: 10M × 3,072 bytes ≈ 30 GB RAM
-- With PQ (M=8): 10M × 8 bytes = 80 MB RAM (384x reduction)

CREATE TABLE documents (
    doc_id INT PRIMARY KEY,
    title TEXT,
    content TEXT,
    embedding VECTOR(768)
);

-- Create PQ-compressed HNSW index
CREATE INDEX docs_pq_idx
ON documents(embedding)
USING hnsw
WITH (
    quantization='product',
    pq_subquantizers=8,
    pq_centroids=256
);

-- Queries work identically (compression is transparent)
SELECT doc_id, title,
       embedding <=> '[0.1, 0.2, ...]' AS distance
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]'
LIMIT 10;
```

Monitoring PQ Performance
```sql
-- Check index configuration and memory usage
SELECT index_name, dimensions, quantization_type,
       memory_bytes, compression_ratio
FROM pg_vector_index_stats()
WHERE index_name = 'docs_pq_idx';
```

Expected Output:

```
index_name   | dimensions | quantization_type | memory_bytes | compression_ratio
-------------|------------|-------------------|--------------|------------------
docs_pq_idx  | 768        | product           | 83886080     | 384.0
```

8. Real-World Examples
Example 1: Semantic Document Search
Build a semantic search engine for a knowledge base with 100K articles.
Schema Design
```sql
CREATE TABLE knowledge_base (
    article_id INT PRIMARY KEY,
    title TEXT NOT NULL,
    content TEXT NOT NULL,
    category TEXT,
    author TEXT,
    publish_date DATE,
    view_count INT DEFAULT 0,

    -- 384-dim embedding from all-MiniLM-L6-v2
    title_embedding VECTOR(384),
    content_embedding VECTOR(768)   -- all-mpnet-base-v2 for content
);

-- Create vector indexes
CREATE INDEX kb_title_idx ON knowledge_base USING hnsw (title_embedding);
CREATE INDEX kb_content_idx ON knowledge_base USING hnsw (content_embedding);

-- Traditional indexes for filtering
CREATE INDEX kb_category_idx ON knowledge_base(category);
CREATE INDEX kb_date_idx ON knowledge_base(publish_date);
```

Indexing Pipeline (Python)
```python
from sentence_transformers import SentenceTransformer
import heliosdb

# Load models
title_model = SentenceTransformer('all-MiniLM-L6-v2')
content_model = SentenceTransformer('all-mpnet-base-v2')

# Connect to database
db = heliosdb.connect("knowledge.db")

# Sample articles
articles = [
    {
        "id": 1,
        "title": "Introduction to Machine Learning",
        "content": "Machine learning is a subset of artificial intelligence...",
        "category": "AI",
        "author": "Alice Smith",
        "publish_date": "2024-01-15"
    },
    # ... more articles
]

# Generate embeddings in batches
batch_size = 32
for i in range(0, len(articles), batch_size):
    batch = articles[i:i+batch_size]

    # Generate embeddings
    titles = [a["title"] for a in batch]
    contents = [a["content"][:512] for a in batch]  # Truncate long content

    title_embeddings = title_model.encode(titles)
    content_embeddings = content_model.encode(contents)

    # Insert into database
    for article, title_emb, content_emb in zip(batch, title_embeddings, content_embeddings):
        title_str = "[" + ",".join(map(str, title_emb)) + "]"
        content_str = "[" + ",".join(map(str, content_emb)) + "]"

        db.execute("""
            INSERT INTO knowledge_base
            (article_id, title, content, category, author, publish_date,
             title_embedding, content_embedding)
            VALUES (?, ?, ?, ?, ?, ?, ?, ?)
        """, (
            article["id"], article["title"], article["content"],
            article["category"], article["author"], article["publish_date"],
            title_str, content_str
        ))

    db.commit()
    print(f"Indexed {min(i+batch_size, len(articles))} / {len(articles)} articles")
```

Search Implementation
```python
def search_knowledge_base(query, category=None, top_k=10):
    """Semantic search with an optional category filter."""
    # Generate query embedding
    query_embedding = content_model.encode(query)
    query_str = "[" + ",".join(map(str, query_embedding)) + "]"

    # Build SQL query
    sql = """
        SELECT article_id, title, category, author, publish_date,
               content_embedding <=> ? AS relevance_score
        FROM knowledge_base
    """

    # Parameters in placeholder order: SELECT, optional WHERE, ORDER BY, LIMIT
    params = [query_str]

    # Add category filter if specified
    if category:
        sql += " WHERE category = ?"
        params.append(category)

    sql += """
        ORDER BY content_embedding <=> ?
        LIMIT ?
    """
    params.extend([query_str, top_k])

    # Execute query
    results = db.execute(sql, params).fetchall()

    return [
        {
            "article_id": r[0],
            "title": r[1],
            "category": r[2],
            "author": r[3],
            "publish_date": r[4],
            "relevance": 1 - r[5]  # Convert distance to similarity
        }
        for r in results
    ]

# Example usage
results = search_knowledge_base(
    query="How does neural network training work?",
    category="AI",
    top_k=5
)

for r in results:
    print(f"{r['title']} (relevance: {r['relevance']:.3f})")
```

Output:

```
Introduction to Neural Networks (relevance: 0.923)
Backpropagation Algorithm Explained (relevance: 0.887)
Deep Learning Training Techniques (relevance: 0.845)
Gradient Descent Optimization (relevance: 0.812)
Neural Network Architectures (relevance: 0.798)
```

Example 2: Product Recommendations
Build a recommendation engine for e-commerce using collaborative filtering.
Schema Design
```sql
-- User preference vectors (learned from purchase history)
CREATE TABLE user_preferences (
    user_id INT PRIMARY KEY,
    username TEXT,
    preference_vector VECTOR(128),
    last_updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Product feature vectors
CREATE TABLE products (
    product_id INT PRIMARY KEY,
    name TEXT NOT NULL,
    description TEXT,
    category TEXT,
    price DECIMAL(10, 2),
    in_stock BOOLEAN DEFAULT TRUE,
    popularity_score FLOAT DEFAULT 0.0,

    -- Combined feature vector (category + attributes + user preferences)
    feature_vector VECTOR(128)
);

-- Create HNSW index for fast recommendations
CREATE INDEX products_feature_idx
ON products(feature_vector)
USING hnsw
WITH (quantization='product', pq_subquantizers=8);

-- Purchase history for training
CREATE TABLE purchases (
    purchase_id INT PRIMARY KEY,
    user_id INT,
    product_id INT,
    purchase_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    rating INT,  -- 1-5 stars
    FOREIGN KEY (user_id) REFERENCES user_preferences(user_id),
    FOREIGN KEY (product_id) REFERENCES products(product_id)
);
```

Recommendation Query
-- Get personalized recommendations for user 42-- Combine vector similarity with business rules
SELECT p.product_id, p.name, p.price, p.category, p.popularity_score,
-- Vector similarity score p.feature_vector <#> u.preference_vector AS match_score,
-- Combined ranking score ( -- 70% vector similarity (p.feature_vector <#> u.preference_vector) * 0.7 +
-- 20% popularity (p.popularity_score * 0.2) +
-- 10% recency bonus (newly added products) CASE WHEN p.created_at > CURRENT_DATE - INTERVAL '7 days' THEN 0.1 ELSE 0.0 END ) AS final_score
FROM products p, user_preferences uWHERE u.user_id = 42 AND p.in_stock = TRUE AND p.price > 0
-- Exclude already purchased items AND p.product_id NOT IN ( SELECT product_id FROM purchases WHERE user_id = 42 )
ORDER BY final_scoreLIMIT 20;Python Implementation
```python
def get_recommendations(user_id, max_price=None, category=None, limit=20):
    """Get personalized product recommendations."""
    sql = """
        SELECT p.product_id, p.name, p.price, p.category,
               p.feature_vector <#> u.preference_vector AS score
        FROM products p, user_preferences u
        WHERE u.user_id = ?
          AND p.in_stock = TRUE
          AND p.product_id NOT IN (
              SELECT product_id FROM purchases WHERE user_id = ?
          )
    """

    params = [user_id, user_id]

    if max_price:
        sql += " AND p.price <= ?"
        params.append(max_price)

    if category:
        sql += " AND p.category = ?"
        params.append(category)

    sql += " ORDER BY score LIMIT ?"
    params.append(limit)

    results = db.execute(sql, params).fetchall()

    return [
        {
            "product_id": r[0],
            "name": r[1],
            "price": float(r[2]),
            "category": r[3],
            "score": float(r[4])
        }
        for r in results
    ]

# Example usage
recommendations = get_recommendations(
    user_id=42,
    max_price=100.00,
    category="Electronics",
    limit=10
)

for rec in recommendations:
    print(f"{rec['name']} - ${rec['price']:.2f} (score: {rec['score']:.3f})")
```

Example 3: RAG Pipeline for LLM
Build a Retrieval Augmented Generation system for answering questions with LLM + context.
Schema Design
```sql
-- Document chunks for retrieval
CREATE TABLE document_chunks (
    chunk_id INT PRIMARY KEY,
    document_id INT,
    document_title TEXT,
    chunk_text TEXT NOT NULL,
    chunk_index INT,  -- Position in document

    -- Embedding for semantic retrieval
    embedding VECTOR(1536),  -- OpenAI ada-002

    -- Metadata for filtering
    source TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Create HNSW index
CREATE INDEX chunks_embedding_idx
ON document_chunks(embedding)
USING hnsw;
```

RAG Implementation (Python)
```python
import openai
from typing import List, Dict

# Initialize OpenAI
openai.api_key = "your-api-key"

def embed_query(query: str) -> List[float]:
    """Generate embedding for a query."""
    response = openai.Embedding.create(
        input=query,
        model="text-embedding-ada-002"
    )
    return response['data'][0]['embedding']

def retrieve_context(query: str, top_k: int = 5) -> List[Dict]:
    """Retrieve relevant document chunks."""
    # Generate query embedding
    query_emb = embed_query(query)
    query_str = "[" + ",".join(map(str, query_emb)) + "]"

    # Search for similar chunks
    results = db.execute("""
        SELECT chunk_id, document_title, chunk_text,
               embedding <=> ? AS relevance
        FROM document_chunks
        ORDER BY embedding <=> ?
        LIMIT ?
    """, (query_str, query_str, top_k)).fetchall()

    return [
        {
            "chunk_id": r[0],
            "title": r[1],
            "text": r[2],
            "relevance": 1 - r[3]
        }
        for r in results
    ]

def generate_answer(query: str, context_chunks: List[Dict]) -> str:
    """Generate an answer using an LLM with retrieved context."""
    # Build context string
    context = "\n\n".join([
        f"[Document: {c['title']}]\n{c['text']}"
        for c in context_chunks
    ])

    # Create prompt
    prompt = f"""Answer the question based on the following context:

Context:
{context}

Question: {query}

Answer:"""

    # Generate answer
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are a helpful assistant that answers questions based on provided context."},
            {"role": "user", "content": prompt}
        ],
        temperature=0.7,
        max_tokens=500
    )

    return response['choices'][0]['message']['content']

def rag_query(query: str, top_k: int = 5) -> Dict:
    """Complete RAG pipeline: retrieve + generate."""
    # Retrieve relevant context
    context_chunks = retrieve_context(query, top_k)

    # Generate answer
    answer = generate_answer(query, context_chunks)

    return {
        "query": query,
        "answer": answer,
        "sources": [
            {"title": c["title"], "relevance": c["relevance"]}
            for c in context_chunks
        ]
    }

# Example usage
result = rag_query("How does HNSW indexing work?")

print(f"Query: {result['query']}")
print(f"Answer: {result['answer']}")
print("\nSources:")
for source in result['sources']:
    print(f"  - {source['title']} (relevance: {source['relevance']:.3f})")
```

9. Performance Tuning
Index Optimization
Build-Time vs Query-Time Tradeoffs
```sql
-- Fast build, lower recall (~90%)
CREATE INDEX fast_idx ON docs(embedding)
USING hnsw;  -- Default: m=16, ef_construction=200

-- Slow build, higher recall (~98%)
-- (If a future version supports parameters)
-- CREATE INDEX accurate_idx ON docs(embedding)
-- USING hnsw WITH (m=32, ef_construction=400);
```

When to Rebuild Indexes
Rebuild indexes when:
- Dataset grows >2x since index creation
- Significant data distribution changes
- Query accuracy degrades over time
```sql
-- Rebuild index
DROP INDEX docs_embedding_idx;
CREATE INDEX docs_embedding_idx ON docs(embedding) USING hnsw;
```

Batch Operations
Efficient Bulk Insertion
```python
# BAD: Individual inserts (slow)
for doc in documents:
    db.execute("INSERT INTO docs VALUES (?, ?)", (doc.id, doc.embedding))
    db.commit()  # Commit each row

# GOOD: Batch inserts
db.execute("BEGIN TRANSACTION")
for doc in documents:
    db.execute("INSERT INTO docs VALUES (?, ?)", (doc.id, doc.embedding))
db.execute("COMMIT")  # Commit once

# BETTER: Prepared statement with executemany
values = [(doc.id, doc.embedding_str) for doc in documents]
db.executemany("INSERT INTO docs VALUES (?, ?)", values)
db.commit()
```

Optimal Batch Sizes
| Operation | Recommended Batch Size | Reason |
|---|---|---|
| Embedding generation | 32-64 | GPU memory limits |
| Database inserts | 1000-10000 | Transaction overhead |
| Vector searches | 1-10 | Parallel queries limited |
Query Optimization
Use LIMIT Always
```sql
-- BAD: No limit (retrieves all results)
SELECT * FROM docs
ORDER BY embedding <=> '[0.1, ...]';

-- GOOD: Limit results
SELECT * FROM docs
ORDER BY embedding <=> '[0.1, ...]'
LIMIT 10;
```

Pre-Filter with WHERE
```sql
-- BAD: Filter after vector search
SELECT * FROM docs
ORDER BY embedding <=> '[0.1, ...]'
LIMIT 10;
-- Post-processing in application: filter by category

-- GOOD: Filter before vector search
SELECT * FROM docs
WHERE category = 'technology'
ORDER BY embedding <=> '[0.1, ...]'
LIMIT 10;
```

Avoid Computing Distance Twice
```sql
-- BAD: Compute distance twice
SELECT id, title,
       embedding <=> '[0.1, ...]' AS score
FROM docs
ORDER BY embedding <=> '[0.1, ...]'  -- Computed again
LIMIT 10;

-- GOOD: Reference the distance column
SELECT id, title,
       embedding <=> '[0.1, ...]' AS score
FROM docs
ORDER BY score  -- Use the computed column
LIMIT 10;
```

Memory Management
Monitor Memory Usage
```sql
-- Check index memory consumption
SELECT index_name, dimensions, vector_count,
       memory_bytes / 1024 / 1024 AS memory_mb
FROM pg_vector_index_stats();
```

Memory-Saving Strategies
1. Use Product Quantization

   ```sql
   CREATE INDEX idx ON docs(embedding)
   USING hnsw WITH (quantization='product');
   ```

2. Lower dimensions

   ```python
   # Use a smaller embedding model
   model = SentenceTransformer('all-MiniLM-L6-v2')  # 384-dim
   # Instead of 'all-mpnet-base-v2' (768-dim)
   ```

3. Partition large tables

   ```sql
   -- Split by date for time-series data
   CREATE TABLE docs_2024_q1 (embedding VECTOR(384), ...);
   CREATE TABLE docs_2024_q2 (embedding VECTOR(384), ...);
   ```

Hardware Optimization
Hardware Optimization
CPU Considerations
1. SIMD Support: Ensure AVX2 (x86) or NEON (ARM) is enabled

   ```bash
   # Check CPU features
   lscpu | grep -i avx2   # x86_64
   lscpu | grep -i neon   # ARM
   ```

2. Core Count: More cores = faster parallel searches
- 4 cores: Good for small datasets
- 8+ cores: Better for concurrent queries
Memory Considerations
| Dataset Size | Minimum RAM | Recommended RAM |
|---|---|---|
| 10K vectors | 512 MB | 1 GB |
| 100K vectors | 2 GB | 4 GB |
| 1M vectors | 8 GB | 16 GB |
| 10M vectors | 32 GB | 64 GB |
Storage Considerations
1. SSD vs HDD: SSD strongly recommended
- SSD: <10ms latency for index loads
- HDD: 100-1000ms latency (100x slower)
2. Disk Space:
- Without PQ: ~100 bytes/vector
- With PQ: ~20-50 bytes/vector
10. Monitoring
pg_vector_index_stats Function
Query index statistics and health metrics.
Basic Usage
```sql
-- Get all vector index stats
SELECT * FROM pg_vector_index_stats();
```

Output Columns:

- index_name: Name of the index
- table_name: Table the index belongs to
- dimensions: Vector dimensionality
- vector_count: Number of vectors indexed
- memory_bytes: Memory consumed by the index
- quantization_type: Compression type (none, product)
- distance_metric: Distance function used
Example Output
```
index_name          | table_name | dimensions | vector_count | memory_bytes | quantization_type | distance_metric
--------------------|------------|------------|--------------|--------------|-------------------|----------------
docs_embedding_idx  | documents  | 384        | 50000        | 8388608      | none              | cosine
products_vec_idx    | products   | 768        | 100000       | 8388608      | product           | l2
```

Monitoring Queries
Index Size by Table
```sql
SELECT table_name, index_name, dimensions, vector_count,
       memory_bytes / 1024 / 1024 AS memory_mb,
       (memory_bytes / vector_count) AS bytes_per_vector
FROM pg_vector_index_stats()
ORDER BY memory_bytes DESC;
```

Compression Effectiveness
```sql
SELECT index_name, quantization_type, dimensions,
       memory_bytes / 1024 / 1024 AS memory_mb,

       -- Calculate compression ratio
       CASE WHEN quantization_type = 'product'
            THEN (dimensions * 4.0) / 8.0  -- 4 bytes per float -> 8-byte PQ codes
            ELSE 1.0
       END AS compression_ratio
FROM pg_vector_index_stats();
```

Identify Large Indexes
```sql
-- Find indexes using >1 GB
SELECT index_name, table_name,
       memory_bytes / 1024 / 1024 / 1024 AS memory_gb
FROM pg_vector_index_stats()
WHERE memory_bytes > 1073741824  -- 1 GB
ORDER BY memory_bytes DESC;
```

Performance Metrics
Query Latency Tracking
```python
import time

def measure_query_latency(query_embedding, top_k=10, iterations=100):
    """Measure average query latency."""
    query_str = "[" + ",".join(map(str, query_embedding)) + "]"

    latencies = []
    for _ in range(iterations):
        start = time.perf_counter()

        results = db.execute("""
            SELECT id, embedding <-> ? AS dist
            FROM documents
            ORDER BY embedding <-> ?
            LIMIT ?
        """, (query_str, query_str, top_k)).fetchall()

        end = time.perf_counter()
        latencies.append((end - start) * 1000)  # Convert to ms

    avg_latency = sum(latencies) / len(latencies)
    p50 = sorted(latencies)[len(latencies) // 2]
    p99 = sorted(latencies)[int(len(latencies) * 0.99)]

    return {
        "avg_ms": avg_latency,
        "p50_ms": p50,
        "p99_ms": p99,
        "min_ms": min(latencies),
        "max_ms": max(latencies)
    }

# Example usage
metrics = measure_query_latency(query_embedding, top_k=10)
print(f"Average latency: {metrics['avg_ms']:.2f} ms")
print(f"P50 latency: {metrics['p50_ms']:.2f} ms")
print(f"P99 latency: {metrics['p99_ms']:.2f} ms")
```

Throughput Measurement
```python
import concurrent.futures
import time

def measure_throughput(query_embeddings, top_k=10, workers=4):
    """Measure queries per second with parallel execution."""

    def run_query(query_emb):
        query_str = "[" + ",".join(map(str, query_emb)) + "]"
        results = db.execute("""
            SELECT id FROM documents
            ORDER BY embedding <-> ?
            LIMIT ?
        """, (query_str, top_k)).fetchall()
        return len(results)

    start = time.perf_counter()

    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as executor:
        results = list(executor.map(run_query, query_embeddings))

    end = time.perf_counter()
    elapsed = end - start
    qps = len(query_embeddings) / elapsed

    return {
        "total_queries": len(query_embeddings),
        "elapsed_seconds": elapsed,
        "queries_per_second": qps,
        "workers": workers
    }

# Example usage
throughput = measure_throughput(query_embeddings, workers=8)
print(f"Throughput: {throughput['queries_per_second']:.1f} QPS")
```

Health Checks
Index Integrity Check
```sql
-- Verify the index exists and has the expected vectors
SELECT index_name, vector_count,
       CASE WHEN vector_count > 0 THEN 'OK' ELSE 'EMPTY' END AS status
FROM pg_vector_index_stats();
```

Dimension Mismatch Detection
```sql
-- Check for dimension inconsistencies
-- (This would be a custom query checking actual data)
SELECT 'documents' AS table_name,
       COUNT(*) AS total_rows,
       COUNT(embedding) AS non_null_embeddings
FROM documents;
```

Alerting Thresholds
Recommended alert thresholds:
| Metric | Warning | Critical |
|---|---|---|
| Query latency (p99) | >50ms | >200ms |
| Index memory usage | >80% available | >95% available |
| Query error rate | >1% | >5% |
| QPS throughput | <100 QPS | <10 QPS |
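The thresholds above can be wired into a simple periodic check. A sketch, reusing the `measure_query_latency` helper defined earlier; the threshold values are the warning and critical levels from the table and are assumptions to tune for your workload:

```python
# Sketch: periodic health check against the latency thresholds above.
# Reuses measure_query_latency() from the latency-tracking example.
LATENCY_WARN_MS = 50.0    # warning threshold (p99)
LATENCY_CRIT_MS = 200.0   # critical threshold (p99)

def check_vector_search_health(query_embedding):
    metrics = measure_query_latency(query_embedding, top_k=10, iterations=20)
    p99 = metrics["p99_ms"]
    if p99 > LATENCY_CRIT_MS:
        status = "CRITICAL"
    elif p99 > LATENCY_WARN_MS:
        status = "WARNING"
    else:
        status = "OK"
    print(f"p99={p99:.1f} ms -> {status}")
    return status

# Run from cron or a scheduler, e.g.:
# check_vector_search_health(model.encode("representative query"))
```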
Summary
This tutorial covered:
- Introduction: What vector search is and why it matters
- Vector Basics: Embeddings, dimensions, and distance metrics
- Creating Tables: VECTOR type and schema design
- Generating Embeddings: OpenAI, sentence-transformers, and batch processing
- HNSW Indexing: Graph-based approximate search with configurable parameters
- Similarity Search: KNN queries with L2, cosine, and inner product operators
- Product Quantization: 8-384x memory compression for large-scale deployments
- Real-World Examples: Document search, recommendations, and RAG pipelines
- Performance Tuning: Index optimization, batching, and hardware considerations
- Monitoring: pg_vector_index_stats, latency tracking, and health checks
Next Steps
- Try the examples: Implement a semantic search system with your own data
- Experiment with parameters: Test different dimensions and distance metrics
- Optimize for your use case: Profile queries and tune indexes
- Explore advanced features: Multi-modal search, hybrid ranking, custom embeddings
Troubleshooting
Common Issues:
1. Dimension mismatch errors
- Ensure embedding dimension matches VECTOR(n) declaration
- Verify all embeddings from same model
2. Slow queries
- Add HNSW index if missing
- Use LIMIT to cap results
- Consider Product Quantization for large datasets
3. Memory issues
- Enable Product Quantization (384x compression)
- Use smaller embedding dimensions
- Partition large tables
4. Low accuracy
- Increase ef_construction (if configurable)
- Use higher m parameter (more connections)
- Disable PQ or increase sub-quantizers
Document Version: 1.0
Last Updated: 2025-12-01
Feedback: Please report issues or suggestions via GitHub issues