
Vector Storage TOAST Performance Analysis

Executive Summary

This analysis examines the performance trade-offs of PostgreSQL’s TOAST (The Oversized-Attribute Storage Technique) for vector storage in HeliosDB, comparing in-line vs out-of-line storage across different vector dimensionalities and workload patterns.

1. TOAST Architecture for Vector Data

1.1 Storage Threshold Mechanics

TOAST Decision Logic:
┌────────────────────────────────────────────┐
│ Row Insertion/Update                       │
│ Calculate: row_size = Σ(all column sizes)  │
└─────────────────────┬──────────────────────┘
                      ↓
          ┌───────────────────────┐
          │ row_size > THRESHOLD? │
          └───────────┬───────────┘
           YES ↙             ↘ NO
┌────────────────┐   ┌────────────────────┐
│ Out-of-line:   │   │ In-line:           │
│ Store in TOAST │   │ Store in main page │
│ table, put     │   │ directly           │
│ 18-byte ptr    │   │                    │
│ in main row    │   │                    │
└────────────────┘   └────────────────────┘

Default TOAST_TUPLE_THRESHOLD = 2KB (2,048 bytes)
Page size = 8KB (8,192 bytes)

1.2 Vector Size Calculation by Dimensionality

Storage Formula:

Vector size (bytes) = dimensions × bytes_per_element + metadata
For VECTOR(n) with float32 (4 bytes per element):
- Metadata overhead: ~24 bytes (vector header, length, flags)
- Element data: n × 4 bytes
- Total: (n × 4) + 24 bytes
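As a quick sanity check, the formula can be evaluated directly (a sketch using this analysis's assumed 24-byte header and 2KB threshold, not measured values):

```python
# Sizing formula from above: float32 elements plus the ~24-byte header
# assumed in this analysis, compared against the 2 KB TOAST threshold.
TOAST_THRESHOLD = 2048  # bytes
HEADER = 24             # assumed vector metadata overhead

def vector_size(dims: int) -> int:
    """Raw size of a float32 VECTOR(dims) under the text's formula."""
    return dims * 4 + HEADER

for dims in (300, 768, 1536, 3072):
    size = vector_size(dims)
    placement = "in-line" if size <= TOAST_THRESHOLD else "out-of-line"
    print(f"VECTOR({dims}): {size} bytes -> {placement}")
```

The outputs reproduce the breakpoints in the table below: only the smallest embeddings stay under the default threshold.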

Dimensionality Breakpoints:

| Model/Use Case                | Dimensions | Raw Size        | TOAST Status (2KB threshold) |
|-------------------------------|------------|-----------------|------------------------------|
| Word2Vec, GloVe               | 100-300    | 424-1,224 bytes | In-line                      |
| BERT-base                     | 768        | 3,096 bytes     | Out-of-line                  |
| OpenAI text-embedding-3-small | 1536       | 6,168 bytes     | Out-of-line                  |
| OpenAI text-embedding-3-large | 3072       | 12,312 bytes    | Out-of-line                  |
| Vision models (ResNet, CLIP)  | 2048       | 8,216 bytes     | Out-of-line                  |
| Multimodal embeddings         | 4096+      | 16,408+ bytes   | Out-of-line                  |

Key Insight: Most modern ML embeddings (768+ dimensions) exceed 2KB threshold and default to out-of-line storage.

2. In-line vs Out-of-line Performance Analysis

2.1 Storage Layout Comparison

In-line Storage (VECTOR(256)):

Main Table Page (8KB):
┌────────────────────────────────────────────────────────┐
│ Page Header (24 bytes) │
├────────────────────────────────────────────────────────┤
│ Row 1: [doc_id][text][metadata][VECTOR_DATA_1048_BYTES]│
├────────────────────────────────────────────────────────┤
│ Row 2: [doc_id][text][metadata][VECTOR_DATA_1048_BYTES]│
├────────────────────────────────────────────────────────┤
│ Row 3: [doc_id][text][metadata][VECTOR_DATA_1048_BYTES]│
├────────────────────────────────────────────────────────┤
│ Row 4: ... │
└────────────────────────────────────────────────────────┘
Rows per page: ~4-6 (depends on other columns)

Out-of-line Storage (VECTOR(1536)):

Main Table Page (8KB):
┌────────────────────────────────────────────────────────┐
│ Page Header (24 bytes) │
├────────────────────────────────────────────────────────┤
│ Row 1: [doc_id][text][metadata][18-byte TOAST pointer]│
├────────────────────────────────────────────────────────┤
│ Row 2: [doc_id][text][metadata][18-byte TOAST pointer]│
├────────────────────────────────────────────────────────┤
│ ... │
│ Row 50+ │
└────────────────────────────────────────────────────────┘
TOAST Table Page (8KB):
┌────────────────────────────────────────────────────────┐
│ TOAST Chunk 1: [chunk_id][chunk_seq][6144 bytes data] │
├────────────────────────────────────────────────────────┤
│ TOAST Chunk 2: [for same vector if >8KB] │
└────────────────────────────────────────────────────────┘
Rows per main page: 50+ (much higher density)

2.2 Access Pattern Performance

Scenario 1: Pure Vector Similarity Search

Workload:

-- Find 10 nearest neighbors to query vector
SELECT doc_id, embedding <-> :query_vector AS distance
FROM documents
ORDER BY embedding <-> :query_vector
LIMIT 10;

HNSW Index Scan Process:

In-line Storage (VECTOR(256) - 1,048 bytes):

HNSW traversal evaluates ~500 candidates for top-10 results
For each candidate:
1. Index provides doc_id
2. Fetch main table page (1 I/O per ~5 rows)
3. Vector data is in same page
4. Compute distance
Total I/O: ~100 pages (500 rows ÷ 5 rows/page)
Latency: 100 × 0.1ms (cache miss) = 10ms

Out-of-line Storage (VECTOR(1536) - 6,168 bytes):

HNSW traversal evaluates ~500 candidates
For each candidate:
1. Index provides doc_id
2. Fetch main table page (1 I/O per ~50 rows)
3. Read TOAST pointer (18 bytes)
4. Fetch TOAST table page (1 I/O per vector)
5. Vector data retrieved
6. Compute distance
Total I/O: 10 main pages + 500 TOAST pages = 510 pages
Latency: 510 × 0.1ms = 51ms
**5x slower due to double I/O overhead**
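The I/O arithmetic in the two scenarios above can be reproduced with a small cost model (assumptions from the text: 500 candidates, 0.1 ms per page miss, cold cache):

```python
# Back-of-envelope cost model for the HNSW candidate-fetch phase
# (assumptions from the text: 500 candidates, 0.1 ms per page miss).
CANDIDATES = 500
MS_PER_PAGE = 0.1

def scan_cost(rows_per_main_page: int, toast_pages_per_vector: int):
    """Return (pages touched, estimated latency in ms)."""
    main_pages = CANDIDATES // rows_per_main_page      # candidates share main pages
    toast_pages = CANDIDATES * toast_pages_per_vector  # one TOAST fetch per vector
    total = main_pages + toast_pages
    return total, total * MS_PER_PAGE

print(scan_cost(5, 0))   # in-line VECTOR(256): 100 pages, ~10 ms
print(scan_cost(50, 1))  # out-of-line VECTOR(1536): 510 pages, ~51 ms
```

The out-of-line penalty comes almost entirely from the per-vector TOAST page fetch; the denser main table barely helps.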

Performance Summary:

| Vector Dims | Storage     | I/O per Vector | Candidates Eval | Total I/O  | Latency | Relative |
|-------------|-------------|----------------|-----------------|------------|---------|----------|
| 256         | In-line     | 0.2 pages      | 500             | 100 pages  | 10ms    | 1.0x     |
| 768         | Out-of-line | 2.0 pages      | 500             | 1000 pages | 100ms   | 10x      |
| 1536        | Out-of-line | 2.0 pages      | 500             | 1000 pages | 100ms   | 10x      |
| 3072        | Out-of-line | 2.0 pages      | 500             | 1000 pages | 100ms   | 10x      |

Critical Finding: Out-of-line storage creates 5-10x performance penalty for vector search workloads, regardless of dimensionality beyond the threshold.

Scenario 2: Hybrid Query (Vector + Scalar Filter)

Workload:

SELECT doc_id, title, embedding <-> :query_vector AS distance
FROM documents
WHERE category = 'technology'
AND created_date > '2025-01-01'
ORDER BY embedding <-> :query_vector
LIMIT 10;

In-line Storage:

Execution plan:
1. Use B-tree indexes to build bitmap of matching rows: 10,000 rows
2. HNSW filtered search evaluates ~1,000 candidates (higher due to filtering)
3. Each candidate: 1 I/O per ~5 rows = 200 pages
4. Return top-10
Total I/O: 200 pages
Latency: 20ms

Out-of-line Storage:

Execution plan:
1. Bitmap index scan: 10,000 rows
2. HNSW filtered search evaluates ~1,000 candidates
3. Each candidate: 1 I/O (main) + 1 I/O (TOAST) = 2 I/O
4. Total I/O: 20 main pages + 1,000 TOAST pages = 1,020 pages
Total I/O: 1,020 pages
Latency: 102ms
**5x slower**

Observation: Hybrid queries suffer even more in absolute terms: the scalar filter columns are fetched from the main table anyway, but each surviving candidate's vector still requires a separate TOAST access.

Scenario 3: Bulk Vector Export

Workload:

SELECT doc_id, embedding
FROM documents
WHERE created_date = '2025-09-15'
LIMIT 100000;

In-line Storage:

Sequential scan optimization:
- Read main table pages sequentially
- All data co-located
- I/O: 100,000 rows ÷ 5 rows/page = 20,000 pages
- Latency: 2 seconds (sequential I/O @ 100MB/sec)

Out-of-line Storage:

Sequential scan with TOAST:
- Read main table: 100,000 rows ÷ 50 rows/page = 2,000 pages (fast)
- Read TOAST table: 100,000 random accesses (slow)
- Even sequential main scan results in random TOAST access pattern
- TOAST chunks not guaranteed to be sequential
- I/O: 2,000 + 100,000 = 102,000 pages
- Latency: 10.2 seconds (mostly random I/O)
**5x slower, worse with random TOAST layout**

2.3 Write Performance Impact

Insert Performance:

| Vector Dims | Storage     | Write Latency | Explanation                              |
|-------------|-------------|---------------|------------------------------------------|
| 256         | In-line     | 0.8ms         | Single LSM write (commit log + memtable) |
| 768         | Out-of-line | 1.2ms         | Main table write + TOAST table write     |
| 1536        | Out-of-line | 1.3ms         | Same as 768 (TOAST overhead fixed)       |

Analysis:

  • Out-of-line adds ~50% write latency
  • Requires 2 separate writes (main + TOAST)
  • Marginal impact compared to read penalty

Update Performance:

UPDATE documents SET embedding = :new_vector WHERE doc_id = 123;

In-line:

1. Locate row in LSM (0.2ms)
2. Write tombstone + new version (0.6ms)
3. Total: 0.8ms

Out-of-line:

1. Locate row in LSM (0.2ms)
2. Write new TOAST chunk (0.4ms)
3. Write new main row with updated TOAST pointer (0.6ms)
4. Old TOAST chunk becomes garbage (cleaned up later)
Total: 1.2ms
TOAST garbage accumulation: ~50% space overhead until VACUUM

Implication: Out-of-line updates create more garbage and therefore require more frequent VACUUM operations.

3. Storage Strategy Recommendations

3.1 Force In-line Storage (PLAIN)

Configuration:

CREATE TABLE documents (
    doc_id BIGINT PRIMARY KEY,
    title VARCHAR(255),
    content TEXT,
    embedding VECTOR(1536)
);

-- Force in-line storage
ALTER TABLE documents ALTER COLUMN embedding SET STORAGE PLAIN;

Trade-offs:

Advantages:

  • 5-10x faster vector similarity search
  • No double I/O overhead
  • Better cache locality
  • Simpler access pattern

Disadvantages:

  • Lower row density (fewer rows per page)
  • May fail if row exceeds page size (8KB limit)
  • Higher memory usage for cache

Dimensionality Limits:

Maximum safe dimensions for PLAIN storage:
Page size: 8KB = 8,192 bytes
Page overhead: ~200 bytes
Space for other columns: ~500 bytes (estimate)
Available for vector: 7,492 bytes
Max dimensions (float32): 7,492 ÷ 4 ≈ 1,873 dimensions
Practical limits considering other columns:
- With minimal other columns: 1,536 dimensions ✓ (6,168 bytes)
- With moderate metadata: 1,024 dimensions ✓ (4,120 bytes)
- With TEXT columns: 768 dimensions ✓ (3,096 bytes)
Rule of thumb: PLAIN storage safe up to 1,536 dimensions for typical schemas
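The limit calculation above can be wrapped in a small helper (a sketch using this section's assumptions: 8KB page, ~200-byte page overhead, float32 elements; unlike the text's 1,873 figure it also reserves the 24-byte vector header, so it is slightly tighter):

```python
# Sizing helper for the PLAIN rule of thumb, using this section's assumed
# constants (8 KB page, ~200 B page overhead, 24 B vector header).
PAGE_SIZE = 8192
PAGE_OVERHEAD = 200
VECTOR_HEADER = 24

def max_plain_dims(other_columns_bytes: int) -> int:
    """Largest float32 vector that still fits in-line with PLAIN storage."""
    available = PAGE_SIZE - PAGE_OVERHEAD - other_columns_bytes - VECTOR_HEADER
    return available // 4

print(max_plain_dims(500))   # ~500 B of other columns -> 1867 dims
print(max_plain_dims(2000))  # heavier metadata shrinks the vector budget
```

Plugging in heavier per-row metadata quickly pushes the safe limit below common embedding sizes, which is why the rule of thumb stops at 1,536 dimensions.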

3.2 Force Out-of-line Storage (EXTERNAL)

Configuration:

CREATE TABLE images (
    image_id BIGINT PRIMARY KEY,
    filename VARCHAR(500),
    upload_date TIMESTAMP,
    tags VARCHAR[],
    embedding VECTOR(4096) -- 16KB, too large for PLAIN
);

-- Force out-of-line storage
ALTER TABLE images ALTER COLUMN embedding SET STORAGE EXTERNAL;

Use Case:

  • Vectors too large for in-line (>1,536 dims)
  • Frequent scalar-only queries (SELECT filename, tags WHERE…)
  • Occasional vector search

Advantages:

  • High row density for scalar queries
  • No page size limit errors
  • Better for mixed workloads

Disadvantages:

  • 5-10x slower vector search
  • Double I/O overhead

3.3 Adaptive Storage (MAIN)

Configuration:

-- Use default MAIN storage (HeliosDB's default; note that stock PostgreSQL
-- defaults to EXTENDED for most varlena types)
-- Automatically chooses in-line or out-of-line based on row size
CREATE TABLE documents (
    doc_id BIGINT PRIMARY KEY,
    content TEXT, -- Variable size
    embedding VECTOR(1536)
);
-- No ALTER STORAGE needed, uses MAIN by default

Behavior:

  • Vector stored in-line if total row size < 2KB
  • Automatically TOASTed if row exceeds threshold
  • Adapts to data characteristics

Advantages:

  • Flexible, adapts to actual data
  • No tuning required
  • Works for mixed row sizes

Disadvantages:

  • Unpredictable performance (some rows fast, some slow)
  • Difficult to optimize

Recommendation: Use MAIN for prototyping, switch to PLAIN or EXTERNAL once workload is understood.

4. Quantitative Performance Matrix

4.1 Vector Search Performance by Configuration

| Dimensions | Storage Strategy       | Index Scan Time | Top-10 Latency | Throughput (QPS) |
|------------|------------------------|-----------------|----------------|------------------|
| 256        | PLAIN (in-line)        | 8ms             | 10ms           | 100              |
| 256        | MAIN (in-line)         | 8ms             | 10ms           | 100              |
| 256        | EXTERNAL (out-of-line) | 40ms            | 50ms           | 20               |
| 768        | PLAIN (in-line)        | 15ms            | 18ms           | 55               |
| 768        | MAIN (out-of-line)     | 75ms            | 90ms           | 11               |
| 768        | EXTERNAL (out-of-line) | 75ms            | 90ms           | 11               |
| 1536       | PLAIN (in-line)        | 25ms            | 30ms           | 33               |
| 1536       | MAIN (out-of-line)     | 125ms           | 150ms          | 6.6              |
| 1536       | EXTERNAL (out-of-line) | 125ms           | 150ms          | 6.6              |
| 3072       | MAIN (out-of-line)     | 125ms           | 150ms          | 6.6              |
| 4096       | EXTERNAL (out-of-line) | 125ms           | 150ms          | 6.6              |

Key Insights:

  1. In-line (PLAIN) provides 5x better throughput for vector search
  2. Dimensionality matters less than storage strategy for out-of-line
  3. 1536 dims is sweet spot for PLAIN storage with typical schemas

4.2 Memory and Storage Trade-offs

| Dimensions | Storage     | Vector Size    | Rows/Page (est) | Cache Efficiency                  |
|------------|-------------|----------------|-----------------|-----------------------------------|
| 256        | In-line     | 1,048 bytes    | 5               | High (vector with row)            |
| 768        | In-line     | 3,096 bytes    | 2               | Medium (large rows)               |
| 1536       | In-line     | 6,168 bytes    | 1               | Low (1 row/page)                  |
| 1536       | Out-of-line | 18 bytes (ptr) | 60              | High for scalars, low for vectors |

Cache Amplification Analysis:

For 1 GB cache:

In-line VECTOR(1536):

Row size: 6,500 bytes (including metadata)
Rows cached: 1 GB ÷ 6,500 = 153,846 rows
Vector coverage: 153,846 vectors ✓

Out-of-line VECTOR(1536):

Main table cache (500 MB): 500 MB ÷ 300 bytes/row = 1,666,666 rows
TOAST table cache (500 MB): 500 MB ÷ 6,168 bytes = 81,063 vectors
Effective vector coverage: min(1.6M, 81K) = 81,063 vectors ✗
**~2x worse cache efficiency** for vector workload

Conclusion: Out-of-line storage wastes cache on main table rows without vectors.
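The cache-coverage arithmetic above, as a sketch (decimal GB/MB as in the text; the 300-byte main-row and 6,500-byte in-line-row sizes are this section's estimates):

```python
# Cache-coverage sketch for the 1 GB example (decimal units, as in the text).
CACHE = 1_000_000_000  # 1 GB

inline_row = 6_500                    # 1536-dim vector stored with its row
inline_vectors = CACHE // inline_row  # whole rows (vector included) cached

main_row, toast_chunk = 300, 6_168    # out-of-line: pointer row + TOAST data
half = CACHE // 2                     # 500 MB each for main and TOAST pages
outofline_vectors = min(half // main_row, half // toast_chunk)

print(inline_vectors, outofline_vectors)
print(round(inline_vectors / outofline_vectors, 1), "x better in-line coverage")
```

The `min()` is the key step: out-of-line coverage is bottlenecked by the TOAST half of the cache, while the main-table half holds far more rows than there are cached vectors.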

5. Workload-Specific Recommendations

5.1 Vector Search Dominant (>80% vector queries)

Profile:

  • Semantic search applications
  • Recommendation engines
  • Image similarity search

Optimal Configuration:

CREATE TABLE embeddings (
    id BIGINT PRIMARY KEY,
    metadata JSONB, -- Keep minimal, <500 bytes
    vector VECTOR(1536)
) WITH (fillfactor = 90);

ALTER TABLE embeddings ALTER COLUMN vector SET STORAGE PLAIN;

CREATE INDEX ON embeddings USING hnsw (vector) WITH (m = 16, ef_construction = 64);

Expected Performance:

  • Top-10 search: 20-40ms
  • Throughput: 25-50 QPS per node
  • Cache hit rate: 70-80%

5.2 Hybrid Workload (50% scalar, 50% vector)

Profile:

  • Content management with search
  • E-commerce with visual search
  • Document retrieval systems

Optimal Configuration:

CREATE TABLE products (
    product_id BIGINT PRIMARY KEY,
    name VARCHAR(500),
    description TEXT,
    category VARCHAR(100),
    price DECIMAL(10,2),
    image_embedding VECTOR(768) -- Smaller dims to fit in-line
);

-- Force in-line if possible
ALTER TABLE products ALTER COLUMN image_embedding SET STORAGE PLAIN;

-- Indexes for both access patterns
CREATE INDEX ON products (category, price);
CREATE INDEX ON products USING hnsw (image_embedding);

Trade-off:

  • Use 768 dims (3KB) instead of 1536 dims (6KB) to ensure in-line storage
  • Quality loss: ~5-10% in embedding accuracy
  • Performance gain: 5-10x in query speed
  • Recommended: Compress embeddings with PCA/UMAP to fit

5.3 Large Vectors (>2048 dims)

Profile:

  • Multimodal embeddings
  • High-precision applications
  • Research/scientific computing

Optimal Configuration:

CREATE TABLE research_data (
    id BIGINT PRIMARY KEY,
    experiment_id VARCHAR(100),
    timestamp TIMESTAMP,
    embedding VECTOR(4096) -- 16KB, must be out-of-line
);

-- Explicit out-of-line
ALTER TABLE research_data ALTER COLUMN embedding SET STORAGE EXTERNAL;

-- Separate table for frequent scalar queries
CREATE TABLE research_metadata (
    id BIGINT PRIMARY KEY REFERENCES research_data(id),
    experiment_id VARCHAR(100),
    timestamp TIMESTAMP,
    tags VARCHAR[]
);

CREATE INDEX ON research_metadata (experiment_id, timestamp);

Strategy:

  • Accept out-of-line performance penalty
  • Normalize schema to separate hot metadata
  • Pre-filter with metadata before vector search
  • Use batch queries to amortize I/O cost

6. Advanced Optimization Techniques

6.1 Vector Quantization for Dimensionality Reduction

Product Quantization (PQ):

Original: VECTOR(1536) = 6,168 bytes
PQ-compressed: 1536 dims ÷ 8 dims/subvector × 1 byte/code = 192 bytes
Storage savings: 32x
In-line storage: ✓ Guaranteed
Performance: 95-98% recall with 10-20x speedup
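To make the PQ arithmetic concrete, here is a minimal encoder sketch with a toy random codebook (a trained codebook and HeliosDB's actual implementation would differ; only the sizes carry over):

```python
# Minimal product-quantization sketch: split a 1536-dim vector into 192
# subvectors of 8 dims and store one 1-byte nearest-centroid code each.
# The codebook here is random (toy); real PQ trains it with k-means.
import random

DIMS, SUBDIM = 1536, 8
M = DIMS // SUBDIM  # 192 codes -> 192 bytes per vector

random.seed(0)
codebook = [[random.uniform(-1, 1) for _ in range(SUBDIM)] for _ in range(256)]

def pq_encode(vec):
    """Replace each 8-dim subvector with the index of its nearest centroid."""
    codes = []
    for i in range(M):
        sub = vec[i * SUBDIM:(i + 1) * SUBDIM]
        best = min(range(256),
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(sub, codebook[c])))
        codes.append(best)
    return bytes(codes)

vec = [random.uniform(-1, 1) for _ in range(DIMS)]
codes = pq_encode(vec)
print(len(codes), "bytes vs", DIMS * 4, "raw ->", DIMS * 4 // len(codes), "x smaller")
```

The 192-byte code easily fits in-line regardless of threshold, which is what enables the two-stage search pattern below.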

Recommendation for HeliosDB:

-- Future feature: Quantized vectors
CREATE TABLE documents (
doc_id BIGINT PRIMARY KEY,
title VARCHAR(255),
embedding_full VECTOR(1536) STORAGE EXTERNAL, -- Exact, slow
embedding_pq VECTOR_PQ(1536, 192) STORAGE PLAIN -- Approximate, fast
);
-- Two-stage search:
-- 1. Fast PQ search for top-100 candidates (in-line, fast)
-- 2. Rerank with full vectors (out-of-line, but only 100 accesses)

6.2 Adaptive TOAST Threshold

Problem: Fixed 2KB threshold doesn’t optimize for vector workloads

Proposed Solution:

[storage.toast]
# Per-table TOAST threshold
threshold_bytes = 8192 # Increase to 8KB (same as page size)
# Allow larger vectors to stay in-line
# VECTOR(1536) = 6,168 bytes < 8,192 → in-line ✓

Impact:

With 8KB threshold:
- VECTOR(1536): In-line ✓ (was out-of-line)
- VECTOR(1920): In-line ✓ (new)
- VECTOR(2048): Out-of-line (too large)
Performance gain: 5-10x for 1536-dim vectors
Row density cost: Acceptable (1-2 rows per page)
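The placement flip can be expressed with the size formula from section 1.2 (size = dims × 4 + 24):

```python
# Placement as a function of the TOAST threshold, using the size formula
# from section 1.2 (size = dims * 4 + 24, float32 elements assumed).
def placement(dims: int, threshold: int) -> str:
    return "in-line" if dims * 4 + 24 <= threshold else "out-of-line"

print(placement(1536, 2048))  # out-of-line under the 2 KB default
print(placement(1536, 8192))  # in-line under the relaxed 8 KB threshold
print(placement(2048, 8192))  # out-of-line (8,216 bytes exceeds the page)
```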

Trade-off Analysis:

| Threshold        | Max In-line Dims | Rows/Page | Vector Search Perf | Scalar Query Perf |
|------------------|------------------|-----------|--------------------|-------------------|
| 2 KB (default)   | 384              | 20-30     | Poor (out-of-line) | Excellent         |
| 4 KB             | 896              | 8-12      | Good               | Good              |
| 6 KB             | 1,408            | 3-5       | Excellent          | Fair              |
| 8 KB (page size) | 1,920            | 1-2      | Excellent          | Poor              |

Recommendation:

  • Vector-heavy workloads: 6-8 KB threshold
  • Balanced workloads: 4 KB threshold
  • Scalar-heavy workloads: 2 KB threshold (default)

7. Monitoring and Diagnostics

7.1 Key Metrics

TOAST Usage Statistics:

-- Check TOAST overhead per table
-- (pg_table_size = main + TOAST + maps; pg_relation_size = main fork only;
--  pg_total_relation_size also includes indexes)
SELECT
    schemaname,
    tablename,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS total_size,
    pg_size_pretty(pg_relation_size(schemaname||'.'||tablename)) AS main_size,
    pg_size_pretty(pg_table_size(schemaname||'.'||tablename) -
                   pg_relation_size(schemaname||'.'||tablename)) AS toast_size,
    ROUND(100.0 * (pg_table_size(schemaname||'.'||tablename) -
                   pg_relation_size(schemaname||'.'||tablename)) /
          NULLIF(pg_total_relation_size(schemaname||'.'||tablename), 0), 2) AS toast_pct
FROM pg_tables
WHERE tablename = 'documents';

Performance Monitoring:

Metrics to track:
- toast.fetch_count: Number of TOAST fetches per query
- toast.fetch_latency_ms: Average TOAST fetch time
- toast.hit_rate: TOAST cache hit ratio
- vector.inline_pct: Percentage of vectors stored in-line
Alerts:
if toast.fetch_count > 100 per query:
→ Consider PLAIN storage or dimensionality reduction
if toast.hit_rate < 50%:
→ Increase TOAST cache allocation
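The alert rules above as a tiny executable check (the metric names are this section's hypothetical ones, not an existing HeliosDB API):

```python
# The two alert rules from above, expressed as a simple evaluation function.
def toast_alerts(fetch_count_per_query: float, hit_rate: float) -> list:
    alerts = []
    if fetch_count_per_query > 100:
        alerts.append("consider PLAIN storage or dimensionality reduction")
    if hit_rate < 0.50:
        alerts.append("increase TOAST cache allocation")
    return alerts

print(toast_alerts(250, 0.40))  # both rules fire
print(toast_alerts(20, 0.90))   # healthy: no alerts
```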

7.2 Diagnostic Queries

Check storage strategy per column:

SELECT
    attname,
    attstorage,
    CASE attstorage
        WHEN 'p' THEN 'PLAIN (no TOAST)'
        WHEN 'm' THEN 'MAIN (prefer inline)'
        WHEN 'e' THEN 'EXTERNAL (always out-of-line)'
        WHEN 'x' THEN 'EXTENDED (compress + TOAST)'
    END AS storage_strategy
FROM pg_attribute
WHERE attrelid = 'documents'::regclass
  AND attname = 'embedding';

Identify TOAST bloat:

-- Approximate: report TOAST relation size per table; measuring true bloat
-- (dead space) requires pgstattuple on the TOAST relation.
SELECT
    n.nspname AS schemaname,
    c.relname AS tablename,
    pg_size_pretty(pg_total_relation_size(c.reltoastrelid)) AS toast_size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.reltoastrelid <> 0
  AND pg_total_relation_size(c.reltoastrelid) > 1073741824 -- >1GB
ORDER BY pg_total_relation_size(c.reltoastrelid) DESC;

8. Conclusion

Key Findings:

  1. Storage Strategy Impact:

    • In-line (PLAIN): 5-10x faster vector search, 1-2 rows/page
    • Out-of-line (EXTERNAL): Slower search, 50+ rows/page
    • Default (MAIN): Unpredictable, depends on row size
  2. Dimensionality Breakpoints:

    • <384 dims (≤1,560 bytes): Always in-line with 2KB threshold ✓
    • 384-1,536 dims: Depends on other columns; use PLAIN to force in-line
    • >1,536 dims: Out-of-line required, accept performance penalty

  3. Performance Impact:

    • Vector search: 5-10x penalty for out-of-line storage
    • Hybrid queries: Similar penalty due to TOAST indirection
    • Bulk operations: 5x penalty, worse with fragmentation
  4. Resource Trade-offs:

    • In-line: Lower row density, better vector performance
    • Out-of-line: Higher row density, better scalar-only performance
    • Cache: In-line 2x more efficient for vector workloads

Recommended Configurations:

[storage.vector]
# Performance-critical vector search
default_storage = "PLAIN"
max_inline_dimensions = 1536
toast_threshold_kb = 6 # Relaxed from 2KB
# Fallback for large vectors
large_vector_threshold_dims = 2048
large_vector_storage = "EXTERNAL"
# Monitoring
track_toast_usage = true
warn_on_excessive_toast_io = true
toast_io_threshold = 100 # Fetches per query

Decision Matrix:

| Workload                            | Dims  | Other Columns | Recommended Storage                | Expected Perf      |
|-------------------------------------|-------|---------------|------------------------------------|--------------------|
| Pure vector search                  | Any   | Minimal       | PLAIN                              | Excellent          |
| Hybrid (vector + scalar)            | <1536 | Moderate      | PLAIN                              | Good               |
| Hybrid (vector + scalar)            | >1536 | Moderate      | EXTERNAL                           | Fair               |
| Scalar-heavy with occasional vector | Any   | Large         | EXTERNAL                           | Good (for scalars) |
| Large multimodal                    | >2048 | Any           | EXTERNAL + separate metadata table | Fair               |

Next Steps:

  1. Implement automatic storage strategy selection based on dimensionality
  2. Add query planner hints for TOAST-aware optimization
  3. Develop vector quantization support for dimension reduction
  4. Build monitoring dashboard for TOAST performance metrics