
Vector Storage TOAST Performance Analysis

Executive Summary

This analysis examines the performance trade-offs of PostgreSQL’s TOAST (The Oversized-Attribute Storage Technique) for vector storage in HeliosDB, comparing in-line vs out-of-line storage across different vector dimensionalities and workload patterns.

1. TOAST Architecture for Vector Data

1.1 Storage Threshold Mechanics

TOAST Decision Logic:
┌────────────────────────────────────────────┐
│ Row Insertion/Update                       │
│ Calculate: row_size = Σ(all column sizes)  │
└─────────────────────┬──────────────────────┘
                      ↓
          ┌───────────────────────┐
          │ row_size > THRESHOLD? │
          └───────────┬───────────┘
           YES ↙             ↘ NO
┌────────────────┐   ┌────────────────────┐
│ Out-of-line:   │   │ In-line:           │
│ Store in TOAST │   │ Store in main page │
│ table, put     │   │ directly           │
│ 18-byte ptr    │   │                    │
│ in main row    │   │                    │
└────────────────┘   └────────────────────┘

Default TOAST_TUPLE_THRESHOLD = 2KB (2,048 bytes)
Page size = 8KB (8,192 bytes)

1.2 Vector Size Calculation by Dimensionality

Storage Formula:

Vector size (bytes) = dimensions × bytes_per_element + metadata
For VECTOR(n) with float32 (4 bytes per element):
- Metadata overhead: ~24 bytes (vector header, length, flags)
- Element data: n × 4 bytes
- Total: (n × 4) + 24 bytes
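As a quick sanity check, the formula can be evaluated directly (a sketch using this analysis's assumed 24-byte header and 2KB threshold, not measured values):

```python
# Sizing formula from above: float32 elements plus the ~24-byte header
# assumed in this analysis, compared against the 2 KB TOAST threshold.
TOAST_THRESHOLD = 2048  # bytes
HEADER = 24             # assumed vector metadata overhead

def vector_size(dims: int) -> int:
    """Raw size of a float32 VECTOR(dims) under the text's formula."""
    return dims * 4 + HEADER

for dims in (300, 768, 1536, 3072):
    size = vector_size(dims)
    placement = "in-line" if size <= TOAST_THRESHOLD else "out-of-line"
    print(f"VECTOR({dims}): {size} bytes -> {placement}")
```

The outputs reproduce the breakpoints in the table below: only the smallest embeddings stay under the default threshold.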

Dimensionality Breakpoints:

| Model/Use Case                | Dimensions | Raw Size        | TOAST Status (2KB threshold) |
|-------------------------------|------------|-----------------|------------------------------|
| Word2Vec, GloVe               | 100-300    | 424-1,224 bytes | In-line                      |
| BERT-base                     | 768        | 3,096 bytes     | Out-of-line                  |
| OpenAI text-embedding-3-small | 1536       | 6,168 bytes     | Out-of-line                  |
| OpenAI text-embedding-3-large | 3072       | 12,312 bytes    | Out-of-line                  |
| Vision models (ResNet, CLIP)  | 2048       | 8,216 bytes     | Out-of-line                  |
| Multimodal embeddings         | 4096+      | 16,408+ bytes   | Out-of-line                  |

Key Insight: Most modern ML embeddings (768+ dimensions) exceed 2KB threshold and default to out-of-line storage.

2. In-line vs Out-of-line Performance Analysis

2.1 Storage Layout Comparison

In-line Storage (VECTOR(256)):

Main Table Page (8KB):
┌────────────────────────────────────────────────────────┐
│ Page Header (24 bytes) │
├────────────────────────────────────────────────────────┤
│ Row 1: [doc_id][text][metadata][VECTOR_DATA_1048_BYTES]│
├────────────────────────────────────────────────────────┤
│ Row 2: [doc_id][text][metadata][VECTOR_DATA_1048_BYTES]│
├────────────────────────────────────────────────────────┤
│ Row 3: [doc_id][text][metadata][VECTOR_DATA_1048_BYTES]│
├────────────────────────────────────────────────────────┤
│ Row 4: ... │
└────────────────────────────────────────────────────────┘
Rows per page: ~4-6 (depends on other columns)

Out-of-line Storage (VECTOR(1536)):

Main Table Page (8KB):
┌────────────────────────────────────────────────────────┐
│ Page Header (24 bytes) │
├────────────────────────────────────────────────────────┤
│ Row 1: [doc_id][text][metadata][18-byte TOAST pointer]│
├────────────────────────────────────────────────────────┤
│ Row 2: [doc_id][text][metadata][18-byte TOAST pointer]│
├────────────────────────────────────────────────────────┤
│ ... │
│ Row 50+ │
└────────────────────────────────────────────────────────┘
TOAST Table Page (8KB):
┌────────────────────────────────────────────────────────┐
│ TOAST Chunk 1: [chunk_id][chunk_seq][6144 bytes data] │
├────────────────────────────────────────────────────────┤
│ TOAST Chunk 2: [for same vector if >8KB] │
└────────────────────────────────────────────────────────┘
Rows per main page: 50+ (much higher density)

2.2 Access Pattern Performance

Scenario 1: Pure Vector Similarity Search

Workload:

-- Find 10 nearest neighbors to query vector
SELECT doc_id, embedding <-> :query_vector AS distance
FROM documents
ORDER BY embedding <-> :query_vector
LIMIT 10;

HNSW Index Scan Process:

In-line Storage (VECTOR(256) - 1,048 bytes):

HNSW traversal evaluates ~500 candidates for top-10 results
For each candidate:
1. Index provides doc_id
2. Fetch main table page (1 I/O per ~5 rows)
3. Vector data is in same page
4. Compute distance
Total I/O: ~100 pages (500 rows ÷ 5 rows/page)
Latency: 100 × 0.1ms (cache miss) = 10ms

Out-of-line Storage (VECTOR(1536) - 6,168 bytes):

HNSW traversal evaluates ~500 candidates
For each candidate:
1. Index provides doc_id
2. Fetch main table page (1 I/O per ~50 rows)
3. Read TOAST pointer (18 bytes)
4. Fetch TOAST table page (1 I/O per vector)
5. Vector data retrieved
6. Compute distance
Total I/O: 10 main pages + 500 TOAST pages = 510 pages
Latency: 510 × 0.1ms = 51ms
**5x slower due to double I/O overhead**
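The I/O arithmetic in the two scenarios above can be reproduced with a small cost model (assumptions from the text: 500 candidates, 0.1 ms per page miss, cold cache):

```python
# Back-of-envelope cost model for the HNSW candidate-fetch phase
# (assumptions from the text: 500 candidates, 0.1 ms per page miss).
CANDIDATES = 500
MS_PER_PAGE = 0.1

def scan_cost(rows_per_main_page: int, toast_pages_per_vector: int):
    """Return (pages touched, estimated latency in ms)."""
    main_pages = CANDIDATES // rows_per_main_page      # candidates share main pages
    toast_pages = CANDIDATES * toast_pages_per_vector  # one TOAST fetch per vector
    total = main_pages + toast_pages
    return total, total * MS_PER_PAGE

print(scan_cost(5, 0))   # in-line VECTOR(256): 100 pages, ~10 ms
print(scan_cost(50, 1))  # out-of-line VECTOR(1536): 510 pages, ~51 ms
```

The out-of-line penalty comes almost entirely from the per-vector TOAST page fetch; the denser main table barely helps.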

Performance Summary:

| Vector Dims | Storage     | I/O per Vector | Candidates Eval | Total I/O  | Latency | Relative |
|-------------|-------------|----------------|-----------------|------------|---------|----------|
| 256         | In-line     | 0.2 pages      | 500             | 100 pages  | 10ms    | 1.0x     |
| 768         | Out-of-line | 2.0 pages      | 500             | 1000 pages | 100ms   | 10x      |
| 1536        | Out-of-line | 2.0 pages      | 500             | 1000 pages | 100ms   | 10x      |
| 3072        | Out-of-line | 2.0 pages      | 500             | 1000 pages | 100ms   | 10x      |

Critical Finding: Out-of-line storage creates 5-10x performance penalty for vector search workloads, regardless of dimensionality beyond the threshold.

Scenario 2: Hybrid Query (Vector + Scalar Filter)

Workload:

SELECT doc_id, title, embedding <-> :query_vector AS distance
FROM documents
WHERE category = 'technology'
AND created_date > '2025-01-01'
ORDER BY embedding <-> :query_vector
LIMIT 10;

In-line Storage:

Execution plan:
1. Use B-tree indexes to build bitmap of matching rows: 10,000 rows
2. HNSW filtered search evaluates ~1,000 candidates (higher due to filtering)
3. Each candidate: 1 I/O per ~5 rows = 200 pages
4. Return top-10
Total I/O: 200 pages
Latency: 20ms

Out-of-line Storage:

Execution plan:
1. Bitmap index scan: 10,000 rows
2. HNSW filtered search evaluates ~1,000 candidates
3. Each candidate: 1 I/O (main) + 1 I/O (TOAST) = 2 I/O
4. Total I/O: 20 main pages + 1,000 TOAST pages = 1,020 pages
Total I/O: 1,020 pages
Latency: 102ms
**5x slower**

Observation: Hybrid queries suffer even more in absolute terms: the scalar filter columns are fetched from the main table anyway, but each surviving candidate's vector still requires a separate TOAST access.

Scenario 3: Bulk Vector Export

Workload:

SELECT doc_id, embedding
FROM documents
WHERE created_date = '2025-09-15'
LIMIT 100000;

In-line Storage:

Sequential scan optimization:
- Read main table pages sequentially
- All data co-located
- I/O: 100,000 rows ÷ 5 rows/page = 20,000 pages
- Latency: 2 seconds (sequential I/O @ 100MB/sec)

Out-of-line Storage:

Sequential scan with TOAST:
- Read main table: 100,000 rows ÷ 50 rows/page = 2,000 pages (fast)
- Read TOAST table: 100,000 random accesses (slow)
- Even sequential main scan results in random TOAST access pattern
- TOAST chunks not guaranteed to be sequential
- I/O: 2,000 + 100,000 = 102,000 pages
- Latency: 10.2 seconds (mostly random I/O)
**5x slower, worse with random TOAST layout**

2.3 Write Performance Impact

Insert Performance:

| Vector Dims | Storage     | Write Latency | Explanation                              |
|-------------|-------------|---------------|------------------------------------------|
| 256         | In-line     | 0.8ms         | Single LSM write (commit log + memtable) |
| 768         | Out-of-line | 1.2ms         | Main table write + TOAST table write     |
| 1536        | Out-of-line | 1.3ms         | Same as 768 (TOAST overhead fixed)       |

Analysis:

  • Out-of-line adds ~50% write latency
  • Requires 2 separate writes (main + TOAST)
  • Marginal impact compared to read penalty

Update Performance:

UPDATE documents SET embedding = :new_vector WHERE doc_id = 123;

In-line:

1. Locate row in LSM (0.2ms)
2. Write tombstone + new version (0.6ms)
3. Total: 0.8ms

Out-of-line:

1. Locate row in LSM (0.2ms)
2. Write new TOAST chunk (0.4ms)
3. Write new main row with updated TOAST pointer (0.6ms)
4. Old TOAST chunk becomes garbage (cleaned up later)
Total: 1.2ms
TOAST garbage accumulation: ~50% space overhead until VACUUM

Implication: Out-of-line updates create more garbage and therefore require more frequent VACUUM operations.

3. Storage Strategy Recommendations

3.1 Force In-line Storage (PLAIN)

Configuration:

CREATE TABLE documents (
    doc_id BIGINT PRIMARY KEY,
    title VARCHAR(255),
    content TEXT,
    embedding VECTOR(1536)
);

-- Force in-line storage
ALTER TABLE documents ALTER COLUMN embedding SET STORAGE PLAIN;

Trade-offs:

Advantages:

  • 5-10x faster vector similarity search
  • No double I/O overhead
  • Better cache locality
  • Simpler access pattern

Disadvantages:

  • Lower row density (fewer rows per page)
  • May fail if row exceeds page size (8KB limit)
  • Higher memory usage for cache

Dimensionality Limits:

Maximum safe dimensions for PLAIN storage:
Page size: 8KB = 8,192 bytes
Page overhead: ~200 bytes
Space for other columns: ~500 bytes (estimate)
Available for vector: 7,492 bytes
Max dimensions (float32): 7,492 ÷ 4 ≈ 1,873 dimensions
Practical limits considering other columns:
- With minimal other columns: 1,536 dimensions ✓ (6,168 bytes)
- With moderate metadata: 1,024 dimensions ✓ (4,120 bytes)
- With TEXT columns: 768 dimensions ✓ (3,096 bytes)
Rule of thumb: PLAIN storage safe up to 1,536 dimensions for typical schemas
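The limit calculation above can be wrapped in a small helper (a sketch using this section's assumptions: 8KB page, ~200-byte page overhead, float32 elements; unlike the text's 1,873 figure it also reserves the 24-byte vector header, so it is slightly tighter):

```python
# Sizing helper for the PLAIN rule of thumb, using this section's assumed
# constants (8 KB page, ~200 B page overhead, 24 B vector header).
PAGE_SIZE = 8192
PAGE_OVERHEAD = 200
VECTOR_HEADER = 24

def max_plain_dims(other_columns_bytes: int) -> int:
    """Largest float32 vector that still fits in-line with PLAIN storage."""
    available = PAGE_SIZE - PAGE_OVERHEAD - other_columns_bytes - VECTOR_HEADER
    return available // 4

print(max_plain_dims(500))   # ~500 B of other columns -> 1867 dims
print(max_plain_dims(2000))  # heavier metadata shrinks the vector budget
```

Plugging in heavier per-row metadata quickly pushes the safe limit below common embedding sizes, which is why the rule of thumb stops at 1,536 dimensions.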

3.2 Force Out-of-line Storage (EXTERNAL)

Configuration:

CREATE TABLE images (
    image_id BIGINT PRIMARY KEY,
    filename VARCHAR(500),
    upload_date TIMESTAMP,
    tags VARCHAR[],
    embedding VECTOR(4096) -- 16KB, too large for PLAIN
);

-- Force out-of-line storage
ALTER TABLE images ALTER COLUMN embedding SET STORAGE EXTERNAL;

Use Case:

  • Vectors too large for in-line (>1,536 dims)
  • Frequent scalar-only queries (SELECT filename, tags WHERE…)
  • Occasional vector search

Advantages:

  • High row density for scalar queries
  • No page size limit errors
  • Better for mixed workloads

Disadvantages:

  • 5-10x slower vector search
  • Double I/O overhead

3.3 Adaptive Storage (MAIN)

Configuration:

-- Use default MAIN storage (HeliosDB's default; note that stock PostgreSQL
-- defaults to EXTENDED for most varlena types)
-- Automatically chooses in-line or out-of-line based on row size
CREATE TABLE documents (
    doc_id BIGINT PRIMARY KEY,
    content TEXT, -- Variable size
    embedding VECTOR(1536)
);
-- No ALTER STORAGE needed, uses MAIN by default

Behavior:

  • Vector stored in-line if total row size < 2KB
  • Automatically TOASTed if row exceeds threshold
  • Adapts to data characteristics

Advantages:

  • Flexible, adapts to actual data
  • No tuning required
  • Works for mixed row sizes

Disadvantages:

  • Unpredictable performance (some rows fast, some slow)
  • Difficult to optimize

Recommendation: Use MAIN for prototyping, switch to PLAIN or EXTERNAL once workload is understood.

4. Quantitative Performance Matrix

4.1 Vector Search Performance by Configuration

| Dimensions | Storage Strategy       | Index Scan Time | Top-10 Latency | Throughput (QPS) |
|------------|------------------------|-----------------|----------------|------------------|
| 256        | PLAIN (in-line)        | 8ms             | 10ms           | 100              |
| 256        | MAIN (in-line)         | 8ms             | 10ms           | 100              |
| 256        | EXTERNAL (out-of-line) | 40ms            | 50ms           | 20               |
| 768        | PLAIN (in-line)        | 15ms            | 18ms           | 55               |
| 768        | MAIN (out-of-line)     | 75ms            | 90ms           | 11               |
| 768        | EXTERNAL (out-of-line) | 75ms            | 90ms           | 11               |
| 1536       | PLAIN (in-line)        | 25ms            | 30ms           | 33               |
| 1536       | MAIN (out-of-line)     | 125ms           | 150ms          | 6.6              |
| 1536       | EXTERNAL (out-of-line) | 125ms           | 150ms          | 6.6              |
| 3072       | MAIN (out-of-line)     | 125ms           | 150ms          | 6.6              |
| 4096       | EXTERNAL (out-of-line) | 125ms           | 150ms          | 6.6              |

Key Insights:

  1. In-line (PLAIN) provides 5x better throughput for vector search
  2. Dimensionality matters less than storage strategy for out-of-line
  3. 1536 dims is sweet spot for PLAIN storage with typical schemas

4.2 Memory and Storage Trade-offs

| Dimensions | Storage     | Vector Size    | Rows/Page (est) | Cache Efficiency                  |
|------------|-------------|----------------|-----------------|-----------------------------------|
| 256        | In-line     | 1,048 bytes    | 5               | High (vector with row)            |
| 768        | In-line     | 3,096 bytes    | 2               | Medium (large rows)               |
| 1536       | In-line     | 6,168 bytes    | 1               | Low (1 row/page)                  |
| 1536       | Out-of-line | 18 bytes (ptr) | 60              | High for scalars, low for vectors |

Cache Amplification Analysis:

For 1 GB cache:

In-line VECTOR(1536):

Row size: 6,500 bytes (including metadata)
Rows cached: 1 GB ÷ 6,500 = 153,846 rows
Vector coverage: 153,846 vectors ✓

Out-of-line VECTOR(1536):

Main table cache (500 MB): 500 MB ÷ 300 bytes/row = 1,666,666 rows
TOAST table cache (500 MB): 500 MB ÷ 6,168 bytes = 81,063 vectors
Effective vector coverage: min(1.6M, 81K) = 81,063 vectors ✗
**~2x worse cache efficiency** for vector workload

Conclusion: Out-of-line storage wastes cache on main table rows without vectors.
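The cache-coverage arithmetic above, as a sketch (decimal GB/MB as in the text; the 300-byte main-row and 6,500-byte in-line-row sizes are this section's estimates):

```python
# Cache-coverage sketch for the 1 GB example (decimal units, as in the text).
CACHE = 1_000_000_000  # 1 GB

inline_row = 6_500                    # 1536-dim vector stored with its row
inline_vectors = CACHE // inline_row  # whole rows (vector included) cached

main_row, toast_chunk = 300, 6_168    # out-of-line: pointer row + TOAST data
half = CACHE // 2                     # 500 MB each for main and TOAST pages
outofline_vectors = min(half // main_row, half // toast_chunk)

print(inline_vectors, outofline_vectors)
print(round(inline_vectors / outofline_vectors, 1), "x better in-line coverage")
```

The `min()` is the key step: out-of-line coverage is bottlenecked by the TOAST half of the cache, while the main-table half holds far more rows than there are cached vectors.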

5. Workload-Specific Recommendations

5.1 Vector Search Dominant (>80% vector queries)

Profile:

  • Semantic search applications
  • Recommendation engines
  • Image similarity search

Optimal Configuration:

CREATE TABLE embeddings (
    id BIGINT PRIMARY KEY,
    metadata JSONB, -- Keep minimal, <500 bytes
    vector VECTOR(1536)
) WITH (fillfactor = 90);

ALTER TABLE embeddings ALTER COLUMN vector SET STORAGE PLAIN;

CREATE INDEX ON embeddings USING hnsw (vector) WITH (m = 16, ef_construction = 64);

Expected Performance:

  • Top-10 search: 20-40ms
  • Throughput: 25-50 QPS per node
  • Cache hit rate: 70-80%

5.2 Hybrid Workload (50% scalar, 50% vector)

Profile:

  • Content management with search
  • E-commerce with visual search
  • Document retrieval systems

Optimal Configuration:

CREATE TABLE products (
    product_id BIGINT PRIMARY KEY,
    name VARCHAR(500),
    description TEXT,
    category VARCHAR(100),
    price DECIMAL(10,2),
    image_embedding VECTOR(768) -- Smaller dims to fit in-line
);

-- Force in-line if possible
ALTER TABLE products ALTER COLUMN image_embedding SET STORAGE PLAIN;

-- Indexes for both access patterns
CREATE INDEX ON products (category, price);
CREATE INDEX ON products USING hnsw (image_embedding);

Trade-off:

  • Use 768 dims (3KB) instead of 1536 dims (6KB) to ensure in-line storage
  • Quality loss: ~5-10% in embedding accuracy
  • Performance gain: 5-10x in query speed
  • Recommended: Compress embeddings with PCA/UMAP to fit

5.3 Large Vectors (>2048 dims)

Profile:

  • Multimodal embeddings
  • High-precision applications
  • Research/scientific computing

Optimal Configuration:

CREATE TABLE research_data (
    id BIGINT PRIMARY KEY,
    experiment_id VARCHAR(100),
    timestamp TIMESTAMP,
    embedding VECTOR(4096) -- 16KB, must be out-of-line
);

-- Explicit out-of-line
ALTER TABLE research_data ALTER COLUMN embedding SET STORAGE EXTERNAL;

-- Separate table for frequent scalar queries
CREATE TABLE research_metadata (
    id BIGINT PRIMARY KEY REFERENCES research_data(id),
    experiment_id VARCHAR(100),
    timestamp TIMESTAMP,
    tags VARCHAR[]
);

CREATE INDEX ON research_metadata (experiment_id, timestamp);

Strategy:

  • Accept out-of-line performance penalty
  • Normalize schema to separate hot metadata
  • Pre-filter with metadata before vector search
  • Use batch queries to amortize I/O cost

6. Advanced Optimization Techniques

6.1 Vector Quantization for Dimensionality Reduction

Product Quantization (PQ):

Original: VECTOR(1536) = 6,168 bytes
PQ-compressed: 1536 dims ÷ 8 dims/subvector × 1 byte/code = 192 bytes
Storage savings: 32x
In-line storage: ✓ Guaranteed
Performance: 95-98% recall with 10-20x speedup
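To make the PQ arithmetic concrete, here is a minimal encoder sketch with a toy random codebook (a trained codebook and HeliosDB's actual implementation would differ; only the sizes carry over):

```python
# Minimal product-quantization sketch: split a 1536-dim vector into 192
# subvectors of 8 dims and store one 1-byte nearest-centroid code each.
# The codebook here is random (toy); real PQ trains it with k-means.
import random

DIMS, SUBDIM = 1536, 8
M = DIMS // SUBDIM  # 192 codes -> 192 bytes per vector

random.seed(0)
codebook = [[random.uniform(-1, 1) for _ in range(SUBDIM)] for _ in range(256)]

def pq_encode(vec):
    """Replace each 8-dim subvector with the index of its nearest centroid."""
    codes = []
    for i in range(M):
        sub = vec[i * SUBDIM:(i + 1) * SUBDIM]
        best = min(range(256),
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(sub, codebook[c])))
        codes.append(best)
    return bytes(codes)

vec = [random.uniform(-1, 1) for _ in range(DIMS)]
codes = pq_encode(vec)
print(len(codes), "bytes vs", DIMS * 4, "raw ->", DIMS * 4 // len(codes), "x smaller")
```

The 192-byte code easily fits in-line regardless of threshold, which is what enables the two-stage search pattern below.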

Recommendation for HeliosDB:

-- Future feature: Quantized vectors
CREATE TABLE documents (
doc_id BIGINT PRIMARY KEY,
title VARCHAR(255),
embedding_full VECTOR(1536) STORAGE EXTERNAL, -- Exact, slow
embedding_pq VECTOR_PQ(1536, 192) STORAGE PLAIN -- Approximate, fast
);
-- Two-stage search:
-- 1. Fast PQ search for top-100 candidates (in-line, fast)
-- 2. Rerank with full vectors (out-of-line, but only 100 accesses)

6.2 Adaptive TOAST Threshold

Problem: Fixed 2KB threshold doesn’t optimize for vector workloads

Proposed Solution:

[storage.toast]
# Per-table TOAST threshold
threshold_bytes = 8192 # Increase to 8KB (same as page size)
# Allow larger vectors to stay in-line
# VECTOR(1536) = 6,168 bytes < 8,192 → in-line ✓

Impact:

With 8KB threshold:
- VECTOR(1536): In-line ✓ (was out-of-line)
- VECTOR(1920): In-line ✓ (new)
- VECTOR(2048): Out-of-line (too large)
Performance gain: 5-10x for 1536-dim vectors
Row density cost: Acceptable (1-2 rows per page)
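The placement flip can be expressed with the size formula from section 1.2 (size = dims × 4 + 24):

```python
# Placement as a function of the TOAST threshold, using the size formula
# from section 1.2 (size = dims * 4 + 24, float32 elements assumed).
def placement(dims: int, threshold: int) -> str:
    return "in-line" if dims * 4 + 24 <= threshold else "out-of-line"

print(placement(1536, 2048))  # out-of-line under the 2 KB default
print(placement(1536, 8192))  # in-line under the relaxed 8 KB threshold
print(placement(2048, 8192))  # out-of-line (8,216 bytes exceeds the page)
```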

Trade-off Analysis:

| Threshold        | Max In-line Dims | Rows/Page | Vector Search Perf | Scalar Query Perf |
|------------------|------------------|-----------|--------------------|-------------------|
| 2 KB (default)   | 384              | 20-30     | Poor (out-of-line) | Excellent         |
| 4 KB             | 896              | 8-12      | Good               | Good              |
| 6 KB             | 1,408            | 3-5       | Excellent          | Fair              |
| 8 KB (page size) | 1,920            | 1-2      | Excellent          | Poor              |

Recommendation:

  • Vector-heavy workloads: 6-8 KB threshold
  • Balanced workloads: 4 KB threshold
  • Scalar-heavy workloads: 2 KB threshold (default)

7. Monitoring and Diagnostics

7.1 Key Metrics

TOAST Usage Statistics:

-- Check TOAST overhead per table
-- (pg_table_size = main + TOAST + maps; pg_relation_size = main fork only;
--  pg_total_relation_size also includes indexes)
SELECT
    schemaname,
    tablename,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS total_size,
    pg_size_pretty(pg_relation_size(schemaname||'.'||tablename)) AS main_size,
    pg_size_pretty(pg_table_size(schemaname||'.'||tablename) -
                   pg_relation_size(schemaname||'.'||tablename)) AS toast_size,
    ROUND(100.0 * (pg_table_size(schemaname||'.'||tablename) -
                   pg_relation_size(schemaname||'.'||tablename)) /
          NULLIF(pg_total_relation_size(schemaname||'.'||tablename), 0), 2) AS toast_pct
FROM pg_tables
WHERE tablename = 'documents';

Performance Monitoring:

Metrics to track:
- toast.fetch_count: Number of TOAST fetches per query
- toast.fetch_latency_ms: Average TOAST fetch time
- toast.hit_rate: TOAST cache hit ratio
- vector.inline_pct: Percentage of vectors stored in-line
Alerts:
if toast.fetch_count > 100 per query:
→ Consider PLAIN storage or dimensionality reduction
if toast.hit_rate < 50%:
→ Increase TOAST cache allocation
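The alert rules above as a tiny executable check (the metric names are this section's hypothetical ones, not an existing HeliosDB API):

```python
# The two alert rules from above, expressed as a simple evaluation function.
def toast_alerts(fetch_count_per_query: float, hit_rate: float) -> list:
    alerts = []
    if fetch_count_per_query > 100:
        alerts.append("consider PLAIN storage or dimensionality reduction")
    if hit_rate < 0.50:
        alerts.append("increase TOAST cache allocation")
    return alerts

print(toast_alerts(250, 0.40))  # both rules fire
print(toast_alerts(20, 0.90))   # healthy: no alerts
```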

7.2 Diagnostic Queries

Check storage strategy per column:

SELECT
    attname,
    attstorage,
    CASE attstorage
        WHEN 'p' THEN 'PLAIN (no TOAST)'
        WHEN 'm' THEN 'MAIN (prefer inline)'
        WHEN 'e' THEN 'EXTERNAL (always out-of-line)'
        WHEN 'x' THEN 'EXTENDED (compress + TOAST)'
    END AS storage_strategy
FROM pg_attribute
WHERE attrelid = 'documents'::regclass
  AND attname = 'embedding';

Identify TOAST bloat:

-- Approximate: report TOAST relation size per table; measuring true bloat
-- (dead space) requires pgstattuple on the TOAST relation.
SELECT
    n.nspname AS schemaname,
    c.relname AS tablename,
    pg_size_pretty(pg_total_relation_size(c.reltoastrelid)) AS toast_size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.reltoastrelid <> 0
  AND pg_total_relation_size(c.reltoastrelid) > 1073741824 -- >1GB
ORDER BY pg_total_relation_size(c.reltoastrelid) DESC;

8. Conclusion

Key Findings:

  1. Storage Strategy Impact:

    • In-line (PLAIN): 5-10x faster vector search, 1-2 rows/page
    • Out-of-line (EXTERNAL): Slower search, 50+ rows/page
    • Default (MAIN): Unpredictable, depends on row size
  2. Dimensionality Breakpoints:

    • <384 dims (≤1,560 bytes): Always in-line with 2KB threshold ✓
    • 384-1,536 dims: Depends on other columns; use PLAIN to force in-line
    • >1,536 dims: Out-of-line required, accept performance penalty

  3. Performance Impact:

    • Vector search: 5-10x penalty for out-of-line storage
    • Hybrid queries: Similar penalty due to TOAST indirection
    • Bulk operations: 5x penalty, worse with fragmentation
  4. Resource Trade-offs:

    • In-line: Lower row density, better vector performance
    • Out-of-line: Higher row density, better scalar-only performance
    • Cache: In-line 2x more efficient for vector workloads

Recommended Configurations:

[storage.vector]
# Performance-critical vector search
default_storage = "PLAIN"
max_inline_dimensions = 1536
toast_threshold_kb = 6 # Relaxed from 2KB
# Fallback for large vectors
large_vector_threshold_dims = 2048
large_vector_storage = "EXTERNAL"
# Monitoring
track_toast_usage = true
warn_on_excessive_toast_io = true
toast_io_threshold = 100 # Fetches per query

Decision Matrix:

| Workload                            | Dims  | Other Columns | Recommended Storage                | Expected Perf      |
|-------------------------------------|-------|---------------|------------------------------------|--------------------|
| Pure vector search                  | Any   | Minimal       | PLAIN                              | Excellent          |
| Hybrid (vector + scalar)            | <1536 | Moderate      | PLAIN                              | Good               |
| Hybrid (vector + scalar)            | >1536 | Moderate      | EXTERNAL                           | Fair               |
| Scalar-heavy with occasional vector | Any   | Large         | EXTERNAL                           | Good (for scalars) |
| Large multimodal                    | >2048 | Any           | EXTERNAL + separate metadata table | Fair               |

Next Steps:

  1. Implement automatic storage strategy selection based on dimensionality
  2. Add query planner hints for TOAST-aware optimization
  3. Develop vector quantization support for dimension reduction
  4. Build monitoring dashboard for TOAST performance metrics