Self-Maintaining Filter Index (SMFI) User Guide
Self-Maintaining Filter Index (SMFI) User Guide
Quick Start
SMFI is enabled by default and requires no configuration. Your queries automatically benefit from storage-level filtering.
-- These queries automatically use SMFISELECT * FROM orders WHERE customer_id = 123; -- Uses Bloom filterSELECT * FROM events WHERE timestamp > '2025-01-01'; -- Uses Zone mapSELECT * FROM products WHERE price BETWEEN 10 AND 50; -- Uses Zone mapWhat SMFI Does For You
1. Faster Queries (Automatic)
Every query with a WHERE clause is automatically optimized:
| Query Type | Optimization | Typical Speedup |
|---|---|---|
Equality (=) | Bloom filter | 10-20x |
Range (>, <, BETWEEN) | Zone map | 5-10x |
| Combined | Both | 20-50x |
2. Zero Configuration
Unlike traditional indexes:
- No
CREATE INDEXrequired - No
REINDEXmaintenance - No memory tuning needed
3. Zero Memory Overhead
All filter structures live on disk:
- Loaded only when needed
- Shared across all connections
- Automatically cleaned up
Verifying SMFI Is Working
Check System Status
-- Overall SMFI statusSELECT * FROM pg_smfi_status();
-- Per-table statisticsSELECT table_name, bloom_checks, bloom_hits, blocks_prunedFROM pg_smfi_table_stats();Use EXPLAIN
EXPLAIN SELECT * FROM orders WHERE customer_id = 123;Look for these indicators:
→ Bloom Filter Check on orders.customer_id→ Zone Map Prune: 847/1000 blocks skipped→ Parallel SIMD Filter: 8 workersCheck Query Patterns
-- See what patterns SMFI has learnedSELECT table_name, column_name, query_frequency, selectivityFROM pg_speculative_filters()ORDER BY query_frequency DESC;Common Settings
Adjusting CPU Usage
By default, background maintenance uses max 15% CPU:
-- Allow more CPU for faster consolidationSET smfi_max_cpu_percent = 25;
-- Reduce impact on production workloadSET smfi_max_cpu_percent = 10;Controlling Parallel Execution
-- Disable parallel filtering (single-threaded)SET smfi_parallel_enabled = off;
-- Set maximum workersSET smfi_max_workers = 4;
-- Minimum rows before parallelizingSET smfi_parallel_threshold = 50000;Tuning Speculative Filters
-- Create filters faster (lower threshold)SET smfi_min_query_frequency = 5;
-- Keep unused filters longerSET smfi_drop_after_days = 14;
-- Allow more auto-created filters per tableSET smfi_max_filters_per_table = 20;Bulk Load Best Practices
Automatic Optimization (v3.4+)
SMFI automatically detects bulk operations and suspends tracking to avoid overhead:
-- Bulk INSERT (100+ rows) - SMFI auto-suspendsINSERT INTO large_table VALUES (1, 'data1'), (2, 'data2'), -- ... 100+ rows ... (1000, 'data1000');-- SMFI automatically resumes and schedules rebuild
-- INSERT ... SELECT - SMFI auto-suspends (when implemented)INSERT INTO large_table SELECT * FROM staging;
-- COPY FROM - SMFI auto-suspends (when implemented)COPY large_table FROM '/data/import.csv';The system:
- Detects bulk operations (≥10,000 rows)
- Suspends per-row tracking for the affected table
- Counts rows affected during suspension
- Automatically resumes tracking when operation completes
- Schedules filter rebuild if rows were modified
Manual Control (Optional)
For very large imports, you can still manually control tracking:
-- 1. Disable SMFI tracking globallySET smfi_tracking_enabled = off;
-- 2. Perform bulk loadCOPY large_table FROM '/data/import.csv';
-- 3. Re-enable trackingSET smfi_tracking_enabled = on;
-- 4. Rebuild filters for the tableCALL smfi_rebuild_table('large_table');Bulk Load Threshold
The auto-suspension threshold is 10,000 rows by default. This is hot-reloadable - no restart required:
-- Check current thresholdSHOW smfi_bulk_load_threshold;
-- Increase threshold (larger batches before auto-suspend)SET smfi_bulk_load_threshold = 50000;
-- Decrease threshold (more aggressive suspension)SET smfi_bulk_load_threshold = 1000;
-- Reset to defaultRESET smfi_bulk_load_threshold;Operations below this threshold use normal per-row tracking (~150ns overhead per row).
Performance Monitoring
Key Metrics
-- Bloom filter effectivenessSELECT table_name, bloom_checks, bloom_hits, ROUND(100.0 * bloom_hits / NULLIF(bloom_checks, 0), 2) AS hit_rate_pctFROM pg_smfi_table_stats()WHERE bloom_checks > 0;-- Zone map effectivenessSELECT table_name, total_blocks, blocks_pruned, ROUND(100.0 * blocks_pruned / NULLIF(total_blocks, 0), 2) AS prune_rate_pctFROM pg_smfi_table_stats()WHERE total_blocks > 0;Identifying Optimization Opportunities
-- Tables that might benefit from speculative filtersSELECT table_name, column_name, pattern_type, query_frequency, selectivityFROM pg_speculative_filters()WHERE selectivity < 0.01 -- Less than 1% selectivityORDER BY query_frequency DESCLIMIT 10;When to Manually Rebuild
SMFI self-maintains, but manual rebuild helps after:
- Large bulk imports
- Significant data changes
- Table truncation
-- Rebuild single tableCALL smfi_rebuild_table('orders');
-- Rebuild all tables (use sparingly)CALL smfi_rebuild_all();Troubleshooting
”Zone Map Prune: 0 blocks skipped”
Cause: Data not ordered by filter column.
Solution: Zone maps work best with ordered data (timestamps, sequences):
-- For time-series data, insert in orderINSERT INTO events (timestamp, data)VALUES (NOW(), '...'); -- Timestamps naturally ordered
-- For random data, zone maps less effective (this is expected)“Bloom Filter Check: not used”
Cause: Column not tracked or low cardinality.
Check:
SELECT * FROM pg_smfi_table_stats() WHERE table_name = 'your_table';Solution:
-- Force rebuildCALL smfi_rebuild_table('your_table');High CPU from Consolidation
Cause: Too many pending deltas, aggressive consolidation.
Solution:
-- Reduce consolidation frequencySET smfi_max_cpu_percent = 10;SET smfi_delta_threshold = 5000;SET smfi_time_threshold = 600;Queries Still Slow
Check if SMFI is being used:
EXPLAIN SELECT * FROM slow_table WHERE column = 'value';If not using SMFI:
- Verify SMFI is enabled:
SHOW smfi_enabled; - Check table stats:
SELECT * FROM pg_smfi_table_stats(); - Rebuild if needed:
CALL smfi_rebuild_table('slow_table');
Settings Reference
| Setting | Default | Description |
|---|---|---|
smfi_enabled | on | Enable/disable SMFI system |
smfi_tracking_enabled | on | Track DML for filter updates |
smfi_parallel_enabled | on | Enable parallel filtering |
smfi_max_cpu_percent | 15 | Max CPU for consolidation |
smfi_delta_threshold | 1000 | Deltas before consolidation |
smfi_time_threshold | 300 | Seconds before consolidation |
smfi_min_query_frequency | 10 | Min queries for auto-filter |
smfi_drop_after_days | 7 | Days before dropping unused filter |
smfi_max_filters_per_table | 10 | Max auto-filters per table |
smfi_parallel_threshold | 10000 | Min rows for parallel |
smfi_max_workers | CPU cores | Max parallel workers |
smfi_bloom_fpr | 0.01 | Bloom false positive rate |
See Also
- SMFI Concept Guide - Architecture details
- SQL Settings Reference - All settings
- SMFI Scalability Report - Benchmarks