( id SERIAL PRIMARY KEY, content TEXT, embedding VECTOR(1536) ); -- Create HNSW index CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops); Shoaib Burq & M. Hassan | geobase.app | PostGIS Day 2025
Problem: May not return enough results SELECT * FROM documents WHERE category = 'electronics' -- Post-filter ORDER BY embedding <=> $1 LIMIT 10; Challenge: May return < 10 results if filter is selective Shoaib Burq & M. Hassan | geobase.app | PostGIS Day 2025
id SERIAL PRIMARY KEY, embedding VECTOR(1536), labels SMALLINT[] -- Category labels ); -- Index includes labels for pre-filtering CREATE INDEX ON documents USING diskann (embedding vector_cosine_ops, labels); Shoaib Burq & M. Hassan | geobase.app | PostGIS Day 2025
Key Issue Query 1 (Spatial only) GiST 2.7 ms 633 Baseline - works fine Query 2 (Vector only) HNSW 79.4 ms N/A Baseline - works fine Query 3 (Combined, High) GiST only 1,364 ms 633 ❌ Vector index NOT used Query 4 (Very High) GiST only 5.0 ms 169 Vector index NOT used Query 5 (Low) GiST only 10,850 ms 5,882 ❌❌ CRITICAL: 10.85 seconds! Query 6 (No indexes) Sequential 95.0 ms 633 Baseline - worst case Query 7 (Moderate) GiST only 57.9 ms 2,578 ❌ Vector index NOT used Query 8 (CTE, High) GiST only 21.3 ms 633 Same plan, cache effects Query 9 (CTE, Low) GiST only 117.1 ms 5,882 Same plan, cache effects Shoaib Burq & M. Hassan | geobase.app | PostGIS Day 2025
<=> (SELECT embedding FROM test_places LIMIT 1) AS distance, ST_AsText(geom) FROM test_places WHERE geom && ( SELECT query_geometry FROM benchmark_query_regions WHERE label_name = 'query_high_selectivity' ) ORDER BY embedding <=> (SELECT embedding FROM test_places LIMIT 1) LIMIT 10; Shoaib Burq & M. Hassan | geobase.app | PostGIS Day 2025
embedding <=> (SELECT embedding FROM test_places LIMIT 1) AS distance FROM test_places WHERE geom && ( SELECT query_geometry FROM benchmark_query_regions WHERE label_name = 'query_low_selectivity' ) ORDER BY embedding <=> (SELECT embedding FROM test_places LIMIT 1) LIMIT 10; Shoaib Burq & M. Hassan | geobase.app | PostGIS Day 2025
<=> (SELECT embedding FROM test_places LIMIT 1) AS distance, ST_AsText(geom) FROM test_places WHERE geom && ( SELECT query_geometry FROM benchmark_query_regions WHERE label_name = 'query_moderate_selectivity' ) ORDER BY embedding <=> (SELECT embedding FROM test_places LIMIT 1) LIMIT 10; Shoaib Burq & M. Hassan | geobase.app | PostGIS Day 2025
- Bitmap Heap Scan: ~632 ms for 633 rows → 1 ms per row - Buffers: shared hit=5,620 - Cold cache, random block access Query 7: - Bitmap Heap Scan: ~48 ms for 2,578 rows → 0.018 ms per row Shoaib Burq & M. Hassan | geobase.app | PostGIS Day 2025
works great for spatial queries • ✅ HNSW (vector) works great for vector similarity • ❌ But combined queries fail: Vector index ignored, 10-100x slower Key Issue: PostgreSQL planner can't efficiently combine separate indexes Shoaib Burq & M. Hassan | geobase.app | PostGIS Day 2025
CREATE INDEX idx_places_diskann_composite ON places USING diskann ( embedding vector_cosine_ops, geom vector_geometry_ops ); Shoaib Burq & M. Hassan | geobase.app | PostGIS Day 2025
BEFORE graph traversal - GeometryKey stored in each node enables filtering during search - Maintains graph structure optimization Shoaib Burq & M. Hassan | geobase.app | PostGIS Day 2025