Commit 2a559bd
FTS recall cost was O(total entities), not O(hits): the plan drove from
idx_entities_recall and probed an up-to-100k-rowid IN-list materialized
from the FTS subquery, so a 20-hit rare-term query cost the same ~5-6ms
as a 33k-hit query at 100k entities.
Selective queries now materialize the FTS-matched rowids first; when the
full match set fits under FTS_DRIVEN_MAX_MATCHES (512), the same ranking
ORDER BY runs over just those rows via INTEGER PRIMARY KEY lookups (NOT
INDEXED pins the plan: it bars secondary indexes, but the rowid PK remains
usable under it).
Larger match sets keep the legacy rank-index plan, which is efficient
exactly when matches are dense. A query whose FTS terms match nothing
short-circuits without touching the entities table.
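The three arms described above (zero-match short-circuit, FTS-driven plan, legacy rank-index plan) can be sketched roughly as follows. This is a minimal illustration in Python's `sqlite3`, not the code from this commit: the `entities` / `entities_fts` schema, the `importance` ranking column, the index definition, and the `recall` and `demo_db` helpers are all assumptions made for the example. Only `FTS_DRIVEN_MAX_MATCHES`, `idx_entities_recall`, and the use of `NOT INDEXED` come from the message. It also assumes an SQLite build with FTS5 enabled.

```python
import sqlite3

FTS_DRIVEN_MAX_MATCHES = 512  # threshold named in the commit message


def demo_db():
    """Build a tiny in-memory corpus (hypothetical schema, for illustration)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE entities (id INTEGER PRIMARY KEY, name TEXT, importance REAL);
        -- Stand-in for the real ranking index; its actual columns are unknown.
        CREATE INDEX idx_entities_recall ON entities (importance DESC, id);
        CREATE VIRTUAL TABLE entities_fts USING fts5(name);
        """
    )
    rows = [
        (i, "zebra note" if i % 50 == 0 else "common note", float(i % 7))
        for i in range(1, 201)
    ]
    conn.executemany("INSERT INTO entities VALUES (?, ?, ?)", rows)
    conn.executemany(
        "INSERT INTO entities_fts (rowid, name) VALUES (?, ?)",
        [(r[0], r[1]) for r in rows],
    )
    return conn


def recall(conn, term, limit=5, offset=0, max_matches=FTS_DRIVEN_MAX_MATCHES):
    """Return (plan_name, rows), choosing the plan from the FTS match count."""
    # Probe one row past the threshold so an overflow is detectable.
    probe = conn.execute(
        "SELECT rowid FROM entities_fts WHERE entities_fts MATCH ? LIMIT ?",
        (term, max_matches + 1),
    ).fetchall()
    rowids = [r[0] for r in probe]

    if not rowids:
        # Zero matches: never touch the entities table.
        return "empty", []

    if len(rowids) <= max_matches:
        # Selective arm: rank only the matched rows, fetched by rowid.
        marks = ",".join("?" * len(rowids))
        sql = (
            "SELECT id, name FROM entities NOT INDEXED "
            f"WHERE id IN ({marks}) "
            "ORDER BY importance DESC, id ASC LIMIT ? OFFSET ?"
        )
        return "fts_driven", conn.execute(sql, (*rowids, limit, offset)).fetchall()

    # Dense arm: leave the planner free to drive from the ranking index.
    sql = (
        "SELECT id, name FROM entities "
        "WHERE id IN (SELECT rowid FROM entities_fts WHERE entities_fts MATCH ?) "
        "ORDER BY importance DESC, id ASC LIMIT ? OFFSET ?"
    )
    return "rank_index", conn.execute(sql, (term, limit, offset)).fetchall()
```

Lowering `max_matches` in a call forces the dense arm for the same term, which is one way to compare old-plan and new-plan results in a test.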
Result semantics are byte-identical (same filters, ranking order, LIMIT/
OFFSET) - equivalence-tested old-plan vs new-plan, including ranking-key
ties, filter parity, probe-overflow fallback, OFFSET paging and the
zero-match case. EXPLAIN QUERY PLAN test asserts the selective arm
SEARCHes entities by rowid and never scans the ranking index.
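A plan assertion of the kind described above might look like the sketch below. It is an illustration under assumptions, not the test from this commit: the table layout, the index definition, and the `selective_plan` helper are hypothetical. The substring checks are kept loose because the exact `EXPLAIN QUERY PLAN` wording varies between SQLite versions.

```python
import sqlite3


def selective_plan():
    """Return the EXPLAIN QUERY PLAN detail strings for a selective-arm query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE entities (id INTEGER PRIMARY KEY, name TEXT, importance REAL);
        -- Hypothetical definition; only the index name comes from the commit.
        CREATE INDEX idx_entities_recall ON entities (importance DESC, id);
        """
    )
    conn.executemany(
        "INSERT INTO entities VALUES (?, ?, ?)",
        [(i, f"entity {i}", float(i % 7)) for i in range(1, 101)],
    )
    sql = (
        "EXPLAIN QUERY PLAN "
        "SELECT id FROM entities NOT INDEXED WHERE id IN (?, ?, ?) "
        "ORDER BY importance DESC, id ASC LIMIT 5"
    )
    # Each plan row is (id, parent, notused, detail); keep the detail text.
    return [row[3] for row in conn.execute(sql, (3, 7, 11))]


details = selective_plan()
# The selective arm should look rows up by rowid...
assert any("USING INTEGER PRIMARY KEY" in d for d in details)
# ...and must not mention the ranking index at all.
assert not any("idx_entities_recall" in d for d in details)
```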
Measured at 100k entities (release build, p50 over 50 iterations): rare
term (20 hits) 5.1ms -> 0.08ms (~66x); 1-hit term 3.4ms -> 0.04ms.
Dense-match queries pay a small fixed probe cost (+~0.4ms for a common
~33k-hit term), which is intrinsic to FTS5 prefix-doclist materialization;
they were and remain O(corpus). Recall gate green locally: auto recall@5
0.958, MRR 0.910, fts5 0.208.
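For readers who want to reproduce a "p50 over 50 iterations" style figure, a timing helper could be as simple as the sketch below. The `p50_ms` name and its shape are assumptions for illustration; the benchmark harness actually used for the numbers above is not shown in this commit message.

```python
import statistics
import time


def p50_ms(fn, iters=50):
    """Run fn() iters times and return the median wall-clock time in ms."""
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)
```

The median is less sensitive than the mean to a few slow outlier iterations, which matters when the timings being compared are well under a millisecond.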
Closes #401
Co-authored-by: tcconnally <hermes@perseus.observer>
Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
1 parent d70a8c0 commit 2a559bd
2 files changed
Lines changed: 412 additions & 7 deletions