Commit 254c584
fix(db): bound cohere's writer-lock window — split the maintenance transaction and chunk the decay pass
cohere previously ran promotion, a full-table decay UPDATE, link building,
and archive inside ONE BEGIN IMMEDIATE, so the writer-lock hold grew
linearly with store size (measured 0.34s @100k / 0.68s @200k, release build, on a
fast dev box; 4.45s @100k on the #400 deep-dive box) and crossed the
default 5000ms busy_timeout just past ~130k entities, so every concurrent
remember failed with SQLITE_BUSY during maintenance runs.
The decay pass needs no atomicity with promotion or link building, so
cohere now runs three bounded lock windows: promotion; the ×0.95 decay
groom chunked at COHERE_DECAY_CHUNK_ROWS (1000) rowids per drop-safe
IMMEDIATE transaction with a 2ms inter-chunk yield (SQLite's coarse busy
handler otherwise starves waiters across back-to-back chunks); and
link+archive (candidate_budget-bounded, links read-modify-write kept under
one IMMEDIATE tx so the #388 clobber analysis still holds). The new
non-atomicity boundaries are documented at the split site; every step is
idempotent and each transaction stays drop-safe.
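
The chunked decay groom described above can be sketched roughly as follows. This is a std-only illustration, not the code in this commit: a `Mutex<Vec<f64>>` stands in for the SQLite writer lock and the entity table, and `decay_in_chunks`, `DecayStats`, `FLOOR`, and `EPSILON` are hypothetical names and values chosen for the example. Only the chunk size (1000), the 2ms yield, and the ×0.95 factor come from the message.

```rust
use std::sync::Mutex;
use std::thread;
use std::time::Duration;

/// Chunk size, inter-chunk yield, and decay factor as named in the commit message.
const COHERE_DECAY_CHUNK_ROWS: usize = 1000;
const INTER_CHUNK_YIELD: Duration = Duration::from_millis(2);
const DECAY_FACTOR: f64 = 0.95;

/// Hypothetical floor and epsilon; the real values are not given in the message.
const FLOOR: f64 = 0.01;
const EPSILON: f64 = 1e-9;

/// Counters for what one decay pass did (hypothetical, for illustration).
#[derive(Debug, Default, PartialEq)]
struct DecayStats {
    chunks: usize,
    rows_written: usize,
}

/// Decay every row, taking the lock once per chunk instead of once per pass.
/// The Mutex stands in for the writer lock; each lock scope plays the role
/// of one short IMMEDIATE transaction.
fn decay_in_chunks(store: &Mutex<Vec<f64>>) -> DecayStats {
    let mut stats = DecayStats::default();
    let mut start = 0;
    loop {
        let more = {
            let mut rows = store.lock().unwrap();
            let end = (start + COHERE_DECAY_CHUNK_ROWS).min(rows.len());
            if start >= end {
                break; // nothing left; the guard is released on scope exit
            }
            for w in rows[start..end].iter_mut() {
                let decayed = (*w * DECAY_FACTOR).max(FLOOR);
                // Epsilon guard: rows already at the floor are not rewritten.
                if (*w - decayed).abs() > EPSILON {
                    *w = decayed;
                    stats.rows_written += 1;
                }
            }
            stats.chunks += 1;
            start = end;
            start < rows.len()
        }; // lock released here, before the yield
        if !more {
            break;
        }
        // Give waiting writers a chance between back-to-back chunks.
        thread::sleep(INTER_CHUNK_YIELD);
    }
    stats
}
```

With 2,500 rows, of which 500 already sit at the floor, this sketch takes the lock three times and rewrites 2,000 rows. The longest hold is bounded by the chunk size rather than the store size.
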
Longest single hold @200k: 0.68s -> 0.09s; write work per hold is now
bounded by the chunk size, and the residual linear component (the
promote/archive full-table read scans) is ~7.5x shallower. Preserves the
#399/#405 no-op write skip (floored rows are not rewritten — chunk UPDATEs
carry the same epsilon guard) and cohere's documented standalone ×0.95
decay semantics (deliberately NOT deduplicated into decay_tick's
Ebbinghaus recompute; only the chunking structure is aligned).
Regression tests: chunk-transaction count for a store above the chunk
size; no-op skip write-count for floored rows; a concurrent writer with a
~250ms fine-grained wait budget succeeding during cohere @150k (fails 3/3
on pre-fix main, passes 3/3 post-fix); plus an #[ignore] lock-window
measurement harness (writer-lock hold probe over a seeded store,
MIMIR_COHERE_MEASURE_ROWS). Also adds a store-size ops note for
MIMIR_BUSY_TIMEOUT_MS at the pool-config site.
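
As a rough illustration of the concurrent-writer regression test, the sketch below models a writer that polls for the lock within a fixed wait budget. It is a std-only stand-in, not the test in this commit: `write_with_budget` is a hypothetical helper, a `Mutex` plays the role of the SQLite writer lock, and a `false` return is the analogue of SQLITE_BUSY.

```rust
use std::sync::{Mutex, TryLockError};
use std::thread;
use std::time::{Duration, Instant};

/// Try to run `write` under `lock`, polling every `poll` until `budget`
/// has elapsed. Returns true if the write ran, false if the budget expired
/// (or the lock was poisoned) before the lock could be taken.
fn write_with_budget<T>(
    lock: &Mutex<T>,
    budget: Duration,
    poll: Duration,
    mut write: impl FnMut(&mut T),
) -> bool {
    let deadline = Instant::now() + budget;
    loop {
        match lock.try_lock() {
            Ok(mut guard) => {
                write(&mut *guard);
                return true;
            }
            // Lock is held by someone else: fall through and wait.
            Err(TryLockError::WouldBlock) => {}
            Err(TryLockError::Poisoned(_)) => return false,
        }
        if Instant::now() >= deadline {
            return false;
        }
        thread::sleep(poll);
    }
}
```

A writer with a 250ms budget gets through if the holder releases within that window, and gives up if a single hold outlasts it. That is the property the bounded lock windows are meant to preserve under load.
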
Closes #400
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
1 parent a56e9fd commit 254c584
2 files changed
Lines changed: 501 additions & 29 deletions