Commit a64ef49

tcconnally and claude committed
fix(db): bound cohere's writer-lock window — split the maintenance transaction and chunk the decay pass
cohere previously ran promotion, a full-table decay UPDATE, link building, and archive inside ONE BEGIN IMMEDIATE, so the writer-lock hold grew linearly with store size (measured 0.34s @100k / 0.68s @200k release on a fast dev box; 4.45s @100k on the #400 deep-dive box) and crossed the default 5000ms busy_timeout just past ~130k entities. Every concurrent remember failed with SQLITE_BUSY during maintenance runs.

The decay pass needs no atomicity with promotion or link building, so cohere now runs three bounded lock windows:

- promotion;
- the ×0.95 decay groom, chunked at COHERE_DECAY_CHUNK_ROWS (1000) rowids per drop-safe IMMEDIATE transaction with a 2ms inter-chunk yield (SQLite's coarse busy handler otherwise starves waiters across back-to-back chunks);
- link+archive (candidate_budget-bounded, with the links read-modify-write kept under one IMMEDIATE tx so the #388 clobber analysis still holds).

The new non-atomicity boundaries are documented at the split site; every step is idempotent and each transaction stays drop-safe.

Longest single hold @200k: 0.68s -> 0.09s. Write work per hold is now bounded by the chunk size, and the residual linear component (the promote/archive full-table read scans) is ~7.5x shallower.

Preserves the #399/#405 no-op write skip (the chunked updates carry the same epsilon guard) and cohere's documented standalone ×0.95 decay semantics (deliberately NOT deduplicated into decay_tick's Ebbinghaus recompute; only the chunking structure is aligned).

Regression tests:

- chunk-transaction count for a store above the chunk size;
- no-op skip write-count for floored rows;
- a concurrent writer with a ~250ms fine-grained wait budget succeeding during cohere @150k (fails 3/3 on pre-fix main, passes 3/3 post-fix);
- an #[ignore] lock-window measurement harness (writer-lock hold probe over a seeded store, MIMIR_COHERE_MEASURE_ROWS).

Also adds a store-size ops note for MIMIR_BUSY_TIMEOUT_MS at the pool-config site.

Closes #400

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
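
The chunked decay pass described in the message can be sketched as follows. This is a minimal Python illustration using the standard-library `sqlite3` module, not the project's code: the `entities` table, the `strength` column, and the `FLOOR` and `EPSILON` values are assumptions made up for the example, and rolling back on an exception stands in for the "drop-safe" transaction behaviour. Only the chunk size, the ×0.95 factor, and the 2ms yield come from the commit message.

```python
import sqlite3
import time

CHUNK_ROWS = 1000      # mirrors COHERE_DECAY_CHUNK_ROWS from the commit message
DECAY_FACTOR = 0.95    # the x0.95 groom factor from the commit message
YIELD_SECONDS = 0.002  # the 2ms inter-chunk yield from the commit message
FLOOR = 0.01           # hypothetical minimum strength (assumption)
EPSILON = 1e-9         # hypothetical no-op guard width (assumption)


def chunked_decay(conn, chunk_rows=CHUNK_ROWS):
    """Decay strengths in rowid-ordered chunks, one short IMMEDIATE tx each.

    Returns (transactions, rows_written). Rows already at the floor are
    filtered out by the WHERE clause, so they are never rewritten.
    """
    max_id = conn.execute("SELECT COALESCE(MAX(id), 0) FROM entities").fetchone()[0]
    transactions = 0
    rows_written = 0
    for start in range(1, max_id + 1, chunk_rows):
        conn.execute("BEGIN IMMEDIATE")  # take the writer lock for this chunk only
        try:
            cur = conn.execute(
                "UPDATE entities SET strength = MAX(strength * ?, ?) "
                "WHERE id >= ? AND id < ? AND strength > ? + ?",
                (DECAY_FACTOR, FLOOR, start, start + chunk_rows, FLOOR, EPSILON),
            )
            rows_written += cur.rowcount
            conn.execute("COMMIT")  # release the writer lock
        except BaseException:
            conn.execute("ROLLBACK")  # leave no half-applied chunk behind
            raise
        transactions += 1
        time.sleep(YIELD_SECONDS)  # give waiting writers a window to get in
    return transactions, rows_written


# Seed 2000 live rows followed by 500 rows that already sit at the floor.
conn = sqlite3.connect(":memory:", isolation_level=None)  # manual transaction control
conn.execute("CREATE TABLE entities (id INTEGER PRIMARY KEY, strength REAL NOT NULL)")
conn.executemany(
    "INSERT INTO entities (strength) VALUES (?)",
    [(1.0,)] * 2000 + [(FLOOR,)] * 500,
)

transactions, rows_written = chunked_decay(conn)
print(transactions, rows_written)
```

With 2500 rows and a chunk size of 1000, the pass commits three transactions and writes only the 2000 live rows. Each lock hold covers at most one chunk, so the write work per hold no longer depends on the size of the store.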
1 parent 6735bb2 commit a64ef49

2 files changed

Lines changed: 497 additions & 29 deletions

File tree

CHANGELOG.md

Lines changed: 17 additions & 0 deletions
@@ -48,6 +48,23 @@ All notable changes to Perseus Vault (formerly Mimir/Mneme) are documented here.
   truncation note when capped. Edges dangling outside the returned node set
   are dropped (previously the unscoped path emitted edges to archived/deleted
   targets that the renderer couldn't resolve).
+- `mimir_cohere` no longer holds one writer lock for the whole grooming pass
+  (#400). The single BEGIN IMMEDIATE previously spanned promotion, a
+  full-table decay UPDATE, link building, and archive — a lock window linear
+  in store size (~4.4s @100k entities) that crossed the default 5000ms
+  `busy_timeout` just past ~130k entities, so concurrent `remember`s failed
+  SQLITE_BUSY during every maintenance run. cohere now runs three bounded
+  windows: promotion, a decay pass chunked at 1000 rows per drop-safe
+  transaction (with a 2ms inter-chunk yield so waiting writers can actually
+  acquire the lock), and link+archive. Longest single hold measured @200k
+  entities (release, ~450B rows): 0.68s → 0.09s; the write work under any
+  one lock is now bounded by the chunk size, and the remaining linear
+  component (the promote/archive full-table read scans) is ~7.5x shallower
+  than before. Preserves the #399/#405 no-op write skip (floored
+  rows are not rewritten), cohere's documented ×0.95 standalone decay
+  semantics, and per-transaction drop-safety (#388); the run stays correct
+  under interleaved writers, and the new non-atomicity boundaries are
+  documented at the split site.

 ## [2.14.0] - 2026-07-02
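
The failure mode in this changelog entry, a writer giving up with SQLITE_BUSY because another connection holds the writer lock longer than the writer's wait budget, can be reproduced in miniature. The sketch below is an assumption-laden Python illustration using the standard-library `sqlite3` module: the `notes` table, the `try_write` helper, and the 50ms budget are hypothetical and unrelated to the project's pool configuration.

```python
import os
import sqlite3
import tempfile


def try_write(path, wait_budget_s):
    """Attempt one write with the given busy-wait budget; True on success."""
    # `timeout` is the connection's busy timeout: how long SQLite keeps
    # retrying a locked database before giving up.
    conn = sqlite3.connect(path, timeout=wait_budget_s, isolation_level=None)
    try:
        conn.execute("BEGIN IMMEDIATE")
        conn.execute("INSERT INTO notes (body) VALUES ('remembered')")
        conn.execute("COMMIT")
        return True
    except sqlite3.OperationalError:
        # Python reports SQLITE_BUSY as OperationalError ("database is locked").
        return False
    finally:
        conn.close()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "store.db")

    # Stand-in for the maintenance pass: it takes the writer lock and holds it.
    maintainer = sqlite3.connect(path, isolation_level=None)
    maintainer.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
    maintainer.execute("BEGIN IMMEDIATE")

    # The hold outlasts the 50ms budget, so this write fails.
    during_hold = try_write(path, 0.05)

    # Once the lock is released, the same write goes through.
    maintainer.execute("COMMIT")
    after_release = try_write(path, 0.05)
    maintainer.close()

print(during_hold, after_release)
```

A write succeeds only if the lock is released within the writer's budget. Bounding each hold by the chunk size, rather than by the store size, is what keeps that condition true as the store grows.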
