Commit 6a940e4

tcconnally and claude authored
perf(write): defer auto-embed to a background worker — writes no longer pay ~6.7ms of inline ONNX (closes #393) (#415)
Every content-changing write ran the auto-embed synchronously post-commit (62x latency, ~145 writes/s single-writer ceiling). Deferral is within the path's own contract: the embed already ran after tx.commit() and failures were explicitly non-fatal.

- remember() now enqueues (id, plaintext) to a bounded queue (1024, drop-new on overflow + rate-limited warning) served by a dedicated worker thread that drains up to 32 jobs per wake; embedding happens with NO pooled connection held, the store draws one briefly (#397 discipline).
- Stale guard: the store is a single atomic conditional UPDATE that only lands while the entity's current FTS plaintext still equals the enqueued text (the #377/#379/#381 writer-family rule: no read-decide-write on stale data; FTS holds plaintext, entities.body_json may be per-write GCM ciphertext). A changed or forgotten row refuses the write; the newer body enqueued its own job.
- Write path no longer consults the #219 session cache (new/changed bodies are unique by definition: up to 256 full-body compares per write for a guaranteed miss, and each insert evicted query entries that COULD hit). Sync callers (query embedding, explicit mimir_embed) keep it.
- Misconfigured-backend logging (enabled, model missing, no endpoint) is rate-limited to 1/min per site instead of one eprintln per write.
- Drop for Database: signal, disconnect, wait <=5s for the worker (in-flight embed finishes, queued jobs dropped), detach if wedged in a slow remote call. mimir_embed stays synchronous.
- Determinism seam: doc-hidden embed_queue_flush(timeout) blocks until the queue drains; embedding-dependent tests flush instead of sleeping.
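
The enqueue and drain behaviour described above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the commit's code: `EmbedJob`, `enqueue`, `next_batch`, and `spawn_worker` are hypothetical names, and the embed-and-store step is passed in as a closure. Only the two constants (1024 and 32) come from the commit message.

```rust
use std::sync::mpsc::{Receiver, SyncSender, TrySendError};
use std::thread::{self, JoinHandle};

/// Hypothetical job shape; the commit only says "(id, plaintext)".
pub struct EmbedJob {
    pub id: u64,
    pub plaintext: String,
}

pub const QUEUE_CAP: usize = 1024; // queue bound named in the commit
pub const MAX_BATCH: usize = 32; // jobs drained per wake

/// Non-blocking enqueue. Returns false (drop-new) when the queue is full
/// or the worker is gone, so the write path never waits on the embedder.
pub fn enqueue(tx: &SyncSender<EmbedJob>, job: EmbedJob) -> bool {
    match tx.try_send(job) {
        Ok(()) => true,
        Err(TrySendError::Full(_)) | Err(TrySendError::Disconnected(_)) => false,
    }
}

/// Block for one job, then take up to MAX_BATCH - 1 more without blocking.
/// Returns None once every sender is dropped and the queue is empty.
pub fn next_batch(rx: &Receiver<EmbedJob>) -> Option<Vec<EmbedJob>> {
    let first = rx.recv().ok()?;
    let mut batch = vec![first];
    while batch.len() < MAX_BATCH {
        match rx.try_recv() {
            Ok(job) => batch.push(job),
            Err(_) => break, // empty or disconnected: process what we have
        }
    }
    Some(batch)
}

/// Dedicated worker thread: wakes on the first job, processes the batch,
/// and exits when all senders have been dropped.
pub fn spawn_worker<F>(rx: Receiver<EmbedJob>, mut embed_and_store: F) -> JoinHandle<()>
where
    F: FnMut(EmbedJob) + Send + 'static,
{
    thread::spawn(move || {
        while let Some(batch) = next_batch(&rx) {
            for job in batch {
                embed_and_store(job);
            }
        }
    })
}
```

A caller would create the channel with `std::sync::mpsc::sync_channel(QUEUE_CAP)` and log a rate-limited warning whenever `enqueue` returns false. One deliberate difference from the commit: this sketch processes whatever is still queued after the senders disconnect, whereas the commit drops queued jobs on shutdown.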

Tests: regression (write returns <400ms against a 500ms fake embed backend + embedding absent at return, lands after flush; fails pre-fix at 502.7ms); stale guard unit (changed/forgotten body refused, current body lands) + e2e (change body mid-flight, surviving vector is the newest body's); overflow (cap=1, drop-new, flush still terminates). Fake Ollama-format HTTP server exercises the endpoint path in both default and lite builds; existing bundled-ONNX tests updated to flush.

Measured (debug, 1KB bodies, bundled ONNX, n=40): write median 7,714us -> ~159us median-of-5 (~48x). Full default suite: 281 passed / 0 failed; lite build green on the new tests.

Closes #393

Co-authored-by: tcconnally <hermes@perseus.observer>
Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
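
The flush-instead-of-sleep seam that the tests rely on could be built from a pending-job counter and a condition variable. The sketch below is an assumption about one way to do it, not the actual `embed_queue_flush` implementation; `Pending`, `job_enqueued`, and `job_finished` are hypothetical names.

```rust
use std::sync::{Condvar, Mutex};
use std::time::Duration;

/// Counts jobs that are queued or in flight (hypothetical helper).
pub struct Pending {
    count: Mutex<usize>,
    drained: Condvar,
}

impl Pending {
    pub fn new() -> Self {
        Pending { count: Mutex::new(0), drained: Condvar::new() }
    }

    /// Call when the queue accepts a job.
    pub fn job_enqueued(&self) {
        *self.count.lock().unwrap() += 1;
    }

    /// Call after a job is stored, refused by the stale guard, or dropped.
    pub fn job_finished(&self) {
        let mut n = self.count.lock().unwrap();
        *n = n.saturating_sub(1);
        if *n == 0 {
            self.drained.notify_all();
        }
    }

    /// Block until no jobs are pending. Returns false if `timeout`
    /// elapses while work is still outstanding.
    pub fn flush(&self, timeout: Duration) -> bool {
        let guard = self.count.lock().unwrap();
        let (guard, _timed_out) = self
            .drained
            .wait_timeout_while(guard, timeout, |n| *n > 0)
            .unwrap();
        *guard == 0
    }
}
```

A test would then call `flush(Duration::from_secs(5))` after a write and assert on the stored vector, which makes the test wait exactly as long as the worker needs rather than for a fixed sleep.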
1 parent 2278daf commit 6a940e4

2 files changed

Lines changed: 734 additions & 44 deletions

CHANGELOG.md

Lines changed: 23 additions & 0 deletions
@@ -14,6 +14,29 @@ All notable changes to Perseus Vault (formerly Mimir/Mneme) are documented here.
 workspace's row (or the global `''` row) phantom counts. Omitted = the
 existing deterministic pick, unchanged.
 
+### Changed
+- Auto-embed on content-changing writes now runs on a background worker
+instead of inline (#393): the synchronous ONNX call added ~6.7ms to every
+default-build write (62×, ~145 writes/s single-writer ceiling). Writes now
+enqueue (id, plaintext) to a bounded queue (1024 jobs, drop-new on overflow
+with a rate-limited warning) and return immediately; the worker drains up
+to 32 jobs per wake, embeds, and stores each vector through a stale guard
+(an atomic conditional UPDATE against the entity's current FTS plaintext),
+so a queued embed can never overwrite a newer body's vector. Deferral is
+within the existing contract — auto-embed already ran post-commit with
+non-fatal failures; a row simply doesn't surface in dense/hybrid search
+until embedded (now milliseconds later; dropped-on-overflow rows are
+recoverable via `mimir_embed` batch mode or their next change). Explicit
+`mimir_embed` stays synchronous. The write path also no longer consults
+the #219 embedding session cache (new/changed bodies can never hit it —
+each write paid up to 256 full-body string compares for nothing), and the
+misconfigured-backend log (enabled, model missing, no endpoint — formerly
+one eprintln per write) is rate-limited to once per minute. `Drop` for
+`Database` signals the worker and waits up to 5s: the in-flight embed
+finishes, remaining queued jobs are dropped. Measured (debug profile,
+1KB bodies, bundled ONNX, n=40, median-of-5-runs): write median
+7,714µs → ~159µs (~48×).
+
 ### Fixed
 - `follow()`'s row resolution no longer collapses real DB errors into
 "not found" (#396, the #394 principle): only `QueryReturnedNoRows` maps to

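The stale guard in the changelog entry is a compare-then-store done as one step: the vector lands only if the row's current plaintext still equals the text that was enqueued. The in-memory model below illustrates that rule only. It is a sketch with hypothetical names (`Store`, `Row`, `store_if_current`) and does not reproduce the real schema or SQL, which the diff does not show.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for one entity row.
pub struct Row {
    pub plaintext: String,
    pub vector: Option<Vec<f32>>,
}

#[derive(Default)]
pub struct Store {
    rows: HashMap<u64, Row>,
}

impl Store {
    /// Content-changing write. The sketch leaves any previous vector in
    /// place; what the real write path does with it is not stated.
    pub fn write(&mut self, id: u64, body: &str) {
        let row = self
            .rows
            .entry(id)
            .or_insert_with(|| Row { plaintext: String::new(), vector: None });
        row.plaintext = body.to_string();
    }

    pub fn forget(&mut self, id: u64) {
        self.rows.remove(&id);
    }

    /// Models a conditional update of the assumed shape
    /// `UPDATE ... SET vector = ? WHERE id = ? AND <current plaintext> = ?`.
    /// Check and store happen under one exclusive borrow, so no other
    /// write can slip in between them. Returns true only if it landed.
    pub fn store_if_current(&mut self, id: u64, enqueued: &str, vector: Vec<f32>) -> bool {
        match self.rows.get_mut(&id) {
            Some(row) if row.plaintext == enqueued => {
                row.vector = Some(vector);
                true
            }
            _ => false, // body changed or row forgotten: refuse the write
        }
    }

    pub fn vector(&self, id: u64) -> Option<&[f32]> {
        self.rows.get(&id).and_then(|r| r.vector.as_deref())
    }
}
```

With this rule a job enqueued for an older body is refused, and the job enqueued by the newer body is the one whose vector survives, which matches the changed-mid-flight case the commit's tests describe.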