Commit 9391cd5
The embedding-backed dense/hybrid/auto recall metrics were not reproducible
run-to-run: two full-500 LongMemEval runs of the SAME binary/command gave
84.6% vs 85.0% recall@1 (signatures 9babb85 vs 2477b51), i.e. 423 vs 425 of
500 questions. fts5 results were always exact; only the modes that depend on
the bundled ONNX model drifted, because multi-threaded ONNX Runtime (ORT)
performs its reductions in a nondeterministic order → tiny floating-point
differences → borderline cosine ranks flip on ~2-3 of 500 questions.
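The drift mechanism can be reproduced in miniature. The sketch below is a toy model, not ORT's actual kernel. It assumes a dot product (equal to the cosine for L2-normalized embeddings) split into three hypothetical per-thread partial sums, which are combined in whatever order the threads happen to finish. `reduce_in_order`, `top1`, `doc_a` and `doc_b` are illustrative names. Because floating-point addition is not associative, two completion orders give scores one ulp apart, and that alone is enough to flip a near-tie.

```python
def reduce_in_order(partials, order):
    """Combine per-thread partial sums in the given order, as a parallel
    reduction might when its threads finish in a different order each run."""
    total = 0.0
    for i in order:
        total += partials[i]
    return total


# Hypothetical per-thread partial sums of dot(query, doc_a).
partials_a = [0.1, 0.2, 0.3]
# Hypothetical score of a competing document, computed some fixed way.
score_b = 0.6


def top1(score_a):
    """Return the top-ranked document. sorted() is stable, so on an exact
    tie doc_b (listed first) keeps the top spot."""
    ranked = sorted([("doc_b", score_b), ("doc_a", score_a)], key=lambda t: -t[1])
    return ranked[0][0]


run1 = reduce_in_order(partials_a, [0, 1, 2])  # (0.1 + 0.2) + 0.3
run2 = reduce_in_order(partials_a, [2, 1, 0])  # (0.3 + 0.2) + 0.1
print(run1, run2)              # 0.6000000000000001 0.6
print(top1(run1), top1(run2))  # doc_a doc_b
```

Fixing the reduction order (for example, by using one thread) is what removes the flip. It does not make either score "more correct"; it only makes the same one come out every time.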
Pin the bundled (and file-backed) ONNX session to a single intra-op thread and
enable deterministic compute, so the same input yields a byte-identical
embedding every run. The model is tiny (MiniLM-L6, short inputs) and results are
LRU-cached, so the single-thread cost is negligible.
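The following is a minimal sketch of the two properties the fix relies on, using a toy pure-Python embedding in place of the real model. First, every reduction runs in one fixed left-to-right order (the analogue of a single intra-op thread), so repeated calls are byte-identical. Second, an LRU cache means a repeated input skips the model entirely, which is why the slower single-threaded path costs little. `_toy_embed`, `embed`, `to_bytes` and `DIM` are hypothetical names. For reference, ONNX Runtime's Python API exposes the corresponding knobs as `SessionOptions.intra_op_num_threads` and `SessionOptions.use_deterministic_compute`; the bindings this commit uses may spell them differently.

```python
import math
import struct
from functools import lru_cache

DIM = 8  # toy dimension, chosen only for illustration


def _toy_embed(text: str) -> tuple:
    """Hypothetical stand-in for the model forward pass. All accumulation
    happens in one fixed order, the single-thread analogue."""
    vec = [0.0] * DIM
    for pos, ch in enumerate(text):
        vec[pos % DIM] += ord(ch) / (pos + 1)
    norm = math.sqrt(sum(v * v for v in vec))  # sum() consumes values in order
    return tuple(v / norm for v in vec) if norm else tuple(vec)


@lru_cache(maxsize=4096)
def embed(text: str) -> tuple:
    """Cached lookup: a repeated text is served without re-running the model.
    Returns a tuple so cached values are immutable."""
    return _toy_embed(text)


def to_bytes(vec) -> bytes:
    """Pack as little-endian float64 so comparisons are bit-for-bit."""
    return struct.pack(f"<{len(vec)}d", *vec)


query = "where did I park?"
first = to_bytes(_toy_embed(query))   # uncached computation #1
second = to_bytes(_toy_embed(query))  # uncached computation #2
print(first == second)                # True

embed(query)                          # miss: computed and stored
embed(query)                          # hit: returned from the cache
print(embed.cache_info().hits)        # 1
```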
Validated: two full-500 runs of the rebuilt binary now produce IDENTICAL
signatures (9babb85 == 9babb85). The recall gate still passes (no quality regression).
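The validation compares a run-level signature as well as recall@1. This commit does not show how the benchmark computes its signatures. The sketch below assumes a hypothetical scheme: the first 7 hex characters of a SHA-256 over every question's ranked IDs, mimicking the 7-character values above, so that any single rank flip changes the signature. `recall_at_1`, `run_signature` and the toy runs are illustrative only. The sketch also checks the arithmetic: on 500 questions, 84.6% and 85.0% recall@1 correspond to 423 and 425 hits.

```python
import hashlib


def recall_at_1(ranked_lists, gold_sets):
    """Fraction of questions whose top-ranked ID is one of the gold IDs."""
    hits = sum(1 for ranked, gold in zip(ranked_lists, gold_sets) if ranked[0] in gold)
    return hits / len(gold_sets)


def run_signature(ranked_lists):
    """Hypothetical run signature: SHA-256 over each question's ranked IDs,
    truncated to 7 hex characters. Any rank change alters the digest."""
    h = hashlib.sha256()
    for qid, ranked in enumerate(ranked_lists):
        h.update(f"{qid}:{','.join(ranked)}\n".encode())
    return h.hexdigest()[:7]


N = 500
gold = [{f"g{i}"} for i in range(N)]
# Toy run: the gold ID is ranked first on 423 questions, second on the rest.
run_a = [[f"g{i}", f"x{i}"] if i < 423 else [f"x{i}", f"g{i}"] for i in range(N)]
replay = [list(r) for r in run_a]   # an exact re-run of run_a
flipped = [list(r) for r in run_a]
flipped[423].reverse()              # one borderline rank flips

print(recall_at_1(run_a, gold))     # 0.846
print(recall_at_1(flipped, gold))   # 0.848
print(run_signature(run_a) == run_signature(replay))   # True
print(run_signature(run_a) == run_signature(flipped))  # False
```

Because a single flip changes the signature, matching signatures across two full runs is a stricter check than matching recall@1. Recall@1 could stay the same if one question gained a hit while another lost one.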
README determinism wording updated to reflect that all modes are now
reproducible run-to-run (reverting the caveat that #309 had added).
Closes #310.
Co-authored-by: tcconnally <hermes@perseus.observer>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Parent: 19f9f38
2 files changed, 24 additions, 10 deletions
Diff summary (line contents not captured):
File 1: 10 additions, 8 deletions in three hunks (original lines 38-48, 85-91, 102-109).
File 2: 14 additions, 2 deletions in two hunks (original lines 240-246, 250-256).