CI: perf gate pinning the 2026-07-02 baselines (#404) #420
Merged
Completes #404 (the concurrency half already shipped as concurrency-gate). Adds five #[ignore] perf_gate_* harness tests in src/db.rs (release-only, run via the workflow) and a perf-gate.yml workflow that builds lean release and runs them with budgets/corpus sizes pinned as env vars:

- recall FTS rare-term @100k p50 < 30ms (post-#401: ~0.09ms local)
- recall browse @100k p50 < 5ms
- get_entity @100k p50 < 1ms
- as_of @50k history versions on one key p50 < 1ms
- decay_tick @100k wall < 10s, plus the #399 regression signature: second-consecutive-tick rewritten rows < 1% and WAL growth < 2x DB size
- cohere @100k wall < 5s and post-#400 longest single writer-lock hold < 1s, via the #400 BEGIN IMMEDIATE probe (extracted from cohere_lock_window_measurement into spawn_lock_hold_probe and shared)
- history bytes per superseded version @1kb body < 2KB (dbstat when available, checkpointed file delta otherwise)

Latency metrics are medians of 5 after one warmup; every metric prints a PERF-GATE table row so the job log shows the numbers, and failures are collected per test so the full table prints even when one budget blows. Seeding uses the fastest direct-SQL path (batched transactions, FTS kept in sync, no dedup/embed side effects): 100k rows seed in ~1s, whole harness ~10s local release.

Closes #404

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
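The #399 regression signature in the decay_tick budget reduces to two inequalities checked on the second of two consecutive ticks. A minimal sketch of that check follows; the struct, its field names, and the function name are hypothetical (they are not taken from src/db.rs), and the "healthy" numbers are illustrative. The 412MB WAL on a 45MB DB at 100k entities is the shape reported in #399.

```rust
/// Counters observed around the second of two consecutive decay ticks.
/// Hypothetical type: the real harness may track these differently.
#[derive(Debug, Clone, Copy)]
struct SecondTick {
    total_rows: u64,
    rewritten_rows: u64,
    db_bytes: u64,
    wal_growth_bytes: u64,
}

/// Return every violated bound so the caller can report all of them at once.
fn regression_signature(t: SecondTick) -> Vec<String> {
    let mut violations = Vec::new();

    // Bound 1: rewritten rows < 1% of the corpus, in integer arithmetic.
    // `max(1)` keeps an empty corpus with zero rewrites from counting as a failure.
    if t.rewritten_rows * 100 >= t.total_rows.max(1) {
        violations.push(format!(
            "second tick rewrote {} of {} rows (budget: < 1%)",
            t.rewritten_rows, t.total_rows
        ));
    }

    // Bound 2: WAL growth during the tick < 2x the database size.
    if t.wal_growth_bytes >= 2 * t.db_bytes {
        violations.push(format!(
            "WAL grew {} bytes on a {}-byte DB (budget: < 2x)",
            t.wal_growth_bytes, t.db_bytes
        ));
    }

    violations
}

fn main() {
    // Illustrative steady-state tick: almost nothing rewritten, tiny WAL growth.
    let healthy = SecondTick {
        total_rows: 100_000,
        rewritten_rows: 12,
        db_bytes: 45_000_000,
        wal_growth_bytes: 64_000,
    };
    assert!(regression_signature(healthy).is_empty());

    // Shape of the #399 regression: every row rewritten, WAL far larger than the DB.
    let regressed = SecondTick {
        total_rows: 100_000,
        rewritten_rows: 100_000,
        db_bytes: 45_000_000,
        wal_growth_bytes: 412_000_000,
    };
    for v in regression_signature(regressed) {
        println!("PERF-GATE violation: {v}");
    }
}
```

Returning a list of violations rather than panicking on the first one matches the stated goal of printing the full table even when one budget blows.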
Completes #404. The concurrency-gate half already merged in #405 (.github/workflows/concurrency-gate.yml); this PR delivers the remaining perf-gate half.

Closes #404
What
- .github/workflows/perf-gate.yml: lean release build (--no-default-features, same rationale as concurrency-gate: the SQLite paths under test are independent of the embedding stack), path-filtered to src/** + Cargo.toml/Cargo.lock + the workflow file. All corpus sizes and budgets are pinned as env vars in the yml so the numbers are visible next to the workflow.
- Five #[ignore] harness tests in src/db.rs (perf_gate_*), run by the workflow with --ignored --nocapture --test-threads=1. Each seeds its own temp DB via the fastest direct-SQL insert path (batched transactions, FTS kept in sync, no dedup/embed side effects; the same seeding approach as the #401/#409/#411 benches; 100k rows seed in ~1s), measures medians of 5 (after one untimed warmup) for latency metrics, prints a PERF-GATE | table row per metric to the job log, and collects failures so the full table still prints when one budget blows.
- The BEGIN IMMEDIATE/busy_timeout=0 probe was extracted from cohere_lock_window_measurement into a shared spawn_lock_hold_probe helper (that measurement test now uses the same helper; no behavior change).

Local run (release, MSVC, lean features; same invocation as the workflow)
Whole harness: ~10s local (seeding included); the CI job is dominated by the lean release build, comfortably inside the 5-10 min target. Full default suite still green locally: 313 passed, 0 failed, 10 ignored.
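For readers who have not opened src/db.rs, the measurement pattern described under What (budget read from an env var, one untimed warmup, median of 5, one PERF-GATE row per metric, failures collected and asserted at the end) can be sketched roughly as below. This is an illustration under assumptions, not the code in this PR: the helper names, the row layout, and the env var EXAMPLE_PERF_BUDGET_BROWSE_P50_MS are all hypothetical, and the timed closure is a stand-in for a real query.

```rust
use std::env;
use std::time::{Duration, Instant};

/// Read a millisecond budget from an env var, falling back to `default_ms`
/// when the variable is unset or does not parse. (Hypothetical helper.)
fn budget_ms(var: &str, default_ms: f64) -> f64 {
    env::var(var)
        .ok()
        .and_then(|v| v.trim().parse::<f64>().ok())
        .unwrap_or(default_ms)
}

/// Run `op` once untimed as a warmup, then time `runs` iterations and return
/// the median sample (the upper median when `runs` is even).
fn median_after_warmup<F: FnMut()>(runs: usize, mut op: F) -> Duration {
    assert!(runs > 0, "need at least one timed run");
    op(); // warmup, not timed
    let mut samples: Vec<Duration> = (0..runs)
        .map(|_| {
            let start = Instant::now();
            op();
            start.elapsed()
        })
        .collect();
    samples.sort();
    samples[samples.len() / 2]
}

/// Print one table row and record a failure instead of panicking right away,
/// so later metrics still get measured and printed.
fn gate(metric: &str, measured_ms: f64, budget_ms: f64, failures: &mut Vec<String>) {
    let ok = measured_ms < budget_ms;
    println!(
        "PERF-GATE | {metric:<24} | {measured_ms:>10.3} ms | budget {budget_ms:>8.3} ms | {}",
        if ok { "ok" } else { "FAIL" }
    );
    if !ok {
        failures.push(format!("{metric}: {measured_ms:.3} ms >= {budget_ms:.3} ms"));
    }
}

fn main() {
    let mut failures = Vec::new();

    // Hypothetical variable name; 5.0 mirrors the browse p50 budget of 5ms.
    let budget = budget_ms("EXAMPLE_PERF_BUDGET_BROWSE_P50_MS", 5.0);

    // Stand-in workload; a real gate would call into the DB layer here.
    let p50 = median_after_warmup(5, || {
        let sum: u64 = (0..10_000u64).sum();
        std::hint::black_box(sum);
    });
    gate("recall browse p50", p50.as_secs_f64() * 1e3, budget, &mut failures);

    // Assert once, after every row has been printed.
    assert!(failures.is_empty(), "perf gate failures:\n{}", failures.join("\n"));
}
```

Deferring the assertion to the end is what lets the job log show every row even when an early metric exceeds its budget.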
Budget re-validation post-fixes (none loosened, none tightened)
Every budget from the issue was kept verbatim — they were written with 3-5× CI-variance headroom over the original measurements, and the fixes that landed since only widened the margins:
¹ Browse measures 0.51ms vs the 35µs baseline: the deep-dive likely measured the raw SQL, whereas the gate measures the full recall() entry (row hydration + preview handling included). Still 10× inside budget.

² 44ms ≈ one 1000-row decay chunk / the candidate-budget-bounded link+archive tx, exactly the post-#400 bound.
Tightest budgets (flagging, not loosening): cohere wall (2.9s vs 5s, ~1.7×) and history bytes/row (1372B vs 2048B, ~1.5×) have the least headroom. Cohere wall is dominated by the chunked decay pass, which scales with machine speed, so a slow CI runner could plausibly approach 5s. If it flaps in CI, the right move is raising MIMIR_PERF_BUDGET_COHERE_WALL_S in the yml with a note, not weakening the lock-hold bound (the actual #400 invariant). History bytes/row is deterministic (dbstat page accounting), so 1.5× is safe.

Notes
- --test-threads=1 so the latency medians never contend with another test's 100k seed or decay tick.
- The history-bytes metric uses dbstat (exact pages owned by entity_history); if a SQLite build lacks the DBSTAT vtab it falls back to checkpointed whole-file delta per version (strictly more conservative) and says which method it used in the log.
- The decay_tick seed sets last_accessed = now and stored decay 0.5, so tick 1 exercises the full-write path (worst case for the wall budget) and tick 2 is the steady-state no-op the #399 epsilon skip guarantees.

🤖 Generated with Claude Code