CI: perf gate pinning the 2026-07-02 baselines (#404) #420

Merged: tcconnally merged 1 commit into main from ci/perf-gate on Jul 2, 2026
@tcconnally

Collaborator

Completes #404. The concurrency-gate half already merged in #405 (.github/workflows/concurrency-gate.yml); this PR delivers the remaining perf-gate half.

Closes #404

What

Local run (release, MSVC, lean features — same invocation as the workflow)

PERF-GATE | recall FTS rare-term @100k p50                 |        0.086 ms      | budget <     30.000 ms      | ok
PERF-GATE | recall browse (empty query) @100k p50          |        0.514 ms      | budget <      5.000 ms      | ok
PERF-GATE | get_entity @100k p50                           |        0.020 ms      | budget <      1.000 ms      | ok
PERF-GATE | as_of @50k history rows (one key) p50          |        0.020 ms      | budget <      1.000 ms      | ok
PERF-GATE | decay_tick @100k wall (full rewrite)           |        1.104 s       | budget <     10.000 s       | ok
PERF-GATE | decay_tick second-consecutive updated rows     |        0.000 %       | budget <      1.000 %       | ok
PERF-GATE | decay_tick second-tick WAL growth / DB size    |        0.000 x       | budget <      2.000 x       | ok
PERF-GATE | cohere @100k wall                              |        2.949 s       | budget <      5.000 s       | ok
PERF-GATE | cohere @100k longest writer-lock hold          |       44.350 ms      | budget <   1000.000 ms      | ok
PERF-GATE | history bytes/row @1KB body                    |     1372.160 B       | budget <   2048.000 B       | ok

Whole harness: ~10s local (seeding included); the CI job is dominated by the lean release build, comfortably inside the 5-10 min target. Full default suite still green locally: 313 passed, 0 failed, 10 ignored.

Budget re-validation post-fixes (none loosened, none tightened)

Every budget from the issue was kept verbatim — they were written with 3-5× CI-variance headroom over the original measurements, and the fixes that landed since only widened the margins:

| Budget | Issue baseline | Measured now (local release) | Margin |
| --- | --- | --- | --- |
| rare-term recall p50 < 30ms | 6.4ms | 0.086ms (post-#401) | ~350× |
| browse p50 < 5ms | 35µs | 0.51ms¹ | ~10× |
| get_entity p50 < 1ms | 41µs | 20µs | ~50× |
| as_of p50 < 1ms | 17µs | 20µs | ~50× |
| decay wall < 10s | 2.2-3.1s | 1.1s (tick 1 = full 100k rewrite, the conservative case) | ~9× |
| 2nd-tick rows < 1% | ~0 post-#399 | 0 rows | |
| 2nd-tick WAL < 2× DB | ~9× pre-#399 | 0.0× | |
| cohere wall < 5s | | 2.9s | ~1.7× |
| lock hold < 1s | <0.1s @200k post-#400 | 44ms² | ~22× |
| history bytes/row < 2KB | 1,448B | 1,372B (via dbstat) | ~1.5× |

¹ Browse measures 0.51ms vs the 35µs baseline — the deep-dive likely measured the raw SQL; the gate measures the full recall() entry (row hydration + preview handling included). Still 10× inside budget.
² 44ms ≈ one 1000-row decay chunk / the candidate-budget-bounded link+archive tx, exactly the post-#400 bound.
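The Margin column is budget divided by measured value. A minimal sketch of that arithmetic, using the budgets and measurements from the local run; the `margin` helper is a hypothetical name for illustration and is not part of the harness:

```rust
/// Headroom factor: how many times the measured value fits inside the budget.
/// Returns None when the measurement is zero, since no finite margin exists.
fn margin(budget: f64, measured: f64) -> Option<f64> {
    if measured > 0.0 {
        Some(budget / measured)
    } else {
        None
    }
}

fn main() {
    // (label, budget, measured), both in the same unit, copied from the local run.
    let rows = [
        ("rare-term recall p50 (ms)", 30.0, 0.086),
        ("browse p50 (ms)", 5.0, 0.514),
        ("get_entity p50 (ms)", 1.0, 0.020),
        ("decay wall (s)", 10.0, 1.104),
        ("2nd-tick rows (%)", 1.0, 0.0),
        ("cohere wall (s)", 5.0, 2.949),
        ("lock hold (ms)", 1000.0, 44.350),
        ("history bytes/row (B)", 2048.0, 1372.160),
    ];
    for (label, budget, measured) in rows {
        match margin(budget, measured) {
            Some(m) => println!("{label:<28} ~{m:.1}x"),
            None => println!("{label:<28} measured 0, no finite margin"),
        }
    }
}
```

The two second-tick rows have no margin entry for the same reason the sketch returns `None`: the measured value is zero.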

Tightest budgets (flagging, not loosening): cohere wall (2.9s vs 5s, ~1.7×) and history bytes/row (1372B vs 2048B, ~1.5×) have the least headroom. Cohere wall is dominated by the chunked decay pass which scales with machine speed — a slow CI runner could plausibly approach 5s; if it flaps in CI, the right move is raising MIMIR_PERF_BUDGET_COHERE_WALL_S in the yml with a note, not weakening the lock-hold bound (the actual #400 invariant). History bytes/row is deterministic (dbstat page accounting), so 1.5× is safe.
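For illustration, an env-pinned budget with a compiled-in default could be read roughly as follows. This is a sketch under assumptions: only the variable name `MIMIR_PERF_BUDGET_COHERE_WALL_S` and the 5s default come from this PR; `budget_from` and `budget_env` are hypothetical helpers, and how the harness actually parses the variable is not shown here.

```rust
use std::env;

/// Parse a budget override. Falls back to `default` when the value is
/// missing, unparsable, non-finite, or not positive.
fn budget_from(raw: Option<&str>, default: f64) -> f64 {
    raw.and_then(|s| s.trim().parse::<f64>().ok())
        .filter(|v| v.is_finite() && *v > 0.0)
        .unwrap_or(default)
}

/// Read the override from the process environment (hypothetical helper).
fn budget_env(name: &str, default: f64) -> f64 {
    budget_from(env::var(name).ok().as_deref(), default)
}

fn main() {
    // 5.0 s is the cohere wall budget stated in this PR.
    let cohere_wall_s = budget_env("MIMIR_PERF_BUDGET_COHERE_WALL_S", 5.0);
    println!("cohere wall budget: {cohere_wall_s:.3} s");
}
```

Keeping the override in the workflow yml, next to a note explaining why it was raised, keeps the budget change reviewable while leaving the lock-hold bound untouched.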

Notes

  • --test-threads=1 so the latency medians never contend with another test's 100k seed or decay tick.
  • The history-bytes measure prefers dbstat (exact pages owned by entity_history); if a SQLite build lacks the DBSTAT vtab it falls back to checkpointed whole-file delta per version (strictly more conservative) and says which method it used in the log.
  • The decay gate seeds rows with last_accessed = now and stored decay 0.5, so tick 1 exercises the full-write path (worst case for the wall budget) and tick 2 is the steady-state no-op that the #399 epsilon skip guarantees (#399 is the issue where decay_tick rewrote every non-archived row every tick: 412MB WAL per tick on a 45MB DB at 100k entities).
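The two-tick shape of the decay gate can be modeled in memory. The sketch below is not the project's decay_tick: the half-life formula, `EPSILON`, `HALF_LIFE_S`, and every name in it are assumptions chosen for illustration. It only shows why seeding with a stored decay of 0.5 forces a full rewrite on tick 1, while an epsilon skip turns tick 2 into a no-op.

```rust
#[derive(Clone, Copy)]
struct Row {
    last_accessed: f64, // seconds since an arbitrary epoch
    decay: f64,         // stored decay value
}

const EPSILON: f64 = 1e-3; // hypothetical skip threshold
const HALF_LIFE_S: f64 = 30.0 * 86_400.0; // hypothetical 30-day half-life

/// Hypothetical decay curve: 1.0 for a fresh row, halving every HALF_LIFE_S.
fn target_decay(age_s: f64) -> f64 {
    0.5f64.powf(age_s.max(0.0) / HALF_LIFE_S)
}

/// Rewrites only rows whose decay moved by more than EPSILON; returns the count.
fn decay_tick_sketch(rows: &mut [Row], now: f64) -> usize {
    let mut updated = 0;
    for row in rows.iter_mut() {
        let next = target_decay(now - row.last_accessed);
        if (next - row.decay).abs() > EPSILON {
            row.decay = next;
            updated += 1;
        }
    }
    updated
}

/// Seeds `n` rows the way the note describes and runs two consecutive ticks.
fn run_two_ticks(n: usize) -> (usize, usize) {
    let now = 1_000_000.0;
    let mut rows = vec![Row { last_accessed: now, decay: 0.5 }; n];
    let first = decay_tick_sketch(&mut rows, now);
    let second = decay_tick_sketch(&mut rows, now + 1.0); // one second later
    (first, second)
}

fn main() {
    let n = 100_000;
    let (first, second) = run_two_ticks(n);
    let pct = 100.0 * second as f64 / n as f64;
    println!("tick 1 updated {first} rows, tick 2 updated {second} rows ({pct:.3} %)");
    assert!(pct < 1.0, "second tick rewrote too many rows");
}
```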

🤖 Generated with Claude Code

Completes #404 (the concurrency half already shipped as concurrency-gate).
Adds five #[ignore] perf_gate_* harness tests in src/db.rs (release-only,
run via the workflow) and a perf-gate.yml workflow that builds lean release
and runs them with budgets/corpus sizes pinned as env vars:

- recall FTS rare-term @100k p50 < 30ms (post-#401: ~0.09ms local)
- recall browse @100k p50 < 5ms
- get_entity @100k p50 < 1ms
- as_of @50k history versions on one key p50 < 1ms
- decay_tick @100k wall < 10s, plus the #399 regression signature:
  second-consecutive-tick rewritten rows < 1% and WAL growth < 2x DB size
- cohere @100k wall < 5s and post-#400 longest single writer-lock hold
  < 1s, via the #400 BEGIN IMMEDIATE probe (extracted from
  cohere_lock_window_measurement into spawn_lock_hold_probe and shared)
- history bytes per superseded version @1KB body < 2KB (dbstat when
  available, checkpointed file delta otherwise)
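The lock-hold metric relies on a probe that keeps trying to take the writer lock and records how long it was kept waiting. The sketch below is a std-only analogue of that idea: a `Mutex` stands in for the SQLite writer lock and a blocking `lock()` stands in for BEGIN IMMEDIATE. It is not the real spawn_lock_hold_probe, and all names in it are hypothetical.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};
use std::time::{Duration, Instant};

/// Probe thread: repeatedly acquires the lock and tracks the longest wait.
fn spawn_probe_sketch(lock: Arc<Mutex<()>>, stop: Arc<AtomicBool>) -> JoinHandle<Duration> {
    thread::spawn(move || {
        let mut longest = Duration::ZERO;
        while !stop.load(Ordering::Relaxed) {
            let start = Instant::now();
            drop(lock.lock().unwrap()); // blocks while the "writer" holds the lock
            longest = longest.max(start.elapsed());
            thread::sleep(Duration::from_millis(1));
        }
        longest
    })
}

/// Simulates a writer that takes the lock once per entry in `holds`,
/// and returns the longest wait the probe observed.
fn longest_wait_during(holds: &[Duration]) -> Duration {
    let lock = Arc::new(Mutex::new(()));
    let stop = Arc::new(AtomicBool::new(false));
    let probe = spawn_probe_sketch(Arc::clone(&lock), Arc::clone(&stop));
    for hold in holds {
        {
            let _guard = lock.lock().unwrap();
            thread::sleep(*hold); // stand-in for one chunked write transaction
        }
        thread::sleep(Duration::from_millis(5)); // gap so the probe can get in
    }
    stop.store(true, Ordering::Relaxed);
    probe.join().unwrap()
}

fn main() {
    let longest = longest_wait_during(&[Duration::from_millis(40); 3]);
    println!("longest observed wait: {:.3} ms", longest.as_secs_f64() * 1e3);
    assert!(longest < Duration::from_secs(1), "lock-hold budget exceeded");
}
```

A probe like this can only under-report a hold, because it may start waiting partway through one, so polling frequently matters for the bound to be meaningful.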

Latency metrics are medians of 5 after one warmup; every metric prints a
PERF-GATE table row so the job log shows the numbers, and failures are
collected per test so the full table prints even when one budget blows.
Seeding uses the fastest direct-SQL path (batched transactions, FTS kept
in sync, no dedup/embed side effects) — 100k rows seed in ~1s, whole
harness ~10s local release.
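A minimal sketch of the measurement pattern described above: median of 5 after one warmup, one PERF-GATE row per metric, and failures collected rather than asserted one by one. The helper names `median_of` and `gate`, the column widths, and the stand-in workload are assumptions for illustration, not the harness code in src/db.rs.

```rust
use std::time::{Duration, Instant};

/// Runs `op` once untimed as a warmup, then `runs` timed iterations,
/// and returns the median sample.
fn median_of<F: FnMut()>(runs: usize, mut op: F) -> Duration {
    assert!(runs > 0, "need at least one timed run");
    op(); // warmup, not timed
    let mut samples: Vec<Duration> = (0..runs)
        .map(|_| {
            let start = Instant::now();
            op();
            start.elapsed()
        })
        .collect();
    samples.sort();
    samples[samples.len() / 2]
}

/// Prints one table row and records a failure instead of panicking,
/// so later metrics still run and print.
fn gate(failures: &mut Vec<String>, label: &str, value: f64, budget: f64, unit: &str) {
    let ok = value < budget;
    let status = if ok { "ok" } else { "FAIL" };
    println!(
        "PERF-GATE | {label:<46} | {value:>12.3} {unit:<8} | budget < {budget:>10.3} {unit:<8} | {status}"
    );
    if !ok {
        failures.push(format!("{label}: {value:.3} {unit} is not under {budget:.3} {unit}"));
    }
}

fn main() {
    let mut failures = Vec::new();

    // Stand-in workload; the real gates time recall(), get_entity, as_of, and so on.
    let p50 = median_of(5, || {
        std::hint::black_box((0..10_000u64).sum::<u64>());
    });
    gate(&mut failures, "stand-in workload p50", p50.as_secs_f64() * 1e3, 30.0, "ms");
    gate(&mut failures, "history bytes/row @1KB body", 1372.160, 2048.0, "B");

    // A single assert at the end keeps the whole table visible in the job log.
    assert!(failures.is_empty(), "perf budgets exceeded:\n{}", failures.join("\n"));
}
```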

Closes #404

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Linked issue closed by this pull request: CI: add a concurrency gate (promote the pool load test at 2× oversubscription) and a perf gate pinning the 2026-07-02 baselines