ci: perf-gate pinning the 2026-07-02 capacity baselines (closes #404) (#420)

tcconnally · tcconnally · claude · web-flow · commit 19161cfd1f52 · 2026-07-02T18:34:47.000-05:00
Completes #404 (the concurrency half already shipped as concurrency-gate).

Adds five #[ignore] perf_gate_* harness tests in src/db.rs (release-only, run via the workflow) and a perf-gate.yml workflow that builds a lean release and runs them with budgets and corpus sizes pinned as env vars:

- recall FTS rare-term @100k p50 < 30ms (post-#401: ~0.09ms local)
- recall browse @100k p50 < 5ms
- get_entity @100k p50 < 1ms
- as_of @50k history versions on one key p50 < 1ms
- decay_tick @100k wall < 10s, plus the #399 regression signature: second-consecutive-tick rewritten rows < 1% and WAL growth < 2x DB size
- cohere @100k wall < 5s and post-#400 longest single writer-lock hold < 1s, via the #400 BEGIN IMMEDIATE probe (extracted from cohere_lock_window_measurement into spawn_lock_hold_probe and shared)
- history bytes per superseded version @1KB body < 2KB (dbstat when available, checkpointed file delta otherwise)

Latency metrics are medians of 5 after one warmup. Every metric prints a PERF-GATE table row so the job log shows the numbers, and failures are collected per test so the full table prints even when one budget blows.

Seeding uses the fastest direct-SQL path (batched transactions, FTS kept in sync, no dedup/embed side effects): 100k rows seed in ~1s, and the whole harness runs in ~10s as a local release build.

Closes #404

Co-authored-by: tcconnally <hermes@perseus.observer>
Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
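The harness pattern in the commit message (one warmup, median of five timed runs, a `PERF-GATE |` row per metric, failures collected and asserted only at the end) can be sketched in Rust roughly as follows. This is a minimal sketch under assumptions: `median_of_5_after_warmup` and `GateReport` are hypothetical names, not the helpers in src/db.rs, and the row layout is illustrative.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper (not the src/db.rs code): run `op` once as an untimed
/// warmup, then time five runs and return the median (3rd of 5 sorted samples).
fn median_of_5_after_warmup<F: FnMut()>(mut op: F) -> Duration {
    op(); // warmup run, excluded from the samples
    let mut samples: Vec<Duration> = (0..5)
        .map(|_| {
            let start = Instant::now();
            op();
            start.elapsed()
        })
        .collect();
    samples.sort();
    samples[2]
}

/// Hypothetical per-test collector: every metric prints a `PERF-GATE |` row,
/// and budget misses are only asserted in `finish`, so one blown budget does
/// not hide the rest of the table.
#[derive(Default)]
struct GateReport {
    failures: Vec<String>,
}

impl GateReport {
    /// Records one metric; returns whether it met its (strict) budget.
    fn check(&mut self, metric: &str, measured: f64, budget: f64, unit: &str) -> bool {
        let ok = measured < budget;
        println!(
            "PERF-GATE | {metric} | {measured:.3} {unit} | < {budget} {unit} | {}",
            if ok { "ok" } else { "FAIL" }
        );
        if !ok {
            self.failures
                .push(format!("{metric}: {measured:.3} {unit} (budget < {budget} {unit})"));
        }
        ok
    }

    /// Panics with every collected failure once the whole table has printed.
    fn finish(self) {
        assert!(
            self.failures.is_empty(),
            "perf budgets exceeded:\n{}",
            self.failures.join("\n")
        );
    }
}
```

In a test, one would wrap e.g. a get_entity call in `median_of_5_after_warmup(|| { ... })`, pass `p50.as_secs_f64() * 1e3` to `report.check(...)` with the millisecond budget, and call `report.finish()` last so the full table reaches the log before anything fails.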
diff --git a/.github/workflows/perf-gate.yml b/.github/workflows/perf-gate.yml
@@ -0,0 +1,66 @@
+name: Perf gate
+
+# Pins the 2026-07-02 capacity-deep-dive baselines (#404): release build,
+# seeded temp DBs (fast direct-SQL seeding, no dedup/embed side effects),
+# medians of 5 for latency metrics. Budgets below carry CI-variance headroom
+# over the ORIGINAL measured numbers: 3-5x for rare-term recall and decay wall,
+# far more for sub-ms reads, ~1.4x for history bytes. Most now have far more
+# after the fixes that landed since the issue (#401 rare-term recall, #400
+# bounded cohere lock holds, #399/#405 decay no-op skip, #393/#415/#418 async
+# auto-embed). The decay test also asserts the #399 regression SIGNATURE
+# (second-consecutive-tick rewritten rows < 1% and WAL growth < 2x DB size),
+# which wall time alone won't catch. Each test prints a `PERF-GATE |` metrics
+# table to the job log so a regression is diagnosable from the run, not just
+# red. Lean build (--no-default-features): the SQLite paths under test are
+# independent of the embedding stack, and the lean build keeps the gate fast. (#404)
+
+on:
+  push:
+    branches: [main, master]
+    paths:
+      - "src/**"
+      - "Cargo.toml"
+      - "Cargo.lock"
+      - ".github/workflows/perf-gate.yml"
+  pull_request:
+    branches: [main, master]
+    paths:
+      - "src/**"
+      - "Cargo.toml"
+      - "Cargo.lock"
+      - ".github/workflows/perf-gate.yml"
+
+env:
+  CARGO_TERM_COLOR: always
+
+jobs:
+  perf-gate:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - name: Install Rust
+        uses: dtolnay/rust-toolchain@stable
+      - name: Build (lean, release)
+        run: cargo build --release --no-default-features --tests
+      - name: Perf gate (100k corpus, medians of 5)
+        env:
+          # Corpus sizes (issue #404 scales)
+          MIMIR_PERF_ROWS: "100000"
+          MIMIR_PERF_HISTORY_ROWS: "50000"
+          MIMIR_PERF_HISTORY_VERSIONS: "200"
+          # Budgets (issue #404; measured baselines in parentheses)
+          MIMIR_PERF_BUDGET_RECALL_RARE_MS: "30" # rare-term FTS p50 (6.4ms; ~0.08ms post-#401)
+          MIMIR_PERF_BUDGET_BROWSE_MS: "5" # browse p50 (35µs)
+          MIMIR_PERF_BUDGET_GET_ENTITY_MS: "1" # get_entity p50 (41µs)
+          MIMIR_PERF_BUDGET_AS_OF_MS: "1" # as_of p50 @50k history rows (17µs)
+          MIMIR_PERF_BUDGET_DECAY_WALL_S: "10" # decay_tick wall (2.2-3.1s)
+          MIMIR_PERF_BUDGET_DECAY_SECOND_TICK_PCT: "1" # #399 signature: 2nd tick rewrites ~0 rows
+          MIMIR_PERF_BUDGET_DECAY_WAL_RATIO: "2" # #399 signature: WAL < 2x DB (was ~9x)
+          MIMIR_PERF_BUDGET_COHERE_WALL_S: "5" # cohere wall
+          MIMIR_PERF_BUDGET_COHERE_HOLD_MS: "1000" # post-#400 longest writer-lock hold
+          MIMIR_PERF_BUDGET_HISTORY_BYTES_PER_ROW: "2048" # bytes per superseded version @1KB body (1,448 B)
+        # --test-threads=1: the latency medians must not contend with another
+        # 100k seed/tick running in parallel.
+        run: |
+          cargo test --release --no-default-features perf_gate_ -- \
+            --ignored --nocapture --test-threads=1
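The two #399 signature budgets in this workflow are ratios rather than timings. A minimal sketch of how a test might read them from the env and evaluate them follows; `budget_from_env` and `decay_signature_violations` are hypothetical, the env var names and defaults are the ones pinned in perf-gate.yml, and the inputs (rows rewritten by the second tick, WAL and DB file sizes) are assumed to be measured elsewhere in the test.

```rust
use std::env;

/// Hypothetical helper: read a numeric budget pinned in perf-gate.yml,
/// falling back to `default` when the variable is unset or unparsable
/// (e.g. running the ignored tests locally without the workflow env).
fn budget_from_env(name: &str, default: f64) -> f64 {
    env::var(name)
        .ok()
        .and_then(|v| v.trim().parse::<f64>().ok())
        .unwrap_or(default)
}

/// Hypothetical #399 signature check: after two consecutive decay ticks the
/// second tick should rewrite almost nothing, and the WAL should stay below
/// a multiple of the DB file size. Returns one message per violated budget.
fn decay_signature_violations(
    total_rows: u64,
    second_tick_rewritten: u64,
    wal_bytes: u64,
    db_bytes: u64,
) -> Vec<String> {
    let max_pct = budget_from_env("MIMIR_PERF_BUDGET_DECAY_SECOND_TICK_PCT", 1.0);
    let max_ratio = budget_from_env("MIMIR_PERF_BUDGET_DECAY_WAL_RATIO", 2.0);

    // max(1) guards against dividing by zero on an empty or missing file.
    let rewritten_pct = 100.0 * second_tick_rewritten as f64 / total_rows.max(1) as f64;
    let wal_ratio = wal_bytes as f64 / db_bytes.max(1) as f64;

    let mut violations = Vec::new();
    if rewritten_pct >= max_pct {
        violations.push(format!(
            "2nd tick rewrote {rewritten_pct:.2}% of rows (budget < {max_pct}%)"
        ));
    }
    if wal_ratio >= max_ratio {
        violations.push(format!(
            "WAL is {wal_ratio:.2}x DB size (budget < {max_ratio}x)"
        ));
    }
    violations
}
```

With these budgets, a second tick that rewrites every row and a WAL at ~9x the DB size (the pre-fix numbers) trips both checks, while a no-op second tick with a small WAL passes.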
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -35,6 +35,19 @@ All notable changes to Perseus Vault (formerly Mimir/Mneme) are documented here.
 - `mimir_stats` reports history growth (#398): `total_history_rows`,
   `history_bytes` (stored body bytes), and `top_history_keys` (top-10 keys by
   version count with bytes) — the hot state-like keys to cap first.
+- `perf-gate` CI workflow (#404, completing the issue — the concurrency half
+  shipped as `concurrency-gate`): a release-build gate that seeds temp DBs
+  via the fastest direct-SQL path and pins the 2026-07-02 capacity-deep-dive
+  baselines with CI-variance headroom (3-5× for rare-term recall and decay wall,
+  far more for sub-ms reads, ~1.4× for history bytes): rare-term FTS recall,
+  browse and get_entity p50 @100k, as_of p50 @50k history versions of one key,
+  decay_tick wall @100k plus the #399 regression signature (second consecutive
+  tick rewrites < 1% of rows, WAL growth < 2× DB size), cohere wall @100k plus
+  the post-#400 longest single writer-lock hold < 1s (measured with the #400
+  BEGIN IMMEDIATE probe), and on-disk history bytes per superseded version at a
+  ~1KB body. Medians of 5 for latency metrics; every metric prints a
+  `PERF-GATE |` table row to the job log so a regression is diagnosable from
+  the run. Budgets and corpus sizes are pinned as env vars in the workflow.
 - `mimir_follow` accepts an optional `workspace_hash` (#396, the #338
   pattern): when set, the efficacy stamp resolves its target row with strict
   workspace equality — the same semantics as a workspace-scoped recall — so a
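The headroom each budget buys can be checked directly from the baselines quoted in the perf-gate.yml comments. The worked computation below divides budget by baseline, with the same unit in each row. The table is transcribed from those comments; cohere has no quoted baseline there, so it is omitted, and the decay row uses the slower 3.1s baseline.

```rust
/// (metric, budget, measured baseline), same unit within each row, as quoted
/// in the perf-gate.yml env comments.
const BUDGETS: [(&str, f64, f64); 6] = [
    ("recall rare-term p50 (µs)", 30_000.0, 6_400.0),
    ("browse p50 (µs)", 5_000.0, 35.0),
    ("get_entity p50 (µs)", 1_000.0, 41.0),
    ("as_of p50 (µs)", 1_000.0, 17.0),
    ("decay_tick wall, slower baseline (s)", 10.0, 3.1),
    ("history bytes per version @1KB (B)", 2_048.0, 1_448.0),
];

/// Headroom = budget / baseline: how far the measured value can grow from
/// the original measurement before the gate fails.
fn headroom(budget: f64, baseline: f64) -> f64 {
    budget / baseline
}

/// Pairs each metric name with its headroom factor.
fn headroom_table() -> Vec<(&'static str, f64)> {
    BUDGETS
        .iter()
        .map(|&(name, budget, baseline)| (name, headroom(budget, baseline)))
        .collect()
}
```

Evaluated, this gives about 4.7× for rare-term recall and 3.2× for decay wall time (4.5× against the 2.2s best case), 24× to 143× for the sub-millisecond reads, and about 1.4× for history bytes per version.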
diff --git a/src/db.rs b/src/db.rs