
Commit 19161cf

tcconnally and claude authored
ci: perf-gate pinning the 2026-07-02 capacity baselines (closes #404) (#420)
Completes #404 (the concurrency half already shipped as concurrency-gate). Adds five #[ignore] perf_gate_* harness tests in src/db.rs (release-only, run via the workflow) and a perf-gate.yml workflow that builds lean release and runs them with budgets/corpus sizes pinned as env vars:

- recall FTS rare-term @100k p50 < 30ms (post-#401: ~0.09ms local)
- recall browse @100k p50 < 5ms
- get_entity @100k p50 < 1ms
- as_of @50k history versions on one key p50 < 1ms
- decay_tick @100k wall < 10s, plus the #399 regression signature: second-consecutive-tick rewritten rows < 1% and WAL growth < 2x DB size
- cohere @100k wall < 5s and post-#400 longest single writer-lock hold < 1s, via the #400 BEGIN IMMEDIATE probe (extracted from cohere_lock_window_measurement into spawn_lock_hold_probe and shared)
- history bytes per superseded version @1KB body < 2KB (dbstat when available, checkpointed file delta otherwise)

Latency metrics are medians of 5 after one warmup; every metric prints a PERF-GATE table row so the job log shows the numbers, and failures are collected per test so the full table prints even when one budget blows.

Seeding uses the fastest direct-SQL path (batched transactions, FTS kept in sync, no dedup/embed side effects): 100k rows seed in ~1s, whole harness ~10s local release.

Closes #404

Co-authored-by: tcconnally <hermes@perseus.observer>
Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
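The "medians of 5 after one warmup" measurement described above can be sketched roughly as follows. This is a minimal illustration, not the harness code from src/db.rs: the helper names `median_of` and `median_after_warmup` are hypothetical, and the real tests may time their operations differently.

```rust
use std::time::{Duration, Instant};

/// Returns the middle element of the sorted samples.
/// For an odd count (such as 5) this is the exact median.
fn median_of(mut samples: Vec<Duration>) -> Duration {
    assert!(!samples.is_empty(), "need at least one sample");
    samples.sort();
    samples[samples.len() / 2]
}

/// Runs `op` once untimed as a warmup, then `runs` timed iterations,
/// and returns the median of the timed iterations.
fn median_after_warmup<F: FnMut()>(runs: usize, mut op: F) -> Duration {
    op(); // warmup: primes caches, result discarded
    let samples: Vec<Duration> = (0..runs)
        .map(|_| {
            let start = Instant::now();
            op();
            start.elapsed()
        })
        .collect();
    median_of(samples)
}
```

A test would call something like `median_after_warmup(5, || run_query())`, where `run_query` stands in for the operation under test, and compare the result against the budget. Taking the median rather than the mean keeps a single slow outlier on a noisy CI runner from failing the gate.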
1 parent a0557d4 commit 19161cf

3 files changed

Lines changed: 564 additions & 38 deletions


.github/workflows/perf-gate.yml

Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
+name: Perf gate
+
+# Pins the 2026-07-02 capacity-deep-dive baselines (#404): release build,
+# seeded temp DBs (fast direct-SQL seeding, no dedup/embed side effects),
+# medians of 5 for latency metrics. The budgets below already carry 3-5x
+# headroom over the ORIGINAL measured numbers for CI variance — and most now
+# have far more after the fixes that landed since the issue was written
+# (#401 rare-term recall, #400 bounded cohere lock holds, #399/#405 decay
+# no-op skip, #393/#415/#418 async auto-embed). The decay job also asserts
+# the #399 regression SIGNATURE (second-consecutive-tick rewritten rows < 1%
+# and WAL growth < 2x DB size), which wall time alone won't catch. Each test
+# prints a `PERF-GATE |` metrics table to the job log so a regression is
+# diagnosable from the run, not just red. Lean build (--no-default-features):
+# the SQLite paths under test are independent of the embedding stack, and the
+# lean build keeps the gate fast. (#404)
+
+on:
+  push:
+    branches: [main, master]
+    paths:
+      - "src/**"
+      - "Cargo.toml"
+      - "Cargo.lock"
+      - ".github/workflows/perf-gate.yml"
+  pull_request:
+    branches: [main, master]
+    paths:
+      - "src/**"
+      - "Cargo.toml"
+      - "Cargo.lock"
+      - ".github/workflows/perf-gate.yml"
+
+env:
+  CARGO_TERM_COLOR: always
+
+jobs:
+  perf-gate:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - name: Install Rust
+        uses: dtolnay/rust-toolchain@stable
+      - name: Build (lean, release)
+        run: cargo build --release --no-default-features --tests
+      - name: Perf gate (100k corpus, medians of 5)
+        env:
+          # Corpus sizes (issue #404 scales)
+          MIMIR_PERF_ROWS: "100000"
+          MIMIR_PERF_HISTORY_ROWS: "50000"
+          MIMIR_PERF_HISTORY_VERSIONS: "200"
+          # Budgets (issue #404; measured baselines in parentheses)
+          MIMIR_PERF_BUDGET_RECALL_RARE_MS: "30" # rare-term FTS p50 (6.4ms; ~0.08ms post-#401)
+          MIMIR_PERF_BUDGET_BROWSE_MS: "5" # browse p50 (35µs)
+          MIMIR_PERF_BUDGET_GET_ENTITY_MS: "1" # get_entity p50 (41µs)
+          MIMIR_PERF_BUDGET_AS_OF_MS: "1" # as_of p50 @50k history rows (17µs)
+          MIMIR_PERF_BUDGET_DECAY_WALL_S: "10" # decay_tick wall (2.2-3.1s)
+          MIMIR_PERF_BUDGET_DECAY_SECOND_TICK_PCT: "1" # #399 signature: 2nd tick rewrites ~0 rows
+          MIMIR_PERF_BUDGET_DECAY_WAL_RATIO: "2" # #399 signature: WAL < 2x DB (was ~9x)
+          MIMIR_PERF_BUDGET_COHERE_WALL_S: "5" # cohere wall
+          MIMIR_PERF_BUDGET_COHERE_HOLD_MS: "1000" # post-#400 longest writer-lock hold
+          MIMIR_PERF_BUDGET_HISTORY_BYTES_PER_ROW: "2048" # @1KB body (1,448)
+        # --test-threads=1: the latency medians must not contend with another
+        # 100k seed/tick running in parallel.
+        run: |
+          cargo test --release --no-default-features perf_gate_ -- \
+            --ignored --nocapture --test-threads=1
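On the Rust side, a harness test could consume the env-var budgets pinned in the workflow above along these lines. This is only a sketch under stated assumptions: the `budget` helper, the `Gate` type, and the exact column layout of the `PERF-GATE |` row are hypothetical, since the commit page does not show the code in src/db.rs. It illustrates the two behaviours the commit message describes: every metric prints a row, and failures are collected so the full table prints before the test fails.

```rust
use std::env;

/// Reads a numeric budget from the environment, falling back to `default`
/// when the variable is unset or does not parse as a number.
fn budget(var: &str, default: f64) -> f64 {
    env::var(var)
        .ok()
        .and_then(|raw| raw.trim().parse::<f64>().ok())
        .unwrap_or(default)
}

/// Collects budget violations instead of panicking on the first one.
struct Gate {
    failures: Vec<String>,
}

impl Gate {
    fn new() -> Self {
        Gate { failures: Vec::new() }
    }

    /// Prints one table row (hypothetical layout) and records a failure
    /// when `measured` is not strictly under `limit`.
    fn check(&mut self, metric: &str, measured: f64, limit: f64, unit: &str) {
        let ok = measured < limit;
        let status = if ok { "ok" } else { "FAIL" };
        println!("PERF-GATE | {metric} | {measured:.3}{unit} | < {limit}{unit} | {status}");
        if !ok {
            self.failures
                .push(format!("{metric}: {measured:.3}{unit} is not under {limit}{unit}"));
        }
    }

    /// Fails the test only after every row has been printed.
    fn finish(self) {
        assert!(
            self.failures.is_empty(),
            "perf budgets exceeded:\n{}",
            self.failures.join("\n")
        );
    }
}
```

A test body might then read `gate.check("browse_p50", measured_ms, budget("MIMIR_PERF_BUDGET_BROWSE_MS", 5.0), "ms")` for each metric and end with `gate.finish()`. Keeping the defaults equal to the workflow values would let the same tests run locally without any env setup.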

CHANGELOG.md

Lines changed: 13 additions & 0 deletions
@@ -35,6 +35,19 @@ All notable changes to Perseus Vault (formerly Mimir/Mneme) are documented here.
 - `mimir_stats` reports history growth (#398): `total_history_rows`,
   `history_bytes` (stored body bytes), and `top_history_keys` (top-10 keys by
   version count with bytes) — the hot state-like keys to cap first.
+- `perf-gate` CI workflow (#404, completing the issue — the concurrency half
+  shipped as `concurrency-gate`): a release-build gate that seeds temp DBs
+  via the fastest direct-SQL path and pins the 2026-07-02 capacity-deep-dive
+  baselines with 3-5× CI-variance headroom — rare-term FTS recall, browse and
+  get_entity p50 @100k, as_of p50 @50k history versions of one key,
+  decay_tick wall @100k plus the #399 regression signature (second
+  consecutive tick rewrites < 1% of rows, WAL growth < 2× DB size), cohere
+  wall @100k plus the post-#400 longest single writer-lock hold < 1s
+  (measured with the #400 BEGIN IMMEDIATE probe), and on-disk history bytes
+  per superseded version at a ~1KB body. Medians of 5 for latency metrics;
+  every metric prints a `PERF-GATE |` table row to the job log so a
+  regression is diagnosable from the run. Budgets and corpus sizes are pinned
+  as env vars in the workflow.
 - `mimir_follow` accepts an optional `workspace_hash` (#396, the #338
   pattern): when set, the efficacy stamp resolves its target row with strict
   workspace equality — the same semantics as a workspace-scoped recall — so a
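The #399 regression signature named in the `perf-gate` changelog entry reduces to two ratios, which might be checked as in the sketch below. The function names and the way the inputs are obtained (rows rewritten by the second consecutive tick, WAL growth in bytes, DB file size in bytes) are assumptions for illustration; the commit page does not show how the harness measures them.

```rust
/// Percentage of the corpus rewritten by a tick.
fn rewritten_pct(rewritten_rows: u64, total_rows: u64) -> f64 {
    if total_rows == 0 {
        return 0.0; // empty corpus: nothing could have been rewritten
    }
    rewritten_rows as f64 * 100.0 / total_rows as f64
}

/// WAL growth expressed as a multiple of the main DB file size.
fn wal_ratio(wal_growth_bytes: u64, db_bytes: u64) -> f64 {
    if db_bytes == 0 {
        return f64::INFINITY; // treat a missing DB size as a violation
    }
    wal_growth_bytes as f64 / db_bytes as f64
}

/// True when the second consecutive tick looks like a no-op:
/// rewritten rows under `max_pct` percent and WAL growth under
/// `max_ratio` times the DB size.
fn decay_signature_ok(
    rewritten_rows: u64,
    total_rows: u64,
    wal_growth_bytes: u64,
    db_bytes: u64,
    max_pct: f64,
    max_ratio: f64,
) -> bool {
    rewritten_pct(rewritten_rows, total_rows) < max_pct
        && wal_ratio(wal_growth_bytes, db_bytes) < max_ratio
}
```

With the pinned budgets (1% and 2x), a second tick that rewrites 500 of 100,000 rows passes the row check at 0.5%, while a WAL that grows to about 9x the DB size, the pre-fix figure noted in the workflow comments, fails the ratio check even if wall time stays within budget.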
