Add the smallest useful baseline-snapshot contract for the evaluation harness:
- a typed snapshot artifact for the latest completed benchmark runs
- operator-facing snapshot capture via CLI
- status visibility into stored baseline snapshots
This slice does not yet add PR-to-baseline reporting or required-check wiring.
PR4 added a store-vs-store CI comparator, but Engram still lacks a stable versioned baseline artifact that can be captured intentionally and reused later. Without that, later PR delta reporting still depends on ad hoc branch state instead of a named baseline.
PR32 fixes that by introducing a typed snapshot of the latest completed benchmark runs under the eval store.
benchmarkBaselineSnapshotsEnabled- typed
EvalBaselineSnapshotvalidation baselines/<snapshotId>.jsonstorage under the eval rootopenclaw engram benchmark-baseline-snapshot --snapshot-id <id>benchmark-statusvisibility for baseline snapshot counts and the latest snapshot summary
- PR delta reporting
- benchmark execution
- required CI rollout
- shadow-recall changes
Each baseline snapshot stores:
snapshotIdcreatedAtsourceRootDirbenchmarkCountbenchmarks[]:benchmarkIdrunIdcompletedAtgitRefpassRate- shared metrics
- optional
notes - optional
gitRef
The snapshot captures the latest completed run per benchmark at snapshot time.
npx tsx --test tests/eval-baseline-snapshots.test.ts tests/cli-benchmark-status.test.ts tests/evals-benchmark-tools.test.ts tests/evals-shadow-recording.test.tsnpm run check-typesnpm run check-config-contractnpm testnpm run build
- PR33: compare candidate eval stores against a named stored baseline snapshot and emit a human-readable delta report
- PR34: wire the delta reporter into required CI/docs so memory regressions block merge by default