Commit 09b448f

feat: add verifiable checkpoint provenance
1 parent ac08bb7 commit 09b448f

12 files changed

Lines changed: 737 additions & 65 deletions

.planning/REQUIREMENTS.md

Lines changed: 9 additions & 0 deletions
```diff
@@ -143,6 +143,15 @@
 - [x] **RECOLLECT-06**: Operator/NOC surfaces expose recent recollection decisions, skipped-search reasons, false-positive rate, and the downstream answer/tool step that used or ignored injected memory.
 - [x] **RECOLLECT-07**: Recollection and context-pack receipts label each memory item by belief stage: bronze raw source snapshot, silver candidate claim, or gold admitted operational truth; agents may rely on gold directly, must caveat silver, and may use bronze only as source evidence unless promotion policy admits it.
 
+## PROV Verifiable Action Provenance + Tamper-Evident Audit (Proposed)
+
+*Source: 2026-07-01 external developer question (Arden) on crash-consistent, auditable "proof" linking agent output to consumed memories and tools. See ROADMAP.md Backlog item 18.*
+
+- [ ] **PROV-01**: Provenance is captured at the read/tool-call boundary (which memories were read, which tools/commands ran, with source id + hash) rather than self-reported by the agent at checkpoint time, so every output carries a verified set of consumed inputs.
+- [ ] **PROV-02**: The audit entry for a significant action is written inside the same database transaction as the action itself, so the action and its audit row commit or fail together and rows cannot be silently dropped; the current "audit never breaks the primary action" contract is preserved or explicitly redesigned.
+- [ ] **PROV-03**: Audit entries are hash-chained (each row references the prior row's hash) so tampering, deletion, or gaps in the trail are detectable, with a verification path that reports the first broken link.
+- [ ] **PROV-04**: On crash/restart, the resumed checkpoint plus the transactional audit chain reconstruct a verifiable trail with no unaccounted actions between the last checkpoint and the crash; verification work stays off the hot path and provenance receipts expose no raw sensitive payloads.
+
 ---
 
 ## Future Requirements (Bounded Spikes Complete; Adoption Deferred)
```
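The following is a minimal sketch of the hash chaining that PROV-03 asks for, including a verification pass that reports the first broken link. It is illustrative only. `AuditEntry`, `appendEntry`, `verifyChain`, and the canonical-JSON hashing are hypothetical names and choices. Only the `sha256:<hex>` format and the `valid` / `checked` / `firstBrokenEntryId` result fields are borrowed from the tests in this commit. Each entry's hash covers its own content plus the previous entry's hash, so an edited, deleted, or inserted row breaks the chain at that point.

```typescript
import { createHash } from "node:crypto";

// Hypothetical audit row: a JSON-safe payload plus the two chain hashes.
interface AuditEntry {
  id: string;
  payload: Record<string, unknown>;
  previousEntryHash: string | null;
  entryHash: string;
}

interface ChainVerification {
  valid: boolean;
  checked: number;
  firstBrokenEntryId?: string;
}

// Deterministic JSON: object keys are sorted so key order never changes the hash.
// Assumes JSON-safe values (no undefined, functions, or cycles).
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalJson).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const fields = Object.keys(obj)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonicalJson(obj[key])}`);
    return `{${fields.join(",")}}`;
  }
  return JSON.stringify(value);
}

function computeEntryHash(id: string, payload: Record<string, unknown>, previousEntryHash: string | null): string {
  const digest = createHash("sha256").update(canonicalJson({ id, payload, previousEntryHash })).digest("hex");
  return `sha256:${digest}`;
}

// Appends a row whose hash commits to the previous row's hash (null for the first row).
function appendEntry(chain: AuditEntry[], id: string, payload: Record<string, unknown>): AuditEntry {
  const previousEntryHash = chain.length > 0 ? chain[chain.length - 1].entryHash : null;
  const entry: AuditEntry = { id, payload, previousEntryHash, entryHash: computeEntryHash(id, payload, previousEntryHash) };
  chain.push(entry);
  return entry;
}

// Walks the chain in order and stops at the first row whose link or hash does not match.
function verifyChain(chain: AuditEntry[]): ChainVerification {
  let expectedPrevious: string | null = null;
  for (let i = 0; i < chain.length; i++) {
    const entry = chain[i];
    const recomputed = computeEntryHash(entry.id, entry.payload, entry.previousEntryHash);
    if (entry.previousEntryHash !== expectedPrevious || entry.entryHash !== recomputed) {
      return { valid: false, checked: i + 1, firstBrokenEntryId: entry.id };
    }
    expectedPrevious = entry.entryHash;
  }
  return { valid: true, checked: chain.length };
}

// Usage: build a two-row chain, then tamper with the first row's payload.
const chain: AuditEntry[] = [];
appendEntry(chain, "entry-1", { event: "agent.checkpointed", runId: "run-a" });
appendEntry(chain, "entry-2", { event: "agent.checkpointed", runId: "run-b" });
console.log(verifyChain(chain).valid); // true
chain[0].payload = { event: "agent.checkpointed", runId: "edited" };
console.log(verifyChain(chain).firstBrokenEntryId); // entry-1
```

Hashing a canonical, key-sorted encoding keeps hashes stable regardless of object key order. A bare chain has one limitation: truncating the newest rows leaves a valid prefix. The latest `entryHash` therefore also has to be anchored somewhere else. Carrying it on the checkpoint is one option, as the tests do with the resumed `provenanceAudit.entryHash`.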

.planning/ROADMAP.md

Lines changed: 7 additions & 0 deletions
```diff
@@ -538,6 +538,13 @@ Full archive: `.planning/milestones/v1.7-ROADMAP.md`
    - Requirements: `UX-FOLLOWUP-07` and `INSTALL-FOLLOWUP-01` in `.planning/REQUIREMENTS.md`.
    - Goal: make service health and ownership visible from the UI, then preserve Docker as an explicit optional test/demo path instead of letting local containers, images, or demo volumes become the default operator footprint.
 
+18. **P1 — Plan Verifiable Action Provenance + Tamper-Evident Audit.**
+   - Source signal: 2026-07-01 external developer question (Arden) on whether agent action "proof" — binding an output to the memories it consumed and tools it used — stays consistent and auditable across crashes/restarts. Current state does not enforce this: `provenancePointers` are agent-supplied and pass straight through (`apps/memroos/src/app/api/agent-checkpoints/route.ts`), and `writeAuditLog` is fire-and-forget and never throws (`apps/memroos/src/lib/audit.ts`), so rows can drop silently.
+   - Related backlog: overlaps the P1 Harness Control Plane evidence-bundle work (item 5) and the "Universal evidence bundles" / "Audit/HIL hardening: hash chaining" Later Ideas; this item is the integrity/consistency slice of that surface.
+   - Requirements: `PROV-01..04` in `.planning/REQUIREMENTS.md`.
+   - Goal: move provenance from honor-system to enforced — capture consumed memories/tools at the read/tool-call boundary rather than self-report, write the audit entry in the same transaction as the action so they commit or fail atomically, and hash-chain audit entries so gaps or edits are detectable; on crash/restart the resumed checkpoint reconstructs a verifiable trail.
+   - Gate: the availability contract that audit failures must not break the primary action must be preserved or explicitly redesigned; no heavy verification work on the hot path; no raw sensitive payloads exposed in provenance receipts.
+
 ### Later Ideas
 
 - [ ] HIL edit-and-continue semantics (modify task state before resuming graph)
```
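The sketch below contrasts today's fire-and-forget `writeAuditLog` with the transactional write this item proposes, using an in-memory stand-in for the database. `MiniStore` and its snapshot-based `transaction` are hypothetical and only mimic SQL semantics. With better-sqlite3, the real equivalent is to wrap both inserts in `db.transaction(fn)`, which commits if `fn` returns and rolls back (rethrowing the error) if it throws.

```typescript
// Hypothetical in-memory stand-in for a SQL database: each table is an array of rows.
type Row = Record<string, unknown>;

class MiniStore {
  tables: Record<string, Row[]> = { agent_checkpoints: [], audit_entries: [] };
  // Simulates an audit write failure, e.g. a missing or locked audit table.
  failAuditWrites = false;

  insert(table: string, row: Row): void {
    if (table === "audit_entries" && this.failAuditWrites) {
      throw new Error("no such table: audit_entries");
    }
    this.tables[table].push(row);
  }

  // Runs fn as one unit: if fn throws, every table is restored to its pre-call snapshot.
  transaction<T>(fn: () => T): T {
    const snapshot: Record<string, Row[]> = {};
    for (const [name, rows] of Object.entries(this.tables)) {
      snapshot[name] = rows.map((row) => ({ ...row }));
    }
    try {
      return fn();
    } catch (error) {
      this.tables = snapshot;
      throw error;
    }
  }
}

// Fire-and-forget: the checkpoint always lands, and a failed audit write is silently lost.
function checkpointFireAndForget(store: MiniStore, id: string): void {
  store.insert("agent_checkpoints", { id });
  try {
    store.insert("audit_entries", { entityId: `agent_checkpoint:${id}` });
  } catch {
    // Swallowed: this checkpoint now has no audit row.
  }
}

// Transactional (PROV-02): the checkpoint and its audit row commit or fail together.
function checkpointAtomic(store: MiniStore, id: string): void {
  store.transaction(() => {
    store.insert("agent_checkpoints", { id });
    store.insert("audit_entries", { entityId: `agent_checkpoint:${id}` });
  });
}

// Usage: with audit writes failing, only the fire-and-forget checkpoint survives.
const store = new MiniStore();
store.failAuditWrites = true;
checkpointFireAndForget(store, "cp-1");
try {
  checkpointAtomic(store, "cp-2");
} catch (error) {
  console.log(`atomic checkpoint rejected: ${(error as Error).message}`);
}
console.log(store.tables.agent_checkpoints.length); // 1 (cp-1, with no audit row)
```

In the atomic version, an audit failure also fails the checkpoint. That is exactly the availability trade-off the Gate above says must be preserved or explicitly redesigned.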

AGENTS.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -7,7 +7,7 @@ This version has breaking changes — APIs, conventions, and file structure may
 <!-- gitnexus:start -->
 # GitNexus — Code Intelligence
 
-This project is indexed by GitNexus as **memroos** (16021 symbols, 30129 relationships, 300 execution flows). Use the GitNexus MCP tools to understand code, assess impact, and navigate safely.
+This project is indexed by GitNexus as **memroos** (16216 symbols, 30433 relationships, 300 execution flows). Use the GitNexus MCP tools to understand code, assess impact, and navigate safely.
 
 > If any GitNexus tool warns the index is stale, run `npx gitnexus analyze` in terminal first.
 
```

CLAUDE.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -3,7 +3,7 @@
 <!-- gitnexus:start -->
 # GitNexus — Code Intelligence
 
-This project is indexed by GitNexus as **memroos** (16021 symbols, 30129 relationships, 300 execution flows). Use the GitNexus MCP tools to understand code, assess impact, and navigate safely.
+This project is indexed by GitNexus as **memroos** (16216 symbols, 30433 relationships, 300 execution flows). Use the GitNexus MCP tools to understand code, assess impact, and navigate safely.
 
 > If any GitNexus tool warns the index is stale, run `npx gitnexus analyze` in terminal first.
 
```

apps/memroos/src/__tests__/proxy.test.ts

Lines changed: 12 additions & 0 deletions
```diff
@@ -78,6 +78,18 @@ describe("proxy", () => {
     expect(await response.json()).toEqual({ error: "authentication required" });
   });
 
+  it("lets runtime health respond without a session token", async () => {
+    const response = await proxy(
+      new NextRequest("http://localhost:3002/api/health", {
+        method: "GET",
+        headers: { host: "localhost:3002" },
+      })
+    );
+
+    expect(response.status).toBe(200);
+    expect(await response.text()).toBe("");
+  });
+
   it("lets agent onboarding bootstrap routes handle their own signed-token authorization", async () => {
     const scriptResponse = await proxy(
       new NextRequest("http://localhost:3002/api/onboarding/script?token=signed-token", {
```
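The new test requires that `/api/health` pass through the proxy without a session token, while the onboarding routes authorize themselves with signed tokens. One way to express such exemptions is an explicit classification step before any session check. `classifyPath`, `classifyUrl`, `PUBLIC_PATHS`, and the `/api/onboarding/` prefix below are hypothetical and do not describe the memroos proxy itself.

```typescript
// Hypothetical auth classes for the proxy; names and paths are illustrative.
type AuthMode = "public" | "self-authorized" | "session-required";

// Exact paths that may be served without a session token.
const PUBLIC_PATHS: ReadonlySet<string> = new Set(["/api/health"]);

// Path prefixes whose route handlers verify their own signed tokens (assumed prefix).
const SELF_AUTHORIZED_PREFIXES: readonly string[] = ["/api/onboarding/"];

function classifyPath(pathname: string): AuthMode {
  if (PUBLIC_PATHS.has(pathname)) return "public";
  if (SELF_AUTHORIZED_PREFIXES.some((prefix) => pathname.startsWith(prefix))) return "self-authorized";
  return "session-required";
}

// Classify the parsed pathname, not the raw string, so the query string and
// dot segments ("/a/../b") are resolved by the WHATWG URL parser first.
function classifyUrl(url: string): AuthMode {
  return classifyPath(new URL(url).pathname);
}

console.log(classifyUrl("http://localhost:3002/api/health")); // public
console.log(classifyUrl("http://localhost:3002/api/healthz")); // session-required
console.log(classifyUrl("http://localhost:3002/api/onboarding/script?token=signed-token")); // self-authorized
console.log(classifyUrl("http://localhost:3002/api/health/../agent-checkpoints")); // session-required
```

Matching the health route exactly rather than by prefix, and classifying the parsed `URL` pathname, keeps lookalike paths such as `/api/healthz` and dot-segment tricks behind the session check.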

apps/memroos/src/app/api/agent-checkpoints/__tests__/route.test.ts

Lines changed: 19 additions & 1 deletion
```diff
@@ -1,10 +1,28 @@
 // @vitest-environment node
-import { describe, expect, it } from "vitest";
+import Database from "better-sqlite3";
+import { afterEach, beforeEach, describe, expect, it, vi } from "vitest";
+import { initSchema } from "@/lib/db-schema";
+
+let db: Database.Database;
+
+vi.mock("@/lib/db", () => ({
+  getDb: () => db,
+  closeDb: () => {},
+}));
 
 const checkpointsRoute = await import("../route");
 const metricsRoute = await import("../metrics/route");
 
 describe("/api/agent-checkpoints", () => {
+  beforeEach(() => {
+    db = new Database(":memory:");
+    initSchema(db);
+  });
+
+  afterEach(() => {
+    db.close();
+  });
+
   it("blocks direct non-local checkpoint writes without operator authorization", async () => {
     const req = new Request("https://memroos.example.com/api/agent-checkpoints", {
       method: "POST",
```

apps/memroos/src/lib/__tests__/agent-checkpoints.test.ts

Lines changed: 133 additions & 1 deletion
```diff
@@ -2,8 +2,14 @@
 import Database from "better-sqlite3";
 import { afterEach, beforeEach, describe, expect, it } from "vitest";
 
-import { createAgentCheckpoint, resumeFromCheckpoint, getCheckpointMetrics } from "@/lib/agent-checkpoints";
+import {
+  createAgentCheckpoint,
+  getCheckpointMetrics,
+  resumeFromCheckpoint,
+  verifyCheckpointAuditChain,
+} from "@/lib/agent-checkpoints";
 import { initSchema } from "@/lib/db-schema";
+import { recordEfficiencyEvent } from "@/lib/efficiency-telemetry";
 
 let db: Database.Database;
 
@@ -61,4 +67,130 @@ describe("agent lightweight checkpoint/resume", () => {
     expect(metrics.avgWriteLatencyMs).toBeGreaterThan(0);
     expect(metrics.avgCheckpointSize).toBeGreaterThan(0);
   });
+
+  it("writes a checkpoint audit entry with verified boundary provenance receipts", () => {
+    const runId = "test-run-provenance";
+
+    recordEfficiencyEvent(db, {
+      eventType: "source_read",
+      taskId: runId,
+      agentId: "codex",
+      payload: {
+        sourceId: "README.md",
+        sourceHash: "sha256:readme",
+        toolId: "read_file",
+      },
+      createdAt: "2026-07-02T08:00:00.000Z",
+    });
+    recordEfficiencyEvent(db, {
+      eventType: "memory_write",
+      taskId: runId,
+      agentId: "codex",
+      payload: {
+        source: "agent_memory",
+        dedupHash: "sha256:memory",
+        firstSeenAt: "2026-07-02T07:59:00.000Z",
+        isRediscovery: false,
+      },
+      createdAt: "2026-07-02T08:01:00.000Z",
+    });
+
+    const checkpoint = createAgentCheckpoint(db, {
+      runId,
+      ownerAgentId: "codex",
+      objective: "Capture verified provenance",
+      nextSafeAction: "Resume from checkpoint",
+      provenancePointers: ["agent-supplied-pointer"],
+    });
+
+    expect(checkpoint.provenanceAudit).toMatchObject({
+      provenanceReceiptCount: 2,
+      previousEntryHash: null,
+    });
+    expect(checkpoint.provenanceAudit?.checkpointHash).toMatch(/^sha256:[a-f0-9]{64}$/);
+    expect(checkpoint.provenanceAudit?.entryHash).toMatch(/^sha256:[a-f0-9]{64}$/);
+
+    const row = db
+      .prepare("SELECT * FROM audit_entries WHERE event_type = 'agent.checkpointed' AND entity_id = ?")
+      .get(`agent_checkpoint:${checkpoint.id}`) as { metadata_json: string } | undefined;
+    expect(row).toBeDefined();
+
+    const metadata = JSON.parse(row!.metadata_json) as {
+      provenanceReceipts: Array<{ kind: string; sourceId: string; sourceHash: string }>;
+      legacyPointerCount: number;
+    };
+    expect(metadata.legacyPointerCount).toBe(1);
+    expect(metadata.provenanceReceipts).toHaveLength(2);
+    expect(metadata.provenanceReceipts.map((receipt) => receipt.kind).sort()).toEqual([
+      "memory_write",
+      "source_read",
+    ]);
+    expect(JSON.stringify(metadata.provenanceReceipts)).not.toContain("agent-supplied-pointer");
+
+    const resumed = resumeFromCheckpoint(db, "default-tenant", runId);
+    expect(resumed?.provenanceAudit?.entryHash).toBe(checkpoint.provenanceAudit?.entryHash);
+
+    expect(verifyCheckpointAuditChain(db, "default-tenant")).toMatchObject({
+      valid: true,
+      checked: 1,
+    });
+  });
+
+  it("rolls back the checkpoint insert when the transactional audit write fails", () => {
+    db.exec("DROP TABLE audit_entries");
+
+    expect(() =>
+      createAgentCheckpoint(db, {
+        runId: "test-run-rollback",
+        ownerAgentId: "codex",
+        objective: "Prove checkpoint audit atomicity",
+        nextSafeAction: "Retry after schema repair",
+      })
+    ).toThrow(/audit_entries/);
+
+    expect(db.prepare("SELECT COUNT(*) AS count FROM agent_checkpoints").get()).toEqual({
+      count: 0,
+    });
+  });
+
+  it("detects a broken checkpoint audit chain row", () => {
+    createAgentCheckpoint(db, {
+      runId: "test-run-chain",
+      ownerAgentId: "codex",
+      objective: "Create a valid checkpoint audit row",
+      nextSafeAction: "Insert forged audit row",
+    });
+
+    db.prepare(
+      `INSERT INTO audit_entries
+        (id, tenant_id, actor_id, actor_role, event_type, entity_type, entity_id, reason, metadata_json, created_at)
+       VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)`
+    ).run(
+      "forged-checkpoint-audit",
+      "default-tenant",
+      "codex",
+      "system",
+      "agent.checkpointed",
+      "agent_checkpoint",
+      "agent_checkpoint:forged",
+      "forged checkpoint audit row",
+      JSON.stringify({
+        schemaVersion: 1,
+        checkpointId: "forged",
+        runId: "test-run-chain",
+        checkpointHash: "sha256:forged",
+        previousEntryHash: "sha256:not-the-prior-row",
+        provenanceReceipts: [],
+        legacyPointerCount: 0,
+        entryHash: "sha256:not-a-real-entry-hash",
+      }),
+      "2026-07-02T08:02:00.000Z"
+    );
+
+    expect(verifyCheckpointAuditChain(db, "default-tenant")).toMatchObject({
+      valid: false,
+      checked: 2,
+      firstBrokenEntryId: "forged-checkpoint-audit",
+    });
+  });
 });
```
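The first test defines the receipt contract. Receipts come only from events recorded at the boundary for the run (here via `recordEfficiencyEvent`). Agent-supplied `provenancePointers` are counted as `legacyPointerCount` but never copied into receipts. Below is a minimal sketch of that derivation, assuming the simplified event and receipt shapes used in the test. The names `BoundaryEvent`, `toReceipt`, and `buildProvenance` are hypothetical. Mapping `memory_write`'s `source`/`dedupHash` onto `sourceId`/`sourceHash` is a guess, not the library's confirmed mapping.

```typescript
// Simplified event shape modeled on the test's recordEfficiencyEvent calls (assumed).
interface BoundaryEvent {
  eventType: string;
  taskId: string;
  payload: Record<string, unknown>;
}

interface ProvenanceReceipt {
  kind: "source_read" | "memory_write";
  sourceId: string;
  sourceHash: string;
}

// Maps one recorded boundary event to a receipt, or null when it carries no usable provenance.
function toReceipt(event: BoundaryEvent): ProvenanceReceipt | null {
  const p = event.payload;
  if (event.eventType === "source_read" && typeof p.sourceId === "string" && typeof p.sourceHash === "string") {
    return { kind: "source_read", sourceId: p.sourceId, sourceHash: p.sourceHash };
  }
  // Assumed mapping: a memory write is identified by its source and dedup hash.
  if (event.eventType === "memory_write" && typeof p.source === "string" && typeof p.dedupHash === "string") {
    return { kind: "memory_write", sourceId: p.source, sourceHash: p.dedupHash };
  }
  return null;
}

// Receipts come only from events recorded for this run; agent-supplied
// pointers are counted for the audit metadata but never trusted as receipts.
function buildProvenance(events: BoundaryEvent[], runId: string, agentPointers: string[] = []) {
  const provenanceReceipts = events
    .filter((event) => event.taskId === runId)
    .map(toReceipt)
    .filter((receipt): receipt is ProvenanceReceipt => receipt !== null);
  return { provenanceReceipts, legacyPointerCount: agentPointers.length };
}

// Usage mirroring the first test: two events for the run plus one for another run.
const events: BoundaryEvent[] = [
  { eventType: "source_read", taskId: "run-1", payload: { sourceId: "README.md", sourceHash: "sha256:readme", toolId: "read_file" } },
  { eventType: "memory_write", taskId: "run-1", payload: { source: "agent_memory", dedupHash: "sha256:memory" } },
  { eventType: "source_read", taskId: "run-2", payload: { sourceId: "notes.md", sourceHash: "sha256:notes" } },
];
const result = buildProvenance(events, "run-1", ["agent-supplied-pointer"]);
console.log(result.provenanceReceipts.length, result.legacyPointerCount); // 2 1
```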
