| uid | cd2b6d03 |
|---|---|
| title | Cold-Boot Action Test |
| version | 1.0 |
| status | published |
| author | |
| domain | quality-assurance |
| tags | |
| created | 2026-04-13 |
| last_updated | 2026-04-13 |
| readers | |
| scope | multi-session |
| estimated_duration | 30-60 minutes per action |
| estimated_sessions | 1 |
| calls | |
| requires | |
| extraction_scope | ship |
| member_of | |

The pre-release quality gate for every action in .tropo/actions/. Three cold-boot agents per action. Different request styles. Parallel dispatch. Honest verdict.
Every action in .tropo/actions/ must be verified by cold-boot agents before it ships in a Tropo-OS release. A cold-boot agent starts with no session context — only the vault governance files. If the action works correctly for a cold agent, it works correctly for any user.
This playbook runs 3 agents per action with different request styles, aggregates their findings, produces a PASS/WARN/FAIL verdict, and files remediation tasks for any gaps found.
What this proved on April 13, 2026: Three parallel agents against create-project.action.md v2.2 independently reproduced the collection two-write requirement gap that Metis's live execution had already found, confirming it was a real spec gap, not agent variance. The run also surfaced STUDIO.md system map staleness, stale template references, and a grep precision issue. All four became tasks. All were fixed before the next version shipped.
- Run all three agents in parallel. A parallel run takes about as long as a single agent run, and the three independent reports validate each other. Sequential runs are slower and miss variance.
- Use genuinely different request styles — vague/exploratory vs terse/direct vs contextual produce different navigation paths and surface different friction. If all three use similar language, the test loses value.
- Do not tell the agents which action to use. The test is: can a cold agent find and execute the right action? If you have to tell them, the action is not discoverable.
- Read all three reports before filing remediation tasks. Common friction across agents is structural. Friction in one agent only may be variance.
- Run the test against the version you intend to ship — not a working draft.
- Every action that appears in .tropo/actions/ and is tagged status: published must pass this test before the action ships in a release.
- An action that achieves 2/3 PASS is a WARN: it ships only with documented acknowledgment of the failing style and a filed remediation task.
- An action that achieves 1/3 or 0/3 is a FAIL: it does not ship until the gap is fixed and the test re-run.
- Cold-boot agents receive no session context. No ADRs, no prior conversation, no hints about which action to use. Vault governance files only.
- The test executor does not evaluate whether the agent's output is good — only whether the action was executed correctly and all artifacts are compliant.
Current action set in .tropo/actions/ — test priority order:
| Priority | Action | Status |
|---|---|---|
| ✅ Done | create-project.action.md | Tested April 13. v2.3 passes 3/3. |
| 1 | create-task.action.md | Not yet tested |
| 2 | create-collection.action.md | Not yet tested |
| 3 | create-decision.action.md | Not yet tested |
| 4 | delete-entry.action.md | Not yet tested |
| 5 | generate-view.action.md | Not yet tested |
| 6 | refresh-view.action.md | Not yet tested |
| 7 | create-design-brief.action.md | Not yet tested |
| 8 | create-design-spec.action.md | Not yet tested |

- create-project test run (April 13, 2026) — the canonical example of what a passing 3/3 run looks like
- build-release.playbook.md — Phase 3 invokes this playbook as a pre-ship gate
Group 1 — Select Action and Design Prompts (executor)
↓ [Action Selected]
Group 2 — Dispatch 3 Cold-Boot Agents (parallel)
↓ [All Reports Received]
Group 3 — Aggregate and Verdict (executor)
↓ [Verdict Issued]
Group 4 — Remediation (if needed) (executor + architect)
↓ [Test Complete]
Owner: Executor (Argus or any architect-class agent) | Parallel: no | Depends on: none | Milestone: Action Selected | Milestone timeout: 10 minutes
Read .tropo/actions/00-index.md. Identify the next action that has not yet been tested (see Resources table above). Confirm the action file exists and has status: published.
Produces: Action identified — note the action filename and its primary purpose
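
The selection rule above can be sketched in a few lines. This is a minimal illustration, not the vault's tooling: the `ActionEntry` type and its fields are hypothetical, and the entries are typed in by hand rather than parsed from 00-index.md.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ActionEntry:
    # Hypothetical record for one row of the test priority table.
    filename: str
    priority: int   # lower number = test sooner
    status: str     # frontmatter status, e.g. "published"
    tested: bool


def next_action_to_test(entries: list[ActionEntry]) -> Optional[ActionEntry]:
    """Return the highest-priority published action that has not been tested."""
    candidates = [e for e in entries if not e.tested and e.status == "published"]
    return min(candidates, key=lambda e: e.priority, default=None)


entries = [
    ActionEntry("create-project.action.md", 0, "published", True),
    ActionEntry("create-task.action.md", 1, "published", False),
    ActionEntry("create-collection.action.md", 2, "published", False),
]
selected = next_action_to_test(entries)
print(selected.filename if selected else "nothing left to test")
```

An action whose status is not published is skipped rather than selected, which matches the confirmation step above.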
Design three prompts that would lead a cold-boot agent to use this action — without naming the action file. One per style:
Style 1 (Vague/Exploratory): A user who knows what they want but not how the system works. Long, conversational, uncertain. Example for create-project: "I want to start tracking the work to get our board synthesizer fully operational: scheduling, registration, the whole thing. Can you set up a project for that?"
Style 2 (Terse/Direct): A user who knows exactly what they want. Minimal words. Example: "New project: Action Test Suite. Track the work to build cold-boot tests for every action in .tropo/actions/."
Style 3 (Contextual/Referential): A user referencing existing vault context. Example: "We should have a project to track Argus A22's session work: the dashboard redesign, the board registration sweep, the kernel index. Can you set one up?"
Rule: None of the prompts should say "use create-project.action.md" or name any specific action file. The agent must discover it.
Produces: Three prompts, written out, ready for agent dispatch
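
Before dispatch, the discoverability rule can be checked mechanically. The sketch below is an assumption about how one might do it: `leaked_action_names` is a hypothetical helper that flags any `*.action.md` filename, or the action's own stem, appearing in a prompt.

```python
import re

# Matches any filename of the form <name>.action.md.
ACTION_FILE_PATTERN = re.compile(r"[\w-]+\.action\.md", re.IGNORECASE)


def leaked_action_names(prompt: str, action_filename: str) -> list[str]:
    """Return every action name the prompt gives away (empty list = clean)."""
    leaks = ACTION_FILE_PATTERN.findall(prompt)
    stem = action_filename.removesuffix(".action.md")  # e.g. "create-project"
    if stem.lower() in prompt.lower():
        leaks.append(stem)
    return leaks


terse = ("New project: Action Test Suite. Track the work to build "
         "cold-boot tests for every action in .tropo/actions/.")
bad = "Use create-project.action.md to set up a project."

print(leaked_action_names(terse, "create-project.action.md"))
print(leaked_action_names(bad, "create-project.action.md"))
```

Mentioning the .tropo/actions/ directory, as the terse example does, is not flagged. Only a named action file or its stem counts as a leak in this sketch.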
Owner: Executor | Parallel: yes (all three agents run simultaneously) | Depends on: Action Selected | Milestone: All Reports Received | Milestone timeout: 15 minutes
Dispatch a cold-boot agent via the Agent tool with this structure:
You are a fresh agent activating in a Tropo-OS vault at [vault-root].
You have no prior context about this vault or this session.
Read the vault governance files starting with AGENTS.md at the vault root,
then follow the governance chain to orient yourself.
Once oriented, carry out this request: [Style 1 prompt]
Report back:
1. Every file you read, in order
2. Every file you created, with its UID
3. Cross-wiring verification (are all required fields present and linked?)
4. Friction points encountered
5. PASS/FAIL verdict
Same structure as 2.1 with Style 2 prompt.
Same structure as 2.1 with Style 3 prompt.
All three agents run in parallel. Wait for all three to complete before proceeding to Group 3.
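
A minimal sketch of the parallel dispatch, assuming the Agent tool can be wrapped in a plain function that takes a prompt and returns a report. `stub_agent` and `dispatch_all` are hypothetical names, and the stub does not call any real agent.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Dispatch prompt copied from the template above; {vault_root} and {request}
# are the only two placeholders.
DISPATCH_TEMPLATE = """\
You are a fresh agent activating in a Tropo-OS vault at {vault_root}.
You have no prior context about this vault or this session.
Read the vault governance files starting with AGENTS.md at the vault root,
then follow the governance chain to orient yourself.
Once oriented, carry out this request: {request}
Report back:
1. Every file you read, in order
2. Every file you created, with its UID
3. Cross-wiring verification (are all required fields present and linked?)
4. Friction points encountered
5. PASS/FAIL verdict
"""


def stub_agent(prompt: str) -> str:
    """Hypothetical stand-in for the Agent tool; replace with the real call."""
    return f"[stub report] prompt was {len(prompt)} characters"


def dispatch_all(vault_root: str, style_prompts: list[str],
                 agent: Callable[[str], str] = stub_agent) -> list[str]:
    """Send one dispatch prompt per style and wait for every report."""
    prompts = [DISPATCH_TEMPLATE.format(vault_root=vault_root, request=p)
               for p in style_prompts]
    # One worker per agent so all of them run at the same time. Leaving the
    # with-block waits for every call; map() returns reports in style order.
    with ThreadPoolExecutor(max_workers=len(prompts)) as pool:
        return list(pool.map(agent, prompts))


reports = dispatch_all("/path/to/vault",
                       ["style 1 request", "style 2 request", "style 3 request"])
print(len(reports))
```

Because the function returns only after every worker has finished, Group 3 cannot start on a partial set of reports.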
Owner: Executor | Parallel: no | Depends on: All Reports Received | Milestone: Verdict Issued | Milestone timeout: 10 minutes
Read each agent's report. For each agent note:
- Did they find the correct action without being told?
- Did they execute it correctly (all required artifacts created, all fields wired)?
- What friction did they encounter (extra reads, failed globs, ambiguous instructions)?
- PASS or FAIL?
List friction points that appear in 2 or more agent reports. These are structural gaps in the action spec — not agent variance.
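
The two-or-more rule is a simple count across reports. The sketch below assumes the executor has already normalized each friction point to a short label so that the same issue reads identically in every report. The labels shown are made up for illustration and are not taken from a real run.

```python
from collections import Counter


def structural_gaps(friction_by_agent: dict[str, list[str]],
                    threshold: int = 2) -> list[str]:
    """Return friction labels reported by at least `threshold` agents."""
    counts: Counter[str] = Counter()
    for points in friction_by_agent.values():
        counts.update(set(points))  # count each label once per agent
    return sorted(label for label, n in counts.items() if n >= threshold)


# Illustrative labels only.
friction = {
    "agent-1": ["failed glob", "ambiguous instruction"],
    "agent-2": ["failed glob"],
    "agent-3": ["extra reads", "failed glob", "ambiguous instruction"],
}
print(structural_gaps(friction))
```

Labels reported by a single agent stay out of the result. Treat those as possible variance and note them without filing a task.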
| Result | Verdict | Action |
|---|---|---|
| 3/3 PASS | PASS | Action ships |
| 2/3 PASS | WARN | Ships with documented exception + remediation task filed |
| 1/3 or 0/3 | FAIL | Does not ship — fix gaps, re-run test |
Produces: Written verdict with supporting evidence from the three reports
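
The verdict table maps directly onto a small function. This is a sketch of that mapping only; the function name is hypothetical and it assumes exactly three agents per action, as this playbook specifies.

```python
def issue_verdict(passes: int) -> str:
    """Map the number of passing agents (out of 3) to PASS, WARN, or FAIL."""
    if not 0 <= passes <= 3:
        raise ValueError("passes must be between 0 and 3")
    if passes == 3:
        return "PASS"   # action ships
    if passes == 2:
        return "WARN"   # ships with documented exception + remediation task
    return "FAIL"       # does not ship; fix gaps and re-run the test


for n in range(4):
    print(n, issue_verdict(n))
```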
Owner: Executor + Architect | Parallel: no | Depends on: Verdict Issued | Milestone: Test Complete | Milestone timeout: 24 hours (or skip if PASS)
For each structural gap (friction in 2+ agents), file a task in 226b2bff (Tropo-OS v1.0.0 Launch project):
- Title: specific, actionable fix
- Description: which agents encountered it, what the friction was
- Owner: argus (spec fix) or vela (staleness/maintenance)
- Priority: P1 if it causes incorrect output, P2 if friction only
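
The owner and priority rules above can be captured in a small helper. This is a sketch under assumptions: `RemediationTask` and `draft_task` are hypothetical names, the example values are invented, and the code only builds the task record. It does not file anything in the vault.

```python
from dataclasses import dataclass

LAUNCH_PROJECT_UID = "226b2bff"  # Tropo-OS v1.0.0 Launch project, per this playbook


@dataclass
class RemediationTask:
    title: str
    description: str
    owner: str
    priority: str
    project_uid: str = LAUNCH_PROJECT_UID


def draft_task(title: str, agents: list[str], friction: str,
               is_spec_fix: bool, causes_incorrect_output: bool) -> RemediationTask:
    """Build a task record following the owner and priority rules above."""
    return RemediationTask(
        title=title,
        description=f"Encountered by: {', '.join(agents)}. Friction: {friction}",
        owner="argus" if is_spec_fix else "vela",   # spec fix vs staleness/maintenance
        priority="P1" if causes_incorrect_output else "P2",
    )


# Invented example values.
task = draft_task(
    title="Clarify the required-fields step in the action spec",
    agents=["agent-1", "agent-3"],
    friction="ambiguous instruction",
    is_spec_fix=True,
    causes_incorrect_output=False,
)
print(task.owner, task.priority, task.project_uid)
```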
Mark the tested action in the Resources table above with its verdict and date. Update the priority list for the next test run.
[YYYY-MM-DD] cold-boot-action-test | [action-name] | Verdict: PASS/WARN/FAIL
Agents: 3 dispatched, N passed | Friction: [count] structural gaps | Tasks filed: [count]
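
Filling in the ops.md entry is a string-formatting step. The sketch below assumes the square brackets in the template mark placeholders and are not literal characters, and that the action name is written without the .action.md suffix. The example values come from the reference result quoted in this playbook (3/3 PASS, 4 gaps, 4 tasks).

```python
from datetime import date


def ops_entry(day: date, action_name: str, verdict: str,
              passed: int, gaps: int, tasks: int) -> str:
    """Render the two-line ops.md summary for one test run."""
    return (
        f"{day.isoformat()} cold-boot-action-test | {action_name} | Verdict: {verdict}\n"
        f"Agents: 3 dispatched, {passed} passed | "
        f"Friction: {gaps} structural gaps | Tasks filed: {tasks}"
    )


print(ops_entry(date(2026, 4, 13), "create-project", "PASS", 3, 4, 4))
```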
- [REQUIRED] All three agents dispatched and reports received
- [REQUIRED] Verdict issued (PASS, WARN, or FAIL) with supporting evidence
- [REQUIRED] Resources table updated with verdict and date
- [REQUIRED] ops.md entry posted
- [REQUIRED] If WARN or FAIL: remediation tasks filed before proceeding to next action
- [OPTIONAL] If FAIL: action updated and test re-run before marking complete
Self-verification by executor before declaring test complete.
- Three agent reports exist and each contains: files read, files created, friction, PASS/FAIL
- Verdict is documented with the specific agent results that support it
- Resources table shows the tested action with verdict and date
- ops.md has the summary entry
- Any WARN/FAIL has at least one remediation task filed in the Vault
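
The first check in the list above can be automated if each report is captured as a dictionary. That structure is an assumption of this sketch. The playbook itself only requires that each report contain the four sections, and the key names below are hypothetical.

```python
REQUIRED_SECTIONS = ("files_read", "files_created", "friction", "verdict")


def report_problems(report: dict) -> list[str]:
    """List what is missing or malformed in one agent report."""
    problems = [f"missing section: {name}"
                for name in REQUIRED_SECTIONS if name not in report]
    if "verdict" in report and report["verdict"] not in ("PASS", "FAIL"):
        problems.append(f"verdict must be PASS or FAIL, got {report['verdict']!r}")
    return problems


def run_is_complete(reports: list[dict]) -> bool:
    """True when exactly three reports exist and each one is well formed."""
    return len(reports) == 3 and all(not report_problems(r) for r in reports)


good = {"files_read": ["AGENTS.md"], "files_created": [], "friction": [], "verdict": "PASS"}
print(run_is_complete([good, good, good]))
print(report_problems({"files_read": [], "verdict": "MAYBE"}))
```

An empty friction or files-created list is accepted here. The check is that the section exists, not that the agent found something to put in it.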
This playbook is itself tested by running it. The reference result is the April 13, 2026 create-project test run — 3/3 PASS, 4 gaps found, 4 tasks filed. A future test run should be comparable in structure and rigor.
Cold-Boot Action Test Playbook | v1.0 | Argus A22, April 13, 2026

"If a cold agent can find it and execute it, a user can too."