uid: cd2b6d03
title: Cold-Boot Action Test
version: 1.0
status: published
author:
  name: Argus A22
  role: Chief Architect
domain: quality-assurance

Cold-Boot Action Test

The pre-release quality gate for every action in .tropo/actions/. Three cold-boot agents per action. Different request styles. Parallel dispatch. Honest verdict.

Intent

Every action in .tropo/actions/ must be verified by cold-boot agents before it ships in a Tropo-OS release. A cold-boot agent starts with no session context — only the vault governance files. If the action works correctly for a cold agent, it works correctly for any user.

This playbook runs 3 agents per action with different request styles, aggregates their findings, produces a PASS/WARN/FAIL verdict, and files remediation tasks for any gaps found.

What this proved on April 13, 2026: Three parallel agents run against create-project.action.md v2.2 found the same collection two-write requirement gap that Metis's live execution had found, confirming it was a real spec gap and not agent variance. They also found STUDIO.md system map staleness, stale template references, and a grep precision issue. All four findings became tasks, and all were fixed before the next version shipped.

Suggestions

Run all three agents in parallel: the wall-clock cost is about that of a single run, and the independent findings validate each other. Sequential runs are slower and miss variance.
Use genuinely different request styles — vague/exploratory vs terse/direct vs contextual produce different navigation paths and surface different friction. If all three use similar language, the test loses value.
Do not tell the agents which action to use. The test is: can a cold agent find and execute the right action? If you have to tell them, the action is not discoverable.
Read all three reports before filing remediation tasks. Common friction across agents is structural. Friction in one agent only may be variance.
Run the test against the version you intend to ship — not a working draft.

Rules

Every action that appears in .tropo/actions/ and is tagged status: published must pass this test before the action ships in a release.
An action that achieves 2/3 PASS is a WARN — it ships only with documented acknowledgment of the failing style and a filed remediation task.
An action that achieves 1/3 or 0/3 is a FAIL — it does not ship until the gap is fixed and the test re-run.
Cold-boot agents receive no session context. No ADRs, no prior conversation, no hints about which action to use. Vault governance files only.
The test executor does not evaluate whether the agent's output is good — only whether the action was executed correctly and all artifacts are compliant.

Resources

Actions Under Test

Current action set in .tropo/actions/ — test priority order:

Priority	Action	Status
✅ Done	`create-project.action.md`	Tested April 13. v2.3 passes 3/3.
1	`create-task.action.md`	Not yet tested
2	`create-collection.action.md`	Not yet tested
3	`create-decision.action.md`	Not yet tested
4	`delete-entry.action.md`	Not yet tested
5	`generate-view.action.md`	Not yet tested
6	`refresh-view.action.md`	Not yet tested
7	`create-design-brief.action.md`	Not yet tested
8	`create-design-spec.action.md`	Not yet tested

Reference

create-project test run (April 13, 2026) — the canonical example of what a passing 3/3 run looks like
build-release.playbook.md — Phase 3 invokes this playbook as a pre-ship gate

Groups

Group 1 — Select Action and Design Prompts (executor)
 ↓ [Action Selected]
Group 2 — Dispatch 3 Cold-Boot Agents (parallel)
 ↓ [All Reports Received]
Group 3 — Aggregate and Verdict (executor)
 ↓ [Verdict Issued]
Group 4 — Remediation (if needed) (executor + architect)
 ↓ [Test Complete]

Group 1 — Select Action and Design Prompts

Owner: Executor (Argus or any architect-class agent)
Parallel: no
Depends on: none
Milestone: Action Selected
Milestone timeout: 10 minutes

Step 1.1 — Select the Action Under Test

Read .tropo/actions/00-index.md. Identify the next action that has not yet been tested (see Resources table above). Confirm the action file exists and has status: published.

Produces: Action identified — note the action filename and its primary purpose
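
The selection in Step 1.1 can be sketched as a lookup over the Actions Under Test table. This is a minimal sketch, assuming the table rows are held in memory as (priority, filename, status note) tuples; the `ACTIONS` list and the `next_untested` helper are hypothetical names, not part of Tropo-OS, and only the first few table rows are copied in.

```python
from typing import Optional

# Hypothetical in-memory copy of part of the "Actions Under Test" table:
# (priority, action filename, status note). A priority of None marks a
# row that has already been tested.
ACTIONS = [
    (None, "create-project.action.md", "Tested April 13. v2.3 passes 3/3."),
    (1, "create-task.action.md", "Not yet tested"),
    (2, "create-collection.action.md", "Not yet tested"),
    (3, "create-decision.action.md", "Not yet tested"),
]


def next_untested(rows) -> Optional[str]:
    """Return the untested action with the lowest priority number, if any."""
    pending = [(prio, name) for prio, name, note in rows
               if prio is not None and note == "Not yet tested"]
    if not pending:
        return None
    return min(pending)[1]


print(next_untested(ACTIONS))
```

The sketch covers only the ordering. Confirming that the file exists and carries status: published still has to be done against the vault itself.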

Step 1.2 — Design the Three Prompts

Design three prompts that would lead a cold-boot agent to use this action — without naming the action file. One per style:

Style 1 — Vague/Exploratory: A user who knows what they want but not how the system works. Long, conversational, uncertain. Example for create-project: "I want to start tracking the work to get our board synthesizer fully operational — scheduling, registration, the whole thing. Can you set up a project for that?"

Style 2 — Terse/Direct: A user who knows exactly what they want. Minimal words. Example: "New project: Action Test Suite. Track the work to build cold-boot tests for every action in .tropo/actions/."

Style 3 — Contextual/Referential: A user referencing existing vault context. Example: "We should have a project to track Argus A22's session work — the dashboard redesign, the board registration sweep, the kernel index. Can you set one up?"

Rule: None of the prompts should say "use create-project.action.md" or name any specific action file. The agent must discover it.

Produces: Three prompts, written out, ready for agent dispatch
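
The rule that no prompt may name an action file can be checked mechanically before dispatch. This is a minimal sketch, assuming that any token ending in `.action.md` counts as naming an action file; the `names_action_file` helper and the "leaky" prompt are hypothetical illustrations.

```python
import re

# Matches any token ending in ".action.md", e.g. "create-project.action.md".
ACTION_FILE = re.compile(r"[\w./-]*\.action\.md\b", re.IGNORECASE)


def names_action_file(prompt: str) -> bool:
    """True if the prompt mentions a specific action file."""
    return ACTION_FILE.search(prompt) is not None


prompts = {
    # Style 2 example from this playbook: should be accepted.
    "terse": "New project: Action Test Suite. Track the work to build "
             "cold-boot tests for every action in .tropo/actions/.",
    # Made-up counterexample that breaks the rule.
    "leaky": "Use create-project.action.md to set up a project.",
}
for style, text in prompts.items():
    print(style, "names an action file:", names_action_file(text))
```

A check like this catches only literal filenames. A prompt that paraphrases the action's name closely enough to give it away still needs a human read.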

Group 2 — Dispatch Three Cold-Boot Agents

Owner: Executor
Parallel: yes (all three agents run simultaneously)
Depends on: Action Selected
Milestone: All Reports Received
Milestone timeout: 15 minutes

Step 2.1 — Dispatch Agent 1 (Vague/Exploratory)

Dispatch a cold-boot agent via the Agent tool with this structure:

You are a fresh agent activating in a Tropo-OS vault at [vault-root].
You have no prior context about this vault or this session.

Read the vault governance files starting with AGENTS.md at the vault root,
then follow the governance chain to orient yourself.

Once oriented, carry out this request: [Style 1 prompt]

Report back:
1. Every file you read, in order
2. Every file you created, with its UID
3. Cross-wiring verification (are all required fields present and linked?)
4. Friction points encountered
5. PASS/FAIL verdict
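
Since Steps 2.2 and 2.3 reuse this structure with a different request, the structure above can be held as a template and filled once per style. This is a minimal sketch; `DISPATCH_TEMPLATE` and `build_dispatch_prompt` are hypothetical names, and the vault path in the usage line is a placeholder.

```python
# The dispatch structure from Step 2.1, with the two bracketed slots
# turned into format fields.
DISPATCH_TEMPLATE = """\
You are a fresh agent activating in a Tropo-OS vault at {vault_root}.
You have no prior context about this vault or this session.

Read the vault governance files starting with AGENTS.md at the vault root,
then follow the governance chain to orient yourself.

Once oriented, carry out this request: {request}

Report back:
1. Every file you read, in order
2. Every file you created, with its UID
3. Cross-wiring verification (are all required fields present and linked?)
4. Friction points encountered
5. PASS/FAIL verdict
"""


def build_dispatch_prompt(vault_root: str, request: str) -> str:
    """Fill the template with the vault root and one style prompt."""
    return DISPATCH_TEMPLATE.format(vault_root=vault_root, request=request)


# Placeholder path and request, for illustration only.
print(build_dispatch_prompt("/path/to/vault", "[Style 1 prompt]"))
```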

Step 2.2 — Dispatch Agent 2 (Terse/Direct)

Same structure as 2.1 with Style 2 prompt.

Step 2.3 — Dispatch Agent 3 (Contextual/Referential)

Same structure as 2.1 with Style 3 prompt.

All three agents run in parallel. Wait for all three to complete before proceeding to Group 3.
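
The parallel dispatch and the wait-for-all barrier can be sketched with a thread pool. This is only an illustration: the Agent tool's real interface is not described in this playbook, so `dispatch_agent` is a hypothetical stand-in that returns a fake report, and the report keys are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def dispatch_agent(style: str, prompt: str) -> dict:
    """Hypothetical stand-in for the real Agent-tool call.

    Returns a fake report so the sketch runs end to end.
    """
    return {"style": style, "prompt": prompt, "verdict": "PASS", "friction": []}


def dispatch_all(prompts: dict) -> list:
    """Start one agent per style at once and wait for every report."""
    with ThreadPoolExecutor(max_workers=len(prompts)) as pool:
        futures = [pool.submit(dispatch_agent, s, p) for s, p in prompts.items()]
        # result() blocks, so the list is complete only when all agents finish.
        return [f.result() for f in futures]


reports = dispatch_all({
    "vague": "[Style 1 prompt]",
    "terse": "[Style 2 prompt]",
    "contextual": "[Style 3 prompt]",
})
print([r["style"] for r in reports])
```

Passing a `timeout` to `result()` would be one way to enforce the 15-minute milestone timeout.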

Group 3 — Aggregate and Issue Verdict

Owner: Executor
Parallel: no
Depends on: All Reports Received
Milestone: Verdict Issued
Milestone timeout: 10 minutes

Step 3.1 — Read All Three Reports

Read each agent's report. For each agent note:

Did they find the correct action without being told?
Did they execute it correctly (all required artifacts created, all fields wired)?
What friction did they encounter (extra reads, failed globs, ambiguous instructions)?
PASS or FAIL?

Step 3.2 — Identify Common Friction

List friction points that appear in 2 or more agent reports. These are structural gaps in the action spec — not agent variance.
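
The two-or-more rule can be sketched as a count across reports. This is a minimal sketch, assuming each report's friction points have already been normalized to short shared labels, which in practice takes executor judgment; the `common_friction` helper and the sample reports are hypothetical, not results from a real run.

```python
from collections import Counter


def common_friction(reports: list, threshold: int = 2) -> list:
    """Return friction points reported by at least `threshold` agents."""
    counts = Counter()
    for report in reports:
        # set() so one agent repeating a point does not count twice.
        counts.update(set(report["friction"]))
    return sorted(point for point, n in counts.items() if n >= threshold)


# Made-up reports for illustration only.
reports = [
    {"agent": 1, "friction": ["failed glob", "extra read"]},
    {"agent": 2, "friction": ["failed glob"]},
    {"agent": 3, "friction": ["ambiguous instruction"]},
]
print(common_friction(reports))  # ['failed glob']
```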

Step 3.3 — Issue Verdict

Result	Verdict	Action
3/3 PASS	PASS	Action ships
2/3 PASS	WARN	Ships with documented exception + remediation task filed
1/3 or 0/3	FAIL	Does not ship — fix gaps, re-run test

Produces: Written verdict with supporting evidence from the three reports
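
The verdict table reduces to a count of passing agents. This is a minimal sketch, assuming each agent result is the string "PASS" or "FAIL"; the function name is hypothetical.

```python
def verdict(agent_results: list) -> str:
    """Map three per-agent PASS/FAIL results to the playbook verdict."""
    if len(agent_results) != 3:
        raise ValueError("expected exactly three agent results")
    passes = sum(1 for r in agent_results if r == "PASS")
    if passes == 3:
        return "PASS"   # action ships
    if passes == 2:
        return "WARN"   # ships with documented exception + remediation task
    return "FAIL"       # does not ship: fix gaps, re-run test


print(verdict(["PASS", "PASS", "FAIL"]))  # WARN
```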

Group 4 — Remediation (if WARN or FAIL)

Owner: Executor + Architect
Parallel: no
Depends on: Verdict Issued
Milestone: Test Complete
Milestone timeout: 24 hours (or skip if PASS)

Step 4.1 — File Remediation Tasks

For each structural gap (friction in 2+ agents), file a task in 226b2bff (Tropo-OS v1.0.0 Launch project):

Title: specific, actionable fix
Description: which agents encountered it, what the friction was
Owner: argus (spec fix) or vela (staleness/maintenance)
Priority: P1 if it causes incorrect output, P2 if friction only
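
The task fields above can be sketched as a small record builder. The project UID and the owner and priority rules come from this step; the `RemediationTask` shape and the `make_task` helper are hypothetical, and how a task is actually written into the vault is not shown.

```python
from dataclasses import dataclass

# Project UID taken from Step 4.1; everything else is a hypothetical shape.
LAUNCH_PROJECT = "226b2bff"


@dataclass
class RemediationTask:
    project: str
    title: str
    description: str
    owner: str     # "argus" for spec fixes, "vela" for staleness/maintenance
    priority: str  # "P1" if output is incorrect, "P2" if friction only


def make_task(title: str, agents: list, friction: str,
              is_spec_fix: bool, causes_incorrect_output: bool) -> RemediationTask:
    """Build one task record for a structural gap (friction in 2+ agents)."""
    if len(agents) < 2:
        raise ValueError("structural gaps need friction in at least two agents")
    seen_by = ", ".join(f"Agent {n}" for n in agents)
    return RemediationTask(
        project=LAUNCH_PROJECT,
        title=title,
        description=f"Encountered by {seen_by}: {friction}",
        owner="argus" if is_spec_fix else "vela",
        priority="P1" if causes_incorrect_output else "P2",
    )


# Made-up gap for illustration only.
print(make_task("Clarify glob pattern in step 2", [1, 3], "failed glob", True, False))
```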

Step 4.2 — Update the Resources Table

Mark the tested action in the Resources table above with its verdict and date. Update the priority list for the next test run.

Step 4.3 — Post to ops.md

[YYYY-MM-DD] cold-boot-action-test | [action-name] | Verdict: PASS/WARN/FAIL
Agents: 3 dispatched, N passed | Friction: [count] structural gaps | Tasks filed: [count]
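
The entry format above can be generated from the test results. This is a minimal sketch with a hypothetical `ops_entry` helper; the values in the usage line are placeholders, not a recorded run, and appending the text to ops.md is left out.

```python
from datetime import date


def ops_entry(day: date, action_name: str, verdict: str,
              passed: int, gaps: int, tasks: int) -> str:
    """Render the two-line ops.md summary from Step 4.3."""
    if verdict not in ("PASS", "WARN", "FAIL"):
        raise ValueError(f"unknown verdict: {verdict}")
    return (
        f"[{day.isoformat()}] cold-boot-action-test | {action_name} | "
        f"Verdict: {verdict}\n"
        f"Agents: 3 dispatched, {passed} passed | "
        f"Friction: {gaps} structural gaps | Tasks filed: {tasks}"
    )


# Illustrative values only, not a recorded test run.
print(ops_entry(date(2026, 1, 1), "example-action", "WARN", 2, 1, 1))
```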

Outcomes

[REQUIRED] All three agents dispatched and reports received
[REQUIRED] Verdict issued (PASS, WARN, or FAIL) with supporting evidence
[REQUIRED] Resources table updated with verdict and date
[REQUIRED] ops.md entry posted
[REQUIRED] If WARN or FAIL: remediation tasks filed before proceeding to next action
[OPTIONAL] If FAIL: action updated and test re-run before marking complete

Verification

Method

Self-verification by executor before declaring test complete.

Criteria

Three agent reports exist and each contains: files read, files created, friction, PASS/FAIL
Verdict is documented with the specific agent results that support it
Resources table shows the tested action with verdict and date
ops.md has the summary entry
Any WARN/FAIL has at least one remediation task filed in the Vault
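
The first criterion can be sketched as a structural check on the reports. This assumes each report has been parsed into a dict; the key names and both helpers are hypothetical, chosen to mirror the items the criterion lists.

```python
# Keys a parsed report is assumed to carry, one per item in the criterion.
REQUIRED_FIELDS = ("files_read", "files_created", "friction", "verdict")


def report_problems(report: dict) -> list:
    """List what is missing or malformed in one agent report."""
    problems = [f"missing {field}" for field in REQUIRED_FIELDS
                if field not in report]
    if report.get("verdict") not in (None, "PASS", "FAIL"):
        problems.append("verdict must be PASS or FAIL")
    return problems


def reports_complete(reports: list) -> bool:
    """True when exactly three reports exist and none has problems."""
    return len(reports) == 3 and all(not report_problems(r) for r in reports)


# Made-up report for illustration only.
sample = {"files_read": ["AGENTS.md"], "files_created": [], "friction": [],
          "verdict": "PASS"}
print(report_problems(sample), reports_complete([sample] * 3))
```

The remaining criteria (verdict evidence, Resources table, ops.md entry, filed tasks) depend on vault contents and are not covered by this check.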

Cold-Boot Test Cases for This Playbook

This playbook is itself tested by running it. The reference result is the April 13, 2026 create-project test run — 3/3 PASS, 4 gaps found, 4 tasks filed. A future test run should be comparable in structure and rigor.

Cold-Boot Action Test Playbook | v1.0 | Argus A22, April 13, 2026
"If a cold agent can find it and execute it, a user can too."
