uid cd2b6d03
title Cold-Boot Action Test
version 1.0
status published
author
name role
Argus A22
Chief Architect
domain quality-assurance
tags
testing
cold-boot
actions
pre-release
quality
created 2026-04-13
last_updated 2026-04-13
readers
agent
scope multi-session
estimated_duration 30-60 minutes per action
estimated_sessions 1
calls
requires
roles services channels playbooks
architect
coordination
extraction_scope ship
member_of
76bab75f

Cold-Boot Action Test

The pre-release quality gate for every action in .tropo/actions/. Three cold-boot agents per action. Different request styles. Parallel dispatch. Honest verdict.


Intent

Every action in .tropo/actions/ must be verified by cold-boot agents before it ships in a Tropo-OS release. A cold-boot agent starts with no session context — only the vault governance files. If the action works correctly for a cold agent, it works correctly for any user.

This playbook runs 3 agents per action with different request styles, aggregates their findings, produces a PASS/WARN/FAIL verdict, and files remediation tasks for any gaps found.

What this proved on April 13, 2026: Three parallel agents against create-project.action.md v2.2 independently found the same collection two-write requirement gap that Metis's live execution had found, confirming it was a real spec gap and not agent variance. The run also found STUDIO.md system map staleness, stale template references, and a grep precision issue. All four became tasks. All were fixed before the next version shipped.


Suggestions

  • Run all three agents in parallel. A parallel batch takes about as long as a single run and produces independent findings that validate each other. Sequential runs are slower and miss variance.
  • Use genuinely different request styles. Vague/exploratory, terse/direct, and contextual requests produce different navigation paths and surface different friction. If all three use similar language, the test loses value.
  • Do not tell the agents which action to use. The test is: can a cold agent find and execute the right action? If you have to tell them, the action is not discoverable.
  • Read all three reports before filing remediation tasks. Common friction across agents is structural. Friction in one agent only may be variance.
  • Run the test against the version you intend to ship — not a working draft.

Rules

  • Every action that appears in .tropo/actions/ and is tagged status: published must pass this test before the action ships in a release.
  • An action that achieves 2/3 PASS is a WARN — it ships only with documented acknowledgment of the failing style and a filed remediation task.
  • An action that achieves 1/3 or 0/3 is a FAIL — it does not ship until the gap is fixed and the test re-run.
  • Cold-boot agents receive no session context. No ADRs, no prior conversation, no hints about which action to use. Vault governance files only.
  • The test executor does not evaluate whether the agent's output is good — only whether the action was executed correctly and all artifacts are compliant.

Resources

Actions Under Test

Current action set in .tropo/actions/ — test priority order:

| Priority | Action | Status |
| --- | --- | --- |
| ✅ Done | create-project.action.md | Tested April 13. v2.3 passes 3/3. |
| 1 | create-task.action.md | Not yet tested |
| 2 | create-collection.action.md | Not yet tested |
| 3 | create-decision.action.md | Not yet tested |
| 4 | delete-entry.action.md | Not yet tested |
| 5 | generate-view.action.md | Not yet tested |
| 6 | refresh-view.action.md | Not yet tested |
| 7 | create-design-brief.action.md | Not yet tested |
| 8 | create-design-spec.action.md | Not yet tested |

Reference


Groups

Group 1 — Select Action and Design Prompts (executor)
 ↓ [Action Selected]
Group 2 — Dispatch 3 Cold-Boot Agents (parallel)
 ↓ [All Reports Received]
Group 3 — Aggregate and Verdict (executor)
 ↓ [Verdict Issued]
Group 4 — Remediation (if needed) (executor + architect)
 ↓ [Test Complete]

Group 1 — Select Action and Design Prompts

Owner: Executor (Argus or any architect-class agent)
Parallel: no
Depends on: none
Milestone: Action Selected
Milestone timeout: 10 minutes

Step 1.1 — Select the Action Under Test

Read .tropo/actions/00-index.md. Identify the next action that has not yet been tested (see Resources table above). Confirm the action file exists and has status: published.

Produces: Action identified — note the action filename and its primary purpose
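
The two confirmations in Step 1.1 (the file exists, and it carries status: published) could be checked mechanically. The sketch below is illustrative only: it assumes the action file stores its metadata as plain `key: value` lines, and the helper names `has_published_status` and `is_testable` are hypothetical, not part of Tropo-OS.

```python
from pathlib import Path


def has_published_status(text: str) -> bool:
    """Return True if the first 'status' line has the value 'published'.

    Assumption: metadata is stored as plain 'key: value' lines.
    """
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "status":
            return value.strip() == "published"
    return False


def is_testable(action_path: Path) -> bool:
    """Hypothetical helper: the file must exist and be published."""
    return action_path.is_file() and has_published_status(
        action_path.read_text(encoding="utf-8")
    )
```

A call such as `is_testable(Path(".tropo/actions/create-task.action.md"))` would return False for a missing file or a draft, which is the signal to pick a different action.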

Step 1.2 — Design the Three Prompts

Design three prompts that would lead a cold-boot agent to use this action — without naming the action file. One per style:

Style 1 — Vague/Exploratory: A user who knows what they want but not how the system works. Long, conversational, uncertain. Example for create-project: "I want to start tracking the work to get our board synthesizer fully operational — scheduling, registration, the whole thing. Can you set up a project for that?"

Style 2 — Terse/Direct: A user who knows exactly what they want. Minimal words. Example: "New project: Action Test Suite. Track the work to build cold-boot tests for every action in .tropo/actions/."

Style 3 — Contextual/Referential: A user referencing existing vault context. Example: "We should have a project to track Argus A22's session work — the dashboard redesign, the board registration sweep, the kernel index. Can you set one up?"

Rule: None of the prompts should say "use create-project.action.md" or name any specific action file. The agent must discover it.

Produces: Three prompts, written out, ready for agent dispatch
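
The rule that no prompt may name an action file is easy to break by accident, so a small lint before dispatch may help. This is a minimal sketch, assuming action files always follow the `<name>.action.md` naming seen in the Resources table; the function name `leaked_action_files` is made up for illustration.

```python
import re

# Matches any token shaped like "<name>.action.md" (assumed naming convention).
ACTION_FILE_PATTERN = re.compile(r"[\w-]+\.action\.md", re.IGNORECASE)


def leaked_action_files(prompt: str) -> list[str]:
    """Return every action filename mentioned in the prompt (empty if clean)."""
    return ACTION_FILE_PATTERN.findall(prompt)
```

A prompt is ready for dispatch only when this returns an empty list. Mentioning the `.tropo/actions/` directory itself, as the Style 2 example does, is not flagged.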


Group 2 — Dispatch Three Cold-Boot Agents

Owner: Executor
Parallel: yes (all three agents run simultaneously)
Depends on: Action Selected
Milestone: All Reports Received
Milestone timeout: 15 minutes

Step 2.1 — Dispatch Agent 1 (Vague/Exploratory)

Dispatch a cold-boot agent via the Agent tool with this structure:

You are a fresh agent activating in a Tropo-OS vault at [vault-root].
You have no prior context about this vault or this session.

Read the vault governance files starting with AGENTS.md at the vault root,
then follow the governance chain to orient yourself.

Once oriented, carry out this request: [Style 1 prompt]

Report back:
1. Every file you read, in order
2. Every file you created, with its UID
3. Cross-wiring verification (are all required fields present and linked?)
4. Friction points encountered
5. PASS/FAIL verdict

Step 2.2 — Dispatch Agent 2 (Terse/Direct)

Same structure as 2.1 with Style 2 prompt.

Step 2.3 — Dispatch Agent 3 (Contextual/Referential)

Same structure as 2.1 with Style 3 prompt.

All three agents run in parallel. Wait for all three to complete before proceeding to Group 3.
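
As a rough illustration of the dispatch pattern, the sketch below fills the prompt structure from Step 2.1 and runs three dispatches concurrently. The `dispatch` callable is a stand-in for whatever the Agent tool exposes; its signature (prompt in, report text out) is an assumption, as are the names `build_prompt` and `dispatch_all`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Prompt structure from Step 2.1, with the two placeholders made explicit.
PROMPT_TEMPLATE = """\
You are a fresh agent activating in a Tropo-OS vault at {vault_root}.
You have no prior context about this vault or this session.

Read the vault governance files starting with AGENTS.md at the vault root,
then follow the governance chain to orient yourself.

Once oriented, carry out this request: {request}

Report back:
1. Every file you read, in order
2. Every file you created, with its UID
3. Cross-wiring verification (are all required fields present and linked?)
4. Friction points encountered
5. PASS/FAIL verdict
"""


def build_prompt(vault_root: str, request: str) -> str:
    """Fill the cold-boot template with the vault root and one styled request."""
    return PROMPT_TEMPLATE.format(vault_root=vault_root, request=request)


def dispatch_all(
    vault_root: str,
    requests: list[str],
    dispatch: Callable[[str], str],  # hypothetical: sends a prompt, returns a report
) -> list[str]:
    """Run one dispatch per request concurrently; reports keep request order."""
    prompts = [build_prompt(vault_root, r) for r in requests]
    with ThreadPoolExecutor(max_workers=max(1, len(prompts))) as pool:
        # The with-block exits only after every dispatch has returned.
        return list(pool.map(dispatch, prompts))
```

Because the executor waits for every worker before returning, the caller naturally blocks until all three reports exist, which matches the All Reports Received milestone.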


Group 3 — Aggregate and Issue Verdict

Owner: Executor
Parallel: no
Depends on: All Reports Received
Milestone: Verdict Issued
Milestone timeout: 10 minutes

Step 3.1 — Read All Three Reports

Read each agent's report. For each agent note:

  • Did they find the correct action without being told?
  • Did they execute it correctly (all required artifacts created, all fields wired)?
  • What friction did they encounter (extra reads, failed globs, ambiguous instructions)?
  • PASS or FAIL?

Step 3.2 — Identify Common Friction

List friction points that appear in 2 or more agent reports. These are structural gaps in the action spec — not agent variance.
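
The "2 or more reports" threshold can be expressed as a simple count. The sketch below assumes the executor has already normalized each agent's friction notes into short shared labels (agents will not phrase the same problem identically); the function name `structural_gaps` and the label format are hypothetical.

```python
from collections import Counter


def structural_gaps(
    friction_by_agent: dict[str, set[str]], threshold: int = 2
) -> list[str]:
    """Return friction labels reported by at least `threshold` agents, sorted."""
    counts = Counter(
        label for labels in friction_by_agent.values() for label in set(labels)
    )
    return sorted(label for label, n in counts.items() if n >= threshold)
```

Labels that appear for only one agent are left out of the result; per the guidance above, those may be variance and are worth a second look rather than an automatic task.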

Step 3.3 — Issue Verdict

| Result | Verdict | Action |
| --- | --- | --- |
| 3/3 PASS | PASS | Action ships |
| 2/3 PASS | WARN | Ships with documented exception + remediation task filed |
| 1/3 or 0/3 | FAIL | Does not ship; fix gaps and re-run test |

Produces: Written verdict with supporting evidence from the three reports
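
The verdict table reduces to a count of passing agents. A minimal sketch of that mapping, with a hypothetical function name:

```python
def issue_verdict(agent_passed: list[bool]) -> str:
    """Map three per-agent PASS/FAIL results to PASS, WARN, or FAIL."""
    if len(agent_passed) != 3:
        raise ValueError("the test requires exactly three agent results")
    passes = sum(agent_passed)
    if passes == 3:
        return "PASS"
    if passes == 2:
        return "WARN"
    return "FAIL"  # 1/3 or 0/3
```

The function only encodes the threshold; the written verdict still needs the supporting evidence from the three reports.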


Group 4 — Remediation (if WARN or FAIL)

Owner: Executor + Architect
Parallel: no
Depends on: Verdict Issued
Milestone: Test Complete
Milestone timeout: 24 hours (or skip if PASS)

Step 4.1 — File Remediation Tasks

For each structural gap (friction in 2+ agents), file a task in 226b2bff (Tropo-OS v1.0.0 Launch project):

  • Title: specific, actionable fix
  • Description: which agents encountered it, what the friction was
  • Owner: argus (spec fix) or vela (staleness/maintenance)
  • Priority: P1 if it causes incorrect output, P2 if friction only
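
The four task fields above follow fixed rules, so they can be derived from two yes/no judgments. The sketch below is one possible shape for that, not the vault's actual task schema; `RemediationTask` and `make_task` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class RemediationTask:
    title: str
    description: str
    owner: str
    priority: str


def make_task(
    title: str,
    agents: list[str],
    friction: str,
    causes_incorrect_output: bool,
    is_spec_fix: bool,
) -> RemediationTask:
    """Build a task record using the owner and priority rules from Step 4.1."""
    return RemediationTask(
        title=title,
        description=f"Encountered by {', '.join(agents)}: {friction}",
        # Spec fixes go to argus; staleness/maintenance goes to vela.
        owner="argus" if is_spec_fix else "vela",
        # P1 when the gap causes incorrect output, P2 when it is friction only.
        priority="P1" if causes_incorrect_output else "P2",
    )
```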

Step 4.2 — Update the Resources Table

Mark the tested action in the Resources table above with its verdict and date. Update the priority list for the next test run.

Step 4.3 — Post to ops.md

[YYYY-MM-DD] cold-boot-action-test | [action-name] | Verdict: PASS/WARN/FAIL
Agents: 3 dispatched, N passed | Friction: [count] structural gaps | Tasks filed: [count]
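
To keep ops.md entries uniform across test runs, the two-line format above could be generated instead of typed by hand. A small sketch under that assumption; the function name `ops_entry` is hypothetical.

```python
from datetime import date


def ops_entry(
    day: date, action_name: str, verdict: str, passed: int, gaps: int, tasks: int
) -> str:
    """Format the two-line ops.md summary for one cold-boot test run."""
    return (
        f"[{day.isoformat()}] cold-boot-action-test | {action_name} | "
        f"Verdict: {verdict}\n"
        f"Agents: 3 dispatched, {passed} passed | "
        f"Friction: {gaps} structural gaps | Tasks filed: {tasks}"
    )
```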

Outcomes

  • [REQUIRED] All three agents dispatched and reports received
  • [REQUIRED] Verdict issued (PASS, WARN, or FAIL) with supporting evidence
  • [REQUIRED] Resources table updated with verdict and date
  • [REQUIRED] ops.md entry posted
  • [REQUIRED] If WARN or FAIL: remediation tasks filed before proceeding to next action
  • [OPTIONAL] If FAIL: action updated and test re-run before marking complete

Verification

Method

Self-verification by executor before declaring test complete.

Criteria

  • Three agent reports exist and each contains: files read, files created, friction, PASS/FAIL
  • Verdict is documented with the specific agent results that support it
  • Resources table shows the tested action with verdict and date
  • ops.md has the summary entry
  • Any WARN/FAIL has at least one remediation task filed in the Vault

Cold-Boot Test Cases for This Playbook

This playbook is itself tested by running it. The reference result is the April 13, 2026 create-project test run — 3/3 PASS, 4 gaps found, 4 tasks filed. A future test run should be comparable in structure and rigor.


Cold-Boot Action Test Playbook | v1.0 | Argus A22, April 13, 2026

"If a cold agent can find it and execute it, a user can too."