All notable changes to this project will be documented here. Format follows Keep a Changelog, and this project adheres to Semantic Versioning.
- New `simplifier` team agent (`agents/simplifier.md`). De-over-engineers code that already works and breaks no hard rule but carries more complexity than its problem requires: premature abstraction, needless indirection, speculative generality, premature optimization, and drifted duplication. Distinct trigger from `refactorer`: refactorer fixes hard-rule violations; simplifier removes superfluous complexity that violates no hard rule (already-functional, already-compliant code never invokes refactorer, so over-engineering needs its own trigger; folding both into one agent would be an SRP violation in agent design). Behavior-preserving (tests pass before and after, one simplification per commit), stakes-calibrated, and language-/project-agnostic (idiom-mapping across Python, TypeScript, Go, Rust, Java, Swift). Its over-engineering heuristics are an open, non-exhaustive catalog (YAGNI, rule-of-three, needless indirection, premature optimization, speculative generality, drifted duplication, plus KISS, Gall's Law, Ousterhout shallow modules, dead code, boolean blindness, …); each removal must name and source its principle per the zetetic §8 standard. New `simplifier` memory scope (`memory/scope-registry.json`).
- A2 / CR-4: the adversarial-verify pre-verdict workflow now carries a fifth, perspective-diverse lens: `simplicity` (agentType `simplifier`), prompted to REFUTE by hunting superfluous complexity that no hard rule forbids. Joins the four existing refute lenses (residual-fp, missed-cases, robustness, test-adequacy); synthesis stays deterministic and fail-closed. (The A2/adversarial-verify work shipped in code prior to this release without a CHANGELOG entry; recorded here.)
- `code-reviewer` Move 5 now hands off superfluous-complexity smells (over-engineering with no hard-rule violation) to `simplifier` as an advisory note, distinct from the §4 size-breach blocking gate. Output format and blind-spot hand-off list updated accordingly.
- README badge, body, `CONTRIBUTING.md`, and the marketplace plugin/metadata descriptions refreshed: 97 reasoning patterns + 21 team agents (118 total).
- `rules/agent-routing-table.md` regenerated (118 agents).
- `memory/scope-coverage.md`: simplifier added to the team table; counts reconciled to 20 scope-owning team agents, 117 tabulated agents, 27 distinct registry scopes (7 systemic + 18 team + 1 research + 1 genius). This doc tabulates only scope-owning agents, so its total is intentionally one less than the 118 agent definition files the README/marketplace count: `memory-writer` is a scribe that owns no scope. A new footnote in the doc records the distinction.
- `tools/tests/adversarial-verify/core.test.mjs` updated for the five-lens core (lens count, lens-keys ordering, distinct-agentType count, five-element fixtures). 13/13 deterministic synthesis-core tests pass.
- CAP-2: adapt to a demanding team lead. New `reviewer-prefs` memory scope (`memory/scope-registry.json`) holding a lead's standing review preferences. Owners are the lead (`_user`) and the orchestrator/curator; the reviewed agents (`engineer`, `refactorer`, `code-reviewer`) are readers only, so an agent cannot invent its own prefs. The three agents read the scope at the top of their workflow with a fixed precedence (a `coding-standards.md` blocking rule always outranks a preference) and a graceful fallback when the scope is absent.
- `tools/seed-reviewer-prefs.sh`: bootstraps a lead's prefs from evidence of how they already work (their `gh` review comments + merged PRs), emitting a provenance-tagged `status: inferred` draft. It presents raw evidence only; abstracting it into confirmed preferences is the lead/orchestrator's judgment step (§9: no faked judgment). Injection-safe, owner-only writes, deterministic exit codes. 19-case test suite under `tools/tests/seed-reviewer-prefs/`.
- README badge, `assets/banner.svg`, `CONTRIBUTING.md`, and the marketplace plugin description refreshed to the current reproducible counts: 64 skills, 18 hooks, 35 tools, 25 commands, 97 reasoning patterns + 20 team agents (117 total), 288 tests.
- `memory/scope-coverage.md` miscount: the `checkpoints` systemic scope was never tabulated. Reconciled to 26 scopes (7 systemic + 17 team + 1 research + 1 genius); the doc's embedded verification command now matches.
- In-process pytest suites for `manifest_gate.py` (90.6% mutation score) and `semantic_layer.py` (97.4%), wiring both critical zones into the mutation policy. `acceptance_gate.py` mutation coverage at 77.6% (residual = documented equivalents).
- `.craftsmanship.conf` size-gate scoping: a test suite had set `FILE_MAX=2000` globally, weakening the production 500-line gate. Now test-scoped (`TEST_FILE_MAX`/`TEST_FILE_RE`); production stays `FILE_MAX=500`.
- Routing table regenerated to include the `memory-writer` agent.
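
The size-gate scoping above can be pictured as a per-path limit lookup. This is an illustrative sketch only: `FILE_MAX=500` comes from the entry, while the `TEST_FILE_MAX` value, the `TEST_FILE_RE` pattern, and both function names are assumptions, not the checker's actual code.

```python
import re

FILE_MAX = 500  # production limit stated in the entry above
TEST_FILE_MAX = 2000  # assumed value for the test-scoped limit
TEST_FILE_RE = re.compile(r"(^|/)tests?/")  # assumed pattern for test paths


def limit_for(path: str) -> int:
    """Pick the line limit for a path: test files get the looser limit."""
    return TEST_FILE_MAX if TEST_FILE_RE.search(path) else FILE_MAX


def file_too_long(path: str, line_count: int) -> bool:
    """True when the file exceeds the limit that applies to its path."""
    return line_count > limit_for(path)
```

Scoping the looser limit by path keeps a large test fixture from weakening the gate for production files.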
- Craftsmanship checker (`tools/craftsmanship-checker.sh` + `tools/lib/craftsmanship-detectors.sh`). Mechanically enforces `coding-standards.md` §4 size limits and select structural rules. `FILE_TOO_LONG` (>500 lines) blocks; function, class, parameter, and nesting limits block for recognized languages; grab-bag module names and layer-direction findings only advise. Judgment rules (SRP/OCP/LSP/ISP, rule-of-three, dead-code) are deliberately NOT mechanized: a hook that fakes a verdict it cannot reach just trains you to ignore it. Mirrors `zetetic-checker.sh` (`--staged`/`--files`/`--full`, exit 0/1/2); wired into the commit/push hooks and a new CI job (hard on newly-added files, informational full-tree sweep).
- Per-repo `.craftsmanship.conf` (`.craftsmanship.conf.example`). Every threshold and per-rule severity (`block`/`advise`/`off`) is team-tunable; defaults are the sourced §4 numbers. A CI drift-guard test pins the defaults to the §4 table in `rules/agent-reference/craftsmanship-moves.md`.
- Regression suites `tools/tests/hook-layer/` (87 cases) and `tools/tests/threshold-drift/` (10 cases), run as a hard CI gate.
- Registry drift, at the root. `scripts/setup.sh`'s hook merge was skip-if-present and silently let `hooks/hooks.json` and `.claude-plugin/plugin.json` diverge; it is now a content-equality re-sync. Restored the dropped `stop-acceptance-gate.py` Stop entry (plugin.json 17 → 18 hooks).
- macOS hook no-op. 11 stdin-reading hooks used `timeout 3 cat`; stock macOS ships no `timeout`/`gtimeout`, so the payload was zeroed and the hooks silently did nothing. Replaced with a portable bounded read.
- Fail-open holes. `pre-tool-secret-shield.py`, `stop-acceptance-gate.py`, and `stop-context-guard.py` raised on valid non-object JSON instead of failing open; guarded with an `isinstance` shape check.
- secret-shield false blocks + a regression. No longer hard-blocks `.env.example`/`.sample`/`.template` templates or `keyring`/`keychain` source files; restored the `printenv $SECRET` / `env NAME` secret-read block. All true-positive secret blocks preserved.
- git-verb guard bypasses. The commit/push gates were silently skipped by `git push;`, `git commit&`, `(git commit)`, `sudo git commit`, `env X=1 git commit`, and `GIT_EDITOR=… git commit`. One unified anchor across all 5 hooks closes both classes with no new false positives.
- claim-gate latency. A per-line subprocess loop (~38–70s on routine edits) is now a 3-pass scan (<1s).
- Smaller hook fixes: `notification-handler.sh` `pipefail` crash, `session-end.sh` abort in a non-git directory, `post-commit-difficulty.sh` promiscuous grep, `post-tool-error-routing.sh` keyword anchoring, `pre-edit-layer-check.sh` advisory-honesty wording.
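
The fail-open fix listed above (valid JSON that is not an object) can be illustrated with a small shape check. This is a minimal sketch under assumptions: the helper name `parse_payload` and the choice to signal "fail open" by returning `None` are hypothetical, not taken from the hooks themselves.

```python
import json
from typing import Optional


def parse_payload(raw: str) -> Optional[dict]:
    """Return the hook payload as a dict, or None when the hook should fail open.

    Hypothetical helper: valid JSON that is not an object (a list, string,
    number, or null) is treated the same way as malformed JSON.
    """
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return None
    # Shape check: only a JSON object is safe to index with string keys.
    if not isinstance(payload, dict):
        return None
    return payload
```

A caller would treat `None` as "nothing to inspect" and let the action proceed, rather than raising on input such as `[1, 2]`.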
- Mutation testing stays operationalized as `test-engineer` Move 8 (in `coding-standards.md` §3.2 and `craftsmanship-moves.md`); a real per-stack runner integration in the acceptance gate is a tracked follow-up; an inert draft was not shipped (no current caller, per §9).
- The craftsmanship CI gate is a ratchet: newly-added files must fully comply; the legacy tree's pre-existing §4 debt (e.g. `scripts/setup.sh`) is surfaced informationally, not blocked, until refactored.
- Closed-loop autonomous build (`.claude/workflows/autonomous-build-loop.js`). Drives a build task to a candidate on an isolated iteration branch: refine → plan → verify-plan → orchestrator build → best-effort in-loop acceptance checks → iterate until green or the budget is spent. Repo-generic: `repoPath`, `gateRunner`, and an optional base `gateConfig` are inputs, so the loop runs from any working directory against a repo that need not contain this tooling. It drafts and converges a candidate; it does not self-certify.
- Deterministic acceptance gate (`tools/acceptance_gate.py`, `tools/acceptance-gate.sh`). Runs configured command gates and aggregates their exit codes: a gate passes iff its command exits 0, never a model grading its own output (Huang et al., arXiv:2310.01798). Gates any repo via `--root`, evaluates the committed tip via `--diff-base`/`--diff-head` in a throwaway worktree, rejects an empty diff, and fails closed. Unit-tested under `tools/tests/acceptance-gate/` (incl. external-repo-from-a-foreign-cwd and fail-closed cases).
- Real-exec Stop-hook gate (`hooks/stop-acceptance-gate.py`): the build loop's gate component (invoked by the workflow, not registered as a global lifecycle hook).
- Self-hosted web ingestion engine (`tools/web_ingest.py`, `tools/web_extract.py`, `tools/web-ingest.sh`). A dependency-free replica of Firecrawl's self-hostable core (scrape/map/crawl): stdlib fetch, `robots.txt` respected, main-content markdown extraction, conditional-GET caching. No web search and no LLM extraction by design; TLS verification is never disabled.
- Query-indexed semantic layer over Cortex (`tools/semantic_layer.py`, `tools/semantic-layer.sh`, `memory/semantic-layer.yaml`, `memory/semantic-layer.schema.yaml`). A YAML index keyed by query + intent (`ingest`/`verify`/`compare`/`monitor`) with `fresh`/`stale`/`superseded` states. Never writes Cortex itself; the agent owns the write and passes back the `cortex_id` as a pointer.
- Manifest-membership gate (`tools/manifest_gate.py`, `tools/manifest-gate.sh`, tests under `tools/tests/manifest-gate/`). Fail-closed grounding check: every fact's `source` must be a URL the web-ingest engine actually fetched this session. Complements the semantic layer's presence check with a membership check; a pure `stdin → stdout` filter.
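
The membership rule in the manifest gate entry above can be sketched in a few lines. This is an assumption-laden illustration, not the code in `tools/manifest_gate.py`: the function name `check_membership` and the representation of a fact as a mapping with a `source` key are hypothetical.

```python
from typing import Iterable, List, Mapping, Tuple


def check_membership(
    facts: Iterable[Mapping[str, str]], fetched_urls: Iterable[str]
) -> Tuple[bool, List[Mapping[str, str]]]:
    """Return (passed, rejected) for a batch of facts.

    A fact is rejected when its "source" is missing or is not one of the
    URLs fetched this session, so anything unverifiable fails the gate.
    """
    manifest = set(fetched_urls)
    rejected = [fact for fact in facts if fact.get("source") not in manifest]
    return (not rejected, rejected)
```

Treating a missing `source` exactly like an unknown one is what makes the check fail closed in this sketch: a fact has to prove membership to pass.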
- README: broader refresh. Corrected counts to ground truth: agents 116 → 117 (a 20th team-role agent, `memory-writer`), skills 63 → 64; documented the autonomous build loop and the new knowledge-ingestion / semantic-layer subsystems; refreshed the genius-trigger source comment (recounted 2026-06-23).
- Marketplace manifest counts refreshed (20 team agents, 64 skills, 25 commands, 24 logical tools, 17 registered hooks).
- `web-ingest` follows 308 redirects and records the final resolved URL.
- Manifest-gate comments reworded to satisfy the absolute-claim checker (§8).
- Release workflow no longer references a non-existent `tests/run-all.sh`. The tag-time `release.yml` invoked `bash tests/run-all.sh`, which has never existed, so every tagged release failed at the test step with exit 127. It now runs the same suite as `ci.yml` (the `scripts/test-memory-*.sh` and `scripts/test-agent-id-propagation.sh` suites) plus the structural auditor.
- Corrected the zetetic-checker invocation from the unsupported `--tree` flag to `--full`; valid modes are `--staged`, `--files`, and `--full`.
- Added `jq` to the release job's apt dependencies (required by the memory suites, matching CI).
- All 97 genius agents drop the explicit `tools:` front-matter line. Each genius now inherits the full session tool set instead of pinning a hardcoded allow-list that had drifted from the live tool registry; pinning silently starved an agent of any tool the list omitted. Single uniform change: 97 files, 97 deletions, reasoning sections untouched.
- Validated by a full isolation sweep before release: with the plugin disabled and an un-namespaced clone live, all 97 genius agents spawned and responded, each confirming file/search tools visible (97/97, zero failures).
[2.18.0] — letta-code follow-up: lean genius corpus, compact routing, reflective checkpoints, memory contract hardening
- R1 completed for the genius corpus: all 97 genius agents split into lean core + on-demand reference stubs (same two-tier move 2.17.0 applied to the 19 team agents). Doc-covered protocol sections deleted; memory/token-budget/worktree sections replaced by parameterized stubs keeping every safety-critical invariant inline; uniform reference-docs index appended. 5,169,780 → 3,248,016 chars (37.2% smaller, ~4.8K tokens saved per spawn per agent). Reasoning sections byte-identical before/after, asserted by the rollout script.
- R2: routing never reads full files. New generated `rules/agent-routing-table.md` (~25KB, name + shape keywords + description for all 116 agents, from frontmatter via `scripts/generate-routing-table.py`) replaces full Reads of the 132KB INDEX.md in genius:route, genius:index and the orchestrator (~30K tokens saved per routing decision). pre-commit warns when the table is stale.
- R3: checkpoint stubs follow the letta summary schema: goals / file references (paths + line ranges) / errors and fixes / current state / next steps, ≤500 words, tool outputs clipped to 2K chars, frontmatter description retrieval cue. Resume contract: checkpoint + ONE targeted recall, never re-reading what the checkpoint summarizes. All 116 agent token-budget stubs and the shared token-budget.md doc teach the schema.
- R4: reflection at WARN, not at HARD. stop-context-guard.py's WARN firing is now a one-time blocking reflection (like letta's compaction event): the model spawns the new budgeted memory-writer agent (haiku, ≤16K context) to persist the semantic checkpoint + cortex:remember entries while headroom remains, then resumes the task, so the HARD block becomes a formality.
- R5: mandatory `description:` frontmatter on memory .md files, enforced at the memory-tool.sh chokepoint on create/rethink (instructive error, `MEMORY_NO_DESC_CHECK=1` test escape hatch). Contract §4.8.
- R6: conflict-aware memory verbs: `rethink <path> <text> [expected_sha]` (atomic whole-file rewrite, letta memory_rethink) and `sha <path>` (CAS token); `str_replace` gains optional compare-and-swap. Contract §3.6b/§3.6c/§4.7; exposed via the memory_extensions MCP tool.
- `agents/memory-writer.md`: single-purpose budgeted reflection scribe.
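
The compare-and-swap idea behind the `rethink` and `sha` verbs above can be sketched as follows. This is a rough illustration under stated assumptions: SHA-256 as the token, the Python function signatures, and the boolean return value are all hypothetical and do not describe memory-tool.sh itself.

```python
import hashlib
import os
import tempfile
from pathlib import Path
from typing import Optional


def sha(path: Path) -> str:
    """CAS token: hex digest of the file's current bytes (SHA-256 is an assumption)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def rethink(path: Path, text: str, expected_sha: Optional[str] = None) -> bool:
    """Rewrite the whole file; refuse when expected_sha is given and stale."""
    if expected_sha is not None and sha(path) != expected_sha:
        return False  # the file changed since the caller read its token
    # Write a temp file in the same directory, then rename it over the target
    # so readers see either the old content or the new, never a partial write.
    # A real tool would also hold a lock across the check and the rename.
    fd, tmp = tempfile.mkstemp(dir=path.parent)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(text)
    os.replace(tmp, path)
    return True
```

A writer that reads a token with `sha`, prepares new text, and then calls `rethink` with that token learns when another writer got there first, instead of silently overwriting it.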
- All 19 team agent definitions split into a lean core plus on-demand reference docs. Shared protocol detail (token budget, memory protocol and architecture, worktree protocol, effort calibration, codebase intelligence, dynamic workflows, mid-task system messages) moved to 8 docs under `rules/agent-reference/`, referenced from a uniform index table in every agent. Definitions shrink 987,888 → 625,267 chars (~36.7%, ~5.0K tokens saved per spawn per agent). Inline stubs remain self-sufficient for safety-critical invariants (checkpoint thresholds, memory scoping, worktree commit rules); reference docs are elaboration and recovery material, validated via headless fresh-session runs.
- Agent frontmatter parameterized (`agent_topic`, `memory_scope`, `model`); Haiku agents carry 170K/~120K budgets plus an escalate-to-orchestrator line, Opus agents 200K/~180K. Fixed latex-engineer/professor `MEMORY_AGENT_ID=haiku` bug (now agent name) and the orchestrator's dangling `<dynamic-workflows>` prose reference.
- stop-context-guard.py re-vendored from session-optimizer v1.1.0. Thresholds are now per-model and loaded from `~/.claude/ctxguard-thresholds.json` (embedded fallback when the config is absent or malformed; first substring match on the lowercased model id wins): Fable 5 / Mythos warn 120K hard 160K (2x Opus rates, so carrying rent and the 5-min cache-expiry resume penalty bite twice as hard), Haiku 4.5 warn 120K hard 170K (200K IS the window; leave headroom for the checkpoint turn), Opus/Sonnet warn 180K hard 200K (cost discipline; window is 1M). The hard-cap block message now reports the per-model budget instead of a fixed 200K.
- `<token-budget>` model-limits table updated in all 117 agent docs. Adds the Claude Fable 5 row (160K hard cap, ~120K checkpoint), corrects Haiku's cap to 170K, and points to `ctxguard-thresholds.json` as the authoritative source shared with the statusline and the Stop guard.
- `hooks/ctxguard-thresholds.json`: vendored copy of the shared threshold config. `session-start.sh` seeds it to `~/.claude/ctxguard-thresholds.json` when absent (idempotent: never overwrites user edits, so tuned thresholds survive plugin updates).
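
The "first substring match on the lowercased model id wins" rule above can be sketched as an ordered table lookup. The warn/hard numbers mirror the entry; the key strings, the default row for unmatched models, and the function name are assumptions made for illustration, and the real guard reads its rows from `ctxguard-thresholds.json` rather than a hardcoded list.

```python
from typing import List, Tuple

# Ordered rows of (substring, warn, hard). Order matters: the first hit wins.
THRESHOLDS: List[Tuple[str, int, int]] = [
    ("fable", 120_000, 160_000),
    ("mythos", 120_000, 160_000),
    ("haiku", 120_000, 170_000),
    ("opus", 180_000, 200_000),
    ("sonnet", 180_000, 200_000),
]
DEFAULT: Tuple[int, int] = (180_000, 200_000)  # assumed row for unknown models


def thresholds_for(model_id: str) -> Tuple[int, int]:
    """Return (warn, hard) token budgets for a model id."""
    lowered = model_id.lower()
    for needle, warn, hard in THRESHOLDS:
        if needle in lowered:
            return warn, hard
    return DEFAULT
```

Because matching is by substring and order, a stricter row placed earlier takes precedence when an id happens to contain more than one key.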
- Missing hooks in the plugin manifest. `.claude-plugin/plugin.json` had drifted from `hooks/hooks.json`: the inline `hooks` block omitted three hooks that the canonical wiring defines, so they never registered when the plugin loaded: `pre-tool-secret-shield.py` (PreToolUse), `stop-context-guard.py` (Stop, the context-budget guard from session-optimizer), and `session-end-memory-drain.sh` (Stop). The manifest now mirrors `hooks/hooks.json` exactly (17 hook commands). The Stop block fires all three lifecycle hooks; PreToolUse re-includes the secret shield.
- Marketplace entry hook count corrected (16 → 17).
- Public-readiness baseline: CONTRIBUTING.md, CODE_OF_CONDUCT.md, SECURITY.md.
- GitHub issue templates (bug / feature / audit-finding) and PR template with audit-cycle checklist.
- `prd-spec-generator` row in the companion-projects table.
- LICENSE copyright corrected to Clément Deust (sole independent author); ecosystem-context preamble + explicit non-affiliation statement added.
- LinkedIn post first-comment options refined for algorithm-aware reach.
- Memory MCP. Local replica of Anthropic's managed-agent `memory_20250818` tool with scope-based ACL, queue isolation, and full MCP wire compatibility. 241 tests passing across functional, ACL, concurrency, stale-lock, MCP, and PII suites.
- PII / secret scrubbing on memory write path (contract §7.2).
- `pre-tool-secret-shield` hook: blocks any agent from reading `.env`, `.aws/credentials`, `*.pem`, `*.key`, or shell-history files.
- PII scanner daemon. Persistent process eliminates Python cold-start; median scan time reduced 34 → 8 ms.
- Memory contract on every agent. `memory_scope` frontmatter + `memory` body block added to all 19 team agents and all 97 genius agents (so each agent declares what it persists and where).
- README rewrite (Tier 1 visibility), 6 supporting docs, full CI matrix, Codespaces config (subsequently removed per cross-check feedback).
- CI concurrency suite made Linux-portable (was macOS-specific).
- LinkedIn post series introducing zetetic (rewritten in plain prose; no em-dashes).
For older releases (v2.13.0 and earlier), see git history. The project predates this CHANGELOG; pre-2.13.1 versioning was driven by tag-only release notes on GitHub.