feat: add double-check skill (cross-provider verification)#177
Merged
Conversation
New auto-discovered skill that gets an independent second opinion on
finished work from a *different* AI provider's CLI agent (codex, claude,
gemini, or cursor-agent) running locally, then drives a constructive
back-and-forth between the two agents until both genuinely agree.
Host-agnostic: detects the hosting agent's model lab and picks a verifier
from a different lab, at its best model and highest reasoning effort, in a
read-only sandbox. Includes a provider command reference and an adversarial
verifier brief template in resources/.
The skill was itself authored using the loop it prescribes — codex
(gpt-5.5, xhigh) acted as the cross-provider verifier across three rounds
until it returned no-issues.
- Add claude/.claude/skills/double-check/{SKILL.md,resources/}
- Register in CLAUDE.md skill list + trigger line
- Add README discovery bullet, bump skill counts 27 -> 28
- Add minor changeset
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Two refinements from review feedback, both re-verified by the codex cross-provider loop (no-issues): - Context vs target: the verifier should read enough surrounding context (docs, callers, tests, conventions) to judge the work properly, while keeping the review *target* fixed on the named work — understand broadly, judge narrowly. Replaces the earlier over-tight "read only named paths" wording that starved the verifier of context. - Work not yet on disk: handle double-checking an in-progress plan or change that lives only in the host agent's conversation. The host must materialize it (scratch file or inline) and tell the verifier that THAT is the work and the committed repo is background context only, so it never reviews stale committed code. Adds a "where the work lives" section to the brief template and a "Delivering work that isn't on disk yet" subsection. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Adds a new auto-discovered skill,
double-check, that gets an independent second opinion on finished work from a different AI provider's CLI agent (codex / claude / gemini / cursor-agent) running locally on the machine, then drives a constructive back-and-forth between the two agents until both genuinely agree — convergence, not a single rubber-stamp pass.Why
A model is the worst-placed reviewer of its own work: it shares every blind spot that produced the bug. A genuinely different reasoning system catches what self-review can't.
Design highlights
cursor-agentrunning Sonnet is still Anthropic and doesn't count against a Claude host.--sandbox read-only -c model_reasoning_effort="xhigh", claude--model opus --effort max --permission-mode plan.Files
claude/.claude/skills/double-check/SKILL.mdclaude/.claude/skills/double-check/resources/providers.md— verified CLI invocations + best model/effort/sandbox flags + model-lab tableclaude/.claude/skills/double-check/resources/brief-template.md— adversarial verifier briefCLAUDE.md/README.md— registered the skill, counts 27 → 28.changeset/double-check-skill.md— minorDogfooded
The skill was authored using the exact loop it prescribes, with codex (gpt-5.5, xhigh) as the cross-provider verifier:
issues-foundissues-foundno-issuesReal issues codex caught and that were fixed: Claude verifier not at max effort;
cursor-agenthas no read-only mode; lab-vs-binary diversity; a Claude-specific tool in a host-agnostic skill; a verdict-contract loophole hiding nit-only findings; and a secrets-disclosure gap (referencing a file transmits it).All repo tests pass.
🤖 Generated with Claude Code