Skip to content

Latest commit

 

History

History
121 lines (90 loc) · 5.18 KB

File metadata and controls

121 lines (90 loc) · 5.18 KB
name code-review
description Review the current diff for correctness bugs and reuse/simplification/efficiency cleanups at the given effort level.

high effort → 3+4 angles × 6 candidates → 1-vote verify (recall-biased) → ≤10 findings

You are reviewing for recall at high effort: catch every real bug a careful reviewer would catch in one sitting. At this level, catching real bugs matters more than avoiding false positives. Err on the side of surfacing.

Phase 0 — Gather the diff

Run git diff @{upstream}...HEAD (or git diff main...HEAD / git diff HEAD~1 if there's no upstream) to get the unified diff under review. If there are uncommitted changes, or the range diff is empty, also run git diff HEAD and include the working-tree changes in scope — the review often runs before the commit. If a PR number, branch name, or file path was passed as an argument, review that target instead. Treat this diff as the review scope.

Phase 1 — Find candidates (3 correctness angles + 3 cleanup angles + 1 altitude angle, up to 6 each)

Run 7 independent finder angles via the Agent tool. Each surfaces up to 6 candidate findings with file, line, a one-line summary, and a concrete failure_scenario.

Angle A — line-by-line diff scan

Read every hunk in the diff, line by line. Then Read the enclosing function for each hunk — bugs in unchanged lines of a touched function are in scope (the PR re-exposes or fails to fix them). For every line ask: what input, state, timing, or platform makes this line wrong? Look for inverted/wrong conditions, off-by-one, null/undefined deref, missing await, falsy-zero checks, wrong-variable copy-paste, error swallowed in catch, unescaped regex metachars.

Angle B — removed-behavior auditor

For every line the diff DELETES or replaces, name the invariant or behavior it enforced, then search the new code for where that invariant is re-established. If you can't find it, that's a candidate: a removed guard, a dropped error path, a narrowed validation, a deleted test that was covering a real case.

Angle C — cross-file tracer

For each function the diff changes, find its callers (Grep for the symbol) and check whether the change breaks any call site: a new precondition, a changed return shape, a new exception, a timing/ordering dependency. Also check callees: does a parallel change in the same PR make a call unsafe?

Reuse

The angles above hunt for bugs; this one and the next two hunt for cleanup in the changed code. Flag new code that re-implements something the codebase already has — Grep shared/utility modules and files adjacent to the change, and name the existing helper to call instead.

Simplification

Flag unnecessary complexity the diff adds: redundant or derivable state, copy-paste with slight variation, deep nesting, dead code left behind. Name the simpler form that does the same job.

Efficiency

Flag wasted work the diff introduces: redundant computation or repeated I/O, independent operations run sequentially, blocking work added to startup or hot paths. Name the cheaper alternative.

Altitude

Check that each change is implemented at the right depth, not as a fragile bandaid. Special cases layered on shared infrastructure are a sign the fix isn't deep enough — prefer generalizing the underlying mechanism over adding special cases.

Cleanup and altitude candidates use the same file/line/summary shape; in failure_scenario, state the concrete cost (what is duplicated, wasted, or harder to maintain) instead of a crash. Correctness bugs always outrank cleanup and altitude findings when the output cap forces a cut.

Pass every candidate with a nameable failure scenario through — finders that silently drop half-believed candidates bypass the verify step and are the dominant cause of misses.

Phase 2 — Verify (1-vote, recall-biased)

Dedup near-duplicates (same defect, same location, same reason → keep one). For each remaining candidate, run one verifier via the Agent tool: give it the diff, the relevant file(s), and the candidate; it returns exactly one of CONFIRMED / PLAUSIBLE / REFUTED.

PLAUSIBLE by default — do not refute a candidate for being "speculative" or "depends on runtime state" when the state is realistic: concurrency races, nil/undefined on a rare-but-reachable path (error handler, cold cache, missing optional field), falsy-zero treated as missing, off-by-one on a boundary the code does not exclude, retry storms / partial failures, regex/allowlist that lost an anchor. These are PLAUSIBLE.

REFUTED only when constructible from the code: factually wrong (quote the actual line); provably impossible (type/constant/invariant — show it); already handled in this diff (cite the guard); or pure style with no observable effect.

Keep CONFIRMED and PLAUSIBLE. Drop REFUTED.

Output

Return findings as a JSON array of at most 10 objects:

[
  {
    "file": "path/to/file.ext",
    "line": 123,
    "summary": "one-sentence statement of the bug",
    "failure_scenario": "concrete inputs/state → wrong output/crash"
  }
]

Ranked most-severe first. If more than 10 survive, keep the 10 most severe. If nothing survives verification, return [].