feat(workflows): Kimi cross-model harness + eval gate (MiniMax parity) by narutomugens-byte · Pull Request #2049 · coleam00/Archon

narutomugens-byte · 2026-06-30T14:44:03Z

Summary

Problem: Kimi (kimi-for-coding via the Pi provider) had no first-class workflow coverage in the repo, while MiniMax shipped a full committed variant set. Cross-model Kimi work lived only as volatile, home-scoped personal workflows.
Why it matters: Brings Kimi to parity with the existing MiniMax precedent so the cheapest coding backend is usable from repo workflows (Opus plans/reviews, Kimi writes), and lands the whitepaper-derived pre-merge eval gate that the build harness pairs with.
What changed: Adds four experimental workflow definitions — a PIV build harness, a GitHub-issue-fix variant, an e2e connectivity smoke, and the agentic-eval-gate verification gate.
What did NOT change (scope boundary): No code, no bundled defaults, no DB schema. kimi-coding is already a registered Pi vendor (pi-vendor-map.generated.ts), so this is YAML-only. The pre-existing "$node.output" double-quote validator warning on the two Kimi bash nodes is matched to the MiniMax precedent on purpose and left for a separate family-wide cleanup.

UX Journey

Before

Operator wants Kimi to do a build/fix in a repo workflow
  → no repo workflow exists; only home-scoped personal YAMLs (volatile, per-machine)
  → MiniMax has the full set; Kimi does not  → no parity
  → no committed pre-merge eval gate to pair with a build loop

After

Operator                Archon                       Provider
────────                ──────                       ────────
run opus-plan-kimi-build ─▶ plan        (Opus/large) ─▶ writes file-level plan (no edits)
                           implement   (Pi/Kimi)     ─▶ *applies the edits*
                           review      (Opus/large)  ─▶ findings + fix-plan (no edits)
                           fix         (Pi/Kimi)     ─▶ applies fix-plan exactly
                           re-review   (Opus/large)  ─▶ ship / needs-work verdict

run agentic-eval-gate   ─▶ get-diff/run-checks (bash) ─▶ output-eval + trajectory-eval
                           ─▶ verdict: PASS only if output=pass AND trajectory=sound
run e2e-kimi-smoke      ─▶ hello/identify/json (Kimi) ─▶ assert routing via Pi session jsonl
run archon-fix-github-issue-kimi ─▶ classify→smoke-validate→…→PR→review→self-fix (all Kimi)

Architecture Diagram

Before

.archon/workflows/
├── maintainer/        repo-triage-minimax, maintainer-standup-minimax
├── experimental/      archon-fix-github-issue-minimax
└── test-workflows/    e2e-minimax-smoke, minimax-{isolate,seq,smoke}
                       (no Kimi equivalents; no committed eval gate)

After

.archon/workflows/
├── experimental/   [+] opus-plan-kimi-build.yaml         (PIV build harness)
│                   [+] archon-fix-github-issue-kimi.yaml (issue-fix variant)
│                   [+] agentic-eval-gate.yaml            (whitepaper pre-merge gate)
└── test-workflows/ [+] e2e-kimi-smoke.yaml               (connectivity smoke)
                        === routes to Pi provider → kimi-coding/kimi-for-coding
                        === eval gate routes to claude (large/medium/small tiers)

Connection inventory:

From	To	Status	Notes
Kimi workflows	Pi provider (`kimi-coding`)	new	vendor already registered; no code wiring
opus-plan-kimi-build (plan/review)	`large` tier → Claude Opus	new	reasoning nodes
opus-plan-kimi-build (implement/fix)	Pi `kimi-coding/kimi-for-coding`	new	code-writing nodes
agentic-eval-gate	Claude (medium/small tiers)	new	read-only judge; cheap-model routing

Label Snapshot

Risk: risk: low
Size: size: M
Scope: workflows
Module: workflows:definitions

Change Metadata

Change type: feature
Primary scope: workflows

Linked Issue

Closes #
Related # (MiniMax parity precedent)

Validation Evidence (required)

bun run cli validate workflows opus-plan-kimi-build --json
# → valid:true, errors:0, warnings:0
bun run cli validate workflows archon-fix-github-issue-kimi --json
# → valid:true, errors:0, warnings:1  (matches MiniMax precedent)
bun run cli validate workflows e2e-kimi-smoke --json
# → valid:true, errors:0, warnings:1  (matches MiniMax precedent)
bun run cli validate workflows agentic-eval-gate --json
# → valid:true, errors:0, warnings:0

Evidence provided: workflow validation output above (all four valid, 0 errors). The single
warning on the two Kimi bash nodes ("$node.output" double-quoting) is identical to the
committed MiniMax counterparts (archon-fix-github-issue-minimax, e2e-minimax-smoke) —
kept for faithful parity.
If any command is intentionally skipped: full bun run validate (type-check/lint/test/
check:bundled) skipped — this PR adds only repo .archon/workflows/ YAML, touches no
TypeScript, no bundled defaults, and no DB schema, so those gates are unaffected.

Security Impact (required)

New permissions/capabilities? No
New external network calls? No (uses the already-registered Pi kimi-coding vendor and the operator's existing local Pi/Claude credentials)
Secrets/tokens handling changed? No (workflows document the existing archon ai key set kimi-coding flow; no new secret paths)
File system access scope changed? No

Compatibility / Migration

Backward compatible? Yes (purely additive — four new workflow files)
Config/env changes? No
Database migration needed? No

Human Verification (required)

Verified scenarios: all four workflows pass archon validate workflows (0 errors). DAG
shapes, depends_on edges, when: gates, and provider/model routing reviewed against the
MiniMax precedent file-by-file.
Edge cases checked: confirmed kimi-coding is a registered Pi vendor in
pi-vendor-map.generated.ts; confirmed the validator warnings match the MiniMax variants;
confirmed the eval gate avoids the $BASE_BRANCH eager-resolution trap (resolves base in-shell).
What was not verified: live end-to-end runs against a real Kimi credential (requires the
operator's kimi-coding Pi auth; the smoke workflow exists precisely to do this on demand).

Side Effects / Blast Radius (required)

Affected subsystems/workflows: workflow discovery only (four new entries in /workflow list).
Potential unintended effects: none for existing workflows (additive, distinct names).
Guardrails/monitoring: archon validate workflows in CI; e2e-kimi-smoke for runtime routing proof.

Rollback Plan (required)

Fast rollback: git revert e479b57b, or delete the four YAML files — no state, no migration.
Feature flags/toggles: none needed (workflows are opt-in by invocation).
Observable failure symptoms: a Kimi workflow 401s → stale KIMI_API_KEY in ~/.archon/.env overriding Pi auth (documented in each file's header).

Risks and Mitigations

Risk: Pi has no native structured-output mode → output_format nodes rely on best-effort JSON extraction, which can be flakier than Claude/Codex.
- Mitigation: documented in the workflow headers; schema is still validated with up-to-3 re-asks; smoke test exercises the JSON path explicitly.
Risk: pre-existing "$node.output" double-quote warning carried over from the MiniMax precedent.
- Mitigation: intentionally matched for parity; flagged here for a separate family-wide cleanup so MiniMax and Kimi are fixed together.

Summary by CodeRabbit

New Features
- Added several new experimental workflows for agentic change verification, issue-fixing, and build planning/review.
- Introduced safer validation steps that check changes against specs, claims, and available local checks before approving results.
Tests
- Added an end-to-end smoke test for the Kimi provider, including connectivity, structured output, and session verification checks.

…iniMax parity)

Bring the home-scoped Kimi build harness into .archon/workflows/, following the committed MiniMax precedent. Code already supports Kimi via the Pi `kimi-coding` vendor; this is workflow-YAML only.

- experimental/opus-plan-kimi-build.yaml: cross-model PIV build loop (Opus plans/reviews via the `large` tier, Kimi implements/fixes via pi + kimi-coding/kimi-for-coding). Default isolation off `dev`.
- experimental/archon-fix-github-issue-kimi.yaml: faithful mirror of archon-fix-github-issue-minimax.yaml, all nodes on Kimi.
- test-workflows/e2e-kimi-smoke.yaml: connectivity/capability smoke mirroring e2e-minimax-smoke.yaml, asserts Pi session-log routing to kimi-coding.
- experimental/agentic-eval-gate.yaml: standalone output+trajectory pre-merge gate ("set the bar at the eval, not the demo"); pairs with the build harness.

All four pass `archon validate workflows`. The two non-blocking shell-quoting warnings are inherited verbatim from the MiniMax sources (parity, not new).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-06-30T14:44:14Z

📝 Walkthrough

Four new Archon workflow YAML files are added: agentic-eval-gate (a PASS/FAIL verification gate using diff analysis and two model evaluators), archon-fix-github-issue-kimi (a Kimi/Pi-provider fix pipeline with smoke-validate and conditional review agents), opus-plan-kimi-build (a five-node cross-model Opus↔Kimi PIV loop), and e2e-kimi-smoke (a smoke test validating Pi/Kimi provider routing via session JSONL files).

Changes

Agentic Evaluation Gate

Layer / File(s)	Summary
Workflow metadata and get-diff node `.archon/workflows/experimental/agentic-eval-gate.yaml`	Defines workflow-level metadata with read-only config and the `get-diff` node that computes a resilient diff across worktree/base/last/root modes.
run-checks, output-eval, and trajectory-eval nodes `.archon/workflows/experimental/agentic-eval-gate.yaml`	`run-checks` conditionally runs type-check/lint (not tests) via bun or npm; `output-eval` judges diff against spec; `trajectory-eval` judges whether verification steps were actually performed.
verdict node `.archon/workflows/experimental/agentic-eval-gate.yaml`	Synthesizes both evaluations into a single structured PASS/FAIL decision; PASS requires output verdict "pass" and soundness "sound".

Kimi fix-github-issue workflow

Layer / File(s)	Summary
Workflow metadata, provider config, and issue fetch/classify `.archon/workflows/experimental/archon-fix-github-issue-kimi.yaml`	Sets pi provider and `kimi-coding/kimi-for-coding` model; defines `extract-issue-number`, `fetch-issue`, and `classify` nodes for structured issue ingestion.
smoke-validate and conditional routing `.archon/workflows/experimental/archon-fix-github-issue-kimi.yaml`	`smoke-validate` checks cited file/line/symbol claims against the codebase; `web-research` is gated on external research need or inaccurate claims; routing goes to `investigate` (bugs) or `plan` (non-bugs).
bridge-artifacts, implement, validate, create-pr `.archon/workflows/experimental/archon-fix-github-issue-kimi.yaml`	Ensures `investigation.md` exists by copying `plan.md` if missing; runs `implement` and `validate`; creates a draft PR with template discovery and strict artifact-commit rules.
Review orchestration through report `.archon/workflows/experimental/archon-fix-github-issue-kimi.yaml`	`review-scope`/`review-classify` gate conditional agents (`error-handling`, `test-coverage`, `comment-quality`, `docs-impact`); `code-review` always runs; `synthesize`, `self-fix`, `simplify`, and `report` follow sequentially.

Opus-plan / Kimi-build cross-model PIV loop

Layer / File(s)	Summary
Workflow metadata and plan node `.archon/workflows/experimental/opus-plan-kimi-build.yaml`	Defines PIV metadata with `mutates_checkout: true` and the Opus `plan` node that produces a concrete file-level plan without editing files.
implement, review, fix, re-review nodes `.archon/workflows/experimental/opus-plan-kimi-build.yaml`	Kimi `implement` applies the plan; Opus `review` produces severity-ranked findings and a fix-plan; Kimi `fix` applies minimal-diff fixes; Opus `re-review` outputs a final ship/needs-work verdict.

e2e Kimi smoke test

Layer / File(s)	Summary
Smoke workflow: metadata, test nodes, and assert `.archon/workflows/test-workflows/e2e-kimi-smoke.yaml`	Defines pi provider/kimi model, `hello`/`identify`/`json` nodes, and an `assert` node that validates output content and confirms Pi session routing via `.jsonl` files modified in the last 10 minutes.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

coleam00/Archon#1438: The opus-plan-kimi-build.yaml workflow sets mutates_checkout: true, directly relying on the mutates_checkout field semantics that #1438 introduced in the executor/loader.

Poem

🐇 Hop hop, four workflows land in the warren today,
Kimi and Opus take turns in the fray,
A gate checks your diff with a PASS or a FAIL,
Smoke tests sniff sessions down each JSONL trail,
The rabbit approves — let the pipelines prevail! 🌿

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title is concise and accurately summarizes the new Kimi workflow harness and eval gate.
Description check	✅ Passed	The description covers the required template sections and is mostly complete, including diagrams, validation, and risks.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


…e-check/lint

have() already wraps $1 in quotes when grepping package.json, so calling have '"type-check"' / have '"lint"' searched for ""type-check"" (doubled quotes) and never matched: type-check and lint always reported "skipped", blinding the trajectory eval to the very checks it must weigh.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
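The doubled-quote failure described in this commit message can be reproduced in isolation. The sketch below is a hypothetical reconstruction: the real `have()` in agentic-eval-gate.yaml is not shown on this page, so its body, the `PKG` variable, and the sample package.json are assumptions made only for illustration.

```shell
# Hypothetical stand-in for the workflow's package.json.
PKG="$(mktemp)"
cat > "$PKG" <<'JSON'
{
  "scripts": {
    "type-check": "tsc --noEmit",
    "lint": "eslint ."
  }
}
JSON

# have NAME: succeeds when the file contains "NAME" as a quoted string.
# The helper adds the surrounding double quotes itself (assumed shape).
have() {
  grep -q "\"$1\"" "$PKG"
}

# Buggy call: the caller quotes again, so grep searches for ""lint"".
if have '"lint"'; then BUGGY=found; else BUGGY=skipped; fi

# Fixed call: pass the bare script name and let have() add the quotes.
if have 'lint'; then FIXED=found; else FIXED=skipped; fi

echo "buggy=$BUGGY fixed=$FIXED"   # prints: buggy=skipped fixed=found
rm -f "$PKG"
```

Keeping the quoting in exactly one place (inside the helper) is what makes the call sites hard to get wrong.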

coderabbitai

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.archon/workflows/experimental/agentic-eval-gate.yaml:
- Around line 66-74: The fallback in the eval gate logic is too aggressive: when
origin/$BASE exists but HEAD has no commits ahead of it, the current MODE=last
path in the shell block still evaluates the previous commit instead of treating
the branch as having no changes. Update the decision flow in the same
conditional chain so the branch only selects MODE=base when there are commits
ahead of origin/$BASE, and otherwise reports no changes to evaluate instead of
falling back to the last commit; keep the fix localized around the STATUS_PORC,
git rev-parse/git log check, and the MODE/ LABEL assignments.
- Around line 66-95: The worktree path in the diff-gating logic drops newly
untracked files because `run_diff()` uses `git diff ... HEAD`, so additions can
be reported as no changes. Update the `MODE=worktree` branch and the `run_diff`
helper in `agentic-eval-gate.yaml` so worktree mode includes untracked paths
alongside tracked changes, and make `--name-only` report those files too. Keep
the existing `MODE`, `LABEL`, and `run_diff` structure, but ensure the worktree
comparison captures all files visible from `git status --porcelain`.

In @.archon/workflows/experimental/archon-fix-github-issue-kimi.yaml:
- Around line 74-83: The `fetch-issue` step is interpolating model output into
the bash script before sanitization, which can allow command execution; update
the workflow so `extract-issue-number.output` is treated as data only and never
rendered directly into shell evaluation. In `fetch-issue`, read the output
through a safe quoting/escaping path or environment variable, then perform the
numeric extraction and validation before calling `gh issue view`, keeping the
hardening inside this step and preserving the existing `ISSUE_NUM` check.

In @.archon/workflows/experimental/opus-plan-kimi-build.yaml:
- Around line 154-163: Pass the original plan context into both the
fresh-context fix and re-review nodes so they can see the plan-derived
verification steps; update the prompt wiring around the reviewer output handling
in opus-plan-kimi-build.yaml to include the original plan’s “Done
means”/acceptance commands alongside $review.output and $fix.output. Keep the
existing NO FIXES NEEDED / ship guard intact, but ensure the fresh-context nodes
can reliably rerun the intended build/lint/test commands and validate against
the original criteria.

In @.archon/workflows/test-workflows/e2e-kimi-smoke.yaml:
- Around line 104-126: The session validation in the Kimi smoke workflow is too
broad because it scans any recent Pi session and can be satisfied by unrelated
concurrent runs. Update the logic around the recent_sessions/matched check to
capture a baseline before the hello step starts, then only consider session
files created or modified after that marker. Use the existing session-log grep
flow to verify provider=kimi-coding and modelId=kimi-for-coding, but restrict it
to files correlated with this run rather than any session touched in the last 10
minutes.


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 17febcd7-186b-42fe-bad0-413d5d9b1645

📥 Commits

Reviewing files that changed from the base of the PR and between 59bbd00 and 043185a.

📒 Files selected for processing (4)

.archon/workflows/experimental/agentic-eval-gate.yaml
.archon/workflows/experimental/archon-fix-github-issue-kimi.yaml
.archon/workflows/experimental/opus-plan-kimi-build.yaml
.archon/workflows/test-workflows/e2e-kimi-smoke.yaml

coderabbitai · 2026-06-30T15:47:40Z

+      if [ -n "$STATUS_PORC" ]; then
+        MODE=worktree; LABEL="(uncommitted working-tree changes vs HEAD)"
+      elif git rev-parse --verify --quiet "origin/$BASE" >/dev/null && [ -n "$(git log --oneline "origin/$BASE..HEAD" 2>/dev/null)" ]; then
+        MODE=base; LABEL="(this branch vs origin/$BASE)"
+      elif [ -n "$HAS_PARENT" ]; then
+        MODE=last; LABEL="(fallback: last commit)"
+      else
+        MODE=root; LABEL="(root commit: full initial tree vs empty tree)"
+      fi


🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Don't fall back to HEAD~1..HEAD when the branch is clean and not ahead of base.

If origin/$BASE exists and HEAD has no commits ahead of it, Lines 68-71 still switch to MODE=last. That makes the gate evaluate the previous commit instead of reporting “no changes to evaluate,” so a clean synced branch can fail on historical work unrelated to the current diff.
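A minimal sketch of the decision flow this comment asks for, written as a pure function so it can be exercised without a repository. The function name `select_mode`, its argument order, and the extra `none` mode are assumptions for illustration, not the workflow's actual code.

```shell
# select_mode STATUS_PORC BASE_EXISTS AHEAD_COUNT HAS_PARENT
#   $1  output of `git status --porcelain` (empty when clean)
#   $2  "yes" when the origin/$BASE ref resolves, anything else otherwise
#   $3  number of commits HEAD is ahead of origin/$BASE
#   $4  non-empty when HEAD has a parent commit
select_mode() {
  if [ -n "$1" ]; then
    echo worktree
  elif [ "$2" = yes ]; then
    # The base is known: either there is branch work, or nothing to evaluate.
    # It never falls through to the last-commit fallback.
    if [ "$3" -gt 0 ]; then echo base; else echo none; fi
  elif [ -n "$4" ]; then
    echo last
  else
    echo root
  fi
}

select_mode ""        yes 0 1    # clean and synced with base -> none
select_mode " M a.ts" yes 0 1    # uncommitted edits          -> worktree
select_mode ""        yes 3 1    # three commits ahead        -> base
select_mode ""        no  0 1    # no origin/$BASE ref        -> last
select_mode ""        no  0 ""   # root commit only           -> root
```

In the workflow, the inputs could plausibly be gathered with `git status --porcelain`, `git rev-parse --verify --quiet "origin/$BASE"`, and `git rev-list --count "origin/$BASE..HEAD"`. A `none` result would then print the existing "NO CHANGES DETECTED" message instead of running a diff.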


coderabbitai · 2026-06-30T15:47:40Z

+      if [ -n "$STATUS_PORC" ]; then
+        MODE=worktree; LABEL="(uncommitted working-tree changes vs HEAD)"
+      elif git rev-parse --verify --quiet "origin/$BASE" >/dev/null && [ -n "$(git log --oneline "origin/$BASE..HEAD" 2>/dev/null)" ]; then
+        MODE=base; LABEL="(this branch vs origin/$BASE)"
+      elif [ -n "$HAS_PARENT" ]; then
+        MODE=last; LABEL="(fallback: last commit)"
+      else
+        MODE=root; LABEL="(root commit: full initial tree vs empty tree)"
+      fi
+
+      run_diff() {
+        case "$MODE" in
+          worktree) git diff "$@" HEAD ;;
+          base)     git diff "$@" "origin/$BASE...HEAD" ;;
+          last)     git diff "$@" HEAD~1..HEAD ;;
+          root)     git diff "$@" "$EMPTY_TREE" HEAD ;;
+        esac
+      }
+
+      echo "=== DIFF ==="
+      echo "$LABEL"
+      DIFF_OUT="$(run_diff)"
+      if [ -n "$DIFF_OUT" ]; then
+        echo "$DIFF_OUT"
+      else
+        echo "NO CHANGES DETECTED - working tree clean and nothing to compare against."
+        echo "Evaluators: state there is nothing to evaluate; do not invent findings."
+      fi
+      echo "=== CHANGED FILES ==="
+      run_diff --name-only


🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Include untracked files in worktree mode.

git status --porcelain will enter MODE=worktree for newly added files, but git diff HEAD and git diff --name-only HEAD omit untracked paths. A diff that only adds new files can therefore show up as “NO CHANGES DETECTED” or miss files the evaluators need to inspect.
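One possible way to make worktree mode see untracked files is sketched below. Both function names are hypothetical, and the sketch assumes file names without embedded newlines; it is not the fix that landed in the workflow.

```shell
# changed_files_worktree: tracked files that differ from HEAD, plus
# untracked files that are not ignored, de-duplicated.
changed_files_worktree() {
  {
    git diff --name-only HEAD
    git ls-files --others --exclude-standard
  } | sort -u
}

# untracked_as_diff: render every untracked file as an "added file" patch.
# `git diff --no-index` exits 1 when its inputs differ, so that status is
# deliberately ignored here.
untracked_as_diff() {
  git ls-files --others --exclude-standard |
    while IFS= read -r f; do
      git diff --no-index -- /dev/null "$f" || true
    done
}
```

Under these assumptions, the worktree branch of `run_diff` would emit `git diff HEAD` followed by `untracked_as_diff`, and the `--name-only` call would use `changed_files_worktree`. An alternative with the same effect is `git add -N .` before diffing, at the cost of touching the index.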


coderabbitai · 2026-06-30T15:47:40Z

+  - id: fetch-issue
+    bash: |
+      # Strip quotes, whitespace, markdown backticks from AI output
+      ISSUE_NUM=$(echo "$extract-issue-number.output" | tr -d "'\"\`\n " | grep -oE '[0-9]+' | head -1)
+      if [ -z "$ISSUE_NUM" ]; then
+        echo "Failed to extract issue number from: $extract-issue-number.output" >&2
+        exit 1
+      fi
+      gh issue view "$ISSUE_NUM" --json title,body,labels,comments,state,url,author
+    depends_on: [extract-issue-number]


🔒 Security & Privacy | 🔴 Critical | ⚡ Quick win

Avoid interpolating model output directly into bash.

Line 77 sanitizes after shell evaluation. If $extract-issue-number.output is rendered into the script, output like $(...) or backticks can execute before tr/grep, turning a malformed model response into runner command execution.

🛡️ Proposed hardening

   - id: fetch-issue
     bash: |
       # Strip quotes, whitespace, markdown backticks from AI output
-      ISSUE_NUM=$(echo "$extract-issue-number.output" | tr -d "'\"\`\n " | grep -oE '[0-9]+' | head -1)
+      ISSUE_RAW=$(cat <<'__ARCHON_ISSUE_NUMBER_OUTPUT__'
+$extract-issue-number.output
+__ARCHON_ISSUE_NUMBER_OUTPUT__
+      )
+      ISSUE_NUM=$(printf '%s\n' "$ISSUE_RAW" | tr -d "'\"\`\n " | grep -oE '[0-9]+' | head -1)
       if [ -z "$ISSUE_NUM" ]; then
-        echo "Failed to extract issue number from: $extract-issue-number.output" >&2
+        printf 'Failed to extract issue number from: %s\n' "$ISSUE_RAW" >&2
         exit 1
       fi
       gh issue view "$ISSUE_NUM" --json title,body,labels,comments,state,url,author
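The extraction step can also be kept independent of how the untrusted text is delivered by reading it from stdin, so it is only ever data on a pipe. This is a sketch under that assumption; the function name `extract_issue_number` is hypothetical, and whether Archon can hand a node output to a bash node via stdin or an environment variable is not stated on this page.

```shell
# extract_issue_number: read untrusted text on stdin and print the first run
# of digits, or return 1 when there is none. The input is never spliced into
# the script source, so $(...) or backticks in it are not evaluated.
extract_issue_number() {
  num="$(tr -d "'\"\`\n " | grep -oE '[0-9]+' | head -n 1)"
  [ -n "$num" ] || return 1
  printf '%s\n' "$num"
}

# A hostile "model output": single-quoted here, so it is plain text.
printf '%s' 'issue `#42` $(echo pwned)' | extract_issue_number   # prints 42
```

If the quoted heredoc from the proposed hardening is used as the delivery path, note that a heredoc ends at the first line equal to its delimiter, so output containing that exact delimiter line would terminate it early. A long, unlikely delimiter reduces that risk but does not remove it.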


coderabbitai · 2026-06-30T15:47:40Z

+      ## Reviewer output (findings + fix plan + verdict)
+
+      $review.output
+
+      ## Instructions
+
+      1. If the Fix Plan section is `NO FIXES NEEDED` (or the verdict is `ship` with no
+         fix items), make NO changes and say so — stop here.
+      2. Otherwise apply each fix-plan item precisely. Read each file before editing.
+      3. Re-run any build/lint/test command and confirm it passes after your edits.


🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Pass the original plan into the fresh-context fix and re-review nodes.

Both nodes reference plan-derived verification, but context: fresh means they only see $review.output / $fix.output. If the review does not restate the plan’s “Done means” commands, Kimi cannot reliably rerun them, and Opus cannot validate the final state against the original acceptance criteria.

Proposed prompt context fix

       ## Reviewer output (findings + fix plan + verdict)

       $review.output
+
+      ## Original Plan
+
+      $plan.output

       ## Instructions
@@
       ## Your earlier review (findings + fix plan)

       $review.output

       ## What the implementer reported doing

       $fix.output
+
+      ## Original Plan
+
+      $plan.output

Also applies to: 178-191


coderabbitai · 2026-06-30T15:47:40Z

+      recent_sessions=$(find "$HOME/.pi/agent/sessions" -name '*.jsonl' -mmin -10 -print 2>/dev/null)
+      if [ -z "$recent_sessions" ]; then
+        echo "FAIL: no Pi session jsonl modified in the last 10 minutes"
+        exit 1
+      fi
+
+      matched=""
+      while IFS= read -r session; do
+        # Two separate greps for order-independence — JSON field ordering
+        # isn't part of Pi's contract, so a single regex with `.*` between
+        # the two fields would silently false-FAIL if Pi ever reorders.
+        if grep -q '"provider":"kimi-coding"' "$session" \
+           && grep -q '"modelId":"kimi-for-coding"' "$session"; then
+          matched="$session"
+          break
+        fi
+      done <<< "$recent_sessions"
+
+      if [ -n "$matched" ]; then
+        echo "PASS: Pi session log confirms provider=kimi-coding, modelId=kimi-for-coding"
+        echo "      session: $matched"
+      else
+        echo "FAIL: no recent Pi session log confirmed kimi routing — possible misroute"


🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Correlate the session log to this run, not any recent Kimi call.

This check passes on any Kimi session touched in the last 10 minutes, so an unrelated or concurrent kimi-coding/kimi-for-coding run can make the smoke test PASS even if these nodes were misrouted. Capture a baseline before hello starts (timestamp or file list) and only inspect session files created/updated after that marker.

Suggested direction

+  - id: mark-session-baseline
+    bash: |
+      baseline="$(mktemp)"
+      touch "$baseline"
+      echo "{\"baseline\":\"$baseline\"}"
+    output_format:
+      type: object
+      properties:
+        baseline:
+          type: string
+      required: [baseline]
+
   - id: hello
+    depends_on: [mark-session-baseline]
     prompt: 'What is 2+2? Answer with just the number, nothing else.'
     allowed_tools: []
     effort: low
     idle_timeout: 60000
 ...
-  - id: assert
-    depends_on: [hello, identify, json]
+  - id: assert
+    depends_on: [mark-session-baseline, hello, identify, json]
     bash: |
+      baseline="$mark-session-baseline.output.baseline"
 ...
-      recent_sessions=$(find "$HOME/.pi/agent/sessions" -name '*.jsonl' -mmin -10 -print 2>/dev/null)
+      recent_sessions=$(find "$HOME/.pi/agent/sessions" -type f -name '*.jsonl' -newer "$baseline" -print 2>/dev/null)
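The marker-file technique in this suggestion can be tried outside the workflow. The sketch below uses a scratch directory and explicit timestamps, so the result does not depend on filesystem timestamp resolution. The function name `sessions_since` and the file names are hypothetical.

```shell
# sessions_since BASELINE DIR: list *.jsonl files under DIR whose mtime is
# strictly newer than the BASELINE marker file.
sessions_since() {
  find "$2" -type f -name '*.jsonl' -newer "$1" -print 2>/dev/null
}

work="$(mktemp -d)"
touch -t 202001010000 "$work/old.jsonl"   # session that predates the run
touch -t 202101010000 "$work/baseline"    # marker taken before `hello`
touch "$work/new.jsonl"                   # session written during the run
touch "$work/new.txt"                     # wrong extension, never matched

sessions_since "$work/baseline" "$work"   # prints only the path of new.jsonl
```

Two caveats apply. A session file written in the same timestamp tick as the marker is not strictly newer and would be missed, and an unrelated Kimi run that starts after the marker would still match. The baseline therefore narrows the window, but does not by itself prove the session belongs to this run.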


narutomugens-byte marked this pull request as ready for review June 30, 2026 15:34

coderabbitai Bot reviewed Jun 30, 2026

narutomugens-byte wants to merge 2 commits into coleam00:dev from narutomugens-byte:archon/thread-f8b06149
