fix(workflows): restore interactive-loop cross-gate session continuity (reverts #1923) by ianstantiate · Pull Request #2005 · coleam00/Archon

ianstantiate · 2026-06-16T03:46:52Z

Summary

Problem: #1923 forced a fresh Claude session on the first iteration after every interactive-loop approval gate, so the agent lost all memory of its prior turns across the gate (tracked in #2004).
Why it matters: It breaks the single most valuable use of interactive loops — a multi-turn conversation / interview / iterative refinement that a human steers across gates. Every post-gate turn started with no memory of the prior turns, seeing only $LOOP_USER_INPUT.
What changed: Removed the (isLoopResume && i === startIteration) term from needsFreshSession in dag-executor.ts, restoring loop.fresh_context || i === 1; reinstated the resume test assertion (sessionArg === 'loop-session-1'); removed #1923's regression test that encoded the fresh-session behavior; reverted the loop-nodes.md doc note.
What did NOT change (scope boundary): #1291's fail-loud isError handling, fresh_context: true loops, non-interactive loops, and persist_session loops are all untouched. This PR does not attempt to diagnose or fix the original #1208 crash — that is separate work.
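The condition change described above can be sketched as two pure functions. This is a minimal illustration, not the code in dag-executor.ts: the `LoopConfig` and `IterationState` shapes and both function names are hypothetical, and only the boolean expressions come from the PR description.

```typescript
// Hypothetical, simplified shapes; the real executeLoopNode carries more state.
interface LoopConfig {
  fresh_context?: boolean;
}

interface IterationState {
  i: number; // 1-based iteration counter
  startIteration: number; // first iteration of this run (> 1 when resuming after a gate)
  isLoopResume: boolean;
}

// Behavior under #1923: the first post-gate iteration is always fresh.
function needsFreshSessionBefore(loop: LoopConfig, s: IterationState): boolean {
  return (
    Boolean(loop.fresh_context) ||
    s.i === 1 ||
    (s.isLoopResume && s.i === s.startIteration)
  );
}

// Restored behavior: only fresh_context or the very first iteration forces a fresh session.
function needsFreshSessionAfter(loop: LoopConfig, s: IterationState): boolean {
  return Boolean(loop.fresh_context) || s.i === 1;
}

// Resuming at iteration 2 after an approval gate.
const resumed: IterationState = { i: 2, startIteration: 2, isLoopResume: true };
console.log(needsFreshSessionBefore({}, resumed)); // true
console.log(needsFreshSessionAfter({}, resumed)); // false
```

With `fresh_context: true` both versions return true on every iteration, which is why that path is described as untouched.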

UX Journey

Before

User                     Archon (interactive loop)          Claude SDK
────                     ─────────────────────────          ──────────
approve gate "feedback" ▶ resume at iteration 2
                          needsFreshSession = TRUE           (isLoopResume && i === startIteration)
                          resumeSessionId = undefined ──────▶ FRESH session — no memory of iter 1
sees context-free reply ◀─ agent only has $LOOP_USER_INPUT

After

User                     Archon (interactive loop)          Claude SDK
────                     ─────────────────────────          ──────────
approve gate "feedback" ▶ resume at iteration 2
                          [needsFreshSession = false]        (fresh_context=false, i>1)
                          [resumeSessionId = gate session] ─▶ RESUMES the iteration-1 session
sees context-aware reply◀─ agent retains prior turns + the new $LOOP_USER_INPUT

Architecture Diagram

Before

dag-executor.ts :: executeLoopNode  (isLoopResume=true, startIteration=2)
  needsFreshSession = fresh_context || i===1 || (isLoopResume && i===startIteration)  ← TRUE on resume
  resumeSessionId   = undefined
  aiClient.sendQuery(prompt, cwd, undefined, opts)   ← fresh session, prior context lost

After

dag-executor.ts :: executeLoopNode  (isLoopResume=true, startIteration=2)
  [~] needsFreshSession = fresh_context || i===1        ← false on resume
  resumeSessionId       = currentSessionId (gate sessionId)
  aiClient.sendQuery(prompt, cwd, gateSessionId, opts)  ← threads the stored session

Connection inventory:

| From | To | Status | Notes |
|---|---|---|---|
| `dag-executor.ts::executeLoopNode` | `aiClient.sendQuery` | modified | `resumeSessionId` again threads the stored gate session on the first resumed iteration (was forced to `undefined` by #1923) |
| `dag-executor.ts::executeLoopNode` | `loopGateMeta.sessionId` | modified | now passed through as `resumeSessionId` again (was read then discarded) |
| loop iteration result handler | #1291 `isError` fail-loud throw | unchanged | a genuine resume failure still throws loudly with `errors[]` |
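The session threading in the "After" diagram can be sketched with a recording stub in place of the provider client. Everything here is an assumption made for illustration: `FakeClient`, `makeFakeClient`, and `runIteration` are hypothetical names, and only the argument order (prompt, cwd, resumeSessionId, opts) mirrors the diagram.

```typescript
// Hypothetical stand-in for the provider client; it only records the resume argument.
interface FakeClient {
  calls: Array<string | undefined>;
  sendQuery(
    prompt: string,
    cwd: string,
    resumeSessionId: string | undefined,
    opts: object,
  ): void;
}

function makeFakeClient(): FakeClient {
  const calls: Array<string | undefined> = [];
  return {
    calls,
    sendQuery(_prompt, _cwd, resumeSessionId, _opts) {
      calls.push(resumeSessionId); // which session the executor asked to resume
    },
  };
}

// Simplified iteration: decide freshness, then thread the stored session id.
function runIteration(
  client: FakeClient,
  i: number,
  freshContext: boolean,
  currentSessionId: string | undefined,
): void {
  const needsFreshSession = freshContext || i === 1;
  const resumeSessionId = needsFreshSession ? undefined : currentSessionId;
  client.sendQuery("prompt", "/tmp/worktree", resumeSessionId, {});
}

const client = makeFakeClient();
runIteration(client, 2, false, "loop-session-1"); // first post-gate iteration
runIteration(client, 2, true, "loop-session-1"); // same iteration with fresh_context: true
console.log(client.calls); // [ 'loop-session-1', undefined ]
```

The first call corresponds to the restored assertion (sessionArg === 'loop-session-1'); the second shows that fresh_context: true still discards the stored session.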

Label Snapshot

Risk: risk: low
Size: size: XS
Scope: workflows (+ docs, tests)
Module: workflows:dag-executor (executeLoopNode)

Change Metadata

Change type: bug (revert of a regression)
Primary scope: workflows

Linked Issue

Closes #2004

Validation Evidence (required)

bun run validate was run on this branch. Per-step results:

bun run check:bundled         # ✅ pass
bun run check:bundled-skill   # ✅ pass
bun run check:bundled-schema  # ✅ pass
bun run check:pi-vendor-map   # ✅ pass (OK)
bun run type-check            # ✅ exit 0 — all 10 packages
bun run lint --max-warnings 0 # ✅ exit 0
bun run format:check          # ✅ changed files clean (see note)
bun run test                  # ✅ 5144 pass / 0 fail — all packages

Every bun run validate step passes. The full per-package test suite is 5144 pass / 0 fail.
format:check note: the tree-wide prettier --check . only warns on untracked local files that are not part of this PR and don't exist in a clean checkout/CI. The 3 files this PR changes are Prettier-clean (bun x prettier --check on them → "All matched files use Prettier code style!"), and the pre-commit hook (lint-staged: prettier + eslint) ran clean on them at commit time.
The revert diff is the exact inverse of #1923 on dag-executor.ts and loop-nodes.md, and was verified byte-identical to a true git revert of #1923.

Security Impact (required)

New permissions/capabilities? No
New external network calls? No
Secrets/tokens handling changed? No
File system access scope changed? No
This is a behavior revert in the workflow loop executor; no security surface is touched.

Compatibility / Migration

Backward compatible? Yes — restores the behavior shipped in the latest tagged release; the reverted behavior was dev-only and unreleased.
Config/env changes? No
Database migration needed? No

Human Verification (required)

Verified scenarios: confirmed the change is the exact inverse of #1923 (needsFreshSession term removed, test assertion restored to 'loop-session-1', #1923 regression test removed, docs note reverted); type-check/lint/loop-test green; #1291's "fails loudly on error_during_execution" test still passes.
Edge cases checked: fresh_context: true and non-interactive loop paths are logically unchanged (their needsFreshSession terms are untouched); #1291 fail-loud path unchanged.
What was not verified: a live end-to-end interactive-loop resume against a real provider (no live run was performed). The full bun run validate and per-package test suite (5144 pass / 0 fail) were run — see Validation Evidence.

Side Effects / Blast Radius (required)

Affected subsystems/workflows: workflow-engine loop execution only (executeLoopNode); any interactive: true loop that resumes after a gate.
Potential unintended effects: environments that genuinely hit the original #1208 error_during_execution will once again attempt a session resume on the first post-gate iteration. With #1291 in place this now fails loudly with the real SDK errors[] rather than being masked, which is the intended behavior and provides the data needed to actually diagnose #1208.
Guardrails/monitoring: #1291's loop_node.iteration_sdk_error log + loop_iteration_failed event surface any genuine resume failure with the SDK error strings.
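A rough sketch of the fail-loud guard this section relies on is shown below. It is not #1291's implementation: the `LoopMessage` and `LoopEvent` shapes, the `errorSubtype` field, and `consumeIteration` are hypothetical, the stream is modeled as a synchronous iterable for brevity, and the error string in the usage example is made up. Only the isError flag, the errors[] array, and the loop_iteration_failed event name come from the PR text.

```typescript
// Hypothetical message shapes; the real SDK types are richer.
type LoopMessage =
  | { type: "assistant"; content: string }
  | {
      type: "result";
      sessionId?: string;
      isError?: boolean;
      errorSubtype?: string;
      errors?: string[];
    };

interface LoopEvent {
  name: "loop_iteration_failed";
  iteration: number;
  errors: string[];
}

// Consumes one iteration's messages and throws on an isError result
// instead of treating it as a normal completion.
function consumeIteration(
  messages: Iterable<LoopMessage>,
  iteration: number,
  emit: (event: LoopEvent) => void,
): { output: string; sessionId?: string } {
  let output = "";
  let sessionId: string | undefined;
  for (const msg of messages) {
    if (msg.type === "assistant") {
      output += msg.content;
    } else if (msg.isError) {
      const errors = msg.errors ?? [];
      emit({ name: "loop_iteration_failed", iteration, errors });
      throw new Error(
        `Loop iteration ${iteration} failed (${msg.errorSubtype ?? "unknown"}): ${errors.join("; ")}`,
      );
    } else {
      sessionId = msg.sessionId; // remember the session for the next iteration
    }
  }
  return { output, sessionId };
}

// Usage: a resume attempt that yields an error result surfaces loudly.
const events: LoopEvent[] = [];
try {
  consumeIteration(
    [
      {
        type: "result",
        isError: true,
        errorSubtype: "error_during_execution",
        errors: ["session not found"], // made-up error string
      },
    ],
    2,
    (e) => events.push(e),
  );
} catch (err) {
  console.log((err as Error).message);
}
```

The point of the design is that the thrown message carries the SDK's own error strings, so a resurfacing #1208 failure arrives with diagnostic detail attached.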

Rollback Plan (required)

Fast rollback: revert this PR's merge commit (3-file change, +5/−93), which re-applies #1923's fresh-on-resume behavior.
Feature flags / toggles: none (intentionally no new config; an opt-in flag was considered and dropped per YAGNI since the change is unreleased).
Observable failure symptoms: if restoring continuity surfaced the unconfirmed #1208 crash for some environment, it would appear as a loud error_during_execution failure on the first post-gate iteration (with errors[] detail), not a silent regression.

Risks and Mitigations

Risk: the original, unconfirmed #1208 error_during_execution (suspected environmental: Docker/VPS, Slack batch-streaming, CLAUDE_CODE_OAUTH_TOKEN refresh, MCP/tool/network) could resurface for affected users once session resume is restored.
- Mitigation: #1291's fail-loud handling now surfaces the actual SDK error strings so the real cause can be identified; fixing #1208 is tracked separately and intentionally out of scope here. As a stopgap, affected loops can set fresh_context: true (every iteration fresh) until #1208 is properly diagnosed.

Summary by CodeRabbit

Bug Fixes
- Fixed loop node session management during interactive resume: session IDs are now properly preserved when resuming iterations, unless fresh_context is explicitly enabled.
Documentation
- Updated loop node guide to clarify fresh_context behavior during iterations.

Reverts the change from coleam00#1923, which forced a fresh Claude session on the first iteration after every interactive-loop approval gate by adding `(isLoopResume && i === startIteration)` to `needsFreshSession` in dag-executor.ts. That broke cross-gate conversation continuity: every turn after a human gate began a context-free session carrying only $LOOP_USER_INPUT (with $LOOP_PREV_OUTPUT empty), making multi-turn interactive loops (interviews, iterative refinement) impossible. Restores the released behavior `needsFreshSession = loop.fresh_context || i === 1`, reinstates the test assertion guarding cross-gate continuity (sessionArg === 'loop-session-1'), and removes coleam00#1923's regression test that encoded the fresh-session behavior. coleam00#1291's fail-loud isError handling is left untouched. Confirming and fixing the original coleam00#1208 crash is separate work. Closes coleam00#2004

coderabbitai · 2026-06-16T03:47:06Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 25855e56-a5ef-45a8-a9aa-e9e1cd000398

📥 Commits

Reviewing files that changed from the base of the PR and between e77a338 and d6cb519.

📒 Files selected for processing (3)

packages/docs-web/src/content/docs/guides/loop-nodes.md
packages/workflows/src/dag-executor.test.ts
packages/workflows/src/dag-executor.ts

📝 Walkthrough

Reverts a regression introduced in PR #1923: the needsFreshSession expression in executeLoopNode is simplified to loop.fresh_context || i === 1, removing the clause that forced a fresh AI session on the first iteration after an interactive loop resume. The test is updated to assert the stored gate session id is reused, and the corresponding docs note is trimmed.

Changes

Interactive Loop Session Threading Regression Fix

| Layer / File(s) | Summary |
|---|---|
| Loop session freshness logic and test `packages/workflows/src/dag-executor.ts`, `packages/workflows/src/dag-executor.test.ts` | `needsFreshSession` is changed to `loop.fresh_context \|\| i === 1`, dropping the `isLoopResume && i === startIteration` clause. The test assertion for the resumed iteration is updated to expect the stored gate session id (`loop-session-1`) instead of `undefined`. |
| `fresh_context` docs cleanup `packages/docs-web/src/content/docs/guides/loop-nodes.md` | Removes the sentence describing the always-fresh first-resumed-iteration behavior and the `$LOOP_USER_INPUT` note, matching the restored runtime behavior. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Possibly related PRs

coleam00/Archon#1923: Directly introduced the isLoopResume && i === startIteration clause in executeLoopNode that this PR reverts, along with the docs wording and test expectations being changed here.
coleam00/Archon#1367: Also modifies executeLoopNode interactive loop resume first-iteration behavior in dag-executor.ts/dag-executor.test.ts, specifically the $LOOP_PREV_OUTPUT substitution on the same resumed iteration path.

Poem

🐇 Hop, hop, the session lives on,
No fresh start when the gate is gone!
The context flows from turn to turn,
A regression reversed — lesson learned.
The loop remembers every word I've said,
No more context lost, no blank slate dread! 🎉

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately and specifically describes the main change: reverting PR `#1923` to restore session continuity in interactive loops after approval gates. |
| Description check | ✅ Passed | The description comprehensively covers all template sections: problem, impact, changes made, scope boundaries, UX journey, architecture diagrams, metadata, validation evidence, security, compatibility, human verification, side effects, rollback plan, and risks. |
| Linked Issues check | ✅ Passed | The PR fully satisfies all four coding objectives from issue `#2004`: removes the `(isLoopResume && i === startIteration)` clause, restores session continuity with stored gate session IDs, preserves the original behavior with `fresh_context` handling intact, and clearly maintains scope boundaries. |
| Out of Scope Changes check | ✅ Passed | All three file changes are directly in-scope: dag-executor.ts session logic revert, dag-executor.test.ts regression test removal, and loop-nodes.md documentation revert. No unrelated changes detected. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

Wirasm · 2026-06-26T12:51:41Z

Thanks @ianstantiate — good catch, and the right call. #1923 traded away the single most valuable interactive-loop behavior (cross-gate session continuity) to "fix" #1208, and this restores it cleanly. It also lines up with the original #1208 investigation: error_during_execution arrives as a yielded isError: true result (environmental — Docker/VPS, OAuth refresh, etc.), not the thrown stale-session rejection — so forcing a fresh session on resume was never actually addressing the root cause.

I ran a multi-agent review over the diff. Summary below.

✅ Confirmed solid

The revert is logically correct and complete — condition back to loop.fresh_context || i === 1, no orphaned references, and the restored 'loop-session-1' assertion matches the logic.
#1291's fail-loud path is untouched and verified: for Claude, a stale gate session → error_during_execution → isError: true → throws → loop_iteration_failed. Loud, not silent, so the revert is safe for the provider #1208 was reported on.
Deleting #1923's regression test is correct (it encoded the now-abandoned contract), and the loop-nodes.md revert is accurate.

Recommended hardening (non-blocking — small, just to lock it in)

I1: the 2-call integration test never asserts the resumed session id. One-liner at that test: expect(resumeSessionArg).toBe('loop-session-1').
I2: the re-exposed #1208 path isn't tested on the resumed-interactive path (the existing loud-fail test only covers a non-interactive fresh run). A ~45-line test (resumed interactive loop + stale session → passes the stale id, fails loudly, one iteration only) would document the restored contract and guard against silent regression.
I3: the bare // Session threading comment no longer records why resume threads the stored session (the #1923 comment explaining the opposite was removed). A 1–2 line note would help, e.g. "on resume startIteration > 1 so i === 1 is false → the stored session threads forward intentionally; a stale session surfaces via the fail-loud guard below." Saves the next reader from guessing intent vs. oversight.
S1: restore the deleted test's no-failure-events assertion into the kept test (~10 lines); this recovers its most valuable invariant.
S2: docs: the ### interactive section doesn't mention that sessions thread across gates (the core restored UX), and there's no documented fresh_context: true stopgap for environments that still hit #1208. Worth adding both.
S3: heads up: the branch is ~10 commits behind dev and mergeability shows UNKNOWN; it needs a rebase + re-check before merge.
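The contract that I2 asks a test to pin down can be sketched outside the real test suite as follows. This is a hypothetical miniature, not the dag-executor test harness: `runLoop`, `QueryResult`, `SendQuery`, and the "stale-session" id are invented for illustration, and plain checks stand in for the project's test framework.

```typescript
// Hypothetical miniature of the loop executor, just enough to express the I2 contract.
interface QueryResult {
  isError: boolean;
  errors: string[];
  sessionId?: string;
}
type SendQuery = (prompt: string, resumeSessionId: string | undefined) => QueryResult;

function runLoop(
  sendQuery: SendQuery,
  opts: {
    startIteration: number;
    maxIterations: number;
    gateSessionId?: string;
    freshContext?: boolean;
  },
): number {
  let currentSessionId = opts.gateSessionId;
  let completed = 0;
  for (let i = opts.startIteration; i <= opts.maxIterations; i++) {
    const needsFreshSession = Boolean(opts.freshContext) || i === 1;
    const result = sendQuery(
      `iteration ${i}`,
      needsFreshSession ? undefined : currentSessionId,
    );
    if (result.isError) {
      // Fail loudly: stop the loop and surface the provider's error strings.
      throw new Error(`iteration ${i} failed: ${result.errors.join("; ")}`);
    }
    currentSessionId = result.sessionId ?? currentSessionId;
    completed++;
  }
  return completed;
}

// Fake provider: any attempt to resume "stale-session" yields an isError result.
const seen: Array<string | undefined> = [];
const staleProvider: SendQuery = (_prompt, resumeSessionId) => {
  seen.push(resumeSessionId);
  return resumeSessionId === "stale-session"
    ? { isError: true, errors: ["error_during_execution"] }
    : { isError: false, errors: [], sessionId: "new-session" };
};

let thrown: Error | undefined;
try {
  runLoop(staleProvider, {
    startIteration: 2,
    maxIterations: 5,
    gateSessionId: "stale-session",
  });
} catch (err) {
  thrown = err as Error;
}

// The three I2 invariants: the stale id was passed, the failure was loud,
// and only one iteration ran.
console.log(seen.length === 1 && seen[0] === "stale-session" && thrown !== undefined);
```

In the real suite the same three checks would presumably be expressed with the existing mocks and expect(...) assertions rather than a hand-rolled loop.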

One follow-up worth filing (pre-existing, not introduced here)

For non-Claude providers (Codex/Pi/Copilot/OpenCode), a cold-resume falls back to a fresh session and emits a ⚠️ system chunk — but executeLoopNode's stream loop has no msg.type === 'system' handler, so that warning is silently dropped (and #1842's cold-resume signal doesn't reach the loop path). So the observability net is real for Claude but absent for the others on the loop path. Not a blocker for this PR — just flagging it for a separate issue.
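One possible shape for the missing handler is sketched below. It is only an illustration of the gap described above: `StreamChunk`, `drainLoopStream`, and the warning text are hypothetical, and the stream is modeled as a synchronous iterable even though the real one is presumably asynchronous.

```typescript
// Hypothetical chunk shapes for a provider stream on the loop path.
type StreamChunk =
  | { type: "assistant"; content: string }
  | { type: "system"; content: string }
  | { type: "result"; sessionId?: string };

// Collects assistant text and forwards system chunks instead of dropping them.
function drainLoopStream(
  chunks: Iterable<StreamChunk>,
  onSystem: (text: string) => void,
): string {
  let output = "";
  for (const chunk of chunks) {
    switch (chunk.type) {
      case "assistant":
        output += chunk.content;
        break;
      case "system":
        onSystem(chunk.content); // e.g. a cold-resume warning from a non-Claude provider
        break;
      case "result":
        break; // session bookkeeping omitted in this sketch
    }
  }
  return output;
}

const warnings: string[] = [];
const text = drainLoopStream(
  [
    { type: "system", content: "⚠️ could not resume session; starting fresh" }, // made-up text
    { type: "assistant", content: "hello" },
  ],
  (w) => warnings.push(w),
);
console.log(text, warnings.length); // hello 1
```

Whether the warning should be logged, emitted as a workflow event, or shown to the user is a decision for the follow-up issue; the sketch only shows that the chunk needs an explicit branch to be seen at all.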

Net: the core revert is good to go. Happy for you to take the I1–I3 polish if you'd like (they're small) — or if you'd rather, hand it over and I'll finish it off. Whichever's easier. Either way, thanks for catching this.

Wirasm · 2026-06-27T10:36:28Z

Taking this over to get it across the line — thanks @ianstantiate! I rebased your fix onto current dev (where dag-executor.ts has since changed via #1842 et al.), re-validated it (type-check + 286 dag-executor tests green), and opened it as #2046 with your commit authorship preserved. The fork-PR CI wasn't running here, so re-homing on origin lets the full suite run automatically. Closing in favor of #2046 (Closes #2004). Appreciate the clean, well-documented revert!

…y (reverts #1923) (#2046) Reverts #1923 so interactive loops re-thread the stored gate session on the first post-gate iteration, restoring cross-gate conversation continuity. Closes #2004. Takeover of #2005. Co-authored-by: ianstantiate <251394470+ianstantiate@users.noreply.github.com>

Wirasm mentioned this pull request Jun 27, 2026

fix(workflows): restore interactive-loop cross-gate session continuity (reverts #1923) #2046

Merged

Wirasm closed this Jun 27, 2026

ianstantiate wants to merge 1 commit into coleam00:dev from ianstantiate:fix/revert-1923-loop-session
