fix(cache): Coarsely fence the compute stream behind the host loadback stream on. by LorrinWWW · Pull Request #370 · lightseekorg/tokenspeed

LorrinWWW · 2026-06-06T18:27:53Z

Summary

Coarsely fence the compute stream behind the host loadback stream on retraction→reload→resume-decode (and while a loadback is in flight). A captured decode graph can't honor the per-layer wait_until, because the events it waits on live on the non-captured load_stream. Without the fence, the graph replay can read KV mid-reload.
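The fence described above can be illustrated with a minimal sketch. It does not use real CUDA streams: `FakeStream`, `fence_decode_behind_loadback`, and the `loadback_in_flight` flag are hypothetical names introduced only for illustration, and the stand-in class merely records which stream waits on which. The only real API it mirrors is the call shape of `torch.cuda.Stream.wait_stream`.

```python
from dataclasses import dataclass, field


@dataclass
class FakeStream:
    """Hypothetical stand-in for a CUDA stream; it only records dependencies."""
    name: str
    waits_on: list = field(default_factory=list)

    def wait_stream(self, other: "FakeStream") -> None:
        # Same call shape as torch.cuda.Stream.wait_stream: work queued on
        # `self` after this call is ordered behind work already queued on `other`.
        self.waits_on.append(other.name)


def fence_decode_behind_loadback(execution_stream, load_stream,
                                 recovery_with_loadback, loadback_in_flight):
    """Issue one whole-stream wait when a graph replay could race a reload.

    Returns True if the fence was issued. Both flags are assumptions that
    stand in for whatever state the scheduler actually tracks.
    """
    if recovery_with_loadback or loadback_in_flight:
        execution_stream.wait_stream(load_stream)
        return True
    return False


execution_stream = FakeStream("execution_stream")
load_stream = FakeStream("load_stream")

# Resume-decode right after a reload: fence the whole compute stream.
fenced = fence_decode_behind_loadback(execution_stream, load_stream, True, False)
# Steady-state decode with nothing in flight: no fence is issued.
skipped = fence_decode_behind_loadback(execution_stream, load_stream, False, False)
```

The fence is coarse by design: it trades the per-layer overlap for a single dependency that is issued outside the captured graph, so it holds regardless of what the graph itself recorded.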

Test Plan

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9297222c87


chatgpt-codex-connector · 2026-06-08T04:17:49Z

+            if recovery_with_loadback or host_exec.ack_load_queue:
+                self.model_executor.execution_stream.wait_stream(host_exec.load_stream)


Avoid fencing eager loadbacks behind the full load stream

When an EXTEND/MIXED iteration submits a LoadBackOp, host_exec.ack_load_queue already contains the just-enqueued loadback because submit_plan()/flush() ran above, so this condition serializes the entire execution_stream behind load_stream even though recovery_with_loadback is false. Those eager forwards can use the layerwise consumers set immediately above to overlap compute with per-layer reloads; waiting for the whole loadback here removes that overlap and can significantly stall long prefill/mixed recovery batches. If the intent is only to fence prior in-flight loadbacks plus pure-decode graph replays, this needs to distinguish pre-existing acks from the current eager loadback.
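One way to read this suggestion, as a sketch only: decide on the fence from the loadbacks that were already pending before the current iteration submitted its own. The function name `needs_coarse_fence`, the `submitted_this_iteration` set, and the string op ids are all hypothetical; the PR does not show how `ack_load_queue` entries are identified, so this assumes each entry can be matched to the op that enqueued it.

```python
def needs_coarse_fence(recovery_with_loadback, ack_queue, submitted_this_iteration):
    """Decide whether the execution stream should wait on the whole load stream.

    ack_queue: ids of loadbacks still awaiting acknowledgement (assumed shape).
    submitted_this_iteration: ids enqueued by the current iteration's plan.
    """
    # Loadbacks that were already in flight before this iteration began.
    preexisting = [op for op in ack_queue if op not in submitted_this_iteration]
    # Fence for recovery, or for any loadback this iteration did not submit.
    # An eager forward whose only pending loadback is its own skips the fence
    # and relies on the per-layer waits to overlap compute with the reload.
    return recovery_with_loadback or bool(preexisting)


# Eager EXTEND/MIXED iteration that just enqueued its own loadback: no fence.
own_only = needs_coarse_fence(False, ["lb-7"], {"lb-7"})
# An older loadback is still in flight: fence.
older_pending = needs_coarse_fence(False, ["lb-6", "lb-7"], {"lb-7"})
# Recovery path always fences, even with an empty queue.
recovery = needs_coarse_fence(True, [], set())
```

Whether this split is safe depends on the eager path actually honoring the per-layer waits for every layer it reads, which the sketch takes as given.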


loadback-decode-graph-fence

5c508a5

LorrinWWW requested a review from a team as a code owner June 6, 2026 18:27

Merge branch 'main' into jue/loadback-decode-graph-fence

9297222

chatgpt-codex-connector Bot reviewed Jun 8, 2026

LorrinWWW wants to merge 2 commits into main from jue/loadback-decode-graph-fence
