refactor(spec-decode): simplify Qwen3.5 NextN attention path for #217 (2/3) by rjzhb · Pull Request #429 · lightseekorg/tokenspeed

rjzhb · 2026-06-12T05:10:12Z

Summary

This is the second PR in the series refactoring the spec-decode attention path from #217. It applies the base-hook pattern that #390 (1/3) introduced for Llama Eagle3 to Qwen3.5 NextN.

Qwen3_5DraftForCausalLM / Qwen3_5DraftAttentionDecoderLayer collapse the bespoke draft forward path into a two-method subclass of the base Qwen3_5AttentionDecoderLayer:

_attn overrides the draft first-step dispatch path: correction + q-slice + DECODE.
Inactive steps delegate to super()._attn.
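
The shape of this hook pattern can be sketched in a few lines. This is a minimal, framework-free illustration and not the code in this PR: the class names BaseAttentionLayer and DraftAttentionLayer, the is_first_draft_step flag, and the string return values are hypothetical stand-ins. Only the _attn hook name and the super()._attn delegation come from the description above.

```python
class BaseAttentionLayer:
    """Hypothetical stand-in for the base decoder layer."""

    def _attn(self, hidden, ctx):
        # Base path: the regular attention dispatch.
        return f"base-attn({hidden})"


class DraftAttentionLayer(BaseAttentionLayer):
    """Hypothetical stand-in for the draft decoder layer."""

    def _attn(self, hidden, ctx):
        # Inactive steps fall through to the base implementation unchanged.
        if not ctx.get("is_first_draft_step", False):
            return super()._attn(hidden, ctx)
        # First draft step: the real code would run correction, q-slice,
        # and a DECODE dispatch here; a placeholder string stands in for that.
        return f"draft-first-step({hidden})"


layer = DraftAttentionLayer()
first = layer._attn("h", {"is_first_draft_step": True})
later = layer._attn("h", {"is_first_draft_step": False})
```

Keeping the override to a single hook means the rest of the base layer's forward path is inherited rather than duplicated, which appears to be the point of the refactor.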

The correction logic (trimming draft_seq_lens_buf by spec_num_tokens - accept_lengths) now lives in a single _apply_correction method next to its only consumer and is plumbed through ForwardContext, mirroring #390 (1/3). _maybe_narrow_residual handles the NextN residual narrowing.
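
The trimming arithmetic can be illustrated with plain Python lists. This is a sketch under stated assumptions, not the method from this PR: the function name apply_correction, the list-based buffers, and the floor of zero are hypothetical. The clamp is included only because a commit in this PR is titled "clamp catch-up trim result to avoid negative seq_lens"; the actual bound used there is not shown in the description.

```python
from typing import List


def apply_correction(
    draft_seq_lens: List[int],
    accept_lengths: List[int],
    spec_num_tokens: int,
) -> None:
    """Trim draft_seq_lens in place by the per-request rejected token count."""
    if len(draft_seq_lens) != len(accept_lengths):
        raise ValueError("draft_seq_lens and accept_lengths must align")
    for i, accepted in enumerate(accept_lengths):
        # Tokens that were speculated but not accepted for this request.
        rejected = spec_num_tokens - accepted
        # Assumed clamp: never let a sequence length go below zero.
        draft_seq_lens[i] = max(draft_seq_lens[i] - rejected, 0)


seq_lens = [10, 7, 2]
apply_correction(seq_lens, accept_lengths=[4, 1, 0], spec_num_tokens=4)
# seq_lens is now [10, 4, 0]; the last entry would be -2 without the clamp.
```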

The refactor is restricted to single-layer drafts for now, which __init__ asserts. _apply_correction mutates per-layer state, so the trim must be hoisted out of the layer before multi-layer NextN can be supported and the restriction relaxed.
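
One plausible reading of this restriction, which the description does not spell out, is that an in-place trim applied once per layer would subtract the rejected count more than once on a multi-layer draft. The sketch below illustrates that hazard and a constructor guard; DraftModel, trim_in_place, and run_layers are hypothetical names, and the shared-buffer behavior is an assumption.

```python
def trim_in_place(seq_lens, rejected):
    """Subtract the rejected counts from seq_lens in place, floored at zero."""
    for i, r in enumerate(rejected):
        seq_lens[i] = max(seq_lens[i] - r, 0)


class DraftModel:
    """Hypothetical draft model that refuses multi-layer configurations."""

    def __init__(self, num_layers: int):
        assert num_layers == 1, "multi-layer drafts are not supported yet"
        self.num_layers = num_layers


def run_layers(seq_lens, rejected, num_layers):
    """Simulate every layer applying the trim to one shared buffer."""
    buf = list(seq_lens)
    for _ in range(num_layers):
        trim_in_place(buf, rejected)
    return buf


one_layer = run_layers([10, 7], [0, 3], num_layers=1)   # trim applied once
two_layers = run_layers([10, 7], [0, 3], num_layers=2)  # trim applied twice
```

Hoisting the trim so it runs once before the layer loop would make the result independent of the layer count, which matches the direction the description points to.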

Signed-off-by: rjzhb <rjzhb222@163.com>


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4d52f2f7bc



rjzhb and others added 17 commits June 9, 2026 04:20

refactor(spec-decode): wrap Eagle3 attention via base llama._attn

d7882e7

Signed-off-by: rjzhb <rjzhb222@163.com>

update

6fbab67

Signed-off-by: rjzhb <rjzhb222@163.com>

feat(spec-decode): extend Llama Eagle3 dispatch B to prefill catch-up

399b793

Signed-off-by: rjzhb <rjzhb222@163.com>

update

90ec73a

Signed-off-by: rjzhb <rjzhb222@163.com>

Merge branch 'main' into refactor/llama-attention-hooks

3dbdb30

fix(spec-decode): correct LlamaForCausalLMEagle3 import path

2ca7ebe

Signed-off-by: rjzhb <rjzhb222@163.com>

fix(spec-decode): cover EXTEND/MIXED catch-up in dispatch B flag broaden

cc698bf

Signed-off-by: rjzhb <rjzhb222@163.com>

Merge remote-tracking branch 'upstream/main' into refactor/llama-atte…

b0b485d

…ntion-hooks

Merge branch 'main' into refactor/llama-attention-hooks

bad01b8

fix(spec-decode): fall back when fused KV prewrite arg is None

5ac5d79

Signed-off-by: rjzhb <rjzhb222@163.com>

refactor(spec-decode): wrap Qwen3.5 NextN attention via base hooks

555252f

Signed-off-by: rjzhb <rjzhb222@163.com>

Merge remote-tracking branch 'upstream/main' into refactor/qwen-atten…

f01a2ab

…tion-hooks # Conflicts: # python/tokenspeed/runtime/execution/drafter/eagle.py # python/tokenspeed/runtime/models/llama_eagle3.py

fix(qwen3.5-nextn): drop idle-mode early return in draft attn

fafc901

Signed-off-by: rjzhb <rjzhb222@163.com>

fix(spec-decode): pad decode seq_lens to spec_num_tokens

404279b

Signed-off-by: rjzhb <rjzhb222@163.com>

Merge remote-tracking branch 'upstream/main' into refactor/qwen-atten…

c8f0632

…tion-hooks

Revert "fix(spec-decode): pad decode seq_lens to spec_num_tokens"

e9e39d0

This reverts commit 404279b.

fix(spec-decode): clamp catch-up trim result to avoid negative seq_lens

ca10fca

Signed-off-by: rjzhb <rjzhb222@163.com>

rjzhb marked this pull request as ready for review June 12, 2026 23:26

rjzhb requested a review from a team as a code owner June 12, 2026 23:27

LorrinWWW reviewed Jun 12, 2026


Comment thread python/tokenspeed/runtime/models/qwen3_5_nextn.py

rjzhb and others added 5 commits June 12, 2026 23:50

chore(qwen3.5-nextn): move sigmoid_mul import below first-party group

263ef70

Signed-off-by: rjzhb <rjzhb222@163.com>

chore(qwen3.5-nextn): drop isort skip markers, accept canonical order

4364323

Signed-off-by: rjzhb <rjzhb222@163.com>

Merge branch 'main' into refactor/qwen-attention-hooks

e2c032f

fix(pd): record draft layerwise cache step on EXTEND catch-up

72ddde0

Signed-off-by: rjzhb <rjzhb222@163.com>

Merge branch 'main' into refactor/qwen-attention-hooks

4d52f2f

chatgpt-codex-connector Bot reviewed Jun 14, 2026


Comment thread python/tokenspeed/runtime/models/qwen3_5_nextn.py Outdated

rjzhb added 4 commits June 14, 2026 06:46

fix(spec-decode): route draft attn via self.attn; guard step_counter

2806ea4

Signed-off-by: rjzhb <rjzhb222@163.com>

docs(spec-decode): clarify draft catch-up attention comments

b9645c4

Signed-off-by: rjzhb <rjzhb222@163.com>

refactor(spec-decode): centralize draft cache-step record via record_…

af1ab65

…kv_cache Signed-off-by: rjzhb <rjzhb222@163.com>

refactor(attn): share pre/post cache-step record via record_cache_step

f8c9490

Signed-off-by: rjzhb <rjzhb222@163.com>

rjzhb and others added 2 commits June 17, 2026 19:30

refactor(attn): rename helper to record_pd_cache_step

d609578

Signed-off-by: rjzhb <rjzhb222@163.com>

Merge branch 'main' into refactor/qwen-attention-hooks

3290daf

rjzhb wants to merge 28 commits into lightseekorg:main from rjzhb:refactor/qwen-attention-hooks

2 participants
