Commit ce9f2f6
fix(scheduler): skip mid-flight prefix publish for sliding-window models
The prior commit's mid-flight prefix publish regressed gpt-oss-120b on GPQA-diamond (~0.71 -> 0.547, on both NVIDIA and AMD). gpt-oss is non-hybrid and uses sliding-window attention (SWA), where publishing a prefix mid-flight (while the publishing request is still decoding) corrupts SWA prefix reuse. Full-attention prefix caching (ut-runtime-prefix-cache-e2e) and hybrid/MLA models were unaffected.
Add has_sliding_window to SchedulerConfig, derived in event_loop.py from hf_config.sliding_window (mirroring ModelRunner's SWA detection). For SWA models the scheduler passes a null kv_prefix_cache to SchedulePrefillEvent/ScheduleDecodeEvent so InsertPrefixCache skips the mid-flight publish; the prefix is published only at FinishEvent -- the prior, known-correct behavior. Full-attention non-hybrid models keep the mid-flight reuse; hybrid (DeepSeek-V4) is unchanged.
Signed-off-by: Qingyang Wu <willqywu@gmail.com>
1 parent 8373813 commit ce9f2f6
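The gating described in the commit message can be sketched roughly as follows. This is a minimal Python illustration, not the actual change: the `SchedulerConfig` here is a one-field stand-in, `detect_sliding_window` and `prefix_cache_for_midflight` are hypothetical helper names, and treating a missing or `None` `sliding_window` as full attention is an assumption about how the detection works.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Optional


@dataclass
class SchedulerConfig:
    # Stand-in for the real SchedulerConfig, which carries more fields.
    has_sliding_window: bool = False


def detect_sliding_window(hf_config: Any) -> bool:
    """Return True when the HF config declares a sliding-window size.

    Assumption: a missing or None `sliding_window` means full attention.
    """
    return getattr(hf_config, "sliding_window", None) is not None


def prefix_cache_for_midflight(
    config: SchedulerConfig, kv_prefix_cache: Optional[object]
) -> Optional[object]:
    """Pick the prefix-cache handle to attach to prefill/decode events.

    For SWA models this returns None, so no mid-flight publish can occur
    and the prefix is only published when the request finishes.
    """
    if config.has_sliding_window:
        return None
    return kv_prefix_cache


# Usage: an SWA model gets no handle; a full-attention model keeps it.
cache = object()
swa_cfg = SchedulerConfig(
    has_sliding_window=detect_sliding_window(SimpleNamespace(sliding_window=128))
)
full_cfg = SchedulerConfig(
    has_sliding_window=detect_sliding_window(SimpleNamespace())
)
swa_handle = prefix_cache_for_midflight(swa_cfg, cache)
full_handle = prefix_cache_for_midflight(full_cfg, cache)
```

Passing a null handle rather than adding a separate "skip publish" flag keeps the decision in one place: downstream code that already tolerates a missing cache needs no extra branch.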
6 files changed
Lines changed: 24 additions & 5 deletions
File tree
- python/tokenspeed/runtime/engine
- tokenspeed-scheduler
- bindings
- csrc
- fsm
- scheduler
- operations
Changed line ranges per file, in page order:

| File | Added (diff line numbers) | Removed (original line numbers) | Lines changed |
|---|---|---|---|
| 1 | 182-185, 334 | none | 5 additions & 0 deletions |
| 2 | 67, 97 | none | 2 additions & 0 deletions |
| 3 | 240 | none | 1 addition & 0 deletions |
| 4 | 111-115 | 111-113 | 5 additions & 3 deletions |
| 5 | 194-195, 197-198, 223, 225-226 | 195, 221 | 7 additions & 2 deletions |
| 6 | 104-107 | none | 4 additions & 0 deletions |