Commit 8373813
fix(scheduler): publish prefix to radix tree during prefill for non-hybrid models
For non-hybrid models, a request's prompt-prefix KV was inserted into the shared device radix tree only at FinishEvent. The mid-flight InsertHybridCache returned early whenever hybrid_prefix_cache_ was null, which is the case for every non-DeepSeek-V4/Mamba model. As a result, a burst of concurrent requests sharing a prefix (RL rollouts with N samples/prompt, or a shared chat-template/system prefix) all completed prefill before any of them finished. This gave ~0% prefix-cache reuse, vs ~26% for SGLang, which publishes during prefill (cache_unfinished_req).
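The burst effect can be made concrete with a minimal C++ sketch. It relies on simplifying assumptions. A toy token trie (`ToyPrefixCache`, hypothetical) stands in for the device radix tree. Prefills run one after another, and no request finishes until every prefill in the burst is done. This is not the tokenspeed-scheduler code. It only shows why the point at which a prefix is published matters.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <vector>

// Hypothetical toy prefix cache: a token trie standing in for the
// device radix tree. Not the tokenspeed-scheduler data structure.
struct TrieNode {
    std::map<int, std::unique_ptr<TrieNode>> children;
};

class ToyPrefixCache {
public:
    // Number of leading tokens of `tokens` already present in the cache.
    std::size_t MatchPrefix(const std::vector<int>& tokens) const {
        const TrieNode* node = &root_;
        std::size_t matched = 0;
        for (int t : tokens) {
            auto it = node->children.find(t);
            if (it == node->children.end()) break;
            node = it->second.get();
            ++matched;
        }
        return matched;
    }

    // Publishes `tokens` so that later requests can reuse them.
    void Insert(const std::vector<int>& tokens) {
        TrieNode* node = &root_;
        for (int t : tokens) {
            auto& child = node->children[t];
            if (!child) child = std::make_unique<TrieNode>();
            node = child.get();
        }
    }

private:
    TrieNode root_;
};

// Simulates a burst in which every request prefills before any finishes.
// Returns the fraction of prompt tokens served from the cache.
double BurstReuse(const std::vector<std::vector<int>>& prompts,
                  bool publish_during_prefill) {
    ToyPrefixCache cache;
    std::size_t reused = 0;
    std::size_t total = 0;
    for (const auto& p : prompts) {  // prefill phase, one request at a time
        reused += cache.MatchPrefix(p);
        total += p.size();
        if (publish_during_prefill) cache.Insert(p);
    }
    if (!publish_during_prefill) {
        for (const auto& p : prompts) cache.Insert(p);  // FinishEvent phase
    }
    assert(reused <= total);
    return total == 0 ? 0.0 : static_cast<double>(reused) / total;
}
```

Consider four prompts that share a 6-token prefix, each followed by 2 unique tokens. If prefixes are published only at finish, none of the 32 prompt tokens are reused. If prefixes are published during prefill, requests 2 to 4 each reuse the shared prefix, so 18 of 32 tokens (0.5625) are reused.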
Rename InsertHybridCache -> InsertPrefixCache; publish the freshly-computed prefix through the base KV prefix cache when there is no hybrid cache (hybrid path unchanged: still via hybrid_cache->GetKVPrefixCache()). The node is pinned via the request's DeviceNodeRef so it is not evicted while in use; Mamba checkpoint publication stays hybrid-only. Thread kv_prefix_cache_ into SchedulePrefillEvent and ScheduleDecodeEvent so they can publish for non-hybrid models.
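The dispatch in the fix can be sketched as follows. All types here are hypothetical simplifications: `KVPrefixCache`, `HybridCache`, `NodeRef`, `Request` and `PublishMambaCheckpoint`.

- The prefix cache is a flat list of nodes rather than a radix tree.
- `NodeRef` stands in for `DeviceNodeRef` as a ref-count pin.
- `PublishMambaCheckpoint` models the hybrid-only checkpoint step.

Only the control flow is meant to mirror the commit message.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical node: a published prefix plus a pin count.
struct Node {
    std::vector<int> tokens;
    int ref_count = 0;  // > 0 means pinned: eviction must skip this node
};

// Hypothetical RAII pin standing in for DeviceNodeRef.
class NodeRef {
public:
    NodeRef() = default;
    NodeRef(const NodeRef&) = delete;
    NodeRef& operator=(const NodeRef&) = delete;
    ~NodeRef() { Reset(); }

    // Pins `n` (may be null) and releases the previously pinned node.
    void Reset(Node* n = nullptr) {
        if (n) ++n->ref_count;
        if (node_) {
            assert(node_->ref_count > 0);
            --node_->ref_count;
        }
        node_ = n;
    }
    Node* get() const { return node_; }

private:
    Node* node_ = nullptr;
};

// Toy KV prefix cache: a flat list of nodes instead of a radix tree.
class KVPrefixCache {
public:
    Node* Insert(const std::vector<int>& tokens) {
        auto node = std::make_unique<Node>();
        node->tokens = tokens;
        nodes_.push_back(std::move(node));
        return nodes_.back().get();
    }
    // Drops every unpinned node and returns how many were dropped.
    std::size_t EvictUnpinned() {
        const std::size_t before = nodes_.size();
        nodes_.erase(std::remove_if(nodes_.begin(), nodes_.end(),
                                    [](const std::unique_ptr<Node>& n) { return n->ref_count == 0; }),
                     nodes_.end());
        return before - nodes_.size();
    }
    std::size_t size() const { return nodes_.size(); }

private:
    std::vector<std::unique_ptr<Node>> nodes_;
};

// Hypothetical hybrid cache for DeepSeek-V4/Mamba-style models.
class HybridCache {
public:
    KVPrefixCache& GetKVPrefixCache() { return kv_; }
    void PublishMambaCheckpoint(const std::vector<int>& /*tokens*/) { ++mamba_checkpoints; }
    int mamba_checkpoints = 0;

private:
    KVPrefixCache kv_;
};

struct Request {
    std::vector<int> computed_prefix;  // KV computed so far in prefill
    NodeRef device_node_ref;           // keeps the published node pinned
};

// Publishes a request's freshly computed prefix mid-flight: through the
// hybrid cache's KV prefix cache if there is one, else through the base cache.
void InsertPrefixCache(Request& req, HybridCache* hybrid, KVPrefixCache& base) {
    KVPrefixCache& target = hybrid ? hybrid->GetKVPrefixCache() : base;
    req.device_node_ref.Reset(target.Insert(req.computed_prefix));
    if (hybrid) hybrid->PublishMambaCheckpoint(req.computed_prefix);  // hybrid-only
}
```

In the sketch, the base cache is passed by reference. This reflects the commit's choice to thread `kv_prefix_cache_` into the prefill and decode events, which gives the non-hybrid branch a target to publish into. Eviction skips any node whose pin count is non-zero.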
Signed-off-by: Qingyang Wu <willqywu@gmail.com>
Parent: 4b87a50
3 files changed, 32 additions & 14 deletions
Directories touched:
- tokenspeed-scheduler/csrc
- fsm
- scheduler/operations
Per-file diff content was not preserved in this capture; only hunk line numbers survived. Recoverable per-file counts (file names not shown): 12 additions & 7 deletions; 18 additions & 5 deletions; 2 additions & 2 deletions.