[Router] Preserve reasoning_content when caching streaming responses by theohsiung · Pull Request #2141 · vllm-project/semantic-router

theohsiung · 2026-06-10T16:40:53Z

What

Caching a streaming reasoning-model response dropped reasoning_content: the streaming accumulator captured only delta.content, so the reconstructed chat.completion written to the cache had no reasoning. A later cache hit then returned a response missing the reasoning the original live stream delivered (the non-streaming cache preserves it, since it stores the raw upstream body).

Fixes #2140.

Fix

request_context.go: add a StreamingReasoning field to the request context.
processor_res_body_streaming.go: accumulate delta.reasoning_content alongside delta.content.
processor_res_cache.go: emit message.reasoning_content in the reconstructed response when non-empty.

reasoning_content is already a recognized field (looper extraction reads choices[].reasoning_content and .message.reasoning_content; memory + anthropic-outbound handle it).
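The reconstructed body is only useful if downstream readers find the field where they already look. The following is a hedged sketch of such a reader. It assumes the reader checks `message.reasoning_content` first and then falls back to a choice-level `reasoning_content`, which are the two locations named above. `cachedResponse` and `extractReasoning` are hypothetical and do not reproduce the looper's actual extraction code. Before this fix, a cache hit for a streamed response would return an empty string here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cachedChoice covers both locations mentioned in the PR (hypothetical type).
type cachedChoice struct {
	ReasoningContent string `json:"reasoning_content"`
	Message          struct {
		Content          string `json:"content"`
		ReasoningContent string `json:"reasoning_content"`
	} `json:"message"`
}

type cachedResponse struct {
	Choices []cachedChoice `json:"choices"`
}

// extractReasoning returns the first non-empty reasoning, preferring
// message.reasoning_content over the choice-level field (assumed order).
func extractReasoning(body []byte) (string, error) {
	var r cachedResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return "", err
	}
	for _, c := range r.Choices {
		if c.Message.ReasoningContent != "" {
			return c.Message.ReasoningContent, nil
		}
		if c.ReasoningContent != "" {
			return c.ReasoningContent, nil
		}
	}
	return "", nil
}

func main() {
	withReasoning := []byte(`{"choices":[{"message":{"content":"42","reasoning_content":"Think first."}}]}`)
	without := []byte(`{"choices":[{"message":{"content":"42"}}]}`)
	r1, _ := extractReasoning(withReasoning)
	r2, _ := extractReasoning(without)
	fmt.Printf("%q %q\n", r1, r2) // "Think first." ""
}
```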

Test plan

processor_res_cache_reasoning_test.go: accumulator captures delta.reasoning_content across chunks; reconstruction includes message.reasoning_content; reconstruction omits the field when no reasoning was streamed. RED before fix → GREEN.
go test ./pkg/extproc/ full suite green (no regression).
gofmt/go vet clean; golangci-lint (repo config) → 0 issues.

Notes

DCO signed-off.
Out of scope: multi-choice (n>1) streaming is reconstructed as a single merged choice. That is a distinct fidelity gap to be addressed separately.

netlify · 2026-06-10T16:43:16Z

✅ Deploy Preview for vllm-semantic-router ready!

Name	Link
🔨 Latest commit	`c719ae7`
🔍 Latest deploy log	https://app.netlify.com/projects/vllm-semantic-router/deploys/6a2d883081eea70008021e23
😎 Deploy Preview	https://deploy-preview-2141--vllm-semantic-router.netlify.app


github-actions · 2026-06-10T17:06:52Z

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 `src/semantic-router`

Owners: @rootfs, @Xunzhuo, @szedan-rh, @yehuditkerido, @abdallahsamabd, @asaadbalum, @liavweiss, @noalimoy
Files changed:

src/semantic-router/pkg/extproc/processor_res_body_streaming.go
src/semantic-router/pkg/extproc/processor_res_cache.go
src/semantic-router/pkg/extproc/processor_res_cache_reasoning_test.go
src/semantic-router/pkg/extproc/request_context.go

🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

github-actions · 2026-06-10T17:17:06Z

✅ Supply Chain Security Report — All Clear

Scanner	Status	Findings
AST Codebase Scan (Py, Go, JS/TS, Rust)	✅	19 finding(s) — MEDIUM: 12 · LOW: 7
AST PR Diff Scan	✅	No issues detected
Regex Fallback Scan	✅	No issues detected

Scanned at 2026-06-13T16:42:17.542Z

The streaming accumulator only captured delta.content, so the reconstructed chat.completion written to the semantic cache dropped the reasoning that reasoning models stream under delta.reasoning_content. A later cache hit for a semantically similar request then returned a response WITHOUT the reasoning the original live stream delivered. This was a silent fidelity loss (the non-streaming cache preserves the reasoning because it stores the raw upstream body).

Accumulate delta.reasoning_content into ctx.StreamingReasoning and include it as message.reasoning_content in the reconstructed response when present. The field is absent when no reasoning was streamed. reasoning_content is already a recognized field elsewhere (looper extraction, memory, anthropic outbound).

Signed-off-by: theohsiung <theobear870924@gmail.com>

theohsiung · 2026-06-13T17:24:29Z

Hi @AayushSaini101, appreciate you spending your weekend time on this! hahaha 🥹

Xunzhuo

LGTM

…llm-project#2141)

The streaming accumulator only captured delta.content, so the reconstructed chat.completion written to the semantic cache dropped the reasoning that reasoning models stream under delta.reasoning_content. A later cache hit for a semantically similar request then returned a response WITHOUT the reasoning the original live stream delivered. This was a silent fidelity loss (the non-streaming cache preserves the reasoning because it stores the raw upstream body).

Accumulate delta.reasoning_content into ctx.StreamingReasoning and include it as message.reasoning_content in the reconstructed response when present. The field is absent when no reasoning was streamed. reasoning_content is already a recognized field elsewhere (looper extraction, memory, anthropic outbound).

Signed-off-by: theohsiung <theobear870924@gmail.com>
Co-authored-by: Moderator <60972989+AayushSaini101@users.noreply.github.com>
Signed-off-by: Wilson Wu <iwilsonwu@gmail.com>

theohsiung requested review from Xunzhuo and rootfs as code owners June 10, 2026 16:40

github-actions Bot assigned abdallahsamabd, asaadbalum, liavweiss, noalimoy, rootfs, szedan-rh, Xunzhuo and yehuditkerido Jun 10, 2026

theohsiung force-pushed the fix/streaming-cache-reasoning-content branch from 82dc59a to dbb3989 June 10, 2026 23:45

Merge branch 'main' into fix/streaming-cache-reasoning-content

3f752b8

AayushSaini101 self-requested a review June 13, 2026 10:16

Merge branch 'main' into fix/streaming-cache-reasoning-content

c719ae7

Xunzhuo approved these changes Jun 17, 2026


Xunzhuo merged commit a78a618 into vllm-project:main Jun 17, 2026
33 checks passed

Xunzhuo merged 3 commits into vllm-project:main from theohsiung:fix/streaming-cache-reasoning-content

10 participants
