[Model][MIMO Audio] Unify sync/async code2wav into single path#4359
Open
NickCao wants to merge 8 commits into
9343112 to 06d7946
Contributor
Author
Rebased and retested.
hsliuustc0106
left a comment
Collaborator
Posting one inline finding from review.
Collaborator
@qibaoyuan PTAL
Point both custom_process_next_stage_input_func and async_chunk_process_next_stage_input_func to the same llm2code2wav_async_chunk function. Add unflatten_payload call so it handles both transport modes (full_payload accumulator flattens dict keys; async_chunk transport does not). Remove the duplicate _batch_decode_waveforms decoder and the if-is_async_chunk branch: the streaming decoder with left_context_size=0 produces identical output to the old sync path.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Nick Cao <ncao@redhat.com>
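The transport difference mentioned in this commit can be illustrated with a minimal sketch. It assumes the full-payload accumulator flattens nested dict keys with a dot separator; the separator, the function name `unflatten_payload_sketch`, and the payload keys are all hypothetical and are not taken from the vLLM-Omni source.

```python
from typing import Any


def unflatten_payload_sketch(payload: dict[str, Any], sep: str = ".") -> dict[str, Any]:
    """Rebuild nested dicts from flattened keys (hypothetical helper).

    Keys without the separator are copied through unchanged, so an
    already-nested payload comes back equal to the input.
    """
    nested: dict[str, Any] = {}
    for key, value in payload.items():
        parts = key.split(sep)
        node = nested
        # Walk (or create) the intermediate dicts for every part but the last.
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return nested


# Hypothetical payload shapes for the two transport modes.
flat_payload = {"codes.audio": [1, 2, 3], "meta.finished": True}
nested_payload = {"codes": {"audio": [1, 2, 3]}, "meta": {"finished": True}}

from_flat = unflatten_payload_sketch(flat_payload)
from_nested = unflatten_payload_sketch(nested_payload)
```

Because both inputs normalize to the same nested shape, a single producer function can consume either transport without branching on the mode.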
…arams
Use module-level _DEFAULT_CODEC_CHUNK_FRAMES and _DEFAULT_CODEC_LEFT_CONTEXT_FRAMES as parameter defaults so callers don't need to repeat them.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Nick Cao <ncao@redhat.com>
…ing paths
Update test_mimo_audio_code2wav_batch_decode to exercise
_batch_chunked_decode_streaming (the unified decode path) instead of
the removed _batch_decode_waveforms. Every test is parametrized with
left_context in {0, 1, 5} covering sync (no strip), partial strip,
and full strip.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Nick Cao <ncao@redhat.com>
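As a rough illustration of what a chunked decode with left-context stripping looks like, here is a toy sketch. It is not the project's `_batch_chunked_decode_streaming`: the decoder is a context-free stand-in, the constant values are placeholders, and `toy_decode`, `chunked_decode_sketch`, and `SAMPLES_PER_FRAME` are hypothetical names.

```python
# Placeholder values; the real defaults are not stated in this PR page.
_DEFAULT_CODEC_CHUNK_FRAMES = 4
_DEFAULT_CODEC_LEFT_CONTEXT_FRAMES = 1
SAMPLES_PER_FRAME = 2  # toy upsampling factor (assumption)


def toy_decode(frames: list[int]) -> list[int]:
    """Stand-in decoder: each frame expands to SAMPLES_PER_FRAME samples."""
    return [f * 10 + i for f in frames for i in range(SAMPLES_PER_FRAME)]


def chunked_decode_sketch(
    frames: list[int],
    chunk_frames: int = _DEFAULT_CODEC_CHUNK_FRAMES,
    left_context: int = _DEFAULT_CODEC_LEFT_CONTEXT_FRAMES,
) -> list[int]:
    """Decode chunk by chunk, prepending left context and stripping its samples."""
    out: list[int] = []
    for start in range(0, len(frames), chunk_frames):
        ctx_start = max(0, start - left_context)
        ctx_frames = start - ctx_start  # context frames actually available
        wav = toy_decode(frames[ctx_start:start + chunk_frames])
        # Drop the samples that belong to the context frames.
        out.extend(wav[ctx_frames * SAMPLES_PER_FRAME:])
    return out


frames = list(range(10))
one_shot = toy_decode(frames)
results = {lc: chunked_decode_sketch(frames, 4, lc) for lc in (0, 1, 5)}
```

With `left_context=0` nothing is prepended and nothing is stripped, which is the configuration the PR uses for the sync case. The toy decoder ignores context, so all three settings match the one-shot result here; a real codec may only guarantee that for the settings the tests cover.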
…_chunk
Add --no-async-chunk variant to test_params so both sync and streaming code2wav paths are exercised by the existing e2e tests.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Nick Cao <ncao@redhat.com>
Signed-off-by: Nick Cao <ncao@redhat.com>
The upstream "Output Processor Phase 2" refactor renamed pooling_output → multimodal_output in the chunk_transfer_adapter but not in the full-payload mixin. Restore llm2code2wav_full_payload as a thin wrapper that bridges the kwarg mismatch so the non-async path works with both callers on vllm 0.23.0.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Nick Cao <ncao@redhat.com>
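A thin wrapper of the kind described here might look like the sketch below. The signatures are assumptions: it supposes the unified function takes `multimodal_output` while the full-payload caller still passes `pooling_output`. The `_sketch` function names and the return shape are hypothetical.

```python
from typing import Any, Optional


def llm2code2wav_async_chunk_sketch(
    request: Any, multimodal_output: Optional[dict] = None
) -> dict:
    """Stand-in for the unified producer; it just echoes its inputs."""
    return {"request": request, "payload": multimodal_output}


def llm2code2wav_full_payload_sketch(
    request: Any, pooling_output: Optional[dict] = None
) -> dict:
    """Bridge the old kwarg name to the new one and delegate."""
    return llm2code2wav_async_chunk_sketch(request, multimodal_output=pooling_output)


via_wrapper = llm2code2wav_full_payload_sketch("req-1", pooling_output={"codes": [1, 2]})
direct = llm2code2wav_async_chunk_sketch("req-1", multimodal_output={"codes": [1, 2]})
```

Keeping the wrapper this small means the decode logic stays in one place and only the keyword name differs between the two entry points.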
06d7946 to ca4162c
…ter in async_chunk
The old llm2code2wav_full_payload truncated flat_codes at MAX_CODE2WAV_TOKENS and filtered zero-padded codec rows via _filter_zero_codec_rows before flattening. Both guards were lost when the function was replaced with a delegation to llm2code2wav_async_chunk. Restore them and drop the tensor-list-tensor round-trip.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Nick Cao <ncao@redhat.com>
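The two guards restored by this commit can be sketched on plain Python lists. The real code presumably operates on tensors; the limit value below is a placeholder, and `filter_zero_rows_sketch` and `prepare_flat_codes_sketch` are hypothetical names standing in for `_filter_zero_codec_rows` and the surrounding logic.

```python
MAX_CODE2WAV_TOKENS = 8  # placeholder; the real limit is not given on this page


def filter_zero_rows_sketch(rows: list[list[int]]) -> list[list[int]]:
    """Drop rows that are entirely zero, treated here as padding."""
    return [row for row in rows if any(v != 0 for v in row)]


def prepare_flat_codes_sketch(
    rows: list[list[int]], max_tokens: int = MAX_CODE2WAV_TOKENS
) -> list[int]:
    """Filter padded rows first, then flatten, then truncate to the limit."""
    flat = [v for row in filter_zero_rows_sketch(rows) for v in row]
    return flat[:max_tokens]


codec_rows = [[1, 2, 3], [0, 0, 0], [4, 5, 6], [7, 8, 9]]
flat_codes = prepare_flat_codes_sketch(codec_rows)
```

Filtering before flattening matters: once the rows are flattened, a run of zeros can no longer be told apart from legitimate zero-valued codes inside a real row.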
Provide a transfer_manager mock with code_prompt_token_ids, add req_id to request fixtures, and switch from dict access to OmniPayloadStruct attribute access to match the new return type.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Nick Cao <ncao@redhat.com>
ca4162c to d80d796
Contributor
Author
Passed unit tests and manually examined the output audio.
Purpose
Models with both custom_process_next_stage_input_func and async_chunk_process_next_stage_input_func maintain two separate producer functions and two decoder paths that are ~80% identical. This PR eliminates the duplication for MIMO Audio by routing both sync and async modes through the single async_chunk path.
Test Plan
vLLM Version: 0.23.0
vLLM-Omni Commit: d80d796
Test Result
PASSED