[Model][MIMO Audio] Unify sync/async code2wav into single path#4359

Open
NickCao wants to merge 8 commits into
vllm-project:mainfrom
NickCao:worktree-unified-chunk-path

Conversation

@NickCao

@NickCao NickCao commented Jun 11, 2026

Contributor

Purpose

Models with both custom_process_next_stage_input_func and async_chunk_process_next_stage_input_func maintain two separate producer functions and two decoder paths that are ~80% identical. This PR eliminates the duplication for MIMO Audio by routing both sync and async modes through the single async_chunk path.
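As a rough illustration of the routing described above, here is a minimal sketch. Only the two hook names and `llm2code2wav_async_chunk` come from this PR; the `STAGE_HOOKS` registry, the `run_hook` helper, and the payload shape are hypothetical stand-ins, not the actual vLLM-Omni configuration mechanism.

```python
from typing import Any, Callable

Payload = dict[str, Any]


def llm2code2wav_async_chunk(payload: Payload) -> Payload:
    """Stand-in producer; the real function lives in the MIMO Audio processor."""
    return {"codes": list(payload.get("codes", []))}


# Hypothetical registry: both hook names resolve to the same callable,
# so there is only one producer implementation to maintain.
STAGE_HOOKS: dict[str, Callable[[Payload], Payload]] = {
    "custom_process_next_stage_input_func": llm2code2wav_async_chunk,
    "async_chunk_process_next_stage_input_func": llm2code2wav_async_chunk,
}


def run_hook(hook_name: str, payload: Payload) -> Payload:
    """Look up the producer for a hook name and apply it (illustrative only)."""
    return STAGE_HOOKS[hook_name](payload)


sync_result = run_hook("custom_process_next_stage_input_func", {"codes": [1, 2, 3]})
async_result = run_hook("async_chunk_process_next_stage_input_func", {"codes": [1, 2, 3]})
```

With a single callable behind both names, any fix to the producer applies to sync and async modes at once.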

Test Plan

vLLM Version: 0.23.0

vLLM-Omni Commit: d80d796

# Unit tests — stage input processors (15 passed)
pytest tests/model_executor/stage_input_processors/test_mimo_audio_llm2code2wav.py -v
pytest tests/model_executor/stage_input_processors/test_mimo_audio_flush_remaining_codes.py -v

# Unit tests — full streaming helpers suite (40 passed, includes updated full_payload tests)
pytest tests/model_executor/stage_input_processors/test_qwen3_omni_streaming_helpers.py -v

# Unit tests — batch decode (25 passed, sync and streaming parametrized)
pytest tests/model_executor/models/mimo_audio/test_mimo_audio_code2wav_batch_decode.py -v

# E2e — online serving (4 passed, both async_chunk and no_async_chunk)
pytest tests/e2e/online_serving/test_mimo_audio.py -v

Test Result

PASSED

@NickCao NickCao force-pushed the worktree-unified-chunk-path branch 2 times, most recently from 9343112 to 06d7946 Compare June 15, 2026 17:14
@NickCao

NickCao commented Jun 15, 2026

Contributor Author

Rebased and retested.

@hsliuustc0106 hsliuustc0106 left a comment

Collaborator

Posting one inline finding from review.

Comment thread vllm_omni/model_executor/stage_input_processors/mimo_audio.py
@hsliuustc0106

Collaborator

@qibaoyuan PTAL

@hsliuustc0106 hsliuustc0106 added the tts code related to tts models label Jun 19, 2026
NickCao and others added 6 commits June 22, 2026 09:49
Point both custom_process_next_stage_input_func and
async_chunk_process_next_stage_input_func to the same
llm2code2wav_async_chunk function. Add unflatten_payload call so
it handles both transport modes (full_payload accumulator flattens
dict keys; async_chunk transport does not). Remove the duplicate
_batch_decode_waveforms decoder and the if-is_async_chunk branch —
the streaming decoder with left_context_size=0 produces identical
output to the old sync path.

Co-Authored-By: Claude <noreply@anthropic.com>

Signed-off-by: Nick Cao <ncao@redhat.com>
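
The commit above says the full_payload accumulator flattens dict keys while the async_chunk transport does not. A minimal sketch of what an unflatten step could look like follows, assuming a dotted-key separator; the separator, the function name `unflatten_payload_sketch`, and the payload keys are assumptions for illustration and may not match the real `unflatten_payload`.

```python
from typing import Any

SEP = "."  # assumed separator; the real transport may use a different convention


def unflatten_payload_sketch(payload: dict[str, Any]) -> dict[str, Any]:
    """Rebuild nested dicts from keys like 'a.b'.

    Keys without the separator are copied as-is, so an already-nested
    payload passes through unchanged.
    """
    out: dict[str, Any] = {}
    for key, value in payload.items():
        parts = key.split(SEP)
        node = out
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return out


flat = {"codes.audio": [1, 2, 3], "meta.finished": True}
nested = {"codes": {"audio": [1, 2, 3]}, "meta": {"finished": True}}
```

Because nested input passes through untouched, one producer can call the helper unconditionally and serve both transport modes.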
…arams

Use module-level _DEFAULT_CODEC_CHUNK_FRAMES and
_DEFAULT_CODEC_LEFT_CONTEXT_FRAMES as parameter defaults so callers
don't need to repeat them.

Co-Authored-By: Claude <noreply@anthropic.com>

Signed-off-by: Nick Cao <ncao@redhat.com>
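
The pattern in this commit can be sketched as below. The two constant names come from the commit message; their values here are placeholders, and `plan_chunks` is a hypothetical helper that only shows how callers can omit the arguments.

```python
# Placeholder values; the actual defaults in the PR are not stated here.
_DEFAULT_CODEC_CHUNK_FRAMES = 25
_DEFAULT_CODEC_LEFT_CONTEXT_FRAMES = 5


def plan_chunks(
    num_frames: int,
    chunk_frames: int = _DEFAULT_CODEC_CHUNK_FRAMES,
    left_context_frames: int = _DEFAULT_CODEC_LEFT_CONTEXT_FRAMES,
) -> list[tuple[int, int, int]]:
    """Return (context_start, chunk_start, chunk_end) windows over the frames."""
    windows = []
    for start in range(0, num_frames, chunk_frames):
        end = min(start + chunk_frames, num_frames)
        windows.append((max(0, start - left_context_frames), start, end))
    return windows


default_windows = plan_chunks(60)  # callers need not repeat the defaults
sync_like_windows = plan_chunks(4, chunk_frames=2, left_context_frames=0)
```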
…ing paths

Update test_mimo_audio_code2wav_batch_decode to exercise
_batch_chunked_decode_streaming (the unified decode path) instead of
the removed _batch_decode_waveforms. Every test is parametrized with
left_context in {0, 1, 5} covering sync (no strip), partial strip,
and full strip.

Co-Authored-By: Claude <noreply@anthropic.com>

Signed-off-by: Nick Cao <ncao@redhat.com>
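
To make the left-context stripping concrete, here is a toy sketch. It assumes a stateless decoder that expands every codec frame into a fixed number of samples; `toy_decode`, `chunked_decode`, and `SAMPLES_PER_FRAME` are hypothetical and stand in for the real `_batch_chunked_decode_streaming`, whose signature is not shown in this PR page.

```python
SAMPLES_PER_FRAME = 4  # toy upsampling factor, an assumption for illustration


def toy_decode(frames: list[int]) -> list[int]:
    """Toy stateless decoder: repeat each code SAMPLES_PER_FRAME times."""
    return [code for code in frames for _ in range(SAMPLES_PER_FRAME)]


def chunked_decode(frames: list[int], chunk_frames: int, left_context: int) -> list[int]:
    """Decode chunk by chunk, prepending left context and stripping its samples."""
    wav: list[int] = []
    for start in range(0, len(frames), chunk_frames):
        ctx_start = max(0, start - left_context)
        decoded = toy_decode(frames[ctx_start:start + chunk_frames])
        # Drop the samples that belong to the context frames, keep the rest.
        wav.extend(decoded[(start - ctx_start) * SAMPLES_PER_FRAME:])
    return wav


frames = list(range(1, 12))
reference = toy_decode(frames)
results = {lc: chunked_decode(frames, 3, lc) for lc in (0, 1, 5)}
```

With this toy decoder every `left_context` value reproduces the single-pass output exactly. A real codec decoder has receptive-field effects at chunk boundaries, which is what the parametrized tests are there to check.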
…_chunk

Add --no-async-chunk variant to test_params so both sync and streaming
code2wav paths are exercised by the existing e2e tests.

Co-Authored-By: Claude <noreply@anthropic.com>

Signed-off-by: Nick Cao <ncao@redhat.com>
Signed-off-by: Nick Cao <ncao@redhat.com>
The upstream "Output Processor Phase 2" refactor renamed
pooling_output → multimodal_output in the chunk_transfer_adapter
but not in the full-payload mixin. Restore llm2code2wav_full_payload
as a thin wrapper that bridges the kwarg mismatch so the non-async
path works with both callers on vllm 0.23.0.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Nick Cao <ncao@redhat.com>
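
A thin wrapper of the kind described in this commit might look like the sketch below. The keyword names `pooling_output` and `multimodal_output` come from the commit message; the function bodies, the `_sketch` suffixes, and the payload shape are assumptions, not the real signatures.

```python
from typing import Any, Optional


def llm2code2wav_async_chunk_sketch(*, multimodal_output: dict[str, Any]) -> dict[str, Any]:
    """Stand-in for the unified producer, which expects the new keyword."""
    return {"codes": list(multimodal_output["codes"])}


def llm2code2wav_full_payload_sketch(
    *,
    pooling_output: Optional[dict[str, Any]] = None,
    multimodal_output: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Accept either keyword and forward it under the new name."""
    if multimodal_output is None:
        multimodal_output = pooling_output
    if multimodal_output is None:
        raise TypeError("expected pooling_output or multimodal_output")
    return llm2code2wav_async_chunk_sketch(multimodal_output=multimodal_output)


old_style = llm2code2wav_full_payload_sketch(pooling_output={"codes": [7, 8]})
new_style = llm2code2wav_full_payload_sketch(multimodal_output={"codes": [7, 8]})
```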
@NickCao NickCao force-pushed the worktree-unified-chunk-path branch from 06d7946 to ca4162c Compare June 22, 2026 14:13
@NickCao NickCao marked this pull request as draft June 22, 2026 14:15
NickCao added 2 commits June 22, 2026 10:18
…ter in async_chunk

The old llm2code2wav_full_payload truncated flat_codes at
MAX_CODE2WAV_TOKENS and filtered zero-padded codec rows via
_filter_zero_codec_rows before flattening. Both guards were lost when
the function was replaced with a delegation to llm2code2wav_async_chunk.
Restore them and drop the tensor-list-tensor round-trip.

Co-Authored-By: Claude <noreply@anthropic.com>

Signed-off-by: Nick Cao <ncao@redhat.com>
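
The two restored guards can be sketched on plain Python lists as follows. `MAX_CODE2WAV_TOKENS` is named in the commit, but its value here is a placeholder, and treating a "zero-padded row" as a row whose codes are all zero is an assumption about what `_filter_zero_codec_rows` does.

```python
MAX_CODE2WAV_TOKENS = 8  # placeholder limit, not the value used in the PR


def filter_zero_codec_rows_sketch(rows: list[list[int]]) -> list[list[int]]:
    """Drop rows that contain only zeros (assumed meaning of zero-padded)."""
    return [row for row in rows if any(code != 0 for code in row)]


def flatten_codes(rows: list[list[int]], max_tokens: int = MAX_CODE2WAV_TOKENS) -> list[int]:
    """Filter padded rows first, then flatten, then truncate to the token cap."""
    kept = filter_zero_codec_rows_sketch(rows)
    flat = [code for row in kept for code in row]
    return flat[:max_tokens]


rows = [[1, 2, 3], [0, 0, 0], [4, 5, 6], [7, 8, 9]]
flat_codes = flatten_codes(rows)
```

Filtering before flattening matters: padding removed after truncation would already have consumed part of the token budget.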
Provide a transfer_manager mock with code_prompt_token_ids, add req_id
to request fixtures, and switch from dict access to OmniPayloadStruct
attribute access to match the new return type.

Co-Authored-By: Claude <noreply@anthropic.com>

Signed-off-by: Nick Cao <ncao@redhat.com>
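
The test-fixture change described here could be sketched as below. `code_prompt_token_ids`, `req_id`, and the move to attribute access come from the commit message; `PayloadStructSketch` is a hypothetical stand-in for `OmniPayloadStruct`, and keying `code_prompt_token_ids` by request id is an assumption.

```python
from dataclasses import dataclass, field
from types import SimpleNamespace


@dataclass
class PayloadStructSketch:
    """Hypothetical stand-in for the struct-style return type."""
    codes: list[int] = field(default_factory=list)
    finished: bool = False


def produce(transfer_manager, request) -> PayloadStructSketch:
    """Toy producer that reads per-request prompt codes from the manager."""
    prompt = transfer_manager.code_prompt_token_ids[request.req_id]
    return PayloadStructSketch(codes=list(prompt), finished=True)


# Lightweight mocks: only the attributes the producer touches are provided.
transfer_manager = SimpleNamespace(code_prompt_token_ids={"req-0": [11, 12]})
request = SimpleNamespace(req_id="req-0")

payload = produce(transfer_manager, request)
codes = payload.codes  # attribute access, where a dict return used payload["codes"]
```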
@NickCao NickCao force-pushed the worktree-unified-chunk-path branch from ca4162c to d80d796 Compare June 22, 2026 14:26
@NickCao NickCao marked this pull request as ready for review June 22, 2026 15:27
@NickCao

NickCao commented Jun 22, 2026

Contributor Author

Passed unit tests and manually examined the output audio.

Labels

tts code related to tts models

2 participants