Summary
tests/e2e/online_serving/test_higgs_audio_v3.py::TestHiggsAudioV3OnlineHappyPath::test_concurrent_pcm_streaming[higgs_audio_v3_plain_text] fails deterministically:
AssertionError: Audio distortion detected: HNR=-2.73 dB < -2.0 dB. Voice clone decoder may be losing ref_code speaker context on later chunks.
The 3-way concurrent PCM-streaming case measures HNR ~0.7-1.3 dB below the -2.0 dB floor, while the single-request streaming case passes (~0.8 dB). This is a higgs_audio_v3_plain_text request (no ref_audio / voice clone), so the assertion's stock "Voice clone decoder may be losing ref_code speaker context" hint is generic boilerplate, not specific to this case.
Reproducible / deterministic (not flaky)
Same failure on both main and the rebase branch, across multiple runs:
| Where | HNR |
|---|---|
| main @ 31886d6e (3x parallel, identical) | -2.73 dB x3 |
| main @ d75a6433 | -3.25 dB |
| dev/vllm-align (build #2125) | -2.94 dB |
| dev/vllm-align (build #2158) | -2.94 dB |
Three parallel runs on main each produced an identical -2.73 dB, so the failure is deterministic (fixed seed), not a flake. Single-stream passes; only the 3-way concurrent batch dips below the threshold. No rebase introduced this: main fails identically.
Note on CI visibility
This test is diff-gated out of main CI builds (the ready pipeline if: is false on main), so main builds appear green without exercising it. It runs (and fails) on nightly / ready-labeled builds.
Threshold context
-2.0 dB is already a Higgs-specific relaxed min_hnr_db override (the global default is 1.0 dB); the concurrent-streaming case still falls below it. Related to the HNR-threshold-too-strict pattern in #4335 / #4390 (VoxCPM2), but those do not cover this Higgs case.
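To make the threshold relationship concrete, here is a minimal sketch of how a per-model override can take precedence over a global default. The names `GLOBAL_MIN_HNR_DB`, `MODEL_MIN_HNR_DB`, `resolve_min_hnr_db` and `check_hnr` are hypothetical and are not taken from the test suite; only the 1.0 dB and -2.0 dB values and the assertion wording come from this report.

```python
# Hypothetical sketch: per-model HNR floor overriding a global default.
# Only the numeric values (1.0 dB, -2.0 dB) come from this issue.
GLOBAL_MIN_HNR_DB = 1.0
MODEL_MIN_HNR_DB = {"higgs_audio_v3": -2.0}  # assumed key name


def resolve_min_hnr_db(model_key: str) -> float:
    """Return the model-specific floor if one exists, else the global default."""
    return MODEL_MIN_HNR_DB.get(model_key, GLOBAL_MIN_HNR_DB)


def check_hnr(hnr_db: float, model_key: str) -> None:
    """Raise AssertionError when the measured HNR is below the resolved floor."""
    floor = resolve_min_hnr_db(model_key)
    if hnr_db < floor:
        raise AssertionError(
            f"Audio distortion detected: HNR={hnr_db:.2f} dB < {floor:.1f} dB"
        )


# Single-request streaming (~0.8 dB) clears the relaxed Higgs floor,
# although it would not clear the 1.0 dB global default.
check_hnr(0.8, "higgs_audio_v3")
```

Under these assumptions, the concurrent case at -2.73 dB fails even the relaxed floor, which is why the override alone does not explain the failure.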
Suggested investigation
Either the concurrent-streaming decode genuinely degrades audio quality (real bug) and needs a fix, or the metric/threshold for the batched-concurrent case needs adjusting (e.g. median across streams instead of worst-of-N). The test is being temporarily skipped pending this investigation.
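The difference between the two aggregation options can be sketched as follows. This assumes the current check is effectively worst-of-N over the concurrent streams, which this report implies but does not confirm. The function names are hypothetical, and the per-stream values in the example are illustrative only (just -2.73 dB is a measured number).

```python
from statistics import median

MIN_HNR_DB = -2.0  # Higgs-specific floor quoted in this issue


def passes_worst_of_n(hnr_db_per_stream, floor_db=MIN_HNR_DB):
    """Pass only if every stream is at or above the floor (strictest option)."""
    return min(hnr_db_per_stream) >= floor_db


def passes_median(hnr_db_per_stream, floor_db=MIN_HNR_DB):
    """Pass if the median stream is at or above the floor (tolerates one outlier)."""
    return median(hnr_db_per_stream) >= floor_db


# Illustrative 3-way batch: one stream at the measured -2.73 dB,
# the other two values are made up for the example.
example_batch = [-2.73, -1.4, -0.9]

worst_ok = passes_worst_of_n(example_batch)
median_ok = passes_median(example_batch)
```

With one degraded stream out of three, worst-of-N fails while the median passes. That makes the median more forgiving but also able to hide a single genuinely distorted stream, so the choice depends on whether the degradation turns out to be a real decode bug.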
Environment
- vLLM 0.22 (main) / 0.23 (dev/vllm-align), CUDA 13, torch 2.11
- Reproduces on both H100 (CI) and L20X (local)