[Bug]: higgs_audio_v3 test_concurrent_pcm_streaming consistently fails HNR threshold (concurrent PCM streaming) #4411

@tzhouam

Summary

tests/e2e/online_serving/test_higgs_audio_v3.py::TestHiggsAudioV3OnlineHappyPath::test_concurrent_pcm_streaming[higgs_audio_v3_plain_text] fails deterministically:

AssertionError: Audio distortion detected: HNR=-2.73 dB < -2.0 dB. Voice clone decoder may be losing ref_code speaker context on later chunks.

The 3-way concurrent PCM-streaming case measures HNR ~0.7-1.3 dB below the -2.0 dB floor, while the single-request streaming case passes (~0.8 dB). This is a higgs_audio_v3_plain_text request (no ref_audio / voice clone), so the assertion's stock "Voice clone decoder may be losing ref_code speaker context" hint is generic boilerplate, not specific to this case.

Reproducible / deterministic (not flaky)

Same failure on both main and the rebase branch, across multiple runs:

Where                                       HNR
main @ 31886d6e (3x parallel, identical)    -2.73 dB x3
main @ d75a6433                             -3.25 dB
dev/vllm-align (build #2125)                -2.94 dB
dev/vllm-align (build #2158)                -2.94 dB

Three parallel runs on main all produced exactly -2.73 dB, so the failure is deterministic (fixed seed), not a flake. The single-stream case passes; only the 3-way concurrent batch dips below the threshold. The failure was not introduced by any rebase: main fails identically.
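For reference, the margins quoted in the summary can be re-derived from the table with a few lines of Python. This is only a sketch: the helper names `margin_below_floor` and `is_reproducible` are made up for illustration and are not part of the test suite; the HNR values are copied from the table above.

```python
FLOOR_DB = -2.0  # Higgs-specific min_hnr_db override quoted in this issue

# HNR values copied from the table above (one entry per reported run).
measurements = {
    "main @ 31886d6e": [-2.73, -2.73, -2.73],
    "main @ d75a6433": [-3.25],
    "dev/vllm-align (build #2125)": [-2.94],
    "dev/vllm-align (build #2158)": [-2.94],
}


def margin_below_floor(hnr_db, floor_db=FLOOR_DB):
    """How far (in dB) a measurement sits below the floor; positive means failing."""
    return round(floor_db - hnr_db, 2)


def is_reproducible(runs, tol_db=0.0):
    """True when repeated runs at one commit agree within tol_db."""
    return max(runs) - min(runs) <= tol_db


for where, runs in measurements.items():
    margins = [margin_below_floor(v) for v in runs]
    print(f"{where}: margins={margins} dB, reproducible={is_reproducible(runs)}")
```

Every margin is positive (each run fails), and the three runs at 31886d6e have zero spread, which is what the "deterministic, not flaky" claim rests on.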

Note on CI visibility

This test is diff-gated out of main CI builds (the ready pipeline's if: condition evaluates to false on main), so main builds appear green without exercising it. It runs (and fails) on nightly / ready-labeled builds.

Threshold context

-2.0 dB is already a Higgs-specific relaxed min_hnr_db override (the global default is 1.0 dB); the concurrent-streaming case still falls below it. Related to the HNR-threshold-too-strict pattern in #4335 / #4390 (VoxCPM2), but those do not cover this Higgs case.

Suggested investigation

Either the concurrent-streaming decode genuinely degrades audio quality (real bug) and needs a fix, or the metric/threshold for the batched-concurrent case needs adjusting (e.g. median across streams instead of worst-of-N). The test is being temporarily skipped pending this investigation.
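To make the second option concrete, here is a minimal sketch of how the two aggregation rules differ. It assumes the check currently gates on the worst stream, which is what "worst-of-N" above implies but has not been confirmed against the test helper. The function names and the per-stream values are hypothetical placeholders, not measurements from any run.

```python
from statistics import median

MIN_HNR_DB = -2.0  # Higgs-specific floor quoted in this issue


def worst_of_n(hnr_db):
    """Gate on the single worst stream (assumed current behaviour)."""
    return min(hnr_db)


def median_of_n(hnr_db):
    """Gate on the median stream, which tolerates one outlier stream."""
    return median(hnr_db)


def passes(hnr_db, aggregate, floor_db=MIN_HNR_DB):
    """True when the aggregated HNR is at or above the floor."""
    return aggregate(hnr_db) >= floor_db


# Hypothetical per-stream HNR values for a 3-way concurrent request,
# chosen only to illustrate the difference between the two rules.
streams = [0.8, -1.1, -2.73]

print("worst-of-N passes:", passes(streams, worst_of_n))
print("median passes:    ", passes(streams, median_of_n))
```

Note that switching to the median would also hide a genuine single-stream degradation, so it only makes sense if the investigation shows the concurrent decode is not actually losing quality.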

Environment

  • vLLM 0.22 (main) / 0.23 (dev/vllm-align), CUDA 13, torch 2.11
  • Reproduces on both H100 (CI) and L20X (local)

Labels: bug, high priority
