[Bug]: higgs_audio_v3 test_concurrent_pcm_streaming consistently fails HNR threshold (concurrent PCM streaming) #4411

### Summary

`tests/e2e/online_serving/test_higgs_audio_v3.py::TestHiggsAudioV3OnlineHappyPath::test_concurrent_pcm_streaming[higgs_audio_v3_plain_text]` fails deterministically:

```
AssertionError: Audio distortion detected: HNR=-2.73 dB < -2.0 dB. Voice clone decoder may be losing ref_code speaker context on later chunks.
```

The 3-way **concurrent** PCM-streaming case measures HNR 0.7-1.3 dB below the -2.0 dB floor (-2.73 to -3.25 dB across runs), while the **single-request** streaming case passes (~0.8 dB). The failing parametrization is `higgs_audio_v3_plain_text`, a request with no `ref_audio` (no voice cloning), so the assertion's "Voice clone decoder may be losing ref_code speaker context" hint is a generic message and does not describe this case.

### Reproducible / deterministic (not flaky)

Same failure on both `main` and the rebase branch, across multiple runs:

| Where | HNR |
|---|---|
| `main` @ `31886d6e` (3x parallel, identical) | **-2.73 dB** x3 |
| `main` @ `d75a6433` | -3.25 dB |
| `dev/vllm-align` (build #2125) | -2.94 dB |
| `dev/vllm-align` (build #2158) | -2.94 dB |

Three parallel runs on `main` @ `31886d6e` all reported exactly `-2.73 dB`, so the result is deterministic (fixed seed) rather than flaky. The single-stream case passes; only the 3-way concurrent batch dips below the threshold. **Not introduced by any rebase**: `main` fails the same assertion.
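
As a quick sanity check on the shortfall range quoted in the Summary, the snippet below recomputes how far each measured HNR sits below the `-2.0 dB` floor. The HNR values are copied from the table above; the dictionary keys and the `shortfall_db` helper are illustrative names, not part of the test suite.

```python
# Floor quoted in this issue (the Higgs-specific min_hnr_db override).
MIN_HNR_DB = -2.0

# HNR values in dB, copied from the table above.
measurements = {
    "main @ 31886d6e": -2.73,
    "main @ d75a6433": -3.25,
    "dev/vllm-align #2125": -2.94,
    "dev/vllm-align #2158": -2.94,
}


def shortfall_db(hnr_db, floor_db=MIN_HNR_DB):
    """Return how far hnr_db sits below floor_db (positive means failing)."""
    return round(floor_db - hnr_db, 2)


shortfalls = {where: shortfall_db(hnr) for where, hnr in measurements.items()}

for where, gap in shortfalls.items():
    print(f"{where}: {gap:.2f} dB below the floor")
```

Every run misses the floor by between 0.73 dB and 1.25 dB, which matches the range stated in the Summary.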

### Note on CI visibility

This test is diff-gated out of `main` CI builds (the ready pipeline `if:` is `false` on `main`), so `main` builds appear green without exercising it. It runs (and fails) on nightly / `ready`-labeled builds.

### Threshold context

`-2.0 dB` is already a Higgs-specific relaxed `min_hnr_db` override (the global default is `1.0 dB`); the concurrent-streaming case still falls below it. Related to the HNR-threshold-too-strict pattern in #4335 / #4390 (VoxCPM2), but those do not cover this Higgs case.

### Suggested investigation

Either the concurrent-streaming decode genuinely degrades audio quality (real bug) and needs a fix, or the metric/threshold for the batched-concurrent case needs adjusting (e.g. median across streams instead of worst-of-N). The test is being temporarily skipped pending this investigation.
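
To make the second option concrete, here is a minimal sketch of the two aggregation strategies. It assumes the current check is effectively worst-of-N across the concurrent streams, which this issue implies but does not confirm. The function names and the per-stream values are hypothetical and exist only for illustration; no per-stream measurements are reported here.

```python
from statistics import median

# Floor quoted in this issue (the Higgs-specific min_hnr_db override).
MIN_HNR_DB = -2.0


def passes_worst_of_n(stream_hnrs_db, floor_db=MIN_HNR_DB):
    """Pass only if every stream meets the floor (the lowest HNR decides)."""
    return min(stream_hnrs_db) >= floor_db


def passes_median(stream_hnrs_db, floor_db=MIN_HNR_DB):
    """Pass if the median stream meets the floor (tolerates one outlier of three)."""
    return median(stream_hnrs_db) >= floor_db


# Hypothetical per-stream HNR values in dB, NOT measured data.
example_streams = [0.5, -1.0, -2.5]

print("worst-of-N passes:", passes_worst_of_n(example_streams))
print("median passes:", passes_median(example_streams))
```

With these made-up values the worst-of-N check fails while the median check passes. The trade-off is that a median hides a single genuinely distorted stream, so switching metrics only makes sense once the concurrent decode has been ruled out as a real quality regression.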

### Environment

- vLLM 0.22 (`main`) / 0.23 (`dev/vllm-align`), CUDA 13, torch 2.11
- Reproduces on both H100 (CI) and L20X (local)

