(outdated/superseded) fix: avoid problems with higher sampling rates #25

Description

@huberp

Because the Copilot app on Android constantly fails today while still charging my budget, I have to manually transfer a chat session to the web UI. I now copy my questions and Copilot's answers (Q1, A1 and Q2, A2) to make Copilot aware of the previous results ON MY BUDGET, which pisses me off.

Q1: I am still concerned about efficiency.
What if people run the plugin at 96 kHz or even 192 kHz: does the transmission volume then get too big?
A1: This is a great efficiency concern. Let me work through the numbers concretely based on the code.


The Bandwidth Problem at High Sample Rates

Fixed chunk size is the core issue

BROADCAST_CHUNK_SAMPLES = 1470 is a fixed constant, and the comment says:

"Chosen so that one packet covers approximately 33 ms at 44.1 kHz."

The timer sends at 30 Hz, and every tick drains however many full 1470-sample chunks have accumulated. The packet count per second scales linearly with sample rate:

| Sample rate | Chunks/sec | Bytes/packet* | UDP bandwidth |
|-------------|------------|---------------|---------------|
| 44,100 Hz   | ~30/s      | 5,916 B       | ~177 KB/s     |
| 96,000 Hz   | ~65/s      | 5,916 B       | ~386 KB/s     |
| 192,000 Hz  | ~131/s     | 5,916 B       | ~773 KB/s     |

*Packet = 36 B header + 1470 × 4 B floats = 5,916 B

* This is per sender. With N instances, multiply by N.
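
The arithmetic behind the table can be reproduced with a small helper. This is only a sketch: the 1470-sample chunk and the 36-byte header come from the discussion above, while `BroadcastLoad` and `estimateLoad` are hypothetical names that do not exist in the plugin.

```cpp
#include <cstddef>

// Values quoted in the discussion above.
constexpr std::size_t kChunkSamples   = 1470; // BROADCAST_CHUNK_SAMPLES
constexpr std::size_t kHeaderBytes    = 36;   // packet header size
constexpr std::size_t kBytesPerSample = 4;    // 32-bit float samples

// Hypothetical result type, for illustration only.
struct BroadcastLoad {
    double      packetsPerSecond;
    std::size_t bytesPerPacket;
    double      bytesPerSecond;
};

// Per-sender load when the chunk size stays fixed at 1470 samples.
inline BroadcastLoad estimateLoad(double sampleRate) {
    const std::size_t bytesPerPacket = kHeaderBytes + kChunkSamples * kBytesPerSample;
    const double packetsPerSecond = sampleRate / static_cast<double>(kChunkSamples);
    return { packetsPerSecond, bytesPerPacket,
             packetsPerSecond * static_cast<double>(bytesPerPacket) };
}
```

Calling `estimateLoad(192000.0)` gives roughly 131 packets and about 773 KB per second for one sender; multiply by N for N instances.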

The three real concerns

  1. Loopback UDP volume — Since this is localhost-only, actual network bandwidth isn't the issue, but CPU overhead for the sender's sendto() + receiver's recvfrom() syscalls, memcpy, and mutex lock scales with packet rate. At 192 kHz you're doing ~131 syscalls/sec instead of ~30.

  2. Ring buffer overflow: AudioSampleRingBuffer::CAPACITY = 16384 samples, sized for ~370 ms at 44.1 kHz. At 192 kHz, that same capacity only covers ~85 ms. If the 30 Hz timer fires late (which can happen), you'll start dropping samples (push() returns false on overflow).

  3. Visual usefulness ceiling — The ScopeDisplay renders waveforms for a beat-range window. At 192 kHz you have 4× more samples describing the same musical time — far more resolution than any screen needs. The extra data doesn't add visual quality.
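
To put a number on the overflow risk in point 2, here is a rough sketch of how much audio the ring buffer holds and how many 30 Hz timer periods that corresponds to. The capacity and timer rate are taken from the text; the function names are made up for this example.

```cpp
#include <cstddef>

constexpr std::size_t kRingCapacity = 16384; // AudioSampleRingBuffer::CAPACITY
constexpr double      kTimerHz      = 30.0;  // broadcast timer rate

// Milliseconds of audio the ring buffer can hold at a given sample rate.
inline double ringCoverageMs(double sampleRate) {
    return 1000.0 * static_cast<double>(kRingCapacity) / sampleRate;
}

// How many timer periods of audio fit in the buffer before push() would fail.
inline double timerPeriodsOfHeadroom(double sampleRate) {
    return ringCoverageMs(sampleRate) * kTimerHz / 1000.0;
}
```

At 44.1 kHz this works out to about 11 timer periods of headroom; at 192 kHz it is fewer than 3, so a couple of late ticks in a row would already be enough to drop samples.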


Potential mitigations

Option A: Downsampling before broadcast (most impactful)

Decimate the audio in processBlock before pushing to the ring buffer, e.g. only push every Nth sample, where N is sampleRate / 44100 rounded to an integer (2 at 96 kHz, 4 at 192 kHz). The waveform display doesn't need full-bandwidth audio.
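
A minimal sketch of what "push every Nth sample" could look like. `NaiveDecimator` and `decimationFactor` are hypothetical and not part of the plugin, and there is deliberately no anti-aliasing filter, so this is the naive variant. The `std::vector` output is for illustration only; in processBlock the kept samples would go into the ring buffer instead, since allocating on the audio thread is not acceptable.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: keeps every factor-th sample and carries the phase
// across blocks so the spacing stays uniform at block boundaries.
class NaiveDecimator {
public:
    explicit NaiveDecimator(std::size_t factor) : factor_(factor == 0 ? 1 : factor) {}

    // Appends the kept samples of this block to out.
    void process(const float* in, std::size_t numSamples, std::vector<float>& out) {
        for (std::size_t i = 0; i < numSamples; ++i) {
            if (phase_ == 0)
                out.push_back(in[i]);
            phase_ = (phase_ + 1) % factor_;
        }
    }

private:
    std::size_t factor_;
    std::size_t phase_ = 0;
};

// Integer factor that brings sampleRate closest to 44.1 kHz (never below 1).
inline std::size_t decimationFactor(double sampleRate) {
    const double ratio = sampleRate / 44100.0;
    return std::max<std::size_t>(1, static_cast<std::size_t>(ratio + 0.5));
}
```

With this rounding, 96 kHz is decimated by 2 (48 kHz effective) and 192 kHz by 4 (48 kHz effective), so the effective rate is close to, but not exactly, 44.1 kHz.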

Option B: Scale BROADCAST_CHUNK_SAMPLES with sample rate

Instead of a hardcoded 1470, compute the chunk size at prepareToPlay() to always represent a fixed time window (e.g. 33 ms), keeping packet rate constant at ~30/sec regardless of sample rate.

Option C: Increase ring buffer capacity at high sample rates

In prepareToPlay(), resize the ring buffer based on actual sampleRate to avoid overflow at 96/192 kHz.
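
One possible way to size the buffer is sketched below. It assumes the goal is to keep the ~370 ms coverage the current capacity gives at 44.1 kHz, and it assumes a power-of-two capacity is wanted, which the source does not confirm. Both function names are hypothetical. Any resizing would have to happen in prepareToPlay(), not on the audio thread.

```cpp
#include <cstddef>

// Smallest power of two >= n (returns 1 for n <= 1).
inline std::size_t nextPowerOfTwo(std::size_t n) {
    std::size_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

// Capacity in samples that covers `seconds` of audio at `sampleRate`.
// The 0.370 s default mirrors what 16384 samples give at 44.1 kHz.
inline std::size_t ringCapacityFor(double sampleRate, double seconds = 0.370) {
    const auto needed = static_cast<std::size_t>(sampleRate * seconds + 0.5);
    return nextPowerOfTwo(needed);
}
```

This reproduces 16384 at 44.1 kHz and yields 65536 at 96 kHz and 131072 at 192 kHz.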


Option A (decimation) is probably the cleanest fix — it solves bandwidth, CPU overhead, ring buffer pressure, and keeps the broadcast format completely unchanged. Would you like me to open a PR that implements one of these strategies?

Q2: But doesn't decimation change the RMS or the cancellation metric for the worse?
A2: That's a great and sharp observation. You're right to be careful — naive decimation would hurt, but looking at the actual code, the situation is more nuanced. Let me break it down properly.


Your concern is valid — but only for one part

RMS: Decimation would NOT hurt it

RMS for remote instances is computed from the dedicated rmsAccum[] / rmsCount[] arrays accumulated in setRemoteRawData() (lines 91–94):

accum.rmsAccum[rmsSlot] += s * s;
accum.rmsCount[rmsSlot]++;

Then computeMetrics() reads it back as:

sqrt(ra[s] / (float)rc[s])   // = true RMS

This is a proper mean-square accumulation — it doesn't matter how many samples fall into a slot, the rmsCount normalises correctly. Decimating the stream (fewer samples arriving) would still yield correct RMS as long as the samples are representative, because more samples per slot just average more finely — the result converges to the same value.
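
The claim can be checked with a toy experiment. The sketch below mirrors the sum-of-squares / count accumulation quoted above, but `RmsSlot` and `sineRms` are stand-ins written for this example, not the plugin's types, and the test signal is assumed to be a plain unit sine.

```cpp
#include <cmath>
#include <cstddef>

// Stand-in for one rmsAccum[] / rmsCount[] slot.
struct RmsSlot {
    float sumSquares = 0.0f;
    int   count      = 0;

    void add(float s) { sumSquares += s * s; ++count; }

    float rms() const {
        return count > 0 ? std::sqrt(sumSquares / static_cast<float>(count)) : 0.0f;
    }
};

// RMS of a unit sine at freqHz over durationSeconds, keeping only every
// stride-th sample (stride = 1 means no decimation).
inline float sineRms(double sampleRate, double freqHz,
                     double durationSeconds, std::size_t stride) {
    const double twoPi = 6.283185307179586;
    RmsSlot slot;
    const auto n = static_cast<std::size_t>(sampleRate * durationSeconds);
    for (std::size_t i = 0; i < n; i += stride) {
        const double t = static_cast<double>(i) / sampleRate;
        slot.add(static_cast<float>(std::sin(twoPi * freqHz * t)));
    }
    return slot.rms();
}
```

For a 1 kHz sine at 192 kHz over 100 ms, both `sineRms(192000.0, 1000.0, 0.1, 1)` and `sineRms(192000.0, 1000.0, 0.1, 4)` come out at about 0.707 (1/√2), because the count normalises away the difference in sample density.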


Cancellation: This is where decimation gets risky

The cancellation index has two parts:

  • Denominator D: uses the same cancelAccum / cancelCount accumulator — this is RMS-based and robust to decimation.

  • Numerator N (lines 376–389): computed from accum.bins[], which is a last-write-wins waveform buffer:

    accum.bins[bin] = s;  // last sample in this PPQ bin wins

    The bins array is then used to compute the RMS of the summed waveform (local + remote) per cancel slot. This is a phase-sensitive operation — what matters here is that the bin contains the right sample at the right PPQ position for phase comparison with the local signal.

    With heavy decimation, fewer samples per slot means fewer bin writes per beat cycle — the bins would be sparser but still written at valid PPQ-mapped positions. The phase accuracy depends on which sample wins for a given bin. This is already "one sample per bin" at native rate (last-write-wins), so sparse coverage doesn't fundamentally break phase alignment — it just reduces bin fill density, which could make the numerator N noisier.
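
To illustrate what "bin fill density" means here, the following sketch counts how many bins receive at least one write during a single pass over the beat range. Everything about the mapping is an assumption made for this example: the bin count, the tempo, the linear sample-to-PPQ mapping, and the function name `binFillFraction` are all hypothetical, and the real code may map samples to bins differently.

```cpp
#include <cstddef>
#include <vector>

// Fraction of bins written at least once after one pass over `rangeBeats`
// beats, keeping every stride-th sample. Assumes numBins > 0 and a linear
// mapping from sample index to PPQ position (hypothetical).
inline double binFillFraction(double sampleRate, double bpm, double rangeBeats,
                              std::size_t numBins, std::size_t stride) {
    std::vector<bool> written(numBins, false);
    const double beatsPerSample = bpm / 60.0 / sampleRate;
    const auto totalSamples = static_cast<std::size_t>(rangeBeats / beatsPerSample);

    for (std::size_t i = 0; i < totalSamples; i += stride) {
        const double ppq = static_cast<double>(i) * beatsPerSample;
        auto bin = static_cast<std::size_t>(ppq / rangeBeats * static_cast<double>(numBins));
        if (bin >= numBins)
            bin = numBins - 1;
        written[bin] = true; // in the real buffer: bins[bin] = s (last write wins)
    }

    std::size_t filled = 0;
    for (bool w : written)
        if (w)
            ++filled;
    return static_cast<double>(filled) / static_cast<double>(numBins);
}
```

With these made-up parameters (4096 bins, one beat at 120 BPM, 44.1 kHz), a stride of 1 fills every bin, while a stride of 8 leaves roughly a third of the bins untouched in one pass. That is the "sparser bins" effect described above; whether it matters in practice depends on the actual bin count and window length.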


The real answer: the better mitigation isn't decimation

Since the RMS is robust but the waveform bin fill density matters for cancellation quality, the cleanest fix is actually Option B from before — keep BROADCAST_CHUNK_SAMPLES time-constant:

// In prepareToPlay(), compute dynamically:
// chunkSamples = one 30 Hz timer period (~33.3 ms) × sampleRate
// = round(sampleRate / 30.0)
// At 44.1k → 1470, at 96k → 3200, at 192k → 6400

This keeps the packet rate fixed at ~30/s regardless of sample rate, packet size grows proportionally, but:

  • ✅ RMS accuracy: unaffected (more samples = more precise accumulation)
  • ✅ Cancellation bin density: stays the same or improves (more samples covering the same time window → denser bin fill)
  • ✅ Packet rate / UDP syscalls: constant at ~30/s regardless of sample rate
  • ⚠️ Packet payload size: grows ~2.2× at 96 kHz and ~4.4× at 192 kHz, but that is still only ~25 KB per packet on localhost, well under the 65,507-byte UDP payload limit

The ring buffer overflow risk (CAPACITY = 16384) also needs addressing alongside this — it should be sized to at least 2× the dynamic chunk size to avoid audio-thread drops at high sample rates.
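
A rough sketch of Option B, assuming the chunk is derived by dividing the sample rate by the 30 Hz timer rate (which reproduces 1470 at 44.1 kHz). The function names are hypothetical, and the sketch assumes the receiver can handle variable-size packets, which the source does not state.

```cpp
#include <cmath>
#include <cstddef>

constexpr double      kBroadcastRateHz = 30.0; // timer rate from the discussion
constexpr std::size_t kHeaderBytes     = 36;   // packet header size
constexpr std::size_t kBytesPerSample  = 4;    // 32-bit float samples

// Samples per packet so that one packet always covers one timer period.
inline std::size_t chunkSamplesFor(double sampleRate) {
    return static_cast<std::size_t>(std::lround(sampleRate / kBroadcastRateHz));
}

// Total packet size (header plus float payload) for that chunk.
inline std::size_t packetBytesFor(double sampleRate) {
    return kHeaderBytes + chunkSamplesFor(sampleRate) * kBytesPerSample;
}

// Lower bound on ring buffer capacity: 2x the dynamic chunk size.
inline std::size_t minRingCapacityFor(double sampleRate) {
    return 2 * chunkSamplesFor(sampleRate);
}
```

This would be called once from prepareToPlay(). At 192 kHz it gives 6400 samples per chunk and a 25,636-byte packet. Note that the 2× rule is only a lower bound: 12,800 samples at 192 kHz is already below the current 16384, so it does nothing for late timer ticks unless the capacity is also scaled with the sample rate.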


Bottom line: Decimation would hurt cancellation metric accuracy (sparser bin coverage, potentially wrong phase wins), whereas a time-constant chunk size keeps everything correct and just allows naturally larger packets. Would you like me to open a PR for Option B?
