Exports `speechbrain/lang-id-voxlingua107-ecapa` into a single end-to-end ONNX graph for use as the language-identification backend in Vernacula.
The model is an ECAPA-TDNN classifier over 80-dim mel filterbank features, trained on the VoxLingua107 corpus. It takes raw 16 kHz mono audio and emits both a 107-class logit vector (for language classification) and a 256-dim pooled embedding (useful for language-similarity clustering). The license is Apache 2.0, so the model can be bundled or auto-downloaded without a gated-accept flow.
- `export_voxlingua_to_onnx.py`: exports `voxlingua107.onnx` + `lang_map.json`
- `verify_voxlingua_parity.py`: PyTorch ↔ ONNX parity check on real audio clips
- `src/ecapa_wrapper.py`: thin `nn.Module` composing SpeechBrain's submodules for a clean export graph
- `src/lang_map.py`: builds the `{index: {iso, name}}` lookup that ships alongside the ONNX
- `requirements.txt`: Python dependencies for the export environment
The exported model is a single `.onnx` file with no external-data sidecar.
| Tensor | Shape | Dtype | Notes |
|---|---|---|---|
| `audio` | `[batch, samples]` | float32 | 16 kHz mono, variable length |
| `logits` | `[batch, 107]` | float32 | Language classification logits |
| `embedding` | `[batch, 256]` | float32 | Pooled ECAPA embedding |
Preprocessing (FBANK, per-utterance mean-variance normalisation) is folded into the graph, so the C# runtime just sends raw PCM.
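Because the FBANK front end lives inside the graph, the only host-side work is getting 16 kHz mono float samples into the `audio` input. A minimal Python sketch of that step for 16-bit PCM WAV files follows; `load_wav_16k_mono` is a hypothetical helper, and scaling int16 samples by 1/32768 is an assumption, chosen because it is the usual float normalisation for 16-bit audio.

```python
import array
import sys
import wave


def load_wav_16k_mono(path: str) -> list[float]:
    """Read a 16 kHz mono 16-bit PCM WAV into floats in [-1.0, 1.0).

    Hypothetical helper; the int16 / 32768 scaling is an assumption.
    """
    with wave.open(path, "rb") as wf:
        # Reject anything the graph was not exported for, rather than resampling.
        if wf.getframerate() != 16000:
            raise ValueError(f"expected 16 kHz, got {wf.getframerate()} Hz")
        if wf.getnchannels() != 1:
            raise ValueError(f"expected mono, got {wf.getnchannels()} channels")
        if wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM")
        raw = wf.readframes(wf.getnframes())

    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return [s / 32768.0 for s in samples]
```

To run it, stack the result as a `[1, samples]` float32 array and feed it under the `audio` input name from the I/O table.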
Use Python 3.11 or 3.12.
```bash
python3 -m venv .venv-voxlingua-export
source .venv-voxlingua-export/bin/activate
pip install -r public/scripts/voxlingua107_export/requirements.txt
```

```bash
python public/scripts/voxlingua107_export/export_voxlingua_to_onnx.py \
  --out-dir ./voxlingua107
```

This produces:

```
voxlingua107/
├── voxlingua107.onnx   # ~83 MB FP32, weights inlined
└── lang_map.json       # 107 entries
```
PyTorch's dynamo-based exporter (torch>=2.8) splits weights into a
companion `.onnx.data` file by default. The script re-saves the graph with
weights inline once export finishes: at ~83 MB the model is well under the
2 GB protobuf ceiling, so the sidecar only adds distribution friction.
The script downloads the SpeechBrain checkpoint into ./.voxlingua107-cache/
on first run; subsequent runs reuse the cached weights.
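Once `voxlingua107.onnx` and `lang_map.json` exist, mapping a `logits` row back to a language needs only a softmax and the lookup. The sketch below assumes the `{index: {iso, name}}` layout is serialised with stringified indices as JSON object keys (JSON keys are always strings) and that each entry carries `iso` and `name` fields; `load_lang_map` and `top_k_languages` are hypothetical helpers, not part of the export scripts.

```python
import json
import math


def load_lang_map(path: str) -> dict:
    """Load lang_map.json; assumed layout: {"0": {"iso": ..., "name": ...}, ...}."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def top_k_languages(
    logits: list[float], lang_map: dict, k: int = 3
) -> list[tuple[str, str, float]]:
    """Return the k most probable (iso, name, probability) triples for one clip."""
    # Numerically stable softmax: subtracting the max logit leaves the
    # probabilities unchanged but keeps exp() from overflowing.
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)[:k]
    # JSON object keys are strings, hence str(i).
    return [(lang_map[str(i)]["iso"], lang_map[str(i)]["name"], probs[i]) for i in best]
```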
Prepare a handful of 16 kHz mono WAV clips across several languages (ten clips
spanning en, zh, hi, ar, es, de, fr, ru, ja, sw is a good baseline), then:
```bash
python public/scripts/voxlingua107_export/verify_voxlingua_parity.py \
  --model-dir ./voxlingua107 \
  --clips clip_en.wav clip_zh.wav clip_hi.wav ...
```

Pass criteria:
- Top-1 language matches between PyTorch and ONNX on every clip
- Softmax probability max-abs-diff ≤ `1e-3`. Probabilities are bounded in [0, 1], whereas the raw-logit diff grows with clip length because of FP32 accumulation, so we compare probabilities rather than logits.
- Embedding cosine similarity ≥ `0.9999`
Observed values on a 5-clip test set (en, de, fr, ru, hu, durations 90 s–
602 s) are Δprob 3e-11 to 6e-5 and cosine 1.000000 across the board,
so the gates have comfortable headroom.
Failures print the reason(s) and return a non-zero exit code so the check can gate CI if needed.
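The three gates are easy to reproduce when inspecting a single clip by hand. A minimal NumPy sketch, assuming one clip's `logits` and `embedding` from each backend are already in hand; `parity_failures` is a hypothetical helper and makes no claim to mirror `verify_voxlingua_parity.py` line for line:

```python
import numpy as np

PROB_TOL = 1e-3   # max |softmax_pt - softmax_onnx| over all 107 classes
COS_TOL = 0.9999  # minimum embedding cosine similarity


def softmax(logits: np.ndarray) -> np.ndarray:
    shifted = logits - logits.max()  # stabilise exp; probabilities unchanged
    e = np.exp(shifted)
    return e / e.sum()


def parity_failures(pt_logits, ox_logits, pt_emb, ox_emb) -> list[str]:
    """Apply the three parity gates to one clip; return failure reasons (empty = pass)."""
    pt_logits = np.asarray(pt_logits, dtype=np.float64).ravel()
    ox_logits = np.asarray(ox_logits, dtype=np.float64).ravel()
    pt_emb = np.asarray(pt_emb, dtype=np.float64).ravel()
    ox_emb = np.asarray(ox_emb, dtype=np.float64).ravel()

    failures = []
    # Gate 1: both backends must pick the same language.
    if pt_logits.argmax() != ox_logits.argmax():
        failures.append(f"top-1 mismatch: {pt_logits.argmax()} vs {ox_logits.argmax()}")
    # Gate 2: compare bounded probabilities rather than raw logits.
    dprob = float(np.abs(softmax(pt_logits) - softmax(ox_logits)).max())
    if dprob > PROB_TOL:
        failures.append(f"prob max-abs-diff {dprob:.2e} > {PROB_TOL:g}")
    # Gate 3: embeddings must point in (almost) the same direction.
    cos = float(pt_emb @ ox_emb / (np.linalg.norm(pt_emb) * np.linalg.norm(ox_emb)))
    if cos < COS_TOL:
        failures.append(f"embedding cosine {cos:.6f} < {COS_TOL:g}")
    return failures
```

An empty list means the clip passes; a CI wrapper can mirror the script's behaviour by calling `sys.exit(1)` when any clip returns a non-empty list.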
After a successful export, build the manifest and upload to
christopherthompson81/voxlingua107-lid-onnx using the shared tools at
the scripts/ root:
```bash
python scripts/make_manifest.py \
  --model-dir ~/models/voxlingua107 \
  --files voxlingua107.onnx lang_map.json

python scripts/upload_to_hf.py \
  --model-dir ~/models/voxlingua107 \
  --repo-id christopherthompson81/voxlingua107-lid-onnx \
  --sync-readme
```

The model card is sourced from
`scripts/hf_readmes/voxlingua107-lid-onnx/README.md`;
edit it there, commit, and re-run with `--sync-readme` to update HF.