Skip to content

Latest commit

 

History

History
80 lines (68 loc) · 3.4 KB

File metadata and controls

80 lines (68 loc) · 3.4 KB

TODO

Burn Architectures

  • Implement Squeezeformer architecture with Burn.
  • Add Squeezeformer CTC wrapper.
  • Add Squeezeformer transcription module.
  • Compare Squeezeformer against the Python reference in /home/yehor/Work/PositiveLoss/squeezeformer-ukrainian.
  • Implement Zipformer architecture with Burn.
  • Implement Paraformer architecture with Burn.
  • Implement Wav2Vec/W2V-BERT architecture with Burn.
  • Tighten Zipformer parity against the Python implementation, including custom normalization, balancing, and attention details.
  • Replace Zipformer balancer/whiten forward-compatible placeholders with Burn-compatible zero-forward gradient penalties.
  • Revisit Zipformer balancer/whiten against pinned Burn 0.21.0-pre.3 custom-backward APIs.
  • Re-check Zipformer balancer/whiten when upgrading Burn, and replace zero-forward penalties if a public generic custom-backward hook is available.
  • Tighten Paraformer parity against the Python implementation, including predictor/alignment-specific training losses.
  • Add enhanced Paraformer-v2 shallow CTC, boundary, and refinement heads.
  • Tighten Wav2Vec/W2V-BERT parity against the Python implementation.
  • Add Hugging Face W2V-BERT weight import/config loading for Burn.
  • Add W2V-BERT activation checkpointing

Training

  • Add Rust training CLI.
  • Support Squeezeformer training.
  • Support Zipformer training.
  • Support Paraformer training.
  • Support Wav2Vec/W2V-BERT training.
  • Add CTC training path for supported architectures.
  • Add dry-run mode for trainer smoke tests.
  • Add real checkpoint save/load for model weights.
  • Add optimizer state checkpointing.
  • Add resume support with config validation.
  • Add GPU backend/device selection.
  • Add mixed precision support + BF16.
  • Add gradient accumulation.
  • Add gradient clipping.
  • Add learning-rate scheduler with warmup/hold/decay.
  • Add EMA model tracking.
  • Add multi-GPU training support.
  • Add parquet files as an alternative to manifest-based data loading.

Data Loading

  • Add manifest path support.
  • Add manifest directory support.
  • Support JSONL manifests.
  • Support file-backed feature records.
  • Add streaming data loader for large datasets.
  • Add adaptive batching.
  • Add largest-batches-first sorting.
  • Make sorting metadata-only so large inline features are not buffered in memory.
  • Add raw audio dataset loading.
  • Add feature extraction from audio.
  • Add tokenizer-driven transcript-to-token conversion.
  • Add dataset cache/index support.
  • Add SpecAugment and waveform augmentation.

Validation And Inference

  • Add Squeezeformer greedy CTC transcription helper.
  • Add validation decoding for all architectures.
  • Add CER/WER metrics.
  • Add beam search decoding.
  • Add optional language-model decoding.
  • Add sample prediction logging.
  • Add inference/export entrypoints for all architectures.

Experiment Ergonomics

  • Write training config metadata to output directory.
  • Add structured run logging.
  • Add detailed diagnostics for losses, batch sizes, and throughput.
  • Add model export packaging.
  • Add Hugging Face upload support.

Performance Optimization

  • Run pprof profiling on training and inference to identify bottlenecks.
  • Inline small helper functions in critical paths.