Summary
The pipeline currently assumes a single speaker. Add speaker diarization so that multi-speaker videos are handled with a separate voice clone for each speaker.
Tasks
Considerations
- GPU memory management with multiple voice models loaded
- Speaker embedding similarity threshold for clustering
- Graceful fallback to single-speaker mode
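
To make the threshold and fallback considerations concrete, here is a minimal sketch, assuming each speech segment has already been reduced to a fixed-length embedding vector. The function names (`assign_speakers`, `diarize_or_fallback`) and the default threshold of 0.75 are hypothetical placeholders for illustration, not existing pipeline code, and the greedy clustering shown is only one possible approach.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def assign_speakers(embeddings, threshold=0.75):
    """Greedy clustering: a segment joins the most similar existing speaker
    if the similarity reaches `threshold`, otherwise it starts a new speaker.
    The threshold value is an assumption and would need tuning on real data."""
    centroid_sums = []  # per-speaker running sum; cosine ignores the scale
    labels = []
    for emb in embeddings:
        best_idx, best_sim = -1, -1.0
        for idx, centroid in enumerate(centroid_sums):
            sim = cosine_similarity(emb, centroid)
            if sim > best_sim:
                best_idx, best_sim = idx, sim
        if best_idx >= 0 and best_sim >= threshold:
            centroid_sums[best_idx] = [
                s + x for s, x in zip(centroid_sums[best_idx], emb)
            ]
            labels.append(best_idx)
        else:
            centroid_sums.append(list(emb))
            labels.append(len(centroid_sums) - 1)
    return labels


def diarize_or_fallback(embeddings, threshold=0.75):
    """Return one speaker label per segment. If clustering finds at most one
    speaker, fall back to single-speaker mode (every segment labeled 0)."""
    labels = assign_speakers(embeddings, threshold)
    if len(set(labels)) <= 1:
        return [0] * len(embeddings)
    return labels
```

With this shape, a higher threshold splits segments into more speakers and a lower one merges them. The single-speaker path returns the same label format, so downstream voice-clone selection would not need a separate code path.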