[Feature] Add speaker diarization for multi-speaker videos

## Summary
Currently, the pipeline assumes a single speaker. Add speaker diarization so that multi-speaker videos are handled, with a separate voice clone for each speaker.

## Tasks
- [ ] Integrate pyannote.audio or a similar diarization model
- [ ] Map each segment to a speaker ID
- [ ] Support multiple reference audio files (one per speaker)
- [ ] Clone each speaker's voice independently
- [ ] Track speaker transitions in the pipeline metadata
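
A rough sketch of the segment-mapping, per-speaker grouping, and transition-tracking tasks follows. It assumes the diarization output (e.g. from pyannote.audio) has already been converted into plain `(start, end, speaker)` turns. The `Turn`/`Segment` types, the `SPEAKER_00` default label, and all function names are hypothetical, not part of the existing pipeline. Each segment is assigned to the speaker with the most overlapping time. Segments are then bucketed per speaker, so each voice clone can be loaded once and run over all of its segments, which also helps with GPU memory.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Turn:
    """Hypothetical diarization turn: `speaker` talks during [start, end) seconds."""
    start: float
    end: float
    speaker: str


@dataclass
class Segment:
    """Hypothetical transcript/dubbing segment; `speaker` is set by assign_speakers."""
    start: float
    end: float
    text: str
    speaker: Optional[str] = None


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments: List[Segment], turns: List[Turn],
                    default: str = "SPEAKER_00") -> List[Segment]:
    """Label each segment with the speaker whose turns overlap it the most.

    Segments with no overlapping turn get `default` (single-speaker fallback).
    """
    for seg in segments:
        totals: Dict[str, float] = {}
        for turn in turns:
            ov = overlap(seg.start, seg.end, turn.start, turn.end)
            if ov > 0:
                totals[turn.speaker] = totals.get(turn.speaker, 0.0) + ov
        seg.speaker = max(totals, key=totals.get) if totals else default
    return segments


def speaker_transitions(segments: List[Segment]) -> List[dict]:
    """Metadata entries for each point where consecutive segments change speaker."""
    return [
        {"time": cur.start, "from": prev.speaker, "to": cur.speaker}
        for prev, cur in zip(segments, segments[1:])
        if prev.speaker != cur.speaker
    ]


def group_by_speaker(segments: List[Segment]) -> Dict[str, List[Segment]]:
    """Bucket segments per speaker so each voice clone runs in a single pass."""
    groups: Dict[str, List[Segment]] = {}
    for seg in segments:
        groups.setdefault(seg.speaker, []).append(seg)
    return groups


if __name__ == "__main__":
    turns = [Turn(0.0, 4.0, "SPEAKER_00"), Turn(4.0, 9.0, "SPEAKER_01")]
    segs = assign_speakers(
        [Segment(0.5, 3.5, "Hi"), Segment(3.8, 8.0, "Hello"), Segment(10.0, 11.0, "Bye")],
        turns,
    )
    print([s.speaker for s in segs])  # ['SPEAKER_00', 'SPEAKER_01', 'SPEAKER_00']
    print(speaker_transitions(segs))
```

Using total overlap rather than the segment midpoint keeps short boundary overlaps (the 0.2 s of "Hello" that falls in the first turn) from flipping a segment to the wrong speaker.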

## Considerations
- GPU memory management with multiple voice models loaded
- Speaker embedding similarity threshold for clustering
- Graceful fallback to single-speaker mode
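
To make the similarity-threshold and fallback considerations concrete, below is a sketch of greedy centroid clustering on cosine similarity. This is a deliberately simple, swapped-in technique for illustration; it is not what pyannote.audio does internally, since its diarization pipelines perform their own clustering step. The `0.75` default threshold and the function names are hypothetical placeholders that would need tuning on real speaker embeddings. A higher threshold splits voices more readily, a lower one merges them. A result with a single cluster (or no embeddings at all) signals that the pipeline can stay in single-speaker mode.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def cluster_embeddings(embeddings: List[Sequence[float]],
                       threshold: float = 0.75) -> List[int]:
    """Greedy online clustering: join the most similar centroid if its cosine
    similarity is >= threshold, otherwise open a new cluster.

    Returns one integer cluster label per embedding, in input order.
    """
    centroids: List[List[float]] = []  # running mean vector per cluster
    counts: List[int] = []             # number of members per cluster
    labels: List[int] = []
    for emb in embeddings:
        best, best_sim = -1, -1.0
        for i, c in enumerate(centroids):
            sim = cosine(emb, c)
            if sim > best_sim:
                best, best_sim = i, sim
        if best >= 0 and best_sim >= threshold:
            n = counts[best]
            # Update the running mean of the chosen cluster.
            centroids[best] = [(c * n + e) / (n + 1) for c, e in zip(centroids[best], emb)]
            counts[best] = n + 1
            labels.append(best)
        else:
            centroids.append(list(emb))
            counts.append(1)
            labels.append(len(centroids) - 1)
    return labels


def speaker_labels(embeddings: List[Sequence[float]],
                   threshold: float = 0.75) -> Tuple[List[int], bool]:
    """Return (labels, multi_speaker). multi_speaker is False when there are no
    embeddings or everything collapsed into one cluster (single-speaker mode)."""
    if not embeddings:
        return [], False
    labels = cluster_embeddings(embeddings, threshold)
    return labels, len(set(labels)) > 1
```

Because each embedding is compared with running centroids rather than with every earlier embedding, the cost stays linear in the number of segments times the number of speakers. The trade-off is that the result can depend on input order.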
