
[Feature] Add speaker diarization for multi-speaker videos #8

Description

@Edmon02

Summary

Currently, the pipeline assumes a single speaker. Add speaker diarization so that multi-speaker videos are supported, with a separate voice clone for each speaker.

Tasks

  • Integrate pyannote.audio or a similar diarization model
  • Map each segment to a speaker ID
  • Support multiple reference audio files (one per speaker)
  • Clone each speaker's voice independently
  • Track speaker transitions in the pipeline metadata
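Mapping segments to speaker IDs and tracking transitions could look roughly like the sketch below. It assumes the diarization output has already been flattened into `(start, end, speaker_label)` tuples in seconds. With pyannote.audio 3.x, iterating `annotation.itertracks(yield_label=True)` yields `(segment, track, label)` triples that can be converted this way. The `Segment` dataclass, the function names, and the `SPEAKER_00` fallback label are hypothetical and are not part of the existing pipeline.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# A diarization turn: (start_seconds, end_seconds, speaker_label).
Turn = Tuple[float, float, str]


@dataclass
class Segment:
    """A hypothetical pipeline segment (e.g. one transcribed/translated line)."""
    start: float
    end: float
    text: str
    speaker: Optional[str] = None


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments: List[Segment], turns: List[Turn],
                    default_speaker: str = "SPEAKER_00") -> List[Segment]:
    """Label each segment with the speaker whose turns overlap it the most."""
    for seg in segments:
        scores: Dict[str, float] = {}
        for t_start, t_end, label in turns:
            ov = overlap(seg.start, seg.end, t_start, t_end)
            if ov > 0:
                scores[label] = scores.get(label, 0.0) + ov
        # No overlapping turn at all: fall back to a single default speaker.
        seg.speaker = max(scores, key=scores.get) if scores else default_speaker
    return segments


def speaker_transitions(segments: List[Segment]) -> List[Dict[str, object]]:
    """Record every point where consecutive segments change speaker."""
    transitions: List[Dict[str, object]] = []
    for prev, cur in zip(segments, segments[1:]):
        if prev.speaker != cur.speaker:
            transitions.append({"time": cur.start, "from": prev.speaker, "to": cur.speaker})
    return transitions


if __name__ == "__main__":
    segs = [Segment(0.0, 2.0, "Hello"), Segment(2.0, 4.5, "Hi there"),
            Segment(4.5, 6.0, "How are you?")]
    turns = [(0.0, 2.2, "SPEAKER_00"), (2.2, 4.6, "SPEAKER_01"), (4.6, 6.0, "SPEAKER_00")]
    assign_speakers(segs, turns)
    print([s.speaker for s in segs])  # ['SPEAKER_00', 'SPEAKER_01', 'SPEAKER_00']
    print(speaker_transitions(segs))
```

Each segment goes to the speaker with the largest total overlap, so small boundary mismatches between segment and diarization timestamps do not flip the label. The list returned by `speaker_transitions` is the kind of record that could be stored in the pipeline metadata.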

Considerations

  • GPU memory management with multiple voice models loaded
  • Speaker embedding similarity threshold for clustering
  • Graceful fallback to single-speaker mode
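The similarity threshold and the fallback interact. One simple option is greedy online clustering of per-segment speaker embeddings by cosine similarity, which drops back to single-speaker mode when only one cluster emerges or when a detected speaker has no reference audio. This is a swapped-in illustration, not the clustering pyannote.audio performs internally. The function names, the 0.75 default threshold, and the `"single"`/`"multi"` mode strings are hypothetical placeholders that would need tuning against real embeddings.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def cluster_embeddings(embeddings: List[Sequence[float]],
                       threshold: float = 0.75) -> List[int]:
    """Greedy online clustering: join the most similar centroid if its
    similarity is >= threshold, otherwise start a new speaker cluster."""
    centroids: List[List[float]] = []
    counts: List[int] = []
    labels: List[int] = []
    for emb in embeddings:
        best, best_sim = -1, -1.0
        for i, c in enumerate(centroids):
            sim = cosine_similarity(emb, c)
            if sim > best_sim:
                best, best_sim = i, sim
        if best >= 0 and best_sim >= threshold:
            # Update the running mean of the chosen cluster.
            n = counts[best]
            centroids[best] = [(c * n + e) / (n + 1) for c, e in zip(centroids[best], emb)]
            counts[best] += 1
            labels.append(best)
        else:
            centroids.append([float(e) for e in emb])
            counts.append(1)
            labels.append(len(centroids) - 1)
    return labels


def choose_mode(labels: List[int], reference_audio: Dict[int, str]) -> str:
    """Fall back to single-speaker mode unless there are 2+ speakers
    and every speaker has its own reference audio file."""
    speakers = set(labels)
    if len(speakers) < 2 or not speakers.issubset(reference_audio):
        return "single"
    return "multi"


if __name__ == "__main__":
    embs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.95]]
    labels = cluster_embeddings(embs, threshold=0.8)
    print(labels)  # [0, 0, 1, 1]
    print(choose_mode(labels, {0: "speaker0.wav", 1: "speaker1.wav"}))  # multi
    print(choose_mode(labels, {0: "speaker0.wav"}))  # single
```

Raising the threshold splits voices into more clusters, while lowering it merges them; a threshold that is too low collapses everyone into one cluster and triggers the single-speaker fallback.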

Metadata

Assignees

No one assigned

Labels

  • enhancement: New feature or request
  • research: Research and experimentation
