Issue with deepfakes generated for the Bangla language

I'm working on generating deepfakes for the Bangla language. During inference, I provide source audio from one speaker and use it to generate deepfakes of the remaining speakers, but the output is not correct. In every deepfake generated for the other speakers, only the words spoken by the source speaker are audible.