
Which model is the scoring model used to construct the large margin data? Are there plans to open-source it? #56

Description

@eyree

Thank you very much for your team's work! In Section 3.1.2, Emotion and Speaking Style Editing, which model is the scoring model used to construct the large margin data? Are there plans to open-source it?

Original text, 3.1.2 Emotion and Speaking Style Editing:
Zero-shot Cloning. A triplet ⟨text_prompt, audio_neutral, audio_{emotion,style}⟩ is constructed for each
emotion and speaking style by selecting corresponding emotional and neutral audio clips from the same
speaker as the prompt audio and processing them with the StepTTS voice cloning interface, using a
text instruction that describes the target attribute.
Margin Scoring. To evaluate the triplet generated, we developed a scoring model using a small,
human-annotated dataset. The model evaluates audio pairs on a 1-10 scale, with higher margin
scores corresponding to more desirable outcomes.
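
For concreteness, my reading of how the margin score would be used to build the large margin data is roughly the sketch below. Everything in it is an assumption for illustration: the `Triplet` fields, the `score_pair` callable standing in for the unreleased scoring model, and the `min_margin` threshold are hypothetical names and values, not taken from the paper or the repository.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Triplet:
    """One <text prompt, neutral audio, emotional/styled audio> sample (hypothetical layout)."""
    text_prompt: str
    audio_neutral: str  # path to the neutral clip
    audio_target: str   # path to the emotional or styled clip


def select_large_margin(
    triplets: List[Triplet],
    score_pair: Callable[[str, str], float],  # stand-in for the scoring model
    min_margin: float = 6.0,                  # placeholder threshold, not from the paper
) -> List[Triplet]:
    """Keep only triplets whose audio pair receives a margin score >= min_margin."""
    kept = []
    for t in triplets:
        margin = score_pair(t.audio_neutral, t.audio_target)
        # The quoted passage says scores lie on a 1-10 scale.
        if not 1.0 <= margin <= 10.0:
            raise ValueError(f"margin score out of range: {margin}")
        if margin >= min_margin:
            kept.append(t)
    return kept
```

If this reading is right, the open question is only what `score_pair` actually is and whether it will be released.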
