Which model is the scoring model used to construct the large-margin data? Are there plans to open-source it? #56

Thank you very much for your team's work! In Section 3.1.2, Emotion and Speaking Style Editing, which model is the scoring model used to construct the large-margin data? Are there plans to open-source it?

Original text, 3.1.2 Emotion and Speaking Style Editing:
Zero-shot Cloning. A triplet ⟨text_prompt, audio_neutral, audio_{emotion,style}⟩ is constructed for each emotion
and speaking style by selecting corresponding emotional and neutral audio clips from the same
speaker as the prompt audio and processing them with the StepTTS voice cloning interface, using a
text instruction that describes the target attribute.
Margin Scoring. To evaluate the triplet generated, we developed a scoring model using a small,
human-annotated dataset. The model evaluates audio pairs on a 1-10 scale, with higher margin
scores corresponding to more desirable outcomes.
