Cantonese output appears when going from Japanese to Chinese in cross-lingual mode #385
Comments
Well, this is a drawback of BPE tokenization. Zero-shot/cross-lingual mode is not very stable because Chinese and Cantonese share the same characters. |
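To illustrate the shared-character point, here is a minimal sketch in plain Python. It does not use or reproduce the CosyVoice tokenizer, and the example sentences and helper names (`is_han`, `has_kana`) are hypothetical, chosen only for illustration. It shows that written Mandarin and written Cantonese both consist entirely of Han ideographs, whereas Japanese text usually contains kana that set it apart.

```python
def is_han(ch: str) -> bool:
    """True if ch lies in the CJK Unified Ideographs block (U+4E00..U+9FFF)."""
    return "\u4e00" <= ch <= "\u9fff"


def has_kana(text: str) -> bool:
    """True if text contains Hiragana or Katakana (U+3040..U+30FF)."""
    return any("\u3040" <= ch <= "\u30ff" for ch in text)


# Hypothetical example sentences, all meaning roughly "where are we going today".
mandarin = "我们今天去哪里"
cantonese = "我哋今日去邊度"
japanese = "今日はどこに行きますか"

# Both Chinese sentences are made up purely of Han ideographs, so the
# script alone gives a text tokenizer nothing to tell them apart.
all_han_mandarin = all(is_han(ch) for ch in mandarin)
all_han_cantonese = all(is_han(ch) for ch in cantonese)

# Characters that literally appear in both sentences.
shared = set(mandarin) & set(cantonese)

print(all_han_mandarin, all_han_cantonese, sorted(shared))
print(has_kana(japanese), has_kana(mandarin), has_kana(cantonese))
```

For very short inputs, such as the roughly 5-character texts in this report, there may be no dialect-specific character at all, which is consistent with the instability described above.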
Thanks a lot |
nice trick |
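@liujiaqi7998 My case is the opposite of yours: I want to generate Japanese audio from Chinese text. The code is as follows:
cross_lingual_jp.wav is a Japanese audio file, but the generated cross_lingual_zh2jp.wav is still spoken in Chinese, not the expected Japanese. How should I change this? |
@Anmidy First, the model's output depends on the input string, so you need to translate “你好” into “こんにちは”. In theory, load_wav loads the source-language audio (not sure about this). |
@liujiaqi7998 Do you mean that none of the three methods, inference_sft, inference_zero_shot, and inference_cross_lingual, can directly turn Chinese text into Japanese audio?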
|
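The advice above can be sketched as follows: the text passed to the model should already be in the target language, with the matching language tag prepended. The `tag_text` helper and the `LANGUAGE_TAGS` table are hypothetical names introduced here for illustration. The tag strings are the ones the CosyVoice README lists for cross-lingual use, to the best of my knowledge, so treat them as an assumption and check them against your installed version.

```python
# Assumed tag set (from the CosyVoice README): Chinese, English, Japanese,
# Cantonese, Korean. Verify against the version you are running.
LANGUAGE_TAGS = {
    "zh": "<|zh|>",
    "en": "<|en|>",
    "jp": "<|jp|>",
    "yue": "<|yue|>",
    "ko": "<|ko|>",
}


def tag_text(text: str, lang: str) -> str:
    """Prepend the language tag for `lang` to already-translated text.

    This helper does no translation: `text` must already be written in
    the target language (e.g. "こんにちは", not "你好", for Japanese).
    """
    if lang not in LANGUAGE_TAGS:
        raise ValueError(f"unsupported language code: {lang!r}")
    return LANGUAGE_TAGS[lang] + text


tts_text = tag_text("こんにちは", "jp")
print(tts_text)

# The tagged text would then be passed on as in the snippet from this issue
# (left as comments because it needs a loaded model and audio files):
# prompt_speech_16k = load_wav("cross_lingual_jp.wav", prompt_sr)
# for i, j in enumerate(cosyvoice.inference_cross_lingual(tts_text, prompt_speech_16k, stream=False)):
#     torchaudio.save("cross_lingual_zh2jp.wav", j['tts_speech'], 22050)
```

Translating the Chinese source text into Japanese is a separate step that has to happen before this call; none of the inference methods mentioned in this thread is described as doing it.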
Describe the bug
Cantonese output appears when going from Japanese to Chinese in cross-lingual voice cloning mode.
To Reproduce

    tts_text = "<|zh|>" + 目标输出文字  # 目标输出文字: the target output text
    prompt_speech_16k = load_wav(person_voice_file, prompt_sr)
    for i, j in enumerate(cosyvoice.inference_cross_lingual(tts_text, prompt_speech_16k, stream=False)):
        torchaudio.save(chinese_person_voice_file, j['tts_speech'], 22050)

Expected behavior
Dataset: 433 original audio clips with corresponding pre-generated Chinese text. The clips average under 3 seconds each, and each pre-generated text is about 5 characters long.
Conclusion: Even after adding the "<|zh|>" tag, more than 50% of the outputs still come out in Cantonese.