Fine-tuning voice-cloning capability of metavoice #137

abhijeethp · 2024-04-27T01:15:45Z

Hey Team,
Can anyone help me understand the following regarding the metavoice model fine-tuning process?
https://github.com/metavoiceio/metavoice-src/tree/main?tab=readme-ov-file#finetuning

For fine-tuning the model, what are the minimum and maximum audio lengths the system allows?
The fine-tuning script takes only 2 files as input: a speech (audio) file and its transcription. How is this possible? Is the SiSNR calculated against the same audio?
I want to fine-tune the voice-cloning aspect of metavoice if possible. Is there anything extra I need to implement to do this?

lucapericlp · 2024-05-14T21:45:10Z

Hey @abhijeethp, sorry for only getting to this now. We've seen people finetune using 5-10s audio chunks in their training datasets, but that's not a hard range. We're not calculating SiSNR as part of finetuning. Are you asking whether using the same audio is appropriate?

Re finetuning the voice cloning, you should be all good if you follow the finetuning guide with a solid dataset, play around with the hyperparameters, and then use a good reference clip at inference.
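
For anyone who wants to check a dataset against the 5-10s range mentioned above, here is a minimal sketch using only the Python standard library. It assumes the clips are uncompressed WAV files. The function names `clip_duration_seconds` and `flag_clips_outside_range` are hypothetical helpers, not part of metavoice-src, and the default bounds simply mirror the soft range from this thread rather than any limit the finetuning script enforces.

```python
import wave
from pathlib import Path


def clip_duration_seconds(path):
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def flag_clips_outside_range(paths, min_s=5.0, max_s=10.0):
    """Map each clip whose duration falls outside [min_s, max_s] to its duration.

    The 5-10s defaults are the soft range mentioned in this thread,
    not a hard requirement of the finetuning script (assumption).
    """
    flagged = {}
    for path in paths:
        duration = clip_duration_seconds(path)
        if not (min_s <= duration <= max_s):
            flagged[str(path)] = duration
    return flagged


if __name__ == "__main__":
    # Hypothetical dataset directory; point this at your own clips.
    clips = sorted(Path("./data").glob("*.wav"))
    for name, seconds in flag_clips_outside_range(clips).items():
        print(f"{name}: {seconds:.2f}s is outside the 5-10s range")
```

Flagged clips are not necessarily unusable, since the range is not a hard one; the list is just a starting point for deciding which recordings to split or trim.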

abhijeethp changed the title ~~Fine-tuning the cloning of metavoice~~ Fine-tuning voice-cloning capability of metavoice Apr 27, 2024
