For fine-tuning the model, what are the minimum and maximum audio lengths that the system allows?
The fine-tuning script takes only 2 files as input: a speech (audio) file and its transcription. How is this possible? Is the SiSNR calculated against the same audio?
I want to fine-tune the voice cloning aspect of metavoice if possible. Is there anything extra I need to implement to do this?
abhijeethp changed the title from "Fine-tuning the cloning of metavoice" to "Fine-tuning voice-cloning capability of metavoice" on Apr 27, 2024
Hey @abhijeethp, sorry for only getting to this now. We've seen people finetuning using chunks of 5-10s audio in their training datasets (but it's not a hard range). We're not calculating SiSNR as part of finetuning. Are you asking whether using the same audio is appropriate?
Re finetuning the voice cloning: you should be all good if you follow the finetuning guide with a solid dataset, play around with the hyperparameters, and then use a good reference clip at inference.
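For anyone preparing a dataset along these lines, here is a rough sketch of cutting a long WAV recording into chunks in the 5-10s range mentioned above. It is not part of the metavoice repo: `chunk_bounds` and `split_wav` are hypothetical helper names, it uses only Python's standard `wave` module, and it assumes uncompressed PCM WAV input. It cuts at fixed times, so it can split mid-word, and each chunk would still need its own matching transcription.

```python
import math
import os
import wave


def chunk_bounds(total_s, min_s=5.0, max_s=10.0):
    """Return (start, end) pairs in seconds covering a clip in equal chunks.

    Hypothetical helper. Assumes min_s <= max_s / 2, which guarantees every
    chunk length lands in [min_s, max_s]. Clips shorter than min_s yield [].
    """
    if total_s < min_s:
        return []
    n_chunks = math.ceil(total_s / max_s)  # fewest chunks that respect max_s
    step = total_s / n_chunks              # equal-length chunks, no short tail
    return [(i * step, (i + 1) * step) for i in range(n_chunks)]


def split_wav(path, out_dir, min_s=5.0, max_s=10.0):
    """Write each chunk of a PCM WAV file to out_dir and return the new paths."""
    os.makedirs(out_dir, exist_ok=True)
    stem = os.path.splitext(os.path.basename(path))[0]
    out_paths = []
    with wave.open(path, "rb") as src:
        rate = src.getframerate()
        total_frames = src.getnframes()
        for i, (start, end) in enumerate(
            chunk_bounds(total_frames / rate, min_s, max_s)
        ):
            start_frame = int(start * rate)
            end_frame = min(int(end * rate), total_frames)
            src.setpos(start_frame)
            frames = src.readframes(end_frame - start_frame)
            out_path = os.path.join(out_dir, f"{stem}_{i:03d}.wav")
            with wave.open(out_path, "wb") as dst:
                # Copy channels, sample width, and rate from the source;
                # the frame count in the header is corrected on close.
                dst.setparams(src.getparams())
                dst.writeframes(frames)
            out_paths.append(out_path)
    return out_paths


# Example (paths are placeholders):
# split_wav("recording.wav", "chunks/")
```

Since 5-10s is not a hard range, `min_s` and `max_s` are just defaults to tweak. Cutting at silences instead of fixed times would likely give cleaner training clips.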
Hey Team,
Can anyone help me understand the following regarding the metavoice model fine-tuning process?
https://github.com/metavoiceio/metavoice-src/tree/main?tab=readme-ov-file#finetuning