
Can we modify it for training on a Hindi dataset by adding a Hindi dictionary to utils.py? #160

Open
pratiksonune opened this issue May 29, 2024 · 3 comments

Comments

@pratiksonune

No description provided.

@Cbkmxbkbchkb

Hiii my

@vatsalaggarwal
Member

The tokeniser is based on OpenAI's BPE and can represent any Unicode character (although it hasn't seen training data that uses those tokens), so you'd just need to retrain on that data, without making any changes to utils.py.
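To illustrate why no Hindi dictionary is needed, here is a toy byte-level BPE sketch. It is not MetaVoice's tokeniser: `merge`, `train_merges`, `encode`, and `decode` are hypothetical helpers, and the tiny English training corpus is an assumption chosen for illustration. The point it shows is that text in a script the merges never saw (Hindi here) still encodes, by falling back to raw UTF-8 byte tokens, and decodes back losslessly.

```python
# Toy byte-level BPE (hypothetical helpers, NOT MetaVoice's tokeniser).
# Every string is first converted to UTF-8 bytes, and ids 0..255 are the raw
# byte values, so any Unicode text can be encoded even with no learned merges.


def merge(ids: list[int], pair: tuple[int, int], new_id: int) -> list[int]:
    """Replace every left-to-right occurrence of `pair` in `ids` with `new_id`."""
    out: list[int] = []
    i = 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_merges(corpus: str, num_merges: int) -> dict[tuple[int, int], int]:
    """Greedily learn up to `num_merges` merges from `corpus` (toy trainer)."""
    ids = list(corpus.encode("utf-8"))
    merges: dict[tuple[int, int], int] = {}
    for new_id in range(256, 256 + num_merges):
        counts: dict[tuple[int, int], int] = {}
        for pair in zip(ids, ids[1:]):
            counts[pair] = counts.get(pair, 0) + 1
        if not counts:
            break
        best = max(counts, key=counts.get)  # most frequent adjacent pair
        merges[best] = new_id
        ids = merge(ids, best, new_id)
    return merges


def encode(text: str, merges: dict[tuple[int, int], int]) -> list[int]:
    """Start from raw bytes, then apply merges in the order they were learned."""
    ids = list(text.encode("utf-8"))
    for pair, new_id in sorted(merges.items(), key=lambda kv: kv[1]):
        ids = merge(ids, pair, new_id)
    return ids


def decode(ids: list[int], merges: dict[tuple[int, int], int]) -> str:
    """Expand each id back to its bytes and decode the result as UTF-8."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (a, b), new_id in sorted(merges.items(), key=lambda kv: kv[1]):
        vocab[new_id] = vocab[a] + vocab[b]
    return b"".join(vocab[i] for i in ids).decode("utf-8")


merges = train_merges("the cat sat on the mat", num_merges=5)
hindi = "नमस्ते"
hindi_ids = encode(hindi, merges)
print(hindi_ids)  # only raw byte ids (< 256): no merge was learned for these bytes
print(decode(hindi_ids, merges) == hindi)  # True
```

Because the Hindi bytes match no learned merge, each Devanagari character costs three tokens in this toy. Retraining on Hindi text is what teaches the model those byte sequences; whether the characters can be represented at all does not depend on a dictionary in utils.py.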

@metavoiceio metavoiceio deleted a comment from Cbkmxbkbchkb Jun 10, 2024
@metavoiceio metavoiceio deleted a comment from Cbkmxbkbchkb Jun 10, 2024
@pratiksonune
Author


We have created our own Hindi_speaker_encoder.pt based on the MetaVoice speaker encoder architecture, and we fine-tuned first_stage.pt using our hindi_speaker_encoder.pt just to get an idea of the results, but it produces silent output when we give Hindi text as input. We have also noticed that there is no training script for first_stage and second_stage. Will it still work if we train these two pretrained models on our Hindi dataset?
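Before digging into the training setup, it may help to confirm numerically that the output really is silent rather than just very quiet. A minimal sketch, assuming the generated audio has already been loaded as float samples in [-1.0, 1.0]; `level_dbfs`, `is_effectively_silent`, the -60 dBFS threshold, and the 16 kHz test tone are hypothetical choices, not part of the MetaVoice code base.

```python
import math
from typing import Sequence


def level_dbfs(samples: Sequence[float]) -> tuple[float, float]:
    """Return (peak, RMS) level in dBFS for float samples in [-1.0, 1.0]."""

    def to_db(x: float) -> float:
        # 0 dBFS = full scale; an all-zero signal has no finite level.
        return 20.0 * math.log10(x) if x > 0 else float("-inf")

    if len(samples) == 0:
        return float("-inf"), float("-inf")
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return to_db(peak), to_db(rms)


def is_effectively_silent(samples: Sequence[float], threshold_dbfs: float = -60.0) -> bool:
    """Treat audio whose RMS level is below `threshold_dbfs` as silent (assumed cutoff)."""
    _, rms_db = level_dbfs(samples)
    return rms_db < threshold_dbfs


sr = 16_000  # assumed sample rate for this illustration only
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]
print(is_effectively_silent([0.0] * sr))  # True
print(is_effectively_silent(tone))  # False
```

If the waveform is genuinely all zeros or near zero, that likely points the investigation at what the fine-tuned model generates rather than at file writing or playback.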


3 participants