Add https://oscar-corpus.com, Common Crawl from the BBC to the working corpus for ADR and other monolingual tasks
| Language | Words original | Size original | File original | Words deduplicated | Size deduplicated | File deduplicated |
|---|---|---|---|---|---|---|
| Yoruba | 8,906 | 55K | yo.txt.gz | 3,518 | 27K | yo_dedup.txt.gz |
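Before merging these files into the working corpus, it may help to sanity-check the figures above locally. The following is a minimal sketch, not the OSCAR project's own tooling: `count_words` and `dedup_lines` are hypothetical helper names, words are assumed to be whitespace-separated tokens, and deduplication is assumed to happen at the line level. Counts produced this way may differ from the published ones if OSCAR tokenizes or deduplicates differently.

```python
import gzip


def count_words(path):
    """Count whitespace-separated tokens in a gzip-compressed UTF-8 text file.

    Assumption: a "word" is any whitespace-delimited token.
    """
    total = 0
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            total += len(line.split())
    return total


def dedup_lines(src, dst):
    """Copy src to dst, keeping only the first occurrence of each line.

    Assumption: exact line-level matching after stripping the trailing newline.
    """
    seen = set()
    with gzip.open(src, "rt", encoding="utf-8") as fin, \
            gzip.open(dst, "wt", encoding="utf-8") as fout:
        for line in fin:
            key = line.rstrip("\n")
            if key not in seen:
                seen.add(key)
                fout.write(key + "\n")
```

With the downloaded files in the current directory, `count_words("yo.txt.gz")` and `count_words("yo_dedup.txt.gz")` can be compared against the "Words original" and "Words deduplicated" columns.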
ruohoruotsi