YorùbáVoice

Landing page for the data, code, and publications of the YorùbáVoice project, sponsored by an Imminent Research Grant.

In 2022, we launched the curation and recording of 40 hours of high-fidelity speech data for the Yorùbá language, the third most widely spoken language in Africa with over 40 million L1 speakers. We partner with the YorubaName organization in Nigeria to encourage volunteers both online and offline to record their voices.

Official project blog → www.yorubavoice.com
The dataset is published in the ELRA catalogue →
- ELRA Resource description page
- 012-405-700-001-6 → Corresponding unique ISLRN to use in citations and publications
The LREC-COLING 2024 paper → arXiv
The Speech Recorder App we developed → yoruba-voice-speech-recorder
The source code and the various tools we used can be found in this repo

BibTeX entry and citation info

If you make use of our dataset, please cite our paper.

@misc{ogunremi2023iroyinspeech,
      title={\`{I}r\`{o}y\`{i}nSpeech: A multi-purpose Yor\`{u}b\'{a} Speech Corpus}, 
      author={Tolulope Ogunremi and Kola Tubosun and Anuoluwapo Aremu and Iroro Orife and David Ifeoluwa Adelani},
      year={2023},
      eprint={2307.16071},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
