Skip to content

Yorùbá language training text for NLP, ASR and TTS tasks

License

Notifications You must be signed in to change notification settings

Timilehin/yoruba-text

 
 

Repository files navigation

Yorùbá text

This repository contains fully diacritized Yorùbá text, converted to Unicode Normalization Form Composition (NFC) format, where diacritized characters are composed into a single character with the following code:

def convert_to_NFC(filename, outfilename):
    text=''.join(c for c in unicodedata.normalize('NFC', open(filename).read()))
    with open(outfilename, 'w') as f:
        f.write(text)

Web sources:

Social Media sources:

Text has been gathered with permission from online sources, and lightly preprocessed for use in NLP, TTS, ASR applications. Note, some of the sentences may have errors, please submit a pull-request if you have corrections!

Resources

About

Yorùbá language training text for NLP, ASR and TTS tasks

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%