
audio corpora builder

Build large audio corpora in various languages → {Yorùbá, Urhobo, Edo, Èʋe, Igbo}

Audio Corpora

Curate language-specific corpora from the wealth of good-quality audio available on YouTube. The process is as follows:

  • Locate existing playlists, e.g. OrisunTV Iroyin
  • Alternatively, create a new playlist with a custom set of YouTube videos
  • Update yoruba_sources.yml with a reference to the playlist
  • Execute $ python download_youtube.py --output ./audio/
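The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code in download_youtube.py. The SOURCES dictionary stands in for a loaded yoruba_sources.yml, whose actual schema is not documented here. The helper names playlist_id and plan_downloads, the playlist id PLexample123 and the per-language directory layout are all hypothetical. The sketch only shows how a playlist id can be read from the `list` query parameter of a YouTube URL and paired with an output directory under ./audio/; it downloads nothing.

```python
from pathlib import Path
from urllib.parse import parse_qs, urlparse

# Hypothetical stand-in for the contents of yoruba_sources.yml after loading
# (e.g. with PyYAML). The real file's schema may differ.
SOURCES = {
    "language": "yoruba",
    "playlists": [
        {
            "name": "orisuntv_iroyin",
            "url": "https://www.youtube.com/playlist?list=PLexample123",
        },
    ],
}


def playlist_id(url: str) -> str:
    """Return the value of the `list` query parameter of a YouTube URL."""
    query = parse_qs(urlparse(url).query)
    ids = query.get("list")
    if not ids:
        raise ValueError(f"no playlist id in {url!r}")
    return ids[0]


def plan_downloads(sources: dict, output: str) -> list:
    """Pair each playlist id with a hypothetical per-playlist output directory."""
    base = Path(output) / sources["language"]
    return [
        (playlist_id(entry["url"]), base / entry["name"])
        for entry in sources["playlists"]
    ]


if __name__ == "__main__":
    # Print the plan instead of downloading anything.
    for pid, target in plan_downloads(SOURCES, "./audio/"):
        print(pid, target)
```

Each (id, directory) pair could then be handed to a downloader; which tool download_youtube.py actually uses is not stated in this README.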

Install dependencies

  • Python 3.7 or later
  • pip install -r requirements.txt
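Before installing, a quick pre-flight check can confirm the interpreter meets the stated minimum. This is a hypothetical helper, not part of the repository; the name python_supported is illustrative, and only the 3.7 minimum comes from this README.

```python
import sys

# Minimum version stated in this README.
MIN_PYTHON = (3, 7)


def python_supported(version_info=sys.version_info, minimum=MIN_PYTHON) -> bool:
    """Return True when the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if not python_supported():
        found = sys.version.split()[0]
        sys.exit(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, found {found}")
    print("Python version OK; next: pip install -r requirements.txt")
```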