Releases: lhotse-speech/lhotse

0.2 - Towards the Base Camp

18 Nov 14:17
98493d6

New features:

  • K2SpeechRecognitionIterableDataset that supports more efficient batching #116
  • Support for torchaudio.sox_effects data augmentation alongside WavAugment #124

Breaking changes:

  • the data augmentation APIs in Lhotse now expect an augment_fn argument instead of augmenter; augment_fn is a callable with a signature like: def augment_fn(samples: np.ndarray, sampling_rate: int) -> np.ndarray #124
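
As an illustration of the signature above, here is a minimal sketch of a callable that could be passed as augment_fn. Only the signature comes from the release note; the body (a fixed 6 dB attenuation) and the test signal are assumptions made for the example, and no Lhotse function is called here.

```python
import numpy as np


def augment_fn(samples: np.ndarray, sampling_rate: int) -> np.ndarray:
    """Hypothetical augmentation: attenuate the signal by 6 dB.

    sampling_rate is unused by this simple transform, but it is kept
    so that the callable matches the expected signature.
    """
    gain = 10 ** (-6.0 / 20.0)  # convert -6 dB to a linear factor
    return (samples * gain).astype(samples.dtype)


# Illustrative input: one second of a 440 Hz tone at 16 kHz.
sampling_rate = 16000
t = np.arange(sampling_rate) / sampling_rate
samples = np.sin(2 * np.pi * 440 * t).astype(np.float32)

augmented = augment_fn(samples, sampling_rate)
```

Any function with the same two parameters and an np.ndarray return value should fit the same slot, for example a wrapper around a WavAugment or torchaudio.sox_effects chain.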

New corpora:

  • Mobvoi Hotwords #109

Enhancements:

  • progress bars for corpus downloads and feature extraction #131
  • re-using cached LibriSpeech manifests for faster data preparation #133
  • LilcomFilesWriter and NumpyFilesWriter use sub-directories for storage to reduce the filesystem load #134
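
The sub-directory idea in the last item can be sketched as follows. This is not the LilcomFilesWriter or NumpyFilesWriter code: the helper name sharded_path, the three-character prefix, and the .bin extension are all assumptions chosen to show how spreading files across sub-directories keeps any single directory small.

```python
import tempfile
from pathlib import Path


def sharded_path(storage_dir: Path, key: str, prefix_len: int = 3) -> Path:
    """Hypothetical helper: store each file in a sub-directory named
    after the first characters of its key, creating it if needed."""
    subdir = storage_dir / key[:prefix_len]
    subdir.mkdir(parents=True, exist_ok=True)
    return subdir / f"{key}.bin"


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    paths = [sharded_path(root, key) for key in ("a1b2c3", "a1b9ff", "ffe012")]
    # Keys sharing a prefix land in the same sub-directory.
    relative = [p.relative_to(root).as_posix() for p in paths]
```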

Several bug fixes and improved testing.

0.1 - First Steps

03 Nov 14:55
c53e83f

“The journey of a thousand miles begins with one step.” – Lao Tzu

The first official release of Lhotse! It provides a solid base to build speech research and applications upon, by treating speech and audio data as first-class citizens in the ML world.

Lhotse is going to continue to evolve, and some API changes might still happen.

Highlights:

  • audio-specific data model with Recording, Supervision, Features, and Cut manifests
  • integration with PyTorch for task-specific Dataset classes and Torchaudio for feature extraction
  • built-in data preparation for 8 speech corpora, including LibriSpeech, Switchboard, AMI, and TED-LIUM v3
  • intuitive interfaces that work well with interactive environments such as Jupyter notebooks for data visualisation
  • on-the-fly or pre-computed feature extraction and data augmentation