Releases: lhotse-speech/lhotse
Himalayan Salt Bath
This is mostly a bug, documentation, and installation fix release.
New features
Add dynamic LRU cache for audio and feature I/O (#443)
Lhotse now caches up to 512 audio/feature arrays read from disk, to speed up some common usage patterns.
This behavior can be disabled with `lhotse.set_caching_enabled(False)`.
Recipe improvements
- Update timit recipe by @luomingshuang in #446
- Fixes to babel recipe by @jtrmal in #447
- Fixed bug in preparing AMI supervisions by @desh2608 in #449
General improvements
- Add multichannel export to Kaldi by @jtrmal in #441
- Fix documentation builds by @pzelasko in #448
- fix typo of np.stack by @glynpu in #450
- Rich exception info for I/O in Recording, MonoCut and MixedCut by @pzelasko in #454
- Fix SpecAugment docs by @pzelasko in #463
- Fix misleading manifest format in the comments by @stachu86 in #460
- Ensure the users who installed PyTorch also install torchaudio themselves by @pzelasko in #466
- Add documentation and tests for Kaldi feature extraction layers by @pzelasko in #467
Spontaneous Combustion
Corpora
New
Improved
- Add and update timit recipe by @luomingshuang in #423
- Prepare segment level supervisions for AMI recipe by @desh2608 in #429
- Add ASR data prep for CH Am EN by @jtrmal in #431
New features
Faster Kaldi-compatible feature extraction on CPU and CUDA using kaldifeat
- Add Lhotse wrappers for kaldifeat-based feature extractors by @pzelasko in #424
- Method for batch feature extraction from CutSet that supports CUDA extractors by @pzelasko in #422
Others
- CLI manipulation: `subset` cmd adds a `cut_ids` argument by @oplatek in #425
- Function to plot alignments in a Cut by @pzelasko in #436
General improvements
- Support for PyTorch 1.10.0 and 1.8.2 by @pzelasko in #427
- Adding durations without FP precision issues by @pzelasko in #428
- Fix docstrings by @jtrmal in #430
- Add the auto-generated files/dirs to the ignore list by @jtrmal in #432
- [Kaldi-related] Mapping underscores for utt-ids + num-jobs option for import by @pzelasko in #435
- Set the channel info correctly by @jtrmal in #437
- Make the CallHome ASR utterances follow a specific format for QOL by @jtrmal in #442
Happy Yodeling
Corpora
- HiFiTTS data prep (#410)
- CMU Indic data prep (#411)
- ADEPT data prep (#413)
- Reduced memory usage in CommonVoice data prep (#414)
- fixes in Switchboard recipe (#419)
New features
- Saving and restoring the sampler's state for resuming training exactly where it left off (#417, #418)
General improvements
Spacious Glacier
Corpora
- CommonVoice data preparation recipe (#400)
Breaking changes
- Removed `len()` from CutSamplers (#392)
General improvements
- Options to preserve cut IDs in cut operations and transforms (#405)
- Faster `import lhotse` and CLI start-up (#403)
- Versioning improvements (#401)
- Fixes in `cuts.subset()` (#395, #398, thanks @janvainer)
- Count cuts discarded with `.filter()` in the sampler report (#391)
- Option for `ZipSampler` to return tuples of CutSets (#390)
Thin Ribbon of Snow
Breaking changes
- Lhotse `CutSampler` classes now return mini-batch `CutSet`s instead of lists of string cut IDs (Lhotse Dataset classes are adjusted correspondingly) (#345)
- Cut refactoring: `Cut` is now an abstract base class for all cut types; what was previously called `Cut` is now called `MonoCut` (#328)
- CLI: `lhotse obtain` is now `lhotse download` (#329)
Corpora
- TIMIT (#324 thanks @luomingshuang)
- Fisher English (#374 thanks @videodanchik)
- Fisher Spanish (#376 thanks @videodanchik)
- yesno (#380 thanks @csukuangfj)
- improvements to GigaSpeech recipe (#329 #334 #337 #381 thanks @jimbozhang)
- including word alignments in LibriSpeech recipe (#379)
New features
CutSampler improvements (PyTorch data API)
- `ZipSampler` for batches constructed from different cut sources (#344, #347, #363, thanks for fixes @janvainer)
- `drop_last` option and `get_report()` method for cut samplers (#357)
- `find_pessimistic_batches` utility to help fail fast with GPU OOM (#358)
- Streaming variant of shuffling for lazy CutSets in samplers (#359)
- A bucketing method with equal cumulative bucket duration for `BucketingSampler` (#365)
- Approximate proportional sampling in `BucketingSampler` (#372)
I/O improvements
- chunked OPUS file reads (#339)
- chunked sphere file reads (#367 thanks @videodanchik)
- Faster `OnTheFlyFeatures` (padding audio instead of features) (#352)
- `ChunkedLilcomHdf5Writer` (and reader) for efficient chunked reads of lilcom-compressed arrays (#334)
- A global cache for re-using smart_open connection sessions (improves performance for repeated smart_open calls, e.g. to S3) (#335, thanks @oplatek)
Data augmentation
- tempo perturbation (#375 thanks @janvainer)
- volume perturbation (#382 thanks @videodanchik)
Others
- `CutSet.trim_to_supervisions` has new arguments for including actual acoustic context next to the supervisions (#330, #331)
- `SupervisionSegment` is now mutable (and all Lhotse manifests will remain mutable) (#333)
- `.shuffle()` method for Lhotse `*Set` classes (#341)
- `lhotse fix` CLI (#360)
- `lhotse install-sph2pipe` for handling LDC corpora compressed with `shorten` (auto-registers sph2pipe, so no further actions are needed) (#370)
General improvements
- refreshed docs (#327 #328 #330)
- improvements to downloading corpora (#340)
- experimental dataloader that allows two levels of parallelism (#343, might be abandoned for other alternatives)
- auto-detection of compatible torchaudio version for pytorch (#348)
- improvements to Kaldi data dir import/export (#351 #354)
- Fixed cut ordering in `CutSet.subset(cut_ids=...)` (#353)
- Improvements to storing cuts as recordings (#355)
- Refactored the `lhotse.dataset.sampling` file into a directory module (#366)
- Improvements to CLI (#369, #371, thanks @songmeixu)
- improvements to setup (#377 #383 thanks @songmeixu)
- Colab notebook with ESPnet + Lhotse example (#384)
- improvements to Lhotse versioning (#385)
Melting Away
New corpora
- GigaSpeech (#283, thanks @jimbozhang)
- Dihard 3 (#287, thanks @desh2608)
- GALE Arabic and Mandarin (#296, thanks @desh2608)
- CMU and CSLU Kids (#297, thanks @desh2608)
- MTedX (#301, thanks @m-wiesner)
- LibriTTS (#306)
New features
- Reading huge manifests lazily with Apache Arrow (documentation and examples are coming) (#286, #288, #289, #290, #292, #294)
- Sequential JSONL writer storing manifests on disk as they are created (#302)
- Support for alignments in `SupervisionSegment` (#304, #310, #313, thanks @desh2608)
- PyTorch Kaldi-compatible feature extractors that support GPU, batching, and autograd (#307, thanks @jesus-villalba)
- Reading, writing, and uploading features to URLs (e.g. S3 or GCP) (#312)
- Store waveforms of cuts as audio recordings to disk (#316, thanks @entn-at)
- Support for importing Kaldi's feats.scp and reading features directly from scp/ark (#318)
General improvements
- Add multi-threaded processing of AIShell data (#259, thanks @pingfengluo)
- tracking dev versions (#291, thanks @oplatek)
- Explicitly set UTF-8 encoding when reading README.md in setup.py (#293, thanks @entn-at)
- Auto-add link to source code in docs (#295)
- `cut.resample()` (#299)
- Fixing flaky tests (#300)
- fix AMI CLI mode (#303, thanks @desh2608)
- handle zero energy error in audio mixing (#305)
- update Kaldi related docs (#308)
- add a missing SpecAugment parameter (#309)
- fixing edge cases for audio transforms (#311)
- Add `drop_last` option in `*Set.split()` (#315)
- Support h5py file modes in feature writers (#317)
- Don't use Kaldi's reco2dur, and fix some errors in bin/lhotse (#318, thanks @shanguanma)
- Fix a bug in the cut's number of samples (#322, thanks @dophist)
- Use whitespace in Kaldi field-splitting (#323, thanks @dophist)
Thawing Potion
New corpora
- CMU Arctic (#225)
- L2 Arctic (#227, #251)
- VCTK (#228, #253, #254)
- CallHome English (#278)
- CallHome Egyptian (#208)
- Multilingual Librispeech (#282 -- can be quite slow but we plan on improving the speed in further releases)
Features
PyTorch Dataset API
- Lhotse's samplers are now fully deterministic, have `len()` that returns the number of batches, and return a consistent number of batches in all distributed workers (#213, #222, #223, #224, #255, #267, thanks @janvainer)
- On-the-fly feature extraction in PyTorch datasets (#229)
- visualisations of ASR batches with multiple transforms applied (#234)
Features and transforms
- Add `LibrosaFbank` consistent with various TTS applications (#252, thanks @janvainer)
- SpecAugment (#246)
- option to pad cuts from left/right/both directions (#216)
- Randomized smoothing augmentation (#272, #273, #274)
- Randomized extra padding (#281)
I/O and serialization
- [experimental] Downloading audio from HTTP/S3/GCP/Azure URLs upon request (#233)
- Use HDF5 as the default storage backend for features (#237)
- Add JSONL support (#262)
- Support for auto-magically determined serializers (CLI + Python API) (#264)
Removed features
- Removed WavAugment support (use torchaudio.sox_effects instead) (#232)
General improvements
- Add tolerance to validate_recordings_and_supervisions (#208, thanks @janvainer)
- Fix incorrect truncation in cut mixing for data augmentation (#214)
- Return lengths from feature and token collations (#211, thanks @janvainer)
- Refactor Standardize to GlobalMVN (#230, thanks @janvainer)
- Fix rare error in randomized Recording's resampling test (#239)
- Fix concatenate cuts omitting the longest cut when duration_factor > 1 (#240)
- Fix CutMix not adding enough noise in long cuts (#241)
- Add max_cuts keyword to global stats computation in GlobalMVN (#245)
- Improved error message for mixing audio (#248)
- Add a check for matching sampling rates when mixing cuts (#247)
- Fix - make VCTK CLI discoverable (#250)
- Fix trim_to_supervisions and CLI (#249)
- Fix find segments float rounding issues (#265)
- Update the examples for LibriSpeech (full) and AMI (#268, thanks @jimbozhang)
- Fix test for sphere files (#269, thanks @csukuangfj)
- More informative error message for incorrect channels in load_audio() (#270)
0.5 - Ice Melt
New features:
Major overhaul of support for PyTorch Dataset API (#194 #197 #202)
Lhotse now implements a number of PyTorch datasets and samplers. The core features are:
- Familiar API (map-style datasets and cut samplers that work with standard `DataLoader`)
- Dynamic batch size, chosen based on constraints such as `max_frames`
- Bucketing or cut concatenation as strategies for avoiding too much padding
- Optional noise padding (using the `CutMix` transform)
- Our samplers work with DDP training out-of-the-box (no need for `DistributedSampler`)
- More details available at: https://lhotse.readthedocs.io/en/latest/datasets.html
Example code:
```python
from torch.utils.data import DataLoader
from lhotse import CutSet
from lhotse.dataset import SpeechRecognitionDataset, SingleCutSampler

cuts = CutSet(...)
dset = SpeechRecognitionDataset(cuts)
sampler = SingleCutSampler(cuts, max_frames=50000)
# The Dataset performs batching by itself, so we have to indicate that
# to the DataLoader with batch_size=None
dloader = DataLoader(dset, sampler=sampler, batch_size=None, num_workers=1)
for batch in dloader:
    ...  # process data
```
Lazy (on-the-fly) resampling on Recording/RecordingSet (#185)
The resampling is performed at the moment of reading the audio samples from disk. It automatically adjusts the duration/num_samples in the data manifest.
```python
recording = recording.resample(22050)
recording_set = recording_set.resample(8000)
```
New corpora:
General improvements:
- `CutSet.subset()` got `first` and `last` arguments (like Kaldi's `subset_data_dir.sh`) and a CLI mode (#188)
- `CutSet.from_manifest()` creates deterministic Cut IDs by default (#186)
- Padding cuts with arbitrary user-specified values (now also works with custom feature extractors) (#187)
- Improved code coverage measurements (now excludes test code and recipe code) (#191 #192)
- Improved support for sampling rates other than 8k and 16k (#190 #195)
- Documentation build fixes (#196)
- Fixes in NSC recipe (#199)
- Fixes in ASR dataset validation (#204)
0.4 - Passing the North Glacier
New features:
- Lazy time-domain speed perturbation of Recording/Cut that also adjusts supervision segments (#167)
```python
cuts_sp = cuts.perturb_speed(0.9)
```
- Manifest validation (`lhotse.validate()`) (#175)
```python
lhotse.validate(cuts)
```
- Parallel feature extraction API lifting (#176)
```python
# As simple as:
cuts = cuts.compute_and_store_features(lhotse.Fbank(), 'path/to/feats', num_jobs=20)
```
- Support for using HDF5 storage with parallel feature extraction (#176)
```python
# Modify the above with:
cuts = cuts.compute_and_store_features(
    lhotse.Fbank(), 'path/to/feats',
    num_jobs=20, storage_type=lhotse.LilcomHdf5Storage,
)
```
- `CutSet` mixing for noise data augmentation (#180)
```python
# Can be performed after feature extraction for dynamic feature-domain mixing!
cuts = cuts.mix(noise_cuts, snr=[10, 30], mix_prob=0.5)
```
- On-the-fly noise data augmentation for K2 ASR (#180)
New corpora:
General improvements:
- LibriSpeech recipe API lifting and major preparation speedup (#163)
- Stop using deprecated torchaudio.info (#164)
- CutSet `map()` and `modify_ids()` methods (#165)
- Parallelism: Executor concept documentation (#152)
- Single/multi channel audio/features collation methods for a batch of Cuts (#173)
- Cache data manifests for Mobvoi (#168, thanks @freewym)
- High-level workflow illustrations in docs (#178)