
[src,build] Add minimal example to demonstrate pybind11 binding #3668

Open
wants to merge 15 commits into base: master
Conversation

danpovey
Contributor

No description provided.

@pzelasko
Contributor

Perhaps you'll find scikit-build helpful - it's a utility for building different kinds of native extensions for Python with support for CMake (when/if you decide to migrate to CMake) https://github.com/scikit-build/scikit-build

@danpovey
Contributor Author

danpovey commented Oct 20, 2019 via email

danpovey and others added 8 commits October 22, 2019 21:03
In addition, refactor kaldi pybind to be more modular.
* [egs] New chime-5 recipe (kaldi-asr#2893)

* [scripts,egs] Made changes to the augmentation script to make it work for ASR and speaker ID (kaldi-asr#3119)

Now multi-style training with noise and reverberation is an option (instead of speed augmentation).
Multi-style training seems to be more robust to unseen/noisy conditions.

* [egs] updated local/musan.sh to steps/data/make_musan.sh in speaker id scripts (kaldi-asr#3320)

* [src] Fix sample rounding errors in extract-segments (kaldi-asr#3321)

With a segments file constructed from exact wave file durations,
some segments came out one sample short. The reason is that the
multiplication of the float sample frequency by the double audio
time point is inexact. For example, float 8000.0 multiplied
by double 2.03 yields 16239.99999999999, one LSB short of the
correct sample number 16240.

Also changed all endpoint calculations so that they are performed
in seconds, not sample numbers, since this avoids a conversion in
nearly every comparison; positions in diagnostic messages are
likewise reported in seconds, not sample numbers.
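The inexactness described above is easy to reproduce. A minimal sketch in Python (whose floats are C doubles; in the original C++ the float sample rate is promoted to double before multiplying, so the arithmetic is the same):

```python
# The decimal time 2.03 has no exact binary representation, so the
# product with the sample rate lands just below the true sample index.
sample_rate = 8000.0
end_time = 2.03

product = sample_rate * end_time
print(product)         # 16239.999999999998, not 16240.0

# A truncating int cast is one sample short; rounding recovers
# the correct sample number.
print(int(product))    # 16239
print(round(product))  # 16240
```

Working in seconds, as the commit does, sidesteps the conversion entirely except at the final read position.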

* [src,scripts]Store frame_shift, utt2{dur,num_frames}, .conf with features (kaldi-asr#3316)

Generate utt2dur and utt2num_frames during feature extraction,
and store frame period in frame_shift file in feature directory.

Copy relevant .conf files used in feature extraction into
the conf/ subdirectory with features.

Add missing validations and options in some extraction scripts.

* [build] Initial version of Docker images for (CPU and GPU versions) (kaldi-asr#3322)

* [scripts] fix typo/bug in make_musan.py (kaldi-asr#3327)

* [scripts] Fixed misnamed variable in data/make_musan.py (kaldi-asr#3324)

* [scripts] Trust frame_shift and utt2num_frames if found (kaldi-asr#3313)

Getting utt2dur involves accessing wave files, and potentially
running full pipelines in wav.scp, which may take hours for a
large data set. If utt2num_frames exists, use it instead if
frame rate is known.

Issue: kaldi-asr#3303
Fixes: kaldi-asr#3297 "cat: broken pipe"
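The shortcut described above can be sketched minimally, assuming Kaldi's plain-text "utt-id value" table format. The duration is the approximation num_frames × frame_shift, ignoring the window-tail correction an exact computation would apply:

```python
# Derive utt2dur from utt2num_frames and the known frame shift,
# avoiding any access to the wave files themselves.

def utt2dur_from_num_frames(utt2num_frames_lines, frame_shift):
    """utt2num_frames_lines: iterable of 'utt-id num-frames' lines;
    frame_shift: frame period in seconds (e.g. 0.01)."""
    durs = {}
    for line in utt2num_frames_lines:
        utt, num_frames = line.split()
        durs[utt] = int(num_frames) * frame_shift  # approximate seconds
    return durs

lines = ["utt1 298", "utt2 1000"]
print(utt2dur_from_num_frames(lines, 0.01))
```

This is why the frame rate must be known: without frame_shift, utt2num_frames alone cannot yield durations.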

* [scripts] typo fix in augmentation script (kaldi-asr#3329)

Fixes typo in kaldi-asr#3119

* [scripts] handle frame_shift and utt2num_frames in utils/ (kaldi-asr#3323)

subset_data_dir.sh has been refactored thoroughly so that its
logic can be followed more easily. It has been well tested and
dogfooded.

All changes here are necessary to subset, combine and verify
utt2num_frames, and copy frame_shift to new directories where
necessary.

* [scripts] Extend combine_ali_dirs.sh to combine alignment lattices (kaldi-asr#3315)

Relevant discussion:
https://groups.google.com/forum/#!topic/kaldi-help/2uxfByEAmfw

* [src] Fix rare case when segment end rounding overshoots file end in extract-segments (kaldi-asr#3331)

* [scripts] Change --modify-spk-id default to False; back-compatibility fix for kaldi-asr#3119 (kaldi-asr#3334)

* [build] Add easier configure option in failure message of configure (kaldi-asr#3335)

* [scripts,minor] Fix typo in comment (kaldi-asr#3338)

* [src,egs] Add option for applying SVD on trained models (kaldi-asr#3272)

* [src] Add interfaces to nnet-batch-compute that expects device input. (kaldi-asr#3311)

This avoids a ping-pong of memory to the host.

The implementation now assumes device memory. The interfaces will
allocate device memory and copy to it if the data starts on the host.

Add a cuda matrix copy function which clamps rows.  This is much
faster than copying one row at a time and the kernel can handle the
clamping for free.

* [build] Update GCC support check for CUDA toolkit 10.1 (kaldi-asr#3345)

* [egs] Fix to aishell1 v1 download script (kaldi-asr#3344)

* [scripts] Support utf-8 files in some scripts (kaldi-asr#3346)

* [src] Fix potential underflow bug in MFCC, RE energy floor, thx: Zoltan Tobler (kaldi-asr#3347)

* [scripts]: add warning to nnet3/chain/train.py about ineffective options (kaldi-asr#3341)

* [scripts] Fix regarding UTF handling in cleanup script (kaldi-asr#3352)

* [scripts] Change encoding to utf-8 in data augmentation scripts (kaldi-asr#3360)

* [src] Add CUDA accelerated MFCC computation. (kaldi-asr#3348)

* Add CUDA accelerated MFCC computation.

Creates a new directory 'cudafeat' for placing cuda feature extraction
components as they are developed.  Added a directory 'cudafeatbin' for
placing CUDA-accelerated binaries that mirror binaries elsewhere.

This commit implements:
  feature-window-cuda.h/cu which implements a feature window on the device
    by copying it from a host feature window.
  feature-mfcc-cuda.h/cu which implements the cuda mfcc feature
    extractor.
  compute-mfcc-feats-cuda.cc which mirrors compute-mfcc-feats.cc

  There were also minor changes to other files.

* Only build cuda binaries if cuda is enabled

* [src] Optimizations for batch nnet3.  The issue fixed here is that (kaldi-asr#3351)

small cuda memory copies are inefficient because each copy can
add multiple microseconds of latency.  The code as written
would copy small matrices or vectors to and from the tasks one
after another.  To avoid this I've implemented a batched matrix
copy routine.  This takes arrays of matrix descriptions for the
input and output and batches the copies in a single kernel call.
This is used in both FormatInputs and FormatOutputs to reduce
launch latency overhead.

The kernel for the batched copy uses a trick to avoid a memory
copy of the host parameters.  The parameters are put into a struct
containing a statically sized array.  These parameters are then
marshalled like normal cuda parameters.  This avoids additional
launch latency overhead.

There is still more work to do at the beginning and end of nnet3.
In particular we may want to batch the clamped memory copies and
the large number of D2D copies at the end.  I haven't fully tracked
those down and may return to them in the future.

* [scripts,minor] Remove outdated comment (kaldi-asr#3361)

* [egs] A kaldi recipe based on the corpus named "aidatatang_200zh". (kaldi-asr#3326)

* [src] nnet1: changing end-rule in 'nnet-train-multistream', (kaldi-asr#3358)

- end the training when there is no more data to refill one of the streams,
- this avoids overtraining to the 'last' utterance,

* [scripts] Fix how the empty (faulty?) segments are handled in data-cleanup code (kaldi-asr#3337)

* [src] Fix to bug in ivector extraction causing assert failure, thx: sray (kaldi-asr#3364)

* [src] Fix to bug in ivector extraction causing assert failure, thx: sray (kaldi-asr#3365)

* [scripts] add script to compute dev PPL on kaldi-rnnlm (kaldi-asr#3340)

* [scripts,egs] Small fixes to diarization scripts (kaldi-asr#3366)

* [egs] Modify split_scp.pl usage to match its updated code (kaldi-asr#3371)

* [src] Fix non-cuda `make depend` build by putting compile guards around header. (kaldi-asr#3374)

* [build] Docker docs update and minor changes to the Docker files  (kaldi-asr#3377)

* [egs] Scripts for MATERIAL ASR (kaldi-asr#2165)

* [src] Batch nnet3 optimizations.  Batch some of the copies in and copies out (kaldi-asr#3378)

* [build] Widen cuda guard in cudafeat makefile. (kaldi-asr#3379)

* [scripts] nnet1: updating the scripts to support 'online-cmvn', (kaldi-asr#3383)

* [build,src] Enhancements to the cudamatrix/cudavector classes. (kaldi-asr#3373)

* Added CuSolver to the matrix class.  This is only supported with
Cuda 9.1 or newer.  Calling CuSolver code without Cuda 9.1 or newer
will result in a runtime error.

This change required some changes to the build system, including
versioning the configure script. This forces everyone to reconfigure;
failure to reconfigure would result in linking and build errors on
some systems.

* [egs] Fix perl `use encoding` deprecation (kaldi-asr#3386)

* [scripts] Add max_active to align_fmllr_lats.sh to prevent rare crashes (kaldi-asr#3387)

* [src] Implemented CUDA accelerated online cmvn. (kaldi-asr#3370)

This patch is part of a larger effort to implement the entire online feature pipeline in CUDA so that wav data is transferred to the device and never copied back to the host.
This patch includes a new binary cudafeatbin/apply-cmvn-online.cc which for the most part matches online2bin/apply-cmvn-online.
This binary is primarily for correctness testing and debugging, as it makes no effort to compute multiple features in parallel on the device.
The CUDA performance is dominated by the cost of copying the features to and from the device. While there is a small speedup, I do not expect this binary to be used in production.
Instead users will use the upcoming online pipeline, which will take features directly from the mfcc computation on the device and pass results to the next part of the pipeline.

Summary of changes:

Makefile:
   Added online2 dependencies to cudafeat, cudafeatbin, cudadecoder, and cudadecoderbin.
cudafeat:
   Makefile:  added online2 dependency, added new .cu/.h files
   feature-online-cmvn-cuda.cu/h:  implements online-cmvn in cuda.
cudafeatbin:
   Makefile:  added new binary, added online2 dependency
   apply-cmvn-online-cuda.cc:  binary which mimics online2bin/apply-cmvn-online

Correctness testing:

The correctness was tested by generating a set of 20000 features, running the CPU and GPU binaries, and comparing the results using featbin/compare-feats.

../online2bin/apply-cmvn-online /workspace/models/LibriSpeech/ivector_extractor/global_cmvn.stats "scp:mfcc.scp" "ark,scp:cmvn.ark,cmvn.scp"
./apply-cmvn-online-cuda /workspace/models/LibriSpeech/ivector_extractor/global_cmvn.stats "scp:mfcc.scp" "ark,scp:cmvn-cuda.ark,cmvn-cuda.scp"

../featbin/compare-feats ark:cmvn-cuda.ark ark:cmvn.ark
LOG (compare-feats[5.5.1301~3-17818]:main():compare-feats.cc:105) self-product of 1st features for each column dimension:  [ 5.52221e+09 9.1134e+09 5.92818e+09 7.42173e+09 7.48633e+09 7.21316e+09 6.9515e+09 7.03883e+09 6.40267e+09 5.83088e+09 5.01438e+09 5.1575e+09 4.28688e+09 3.529e+09 3.12182e+09 2.28721e+09 1.76343e+09 1.35117e+09 8.72517e+08 5.31836e+08 2.65112e+08 9.20308e+07 1.24084e+07 3.56008e+06 4.25283e+07 1.09786e+08 1.88937e+08 2.60207e+08 3.23115e+08 3.56371e+08 3.69035e+08 3.65216e+08 3.89125e+08 4.07064e+08 3.40407e+08 2.65444e+08 2.50244e+08 2.05726e+08 1.60606e+08 1.07217e+08 ]

LOG (compare-feats[5.5.1301~3-17818]:main():compare-feats.cc:106) self-product of 2nd features for each column dimension:  [ 5.5223e+09 9.11355e+09 5.92812e+09 7.4218e+09 7.48666e+09 7.21338e+09 6.95174e+09 7.03895e+09 6.40254e+09 5.83113e+09 5.01411e+09 5.15774e+09 4.28692e+09 3.52918e+09 3.122e+09 2.28693e+09 1.76326e+09 1.3513e+09 8.72521e+08 5.31802e+08 2.65137e+08 9.20296e+07 1.2408e+07 3.5604e+06 4.25301e+07 1.09793e+08 1.88933e+08 2.60217e+08 3.23124e+08 3.56371e+08 3.69007e+08 3.65176e+08 3.89104e+08 4.07067e+08 3.40416e+08 2.65498e+08 2.50196e+08 2.057e+08 1.60612e+08 1.07192e+08 ]

LOG (compare-feats[5.5.1301~3-17818]:main():compare-feats.cc:107) cross-product for each column dimension:  [ 5.52209e+09 9.11229e+09 5.92538e+09 7.41665e+09 7.47877e+09 7.20269e+09 6.93785e+09 7.02284e+09 6.38411e+09 5.81143e+09 4.99389e+09 5.13753e+09 4.26792e+09 3.51154e+09 3.10676e+09 2.27436e+09 1.75322e+09 1.34367e+09 8.67367e+08 5.28672e+08 2.63516e+08 9.14194e+07 1.23215e+07 3.53409e+06 4.21905e+07 1.08872e+08 1.87238e+08 2.57779e+08 3.19827e+08 3.5252e+08 3.64691e+08 3.60529e+08 3.84482e+08 4.02396e+08 3.36136e+08 2.61631e+08 2.46931e+08 2.03079e+08 1.5856e+08 1.05738e+08 ]

LOG (compare-feats[5.5.1301~3-17818]:main():compare-feats.cc:111) Similarity metric for each dimension  [ 0.99997 0.999871 0.999532 0.999311 0.998968 0.998533 0.998019 0.997719 0.997111 0.996644 0.995941 0.996104 0.995572 0.995028 0.995147 0.994445 0.994258 0.994402 0.994095 0.994084 0.993934 0.993363 0.993015 0.992655 0.992037 0.991645 0.991017 0.990649 0.98981 0.989195 0.988267 0.987222 0.988093 0.98853 0.987442 0.985534 0.986858 0.987196 0.987242 0.986318 ]
 (1.0 means identical, the smaller the more different)
LOG (compare-feats[5.5.1301~3-17818]:main():compare-feats.cc:116) Overall similarity for the two feats is:0.993119 (1.0 means identical, the smaller the more different)
LOG (compare-feats[5.5.1301~3-17818]:main():compare-feats.cc:119) Processed 20960 feature files, 0 had errors.
LOG (compare-feats[5.5.1301~3-17818]:main():compare-feats.cc:126) Features are considered similar since 0.993119 >= 0.99
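The similarity metric in the log can be reconstructed from the reported quantities. Assuming it is the cross-product of each feature column normalized by the two self-products (which is consistent with the three LOG lines above), a sketch is:

```python
import math

def column_similarity(feats_a, feats_b):
    """feats_a, feats_b: lists of equal-length rows (frames x dims).
    Returns the per-dimension normalized cross-product."""
    dims = len(feats_a[0])
    sims = []
    for d in range(dims):
        col_a = [row[d] for row in feats_a]
        col_b = [row[d] for row in feats_b]
        self_a = sum(x * x for x in col_a)  # self-product, 1st features
        self_b = sum(x * x for x in col_b)  # self-product, 2nd features
        cross = sum(x * y for x, y in zip(col_a, col_b))
        sims.append(cross / math.sqrt(self_a * self_b))
    return sims

# Identical columns score exactly 1.0; a perturbed copy scores just below.
a = [[1.0, 2.0], [3.0, 4.0]]
b = [[1.0, 2.1], [3.0, 3.9]]
print(column_similarity(a, a))  # [1.0, 1.0]
print(column_similarity(a, b))
```

Under this reading, the 0.993119 overall figure means the GPU and CPU CMVN outputs are nearly (but not bit-) identical, which is expected from differing floating-point summation orders.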

* [egs] Fixed file path RE augmentation, in aspire recipe (kaldi-asr#3388)

* [scripts] Update taint_ctm_edits.py, RE utf-8 encoding (kaldi-asr#3392)

* [src] Change nnet3-am-copy to allow more manipulations (kaldi-asr#3393)

* [egs] Remove confusing setting of overridden num_epochs variable in aspire (kaldi-asr#3394)

* [build] Add a missing dependency for "decoder" in Makefile (kaldi-asr#3397)

* [src] CUDA decoder performance patch (kaldi-asr#3391)

* [build,scripts] Dependency fix; add cross-references to scripts (kaldi-asr#3400)

* [egs] Fix cleanup-after-partial-download bug in aishell  (kaldi-asr#3404)

* [src] Change functions like ApplyLog() to all work out-of-place (kaldi-asr#3185)

* [src] Make stack trace display more user friendly (kaldi-asr#3406)

* [egs] Fix to separators in Aspire reverb recipe (kaldi-asr#3408)

* [egs] Fix to separators in Aspire, related to kaldi-asr#3408 (kaldi-asr#3409)

* [src] online2-tcp, add option to display start/end times (kaldi-asr#3399)

* [src] Remove debugging assert in cuda feature extraction code (kaldi-asr#3411)

* [scripts] Fix to checks in adjust_unk_graph.sh (kaldi-asr#3410)

bash test `-f` does not work for `phones/` which is a directory. Changed it to `-e`.

* [src] Added GPU feature extraction (will improve speed of GPU decoding) (kaldi-asr#3390)

Currently only supports MFCC features.

* [src] Fix build error introduced by race condition in PR requests/accepts. (kaldi-asr#3412)

* [src] Added error string to CUDA allocation errors. (kaldi-asr#3413)

* [src] Fix CUDA_VERSION number in preprocessor checks (kaldi-asr#3414)

* [src] Fix build of online feature extraction with older CUDA version (kaldi-asr#3415)

* [src] Update Insert function of hashlist and decoders (kaldi-asr#3402)

makes interface of HashList more standard; slight speed improvement.

* [src] Fix spelling mistake in kaldi-asr#3415 (kaldi-asr#3416)

* [build] Fix configure bug RE CuSolver (kaldi-asr#3417)

* [src] Enable an option to use the GPU for feature extraction in GPU decoding (kaldi-asr#3420)

This is turned on by using the option --gpu-feature-extract=true,
which is the default.  We provide the option to turn it off because
in situations where CPU resources are unconstrained you can get
slightly higher performance with CPU feature extraction, but in most
cases GPU feature extraction is faster and has more stable
performance.  In addition, a user may wish to turn it off to support
models whose feature extraction is currently incomplete on GPU
(e.g. FBANK, PLP, PITCH, etc).  We will add those features in the
future, but for now a user wanting to decode such models should
place feature extraction on the host.

* [egs] Replace $cuda_cmd with $train_cmd for FarsDat (kaldi-asr#3426)

* [src] Remove outdated comment (kaldi-asr#3148) (kaldi-asr#3422)

* [src] Adding missing thread.join in CUDA decoder and fixing two todos (kaldi-asr#3428)

* [build] Add missing lib dependency in cudafeatbin (kaldi-asr#3427)

* [egs] Small fix to aspire run_tdnn_7b.sh (kaldi-asr#3429)

* [build] Fix to cuda makefiles, thanks: [email protected] (kaldi-asr#3431)

* [build] Add missing deps to cuda makefiles, thanks: [email protected] (kaldi-asr#3432)

* [egs] Fix encoding issues in Chinese ASR recipe (kaldi-asr#3430) (kaldi-asr#3434)

* Revert "[src] Update Insert function of hashlist and decoders (kaldi-asr#3402)" (kaldi-asr#3436)

This reverts commit 5cc7ce0.

* [src] Update Insert function of hashlist and decoders (kaldi-asr#3402) (kaldi-asr#3438)

makes interface of HashList more standard; slight speed improvement.  Fixed version of kaldi-asr#3402

* [build] Fix the cross-compiling issue for Android under MacOS (kaldi-asr#3435)

* [src] Marking operator as __host__ __device__ to avoid build issues (kaldi-asr#3441)

avoids cudafeat build failures with some CUDA toolkit versions

* [egs] Fix perl encoding bug (was causing crashes) (kaldi-asr#3442)

* [src] Cuda decoder fixes, efficiency improvements (kaldi-asr#3443)

* [scripts] Fix shebang of taint_ctm_edits.py to invoke python3 directly (kaldi-asr#3445)

* [src] Fix to a check in nnet-compute code (kaldi-asr#3447)

* [src,scripts] Various typo fixes and stylistic fixes (kaldi-asr#3153)

* [scripts] Scripts for VB (variational bayes) resegmentation for Callhome diarization (kaldi-asr#3305)

This refines the segment boundaries.  Based on code originally by Lukas Burget from Brno.

* [scripts] Extend utils/data/subsegment_data_dir.sh to copy reco2dur (kaldi-asr#3452)

* [src,scripts,egs]  Add code and example for SpecAugment in nnet3 (kaldi-asr#3449)

* [scripts] Make segment_long_utterance honor frame_shift (kaldi-asr#3455)

* [scripts] Fix to steps/nnet/train.sh (nnet1) w.r.t. incorrect bash test expressions (kaldi-asr#3456)

* [egs] fixed bug in egs/gale_arabic/s5c/local/prepare_dict_subword.sh that it may delete words matching '<*>' (kaldi-asr#3465)

* [src,build] Small fixes (kaldi-asr#3472)

* [src] fixed warning: moving a temporary object prevents copy elision

* [scripts] tools/extras/check_dependencies.sh look for alternative MKL locations via MKL_ROOT environment variable

* [src] Fixed compilation error when DEBUG is defined

* [egs] Add MGB-2 Arabic recipe (kaldi-asr#3333)

* [scripts] Check/fix utt2num_frames when fixing data dir. (kaldi-asr#3482)

* [src] A couple small bug fixes. (kaldi-asr#3477)

* [src] A couple small bug fixes.

* [src] Fix RE pdf-class 0, which is valid in pre-kaldi10

* [src,scripts] Cosmetic,file-mode fixes; fix to nnet1 align.sh introduced in kaldi-asr#3383 (kaldi-asr#3487)

* [src] Small cosmetic and file-mode fixes

* [src] Fix bug in nnet1 align.sh introduced in kaldi-asr#3383

* [egs] Add missing script in MGB2 recipe (kaldi-asr#3491)

* [egs] Fixing nnet1 bug introduced in kaldi-asr#3383 (rel. to kaldi-asr#3487) (kaldi-asr#3494)

* [src] Fix for nnet3 bug encountered when implementing deltas. (kaldi-asr#3495)

* [scripts,egs] Replace LDA layer with delta and delta-delta features (kaldi-asr#3490)

WER is about the same but this is simpler to implement and more standard.

* [egs] Add updated tdnn recipe for AMI (kaldi-asr#3497)

* [egs] Create MALACH recipe based on s5b for AMI (kaldi-asr#3496)

* [scripts] add --phone-symbol-table to prepare_lang_subword.sh (kaldi-asr#3485)

* [scripts] Option to prevent crash when adapting on much smaller data (kaldi-asr#3506)

* [build,scripts] Make OpenBLAS install check for gfortran; documentation fix (kaldi-asr#3507)

* [egs] Update chain TDNN-F recipe for CHIME s5 to match s5b, improves results (kaldi-asr#3505)

* [egs] Fix to kaldi-asr#3505: updating chime5 TDNN-F script (kaldi-asr#3508)

* [scripts] Fixed issue that leads to empty segment file (kaldi-asr#3510)

Fixed issue that leads to empty segment file on SSD disk (more detail: https://groups.google.com/forum/#!topic/kaldi-help/Ij3lQLCinN8)

* [egs] Fix bug in AMI s5b RE overlapping segments (kaldi-asr#3503)

* [egs] Small cosmetic change in extend_vocab_demo.sh (kaldi-asr#3516)

* [src] Cosmetic changes; fix windows-compile bug reported by @spencerkirn (kaldi-asr#3515)

* [src] Move cuda gpu from nnetbin to nnet3bin.  (kaldi-asr#3513)

* fix a bug in egs/voxceleb/v1/local/make_voxceleb1_v2.pl when preparing the file data/voxceleb1_test/trials (kaldi-asr#3512)

* [egs] Fixed some bugs in mgb_data_prep.sh of mgb2_arabic (kaldi-asr#3501)

* [src,scripts] fix various typos and errors in comments (kaldi-asr#3454)

* [src] Move cuda-compiled to nnet3bin (kaldi-asr#3517)

* [src] Fix binary name in Makefile, RE cuda-compiled (kaldi-asr#3518)

* [src] buffer fix in cudafeat (kaldi-asr#3521)

* [src] Hopefully make it possible to use empty FST in grammar-fst (kaldi-asr#3523)

* [src] Add option to convert pdf-posteriors to phones (kaldi-asr#3526)

* [src] Fix GetDeltaWeights for long-running online decoding (kaldi-asr#3528)

* Fix GetDeltaWeights for long-running online decoding. Use the frame count relative to the decoder start internally in silence weighting, and update to the frame count relative to the pipeline only once the result is calculated.

* Add a note about transition to a new function API

* [src] Small fix to post-to-phone-post.cc (problem introduced in from kaldi-asr#3526) (kaldi-asr#3534)

* [src]: adding Dan's fix to a bug in nnet-computation-graph (kaldi-asr#3531)

* [egs] Replace prep_test_aspire_segmentation.sh (kaldi-asr#2943) (kaldi-asr#3530)

* [egs] OCR: Decomposition for CASIA and YOMDLE_ZH datasets (kaldi-asr#3527)

* [build] check_dependencies.sh: correct installation command for fedora (kaldi-asr#3539)

* [src,doc] Fix bug in new option of post-to-phone-post; skeleton of faq page (kaldi-asr#3540)

* [egs,scripts]  Adding possibility to have 'online-cmn' on input of 'nnet3' models (kaldi-asr#3498)

* [scripts] Fix to build_tree_multiple_sources.sh (kaldi-asr#3545)

* [doc] Fix accidental overwriting of kws page (kaldi-asr#3541)

* [egs] Fix regex filter in Fisher preprocessing (was excluding 2-letter sentences like "um") (kaldi-asr#3548)

* [scripts] Fix to bug introduced in kaldi-asr#3498 RE ivector-extractor options mismatch. (kaldi-asr#3549)

* [scripts] Fix awk compatibility issue; be more careful about online_cmvn file (kaldi-asr#3550)

* [src] Add a method for backward-compatibility with previous API (kaldi-asr#3536)

* [src] Filter bank (fbank) feature extraction using CUDA (kaldi-asr#3544)

Following this change, both MFCC and fbank run
through a single code path with parameters
(use_power, use_log_fbank and use_dct) controlling
the flow.

CudaMfcc has been renamed to CudaSpectralFeatures.

It contains an MfccOptions structure which contains
FrameOptions and MelOptions. It can be initialized
either with an MfccOptions object or an FbankOptions
object.

Compared with CudaMfccOptions, CudaSpectralOptions also
contains these parameters

use_dct  - switches on the discrete cosine and lifter
use_log_fbank - takes the log of the MEL banks values
use_power - uses power in place of abs(amplitude)

Each of these defaults to on for MFCC. For fbank,
use_dct is set to false; the others are set by user
parameters.

Also added a unit test for CUDA Fbank
(cudafeatbin/compute-fbank-feats-cuda).
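The flag-controlled flow described above can be sketched as follows. The names use_power, use_log_fbank, and use_dct come from the description; the window/FFT/mel details below are simplified stand-ins for Kaldi's actual implementation, not its code:

```python
import cmath, math

def spectral_features(frame, mel_filters, use_power=True,
                      use_log_fbank=True, use_dct=True):
    """One code path for both MFCC (all flags on) and fbank
    (use_dct=False).  mel_filters: rows of length len(frame)//2 + 1."""
    n = len(frame)
    # Spectrum via a naive DFT (stand-in for the real windowed FFT).
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        mag = abs(acc)
        spectrum.append(mag * mag if use_power else mag)  # use_power flag
    # Mel filter-bank energies.
    banks = [sum(w * s for w, s in zip(filt, spectrum))
             for filt in mel_filters]
    if use_log_fbank:
        banks = [math.log(max(e, 1e-10)) for e in banks]
    if not use_dct:          # fbank path stops here
        return banks
    # MFCC path: DCT-II of the (log) filter-bank energies.
    m = len(banks)
    return [sum(banks[j] * math.cos(math.pi * i * (j + 0.5) / m)
                for j in range(m))
            for i in range(m)]
```

With all three flags on this is the MFCC flow (minus liftering, omitted here); with use_dct=False it is the fbank flow, matching the "single code path" claim above.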

* [src] Fix missing semicolon (kaldi-asr#3551)

* [src] fix a typo mistaking the not-equal operator for the assignment operator in the CUDA feature pipeline (kaldi-asr#3552)

* [src] Fix issue kaldi-asr#3401 (crash in ivector extraction with max-remembered-frames+silence-weight) (kaldi-asr#3405)

* [egs] Librispeech: in RESULTS, change best_wer.sh => utils/best_wer.sh (kaldi-asr#3553)

* [egs] semisupervised recipes: fixing some variables in comments  (kaldi-asr#3547)

* [scripts]  fix utils/lang/extend_lang.sh to add nonterm symbols in align_lexicon.txt (kaldi-asr#3556)

* [scripts] Fix to bug in steps/data/data_dir_manipulation_lib.py (kaldi-asr#3174)

* [src] Fix in nnet3-attention-component.cc, RE required context (kaldi-asr#3563)

* [src] Temporarily turn off some model-collapsing code while investigating a bug (kaldi-asr#3565)

* [scripts] Fix to data cleanup script RE utf-8 support in utterance

* [src,scripts,egs] online-cmvn for online2 with chain models, (kaldi-asr#3560)

* Add OnlineCMVN to Online NNET2 pipeline.  This is used in some models (e.g.
CVTE).  This is optional and off by default.  It applies CMVN before
applying pitch.  This code is essentially copied out of
"online2/online-feature-pipeline.cc/h".

Patch set provided by Levi Barnes.

* online2: bugfix of config script, include ivector_period into the ivector-extractor config file.

* online-cmvn in online-nnet2-feature-pipeline,

- update the C++ code from @luitjens,
   - introduced `OnlineFeatureInterface *nnet3_feature_` to
     explicitly mark features that are passed to nnet3 model
   - added the transfer of OnlineCmvnState across utterances from same speaker
- update the 'prepare_online_decoding.sh' to support online-cmvn
- enabled OnlineCmvnStats transfer in training/decoding

* OnlineNnet2FeaturePipeline, removing unused constructor, updating results
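The idea behind online CMVN, sketched minimally: mean-only normalization over a causal sliding window, so no future context is needed. Kaldi's OnlineCmvn additionally blends in global and per-speaker statistics (the OnlineCmvnState transferred across utterances above), which this sketch omits:

```python
def online_cmvn(frames, cmn_window=600):
    """frames: list of per-frame feature lists.  Each frame is
    normalized using up to cmn_window frames ending at itself."""
    out = []
    for t, frame in enumerate(frames):
        lo = max(0, t - cmn_window + 1)
        window = frames[lo:t + 1]
        dim = len(frame)
        means = [sum(f[d] for f in window) / len(window)
                 for d in range(dim)]
        out.append([x - m for x, m in zip(frame, means)])
    return out

feats = [[1.0, 10.0], [3.0, 14.0]]
print(online_cmvn(feats))  # [[0.0, 0.0], [1.0, 2.0]]
```

Because only past frames are used, this can run before pitch and feed the nnet3 input in a streaming decoder, which is what the pipeline change above enables.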

* [build,src] Change to error message; update kaldi_lm install script (and kaldi_lm) to fix compile issue (kaldi-asr#3581)

* [src] Clarify something in plda.h (cosmetic change) (kaldi-asr#3588)

* [src] Small fix to online ivector extraction (RE kaldi-asr#3401/kaldi-asr#3405), thanks: Vladmir Vassiliev. (kaldi-asr#3592)

Stops an assert failure by changing the assert condition

* [egs] Add recipe for Chinese training with multiple databases (kaldi-asr#3555)

* [src] Remove duplicate `num_done++` in apply-cmvn-online.cc (kaldi-asr#3597)

* [src] Fix to CMVN with CUDA (kaldi-asr#3593)

* `* __restrict__` not `__restrict__ *`

Fixing a complaint during Windows compilation

* Plumbing for CUDA CMVN

* Removing detritus.

Removing comments, an unused variable and
NULL'ing some object pointers (just in case)

* [src] Fix two bugs in batched-wav-nnet3-cuda binary. (kaldi-asr#3600)

1)  Using the "Any" APIs prior to finishing submission of the full group
could lead to a group finishing early.  This would cause output to
appear repeatedly.  I have not seen this occur, but an audit revealed it
as an issue. The fix is to use the "Any" APIs only when full groups have
been submitted.

2) GetNumberOfTasksPending() can return zero even though groups have not
been waited for.  This API call should not be used to determine if all
groups have been completed as the number of pending tasks is independent
of the number of groups remaining.

* [scripts] propagated bad whitespace fix to validate_dict_dir.pl (cosmetic change) (kaldi-asr#3601)

* [scripts] bug-fix on subset_data_dir.sh with --per-spk option (kaldi-asr#3567) (kaldi-asr#3572)

The script failed to return the number of utterances `per-spk` requested, despite being available in the original spk2utt file.
The stop-boundary on the awk's for-loop was off by 1.

> An example of how this affects the spk2utt files:

```
$ cat tmpData/spk2utt
spk1 spk1-utt1 spk1-utt2 spk1-utt3
spk2 spk2-utt1
spk3 spk3-utt1 spk3-utt2
spk4 spk4-utt1 spk4-utt2 spk4-utt3 spk4-utt4
$
$ ./subset_data_dir.sh --per-spk tmpData 2 tmpData-before
$ cat tmpData-before/spk2utt 
spk1 spk1-utt1 
spk2 spk2-utt1 
spk3 spk3-utt1 
spk4 spk4-utt1 spk4-utt3 
$
$ ./subset_data_dir_MOD.sh --per-spk tmpData 2 tmpData-after
$ cat tmpData-after/spk2utt 
spk1 spk1-utt1 spk1-utt2 
spk2 spk2-utt1 
spk3 spk3-utt1 spk3-utt2 
spk4 spk4-utt1 spk4-utt3 
```
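The off-by-one can be illustrated without awk. A Python sketch, simplified to take the first n utterances per speaker (the real script spaces its picks across each speaker's list, as the spk4 lines above show), with `buggy=True` reproducing the old stop bound:

```python
def subset_spk2utt(spk2utt_lines, per_spk, buggy=False):
    """spk2utt_lines: 'spk utt1 utt2 ...' lines.  Keep up to per_spk
    utterances per speaker; buggy=True stops one short, as the old
    awk loop bound did."""
    out = []
    for line in spk2utt_lines:
        spk, *utts = line.split()
        limit = per_spk - 1 if buggy else per_spk
        out.append(" ".join([spk] + utts[:limit]))
    return out

lines = ["spk1 spk1-utt1 spk1-utt2 spk1-utt3", "spk2 spk2-utt1"]
print(subset_spk2utt(lines, 2, buggy=True))
# ['spk1 spk1-utt1', 'spk2 spk2-utt1']
print(subset_spk2utt(lines, 2))
# ['spk1 spk1-utt1 spk1-utt2', 'spk2 spk2-utt1']
```

Speakers with fewer than per_spk utterances are unaffected either way, which is why the bug was easy to miss.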

* [src] Code changes to support GCC9 + OpenFST1.7.3 + C++2a (namespace issues) (kaldi-asr#3570)

* [scripts] Training ivector-extractor: make online-cmvn per speaker. (kaldi-asr#3615)

* [src] cached compiler I/O for nnet3-xvector-compute (kaldi-asr#3197)

* [src] Fixed online2-tcp-nnet3-decode-faster.cc poll_ret checks (kaldi-asr#3611)

* [scripts] Call utils/data/get_utt2dur.sh using the correct $cmd and $nj (kaldi-asr#3608)

* [scripts] Enable tighter control of downloaded dependencies (kaldi-asr#3543) (kaldi-asr#3573)

* [scripts] Make reverberate_data_dir.py handle vad.scp (kaldi-asr#3619)

* [scripts] Don't get utt2dur in librispeech.. will be made by make_mfcc.sh now (kaldi-asr#3610) (kaldi-asr#3620)

* [scripts] Make combine_ali_dirs.sh work when queue.pl is used (kaldi-asr#3537) (kaldi-asr#3561)

* [egs] Fix duplicate removal of unk from Librispeech decoding graphs (kaldi-asr#3476) (kaldi-asr#3621)

* [build] Check for gfortran, needed by OpenBLAS (for lapack) (kaldi-asr#3622)

* [scripts] VB resegmentation: load MFCCs only when used (save memory) (kaldi-asr#3612)

* [build] removed old Docker files - see docker in the root folder for the latest files (kaldi-asr#3558)

* [src] Fix to compute-mfcc-feats.cc (thanks: @puneetbawa) (kaldi-asr#3623)

* [egs] Update AMI tdnn recipe to reflect what was really run, and add hires feats. (kaldi-asr#3578)

* [egs] Fix to run_ivector_common.sh in swbd (crash when running from stage 3) (kaldi-asr#3631)

* [scripts] Make data augmentation work with UTF-8 utterance ids/filenames (kaldi-asr#3633)

* [src] fix a bug in src/online/online-faster-decoder.cc (prevent segfault with some models) (kaldi-asr#3634)

* [scripts]  Make extend_lang.sh support lexiconp_silprob.txt (kaldi-asr#3339) (kaldi-asr#3632)

* [scripts] Fix typo in analyze_alignments.sh (kaldi-asr#3635)

* [scripts] Change the GPU policy of align.sh to wait instead of yes (kaldi-asr#3636)

This is needed to avoid failures when using more jobs than GPUs. Previously, with --use-gpu=yes, a job would error out after 20s when all GPUs were occupied. Changing the policy to wait gives the correct behavior.

* [egs] Added tri3b and chain training for Aurora4 (kaldi-asr#3638)

* [build] fixed broken Docker builds by adding gfortran package (kaldi-asr#3640)

* [src] Fix bug in resampling checks for online features (kaldi-asr#3639)

The --allow-upsample and --allow-downsample options were not handled correctly in the code that handles resampling for computing online features.

* [build] Bump OpenBLAS version to 0.3.7 and enable locking (kaldi-asr#3642)

Bumps OpenBLAS version to 0.3.7 for improved performance with Zen architecture. Also sets USE_LOCKING which fixes issue when calling a single-threaded OpenBLAS from several threads in parallel

* [scripts] Update nnet3_to_dot.py to ignore Scale descriptors (kaldi-asr#3644)

* [src] Speed optimization for decoders (call templates properly) (kaldi-asr#3637)

* [doc] update FAQ page to include some FAQ candidates from Kaldi mailing lists (kaldi-asr#3646)

* [egs] Small fix: duplicate xent settings from examples (kaldi-asr#3649)

* [doc] Fix typo (kaldi-asr#3648)

* [egs] Fix a bug in Tedlium run_ivector_common.sh (kaldi-asr#3647)

Running perturb_data_dir_speed_3way.sh after making MFCCs would cause an error saying "feats.scp already exists". It is also unnecessary to run it twice.

* [src] Fix to matrix-lib-test to ignore small difference (kaldi-asr#3650)

* [doc] update FAQ page: (kaldi-asr#3651)

1. Added some new FAQ candidates from mailing lists.
2. Added sections on the logo, versioning, and recommended books for beginners.

* [scripts] Modify split_data.sh to split data evenly when utt2dur exists (kaldi-asr#3653)
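One way such a duration-aware split can work (an illustration of the idea, not necessarily split_data.sh's exact algorithm): greedily assign each utterance, longest first, to the split with the smallest running total, so splits come out even in audio time rather than in utterance count:

```python
import heapq

def split_even_by_dur(utt2dur, num_splits):
    """utt2dur: dict utt-id -> duration in seconds.
    Returns num_splits lists of utt-ids with balanced total duration."""
    heap = [(0.0, i) for i in range(num_splits)]  # (total_dur, split_idx)
    heapq.heapify(heap)
    splits = [[] for _ in range(num_splits)]
    for utt, dur in sorted(utt2dur.items(), key=lambda kv: -kv[1]):
        total, idx = heapq.heappop(heap)  # least-loaded split so far
        splits[idx].append(utt)
        heapq.heappush(heap, (total + dur, idx))
    return splits

utt2dur = {"a": 9.0, "b": 5.0, "c": 4.0, "d": 1.0}
print(split_even_by_dur(utt2dur, 2))  # [['a', 'd'], ['b', 'c']]
```

Balancing by duration matters because parallel jobs finish together only when their audio time, not their utterance count, is even.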

* [doc] update FAQ page: added section for free dataset, python wrapper, etc. (kaldi-asr#3652)

* [src] Some CUDA i-vector fixes (kaldi-asr#3660)

no longer assume we are starting at frame 0.  This is needed for eventual online decoding.

  src/cudafeat/online-ivector-feature-cuda.cc:
    Added code to use LU decomposition for debugging.  Cholesky is still used and works fine, but this is a good test if something wrong is suspected in the solver.
    Added support for older toolkits, but with a less efficient algorithm.

* [src] CUDA batched decoder pipeline fixes (kaldi-asr#3659)

bug fix: don't assume the stride and the number of columns are the same when packing a matrix into a vector.
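The stride pitfall behind this fix is easy to reproduce with NumPy (an illustration only; the actual bug was in the CUDA matrix-packing code):

```python
import numpy as np

m = np.arange(40, dtype=np.float32).reshape(4, 10)
sub = m[:, :6]          # a 4x6 view: 6 columns, but the row stride is 10 floats

# Buggy packing: assume stride == num_cols and copy rows*cols contiguous values.
buggy = m.ravel()[: 4 * 6]

# Correct packing: honor the stride; ravel() on the view copies row by row.
correct = sub.ravel()

assert not np.array_equal(buggy, correct)       # the buggy pack mixes columns
assert np.array_equal(correct[6:12], m[1, :6])  # row 2 really starts at stride 10
```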

  src/cudadecoder/batched-threaded-nnet3-cuda-pipeline.cc:
    added additional sanity checking to better report errors in the field.
    no longer pinning memory for copying waves down. This was causing consistency issues. It is unclear why that code is not working; we will continue to evaluate.
    This optimization doesn't add a lot of perf, so we are disabling it for now.
    general cleanup
    fixed a bug with the tasks array possibly being resized before being read.

  src/cudadecoderbin/batched-wav-nnet3-cuda.cc:
    Now outputting every iteration as a different lattice.  This way we can score every lattice and better ensure the correctness of the binary.
    clang-format (removing tabs)

* [egs] Small fix to aspire example: pass in num_data_reps (kaldi-asr#3665)

* [doc] Update FAQ page (kaldi-asr#3663)

1. Added a section on Android for Kaldi
2. Listed some useful scripts for data processing
3. Illustrated some common tools

* [src] Add batched xvector computation (kaldi-asr#3643)

* [src] CUDA decoder: write all output to single stream (kaldi-asr#3666)

.. instead of one per iteration.

We are seeing that runs on small corpora are dominated by writer Open/Close overhead.

A better solution is to modify the key with the iteration number and
then handle the modified keys in scoring.  Note that if you have just a
single iteration, the key is not modified, so the behavior is as
expected.
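The key scheme could look like the following sketch (`output_key` is a hypothetical helper; the exact format the binary uses may differ):

```python
def output_key(utt_key, iteration):
    # With a single iteration (iteration 0) the key is unchanged, so existing
    # scoring scripts keep working; otherwise the iteration number is appended.
    return utt_key if iteration == 0 else "%s-%d" % (utt_key, iteration)

assert output_key("utt1", 0) == "utt1"
assert output_key("utt1", 2) == "utt1-2"
```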

* [egs] Fix sox command in multi_en setup (kaldi-asr#3667)

* [src] Change data-type for hash in lattice alignment (avoid debugger complaints)  (kaldi-asr#3672)

* [egs] Two fixes to multi_en setup  (kaldi-asr#3676)

* [scripts] Fix misspelled variable in reverb script (kaldi-asr#3678)

* Update cudamatrix.dox (kaldi-asr#3674)

* [scripts] Add reduced-context option for TDNN-F layers. (kaldi-asr#3658)

* [egs] Make librispeech run_tdnn_discriminative.sh more efficient with disk (kaldi-asr#3685)

* [egs] Remove pitch from multi_cn nnet3 recipe (kaldi-asr#3686)

* [egs] multi_cn: clarify metric is CER (kaldi-asr#3687)

* [src,minor] Fix typo in comment in kaldi-holder.h (kaldi-asr#3691)

* [egs] update run_tdnn_discriminative.sh in librispeech recipe (kaldi-asr#3689)

* [egs] Aspire recipe: fixed typo in utterance and speaker prefix (kaldi-asr#3696)

* [src] cuda batched decoder, fix memory bugs (kaldi-asr#3697)

bug fix: pass the token by reference so we can return an address of it.
Previously it was passed by value, which meant we returned the address
of a locally scoped variable. This led to memory corruption.

re-add pinned memory (i.e. disable workaround)

Free memory on control thread, switch shared_ptr to unique_ptr, don't
reset unique ptr.

* [scripts] Change make_rttm.py to read/write files with UTF-8 encoding (kaldi-asr#3705)

* [src] Removing non-compiling paranoid asserts in nnet-computation-graph. (kaldi-asr#3709)

* [build] Fix gfortran package name for centos (kaldi-asr#3708)

* [scripts] Change the Python diagnostic scripts to accept non-ASCII UTF-8 phone set (kaldi-asr#3711)

* [scripts] Fix some issues in kaldi-asr#3653 in split_scp.pl (kaldi-asr#3710)

* [scripts] Fix 2 issues in nnet2->nnet3 model conversion script (kaldi-asr#886)(kaldi-asr#3713)

Parsing learning rates would fail if they were written in e-notation (e.g.
1e-5). Also fixes ivector-dim=0, which could result if the const component
dim is 0 in the source nnet2 model.
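The e-notation failure mode can be illustrated like this (the patterns are illustrative, not the conversion script's actual regexes):

```python
import re

# A decimal-only pattern (the kind of parsing that failed) rejects e-notation.
decimal_only = re.compile(r"^-?\d+(\.\d+)?$")
assert decimal_only.match("0.00001")
assert not decimal_only.match("1e-05")

# Accepting an optional exponent -- or simply calling float() -- handles both.
with_exponent = re.compile(r"^-?\d+(\.\d+)?([eE][+-]?\d+)?$")
assert with_exponent.match("1e-05")
assert float("1e-05") == float("0.00001")
```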

* Removing changes to split_scp.pl (kaldi-asr#3717)

* [scripts] Improve how combine_ali_dirs.sh gets job-specific filenames (kaldi-asr#3720)

* [src] Add --debug-level=N configure option to control assertions (kaldi-asr#3690) (kaldi-asr#3700)

* [src] Adding some more feature extraction options (needed by some users..) (kaldi-asr#3724)

* [src,script,egs] Goodness of Pronunciation (GOP) (kaldi-asr#3703)

* [src] Making ivector extractor tolerate dim mismatch due to pitch (kaldi-asr#3727)

* Revert "[src] Making ivector extractor tolerate dim mismatch due to pitch (kaldi-asr#3727)" (kaldi-asr#3728)

This reverts commit 59255ae.

* [src] Fix NVCC compilation errors on Windows (kaldi-asr#3741)

* make nvcc+msvc happier when two-phase name lookup is involved.
NOTE: in the simple native case (CPU, without CUDA), MSVC is happy with TemplatedBase<T>::base_value, but nvcc is not...
* the position of __restrict__ in MSVC is restricted

* [build] Add CMake Build System as alternative to current Makefile-based build (kaldi-asr#3580)

* [scripts] Modify split_data_dir.sh and split_scp.pl to use utt2dur if present, to balance splits

* [scripts] fix slurm.pl error (kaldi-asr#3745)

* Revert "[scripts] Modify split_data_dir.sh and split_scp.pl to use utt2dur if present, to balance splits" (kaldi-asr#3746)

This reverts commit 1d0b267.

* [egs] Children's speech ASR recipe for cmu_kids and cslu_kids (kaldi-asr#3699)

* [src] Incremental determinization [cleaned up/rewrite] (kaldi-asr#3737)

* [scripts] Add scripts to create combined fmllr-transform dirs (kaldi-asr#3752)

* [src] CUDA decoder: fix invalid-lattice error that happens in corner cases (kaldi-asr#3756)

* [egs] Add Chime 6 baseline system (kaldi-asr#3755)

* [scripts] Fix issue in copy_lat_dir.sh affecting combine_lat_dirs.sh (missing phones.txt) (kaldi-asr#3757)

* [src] Add missing #include, needed for CUDA decoder compilation on some platforms (kaldi-asr#3759)

* [scripts] fix bug in steps/data/reverberate_data_dir.py (kaldi-asr#3762)

* [src] CUDA allocator: fix order of next largest block (kaldi-asr#3739)

* [egs] some fixes in Chime6 baseline system (kaldi-asr#3763)

* changed scoring tool for diarization

* added comment for scoring

* fixing the number of deletions; adding a script to check that the DP result for the total errors is equivalent to the sum of the individual errors

* updated RESULTS for new diarization scoring

* outputting WER similarly to the compute-wer routine

* adding a routine to select the best LM weight and insertion penalty factor based on the development set

* updating results

* changing lang_chain to lang, minor fix

* adding all array option

* change in directory structure of scoring_kaldi_multispeaker to make it similar to scoring_kaldi

* removing test sets from run ivector script

* added ref RTTM creation

* making modifications for all array

* minor fix

* [src] CUDA decoding: add support for affine transforms to CUDA feature extractor (kaldi-asr#3764)

* [src] relax assertion constraint slightly (RE matrix orthonormalization) (kaldi-asr#3767)

* [src] CUDA decoder: fix bug in NumPendingTasks()  (kaldi-asr#3769)

* [src] Add options to select specific gpu, reuse cuda context (kaldi-asr#3750)

* [src] Move CheckAndFix to config struct (kaldi-asr#3749)

* [egs,scripts] Add recipes for CN-Celeb (kaldi-asr#3758)

* [src] CUDA decoder: remove unnecessary sync that was added for debugging (kaldi-asr#3770)

* [src] CUDA decoder: shrink channel vectors instead of vector holding those vectors (kaldi-asr#3768)

* add include path

* update test
>>>
>>> import numpy as np
>>> a = k.FloatVector(10)
>>> b = np.array(a, copy_data = False)
copy_data is an invalid argument.

should be copy
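Per the review comments, `copy` (not `copy_data`) is the keyword. The sketch below shows the intended zero-copy behavior with a plain NumPy array, since `k.FloatVector` requires the Kaldi pybind build:

```python
import numpy as np

a = np.arange(5, dtype=np.float32)
b = np.array(a, copy=False)   # 'copy' is the keyword; no data is duplicated here
b[0] = 42.0
assert a[0] == 42.0           # a and b share the same underlying buffer
```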

naxingyu and others added 5 commits December 18, 2019 15:35
* add pybind for fst::StdVectorFst

* avoid recompilation.

* Wrap SequentialNnetChainExampleReader.

* Fix the bug for the Value() method of reader/writer.

We are using the `__iter__` method in Python, which calls Next()
immediately after Value() is returned. This problem does not
exist in C++. We currently use copy semantics to solve this problem.
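A toy model of the problem and the copy-semantics fix (all names here are hypothetical stand-ins; the real readers are the pybind-wrapped Kaldi classes):

```python
class MockReader:
    """Toy sequential table reader: value() returns a buffer that the next
    call to value() overwrites, mimicking the C++ reader's Value()/Next()."""

    def __init__(self, items):
        self._items, self._pos, self._buf = items, 0, []

    def done(self):
        return self._pos >= len(self._items)

    def value(self):
        self._buf[:] = self._items[self._pos]   # overwrite the shared buffer
        return self._buf

    def next(self):
        self._pos += 1


def iterate(reader, copy):
    while not reader.done():
        v = reader.value()
        if copy:
            v = list(v)      # copy semantics: snapshot before Next() runs
        reader.next()        # __iter__ advances immediately after Value()
        yield v


no_copy = list(iterate(MockReader([[1], [2], [3]]), copy=False))
with_copy = list(iterate(MockReader([[1], [2], [3]]), copy=True))
assert no_copy == [[3], [3], [3]]     # every yielded value aliases one buffer
assert with_copy == [[1], [2], [3]]   # copies survive the overwrite
```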
…tor and support zero-copying PyTorch GPU tensors (#92)

* refactor matrix/vector reader/writers and add `numpy()` to
matrix/vector.

Now there is no extra copy for SequentialReader and it is
easier to convert a Matrix/Vector to a numpy array.

* add support for dlpack interface.

Now we can convert a float tensor from PyTorch to
kaldi::SubVector<float> and kaldi::SubMatrix<float>
with **zero** memory copy.
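The zero-copy exchange goes through the DLPack protocol, which can be demonstrated with NumPy alone (requires NumPy >= 1.22; the PyTorch-to-Kaldi path works the same way):

```python
import numpy as np

# NumPy implements DLPack on both ends; the same protocol is what lets a
# PyTorch tensor back a Kaldi SubVector/SubMatrix without copying.
a = np.arange(4, dtype=np.float32)
b = np.from_dlpack(a)             # zero-copy: b is a view over a's memory
a[0] = 99.0
assert b[0] == 99.0               # mutations are visible through both names
assert np.shares_memory(a, b)     # no bytes were duplicated
```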

* Zero copy PyTorch GPU tensor to kaldi CuSubVector/CuSubMatrix.

Through the DLPack interface, it is possible now to zero copy
a PyTorch GPU tensor to construct a kaldi CuSubVector/CuSubMatrix.

Note that this introduces no **source**-level dependencies on any DL frameworks.
@stale

stale bot commented Jun 19, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale Stale bot on the loose label Jun 19, 2020
@kkm000 kkm000 added the stale-exclude Stale bot ignore this issue label Jul 15, 2020
@stale stale bot removed the stale Stale bot on the loose label Jul 15, 2020
@kkm000

kkm000 commented Oct 2, 2021

@danpovey, is pybind11 still a thing?

@csukuangfj

@danpovey, is pybind11 still a thing?

Not sure. With the experience gained from k2, I think we can do a better job of wrapping Kaldi for Python (e.g., integration with PyTorch).

@revdeluxe

Not sure. With the experience gained from k2, I think we can do a better job of wrapping Kaldi for Python (e.g., integration with PyTorch).

Rust has Cargo, and it's quick too. I ran into some runtime issues with CMake last night (June 7, 2022).
I think wrapping it using Rust's Cargo would be faster in runtime and build time. (But I admit it's a learning curve; at least errors are handled in-code.)

A suggestion, nothing more, nothing less.

@csukuangfj

Not sure. With the experience gained from k2, I think we can do a better job of wrapping Kaldi for Python (e.g., integration with PyTorch).

Rust has Cargo, and it's quick too. I ran into some runtime issues with CMake last night (June 7, 2022). I think wrapping it using Rust's Cargo would be faster in runtime and build time. (But I admit it's a learning curve; at least errors are handled in-code.)

A suggestion, nothing more, nothing less.

Could you have a look at

We are putting more effort into the above two projects.

@revdeluxe

Not sure. With the experience gained from k2, I think we can do a better job of wrapping Kaldi for Python (e.g., integration with PyTorch).

Rust has Cargo, and it's quick too. I ran into some runtime issues with CMake last night (June 7, 2022). I think wrapping it using Rust's Cargo would be faster in runtime and build time. (But I admit it's a learning curve; at least errors are handled in-code.)
A suggestion, nothing more, nothing less.

Could you have a look at

We are putting more effort into the above two projects.

I'll see what I can share...
