[WIP] Draft of new neural net adaptation framework (will also involve script rewrite) #2913

Open
wants to merge 110 commits into
base: master
Changes from 98 commits
Commits
95c1b6a
[src] Add draft of interface for svd backprop thing.
danpovey Nov 15, 2018
68f703d
[src] Add interface to handle symmetric matrices in SvdRescaler.
danpovey Nov 18, 2018
96c5f70
[src] Modify interface of SvdRescaler, allow to fix singular values o…
danpovey Nov 18, 2018
fe8afee
[src] Extend interface of SvdRescalar
danpovey Nov 18, 2018
6ed9b7e
[src] Extend interface of SvdRescaler
danpovey Nov 18, 2018
087925d
[src] Modify interface of SvdRescaler
danpovey Nov 18, 2018
ed0c319
modification1
GaofengCheng Nov 19, 2018
d8c49c5
Merge branch 'svd_draft' of github.com:/danpovey/kaldi into svd_draft…
GaofengCheng Nov 19, 2018
23a522b
Add tempt code fot SvdRescaler
GaofengCheng Nov 19, 2018
d15d32b
[src] small fixes to SvdRescaler interface
danpovey Nov 19, 2018
df74407
[src] Small interface fix
danpovey Nov 19, 2018
5d106e5
[src] Adding draft of core DifferentiableFmllr code.
danpovey Nov 20, 2018
836c0ab
Merge branch 'svd_draft' of github.com:/danpovey/kaldi into svd_draft…
GaofengCheng Nov 20, 2018
52a6a31
[src] small changes
danpovey Nov 20, 2018
e8ffbbf
[src] Add more testing code
danpovey Nov 20, 2018
de65be2
fix
GaofengCheng Nov 20, 2018
8096d44
Merge branch 'svd_draft' of github.com:/danpovey/kaldi into svd_draft…
GaofengCheng Nov 20, 2018
afff449
fix
GaofengCheng Nov 20, 2018
96771f6
No compiling problem remaining
GaofengCheng Nov 20, 2018
51e3711
fix
GaofengCheng Nov 21, 2018
dfd9e5d
Add test
GaofengCheng Nov 21, 2018
ac0eaec
Merge pull request #56 from GaofengCheng/svd_draft_gaofeng_local
danpovey Nov 21, 2018
84f8d62
Fixes to SVD code
danpovey Nov 22, 2018
0ce47b6
[src] Further test functions added.
danpovey Nov 22, 2018
fd0fa94
[src] More fixes
danpovey Nov 22, 2018
91d5743
Further fixes.
danpovey Nov 22, 2018
35442a6
[src] Commit nearly-working version of differentiable fmllr code (var…
danpovey Nov 28, 2018
40f8fdc
Merge remote-tracking branch 'upstream/master' into svd_draft
danpovey Nov 28, 2018
49fb313
[src] Commit working version before making some changes.
danpovey Nov 29, 2018
0d11246
[src] Rework math for differentiable fMLLR for greater efficiency.
danpovey Dec 1, 2018
da36067
[src] Add MeanOnlyTransformEstimator and tests for it; misc. fixes.
danpovey Dec 5, 2018
e265781
[Small fixes to DifferentiableTransform stuff.]
danpovey Dec 5, 2018
8263826
Merge remote-tracking branch 'upstream/master' into svd_draft
danpovey Dec 5, 2018
a974225
[src] more drafting interaces
danpovey Dec 5, 2018
e06e51e
[src] Move around sources, create adapt/ dir.
danpovey Dec 5, 2018
aa88ec0
[src] some fixes after moving things.
danpovey Dec 5, 2018
2d50044
[src] Various changes, not tested
danpovey Dec 10, 2018
12894e7
[src] Add testing code for transforms; fix bug in variance estimation…
danpovey Dec 12, 2018
3b1351f
[src] Add more testing code; more bug fixes.
danpovey Dec 13, 2018
f4a4f6f
[src]Add new constructor for SubPosterior
danpovey Dec 13, 2018
1210afb
[src] Add missing file; make overrides consistent.
danpovey Dec 15, 2018
88890a2
[src,egs] Documentation updates
danpovey Dec 15, 2018
a52a6fc
[src] Add some unfinished sources
danpovey Dec 15, 2018
4568b4c
[src] More progress on nnet3a training code.
danpovey Dec 20, 2018
88226e3
[src] More code progress.
danpovey Dec 21, 2018
9ba4e83
[src] Make sure differentiable-transform TrainingBackward() adds to i…
danpovey Dec 25, 2018
0254935
Implement chain numerator posteriors
hhadian Dec 29, 2018
80e0f73
Support merging already-merged egs
hhadian Dec 29, 2018
106e872
Merge pull request #59 from hhadian/chaina2
danpovey Dec 29, 2018
36323a8
[src] more bug-fixes, finish more code
danpovey Dec 25, 2018
c049d11
[src] Updates to chaina training code.
danpovey Dec 28, 2018
e9b1e8b
Small change to make sure numerator post is always computed if requested
hhadian Dec 29, 2018
66b9c96
Typo fix
hhadian Dec 29, 2018
68d3851
Merge pull request #58 from hhadian/chaina1
danpovey Dec 29, 2018
67ba162
Add --long-key option for nnet3-chain-get-egs
hhadian Dec 30, 2018
85ce74c
Merge pull request #60 from hhadian/chaina2
danpovey Dec 30, 2018
902a5ab
Compute numerator posteriors in forward-backward
hhadian Jan 1, 2019
b3343fb
Merge pull request #61 from hhadian/chaina2
danpovey Jan 1, 2019
d23dfe2
Add context extension capability to nnet3-chain-copy-egs with merged-…
hhadian Jan 1, 2019
e3db519
Support merged-egs when removing context
hhadian Jan 2, 2019
0f13be7
Merge pull request #62 from hhadian/chaina2
danpovey Jan 3, 2019
04e6dc9
Incorporate code for keeping track of the tree map.
danpovey Dec 30, 2018
97d6a75
[src,scripts,egs] Further progress
danpovey Jan 4, 2019
0b443b2
Add the --use-query-string option to nnet3-chain-merge-egs
hhadian Jan 4, 2019
59445e7
Merge pull request #64 from hhadian/chaina2
danpovey Jan 4, 2019
195168f
[scripts] More documentation for choose_egs_to_merge.py
danpovey Jan 4, 2019
b96e80c
[scripts] Further progress on chaina scripts
danpovey Jan 4, 2019
c952053
Merge remote-tracking branch 'upstream/master' into svd_draft
danpovey Jan 5, 2019
f6087c9
[src,scripts,egs] Further progress; some renaming.
danpovey Jan 6, 2019
08fdf02
[scripts] Further progress on validation scripts
danpovey Jan 6, 2019
b337595
Implement choose_egs_to_merge.py
hhadian Jan 6, 2019
9e684b7
Merge branch 'svd_draft' of https://github.com/danpovey/kaldi into ch…
hhadian Jan 6, 2019
c0eab12
Remove comments
hhadian Jan 6, 2019
8e894b1
Some cleanup
hhadian Jan 6, 2019
53f33d9
Merge pull request #66 from hhadian/chaina2
danpovey Jan 6, 2019
00ef41e
[scripts,egs] Small fixes/progress
danpovey Jan 6, 2019
0f3adb8
Some bugfixes
hhadian Jan 6, 2019
71a3702
Merge pull request #67 from hhadian/chaina2
danpovey Jan 6, 2019
eeacc97
[src] Add missing files
danpovey Jan 6, 2019
8cc8068
[src] Add missing files
danpovey Jan 6, 2019
97f295c
minor bugfixes
hhadian Jan 6, 2019
80438f8
Merge pull request #68 from hhadian/chaina2
danpovey Jan 6, 2019
8ba1f82
more small fixes
hhadian Jan 7, 2019
f057108
Merge pull request #69 from hhadian/chaina2
danpovey Jan 7, 2019
8becfd5
[egs,scripts] small fix; add docs.
danpovey Jan 7, 2019
795ec74
More fixes
hhadian Jan 7, 2019
22799ef
Minor fix
hhadian Jan 7, 2019
bae22f7
Merge pull request #70 from hhadian/chaina2
danpovey Jan 7, 2019
bce05d2
Merge remote-tracking branch 'upstream/master' into svd_draft
danpovey Jan 12, 2019
003982d
[src,scripts,egs] Further progress
danpovey Jan 13, 2019
d827df7
further fixes
danpovey Jan 14, 2019
7617a8a
Implement get_train_schedule.py
hhadian Jan 14, 2019
95f967b
Minor changes
hhadian Jan 14, 2019
42bc125
[src] Various bug fixes
danpovey Jan 15, 2019
20034ac
Merge pull request #75 from hhadian/chaina3
danpovey Jan 15, 2019
8cad240
[src,scripts,egs] Various fixes; it trains now.
danpovey Jan 16, 2019
52391c2
[scripts] Add missing file
danpovey Jan 18, 2019
78de4bb
[scripts] Fixes to bugs found by Gaofeng
danpovey Jan 18, 2019
5aabf44
[src,scripts] Finish some code to estimate target model at end of tra…
danpovey Jan 18, 2019
bb448ff
[scripts,egs] Further progress; finish SI decoding
danpovey Jan 19, 2019
d5c9622
[src,scripts,egs] Further progress; 1st actual decoding
danpovey Jan 19, 2019
315a5cb
[src,egs] Fix more bugs
danpovey Jan 19, 2019
9a3e894
[scripts,src,egs] Fix various bugs, still tuning.
danpovey Jan 20, 2019
4481699
[src,scripts,egs] Refactor the command line options; add more tuning …
danpovey Jan 21, 2019
d130837
[egs,scripts,src] Add model combination; and LM rescoring in the egs
danpovey Jan 23, 2019
d204bca
Set random seed in choose_egs_to_merge.py
hhadian Jan 26, 2019
4521408
Merge pull request #76 from hhadian/chaina3
danpovey Jan 26, 2019
093290b
[src] Bug fix; disable freezing of NG (which seems to hurt)
danpovey Jan 29, 2019
50eb3c4
[scripts] chaina training script fix
danpovey Jan 29, 2019
eedc9d1
Merge remote-tracking branch 'origin/svd_draft' into svd_draft
danpovey Jan 29, 2019
2 changes: 2 additions & 0 deletions egs/mini_librispeech/s5/cmd.sh
@@ -10,6 +10,8 @@
# conf/queue.conf in http://kaldi-asr.org/doc/queue.html for more information,
# or search for the string 'default_config' in utils/queue.pl or utils/slurm.pl.

# in future I'd like to start using just one $cmd variable.
export cmd="queue.pl --mem 2G"
export train_cmd="queue.pl --mem 2G"
export decode_cmd="queue.pl --mem 4G"
export mkgraph_cmd="queue.pl --mem 8G"
14 changes: 14 additions & 0 deletions egs/mini_librispeech/s5/conf/mfcc_hires2.conf
@@ -0,0 +1,14 @@
# config for high-resolution MFCC features, intended for 'chaina' neural network
# training. These '..2.conf' setups are intended to have the --modified=true
# configuration value.

# Note: we keep all cepstra, so it has the same info as filterbank features,
# but MFCC is more easily compressible (because less correlated) which is why
# we prefer this method.
--use-energy=false # use average of log energy, not energy.
# Will soon add: --modified=true
--num-mel-bins=40 # similar to Google's setup.
--num-ceps=40 # there is no dimensionality reduction.
--low-freq=20 # low cutoff frequency for mel bins... this is high-bandwidth data, so
# there might be some information at the low end.
--high-freq=-400 # high cutoff frequency, relative to Nyquist of 8000 (=7600)
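
A note on the `--high-freq=-400` line: Kaldi reads a non-positive `--high-freq` as an offset from the Nyquist frequency, which is where the 7600 in the comment comes from. A minimal sketch of that arithmetic follows; the 16 kHz sampling rate is an assumption inferred from the Nyquist value of 8000 quoted in the comment, and the variable names are illustrative only.

```shell
# Assumed 16 kHz sampling rate, giving the Nyquist frequency of 8000 Hz
# quoted in the config comment.
sample_rate=16000
high_freq=-400   # value of --high-freq from the config

nyquist=$((sample_rate / 2))
if [ "$high_freq" -le 0 ]; then
  # Non-positive values are treated as an offset from Nyquist.
  effective_high_freq=$((nyquist + high_freq))
else
  effective_high_freq=$high_freq
fi
echo "$effective_high_freq"
```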
76 changes: 76 additions & 0 deletions egs/mini_librispeech/s5/local/chaina/data_prep_common.sh
@@ -0,0 +1,76 @@
#!/bin/bash

set -euo pipefail

# This script is called from local/chaina/run_tdnn.sh and
# similar scripts. It contains the common feature preparation and
# lattice-alignment preparation parts of the chaina training.
# See those scripts for examples of usage.

stage=0
train_set=train_clean_5
test_sets="dev_clean_2"
gmm=tri3b

. ./cmd.sh
. ./path.sh
. utils/parse_options.sh

gmm_dir=exp/${gmm}
ali_dir=exp/${gmm}_ali_${train_set}_sp

for f in data/${train_set}/feats.scp ${gmm_dir}/final.mdl; do
  if [ ! -f $f ]; then
    echo "$0: expected file $f to exist"
    exit 1
  fi
done

# Our default data augmentation method is 3-way speed augmentation followed by
# volume perturbation. We are looking into better ways of doing this,
# e.g. involving noise and reverberation.

if [ $stage -le 1 ]; then
  # Although the nnet will be trained on high-resolution data, we still have to
  # perturb the normal data to get the alignment. _sp stands for speed-perturbed.
  echo "$0: preparing directory for low-resolution speed-perturbed data (for alignment)"
  utils/data/perturb_data_dir_speed_3way.sh data/${train_set} data/${train_set}_sp
  echo "$0: making MFCC features for low-resolution speed-perturbed data"
  steps/make_mfcc.sh --cmd "$train_cmd" --nj 10 data/${train_set}_sp || exit 1;
  steps/compute_cmvn_stats.sh data/${train_set}_sp || exit 1;
  utils/fix_data_dir.sh data/${train_set}_sp
fi

if [ $stage -le 2 ]; then
  echo "$0: aligning with the perturbed low-resolution data"
  steps/align_fmllr.sh --nj 20 --cmd "$train_cmd" \
    data/${train_set}_sp data/lang $gmm_dir $ali_dir || exit 1
fi

if [ $stage -le 3 ]; then
  # Create high-resolution MFCC features (with 40 cepstra instead of 13).
  # this shows how you can split across multiple file-systems.
  echo "$0: creating high-resolution MFCC features"
  mfccdir=data/${train_set}_sp_hires/data
  if [[ $(hostname -f) == *.clsp.jhu.edu ]] && [ ! -d $mfccdir/storage ]; then
    utils/create_split_dir.pl /export/fs0{1,2}/$USER/kaldi-data/mfcc/mini_librispeech-$(date +'%m_%d_%H_%M')/s5/$mfccdir/storage $mfccdir/storage
  fi

  for datadir in ${train_set}_sp ${test_sets}; do
    utils/copy_data_dir.sh data/$datadir data/${datadir}_hires
  done

  # do volume-perturbation on the training data prior to extracting hires
  # features; this helps make trained nnets more invariant to test data volume.
  utils/data/perturb_data_dir_volume.sh data/${train_set}_sp_hires || exit 1;

  for datadir in ${train_set}_sp ${test_sets}; do
    steps/make_mfcc.sh --nj 10 --mfcc-config conf/mfcc_hires.conf \
      --cmd "$train_cmd" data/${datadir}_hires || exit 1;
    steps/compute_cmvn_stats.sh data/${datadir}_hires || exit 1;
    utils/fix_data_dir.sh data/${datadir}_hires || exit 1;
  done
fi


exit 0
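
The script gates each step with `if [ $stage -le N ]`, so a run can be resumed from a later step by raising `stage`. The sketch below only illustrates that test; `stages_to_run` is a hypothetical helper written for this note and does not appear in the PR.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper for illustration: lists the stages that would run for a
# given starting stage, using the same `[ $stage -le N ]` test as the script.
stages_to_run() {
  local stage=$1
  local ran=""
  local n
  for n in 1 2 3; do
    if [ "$stage" -le "$n" ]; then
      ran="$ran$n "
    fi
  done
  # Strip the trailing space before printing.
  echo "${ran% }"
}

stages_to_run 0   # default: the speed-perturbation, alignment and hires stages all run
stages_to_run 3   # only the high-resolution feature stage runs
```

Because the script sources `utils/parse_options.sh` after setting its defaults, passing `--stage 3` on the command line should have the same effect as setting `stage=3` at the top of the file.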