[Bug] Num. instances discarded samples: 252. VITS finetuning #3347

nicholasguimaraes · 2023-12-01T01:37:52Z

Describe the bug

Hello, I'm finetuning the VITS model on my dataset of 979 audio clips.

I have followed this tutorial (https://tts.readthedocs.io/en/latest/formatting_your_dataset.html) on how to format my dataset and it looks just fine.

I am able to start training but right before the first epoch I see the log:

DataLoader initialization
| > Tokenizer:
| > add_blank: True
| > use_eos_bos: False
| > use_phonemes: False
| > Number of instances : 979
| > Preprocessing samples
| > Max text length: 94
| > Min text length: 15
| > Avg text length: 50.786795048143055
|
| > Max audio length: 223959.0
| > Min audio length: 46143.0
| > Avg audio length: 144398.36176066025
| > Num. instances discarded samples: 252
| > Batch group size: 0.

My first thought was to check the min/max permitted length of audio samples in my config.json file.

My desired minimum length is 16,000 samples (about 1 second), but editing it in the config.json file does not seem to have any effect: the log still prints Min audio length: 46143.0.

What is even funnier is that 46143.0 is about 2 seconds and I only have 77 audio samples shorter than 2 seconds, so I'm not really sure what causes the 252 samples to be discarded.

Any insights are very welcome!
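For anyone who wants to cross-check counts like the ones above, here is a minimal sketch that reads the true frame count of every WAV file in a folder and lists the clips outside a given length range. It uses only the Python standard library; the function names and the thresholds passed in are illustrative assumptions, not part of coqui-ai TTS, and the script does not reproduce the trainer's own filtering.

```python
import wave
from pathlib import Path


def wav_frame_counts(wav_dir):
    """Return ({file name: frame count}, [unreadable file names]) for *.wav in wav_dir.

    The stdlib wave module only parses PCM WAV files; anything it cannot
    read raises wave.Error and is reported separately instead of being counted.
    """
    counts, unreadable = {}, []
    for path in sorted(Path(wav_dir).glob("*.wav")):
        try:
            with wave.open(str(path), "rb") as wf:
                counts[path.name] = wf.getnframes()
        except wave.Error:
            unreadable.append(path.name)
    return counts, unreadable


def split_by_length(counts, min_len, max_len):
    """List the clips shorter than min_len or longer than max_len (in frames)."""
    too_short = [name for name, n in counts.items() if n < min_len]
    too_long = [name for name, n in counts.items() if n > max_len]
    return too_short, too_long
```

Calling wav_frame_counts on the dataset's wav folder and then split_by_length with the limits from config.json gives numbers that can be compared against the "Num. instances discarded samples" line in the log. Unreadable files are worth a look too, since they hint at a non-PCM encoding.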

To Reproduce

To reproduce the error, install coqui-ai TTS and set up a custom dataset for finetuning (LJSpeech format).

After that run: python train_tts.py --config_path tts_models--pt--cv--vits/config.json --restore_path tts_models--pt--cv--vits/model_file.pth.tar

Expected behavior

No response

Logs

No response

Environment

TTS 0.20.6
torch gpu 2.1.0
cuda 12.0
nvidia driver 527.41

Additional context

No response

Edresson · 2023-12-06T19:17:48Z

Hi @nicholasguimaraes,

What is the audio format that you are using for training?

I think the issue is that the current audio length computation only supports 16-bit WAV files. If the audio is in a different format, the length is computed incorrectly. I fixed this issue in PR #3092.

@erogol I think we should merge #3092 as soon as possible to avoid issues like this one.
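To illustrate the failure mode described in this comment, here is a small sketch that compares a length estimate based on file size (assuming two bytes per sample, i.e. 16-bit mono PCM) with the frame count stored in the WAV header. This is not the coqui-ai TTS code; the helper names are hypothetical and the estimate formula is an assumption about what "only supports 16-bit" means in practice.

```python
import os
import tempfile
import wave


def write_silence(path, n_frames, sample_width, sample_rate=22050):
    """Write n_frames of mono PCM silence; sample_width is in bytes."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(sample_width)
        wf.setframerate(sample_rate)
        wf.writeframes(b"\x00" * (n_frames * sample_width))


def estimate_from_file_size(path):
    """Length estimate that assumes every sample occupies two bytes."""
    return os.path.getsize(path) / 2


def true_frame_count(path):
    """Frame count as recorded in the WAV header."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes()


def compare(n_frames=22050):
    """Return {label: (size-based estimate, true frame count)} for two bit depths."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for label, width in (("16-bit", 2), ("32-bit", 4)):
            path = os.path.join(tmp, label + ".wav")
            write_silence(path, n_frames, width)
            results[label] = (estimate_from_file_size(path), true_frame_count(path))
    return results


if __name__ == "__main__":
    for label, (estimate, frames) in compare().items():
        print(f"{label}: estimated {estimate:.0f}, actual {frames}")
```

For the 16-bit file the estimate is off only by the small header overhead, while for the 32-bit file it is roughly double the real length. Under that assumption, clips that are actually within the configured limits could appear too long and be discarded, which would be consistent with the counts reported above.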

nicholasguimaraes added the bug label Dec 1, 2023

Edresson self-assigned this Dec 6, 2023

nicholasguimaraes closed this as completed Dec 6, 2023
