
Releases: coqui-ai/TTS

v0.4.0

26 Oct 16:34

🐸 v0.4.0

  • Update multi-speaker training API.

  • VCTK recipes for all the TTS models.

  • Documentation for multi-speaker training.

  • Pre-trained Ukrainian GlowTTS model from 👑 https://github.com/robinhad/ukrainian-tts

  • Pre-trained FastPitch VCTK model

  • Dataset downloaders for LJSpeech and VCTK under TTS.utils.downloaders

  • Documentation reformatting.

  • Trainer V2 and compatibility updates in model implementations.

    This update makes the Trainer V2 responsible only for training a model. Everything else is excluded from the trainer and must be done either in the model or before calling the trainer.
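As an illustration of this split, here is a toy sketch (pure Python, not the real 🐸TTS API; all names are made up) where the trainer owns only the loop and the model supplies its own optimizer and loss:

```python
class ToyModel:
    def __init__(self):
        self.weight = 0.0

    def get_optimizer(self):
        # The model, not the trainer, decides how it is optimized.
        return lambda grad, lr=0.1: setattr(self, "weight", self.weight - lr * grad)

    def train_step(self, batch):
        # Squared-error loss against a target of 1.0 and its gradient.
        pred = self.weight * batch
        loss = (pred - 1.0) ** 2
        grad = 2 * (pred - 1.0) * batch
        return loss, grad


class ToyTrainer:
    """Responsible only for the training loop; data preparation and
    model setup happen outside, before the trainer is called."""

    def __init__(self, model, data):
        self.model = model
        self.data = data

    def fit(self, epochs=50):
        step = self.model.get_optimizer()
        for _ in range(epochs):
            for batch in self.data:
                loss, grad = self.model.train_step(batch)
                step(grad)
        return self.model.weight


weight = ToyTrainer(ToyModel(), data=[1.0]).fit()
print(round(weight, 2))  # converges toward 1.0
```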

Try out new models

  • Pre-trained FastPitch VCTK model

    tts --model_name tts_models/en/vctk/fast_pitch --text "This is my sample text to voice." --speaker_idx VCTK_p229

  • Pre-trained Ukrainian GlowTTS model

    tts --model_name tts_models/uk/mai/glow-tts --text "Це зразок тексту, щоб спробувати нашу модель."

v0.3.1

17 Sep 23:55

Mostly fixes for GlowTTS training.

v0.3.0

13 Sep 12:13
0592a58

🐸 v0.3.0

New ForwardTTS implementation.

This version implements a new ForwardTTS interface that can be configured as any feed-forward TTS model that uses a duration predictor at inference time. Currently, we provide 3 pre-configured models and plan to implement one more.

  1. SpeedySpeech
  2. FastSpeech
  3. FastPitch
  4. FastSpeech 2 (TODO)

Through this API, any model can be trained in two ways: either with pre-computed durations from a pre-trained Tacotron model, or with an alignment network that learns durations from the dataset. The alignment network is used only during training and discarded at inference. You can choose the mode by setting the use_aligner field in the configuration.
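A toy sketch of the two modes (illustrative only; apart from the use_aligner field name from the configuration, every name here is made up):

```python
def learn_durations_with_aligner(batch):
    # Stand-in for the aligner network: evenly split the audio frames
    # across the input characters (the real network learns this alignment).
    n_chars = len(batch["text"])
    base, extra = divmod(batch["n_frames"], n_chars)
    return [base + (1 if i < extra else 0) for i in range(n_chars)]


def get_durations(use_aligner, batch):
    if use_aligner:
        # Durations are learned from (text, audio) pairs during training;
        # the aligner itself is discarded at inference.
        return learn_durations_with_aligner(batch)
    # Otherwise durations must be pre-computed, e.g. extracted from the
    # attention of a pre-trained Tacotron model.
    return batch["precomputed_durations"]


batch = {"text": "hello", "n_frames": 12, "precomputed_durations": [2, 3, 2, 3, 2]}
print(get_durations(True, batch))   # per-character durations summing to 12 frames
print(get_durations(False, batch))  # the pre-computed durations
```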

This new API will help us design a more efficient inference runtime for all these models, using runtime optimizers like ONNX.

The old FastPitch and SpeedySpeech implementations are deprecated in favor of this new implementation.

Fine-Tuning Documentation

This version introduces documentation for model fine-tuning. You can see it under https://tts.readthedocs.io/ once it is merged.

New Model Releases

  • English Speedy Speech model on LJSpeech

Try out:

tts --text "This is a sample text for my model to speak." --model_name tts_models/en/ljspeech/speedy-speech
  • Fine-tuned UnivNet Vocoder

Try out:

tts --text "This is how it is." --model_name tts_models/en/ljspeech/tacotron2-DDC_ph 

v0.2.2

06 Sep 17:25
dc2ace3

🐸 v0.2.2

The FastPitch model with an aligner network is implemented, along with accompanying changes.

Thanks to 👑 @kaiidams for his Japanese g2p update.

Try FastPitch model:

tts --model_name tts_models/en/ljspeech/fast_pitch --text "This is my sample text to voice."

v0.2.1

31 Aug 10:39
5793dca

🐸 v0.2.1

🐞Bug Fixes

  • Fix distributed training and solve compatibility issues with the Trainer API.
  • Fix bugs in the VITS model implementation that caused training instabilities.
  • Fix some Abstract Class usage issues in WaveRNN and WaveGrad models.

💾 Code updates

  • Use a single gradient scaler for all the optimizers in TrainerAPI. Previously, we used one scaler per optimizer.
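The idea can be sketched like this (a toy illustration, not the actual torch.cuda.amp or Trainer code; all class names are made up). A single scaler means every optimizer unscales gradients by the same factor:

```python
class GradScaler:
    """Toy loss scaler: scales the loss up for the backward pass and
    scales gradients back down before the optimizer step."""

    def __init__(self, scale=1024.0):
        self.scale = scale

    def scale_loss(self, loss):
        return loss * self.scale

    def unscale(self, grads):
        return [g / self.scale for g in grads]


class Optimizer:
    def __init__(self, lr=0.1):
        self.lr = lr

    def step(self, params, grads):
        return [p - self.lr * g for p, g in zip(params, grads)]


scaler = GradScaler()                    # one scaler...
optimizers = [Optimizer(), Optimizer()]  # ...shared by all optimizers

params = [1.0, 2.0]
scaled_loss = scaler.scale_loss(0.5)               # backward runs on the scaled loss
grads = scaler.unscale([scaled_loss, scaled_loss])  # same factor for every group
for opt in optimizers:
    params = opt.step(params, grads)
print(params)
```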

🏃‍♀️Operational Updates

  • Update to Pylint 2.10.2

Thanks to 👑 @fijipants for his fixes 🛠️
Thanks to 👑 @agrinh for his flag and discussion in DDP issues

v0.2.0

11 Aug 08:43

🐸 v0.2.0

🐞Bug Fixes

  • Fix phoneme pre-compute issue.
  • Fix multi-speaker setup in Tacotron models.
  • Fix small issues in the Trainer regarding multi-optimizer training.

💾 Code updates

  • W&B integration for model logging and experiment tracking. (👑 @AyushExel)
    The code uses Tensorboard by default. To use W&B, set the log_dashboard option in the config and define project_name and wandb_entity.
  • Use fsspec for model saving/loading (👑 @agrinh)
  • Allow models to define their own symbol list with an in-class make_symbols()
  • Allow choosing after epoch or after step LR scheduler update with scheduler_after_epoch.
  • Make converting the spectrogram from amplitude to dB optional with the do_amp_to_db_linear and do_amp_to_db_mel options.
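For reference, the amplitude-to-dB conversion being toggled is the standard log-magnitude formula; a minimal sketch (the exact gain and clipping values used by 🐸TTS may differ):

```python
import math

def amp_to_db(amp, spec_gain=20.0, eps=1e-5):
    # Clip tiny amplitudes to eps to avoid log(0), then convert to decibels.
    return spec_gain * math.log10(max(eps, amp))

print(amp_to_db(1.0))  # 0.0
print(amp_to_db(0.1))  # -20.0
```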

🗒️ Docs updates

  • Add GlowTTS and VITS docs.

🚀 Model releases

  • vocoder_models--ja--kokoro--hifigan_v1 (👑 @kaiidams)

    HiFiGAN model trained on Kokoro dataset to complement the existing Japanese model.

    Try it out:

    tts --model_name tts_models/ja/kokoro/tacotron2-DDC --text "こんにちは、今日はいい天気ですか?"
  • tts_models--en--ljspeech--tacotronDDC_ph

    TacotronDDC trained with phonemes on LJSpeech. It fixes the pronunciation errors caused by raw-text input
    in the previously released TacotronDDC model.

    Try it out:

    tts --model_name tts_models/en/ljspeech/tacotronDDC_ph --text "hello, how are you today?"
  • tts_models--en--ljspeech--vits

    VITS model trained on LJSpeech.

    Try it out:

    tts --model_name tts_models/en/ljspeech/vits --text "hello, how are you today?"
  • tts_models--en--vctk--vits

    VITS model trained on VCTK with multi-speaker support.

    Try it out:

    tts-server --model_name tts_models/en/vctk/vits     
  • vocoder_models--en--ljspeech--univnet

    UnivNet model trained on LJSpeech to complement the TacotronDDC model above.

    Try it out:

    tts --model_name tts_models/en/ljspeech/tacotronDDC_ph --text "hello, how are you today?"

v0.1.3

26 Jul 16:37

🐸 v0.1.3

🐞Bug Fixes

  • Fix Tacotron stopnet training

    Models trained after v0.1 had a problem where the stopnet was not trained, which caused models to generate no audio
    at evaluation and inference time.

  • Fix test_run at training. (👑 @WeberJulian)

    During training, 🐸 TTS would skip the test_run and not generate test audio samples. Now it is fixed :).

  • Fix server.py for multi-speaker models.

💾 Code updates

  • Refactoring in compute_embeddings.py for efficiency and compatibility with the latest speaker encoder. (👑 @Edresson)

🚀 Model releases

  • New Fullband-MelGAN model for Thorsten German dataset. (👑 @thorstenMueller)

Try it:

tts --model_name tts_models/de/thorsten/tacotron2-DCA --text "Was geschehen ist geschehen, es ist geschichte." 

v0.1.2

06 Jul 13:26
Bump up to v0.1.2

v0.1.1

06 Jul 08:32
676d22f

v0.1.0

03 Jul 12:18

🐸 v0.1.0

In a nutshell, there are a ton of updates in this release. I don't know if we can cover them all here but let's try.

After this release, 🐸 TTS stands on the following architecture.

  • Trainer API for training.
  • Synthesizer API for inference.
  • ModelManager API for managing 🐸TTS model zoo.
  • SpeakerManager API for managing speakers in a multi-speaker setting.
  • (TBI) Exporter API for exporting models to ONNX, TorchScript, etc.
  • (TBI) Data Processing API for making a dataset ready for training.
  • Model API for implementing models, compatible with all the other components above.

Updates

💾 Code updates

  • Brand new Trainer API

    We unified all the training code in a lightweight but feature-complete Trainer API. From now on, all the 🐸TTS
    models will use this new API for training.

    It provides mixed precision (with Nvidia's APEX or torch.amp) and multi-GPU training for all the models.

  • Brand new Model API

    Abstract BaseModel and its BaseTTS, BaseVocoder child classes are used as the basis of the 🐸TTS models now.
    Any model that implements one of these classes, works seamlessly with the Trainer and Synthesizer.

  • Brand new 🐸TTS recipes.

    We decided to merge the recipes into the main project. Now we host recipes for the LJSpeech dataset, covering all the implemented models,
    so you can pick the model you want, change the parameters, and train your own model easily.

    Thanks to the new Trainer API and 👩‍✈️Coqpit integration, we could implement these recipes with pure python.

  • Updated SpeakerManager API

    TTS.utils.SpeakerManager is now the core unit for managing speakers in a multi-speaker model and for interfacing a SpeakerEncoder model with the tts and vocoder models.

  • Updated model training mechanics.

    You can now use pure Python to define your model and run the training. This is useful for training models in a Jupyter
    Notebook or other Python environments.

    We also keep the old mechanics via TTS/bin/train_tts.py or TTS/bin/train_vocoder.py. You just need to
    replace the previous training script name with one of these two, based on your model.

    python TTS/bin/train_tacotron.py --config_path config.json

    becomes

    python TTS/bin/train_tts.py --config_path config.json
  • Use 👩‍✈️Coqpit for managing model class arguments.

    Now all the model arguments are defined in a coqpit class and imported by the model config.

  • gruut based character to phoneme conversion. (👑 @synesthesiam)

    It is a drop-in replacement for the previous solution and is compatible with the released models, so all these
    models are functional again without version nitpicking.

  • Set test_sentences in the config rather than providing a txt file.

  • Set the maximum number of decoder steps of Tacotron1-2 models in the config.
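For illustration, the last two options might look like this in a config. This is a plain-dataclass sketch standing in for a 👩‍✈️Coqpit config; only the test_sentences and max_decoder_steps field names come from the notes, while the class name and default values are made up:

```python
from dataclasses import dataclass, field

@dataclass
class TacotronConfigSketch:
    # Sentences synthesized at each test run, set directly in the config
    # instead of being read from a separate txt file.
    test_sentences: list = field(default_factory=lambda: ["Hello, world."])
    # Hard cap on autoregressive decoder steps, so inference cannot run forever.
    max_decoder_steps: int = 500


config = TacotronConfigSketch(test_sentences=["This is a test."], max_decoder_steps=1000)
print(config.max_decoder_steps)  # 1000
```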


🚀 Model releases

We solved the compatibility issues and re-released some of the models. You can see them in the released binaries section.

You don't need to change anything; v0.1.0 uses these new models by default.