
Releases: coqui-ai/TTS

v0.4.0

26 Oct 16:34

🐸 v0.4.0

  • Update multi-speaker training API.

  • VCTK recipes for all the TTS models.

  • Documentation for multi-speaker training.

  • Pre-trained Ukrainian GlowTTS model from 👑 https://github.com/robinhad/ukrainian-tts

  • Pre-trained FastPitch VCTK model

  • Dataset downloaders for LJSpeech and VCTK under TTS.utils.downloaders

  • Documentation reformatting.

  • Trainer V2 and compatibility updates in model implementations.

    This update makes the Trainer V2 responsible only for training a model. Everything else is excluded from the trainer and must be done either in the model or before calling the trainer.
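As an illustration of this split, here is a toy sketch (pure Python, not the real 🐸TTS API; all names are made up) where the trainer owns only the loop and the model supplies its own optimizer and loss:

```python
class ToyModel:
    def __init__(self):
        self.weight = 0.0

    def get_optimizer(self):
        # The model, not the trainer, decides how it is optimized.
        return lambda grad, lr=0.1: setattr(self, "weight", self.weight - lr * grad)

    def train_step(self, batch):
        # Squared-error loss against a target of 1.0 and its gradient.
        pred = self.weight * batch
        loss = (pred - 1.0) ** 2
        grad = 2 * (pred - 1.0) * batch
        return loss, grad


class ToyTrainer:
    """Responsible only for the training loop; data preparation and
    model setup happen outside, before the trainer is called."""

    def __init__(self, model, data):
        self.model = model
        self.data = data

    def fit(self, epochs=50):
        step = self.model.get_optimizer()
        for _ in range(epochs):
            for batch in self.data:
                loss, grad = self.model.train_step(batch)
                step(grad)
        return self.model.weight


weight = ToyTrainer(ToyModel(), data=[1.0]).fit()
print(round(weight, 2))  # converges toward 1.0
```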

Try out new models

  • Pre-trained FastPitch VCTK model

    tts --model_name tts_models/en/vctk/fast_pitch --text "This is my sample text to voice." --speaker_idx VCTK_p229

  • Pre-trained Ukrainian GlowTTS model

    tts --model_name tts_models/uk/mai/glow-tts --text "Це зразок тексту, щоб спробувати нашу модель."

v0.3.1

17 Sep 23:55

Mostly fixes for GlowTTS training.

v0.3.0

13 Sep 12:13
0592a58

🐸 v0.3.0

New ForwardTTS implementation.

This version implements a new ForwardTTS interface that can be configured as any feed-forward TTS model that uses a duration predictor at inference time. Currently, we provide 3 pre-configured models and plan to implement one more.

  1. SpeedySpeech
  2. FastSpeech
  3. FastPitch
  4. FastSpeech 2 (TODO)

Through this API, any model can be trained in two ways: either with pre-computed durations from a pre-trained Tacotron model, or with an alignment network that learns durations from the dataset. The alignment network is used only during training and discarded at inference. You can choose the mode by setting the use_aligner field in the configuration.
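A toy sketch of the two modes (illustrative only; apart from the use_aligner field name from the configuration, every name here is made up):

```python
def learn_durations_with_aligner(batch):
    # Stand-in for the aligner network: evenly split the audio frames
    # across the input characters (the real network learns this alignment).
    n_chars = len(batch["text"])
    base, extra = divmod(batch["n_frames"], n_chars)
    return [base + (1 if i < extra else 0) for i in range(n_chars)]


def get_durations(use_aligner, batch):
    if use_aligner:
        # Durations are learned from (text, audio) pairs during training;
        # the aligner itself is discarded at inference.
        return learn_durations_with_aligner(batch)
    # Otherwise durations must be pre-computed, e.g. extracted from the
    # attention of a pre-trained Tacotron model.
    return batch["precomputed_durations"]


batch = {"text": "hello", "n_frames": 12, "precomputed_durations": [2, 3, 2, 3, 2]}
print(get_durations(True, batch))   # per-character durations summing to 12 frames
print(get_durations(False, batch))  # the pre-computed durations
```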

This new API will help us design a more efficient inference runtime for all these models, using runtime optimizers like ONNX.

The old FastPitch and SpeedySpeech implementations are deprecated in favor of this new implementation.

Fine-Tuning Documentation

This version introduces documentation for model fine-tuning. You can see it under https://tts.readthedocs.io/ once it is merged.

New Model Releases

  • English Speedy Speech model on LJSpeech

Try out:

tts --text "This is a sample text for my model to speak." --model_name tts_models/en/ljspeech/speedy-speech
  • Fine-tuned UnivNet Vocoder

Try out:

tts --text "This is how it is." --model_name tts_models/en/ljspeech/tacotron2-DDC_ph 

v0.2.2

06 Sep 17:25
dc2ace3

🐸 v0.2.2

The FastPitch model with an aligner network is implemented, along with accompanying changes.

Thanks to 👑 @kaiidams for his Japanese g2p update.

Try FastPitch model:

tts --model_name tts_models/en/ljspeech/fast_pitch --text "This is my sample text to voice."

v0.2.1

31 Aug 10:39
5793dca

🐸 v0.2.1

🐞Bug Fixes

  • Fix distributed training and solve compatibility issues with the Trainer API.
  • Fix bugs in the VITS model implementation that caused training instabilities.
  • Fix some Abstract Class usage issues in WaveRNN and WaveGrad models.

💾 Code updates

  • Use a single gradient scaler for all the optimizers in TrainerAPI. Previously, we used one scaler per optimizer.
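The idea can be sketched like this (a toy illustration, not the actual torch.cuda.amp or Trainer code; all class names are made up). A single scaler means every optimizer unscales gradients by the same factor:

```python
class GradScaler:
    """Toy loss scaler: scales the loss up for the backward pass and
    scales gradients back down before the optimizer step."""

    def __init__(self, scale=1024.0):
        self.scale = scale

    def scale_loss(self, loss):
        return loss * self.scale

    def unscale(self, grads):
        return [g / self.scale for g in grads]


class Optimizer:
    def __init__(self, lr=0.1):
        self.lr = lr

    def step(self, params, grads):
        return [p - self.lr * g for p, g in zip(params, grads)]


scaler = GradScaler()                    # one scaler...
optimizers = [Optimizer(), Optimizer()]  # ...shared by all optimizers

params = [1.0, 2.0]
scaled_loss = scaler.scale_loss(0.5)               # backward runs on the scaled loss
grads = scaler.unscale([scaled_loss, scaled_loss])  # same factor for every group
for opt in optimizers:
    params = opt.step(params, grads)
print(params)
```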

🏃‍♀️Operational Updates

  • Update to Pylint 2.10.2

Thanks to 👑 @fijipants for his fixes 🛠️
Thanks to 👑 @agrinh for his flag and discussion in DDP issues

v0.2.0

11 Aug 08:43

🐸 v0.2.0

🐞Bug Fixes

  • Fix phoneme pre-compute issue.
  • Fix multi-speaker setup in Tacotron models.
  • Fix small issues in the Trainer regarding multi-optimizer training.

💾 Code updates

  • W&B integration for model logging and experiment tracking. (👑 @AyushExel)
    The code uses Tensorboard by default. To use W&B, set the log_dashboard option in the config and define project_name and wandb_entity.
  • Use fsspec for model saving/loading (👑 @agrinh)
  • Allow models to define their own symbol list with an in-class make_symbols()
  • Allow choosing after epoch or after step LR scheduler update with scheduler_after_epoch.
  • Make converting the spectrogram from amplitude to dB optional with the do_amp_to_db_linear and do_amp_to_db_mel options.
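For reference, the amplitude-to-dB conversion being toggled is the standard log-magnitude formula; a minimal sketch (the exact gain and clipping values used by 🐸TTS may differ):

```python
import math

def amp_to_db(amp, spec_gain=20.0, eps=1e-5):
    # Clip tiny amplitudes to eps to avoid log(0), then convert to decibels.
    return spec_gain * math.log10(max(eps, amp))

print(amp_to_db(1.0))  # 0.0
print(amp_to_db(0.1))  # -20.0
```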

🗒️ Docs updates

  • Add GlowTTS and VITS docs.

🚀 Model releases

  • vocoder_models--ja--kokoro--hifigan_v1 (👑 @kaiidams)

    HiFiGAN model trained on Kokoro dataset to complement the existing Japanese model.

    Try it out:

    tts --model_name tts_models/ja/kokoro/tacotron2-DDC --text "こんにちは、今日はいい天気ですか?"
  • tts_models--en--ljspeech--tacotronDDC_ph

    TacotronDDC trained with phonemes on LJSpeech. It fixes the pronunciation errors caused by raw-text input
    in the previously released TacotronDDC model.

    Try it out:

    tts --model_name tts_models/en/ljspeech/tacotronDDC_ph --text "hello, how are you today?"
  • tts_models--en--ljspeech--vits

    VITS model trained on LJSpeech.

    Try it out:

    tts --model_name tts_models/en/ljspeech/vits --text "hello, how are you today?"
  • tts_models--en--vctk--vits

    VITS model trained on VCTK with multi-speaker support.

    Try it out:

    tts-server --model_name tts_models/en/vctk/vits     
  • vocoder_models--en--ljspeech--univnet

    UnivNet model trained on LJSpeech to complement the TacotronDDC model above.

    Try it out:

    tts --model_name tts_models/en/ljspeech/tacotronDDC_ph --text "hello, how are you today?"

v0.1.3

26 Jul 16:37

🐸 v0.1.3

🐞Bug Fixes

  • Fix Tacotron stopnet training

    Models trained after v0.1 had a problem where the stopnet was not trained, which caused models to generate no audio
    at evaluation and inference time.

  • Fix test_run at training. (👑 @WeberJulian)

    During training, 🐸 TTS would skip the test_run and not generate test audio samples. Now it is fixed :).

  • Fix server.py for multi-speaker models.

💾 Code updates

  • Refactoring in compute_embeddings.py for efficiency and compatibility with the latest speaker encoder. (👑 @Edresson)

🚀 Model releases

  • New Fullband-MelGAN model for Thorsten German dataset. (👑 @thorstenMueller)

Try it:

tts --model_name tts_models/de/thorsten/tacotron2-DCA --text "Was geschehen ist geschehen, es ist geschichte." 

v0.1.2

06 Jul 13:26
Bump up to v0.1.2

v0.1.1

06 Jul 08:32
676d22f

v0.1.0

03 Jul 12:18

🐸 v0.1.0

In a nutshell, there are a ton of updates in this release. I don't know if we can cover them all here but let's try.

After this release, 🐸 TTS stands on the following architecture.

  • Trainer API for training.
  • Synthesizer API for inference.
  • ModelManager API for managing 🐸TTS model zoo.
  • SpeakerManager API for managing speakers in a multi-speaker setting.
  • (TBI) Exporter API for exporting models to ONNX, TorchScript, etc.
  • (TBI) Data Processing API for making a dataset ready for training.
  • Model API for implementing models, compatible with all the other components above.

Updates

💾 Code updates

  • Brand new Trainer API

    We unified all the training code in a lightweight but feature-complete Trainer API. From now on, all the 🐸TTS
    models will use this new API for training.

    It provides mixed precision (with Nvidia's APEX or torch.amp) and multi-GPU training for all the models.

  • Brand new Model API

    Abstract BaseModel and its BaseTTS, BaseVocoder child classes are used as the basis of the 🐸TTS models now.
    Any model that implements one of these classes, works seamlessly with the Trainer and Synthesizer.

  • Brand new 🐸TTS recipes.

    We decided to merge the recipes into the main project. Now we host recipes for the LJSpeech dataset, covering all the implemented models,
    so you can pick the model you want, change the parameters, and train your own model easily.

    Thanks to the new Trainer API and 👩‍✈️Coqpit integration, we could implement these recipes with pure python.

  • Updated SpeakerManager API

    TTS.utils.SpeakerManager is now the core unit for managing speakers in a multi-speaker model and for interfacing a SpeakerEncoder model with the tts and vocoder models.

  • Updated model training mechanics.

    You can now use pure Python to define your model and run the training. This is useful for training models in a Jupyter
    Notebook or other Python environments.

    We also keep the old mechanics via TTS/bin/train_tts.py or TTS/bin/train_vocoder.py. You just need to
    replace the previous training script name with one of these two, based on your model.

    python TTS/bin/train_tacotron.py --config_path config.json

    becomes

    python TTS/bin/train_tts.py --config_path config.json
  • Use 👩‍✈️Coqpit for managing model class arguments.

    Now all the model arguments are defined in a coqpit class and imported by the model config.

  • gruut based character to phoneme conversion. (👑 @synesthesiam)

    It is a drop-in replacement for the previous solution and is compatible with the released models, so all these
    models are functional again without version nitpicking.

  • Set test_sentences in the config rather than providing a txt file.

  • Set the maximum number of decoder steps of Tacotron1-2 models in the config.
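For illustration, the last two options might look like this in a config. This is a plain-dataclass sketch standing in for a 👩‍✈️Coqpit config; only the test_sentences and max_decoder_steps field names come from the notes, while the class name and default values are made up:

```python
from dataclasses import dataclass, field

@dataclass
class TacotronConfigSketch:
    # Sentences synthesized at each test run, set directly in the config
    # instead of being read from a separate txt file.
    test_sentences: list = field(default_factory=lambda: ["Hello, world."])
    # Hard cap on autoregressive decoder steps, so inference cannot run forever.
    max_decoder_steps: int = 500


config = TacotronConfigSketch(test_sentences=["This is a test."], max_decoder_steps=1000)
print(config.max_decoder_steps)  # 1000
```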


🚀 Model releases

We solved the compatibility issues and re-released some of the models. You can see them in the released binaries section.

You don't need to change anything; v0.1.0 uses these new models by default.