[Bug] #127

Closed
mushahid-intesum opened this issue Oct 16, 2023 · 0 comments · Fixed by #131 or #135 · May be fixed by #130
Labels
bug Something isn't working

Describe the bug

Resuming model training from a checkpoint in Coqui TTS raises an error.
When a checkpoint is saved for later training, the last training and eval losses are stored as a dict. When training from scratch, the last training loss is stored as a float. As a result, resuming from a checkpoint fails when the current loss (a float) is compared against the restored best loss (a dict).
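
The mismatch described above can be reproduced in isolation. This is a minimal sketch, not code from the trainer package: the dict keys `train_loss` and `eval_loss` and the numeric values are assumptions chosen for illustration.

```python
# Hypothetical stand-in for the best loss restored from a checkpoint.
# The key names and values are assumptions, not taken from the trainer source.
best_loss_from_checkpoint = {"train_loss": 0.42, "eval_loss": 0.47}

# Loss computed during the resumed run, as a plain float.
current_loss = 0.40

message = None
try:
    # Python cannot order a float against a dict, so this raises TypeError.
    current_loss < best_loss_from_checkpoint
except TypeError as err:
    message = str(err)
    print(message)
```

The printed message matches the last line of the traceback in the Logs section.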

To Reproduce

  1. Train a model in Coqui TTS using trainer
  2. Once a checkpoint for best model is saved, stop the training
  3. Set the checkpoint folder as continue path in the trainer class
  4. Restart from the checkpoint

https://colab.research.google.com/drive/1OwemROn306_JIYASjx39d52eXFHS1O_u

Expected behavior

Training should resume from the checkpoint without raising an error.

Logs

Traceback (most recent call last):
  File "/mnt/Work/anaconda3/envs/tts-env/lib/python3.10/site-packages/trainer/trainer.py", line 1808, in fit
    self._fit()
  File "/mnt/Work/anaconda3/envs/tts-env/lib/python3.10/site-packages/trainer/trainer.py", line 1771, in _fit
    self.save_best_model()
  File "/mnt/Work/anaconda3/envs/tts-env/lib/python3.10/site-packages/trainer/utils/distributed.py", line 35, in wrapped_fn
    return fn(*args, **kwargs)
  File "/mnt/Work/anaconda3/envs/tts-env/lib/python3.10/site-packages/trainer/trainer.py", line 1893, in save_best_model
    self.best_loss = save_best_model(
  File "/mnt/Work/anaconda3/envs/tts-env/lib/python3.10/site-packages/trainer/io.py", line 183, in save_best_model
    if current_loss < best_loss:
TypeError: '<' not supported between instances of 'float' and 'dict'
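
One possible way to guard a comparison like the one that fails above is to reduce both values to a scalar before comparing. This is only a sketch under stated assumptions and is not the change made in #131 or #135: the helper names `as_scalar_loss` and `is_improvement` are hypothetical, as are the dict keys `eval_loss` and `train_loss`.

```python
from collections.abc import Mapping


def as_scalar_loss(loss, key="eval_loss", fallback_key="train_loss"):
    """Return a float from either a plain number or a dict of losses.

    Hypothetical helper: the key names are assumptions for illustration.
    """
    if isinstance(loss, Mapping):
        # Prefer the requested key, then fall back to the other one.
        for name in (key, fallback_key):
            value = loss.get(name)
            if value is not None:
                return float(value)
        raise KeyError(f"no usable loss under {key!r} or {fallback_key!r}")
    return float(loss)


def is_improvement(current_loss, best_loss, key="eval_loss"):
    """Compare two losses after reducing both to scalars."""
    return as_scalar_loss(current_loss, key) < as_scalar_loss(best_loss, key)


# Usage with hypothetical values: a float compared with a restored dict.
restored_best = {"train_loss": 0.42, "eval_loss": 0.47}
print(is_improvement(0.40, restored_best))
```

Normalizing at the comparison site keeps both checkpoint formats readable; whether the eval or the train loss should take priority is a design choice this sketch does not settle.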

Environment

- torch: 2.1.0
- trainer: 0.0.31
- python: 3.10
- OS: EndeavourOS
- cuda: cuda_12.2.r12.2
- GPU: NVIDIA RTX 3060
- pytorch installation: pip

Additional context

No response
