[Bug] XTTS v2 - short utterances finetune doesn't work #3964

Open
SinanAkkoyun opened this issue Aug 13, 2024 · 4 comments
Labels
bug Something isn't working

Comments


SinanAkkoyun commented Aug 13, 2024

Describe the bug

I cannot get short utterances (a couple of words) to work without hallucinations at the end, even though my training mix is 50/50 very short and long utterances. Why won't the GPT predict the EOT token correctly when it has already seen plenty of examples? (1 h of training data, currently at epoch 46.)

Is it due to some batched training optimization that neglects EOT tokens?

To Reproduce

Finetune the model on a 50/50 mix of very short and long utterances, then prompt it with a short sentence such as "Program complete." (I sometimes see the same behavior with the pretrained XTTS v2 checkpoint and a custom speaker latent.)

Expected behavior

It should cut off after generating the sentence.

Logs

No response

Environment

{
    "CUDA": {
        "GPU": [
            "NVIDIA GeForce RTX 4090"
        ],
        "available": true,
        "version": "12.1"
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "2.4.0+cu121",
        "TTS": "0.24.1",
        "numpy": "1.26.2"
    },
    "System": {
        "OS": "Linux",
        "architecture": [
            "64bit",
            "ELF"
        ],
        "processor": "x86_64",
        "python": "3.11.5",
        "version": "#128-Ubuntu SMP Fri Jul 5 09:28:59 UTC 2024"
    }
}

Additional context

No response

SinanAkkoyun added the bug label on Aug 13, 2024
@SinanAkkoyun
Author

I am now at epoch 241 and it has only gotten worse. It hallucinates even after a 7-word sentence.
I suspect something is wrong with batched padding or a similar step. I'd appreciate any help.
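
One way to test the batched-padding suspicion is to check whether the stop token still sits inside the loss mask after padding. The snippet below is only a minimal sketch of that check in plain Python, not the XTTS training code: `PAD_ID`, `STOP_ID`, and all three helper functions are hypothetical names made up for illustration. It shows how an off-by-one in the sequence lengths would leave the stop token unsupervised in every row.

```python
# Hypothetical token IDs for illustration only; not taken from the XTTS code base.
PAD_ID = 0
STOP_ID = 1


def pad_batch(sequences, pad_id=PAD_ID):
    """Right-pad token lists to a common length; return (padded, lengths)."""
    max_len = max(len(s) for s in sequences)
    padded = [s + [pad_id] * (max_len - len(s)) for s in sequences]
    lengths = [len(s) for s in sequences]
    return padded, lengths


def loss_mask(lengths, max_len):
    """1 where a position contributes to the loss, 0 where it is ignored."""
    return [[1 if i < n else 0 for i in range(max_len)] for n in lengths]


def stop_token_supervised(padded, mask, stop_id=STOP_ID):
    """Per row: does any stop token sit at a position the loss actually sees?"""
    return [
        any(tok == stop_id and m == 1 for tok, m in zip(row, row_mask))
        for row, row_mask in zip(padded, mask)
    ]


# Two short utterances and one long one, each ending in the stop token.
batch = [[5, 6, STOP_ID], [7, STOP_ID], [8, 9, 10, 11, 12, STOP_ID]]
padded, lengths = pad_batch(batch)
max_len = len(padded[0])

# Lengths that include the stop token: every row supervises it.
good = stop_token_supervised(padded, loss_mask(lengths, max_len))

# Assumed bug: lengths measured before the stop token was appended.
bad = stop_token_supervised(padded, loss_mask([n - 1 for n in lengths], max_len))

print("stop token supervised (correct lengths):", good)
print("stop token supervised (off-by-one lengths):", bad)
```

Running the same kind of check on one real batch from the finetuning dataloader would either confirm or rule out this particular hypothesis; it says nothing about other possible causes.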

@duringleaves

I get hallucinations and slurs at the beginning and end of short phrases. I've trained it on a mix of short (fewer than 7 words) and long phrases, but the problem persists.

@SinanAkkoyun
Author

@duringleaves you get slurs? xD

@duringleaves

Hahaha. Well, the SLURRED words and the "oy NUk!" breakouts calmed down with a much lower temperature. It's still not perfect, but more usable than I feared.
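
As a rough illustration of why a lower temperature might help here: dividing the logits by a smaller temperature sharpens the sampling distribution, so whichever token already has the highest logit is picked more often. The sketch below uses made-up logits and assumes index 0 stands for the stop token; it is not XTTS code, and it only helps in the case where the stop token is already the most likely choice at that step.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits at the step where the utterance should end.
# Assumption: index 0 is the stop token and it has the highest logit.
logits = [2.0, 1.5, 1.0, 0.5]

stop_probability = {}
for temperature in (1.0, 0.5, 0.2):
    probs = softmax_with_temperature(logits, temperature)
    stop_probability[temperature] = probs[0]
    print(f"temperature={temperature}: P(stop)={probs[0]:.3f}")
```

If the finetuned model ranks another token above the stop token at that step, lowering the temperature would make the overrun more likely rather than less, which may be why it reduces the problem without removing it.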
