Support wavlm-base-plus-sv with WebGPU #914

Open
1 of 5 tasks
flatsiedatsie opened this issue Aug 30, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@flatsiedatsie

System Info

Transformers.js Alpha 10, Brave

Environment/Platform

  • Website/web-app
  • Browser extension
  • Server-side (e.g., Node.js, Deno, Bun)
  • Desktop app (e.g., Electron)
  • Other (e.g., VSCode extension)

Description

I'm not sure what triggered it, but this is what happened:

  • Refreshed the page
  • Started Whisper
  • Saw this error:
Screenshot 2024-08-30 at 21 11 45

I had just been experimenting with moving a WASM file into a local folder. But since the local copy doesn't seem to be loaded, it shouldn't be the cause.

Reproduction

I'll share more if I can reproduce it myself.

@flatsiedatsie flatsiedatsie added the bug Something isn't working label Aug 30, 2024
@flatsiedatsie flatsiedatsie changed the title Number of computer buffers exceeds... Th number of storage buffers (14) in the Compute stage exceeds the maximum per-stage limit (8) Aug 30, 2024
@flatsiedatsie

flatsiedatsie commented Aug 31, 2024

It seems to be related to the audio verification model I'm using.

Perhaps I'm accidentally running it in parallel.

Screenshot 2024-08-31 at 10 54 57

@flatsiedatsie

I had forgotten an await. But even after fixing that it still occurs.

I've added a 5-second delay between verifications of the audio snippets, and in the console I can see that it only attempts a verification every 5+ seconds. So it's definitely not running in parallel.
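For anyone following along, here is a minimal sketch of what awaiting each verification before starting the next could look like. `verifySequentially` and the `verify` callback are hypothetical names, not part of Transformers.js; `verify` stands in for whatever function produces an embedding for one audio snippet.

```typescript
// Resolves after `ms` milliseconds.
const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs `verify` on each snippet strictly one at a time.
// `verify` is a hypothetical callback wrapping the speaker verification model.
async function verifySequentially<T, R>(
  snippets: T[],
  verify: (snippet: T) => Promise<R>,
  delayMs = 0,
): Promise<R[]> {
  const results: R[] = [];
  for (const snippet of snippets) {
    // Awaiting inside the loop means the previous call has fully finished
    // before the next one starts. Dropping this `await` would let calls overlap.
    results.push(await verify(snippet));
    if (delayMs > 0) {
      await sleep(delayMs); // optional pause between verifications
    }
  }
  return results;
}
```

Passing `5000` as `delayMs` would mirror the 5-second gap described above.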

It also seems to output an embedding, despite the error. Though perhaps it's outputting an older embedding? I'm going to check that next.

Screenshot 2024-09-02 at 10 20 41

@flatsiedatsie

flatsiedatsie commented Sep 2, 2024

I played around with dtypes. Then I realized that the issue is probably with using WebGPU in the first place.
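The 8 in the error message matches WebGPU's default value for the `maxStorageBuffersPerShaderStage` limit. A device only gets a higher value if it is created with `requiredLimits`, and this issue does not establish whether the runtime underneath Transformers.js does that. As a rough diagnostic, the sketch below compares what an adapter reports against the 14 buffers from the error. `storageBufferHeadroom` and `AdapterLike` are hypothetical names introduced for illustration.

```typescript
// Minimal shape of the part of a WebGPU adapter this sketch reads.
interface AdapterLike {
  limits: { maxStorageBuffersPerShaderStage: number };
}

// Reports whether `needed` storage buffers fit within the adapter's limit.
function storageBufferHeadroom(adapter: AdapterLike, needed: number) {
  const max = adapter.limits.maxStorageBuffersPerShaderStage;
  return { max, needed, fits: needed <= max };
}

// Browser usage (not run here, requires WebGPU support):
//   const adapter = await navigator.gpu.requestAdapter();
//   if (adapter) console.log(storageBufferHeadroom(adapter, 14));
```

If the adapter itself reports fewer than 14, raising the limit is not an option on that hardware and a different backend is the only way around the error.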

Solution

I switched the verification model over to WASM, and bingo, now it runs fine. It's detecting multiple speakers again.
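One way to keep this kind of per-model exception in one place is a small override table, sketched below. `DEVICE_OVERRIDES` and `loadOptionsFor` are hypothetical helpers, and the usage comment assumes the Transformers.js v3 loaders accept a `device` option of `"webgpu"` or `"wasm"`.

```typescript
type Device = "webgpu" | "wasm";

// Models that should not run on the preferred device.
// The single entry reflects the finding in this issue.
const DEVICE_OVERRIDES: Record<string, Device> = {
  "Xenova/wavlm-base-plus-sv": "wasm",
};

// Returns load options for a model, falling back to the preferred device
// when no override is registered for it.
function loadOptionsFor(
  modelId: string,
  preferred: Device = "webgpu",
): { device: Device } {
  return { device: DEVICE_OVERRIDES[modelId] ?? preferred };
}

// Hypothetical usage, assuming a `device` load option exists:
//   const id = "Xenova/wavlm-base-plus-sv";
//   const model = await AutoModel.from_pretrained(id, loadOptionsFor(id));
```

This leaves other models on WebGPU while the verification model stays on WASM.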

Conclusion

So the conclusion is: the Xenova/wavlm-base-plus-sv model does not yet have WebGPU support.

WASM doesn't seem to be much slower, so I don't think it matters much in practice. But just for completeness I'll rename this issue to 'Support wavlm-base-plus-sv with WebGPU'.

wespeaker-voxceleb-resnet34-LM

I also quickly swapped in onnx-community/wespeaker-voxceleb-resnet34-LM. When using WebGPU (at the default dtype or forced to FP32) it doesn't output errors, but the embeddings it returns don't seem useful. Here are some similarity score outputs (with the similarity threshold lowered to 0.5 instead of 0.95):

Screenshot 2024-09-02 at 10 54 46 (expectation: two speakers)

This could just be an implementation error on my part. But for now I'll be sticking with wavlm-base-plus-sv.
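For reference, a minimal sketch of how two speaker embeddings could be compared against a threshold. It assumes cosine similarity, which this issue does not confirm as the metric behind the scores above; `cosineSimilarity` and `isSameSpeaker` are hypothetical helper names.

```typescript
// Cosine similarity between two equal-length embedding vectors.
function cosineSimilarity(a: ArrayLike<number>, b: ArrayLike<number>): number {
  if (a.length !== b.length) {
    throw new Error("Embeddings must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // An all-zero embedding has no direction, so treat it as dissimilar.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Treats two embeddings as the same speaker when similarity reaches the threshold.
// 0.95 is the threshold mentioned above; 0.5 is the lowered value.
function isSameSpeaker(
  a: ArrayLike<number>,
  b: ArrayLike<number>,
  threshold = 0.95,
): boolean {
  return cosineSimilarity(a, b) >= threshold;
}
```

Running the same pair of snippets through both backends and comparing the scores would help separate a backend problem from a mistake in the calling code.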

@flatsiedatsie flatsiedatsie changed the title Th number of storage buffers (14) in the Compute stage exceeds the maximum per-stage limit (8) Support wavlm-base-plus-sv with WebGPU Sep 2, 2024