Support wavlm-base-plus-sv with WebGPU #914

flatsiedatsie · 2024-08-30T19:18:33Z

System Info

Transformers.js Alpha 10, Brave

Environment/Platform

Description

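Not sure what happened, but I did a page refresh, started Whisper, and saw this:

The number of storage buffers (14) in the Compute stage exceeds the maximum per-stage limit (8)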

I had just been fiddling with moving a WASM file into a local folder, but since Transformers.js doesn't seem to load it from there, that shouldn't be the cause.

Reproduction

I'll share more if I can reproduce it myself.

flatsiedatsie · 2024-08-31T08:57:47Z

It seems to be related to the audio verification model I'm using.

Perhaps I'm accidentally running it in parallel.

flatsiedatsie · 2024-09-02T08:24:59Z

I had forgotten an await, but the error still occurs even after fixing that.

I've added a 5-second delay between verifications of the audio snippets, and in the console I can see that it only attempts a verification every 5+ seconds. So it's definitely not running in parallel.
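The "forgotten await" and "not running in parallel" checks can also be enforced in code, so that a verification never starts before the previous one has finished. Below is a minimal TypeScript sketch of such a serial queue with a fixed pause between jobs (the thread used about 5 seconds). `createSerialQueue` and `sleep` are hypothetical helpers, not part of Transformers.js, and the jobs stand in for whatever verification call the app makes.

```typescript
// Resolve after `ms` milliseconds.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: runs async jobs strictly one after another and waits
// `pauseMs` after each job before the next one may start.
function createSerialQueue(pauseMs = 0) {
  let tail: Promise<void> = Promise.resolve();
  return function enqueue<T>(job: () => Promise<T>): Promise<T> {
    // Start this job only once everything queued before it has settled.
    const run = tail.then(job);
    // Keep the chain alive even if a job rejects, then add the pause.
    tail = run.then(() => undefined, () => undefined).then(() => sleep(pauseMs));
    return run;
  };
}
```

The pause is attached to the queue tail rather than to each job, so a job's own result is returned as soon as it finishes; only the next job is delayed.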

It also seems to output an embedding, despite the error. Though perhaps it's outputting an older embedding? I'm going to check that next.

flatsiedatsie · 2024-09-02T09:01:14Z

I played around with dtypes, then realized that the issue is probably the use of WebGPU in the first place.

Solution

I switched the verification model over to WASM, and bingo, now it runs fine. It's detecting multiple speakers again.
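The fix here was per model: only the verification model was moved to WASM. A minimal TypeScript sketch of that choice follows, assuming a hand-maintained list of models known to misbehave on WebGPU. `WEBGPU_PROBLEM_MODELS` and `pickDevice` are hypothetical names, and the `device` option and `WavLMForXVector` class mentioned in the comments are assumptions about the Transformers.js v3 API, not something stated in this thread.

```typescript
// Backends this sketch chooses between (assumed values of the Transformers.js v3 `device` option).
type Device = "webgpu" | "wasm";

// Hypothetical list: models that misbehaved on WebGPU in this thread.
const WEBGPU_PROBLEM_MODELS = new Set<string>(["Xenova/wavlm-base-plus-sv"]);

// Use WebGPU when available, except for models on the problem list.
function pickDevice(modelId: string, webgpuAvailable: boolean): Device {
  if (!webgpuAvailable) return "wasm";
  return WEBGPU_PROBLEM_MODELS.has(modelId) ? "wasm" : "webgpu";
}

// In a browser, availability could be checked with the WebGPU API:
//   const webgpuAvailable = !!(await navigator.gpu?.requestAdapter());
// and the result passed when loading (assumed API):
//   const model = await WavLMForXVector.from_pretrained(modelId, { device: pickDevice(modelId, webgpuAvailable) });
```

A static list is used instead of try/catch because, as noted above, the WebGPU error was reported while an embedding was still returned, so the failure may never surface as a thrown exception.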

Conclusion

So the conclusion is: the Xenova/wavlm-base-plus-sv model does not yet have WebGPU support.

It doesn't seem to be much slower, so I don't think it matters at all. But just for completeness I'll rename this issue to 'Support wavlm-base-plus-sv with WebGPU'.

wespeaker-voxceleb-resnet34-LM

I also quickly swapped in onnx-community/wespeaker-voxceleb-resnet34-LM. On WebGPU (with the default dtype or forced to FP32) it doesn't output errors, but the embeddings it returns don't seem to be useful. Here are some similarity score outputs (with the similarity threshold lowered from 0.95 to 0.5):

(expectation: two speakers)
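For reference, the comparison behind such similarity scores can be sketched as below, assuming cosine similarity between embedding vectors (the measure used in the upstream microsoft/wavlm-base-plus-sv model card example) and a greedy rule that assigns each snippet to the most similar known speaker above the threshold. `cosineSimilarity` and `labelSpeakers` are hypothetical helpers, not the author's code or a Transformers.js API; the `.data` typed array of a Transformers.js tensor should fit the `ArrayLike<number>` parameters.

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a: ArrayLike<number>, b: ArrayLike<number>): number {
  if (a.length !== b.length) throw new Error("embedding lengths differ");
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // degenerate (all-zero) embedding
  return dot / Math.sqrt(normA * normB);
}

// Greedy speaker labelling: each snippet joins the most similar known speaker
// if that similarity is at least `threshold`; otherwise it starts a new speaker.
function labelSpeakers(embeddings: ArrayLike<number>[], threshold = 0.95): number[] {
  const references: ArrayLike<number>[] = []; // first embedding seen per speaker
  return embeddings.map((emb) => {
    let best = -1;
    let bestScore = -Infinity;
    references.forEach((ref, idx) => {
      const score = cosineSimilarity(emb, ref);
      if (score > bestScore) {
        bestScore = score;
        best = idx;
      }
    });
    if (best >= 0 && bestScore >= threshold) return best;
    references.push(emb);
    return references.length - 1;
  });
}
```

Using each speaker's first snippet as its reference (rather than a running average) keeps the sketch short. It also shows why the threshold matters: lowering it from 0.95 to 0.5 lets less similar snippets merge into an existing speaker.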

This could just be an implementation error on my part, but for now I'll stick with wavlm-base-plus-sv.

flatsiedatsie added the bug Something isn't working label Aug 30, 2024

flatsiedatsie changed the title ~~Number of computer buffers exceeds...~~ Th number of storage buffers (14) in the Compute stage exceeds the maximum per-stage limit (8) Aug 30, 2024

flatsiedatsie changed the title ~~Th number of storage buffers (14) in the Compute stage exceeds the maximum per-stage limit (8)~~ Support wavlm-base-plus-sv with WebGPU Sep 2, 2024
