
The inference speed on the mobile end is a bit slow #928

Open

Gratifyyy opened this issue Sep 10, 2024 · 7 comments

Labels: question (Further information is requested)

Comments


Gratifyyy commented Sep 10, 2024

Question

If the mobile device does not support WebGPU, how can we improve the inference speed of the model? I have tried a Web Worker, but the results were not satisfactory.

Gratifyyy added the question label on Sep 10, 2024
Gratifyyy changed the title from 移动端推理速度有点慢 to "The inference speed on the mobile end is a bit slow" on Sep 10, 2024

gyagp commented Sep 10, 2024

Did you enable WASM SIMD and multi-threading? If not, you can try setting env.backends.onnx.wasm.simd = true and env.backends.onnx.wasm.numThreads = xxx (a reasonable value for your core count).
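
For reference, a minimal sketch of that configuration (the thread count here is just an illustrative choice derived from navigator.hardwareConcurrency, not a recommended value):

import { env } from '@xenova/transformers';

// Enable WASM SIMD explicitly and pick a thread count for the device.
// navigator.hardwareConcurrency is only a rough heuristic; tune as needed.
env.backends.onnx.wasm.simd = true;
env.backends.onnx.wasm.numThreads = Math.min(4, navigator.hardwareConcurrency ?? 1);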

@flatsiedatsie

I believe multi-threading is already handled automatically: #882

I'd imagine SIMD is handled the same way?

@Gratifyyy (Author)

Both are turned on, but there is no noticeable change:

import { AutoModel, AutoProcessor, env, RawImage } from '@xenova/transformers';

env.allowLocalModels = false;

env.backends.onnx.wasm.proxy = true;
env.backends.onnx.wasm.simd = true;
env.backends.onnx.wasm.numThreads = 4;

const model = await AutoModel.from_pretrained('briaai/RMBG-1.4', {
  config: { model_type: 'custom' },
});

const processor = await AutoProcessor.from_pretrained('briaai/RMBG-1.4', {
  config: {
      do_normalize: true,
      do_pad: false,
      do_rescale: true,
      do_resize: true,
      image_mean: [0.5, 0.5, 0.5],
      feature_extractor_type: "ImageFeatureExtractor",
      image_std: [1, 1, 1],
      resample: 2,
      rescale_factor: 0.00392156862745098,
      size: { width: 1024, height: 1024 },
  }
});

export const predict = async (url: string) =>  {
  // Read image
  const image = await RawImage.fromURL(url);

  // Preprocess image
  const { pixel_values } = await processor(image);

  // Predict alpha matte
  const { output } = await model({ input: pixel_values });

  const pixelData = image.rgba();
  // Resize mask back to original size
  const mask = await RawImage.fromTensor(output[0].mul(255).to('uint8')).resize(image.width, image.height);
  // Convert alpha channel to 4th channel
  for (let i = 0; i < mask.data.length; ++i) {
    pixelData.data[4 * i + 3] = mask.data[i];
  }
  return (pixelData.toSharp());
}

@flatsiedatsie

@xenova Does SIMD have to be enabled manually?


gyagp commented Sep 11, 2024

Usually we'd see a perf change when playing with env.backends.onnx.wasm.numThreads (better or worse; a larger number doesn't always mean better). But to make multi-threading work, your web server needs to be cross-origin isolated (https://web.dev/articles/coop-coep). You can open your console and check whether crossOriginIsolated is true or false.
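
For context, cross-origin isolation means the page is served with COOP and COEP headers. A minimal sketch of those headers, assuming an Express static server (the server setup is an assumption; the exact mechanism depends on your hosting):

import express from 'express';

const app = express();

// Both headers are required for cross-origin isolation, which enables
// SharedArrayBuffer and therefore multi-threaded WASM in ONNX Runtime Web.
app.use((_req, res, next) => {
  res.setHeader('Cross-Origin-Opener-Policy', 'same-origin');
  res.setHeader('Cross-Origin-Embedder-Policy', 'require-corp');
  next();
});

app.use(express.static('dist'));
app.listen(8080);

With those headers in place, crossOriginIsolated in the browser console should report true.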


gyagp commented Sep 11, 2024

@xenova Does SIMD have to be enabled manually?

Transformers.js doesn't have to do anything special to set either simd or numThreads. By default, ORT has SIMD enabled, and numThreads = min(4, ceil(cpu_core_num / 2)) if crossOriginIsolated is true (thanks @fs-eire for confirming).
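
For reference, that default roughly corresponds to the following in browser JavaScript (a sketch of the formula, not the actual ORT source):

// Default WASM thread count: at most 4, and no more than half the logical cores,
// and only when the page is cross-origin isolated (otherwise single-threaded).
const cores = navigator.hardwareConcurrency ?? 1;
const defaultNumThreads = crossOriginIsolated ? Math.min(4, Math.ceil(cores / 2)) : 1;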

@Gratifyyy (Author)

Usually we'd see a perf change when playing with env.backends.onnx.wasm.numThreads (better or worse; a larger number doesn't always mean better). But to make multi-threading work, your web server needs to be cross-origin isolated (https://web.dev/articles/coop-coep). You can open your console and check whether crossOriginIsolated is true or false.

Thanks for the reminder. I just checked and crossOriginIsolated is false. I will try to change it to see if it has any effect.
