NPU support for LLM acceleration without GPU #2257
Closed
sebastienbo started this conversation in Ideas
Replies: 2 comments
-
Is it possible to support NPUs? They are far more specialized for LLM inference and don't require a GPU. NPU support would offload work from the CPU and make it possible to run models on laptops that ship with NPUs.
Here is some example code: https://intel.github.io/intel-npu-acceleration-library/llm.html
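For illustration, here is a minimal sketch of what inference through the linked Intel NPU Acceleration Library looks like, adapted from its documentation. The model ID is an arbitrary small example, and exact API names may vary between library versions.

```python
# Minimal sketch adapted from the intel-npu-acceleration-library docs linked
# above; assumes the library is installed (pip install intel-npu-acceleration-library)
# and the machine has a supported Intel NPU. The model ID is an arbitrary example.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import intel_npu_acceleration_library

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

# Load a standard Hugging Face causal LM on the CPU first.
model = AutoModelForCausalLM.from_pretrained(model_id).eval()
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Compile the model for the NPU; int8 quantization cuts memory traffic,
# which matters because integrated NPUs share system memory bandwidth.
model = intel_npu_acceleration_library.compile(model, dtype=torch.int8)

# Generate as usual; the heavy matrix multiplications are dispatched to the NPU.
inputs = tokenizer("What is an NPU?", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

The appeal is exactly the point raised in the reply below: an integrated NPU works out of system RAM, so model size is bounded by comparatively cheap DRAM rather than by dedicated VRAM.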
-
I concur with your perspective; acquiring a 64GB DDR5 RAM module is far more feasible at present than obtaining a 64GB GPU. Incorporating NPU support therefore promises significant advantages for model inference compared to relying solely on GPU support.
-
If we did add support for NPUs, the code would go in llama.cpp, which we use for LLM inference. See these upstream issues:
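While those upstream issues are open, it may help to see why the change belongs there: applications drive llama.cpp through its bindings, so an NPU backend added upstream would be picked up without application-level changes. Here is a minimal sketch using the llama-cpp-python bindings; the GGUF path is a hypothetical local file:

```python
# Minimal sketch using the llama-cpp-python bindings (pip install llama-cpp-python).
# llama.cpp currently offloads layers to whichever accelerator backend it was
# built with (CUDA, Metal, SYCL, ...); an upstream NPU backend would slot in
# the same way. The model path below is a hypothetical local file.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/tinyllama-1.1b-chat.Q4_K_M.gguf",  # hypothetical path
    n_gpu_layers=-1,  # offload all layers to the available accelerator backend
)

result = llm("What is an NPU?", max_tokens=64)
print(result["choices"][0]["text"])
```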