Can't detect GPU devices #538
Comments
Not sure of the root cause, but I see the container (3.11) and the host (3.10) have different Python versions in your environment. Besides, I see in the Llama2 inference blog that it needs the oneAPI environment sourced before executing the Python command.
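A minimal sketch of what that sourcing step usually looks like (the path assumes a default system-wide oneAPI installation):

```bash
# Load the oneAPI compiler/runtime variables into the current shell
# (path assumes the default system-wide installation prefix)
source /opt/intel/oneapi/setvars.sh
```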
BTW, would you be able to run llama2-13B inference on 2 Arc A770 GPUs? Thanks~
@mudler - To start with, I would look into the drivers, specifically the UMD (user-mode driver). I point to the UMD because the Intel-published Docker container is picking up the devices well. Since level-zero is installed, you may have to install 'clinfo' and check its output. If the output looks fine, then I concur with @BismarckDD that the proper oneAPI environment variables were likely not sourced before running the Python command. So please try that as well.
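For reference, a quick UMD check with clinfo could look like this (assuming Ubuntu's apt packaging):

```bash
# Install clinfo and list the OpenCL platforms/devices exposed by the UMD;
# both A770 cards should show up as separate devices
sudo apt-get install -y clinfo
clinfo | grep -E "Platform Name|Device Name"
```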
@intel-ravig what should I look for in the UMD? I just installed the drivers as per the Intel docs I've linked, following the steps in the issue. This is a freshly installed 22.04 LTS box. Just for reference, here are the steps:
To reiterate: on the same box llama.cpp works fine and I can offload correctly to the GPUs. It just looks like a problem with IPEX.
@mudler - I am able to duplicate your issue in your conda environment.
Thanks @intel-ravig! For the time being I'll go with supporting it without conda in LocalAI - however, conda support is much wanted, as otherwise implementations become quite convoluted and harder to follow.
@mudler - I got in touch with the engineering team and got a solution. I tried these steps:
I was able to solve the issue with the conda env. Please check and let us know.
There currently exists a similar issue with intel/compute-runtime; however, it's been observed on kernel 6.8 (kernel 6.7 apparently works normally). It might also manifest on your ancient kernel. Could this be the same cause? Here is the fix: intel/compute-runtime#710 (comment)
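For anyone comparing against that report, the running kernel version can be checked with:

```bash
# Print the running kernel release
# (the linked issue reproduces on 6.8 but reportedly not on 6.7)
uname -r
```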
Hi @mudler, did this solution fix your issue?
Describe the bug
Context: I'm the author of LocalAI, and I'm trying to bring diffusers and transformers support to it (mudler/LocalAI#1746).
I'm starting by following the documentation at https://intel.github.io/intel-extension-for-pytorch/index.html#installation?platform=gpu&version=v2.1.10%2Bxpu; however, after successfully installing all the dependencies with conda, the sanity test cannot find the devices on my system.
I have 2 Intel Arc A770 GPUs, but when running the sanity test:
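For context, the sanity test in the linked installation guide is along the lines of this one-liner (reproduced from the v2.1.10+xpu docs, so treat the exact form as approximate):

```bash
# Print torch/ipex versions, then one line per detected XPU device
python -c "import torch; import intel_extension_for_pytorch as ipex; \
print(torch.__version__); print(ipex.__version__); \
[print(f'[{i}]: {torch.xpu.get_device_properties(i)}') for i in range(torch.xpu.device_count())];"
```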
The result is just:
Printing torch.xpu.device_count() returns 0.
My user is in the video and render groups:
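A quick way to verify the group membership (standard coreutils, shown here for completeness):

```bash
# List the current user's groups; "video" and "render" are needed
# for unprivileged access to the /dev/dri render nodes
id -nG
```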
Running conda install is successful; indeed, it seems I have all the packages:
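For reference, the relevant packages can be listed like this (the filter pattern is illustrative):

```bash
# Show the torch / ipex / oneAPI runtime packages in the active conda env
conda list | grep -E "torch|intel|dpcpp|mkl"
```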
The system dependencies are there; indeed, I can run llama.cpp just fine, offloading everything to the GPU:
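As an illustration, a typical llama.cpp SYCL run offloading all layers looks roughly like this (model path is hypothetical; flags from the llama.cpp CLI of that era):

```bash
# -ngl 99 offloads all layers to the GPU; the SYCL build prints the
# detected devices on startup
./main -m ./models/llama-2-13b.Q4_K_M.gguf -ngl 99 -p "Hello"
```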
Since I am able to run llama.cpp on this host successfully (also via containers and Kubernetes), I suspect it is somehow the Python environment that cannot detect the devices.
Any help or hints would be greatly appreciated, thanks!
Versions
Oddly enough, from the Docker container it seems to detect the devices just fine:
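For comparison, the container check was along these lines (the image tag is assumed from the v2.1.10+xpu release; adjust as needed):

```bash
# Pass the DRM render nodes into the container and count XPU devices
docker run --rm --device /dev/dri \
  intel/intel-extension-for-pytorch:2.1.10-xpu \
  python -c "import torch; import intel_extension_for_pytorch as ipex; print(torch.xpu.device_count())"
```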