Add Nv/AMD sycl target build cmd #5357

Closed
wants to merge 4 commits into from

abhilash1910
Collaborator

CMake modifications for Nv/AMD SYCL builds.
@NeoZhangJianyu @ggerganov @airMeng @AidanBeltonS @Alcpz

@abhilash1910
Collaborator Author

abhilash1910 commented Feb 6, 2024

SYCL runtime performance on NVIDIA and AMD is based on the sycl.cpp codebase. Features that get added to the SYCL codebase will be tested as-is on the other vendors. In terms of priority, we focus on optimizing for Intel GPUs, then check the performance of the codebase on NVIDIA and AMD with the SYCL runtime and provide improvements.

@NeoZhangJianyu
Collaborator

NeoZhangJianyu commented Feb 6, 2024

@abhilash1910

  1. Please make sure the tests pass on NV & AMD GPUs.
  2. Update README-sycl.md with instructions on how to install the related software, build, and run.
  3. Add the GPU models you have verified to the supported list in README-sycl.md.
  4. Run the CI with ci/run.sh on NV & AMD GPUs.

@Alcpz
Collaborator

Alcpz commented Feb 6, 2024

@abhilash1910 @AidanBeltonS has been trying to run the SYCL version on Nvidia GPUs, but the tests are still not passing.
Another issue is that AMD builds require manually specifying --offload-arch, as there is currently no default value for it. We will be helping with the review shortly.
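Since there is no default --offload-arch, the value has to be looked up on each machine. A minimal sketch, assuming ROCm's rocminfo tool is installed and lists GPU agents with names such as gfx90a; the helper detect_amd_arch is hypothetical and not part of llama.cpp or ROCm:

```shell
#!/usr/bin/env bash
# Sketch: read rocminfo-style text on stdin and print the first gfx* architecture.
# detect_amd_arch is a hypothetical helper (assumption), not a llama.cpp or ROCm API.
detect_amd_arch() {
  # GPU agents are assumed to appear with names such as "gfx90a" or "gfx1100";
  # -o prints only the matched token, so ISA strings like
  # "amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack-" also yield "gfx90a".
  grep -o 'gfx[0-9a-f][0-9a-f]*' | head -n 1
}
```

On a ROCm machine this could be used as ARCH="$(rocminfo | detect_amd_arch)". If it prints nothing, no gfx architecture was found and the value has to be supplied by hand.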

@0cc4m
Collaborator

0cc4m commented Feb 6, 2024

I tried running it on Nvidia and AMD on Linux, but ran into some issues. It takes a very long time to compile, still only reports the Intel/CPU devices in the system, and then dies.

» build_sycl/bin/main -t 16 -f ~/llama.cpp/input.txt -b 512 -c 2048 -n 128 --ignore-eos -m ~/koboldcpp/models/airoboros-m-7b-3.1.2.Q4_K_S.gguf -ngl 1000
Log start
main: build = 2080 (bea82a05)
main: built with Intel(R) oneAPI DPC++/C++ Compiler 2024.0.0 (2024.0.0.20231017) for x86_64-unknown-linux-gnu
main: seed  = 1707249047
GGML_SYCL_DEBUG=0
ggml_init_sycl: GGML_SYCL_F16:   no
ggml_init_sycl: SYCL_USE_XMX: yes
found 4 SYCL devices:
  Device 0: Intel(R) Arc(TM) A770 Graphics,     compute capability 1.3,
        max compute_units 512,  max work group size 1024,       max sub group size 32,  global mem size 16225243136
  Device 1: Intel(R) FPGA Emulation Device,     compute capability 1.2,
        max compute_units 32,   max work group size 67108864,   max sub group size 64,  global mem size 134931963904
  Device 2: AMD EPYC 7302 16-Core Processor                ,    compute capability 3.0,
        max compute_units 32,   max work group size 8192,       max sub group size 64,  global mem size 134931963904
  Device 3: Intel(R) Arc(TM) A770 Graphics,     compute capability 3.0,
        max compute_units 512,  max work group size 1024,       max sub group size 32,  global mem size 16225243136
Using device 0 (Intel(R) Arc(TM) A770 Graphics) as main device
[...]
llama_new_context_with_model: n_ctx      = 2048
llama_new_context_with_model: freq_base  = 10000,0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:            KV buffer size =   256,00 MiB
llama_new_context_with_model: KV self size  =  256,00 MiB, K (f16):  128,00 MiB, V (f16):  128,00 MiB
llama_new_context_with_model:        CPU input buffer size   =    12,01 MiB
llama_new_context_with_model:            compute buffer size =   171,60 MiB
llama_new_context_with_model:        CPU compute buffer size =     8,80 MiB
llama_new_context_with_model: graph splits (measure): 3
Native API failed. Native API returns: -42 (PI_ERROR_INVALID_BINARY) -42 (PI_ERROR_INVALID_BINARY)Exception caught at file:/home/user/upstream-llama.cpp/ggml-sycl.cpp, line:12706

abhilash1910 marked this pull request as draft February 7, 2024 03:07
@AidanBeltonS
Contributor

When you build for non-SPIR-V targets (i.e. NVIDIA and AMD), you must pass the device triple to the compiler: -fsycl-targets=nvptx64-nvidia-cuda for NVIDIA, and -fsycl-targets=amdgcn-amd-amdhsa -Xsycl-target-backend=amdgcn-amd-amdhsa --offload-arch=gfx90a for AMD. Note that you must replace gfx90a with your AMD GPU's architecture. With these flags you will no longer get the invalid binary error.

However, as pointed out before, NVIDIA and AMD are not yet passing all tests, so you should not expect it to run properly just yet.
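The per-vendor flags from this comment could be selected in a small build helper. This is a sketch only: the function name sycl_target_flags is hypothetical, and the flags are exactly the ones listed above. The AMD branch fails instead of guessing an architecture, because there is no default --offload-arch.

```shell
#!/usr/bin/env bash
# Sketch: print the extra SYCL compiler flags for non-SPIR-V targets.
# sycl_target_flags is a hypothetical helper (assumption); the flags come from the comment above.
#   $1: vendor, "nvidia" or "amd"
#   $2: AMD GPU architecture, e.g. gfx90a (required for amd, there is no default)
sycl_target_flags() {
  local vendor="$1"
  local arch="${2:-}"
  case "$vendor" in
    nvidia)
      echo "-fsycl-targets=nvptx64-nvidia-cuda"
      ;;
    amd)
      # Refuse to guess: the architecture must match the actual AMD GPU.
      if [ -z "$arch" ]; then
        echo "error: AMD builds need an explicit GPU architecture (e.g. gfx90a)" >&2
        return 1
      fi
      echo "-fsycl-targets=amdgcn-amd-amdhsa -Xsycl-target-backend=amdgcn-amd-amdhsa --offload-arch=$arch"
      ;;
    *)
      echo "error: unknown vendor '$vendor' (expected nvidia or amd)" >&2
      return 1
      ;;
  esac
}
```

For example, the output could be passed to CMake through CMAKE_CXX_FLAGS alongside the usual SYCL options, e.g. -DCMAKE_CXX_FLAGS="$(sycl_target_flags amd gfx90a)"; with CMake's default rules these flags are also used on the link line, where the device code is linked.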

@NeoZhangJianyu
Collaborator

@abhilash1910 @AidanBeltonS
Is it possible to add these compile settings to CMakeLists.txt or README-sycl.md, so that others do not run into the same issue?

I suggest adding an "AOT" (ahead-of-time compilation) subsection to the "build" chapter of README-sycl.md.

@AidanBeltonS
Contributor

Yes, I think we should update the CMake files and README to properly support this. However, I suggest not making this change until the CUDA and HIP backends pass the tests, which is currently not the case.

@abhilash1910
Collaborator Author

Addressed in #5738. Closing.


7 participants