Clang cannot detect hermetic cuda version #16877

tchatow · 2024-09-06T17:51:29Z

When using the hermetic CUDA toolchain with Clang, Clang does not actually detect the CUDA version and defaults to the latest toolkit version it knows about. Clang searches for external/cuda_nvcc/include/cuda.h, and when that file is not found, the version detection logic falls back to CudaVersion::NEW.

For instance, in clang 18.1.8, CudaVersion::NEW is 12.3. With HERMETIC_CUDA_VERSION="12.2.0" set, the failed detection creates a mismatch: clang targets PTX ISA 8.3, which is incompatible with CUDA 12.2.0.
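To make the failure mode concrete, here is a rough Python sketch of header-based version detection with a fallback. It is not clang's code. It assumes the version is read from the `#define CUDA_VERSION` macro in cuda.h, encoded as major * 1000 + minor * 10, and `FALLBACK_VERSION` is a hypothetical stand-in for CudaVersion::NEW.

```python
import re

# Hypothetical stand-in for CudaVersion::NEW (12.3 in clang 18.1.8, per this issue).
FALLBACK_VERSION = (12, 3)

_CUDA_VERSION_RE = re.compile(r"^\s*#\s*define\s+CUDA_VERSION\s+(\d+)", re.MULTILINE)


def parse_cuda_version(header_text):
    """Return (major, minor) from the text of cuda.h, or None if the macro is absent."""
    match = _CUDA_VERSION_RE.search(header_text)
    if match is None:
        return None
    value = int(match.group(1))  # e.g. 12020 for CUDA 12.2
    return value // 1000, (value % 1000) // 10


def detect_cuda_version(header_path):
    """Read the version from header_path; fall back when the file cannot be read."""
    try:
        with open(header_path, encoding="utf-8", errors="replace") as f:
            text = f.read()
    except OSError:
        # This is the case described above: no cuda.h under cuda_nvcc/include.
        return FALLBACK_VERSION
    return parse_cuda_version(text) or FALLBACK_VERSION
```

With a missing header the sketch silently reports the fallback version, which is how a 12.2.0 toolkit could end up treated as 12.3.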

The easiest fix is probably to copy external/cuda_cudart/include/cuda.h into external/cuda_nvcc/include/cuda.h.
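A minimal sketch of that idea, using a symlink rather than a copy. The helper name `link_cuda_header` and its arguments are made up for illustration; a real fix would live in the repository rules, not in a standalone script.

```python
import os
from pathlib import Path


def link_cuda_header(cudart_include, nvcc_include, name="cuda.h"):
    """Symlink <cudart_include>/<name> into <nvcc_include> unless it already exists.

    Returns the destination path. Raises FileNotFoundError if the source is missing.
    """
    src = Path(cudart_include) / name
    dst = Path(nvcc_include) / name
    if not src.is_file():
        raise FileNotFoundError(src)
    if dst.exists() or dst.is_symlink():
        return dst  # leave an existing header (or link) untouched
    dst.parent.mkdir(parents=True, exist_ok=True)
    os.symlink(src.resolve(), dst)
    return dst
```

For example, `link_cuda_header("external/cuda_cudart/include", "external/cuda_nvcc/include")` would place cuda.h where clang looks for it.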

johnnynunez · 2024-09-07T18:04:36Z

I have this error:

third_party/gpus/cuda/include/cuda.h
ts '-std=c++17' -c external/xla/xla/stream_executor/cuda/cuda_status.cc -o bazel-out/aarch64-opt/bin/external/xla/xla/stream_executor/cuda/_objs/cuda_status_cuda_only/cuda_status.pic.o)
# Configuration: d9608b0aa616855e3fabfa1f8c73e3eec1e37022bef94165bef09db9202f5654
# Execution platform: @local_execution_config_platform//:platform
In file included from external/xla/xla/stream_executor/cuda/cuda_status.cc:16:
external/xla/xla/stream_executor/cuda/cuda_status.h:22:10: fatal error: 'third_party/gpus/cuda/include/cuda.h' file not found

johnnynunez · 2024-09-09T12:55:07Z

external/xla/xla/stream_executor/cuda/cuda_status.h:22:10: fatal error: 'third_party/gpus/cuda/include/cuda.h' file not found
#include "third_party/gpus/cuda/include/cuda.h"

python3 build/build.py --enable_cuda --cuda_compute_capabilities=sm_87 --bazel_options=--repo_env=LOCAL_CUDA_PATH="/usr/local/cuda-12.2" --bazel_options=--repo_env=LOCAL_CUDNN_PATH="/usr/lib/aarch64-linux-gnu"

System info (python version, jaxlib version, accelerator, etc.)

Jetson AGX Orin 22.04 cuda 12.2

ybaturina · 2024-09-11T19:55:43Z

Hi @johnnynunez, it was an architectural decision to define CUDA_VERSION explicitly instead of looking it up implicitly, as the old non-hermetic CUDA rules did.

So, if you are using a local source of CUDA/CUDNN redistributions (which is not recommended), you still need to pass the correct HERMETIC_CUDA_VERSION and HERMETIC_CUDNN_VERSION in the parameters of the Python script.

Also, please make sure that the folder structure for CUDA, CUDNN and NCCL is exactly as described in the instructions. It matches the structure of the redistributions that can be downloaded from the NVIDIA site.

If you absolutely need a repository rule to discover the locally installed CUDA version, you can use the deprecated method documented here.
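As a rough illustration of passing the versions explicitly alongside a local CUDA path, the sketch below assembles the build.py arguments in Python. The helper names are hypothetical. The flag shape follows the `--bazel_options=--repo_env=NAME=value` pattern from the command earlier in this thread, and the version values must match what is actually installed.

```python
import shlex


def repo_env_flags(settings):
    """Turn {"NAME": "value"} into --bazel_options=--repo_env=NAME=value flags."""
    return [f"--bazel_options=--repo_env={name}={value}" for name, value in settings.items()]


def hermetic_version_flags(local_cuda_path, cuda_version, cudnn_version):
    """Flags for a local CUDA install with the versions stated explicitly."""
    return repo_env_flags({
        "LOCAL_CUDA_PATH": local_cuda_path,
        "HERMETIC_CUDA_VERSION": cuda_version,
        "HERMETIC_CUDNN_VERSION": cudnn_version,
    })


def build_command(flags):
    """Render a shell-safe command line for build/build.py with the given flags."""
    return shlex.join(["python3", "build/build.py", "--enable_cuda", *flags])
```

The point of the sketch is only that the version is supplied by the caller and never discovered from the files under LOCAL_CUDA_PATH.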

johnnynunez · 2024-09-11T20:07:52Z

> Hi @johnnynunez, it was an architectural decision to define CUDA_VERSION explicitly instead of looking it up implicitly, as the old non-hermetic CUDA rules did.

Look inside the JSON files:
SBSA is for arm64 servers.
Tegra is for edge devices.

Both are aarch64.
#16905

> So, if you are using a local source of CUDA/CUDNN redistributions (which is not recommended), you still need to pass the correct HERMETIC_CUDA_VERSION and HERMETIC_CUDNN_VERSION in the parameters of the Python script.
>
> Also, please make sure that the folder structure for CUDA, CUDNN and NCCL is exactly as described in the instructions. It matches the structure of the redistributions that can be downloaded from the NVIDIA site.
>
> If you absolutely need a repository rule to discover the locally installed CUDA version, you can use the deprecated method documented here.

Yes, I totally agree. But you, or rather XLA, make the mistake of treating SBSA as AARCH64, while Jetson is a Tegra chip that also uses aarch64 but with different packages.

ybaturina · 2024-09-11T20:24:59Z

> Yes, I totally agree. But you, or rather XLA, make the mistake of treating SBSA as AARCH64, while Jetson is a Tegra chip that also uses aarch64 but with different packages.

Thank you for the clarification, I understand the issue now.
I asked about linux-aarch64 packages a while ago and was told that I could use linux-sbsa instead. I also noticed that linux-sbsa had newer versions than linux-aarch64.
Is there any other indication that a Jetson platform is being used, apart from the environment variable JETSON_PLATFORM?

johnnynunez · 2024-09-11T21:09:41Z

> Thank you for the clarification, I understand the issue now.
> I asked about linux-aarch64 packages a while ago and was told that I could use linux-sbsa instead. I also noticed that linux-sbsa had newer versions than linux-aarch64.
> Is there any other indication that a Jetson platform is being used, apart from the environment variable JETSON_PLATFORM?

Hello,
Jetson now has state-of-the-art packages; it is like a PC with an RTX GPU. They are moving fast because Jetson Thor, based on Blackwell, is also coming at the end of the year.

Is there a JETSON_PLATFORM variable?
I ask because I didn't see it in the list of packages.

I've tried to tell the platforms apart by getting the board ID, like jetson-containers does: https://github.com/dusty-nv/jetson-containers/blob/master/jetson_containers/l4t_version.py

Jetson doesn't have NCCL.
Jetson has:
CUDA 12.6.1
cuDNN 9.4.0
TensorRT 10.4.0

example:

ybaturina · 2024-09-11T22:12:08Z

Hi @johnnynunez, can we use the L4T_VERSION environment variable to determine whether linux-aarch64 packages should be downloaded instead of linux-sbsa?

johnnynunez · 2024-09-11T22:31:13Z

> Hi @johnnynunez, can we use the L4T_VERSION environment variable to determine whether linux-aarch64 packages should be downloaded instead of linux-sbsa?

My idea was to detect it automatically:
https://github.com/openxla/xla/pull/16905/files

ybaturina · 2024-09-11T22:42:48Z

Do you mean the line is_jetson = repository_ctx.os.environ.get("JETSON_PLATFORM", None)? Is there a guarantee that the JETSON_PLATFORM environment variable is always present in such builds?
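For discussion, here is a small Python sketch of the selection logic being debated, not the code from the pull request. It checks both JETSON_PLATFORM and L4T_VERSION, since neither is known to be guaranteed. The optional `marker_file` argument is a hypothetical extra check for cases where no variable is set.

```python
import os

# Environment variables mentioned in this thread; neither is confirmed to be always set.
JETSON_ENV_VARS = ("JETSON_PLATFORM", "L4T_VERSION")


def looks_like_jetson(environ=None, marker_file=None):
    """Best-effort guess at whether the build host is a Jetson (Tegra) device."""
    env = os.environ if environ is None else environ
    if any(env.get(name) for name in JETSON_ENV_VARS):
        return True
    # Hypothetical fallback: a file that only exists on Jetson system images.
    return marker_file is not None and os.path.exists(marker_file)


def aarch64_redist_dir(environ=None, marker_file=None):
    """Pick the redistribution directory name for an aarch64 host."""
    return "linux-aarch64" if looks_like_jetson(environ, marker_file) else "linux-sbsa"
```

Passing the environment in as an argument keeps the decision easy to test, and it mirrors how a repository rule reads repository_ctx.os.environ.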

ybaturina · 2024-09-12T18:21:50Z

I've posted a workaround here.

tchatow added a commit to tchatow/xla that referenced this issue Sep 6, 2024

Symlink hermetic cuda headers to permit clang cuda version detection

6d2b56a

Fixes openxla#16877

tchatow linked a pull request Sep 6, 2024 that will close this issue

Symlink hermetic cuda headers to permit clang cuda version detection #16882

Open

johnnynunez mentioned this issue Sep 11, 2024

Error compile: --bazel_options=--repo_env=LOCAL_CUDA_PATH="${CUDA_HOME}" issues with clang & gcc jax-ml/jax#23575

Open

tchatow added a commit to tchatow/xla that referenced this issue Sep 14, 2024

Symlink hermetic cuda headers to permit clang cuda version detection

4643e2d

Fixes openxla#16877
