CUDA: Check PTX version on host side to guard PDL dispatch by ORippler · Pull Request #23530 · ggml-org/llama.cpp

ORippler · 2026-05-22T13:08:48Z

Overview

Checking __CUDA_ARCH_LIST__ alone is insufficient for JIT, because this macro does not distinguish between compiling for, say, sm_90, sm_90a, or sm_90f (that is, forward-JIT-able PTX vs. arch- or family-specific PTX).

As a result, compiling with
-DCMAKE_CUDA_ARCHITECTURES="89;90a" exposes a bug: the current code wrongly dispatches to PDL on sm_90/sm_120 in forward-JIT mode.

This PR fixes the issue by checking cudaFuncAttributes::ptxVersion of the incoming kernel at runtime. Checking ptxVersion alone is sufficient, because the device code version will always be >= ptxVersion (any violation of this would be a severe bug in CUDA/nvcc); see:
https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/#gpu-code-code-code
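
The host-side guard described in this overview can be sketched in plain C++. This is a minimal sketch under stated assumptions, not the code from this PR: `pdl_guard` is an illustrative name, and `ptx_version_query` is a hypothetical callback standing in for a call to `cudaFuncGetAttributes` followed by a read of `cudaFuncAttributes::ptxVersion`. Only the threshold of 90 and the idea of caching the result per kernel are taken from the verification diff in this description.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <utility>

// Hypothetical stand-in for cudaFuncGetAttributes(&attr, kernel) followed by
// reading attr.ptxVersion. Returns the PTX version of the kernel image that
// is loaded on `device`, e.g. 89 or 90.
using ptx_version_query = std::function<int(const void * kernel, int device)>;

// Illustrative guard (an assumption, not the PR's actual code): decides once
// per (kernel, device) pair whether PDL may be used and caches the answer.
class pdl_guard {
  public:
    explicit pdl_guard(ptx_version_query query) : query_(std::move(query)) {}

    bool can_use_pdl(const void * kernel, int device) {
        const auto key = std::make_pair(reinterpret_cast<std::uintptr_t>(kernel), device);
        const auto it  = cache_.find(key);
        if (it != cache_.end()) {
            return it->second;  // answered before, skip the attribute query
        }
        // PDL needs PTX built for Hopper or newer. A kernel forward-JIT'ed from
        // older PTX (e.g. compute_89) reports a lower version and must not opt in.
        const bool ok = query_(kernel, device) >= 90;
        cache_.emplace(key, ok);
        return ok;
    }

  private:
    ptx_version_query                              query_;
    std::map<std::pair<std::uintptr_t, int>, bool> cache_;
};
```

In a real build the callback would wrap the CUDA runtime call. Injecting it here keeps the decision logic testable on a machine without a GPU.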

Additional information

Follow-up to #23471 that implements a more complete fix.

To verify, apply

diff --git a/ggml/src/ggml-cuda/common.cuh b/ggml/src/ggml-cuda/common.cuh
index 20b1846c0..19dd53a72 100644
--- a/ggml/src/ggml-cuda/common.cuh
+++ b/ggml/src/ggml-cuda/common.cuh
@@ -1584,6 +1584,9 @@ static bool ggml_cuda_kernel_can_use_pdl(const void * kernel) {
     // We have to guard on a loaded kernel's PTX version so a kernel forward-JIT'ed
     // from pre-Hopper PTX to a Hopper-or-newer GPU does not opt into PDL.
     const bool can_use_pdl = attr.ptxVersion >= 90;
+    printf("Kernel %p on device %d has PTX version %d -> can_use_pdl = %d, highest compiled_arch = %d\n", kernel,
+           device, attr.ptxVersion, can_use_pdl,
+           ggml_cuda_highest_compiled_arch(ggml_cuda_info().devices[ggml_cuda_get_device()].cc));
     cache.emplace(key, can_use_pdl);
     return can_use_pdl;
 }

and compile with something like

cmake -S . -B build_test_ptx -G Ninja \
  -DLLAMA_FATAL_WARNINGS=ON \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_CUDA_ARCHITECTURES="89;90a" \
  -DGGML_CUDA=ON

Running test-backend-ops -o MUL_MAT on a CC > 9.0 GPU now correctly disables PDL and shows log lines like:
Kernel 0x7cf22ed84aa0 on device 0 has PTX version 89 -> can_use_pdl = 0, highest compiled_arch = 900

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: YES - paired with codex on this.

JohannesGaessler

The convention of the surrounding code is size_t vs. e.g. std::size_t but either way is fine I think.

ORippler requested a review from a team as a code owner May 22, 2026 13:08

ORippler requested a review from JohannesGaessler May 22, 2026 13:09

github-actions Bot added the "Nvidia GPU" and "ggml" labels May 22, 2026

JohannesGaessler reviewed May 24, 2026

Comment thread ggml/src/ggml-cuda/common.cuh Outdated

Implement MurmurHash3 mixer for better hash distribution

3144936

Magic constants were taken from boost: https://github.com/boostorg/container_hash/blob/2698b43803c012601e6bb1a6116e83767b97986c/include/boost/container_hash/detail/hash_mix.hpp#L19-L65
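
To illustrate what such a mixer looks like, here is a minimal sketch that uses the well-known 64-bit finalizer (`fmix64`) from the reference MurmurHash3 implementation. It is not the code from this commit: the constants in the PR come from the linked Boost file and may differ from the ones below, and `hash_kernel_device`, `kernel_device_key`, `kernel_device_hash`, and the seed value are hypothetical names and choices made for the example.

```cpp
#include <cstddef>
#include <cstdint>

// 64-bit finalizer from the reference MurmurHash3 implementation (fmix64).
// Each step is invertible, so distinct inputs give distinct outputs.
inline std::uint64_t fmix64(std::uint64_t x) {
    x ^= x >> 33;
    x *= 0xff51afd7ed558ccdULL;
    x ^= x >> 33;
    x *= 0xc4ceb9fe1a85ec53ULL;
    x ^= x >> 33;
    return x;
}

// Hypothetical helper: mixes a kernel pointer and a device index into one hash.
inline std::uint64_t hash_kernel_device(const void * kernel, int device) {
    // Arbitrary non-zero seed (an assumption for this sketch). fmix64(0) == 0,
    // so a zero seed combined with a zero input would hash to 0.
    const std::uint64_t seed = 0x9e3779b97f4a7c15ULL;
    std::uint64_t h = fmix64(seed ^ static_cast<std::uint64_t>(reinterpret_cast<std::uintptr_t>(kernel)));
    h = fmix64(h ^ static_cast<std::uint64_t>(static_cast<std::uint32_t>(device)));
    return h;
}

// Hypothetical key and hasher types, usable with std::unordered_map.
struct kernel_device_key {
    const void * kernel;
    int          device;

    bool operator==(const kernel_device_key & other) const {
        return kernel == other.kernel && device == other.device;
    }
};

struct kernel_device_hash {
    size_t operator()(const kernel_device_key & key) const {
        return static_cast<size_t>(hash_kernel_device(key.kernel, key.device));
    }
};
```

Kernel pointers tend to share alignment and high bits, so feeding them through a mixer spreads them more evenly over hash buckets than using the raw pointer value would.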

ORippler requested a review from JohannesGaessler May 27, 2026 09:08

JohannesGaessler reviewed May 27, 2026

Comment thread ggml/src/ggml-cuda/common.cuh Outdated

Comment thread ggml/src/ggml-cuda/common.cuh Outdated

JohannesGaessler reviewed May 27, 2026

Comment thread ggml/src/ggml-cuda/common.cuh Outdated

ORippler and others added 3 commits May 27, 2026 17:43

Update ggml/src/ggml-cuda/common.cuh

2c6d1f3

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

Address review comments, make seed non-zero

dc68658

Apply code-formatting

aaff9c2

ORippler requested a review from JohannesGaessler May 27, 2026 15:58

JohannesGaessler approved these changes May 27, 2026

Replace std::size_t -> size_t for consistency

864bac0

ORippler requested a review from gaugarg-nv May 28, 2026 08:44

gaugarg-nv approved these changes May 29, 2026

JohannesGaessler merged commit 6ed481e into ggml-org:master May 29, 2026
28 checks passed

ORippler deleted the osimons/ptx_pdl_checks_cuda branch May 29, 2026 11:50

JohannesGaessler merged 6 commits into ggml-org:master from ORippler:osimons/ptx_pdl_checks_cuda
