CUDA: fix PDL CC check for JIT compilation by JohannesGaessler · Pull Request #23471 · ggml-org/llama.cpp

JohannesGaessler · 2026-05-21T09:21:51Z

See the discussion starting at #22522 (comment).

On master we currently compare the raw CC value against Hopper to decide whether PDL can be used. We should instead use ggml_cuda_highest_compiled_arch(cc) for the check, which avoids a mismatch between host and device code when the device code is JIT-compiled.
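The difference between the two checks can be sketched in plain C++. This is a minimal illustration, not the actual ggml code: `highest_compiled_arch` is a hypothetical stand-in for ggml_cuda_highest_compiled_arch(cc) that takes the list of compiled architectures as an explicit argument, and `CC_HOPPER` as well as the `100*major + 10*minor` encoding of compute capabilities are assumptions made for the example.

```cpp
#include <cassert>
#include <vector>

// Assumption for this sketch: compute capability is encoded as
// 100*major + 10*minor, so Hopper (sm_90) is 900 and sm_120 is 1200.
constexpr int CC_HOPPER = 900;

// Hypothetical stand-in for ggml_cuda_highest_compiled_arch(cc):
// returns the highest compiled architecture that does not exceed the
// device's compute capability, or 0 if there is none.
inline int highest_compiled_arch(int cc, const std::vector<int> & compiled_archs) {
    int best = 0;
    for (int arch : compiled_archs) {
        if (arch <= cc && arch > best) {
            best = arch;
        }
    }
    return best;
}

// Check as described for master: looks only at the physical device.
inline bool pdl_supported_raw(int cc) {
    return cc >= CC_HOPPER;
}

// Check as described for this PR: looks at the code the device will actually run.
inline bool pdl_supported_compiled(int cc, const std::vector<int> & compiled_archs) {
    return highest_compiled_arch(cc, compiled_archs) >= CC_HOPPER;
}

// Example: a build that only contains code for sm_80 and sm_89,
// running on an sm_120 device via JIT compilation of the sm_89 PTX.
inline void example() {
    const std::vector<int> compiled_archs = {800, 890};
    const int cc = 1200;

    assert(pdl_supported_raw(cc));                        // host would try to use PDL
    assert(highest_compiled_arch(cc, compiled_archs) == 890);
    assert(!pdl_supported_compiled(cc, compiled_archs));  // device code predates Hopper
}
```

In this example the raw check enables PDL on the host even though the JIT-compiled device code was built for a pre-Hopper architecture, which is the host/device mismatch the description refers to.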

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: No

clicked a wrong button

ORippler

Had the time to read up on ggml_cuda_highest_compiled_arch now, thanks for fixing this

ORippler · 2026-05-21T18:40:24Z

@JohannesGaessler Just a heads-up: I think we will generally face the same issue for family/architecture-specific features that we currently have for mxfp4/nvfp4 acceleration on BW (Blackwell), i.e. compilation failing for sm_120 because host code has no way to distinguish family/arch-specific compilation, so we have to monkey-patch it to sm_120a in CMake. For this reason it actually makes sense to compile for sm_90-virtual, as this will guarantee correct behavior for PDL.

To explain:

Should a user compile only for sm_90a, __CUDA_ARCH_LIST__ will include sm_90, yet the sm_90a PTX will not be forward-jittable. Our host code will then think we can dispatch to PDL on, say, sm_120, while the device code from sm_89 does not contain the required barriers.

I'll try to push again for exposing arch/family-specific compilation in __CUDA_ARCH_LIST__ on the host side (it is currently only available via __CUDA_ARCH__), or at least ask for a robust workaround.
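The scenario above can be modelled with a small C++ sketch. It is a deliberately simplified model and not how the CUDA runtime or ggml actually selects code: the `Target` struct and both helper functions are hypothetical, compute capabilities are assumed to be encoded as `100*major + 10*minor`, and the only rule modelled is that arch-specific targets such as sm_90a run on exactly their own compute capability while generic targets can be JIT-compiled forward.

```cpp
#include <cassert>
#include <vector>

// Hypothetical description of one compilation target.
struct Target {
    int  arch;          // e.g. 890 for sm_89, 900 for sm_90 / sm_90a
    bool arch_specific; // true for "a" targets such as sm_90a
};

// What the host sees in this model: a list like __CUDA_ARCH_LIST__ that
// carries only the numeric value, without the arch-specific suffix.
inline int host_view_arch(int cc, const std::vector<Target> & targets) {
    int best = 0;
    for (const Target & t : targets) {
        if (t.arch <= cc && t.arch > best) {
            best = t.arch;
        }
    }
    return best;
}

// What the device can actually run in this model: arch-specific code only
// on an exact match, generic code on the same or any newer device.
inline int device_effective_arch(int cc, const std::vector<Target> & targets) {
    int best = 0;
    for (const Target & t : targets) {
        const bool usable = t.arch_specific ? (t.arch == cc) : (t.arch <= cc);
        if (usable && t.arch > best) {
            best = t.arch;
        }
    }
    return best;
}

// Example build for the discussion: generic sm_89 plus arch-specific sm_90a.
inline std::vector<Target> example_targets() {
    return { {890, false}, {900, true} };
}

inline void example() {
    const std::vector<Target> targets = example_targets();

    // On a Hopper device both views agree.
    assert(host_view_arch(900, targets) == 900);
    assert(device_effective_arch(900, targets) == 900);

    // On an sm_120 device the host believes Hopper code is available,
    // but only the sm_89 code can be JIT-compiled forward.
    assert(host_view_arch(1200, targets) == 900);
    assert(device_effective_arch(1200, targets) == 890);
}
```

Under these assumptions, a host-side check based on the numeric architecture list alone would enable PDL on sm_120 while the running device code comes from sm_89, which is the gap that exposing arch/family-specific information to host code would close.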

* origin/master:
  server: only parse empty msg if continuing an assistant msg (ggml-org#23506)
  perplexity : fix integer overflow (ggml-org#23496)
  SYCL: improve MoE prefill throughput (ggml-org#23142)
  sycl : Level Zero detection in ggml_sycl_init (ggml-org#23097)
  SYCL : gated_delta_net K>1 (ggml-org#23174)
  SYCL: add BF16 to DMMV kernel path (~4x tg speedup on Intel Arc) (ggml-org#21580)
  docs: Update documentation with Granite 4.0/4.1 (ggml-org#23404)
  ggml-zendnn : add Q8_0 quantization support (ggml-org#23414)
  cmake : build router app only during standalone builds (ggml-org#23521)
  vocab : fix HybridDNA tokenizer (ggml-org#23466)
  cmake : add install() for impl libraries + fix apple builds (ggml-org#23511)
  CUDA: fix PDL CC check for JIT compilation (ggml-org#23471)
  cmake : remove STATIC from impl libraries, enable LLAMA_BUILD_APP by default (ggml-org#23462)
  Update WebGPU support and add link to blog/demo (ggml-org#23483)
  vulkan: fuse snake activation (mul, sin, sqr, mul, add) (ggml-org#22855)

CUDA: fix PDL CC check for JIT compilation

38e77bf

JohannesGaessler requested a review from a team as a code owner May 21, 2026 09:21

JohannesGaessler requested a review from ORippler May 21, 2026 09:22

JohannesGaessler mentioned this pull request May 21, 2026

Programmatic Dependent Launch (PDL) for more performance on newer NVIDIA GPUs (Hopper+) #22522

Merged

gaugarg-nv approved these changes May 21, 2026

ORippler previously approved these changes May 21, 2026

ORippler approved these changes May 21, 2026

github-actions Bot added the "Nvidia GPU" (issues specific to Nvidia GPUs) and "ggml" (changes relating to the ggml tensor library for machine learning) labels May 21, 2026

JohannesGaessler merged commit 4f0e43d into ggml-org:master May 21, 2026
47 checks passed

ProTekk pushed a commit to ProTekk/buun-llama-cpp that referenced this pull request May 22, 2026

CUDA: fix PDL CC check for JIT compilation (ggml-org#23471)

e0c0479

ORippler mentioned this pull request May 22, 2026

CUDA: Check PTX version on host side to guard PDL dispatch #23530

Merged

Alex7MV pushed a commit to Alex7MV/claude_llama.cpp that referenced this pull request May 22, 2026

CUDA: fix PDL CC check for JIT compilation (ggml-org#23471)

16bf438

baramofme pushed a commit to baramofme/llama-cpp-turboquant that referenced this pull request May 23, 2026

CUDA: fix PDL CC check for JIT compilation (ggml-org#23471)

11f02d3

srossitto79 pushed a commit to srossitto79/llama.cpp that referenced this pull request May 23, 2026

CUDA: fix PDL CC check for JIT compilation (ggml-org#23471)

d535d84

fewtarius pushed a commit to fewtarius/llama.cpp that referenced this pull request May 30, 2026

CUDA: fix PDL CC check for JIT compilation (ggml-org#23471)

3f4d6b8

turbo-tan pushed a commit to turbo-tan/llama.cpp-tq3 that referenced this pull request Jun 2, 2026

CUDA: fix PDL CC check for JIT compilation (ggml-org#23471)

c20567f

JohannesGaessler merged 1 commit into ggml-org:master from JohannesGaessler:cuda-fix-pdl-cc