CUDA: fix PDL CC check for JIT compilation #23471

Merged
JohannesGaessler merged 1 commit into ggml-org:master from JohannesGaessler:cuda-fix-pdl-cc on May 21, 2026

Conversation

@JohannesGaessler
Contributor

See the discussion starting at #22522 (comment).

On master, the PDL check currently compares the raw CC value against Hopper. Instead, the check should use ggml_cuda_highest_compiled_arch(cc), which avoids a mismatch between host and device code under JIT compilation.
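
As a rough illustration of the difference between the two checks, here is a host-only C++ sketch. It does not reproduce the ggml implementation: `highest_compiled_arch`, `CC_HOPPER`, `pdl_check_raw`, `pdl_check_compiled`, and the explicit list of compiled architectures are hypothetical stand-ins, and the sketch assumes the helper returns the highest compiled architecture that does not exceed the device's CC.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical constant: compute capability 9.0 (Hopper) encoded as 900.
constexpr int CC_HOPPER = 900;

// Simplified model (assumption): pick the highest architecture the binary was
// compiled for that does not exceed the device's compute capability.
// Returns 0 when no compiled architecture is usable.
inline int highest_compiled_arch(const std::vector<int> & compiled_archs, int cc) {
    assert(cc > 0);
    int best = 0;
    for (int arch : compiled_archs) {
        if (arch <= cc) {
            best = std::max(best, arch);
        }
    }
    return best;
}

// Check as described for master: looks only at the physical device.
inline bool pdl_check_raw(int cc) {
    return cc >= CC_HOPPER;
}

// Check as described in this PR: looks at the code that will actually run.
inline bool pdl_check_compiled(const std::vector<int> & compiled_archs, int cc) {
    return highest_compiled_arch(compiled_archs, cc) >= CC_HOPPER;
}
```

With a binary built only for 890 and run on a CC 1200 device, `pdl_check_raw(1200)` is true while `pdl_check_compiled({890}, 1200)` is false. In this model the host would enable PDL even though the JIT-compiled device code was generated for a pre-Hopper architecture.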

ORippler previously approved these changes May 21, 2026
ORippler dismissed their stale review May 21, 2026 14:08: "clicked a wrong button"

Collaborator

@ORippler ORippler left a comment

I've now had time to read up on ggml_cuda_highest_compiled_arch. Thanks for fixing this.

github-actions bot added the labels "Nvidia GPU" (issues specific to Nvidia GPUs) and "ggml" (changes relating to the ggml tensor library for machine learning) on May 21, 2026
@ORippler
Collaborator

ORippler commented May 21, 2026

@JohannesGaessler Just a heads-up: I think we will generally face the same issue for family/architecture-specific features that we currently have for mxfp4/nvfp4 acceleration on BW. There, compilation fails for sm_120 because host code has no way to distinguish family/arch-specific compilation, so we have to monkey-patch it to sm_120a in cmake. For that reason, it actually makes sense to compile for sm_90-virtual, as this will guarantee correct behavior for PDL.

To explain:

Should a user compile only for sm_90a, __CUDA_ARCH_LIST__ will include sm_90, yet the sm_90a PTX will not be forward-jittable. Our host code will then think we can dispatch to PDL on, say, sm_120, while the device code from sm_89 does not contain the required barriers.

I'll try to push again for exposing arch/family-specific compilation in __CUDA_ARCH_LIST__ on the host side (it is currently available only in __CUDA_ARCH__), or at least ask for a robust workaround.
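
The scenario described in this comment can be modelled with a small host-only C++ sketch. Everything in it is hypothetical and for illustration only (`Target`, `host_view`, `device_view`). It assumes, per the comment, that the host-visible list drops the `a` suffix, and that architecture-specific code such as sm_90a runs only on that exact architecture while regular PTX can be JIT-compiled for newer devices.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical description of one compilation target.
struct Target {
    int  arch;          // e.g. 890 for sm_89, 900 for sm_90 or sm_90a
    bool arch_specific; // true for an 'a' target such as sm_90a
};

// What host code sees (assumption): the suffix is dropped, so sm_90a
// looks like plain sm_90 and appears usable on any newer device.
inline int host_view(const std::vector<Target> & targets, int cc) {
    assert(cc > 0);
    int best = 0;
    for (const Target & t : targets) {
        if (t.arch <= cc) {
            best = std::max(best, t.arch);
        }
    }
    return best;
}

// What can actually run on the device: arch-specific code only on an exact
// match, regular targets on the same or any newer architecture via JIT.
inline int device_view(const std::vector<Target> & targets, int cc) {
    assert(cc > 0);
    int best = 0;
    for (const Target & t : targets) {
        const bool usable = t.arch_specific ? (t.arch == cc) : (t.arch <= cc);
        if (usable) {
            best = std::max(best, t.arch);
        }
    }
    return best;
}
```

For a build containing sm_89 and sm_90a running on a CC 1200 device, `host_view` yields 900 while `device_view` yields 890, so a host-side check against Hopper would pass even though the code that runs was generated for sm_89.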

@JohannesGaessler JohannesGaessler merged commit 4f0e43d into ggml-org:master May 21, 2026
47 checks passed
ProTekk pushed a commit to ProTekk/buun-llama-cpp that referenced this pull request May 22, 2026
Alex7MV pushed a commit to Alex7MV/claude_llama.cpp that referenced this pull request May 22, 2026
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request May 22, 2026
baramofme pushed a commit to baramofme/llama-cpp-turboquant that referenced this pull request May 23, 2026
srossitto79 pushed a commit to srossitto79/llama.cpp that referenced this pull request May 23, 2026
fewtarius pushed a commit to fewtarius/llama.cpp that referenced this pull request May 30, 2026
turbo-tan pushed a commit to turbo-tan/llama.cpp-tq3 that referenced this pull request Jun 2, 2026