CUDA: fix PDL CC check for JIT compilation #23471
ORippler left a comment
I've now had time to read up on ggml_cuda_highest_compiled_arch; thanks for fixing this.
@JohannesGaessler Just a heads-up: I think we will generally face the same issue for family/architecture-specific features that we currently have for mxfp4/nvfp4 acceleration on Blackwell (BW). Compilation fails for sm_120 because host code has no way to distinguish family/arch-specific compilation, so we have to monkey-patch it to sm_120a in CMake. By the same reasoning, it actually makes sense to compile for … To explain: should a user compile only for sm_90a, … I'll try to push again for including arch/family-specific compilation into …
See the discussion starting at #22522 (comment).
On master, we currently check the raw CC value against Hopper to decide whether to use PDL (Programmatic Dependent Launch). With JIT compilation, however, the device code may have been compiled for an older architecture than the GPU's CC. Instead, we should use ggml_cuda_highest_compiled_arch(cc) for the check, to avoid a mismatch between host and device code.
Requirements