Removes PDL enrollment of launch_fattn kernels to fix bug on DGX Spark by aendk · Pull Request #23825 · ggml-org/llama.cpp

aendk · 2026-05-28T15:35:41Z

Overview

On DGX Spark, we saw spurious test failures when running test-backend-ops -o FLASH_ATTN_EXT with PDL (Programmatic Dependent Launch) enabled.
We identified an internal bug which caused a race condition in a kernel launched with launch_fattn().
For now, moving these kernels out of PDL enrollment fixes this bug in my testing.

Performance Impact

The negative performance impact is limited: I saw roughly 0.2% performance loss on both DGX Spark and RTX Pro 6000 for the models gpt-oss20B and qwen35moe 35B.A3B Q4_K.

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: YES, for debugging. Every line of code proposed here was manually checked and tested before commit.

@ORippler @ggerganov let me know if this fixes the bug on your setups.

am17an · 2026-05-28T15:58:28Z

Is the bug internal, i.e. does PDL itself have an issue, or was the placement of this particular instance wrong?

aendk · 2026-05-28T16:02:57Z

@am17an it is an internal PDL issue; otherwise the fix would have been to move ggml_cuda_pdl_sync() to the correct place.

In essence, a global load that comes after the barrier in the C++ source is moved ahead of the barrier in the bytecode during compilation, which causes an invalid read: the load can execute before the preceding kernel has finished writing the data.
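
To illustrate the failure mode, here is a minimal, self-contained sketch of the PDL pattern. It is not the launch_fattn code. It assumes that ggml_cuda_pdl_sync() behaves like CUDA's cudaGridDependencySynchronize(); the kernel names producer and consumer are hypothetical, and the sketch assumes a GPU with compute capability 9.0 or newer. In the source, the load of buf[i] is placed after the barrier. The bug described above corresponds to the compiled code issuing that load before the barrier, so the consumer may read data the producer has not written yet.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical producer: writes the buffer that the next kernel consumes.
__global__ void producer(float * buf, int n) {
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        buf[i] = 2.0f * i;
    }
    // No explicit trigger here: completion is signalled when the kernel exits.
}

// Hypothetical consumer, enrolled in PDL: its prologue may overlap with the
// tail of the producer, so nothing before the barrier may read producer output.
__global__ void consumer(const float * buf, float * out, int n) {
    const int i = blockIdx.x * blockDim.x + threadIdx.x; // index math only, no global reads
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 900
    cudaGridDependencySynchronize(); // barrier: wait for the producer's writes
#endif
    if (i < n) {
        out[i] = buf[i] + 1.0f; // this load must stay after the barrier
    }
}

int main() {
    const int n = 1 << 20;
    float * buf = nullptr;
    float * out = nullptr;
    cudaMalloc(&buf, n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    const unsigned int block = 256;
    const unsigned int grid  = (n + block - 1) / block;

    // Regular launch of the producer.
    producer<<<grid, block, 0, stream>>>(buf, n);

    // Enroll the consumer in PDL via a launch attribute.
    cudaLaunchAttribute attr = {};
    attr.id = cudaLaunchAttributeProgrammaticStreamSerialization;
    attr.val.programmaticStreamSerializationAllowed = 1;

    cudaLaunchConfig_t cfg = {};
    cfg.gridDim          = dim3(grid);
    cfg.blockDim         = dim3(block);
    cfg.dynamicSmemBytes = 0;
    cfg.stream           = stream;
    cfg.attrs            = &attr;
    cfg.numAttrs         = 1;

    cudaError_t err = cudaLaunchKernelEx(&cfg, consumer, (const float *) buf, out, n);
    if (err == cudaSuccess) {
        err = cudaStreamSynchronize(stream);
    }
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    float sample = 0.0f;
    cudaMemcpy(&sample, out + 3, sizeof(float), cudaMemcpyDeviceToHost);
    printf("out[3] = %f\n", sample);

    cudaFree(buf);
    cudaFree(out);
    cudaStreamDestroy(stream);
    return 0;
}
```

Removing a kernel from PDL enrollment, as this PR does for the launch_fattn kernels, would correspond here to launching consumer with the regular `<<<...>>>` syntax and no launch attribute, so that normal stream ordering guarantees the producer has finished before the consumer starts.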

am17an · 2026-05-28T16:05:27Z

Are we sure this bug wouldn't affect other placements? FLASH_ATTN_EXT has quite an extensive suite of shapes that exercises a lot of paths, whereas the tests for other ops may be relatively sparse.

JohannesGaessler

Please also link this PR in the inline comment and document the affected compiler versions if possible.

JohannesGaessler · 2026-05-28T16:22:30Z

Is it known to which CUDA versions fixes for PDL will be backported? Right now we are enabling PDL by default for CUDA versions as old as 11.8, but if those remain unpatched we can't do that.

ORippler

Thanks for the fix!

ORippler · 2026-05-28T16:46:31Z

> Is it known to which CUDA versions fixes for PDL will be backported?

Generally, this depends on the severity of the issue that was fixed. Will let you know once we know more.

ggerganov

Works on my end now.

Btw, as of a few days ago, we have a DGX Spark doing some of the CUDA CI, so we have this covered continuously.

Regarding the concern about affecting other kernels: it's a valid concern, but I think it is worth keeping PDL enabled so we can surface such potential problems faster.

ggerganov · 2026-05-29T04:46:47Z

I've fast-tracked this to include it in the ggml and whisper.cpp releases.

* origin/master:
  vocab : support tokenizer for LFM2.5-8B-A1B (ggml-org#23826)
  graph : ensure DS32 kq_mask_lid is F32 (ggml-org#23864)
  server: remove obsolete scripts (ggml-org#23870)
  ci : update macos release to use macos-26 runner (ggml-org#23878)
  download: add option to skip_download (ggml-org#23059)
  mtmd: Add DeepSeekOCR 2 Support (ggml-org#20975)
  CUDA: Check PTX version on host side to guard PDL dispatch (ggml-org#23530)
  server: bump timeout to 3600s (ggml-org#23842)
  model : support for DeepseekV32ForCausalLM with generic DeepSeek Sparse Attention (DSA) implementation (ggml-org#23346)
  llama: use f16 mask for FA to save VRAM (ggml-org#23764)
  sync : ggml
  ggml : bump version to 0.13.1 (ggml/1523)
  ngram-mod : Add missing include (ggml-org#23857)
  llama: add llm_graph_input_mtp (ggml-org#23643)
  app : move licences to llama-app (ggml-org#23824)
  cuda : disables launch_fattn PDL enrollment due to compiler bug (ggml-org#23825)
  meta : Add missing `buffer` set in allreduce fallback !COMPUTE clear (ggml-org#23480)

…-org#23825)

Fix disables launch_fattn PDL enrollment due to compiler bug

8f4b95e

aendk requested a review from a team as a code owner May 28, 2026 15:35

github-actions bot added the labels "Nvidia GPU" (Issues specific to Nvidia GPUs) and "ggml" (changes relating to the ggml tensor library for machine learning) May 28, 2026

JohannesGaessler approved these changes May 28, 2026

ORippler approved these changes May 28, 2026

ggerganov approved these changes May 28, 2026

ggerganov merged commit 241cbd4 into ggml-org:master May 29, 2026
28 checks passed

ggerganov mentioned this pull request May 29, 2026

TP: fix granularity for Qwen 3.5/3.6 + 3 GPUs #23843

Merged

fewtarius pushed a commit to fewtarius/llama.cpp that referenced this pull request May 30, 2026

cuda : disables launch_fattn PDL enrollment due to compiler bug (ggml…

d7f2639

…-org#23825)

turbo-tan pushed a commit to turbo-tan/llama.cpp-tq3 that referenced this pull request Jun 2, 2026

cuda : disables launch_fattn PDL enrollment due to compiler bug (ggml…

e7f7c2f

…-org#23825)

aendk mentioned this pull request Jun 2, 2026

Avoid PDL race conditions by disabling __restrict__ when PDL is used #24030

Merged

ggerganov merged 1 commit into ggml-org:master from aendk:akieslinger/pdl-fattn-fix
