vulkan: async and event fixes by 0cc4m · Pull Request #20518 · ggml-org/llama.cpp

0cc4m · 2026-03-13T15:53:04Z

I also noticed incoherent output with my multi-GPU setup when investigating issues like #20462. I found that it can be fixed by disabling cpy_tensor_async, so the problem is in the async path. I narrowed it down to these problems:

Events were set, but the wait command was never submitted to the queue, so the event_wait function didn't do anything.
Events were resetting command buffers that had long since been reused, because reuse wasn't tracked. This caused validation errors and perhaps driver issues or crashes.
There was a race condition between an event being set, being waited on within the GPU, and being reset by the host. The only reason this didn't lead to deadlocks is that event_wait was not working. I fixed it by doing the event reset in the queue instead of outside of it, so that it happens immediately before the next set command. To avoid the same issue with the fence, I replaced it with a timeline semaphore, which does not need manual resets by the host.

This may help with #20462, #20029, #20517 and maybe some more.
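To illustrate why a timeline semaphore avoids the host-side reset that a fence needs, here is a minimal host-only sketch. It does not call Vulkan and is not taken from this PR; `FenceModel` and `TimelineModel` are hypothetical names introduced for illustration, and they model only the state each primitive exposes to the host.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a binary fence: one bit of state that the host
// has to reset before the fence can be reused for another submission.
struct FenceModel {
    bool signaled = false;

    void signal() { signaled = true; }   // stands in for the GPU finishing a submission
    void reset()  { signaled = false; }  // stands in for the manual host-side reset
    bool is_done() const { return signaled; }
};

// Hypothetical model of a timeline semaphore: a counter that only grows.
// Every synchronization point gets its own target value, so nothing is reset.
struct TimelineModel {
    uint64_t counter = 0;      // highest value signaled so far
    uint64_t last_target = 0;  // last value handed out to a submission

    // Reserve the value that the next submission will signal.
    uint64_t next_target() { return ++last_target; }

    // Stands in for the GPU signaling the semaphore; values must increase.
    void signal(uint64_t value) {
        assert(value > counter);
        counter = value;
    }

    // A host wait on `target` is satisfied once the counter has reached it.
    bool is_done(uint64_t target) const { return counter >= target; }
};
```

With the fence model, a completed signal stays visible until `reset()` is called, so a missed or mistimed reset makes the next submission look finished. With the timeline model, each submission is given its own target from `next_target()`, and `is_done(target)` stays false until that specific value has been signaled.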

0cc4m · 2026-03-13T16:24:25Z

I'm not sure about resetting the event asynchronously right before setting it. It does seem to work, but maybe we need multiple events instead, to avoid reuse until a full queue synchronization happens.

jeffbolznv · 2026-03-13T17:29:59Z

I don't think the reset right before the set will be safe. This was one of the things that made VkEvents very hard to use.

0cc4m · 2026-03-13T17:32:22Z

Probably not, yeah. I'll take another look tomorrow and figure out something better, I guess we need multiple events. I thought about using the timeline semaphore for synchronization within the queue as well, but I think that would be heavier than events and isn't really what they are meant for.

0cc4m · 2026-03-14T08:53:29Z

@jeffbolznv I got something that should work, now. I'm not sure if it could be done in a simpler way, but this was the best I found that still actually reuses without a manual synchronization step anywhere. Without reuse we might get into trouble with too many events during loading, similar to command buffers.

jeffbolznv · 2026-03-14T15:04:23Z

The recycling of events seems to assume that they will only be waited on once. Is that a valid assumption?

0cc4m · 2026-03-14T15:07:07Z

No, they do get waited on multiple times. But the assumption is that after a new event is recorded, the previous one cannot be waited on any longer. So when the new event gets synchronized (cpu-waited), all event waits of the previous one are also done, because they were submitted into the queue before the new event got set.

jeffbolznv · 2026-03-14T17:06:20Z

ggml/src/ggml-vulkan/ggml-vulkan.cpp

+        vkev->events_submitted.insert(vkev->events_submitted.end(), vkev->events_pending.begin(), vkev->events_pending.end());
+        vkev->events_pending.clear();
+        // Move existing event into pending
+        vkev->events_pending.push_back(vkev->event);


I'm not fully following the logic here. The rest of this function will get an event and submit it. Why doesn't that immediately go into events_submitted? And can there ever be more than one pending event?

0cc4m · 2026-03-15

Pending and submitted are meant in relation to wait commands in the queue, not set commands. So the flow is like this, for example:

- Event 1 recorded
- Waits for event 1
- Event 2 recorded, event 1 goes into pending. It is no longer possible to wait for event 1
- Waits for event 2
- Event 3 recorded, event 2 goes into pending. Event 1 goes into submitted
- Waits for event 3
- Synchronize event 3, events 2 and 1 can now be reused because all waits were before event 3

or

- Event 1 recorded
- Waits for event 1
- Synchronize event 1, wait commands for event 1 may still be in the queue, so no reuse yet
- Event 2 recorded, event 1 moves into pending
- Waits for event 2
- Synchronize event 2, event 1 can now be reused

But I think you're right. The pending stage isn't needed. A synchronization means the queue has reached the set command of the event being synchronized, so all waits for previous events must also be done.
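The simplified scheme without a pending stage can be sketched as a small host-side bookkeeping model. This is only an illustration of the reasoning above, not the code from ggml-vulkan.cpp: `EventPoolModel`, `EventId`, and the member names are hypothetical, events are plain integers instead of Vulkan handles, and the model assumes that waits are only ever recorded against the current event.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an event handle; a real backend would hold a Vulkan event.
using EventId = int;

struct EventPoolModel {
    std::vector<EventId> free_events;       // safe to reuse
    std::vector<EventId> submitted_events;  // replaced, but waits on them may still be queued
    EventId current = -1;                   // most recently recorded event, -1 if none
    int created = 0;                        // number of distinct events ever created

    // Record a new "set": retire the current event, then take a recycled
    // event if one is available, otherwise create a new one.
    // Note: a real implementation would presumably also have to reset a
    // recycled event before setting it again; that step is not modeled here.
    EventId record() {
        if (current != -1) {
            submitted_events.push_back(current);
        }
        if (!free_events.empty()) {
            current = free_events.back();
            free_events.pop_back();
        } else {
            current = created++;
        }
        return current;
    }

    // Host-side synchronize on the current event. Reaching its set command
    // implies every wait on an earlier event was queued before it, so all
    // retired events become reusable. The current event itself is not freed,
    // because waits on it may still be in the queue.
    void synchronize() {
        assert(current != -1);
        free_events.insert(free_events.end(),
                           submitted_events.begin(), submitted_events.end());
        submitted_events.clear();
    }
};
```

Replaying the two flows from the comment against this model gives the same outcome: after recording three events and synchronizing the third, the first two are free; after recording one event and synchronizing it, nothing is free until a second event has been recorded and synchronized.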

HumerousGorgon · 2026-03-15T17:18:26Z

Hi @0cc4m,

I can confirm that this PR definitely fixes the problems I'm having with inference on Intel GPUs.
Thank you for this!

sinister-cat · 2026-03-17T01:42:38Z

I tried merging this into master locally. Everything builds fine, but the assert fires during the warmup phase with Qwen3.5. I was able to use the exact same command on master with no immediate issues.

log.txt

Commenting out the assert causes a segfault, and --no-warmup also causes a segfault. Any ideas why this could be?

0cc4m · 2026-03-17T06:00:30Z

@jeffbolznv I guess that answers the "valid use" question. I'll revert the assertion.

0cc4m added 4 commits March 13, 2026 13:40

vulkan: fix event wait submission, event command buffer reset

58deae1

fix event command buffer reset validation error

c0d100e

also reset command buffers before reuse

2204bce

use timeline semaphores instead of fences for event_synchronize

08a4ba6

0cc4m requested a review from jeffbolznv March 13, 2026 15:53

github-actions bot added the Vulkan (issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) labels Mar 13, 2026

don't use initializer list for semaphore wait info

eebf21c

0cc4m mentioned this pull request Mar 13, 2026

Eval bug: Vulkan throws vk::DeviceLostError on Qwen3.5 35B A3B #20462

Closed

0cc4m added 2 commits March 14, 2026 06:39

use multiple events to avoid reset issues

4374b5a

fix event reuse issue with multiple vectors

9adf514

fkroener mentioned this pull request Mar 14, 2026

Eval bug: AMD Vulkan vk::DeviceLostError crash sensitive to ubatch-size and context length #20515

Open

add semaphore wait condition also if compute_ctx already exists

d287bbb

olliewalsh mentioned this pull request Mar 14, 2026

Update default llama.cpp commit hash to b8323. containers/ramalama#2515

Closed

jeffbolznv reviewed Mar 14, 2026

remove event pending stage

a338a1e

0cc4m mentioned this pull request Mar 15, 2026

Eval bug: Vulkan branch produces gibberish using multi-IntelGPU setup (3 x A770) past build 8183. #20097

Open

jeffbolznv reviewed Mar 15, 2026

0cc4m mentioned this pull request Mar 16, 2026

Eval bug: garbage output for Qwen3.5-27B on Vulkan since b8184 #20610

Closed

assert that event is valid in event_wait instead of skipping if it isn't

5825d0b

0cc4m requested a review from a team as a code owner March 16, 2026 14:20

dpmm99 mentioned this pull request Mar 16, 2026

Eval bug: Qwen3.5 27B output incoherent when running on rx 7900 xtx + rtx 4080 super with vulkan backend #20651

Closed

jeffbolznv approved these changes Mar 17, 2026

loci-dev mentioned this pull request Mar 17, 2026

UPSTREAM PR #20518: vulkan: async and event fixes auroralabs-loci/llama.cpp#1261

Open

Revert "assert that event is valid in event_wait instead of skipping if it isn't"

This reverts commit 5825d0b.

ec12541

jeffbolznv approved these changes Mar 17, 2026

0cc4m merged commit 3a5cb62 into master Mar 17, 2026
49 checks passed

0cc4m deleted the 0cc4m/vulkan-async-fixes branch March 17, 2026 13:27
