Conversation
|
I'm sure about the event reset asynchronously before setting it. It does seem to work, but maybe we need multiple events instead, to avoid reuse until a full queue synchronization happens. |
|
I don't think the reset right before the set will be safe, this was one of the things that made vkevents very hard to use. |
|
Probably not, yeah. I'll take another look tomorrow and figure out something better, I guess we need multiple events. I thought about using the timeline semaphore for synchronization within the queue as well, but I think that would be heavier than events and isn't really what they are meant for. |
|
@jeffbolznv I got something that should work, now. I'm not sure if it could be done in a simpler way, but this was the best I found that still actually reuses without a manual synchronization step anywhere. Without reuse we might get into trouble with too many events during loading, similar to command buffers. |
|
The recycling of events seems to assume that they will only be waited on once. Is that a valid assumption? |
|
No, they do get waited on multiple times. But the assumption is that after a new event is recorded, the previous one cannot be waited on any longer. So when the new event gets synchronized (cpu-waited), all event waits of the previous one are also done, because they were submitted into the queue before the new event got set. |
ggml/src/ggml-vulkan/ggml-vulkan.cpp
Outdated
| vkev->events_submitted.insert(vkev->events_submitted.end(), vkev->events_pending.begin(), vkev->events_pending.end()); | ||
| vkev->events_pending.clear(); | ||
| // Move existing event into pending | ||
| vkev->events_pending.push_back(vkev->event); |
There was a problem hiding this comment.
I'm not fully following the logic here. The rest of this function will get an event and submit it. Why doesn't that immediately go into events_submitted? And can there ever be more than one pending event?
There was a problem hiding this comment.
Pending and submitted are meant in relation to wait commands in the queue, not set commands. So the flow is like this, for example:
Event 1 recorded
Waits for event 1
Event 2 recorded, event 1 goes into pending. It is no longer possible to wait for event 1
Waits for event 2
Event 3 recorded, event 2 goes into pending. Event 1 goes into submitted.
Waits for event 3
Synchronize event 3, event 2 and 1 can now be reused because all waits were before event 3.
or
Event 1 recorded
Waits for event 1
Synchronize event 1, wait commands for event 1 may still be in the queue, so no reuse yet
Event 2 recorded, event 1 moves into pending
Waits for event 2
Synchronize event 2, event 1 can now be reused
But I think you're right. The pending stage isn't needed. A synchronization means the queue has reached the set command of the event being synchronized, so all waits for previous events must also be done.
|
Hi @0cc4m, I can confirm that this PR definitely fixes the problems I'm having with inference on Intel GPUS. |
|
Tried merging this to master locally, everything builds fine but the assert throws during warmup the phase with qwen3.5? I was able to use the exact same command on master with no immediate issues. Commenting the assert causes a segfault, --no-warmup also causes segfault... any ideas on why this could be? |
|
@jeffbolznv I guess that answers the "valid use" question. I'll revert the assertion. |
…if it isn't" This reverts commit 5825d0b.
I noticed incoherence with my multi-GPU setup as well when investigating issues like #20462. I found that they can be fixed by disabling
cpy_tensor_async, so the problem is with the async path. I narrowed it down to these problems:event_waitfunction didn't do anythingevent_waitwas not working. I fixed it by doing the event reset in the queue instead of outside of it, that way it happens immediately before the next set. command. To avoid the same issue with the fence, I replaced it with a timeline semaphore, which does not need manual resets by the host.This may help with #20462, #20029, #20517 and maybe some more.