vulkan: add UMA zero-copy async transfers and fix event_record deferred memcpy handling by neilopet · Pull Request #20018 · ggml-org/llama.cpp

neilopet · 2026-03-01T20:10:20Z

Problem:

On UMA, host-visible mapped buffers were still using staging in async transfer paths.
When the UMA zero-copy async path is used, deferred memcpy queues can be pending at event_record; without drain/preserve across context reset, copies can be dropped.

Changes:

Add UMA zero-copy async fast paths for host-visible mapped buffers in ggml_vk_buffer_write_2d_async and ggml_vk_buffer_read_2d_async.
In ggml_backend_vk_event_record, drain pending in_memcpys before submit and preserve/restore out_memcpys across compute-context reset.
Add regression tests in tests/test-vulkan-uma.cpp for round-trip, multiple event-record sequences, empty-queue path, and out-memcpy persistence.

Validation:

test-vulkan-uma: 4/4 pass.
test-backend-ops: baseline-matched RDNA f16 DIV failures only.
llama-bench local UMA:
- Qwen3.5-27B-Q4_K_M: pp512=152.75, tg128=11.23
- Fine-tuned private Qwen3-Coder-Next-Q4_K_M: pp512=460.61, tg128=42.69

Scope / consistency:

This builds on vulkan: improve partial offloading performance on AMD #19976 transfer-context changes and only adds deferred memcpy lifecycle handling for UMA zero-copy paths.
Preserves existing event lifecycle structure (ctx_end -> submit -> reset/recreate), adding only deferred-copy correctness handling.
UMA zero-copy path remains gated by (device->uma && HOST_VISIBLE).
No shader changes.
No changes to non-UMA transfer behavior.

Related: #16059, #18302, #18047

AI Disclosure: AI tools were used for review and test planning.

Skip staging buffers for HOST_VISIBLE mapped buffers on UMA devices by routing through deferred_memcpy in write_2d_async and read_2d_async. Fix event_record to drain in_memcpys before submit and preserve out_memcpys across context reset, preventing data loss during the set_tensor_async -> event_record -> synchronize model-loading pattern.

Four tests covering the UMA zero-copy deferred memcpy lifecycle: round-trip integrity, repeated event records, empty queues, and out_memcpy persistence through event_record context reset.

jeffbolznv

A change like this needs more clear before/after perf results.

jeffbolznv · 2026-03-02T03:26:10Z

        return true;
    }
+
+    // UMA zero-copy: destination is directly mapped, skip staging buffer


I don't think this is correct. When called from set_tensor_async, I think the operation needs to be ordered against other work submitted to the same backend. When called from set_tensor, we already have a CPU copy path so this would be redundant.

jeffbolznv · 2026-03-02T03:29:55Z

@@ -0,0 +1,343 @@
+// Regression tests for UMA zero-copy async buffer transfers.


FWIW, I do think we need more thorough tests of the set/get paths and ordering guarantees of the backend interfaces. But if we have 300 lines of AI-generated tests per PR nobody will be able to maintain it.

neilopet added 2 commits March 1, 2026 14:53

vulkan: add regression tests for UMA async event persistence

11c1b06

Four tests covering the UMA zero-copy deferred memcpy lifecycle: round-trip integrity, repeated event records, empty queues, and out_memcpy persistence through event_record context reset.

github-actions Bot added testing Everything test related Vulkan Issues specific to the Vulkan backend ggml changes relating to the ggml tensor library for machine learning labels Mar 1, 2026

neilopet changed the title ~~vulkan: add UMA zero-copy async path and fix event_record deferred memcpy handling~~ vulkan: add UMA zero-copy async transfers and fix event_record deferred memcpy handling Mar 1, 2026

neilopet marked this pull request as ready for review March 1, 2026 20:16

neilopet requested review from 0cc4m and ggerganov as code owners March 1, 2026 20:16

jeffbolznv requested changes Mar 2, 2026

View reviewed changes

jeffbolznv mentioned this pull request Apr 29, 2026

Optimize Vulkan buffer transfers on UMA (Unified Memory Architecture) devices #22462

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

vulkan: add UMA zero-copy async transfers and fix event_record deferred memcpy handling#20018

vulkan: add UMA zero-copy async transfers and fix event_record deferred memcpy handling#20018
neilopet wants to merge 2 commits into
ggml-org:masterfrom
neilopet:vulkan-uma-zero-copy

neilopet commented Mar 1, 2026 •

edited

Loading

Uh oh!

jeffbolznv left a comment

Uh oh!

jeffbolznv Mar 2, 2026

Uh oh!

jeffbolznv Mar 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		@@ -0,0 +1,343 @@
		// Regression tests for UMA zero-copy async buffer transfers.

Conversation

neilopet commented Mar 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jeffbolznv left a comment

Choose a reason for hiding this comment

Uh oh!

jeffbolznv Mar 2, 2026

Choose a reason for hiding this comment

Uh oh!

jeffbolznv Mar 2, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

neilopet commented Mar 1, 2026 •

edited

Loading