webgpu: Fix buffer overflow in BufferManager::Upload causing data corruption #27948
Merged
Conversation
BufferManager::Upload() used NormalizeBufferSize() (16-byte alignment) to determine both the staging buffer size and the CopyBufferToBuffer copy size. When the actual data size was not a multiple of 16, the extra padding bytes in the staging buffer were uninitialized, and CopyBufferToBuffer would copy those garbage bytes into the destination GPU buffer beyond the intended range.

This caused data corruption when external code (e.g., onnxruntime-genai) uploaded partial data to a pre-allocated static GPU buffer using ORT's CopyTensors API. For example, uploading 24 bytes (3 x int64) of attention mask data would copy 32 bytes (24 rounded up to a multiple of 16), writing 8 garbage bytes at positions 24-31 of the destination buffer and corrupting the 4th element.

This manifested as a "device lost" crash in FlashAttention when running LLM inference with graph capture enabled and odd prompt lengths (e.g., 1 or 3 tokens): the corrupted attention mask caused ReduceSum to produce wrong seqlen_k values, leading to out-of-bounds GPU memory access.

Fix:
- Use NormalizeCopySize() (4-byte alignment, the WebGPU minimum for CopyBufferToBuffer) instead of NormalizeBufferSize() (16-byte alignment) for both the staging buffer allocation and the copy command.
- Zero any padding bytes between the actual size and the copy size to prevent garbage from being written to the destination buffer.
- Apply the same 4-byte alignment fix to MemCpy() for consistency.
guschmue
previously approved these changes
Apr 6, 2026
fs-eire
reviewed
Apr 7, 2026
The zero-padding of trailing bytes when copy_size > size does not prevent dirty data in the destination buffer, since the destination may already have non-zero values in those positions that get overwritten by the aligned CopyBufferToBuffer. Replace with a comment documenting the issue and noting that a CopyBuffer + compute shader approach could fix it.
guschmue
approved these changes
Apr 8, 2026