
Fix CPU Attention overflow issue #27822

Merged

edgchen1 merged 13 commits into main from edgchen1/fix_attention_softmax_overflow on Apr 2, 2026

Conversation

@edgchen1 (Contributor)

Description

Fix an int overflow issue in ComputeAttentionSoftmaxInplace<MLFloat16>() by using size_t and SafeInt for the size computations instead.

Motivation and Context

Fix an integer overflow issue in the CPU Attention operator's FP16 softmax path.
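To make the overflow concrete, here is a minimal standalone sketch of the wraparound; the dimension names and values are illustrative, not the actual onnxruntime code:

```cpp
#include <cstdint>
#include <iostream>

int main() {
  // Illustrative sizes: N = batch * num_heads * q_seq_len, D = kv_seq_len.
  const int N = 70000;
  const int D = 40000;

  // N * D = 2,800,000,000 > INT_MAX (2,147,483,647), so int arithmetic wraps.
  const int64_t wide = int64_t{N} * D;              // correct element count
  const auto wrapped = static_cast<int32_t>(wide);  // what a plain int N * D yields

  std::cout << "widened N * D: " << wide << "\n";     // 2800000000
  std::cout << "wrapped int:   " << wrapped << "\n";  // -1494967296
}
```

An allocation sized from the wrapped value would be far smaller than needed, which is the undersized-buffer failure this PR guards against.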

Copilot AI (Contributor) left a comment

Pull request overview

This PR addresses an integer overflow risk in the CPU Attention kernel's FP16 softmax path by switching the softmax dimension parameters to size_t and using SafeInt for element/byte-count computations. It also adds a regression test that exercises the overflow scenario.

Changes:

  • Update ComputeAttentionSoftmaxInplace<MLFloat16> to use size_t for N/D and SafeInt for N*D-derived allocation/conversion sizes (see the sketch after this list).
  • Add a new large-dimension CPU Attention regression test intended to catch the prior overflow behavior.
  • Add a small test utility (GetTotalPhysicalMemoryBytes) used to skip the new test on low-memory machines.
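
As a sketch of the guarded sizing, assuming the public SafeInt library that onnxruntime wraps in core/common/safeint.h (the helper below is hypothetical; the actual call sites may differ):

```cpp
#include <cstddef>
#include <vector>

#include "SafeInt.hpp"  // https://github.com/dcleblanc/SafeInt

// Hypothetical helper sizing a softmax temp buffer from size_t dimensions.
std::vector<float> AllocateSoftmaxTempBuffer(size_t N, size_t D) {
  // SafeInt throws on overflow instead of silently wrapping, so a large
  // N * D can never produce an undersized allocation.
  const size_t element_count = SafeInt<size_t>(N) * D;
  return std::vector<float>(element_count);
}
```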

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File descriptions:

  • onnxruntime/core/providers/cpu/llm/attention.cc: fix overflow-prone N*D sizing in the FP16 softmax temp buffer allocation by using size_t + SafeInt.
  • onnxruntime/test/providers/cpu/llm/attention_op_test.cc: add a regression test covering large q_seq_len * kv_seq_len dimensions (with a RAM-based skip).
  • onnxruntime/test/util/include/system_info.h: declare a test utility for querying total physical RAM.
  • onnxruntime/test/util/system_info.cc: implement the total physical RAM query via WinAPI or sysconf (see the sketch below).
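
For reference, a hedged sketch of what such a total-physical-RAM query typically looks like (the real system_info.cc may differ, and a later commit removes it again):

```cpp
#include <cstdint>

#if defined(_WIN32)
#include <Windows.h>

uint64_t GetTotalPhysicalMemoryBytes() {
  MEMORYSTATUSEX status{};
  status.dwLength = sizeof(status);
  if (!GlobalMemoryStatusEx(&status)) return 0;  // 0 signals "unknown"
  return status.ullTotalPhys;
}
#else
#include <unistd.h>

uint64_t GetTotalPhysicalMemoryBytes() {
  const long pages = sysconf(_SC_PHYS_PAGES);
  const long page_size = sysconf(_SC_PAGE_SIZE);
  if (pages < 0 || page_size < 0) return 0;  // sysconf failure
  return static_cast<uint64_t>(pages) * static_cast<uint64_t>(page_size);
}
#endif
```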


Comment thread on onnxruntime/test/providers/cpu/llm/attention_op_test.cc (outdated)
Comment thread on onnxruntime/test/util/system_info.cc (outdated)
Comment thread on onnxruntime/test/providers/cpu/llm/attention_op_test.cc (outdated)
Copilot AI (Contributor) commented Mar 24, 2026

@edgchen1 I've opened a new pull request, #27829, to work on those changes. Once the pull request is ready, I'll request review from you.

Extract ComputeAttentionSoftmaxInplace into attention_softmax.h, changing
parameters from int to size_t and using SafeInt for the N*D multiplication.
Previously, N*D could overflow int32 when q_sequence_length *
total_sequence_length > INT_MAX, causing an undersized buffer allocation.

Replace the old Attention_FP16_SoftmaxLargeDimensions operator-level test
in attention_op_test.cc with a direct regression test in the new
attention_softmax_test.cc. The new test uses a custom allocator to verify
the correct allocation size without needing the ~8GB buffer, and handles
both 64-bit (verifies size) and 32-bit (SafeInt overflow) builds.

Remove system_info.h/.cc (no longer needed).
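
A minimal sketch of the recording-allocator pattern described above; the class below is illustrative, and the real test and its assertions live in attention_softmax_test.cc:

```cpp
#include <cstddef>

// Hypothetical allocator that records the requested byte count instead of
// committing memory, so a test can assert on the size without ~8GB of RAM.
class RecordingAllocator {
 public:
  void* Alloc(size_t bytes) {
    last_requested_bytes_ = bytes;
    return nullptr;  // no real allocation; the test only inspects the size
  }
  size_t last_requested_bytes() const { return last_requested_bytes_; }

 private:
  size_t last_requested_bytes_ = 0;
};
```

On a 64-bit build the test can assert the recorded size matches the full N * D element count; on a 32-bit build, SafeInt should throw before Alloc is ever reached.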
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.



Comment thread on onnxruntime/core/providers/cpu/llm/attention_softmax.h
Comment thread on onnxruntime/test/providers/cpu/llm/attention_softmax_test.cc
edgchen1 and others added 2 commits March 24, 2026 14:07
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.



Comment thread on onnxruntime/test/providers/cpu/llm/attention_softmax_test.cc (outdated)
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.



Comment thread on onnxruntime/test/providers/cpu/llm/attention_softmax_test.cc
Comment thread on onnxruntime/core/providers/cpu/llm/attention_softmax.h
Comment thread on onnxruntime/test/providers/cpu/llm/attention_softmax_test.cc
edgchen1 marked this pull request as ready for review March 25, 2026 00:32
edgchen1 requested a review from Copilot March 25, 2026 00:32
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.



@titaiwangms (Contributor) left a comment

Thanks!


Labels: none yet
Projects: none yet

4 participants