
Fix CPU Attention overflow issue #27822

Merged

edgchen1 merged 13 commits into main from edgchen1/fix_attention_softmax_overflow on Apr 2, 2026

Conversation

@edgchen1 (Contributor)

Description

Fix an int overflow issue in ComputeAttentionSoftmaxInplace<MLFloat16>() by using size_t and SafeInt for the size computations instead.

Motivation and Context

Fix an integer overflow issue in the CPU Attention operator's FP16 softmax path.
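To make the overflow concrete, here is a minimal standalone sketch of the wraparound; the dimension names and values are illustrative, not the actual onnxruntime code:

```cpp
#include <cstdint>
#include <iostream>

int main() {
  // Illustrative sizes: N = batch * num_heads * q_seq_len, D = kv_seq_len.
  const int N = 70000;
  const int D = 40000;

  // N * D = 2,800,000,000 > INT_MAX (2,147,483,647), so int arithmetic wraps.
  const int64_t wide = int64_t{N} * D;              // correct element count
  const auto wrapped = static_cast<int32_t>(wide);  // what a plain int N * D yields

  std::cout << "widened N * D: " << wide << "\n";     // 2800000000
  std::cout << "wrapped int:   " << wrapped << "\n";  // -1494967296
}
```

An allocation sized from the wrapped value would be far smaller than needed, which is the undersized-buffer failure this PR guards against.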

Copilot AI (Contributor) left a comment

Pull request overview

This PR addresses an integer overflow risk in the CPU Attention kernel's FP16 softmax path by switching the softmax dimension parameters to size_t and using SafeInt for element/byte-count computations. It also adds a regression test that exercises the overflow scenario.

Changes:

  • Update ComputeAttentionSoftmaxInplace<MLFloat16> to use size_t for N/D and SafeInt for N*D-derived allocation/conversion sizes (see the sketch after this list).
  • Add a new large-dimension CPU Attention regression test intended to catch the prior overflow behavior.
  • Add a small test utility (GetTotalPhysicalMemoryBytes) used to skip the new test on low-memory machines.
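
As a sketch of the guarded sizing, assuming the public SafeInt library that onnxruntime wraps in core/common/safeint.h (the helper below is hypothetical; the actual call sites may differ):

```cpp
#include <cstddef>
#include <vector>

#include "SafeInt.hpp"  // https://github.com/dcleblanc/SafeInt

// Hypothetical helper sizing a softmax temp buffer from size_t dimensions.
std::vector<float> AllocateSoftmaxTempBuffer(size_t N, size_t D) {
  // SafeInt throws on overflow instead of silently wrapping, so a large
  // N * D can never produce an undersized allocation.
  const size_t element_count = SafeInt<size_t>(N) * D;
  return std::vector<float>(element_count);
}
```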

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File descriptions:

  • onnxruntime/core/providers/cpu/llm/attention.cc: fix overflow-prone N*D sizing in the FP16 softmax temp buffer allocation by using size_t + SafeInt.
  • onnxruntime/test/providers/cpu/llm/attention_op_test.cc: add a regression test covering large q_seq_len * kv_seq_len dimensions (with a RAM-based skip).
  • onnxruntime/test/util/include/system_info.h: declare a test utility for querying total physical RAM.
  • onnxruntime/test/util/system_info.cc: implement the total physical RAM query via WinAPI or sysconf (see the sketch below).
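
For reference, a hedged sketch of what such a total-physical-RAM query typically looks like (the real system_info.cc may differ, and a later commit removes it again):

```cpp
#include <cstdint>

#if defined(_WIN32)
#include <Windows.h>

uint64_t GetTotalPhysicalMemoryBytes() {
  MEMORYSTATUSEX status{};
  status.dwLength = sizeof(status);
  if (!GlobalMemoryStatusEx(&status)) return 0;  // 0 signals "unknown"
  return status.ullTotalPhys;
}
#else
#include <unistd.h>

uint64_t GetTotalPhysicalMemoryBytes() {
  const long pages = sysconf(_SC_PHYS_PAGES);
  const long page_size = sysconf(_SC_PAGE_SIZE);
  if (pages < 0 || page_size < 0) return 0;  // sysconf failure
  return static_cast<uint64_t>(pages) * static_cast<uint64_t>(page_size);
}
#endif
```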


Comment thread on onnxruntime/test/providers/cpu/llm/attention_op_test.cc (outdated)
Comment thread on onnxruntime/test/util/system_info.cc (outdated)
Comment thread on onnxruntime/test/providers/cpu/llm/attention_op_test.cc (outdated)
Copilot AI (Contributor) commented Mar 24, 2026

@edgchen1 I've opened a new pull request, #27829, to work on those changes. Once the pull request is ready, I'll request review from you.

Extract ComputeAttentionSoftmaxInplace into attention_softmax.h, changing
parameters from int to size_t and using SafeInt for the N*D multiplication.
Previously, N*D could overflow int32 when q_sequence_length *
total_sequence_length > INT_MAX, causing an undersized buffer allocation.

Replace the old Attention_FP16_SoftmaxLargeDimensions operator-level test
in attention_op_test.cc with a direct regression test in the new
attention_softmax_test.cc. The new test uses a custom allocator to verify
the correct allocation size without needing the ~8GB buffer, and handles
both 64-bit (verifies size) and 32-bit (SafeInt overflow) builds.

Remove system_info.h/.cc (no longer needed).
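
A minimal sketch of the recording-allocator pattern described above; the class below is illustrative, and the real test and its assertions live in attention_softmax_test.cc:

```cpp
#include <cstddef>

// Hypothetical allocator that records the requested byte count instead of
// committing memory, so a test can assert on the size without ~8GB of RAM.
class RecordingAllocator {
 public:
  void* Alloc(size_t bytes) {
    last_requested_bytes_ = bytes;
    return nullptr;  // no real allocation; the test only inspects the size
  }
  size_t last_requested_bytes() const { return last_requested_bytes_; }

 private:
  size_t last_requested_bytes_ = 0;
};
```

On a 64-bit build the test can assert the recorded size matches the full N * D element count; on a 32-bit build, SafeInt should throw before Alloc is ever reached.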
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.



Comment thread on onnxruntime/core/providers/cpu/llm/attention_softmax.h
Comment thread on onnxruntime/test/providers/cpu/llm/attention_softmax_test.cc
edgchen1 and others added 2 commits March 24, 2026 14:07
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.



Comment thread on onnxruntime/test/providers/cpu/llm/attention_softmax_test.cc (outdated)
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.



Comment thread on onnxruntime/test/providers/cpu/llm/attention_softmax_test.cc
Comment thread on onnxruntime/core/providers/cpu/llm/attention_softmax.h
Comment thread on onnxruntime/test/providers/cpu/llm/attention_softmax_test.cc
edgchen1 marked this pull request as ready for review March 25, 2026 00:32
edgchen1 requested a review from Copilot March 25, 2026 00:32
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.



@titaiwangms (Contributor) left a comment

Thanks!


Labels: none yet
Projects: none yet

4 participants