Refactor FP16 softmax buffer size into testable helper; replace expensive regression test by Copilot · Pull Request #27829 · microsoft/onnxruntime

Copilot · 2026-03-24T18:47:27Z

Description

Addresses reviewer feedback on #27822: the regression test for the FP16 softmax integer overflow was allocating multi-GB buffers and running a ~2.1B-element attention computation, making it unsuitable for CI.

attention.h — extract detail::Fp16SoftmaxTempBufferBytes:

New inline helper centralizes the SafeInt<size_t>(n) * d * sizeof(float) computation (the exact fix for the overflow bug) in a single, directly testable unit

attention.cc — ComputeAttentionSoftmaxInplace<MLFloat16>:

Calls detail::Fp16SoftmaxTempBufferBytes(N, D) for the allocation size instead of an inline SafeInt expression

attention_op_test.cc — replace AttentionCpuFp16SoftmaxLargeDimensions:

New test AttentionCpuFp16SoftmaxBufferSizeNoOverflow calls the helper directly with N=D=46341 (the same dimensions that triggered the original int32 overflow) and asserts the result equals N * D * sizeof(float). It allocates no tensors and executes no op, so it runs in microseconds
Removes the 16 GB RAM guard and system_info.h dependency
On 32-bit builds (sizeof(void*) < 8), asserts that SafeInt correctly throws on size_t overflow
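The helper and its regression check can be sketched in standard C++ as follows. This is a minimal illustration, not the code from the PR: the actual helper uses SafeInt<size_t>, whereas the `CheckedMul` function here is a hypothetical hand-written stand-in, and the `sketch` namespace, the `size_t` parameter types, and the `BufferSizeNoOverflow` function are assumptions made for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>

namespace sketch {

// Hypothetical stand-in for SafeInt<size_t> multiplication:
// throws on overflow instead of silently wrapping.
inline std::size_t CheckedMul(std::size_t a, std::size_t b) {
  if (a != 0 && b > std::numeric_limits<std::size_t>::max() / a) {
    throw std::overflow_error("size_t multiplication overflow");
  }
  return a * b;
}

// Plays the role of detail::Fp16SoftmaxTempBufferBytes: the number of bytes
// needed for an n x d float buffer. Parameter types are assumed.
inline std::size_t Fp16SoftmaxTempBufferBytes(std::size_t n, std::size_t d) {
  return CheckedMul(CheckedMul(n, d), sizeof(float));
}

// Mirrors the intent of the regression test: no tensors, no op execution.
// Returns true when the helper behaves as expected on this platform.
inline bool BufferSizeNoOverflow() {
  const std::size_t n = 46341;
  const std::size_t d = 46341;
  if (sizeof(void*) >= 8) {
    // 64-bit build: the byte count fits in size_t and must be exact.
    const std::uint64_t expected =
        static_cast<std::uint64_t>(n) * d * sizeof(float);
    return Fp16SoftmaxTempBufferBytes(n, d) == expected;
  }
  // 32-bit build: the byte count exceeds size_t, so the helper must throw.
  try {
    Fp16SoftmaxTempBufferBytes(n, d);
  } catch (const std::overflow_error&) {
    return true;
  }
  return false;
}

}  // namespace sketch
```

Because the check only multiplies three integers, it needs neither the 16 GB RAM guard nor any system information query.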

Motivation and Context

The original regression test exercised correctness of the full attention path at huge dimensions, but the real bug was a single integer overflow in the buffer size calculation. Isolating that calculation into a helper makes the regression test fast, reliable, and directly tied to the fixed code.
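To see why N=D=46341 is the relevant boundary, the element count can be compared against the int32 range. The sketch below is illustrative only; `ProductOverflowsInt32` is a hypothetical name, not a function in onnxruntime. It computes the product in 64 bits so that the check itself cannot overflow.

```cpp
#include <cstdint>
#include <limits>

// Returns true if n * d cannot be represented as a signed 32-bit integer.
// The product is evaluated in 64-bit arithmetic to avoid overflowing here.
inline bool ProductOverflowsInt32(std::int32_t n, std::int32_t d) {
  const std::int64_t product =
      static_cast<std::int64_t>(n) * static_cast<std::int64_t>(d);
  return product > std::numeric_limits<std::int32_t>::max() ||
         product < std::numeric_limits<std::int32_t>::min();
}
```

With these definitions, 46340 * 46340 = 2,147,395,600 still fits in int32 (maximum 2,147,483,647), while 46341 * 46341 = 2,147,488,281 does not, which matches the ~2.1B-element figure in the description.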

Original PR: #27822

Co-authored-by: edgchen1 <18449977+edgchen1@users.noreply.github.com> Agent-Logs-Url: https://github.com/microsoft/onnxruntime/sessions/9c8c83e9-86ca-4f72-bc68-d5f8d113160d

…etection comment Co-authored-by: edgchen1 <18449977+edgchen1@users.noreply.github.com> Agent-Logs-Url: https://github.com/microsoft/onnxruntime/sessions/9c8c83e9-86ca-4f72-bc68-d5f8d113160d

Initial plan

3c78b02

Copilot AI assigned Copilot and edgchen1 Mar 24, 2026

Copilot AI mentioned this pull request Mar 24, 2026

Fix CPU Attention overflow issue #27822

Merged

Copilot started work on behalf of edgchen1 March 24, 2026 18:47

Copilot AI and others added 2 commits March 24, 2026 18:54

Refactor FP16 softmax buffer size into lightweight testable helper

1854774

Co-authored-by: edgchen1 <18449977+edgchen1@users.noreply.github.com> Agent-Logs-Url: https://github.com/microsoft/onnxruntime/sessions/9c8c83e9-86ca-4f72-bc68-d5f8d113160d

Copilot AI changed the title ~~[WIP] [WIP] Address feedback on fix CPU attention overflow issue PR~~ Refactor FP16 softmax buffer size into testable helper; replace expensive regression test Mar 24, 2026

Copilot AI requested a review from edgchen1 March 24, 2026 19:00

Copilot finished work on behalf of edgchen1 March 24, 2026 19:00

edgchen1 closed this Mar 24, 2026

Copilot wants to merge 3 commits into edgchen1/fix_attention_softmax_overflow from copilot/sub-pr-27822
