
Refactor FP16 softmax buffer size into testable helper; replace expensive regression test #27829

Closed
Copilot wants to merge 3 commits into edgchen1/fix_attention_softmax_overflow from copilot/sub-pr-27822

Conversation

Contributor

Copilot AI commented Mar 24, 2026

Description

Addresses reviewer feedback on #27822: the regression test for the FP16 softmax integer overflow was allocating multi-GB buffers and running a ~2.1B-element attention computation, making it unsuitable for CI.

attention.h — extract detail::Fp16SoftmaxTempBufferBytes:

  • New inline helper centralizes the SafeInt<size_t>(n) * d * sizeof(float) computation (the exact fix for the overflow bug) in a single, directly testable unit
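As a rough illustration, the extracted helper could look like the sketch below. The real version in attention.h uses `SafeInt<size_t>` from the SafeInt library; here the overflow check is inlined so the snippet is self-contained, and the exact signature in the PR may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>

namespace detail {
// Byte count of the FP16 softmax temp buffer: n * d floats, with the
// arithmetic done in size_t and checked for overflow (SafeInt stand-in).
inline size_t Fp16SoftmaxTempBufferBytes(size_t n, size_t d) {
  auto checked_mul = [](size_t a, size_t b) -> size_t {
    if (b != 0 && a > SIZE_MAX / b) {
      throw std::overflow_error("softmax temp buffer size overflows size_t");
    }
    return a * b;
  };
  // The original bug did this math in a 32-bit type, so n * d wrapped for
  // n = d = 46341 (product ~2.15e9 > INT32_MAX). Widening before the
  // multiply is the fix this helper centralizes.
  return checked_mul(checked_mul(n, d), sizeof(float));
}
}  // namespace detail
```

Because the computation now lives in one inline function, the call site in attention.cc and the regression test can both use it directly.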

attention.cc — ComputeAttentionSoftmaxInplace<MLFloat16>:

  • Calls detail::Fp16SoftmaxTempBufferBytes(N, D) for the allocation size instead of an inline SafeInt expression

attention_op_test.cc — replace AttentionCpuFp16SoftmaxLargeDimensions:

  • New test AttentionCpuFp16SoftmaxBufferSizeNoOverflow calls the helper directly with N=D=46341 (same dimensions that triggered the original int32 overflow) and asserts the result equals N * D * sizeof(float) — no tensor allocation, no op execution, runs in microseconds
  • Removes the 16 GB RAM guard and system_info.h dependency
  • On 32-bit builds (sizeof(void*) < 8), asserts that SafeInt correctly throws on size_t overflow

Motivation and Context

The original regression test exercised correctness of the full attention path at huge dimensions, but the real bug was a single integer overflow in the buffer size calculation. Isolating that calculation into a helper makes the regression test fast, reliable, and directly tied to the fixed code.

Original PR: #27822



Copilot AI and others added 2 commits March 24, 2026 18:54
Copilot AI changed the title [WIP] Address feedback on fix CPU attention overflow issue PR → Refactor FP16 softmax buffer size into testable helper; replace expensive regression test Mar 24, 2026
Copilot AI requested a review from edgchen1 March 24, 2026 19:00
@edgchen1 edgchen1 closed this Mar 24, 2026