Conversation
Pull request overview
This PR addresses an integer overflow risk in the CPU Attention kernel’s FP16 softmax path by switching the softmax dimension parameters to size_t and using SafeInt for element/byte-count computations, and adds a regression test to exercise the overflow scenario.
Changes:
- Update `ComputeAttentionSoftmaxInplace<MLFloat16>` to use `size_t` for `N`/`D` and `SafeInt` for `N*D`-derived allocation/conversion sizes.
- Add a new large-dimension CPU Attention regression test intended to catch the prior overflow behavior.
- Add a small test utility (`GetTotalPhysicalMemoryBytes`) used to skip the new test on low-memory machines.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| onnxruntime/core/providers/cpu/llm/attention.cc | Fix overflow-prone N*D sizing in FP16 softmax temp buffer allocation by using size_t + SafeInt. |
| onnxruntime/test/providers/cpu/llm/attention_op_test.cc | Add regression test covering large q_seq_len * kv_seq_len dimensions (and a RAM-based skip). |
| onnxruntime/test/util/include/system_info.h | Declare test utility for querying total physical RAM. |
| onnxruntime/test/util/system_info.cc | Implement total physical RAM query via WinAPI or sysconf. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Extract `ComputeAttentionSoftmaxInplace` into attention_softmax.h, changing parameters from `int` to `size_t` and using `SafeInt` for the `N*D` multiplication. Previously, `N*D` could overflow int32 when `q_sequence_length * total_sequence_length > INT_MAX`, causing an undersized buffer allocation.

Replace the old `Attention_FP16_SoftmaxLargeDimensions` operator-level test in attention_op_test.cc with a direct regression test in the new attention_softmax_test.cc. The new test uses a custom allocator to verify the correct allocation size without needing the ~8GB buffer, and handles both 64-bit (verifies size) and 32-bit (`SafeInt` overflow) builds.

Remove system_info.h/.cc (no longer needed).
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Description
Fix `int` overflow issue in `ComputeAttentionSoftmaxInplace<MLFloat16>()` by using `size_t` and `SafeInt` instead.
Motivation and Context
Fix overflow issue.