Fix -Warray-bounds build error in MLAS on clang 17+ #27499

Merged
tianleiwu merged 3 commits into main from copilot/fix-array-index-build-error on Mar 3, 2026
Contributor

Copilot AI commented Feb 28, 2026

Description

The -Warray-bounds suppression pragma in sqnbitgemm_kernel_avx2_int8_blklen32.h was gated on defined(HAS_ARRAY_BOUNDS), which is set in onnxruntime_config.h. MLAS never includes that header, so the guard was dead code and the pragma never fired.

Changed the guard to #ifdef __clang__:

// Before: HAS_ARRAY_BOUNDS never defined in MLAS TU
#if defined(__clang__) && defined(HAS_ARRAY_BOUNDS)

// After
#ifdef __clang__

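Note: __has_warning("-Warray-bounds") was considered, but the preprocessor must parse the whole #if expression before evaluating it, so short-circuiting && does not help: GCC does not recognize __has_warning and fails to parse the call even behind defined(__clang__).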
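The difference between the two guards can be sketched in isolation. This is an illustration only: `kOldGuardActive` and `kNewGuardActive` are hypothetical names, not identifiers from MLAS, and the snippet assumes `HAS_ARRAY_BOUNDS` is not defined anywhere in the translation unit, as the description says is the case for MLAS.

```cpp
// Illustrative sketch; kOldGuardActive and kNewGuardActive are hypothetical
// names. HAS_ARRAY_BOUNDS is intentionally never defined in this file,
// mirroring a translation unit that does not include the header defining it.

// Old-style guard: needs clang AND the config macro, so it is false here.
#if defined(__clang__) && defined(HAS_ARRAY_BOUNDS)
inline constexpr bool kOldGuardActive = true;
#else
inline constexpr bool kOldGuardActive = false;
#endif

// New-style guard: depends only on the compiler.
#ifdef __clang__
inline constexpr bool kNewGuardActive = true;   // compiling with clang
#else
inline constexpr bool kNewGuardActive = false;  // any other compiler
#endif
```

Under that assumption, `kOldGuardActive` is `false` on every compiler, while `kNewGuardActive` is `true` exactly when the compiler is clang, which is the condition under which the suppression pragma is needed.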

Motivation and Context

Build fails on Intel Mac with Apple Clang 17.0.0 (-Werror,-Warray-bounds). Clang raises a false-positive array-bounds warning on acc[4..7] inside an if constexpr (NCols4 == 8) branch that is dead when NCols4 == 4.
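As a rough sketch of the pattern described above, the following shows an array sized by a compile-time constant, a branch that indexes the upper half only when the constant is 8, and a clang-only suppression wrapped around that branch. `FoldAll` and `NCols` are hypothetical stand-ins, not the MLAS kernel, and this simplified template is not claimed to reproduce the diagnostic on any particular clang version.

```cpp
#include <cstddef>

// Hypothetical stand-in for a kernel that keeps NCols accumulators.
template <std::size_t NCols>
float FoldAll(const float (&input)[NCols]) {
    static_assert(NCols == 4 || NCols == 8, "sketch supports 4 or 8 columns");

    float acc[NCols];
    for (std::size_t i = 0; i < NCols; ++i) {
        acc[i] = input[i];
    }

    // The first four accumulators exist for both supported sizes.
    float result = acc[0] + acc[1] + acc[2] + acc[3];

#ifdef __clang__
#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Warray-bounds"
#endif
    if constexpr (NCols == 8) {
        // Only meaningful when acc really has eight elements.
        result += acc[4] + acc[5] + acc[6] + acc[7];
    }
#ifdef __clang__
#pragma clang diagnostic pop
#endif

    return result;
}
```

The `push`/`pop` pair keeps the suppression local to the branch, so genuine out-of-bounds accesses elsewhere in the file are still reported.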

Original prompt

This section details the original issue you should resolve

<issue_title>[Build] error: array index 4 is past the end of the array (that has type '__m256[4]') [-Werror,-Warray-bounds]</issue_title>
<issue_description>### Describe the issue

Unable to build from the main branch (0768f42 at the time of writing this issue) on an Intel Mac

/usr/bin/c++ --version
Apple clang version 17.0.0 (clang-1700.0.13.5)
Target: x86_64-apple-darwin24.5.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

Urgency

No response

Target platform

MacOS

Build script

./build.sh --config RelWithDebInfo --build_shared_lib --parallel --cmake_extra_defines CMAKE_OSX_ARCHITECTURES=x86_64

Error / output

[ 18%] Building CXX object CMakeFiles/onnxruntime_mlas.dir/onnxruntime/onnxruntime/core/mlas/lib/sqnbitgemm_kernel_avx2.cpp.o
In file included from /onnxruntime/onnxruntime/core/mlas/lib/sqnbitgemm_kernel_avx2.cpp:26:
/onnxruntime/onnxruntime/core/mlas/lib/sqnbitgemm_kernel_avx2_int8_blklen32.h:1677:49: error: array index 4 is past the end of the array (that has type '__m256[4]') [-Werror,-Warray-bounds]
 1677 |                 __m128 acc_1 = FoldAccumulators(acc[4], acc[5], acc[6], acc[7]);
      |                                                 ^   ~
/onnxruntime/onnxruntime/core/mlas/lib/sqnbitgemm_kernel_avx2_int8_blklen32.h:1531:13: note: array 'acc' declared here
 1531 |             __m256 acc[NCols4];
      |             ^
/onnxruntime/onnxruntime/core/mlas/lib/sqnbitgemm_kernel_avx2_int8_blklen32.h:1677:57: error: array index 5 is past the end of the array (that has type '__m256[4]') [-Werror,-Warray-bounds]
 1677 |                 __m128 acc_1 = FoldAccumulators(acc[4], acc[5], acc[6], acc[7]);
      |                                                         ^   ~
/onnxruntime/onnxruntime/core/mlas/lib/sqnbitgemm_kernel_avx2_int8_blklen32.h:1531:13: note: array 'acc' declared here
 1531 |             __m256 acc[NCols4];
      |             ^
/onnxruntime/onnxruntime/core/mlas/lib/sqnbitgemm_kernel_avx2_int8_blklen32.h:1677:65: error: array index 6 is past the end of the array (that has type '__m256[4]') [-Werror,-Warray-bounds]
 1677 |                 __m128 acc_1 = FoldAccumulators(acc[4], acc[5], acc[6], acc[7]);
      |                                                                 ^   ~
/onnxruntime/onnxruntime/core/mlas/lib/sqnbitgemm_kernel_avx2_int8_blklen32.h:1531:13: note: array 'acc' declared here
 1531 |             __m256 acc[NCols4];
      |             ^
/onnxruntime/onnxruntime/core/mlas/lib/sqnbitgemm_kernel_avx2_int8_blklen32.h:1677:73: error: array index 7 is past the end of the array (that has type '__m256[4]') [-Werror,-Warray-bounds]
 1677 |                 __m128 acc_1 = FoldAccumulators(acc[4], acc[5], acc[6], acc[7]);
      |                                                                         ^   ~
/onnxruntime/onnxruntime/core/mlas/lib/sqnbitgemm_kernel_avx2_int8_blklen32.h:1531:13: note: array 'acc' declared here
 1531 |             __m256 acc[NCols4];
      |             ^
4 errors generated.

Visual Studio Version

No response

GCC / Compiler Version

Apple clang version 17.0.0 (clang-1700.0.13.5)</issue_description>

<agent_instructions>Please investigate the build error. If the code needs a fix, create a pull request. Otherwise, suggest ways to avoid the build errors.</agent_instructions>

Comments on the Issue (you are @copilot in this section)



…y-bounds in MLAS

The HAS_ARRAY_BOUNDS macro is defined in onnxruntime_config.h which is never
included by the MLAS compilation unit. Replace with clang's built-in
__has_warning("-Warray-bounds") so the pragma actually takes effect and
suppresses the false positive array-bounds error on clang 17+.

Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix build error for array index out of bounds" to "Fix -Warray-bounds build error in MLAS on clang 17+" Feb 28, 2026
@tianleiwu tianleiwu marked this pull request as ready for review February 28, 2026 23:31
@tianleiwu
Contributor

@copilot, please fix build errors in CI pipeline like in gcc 14:

/onnxruntime_src/onnxruntime/core/mlas/lib/sqnbitgemm_kernel_avx2_int8_blklen32.h:1673:40: error: missing binary operator before token "("
 1673 | #if defined(__clang__) && __has_warning("-Warray-bounds")
      |                                        ^
/onnxruntime_src/onnxruntime/core/mlas/lib/sqnbitgemm_kernel_avx2_int8_blklen32.h:1678:40: error: missing binary operator before token "("
 1678 | #if defined(__clang__) && __has_warning("-Warray-bounds")
      |                          

…ocessor error

__has_warning is a clang-specific built-in that GCC doesn't recognize.
The preprocessor must parse the whole #if expression before evaluating
it, so GCC rejects the __has_warning(...) call even with
defined(__clang__) guarding it. Since -Warray-bounds is supported by all
clang versions, a simple #ifdef __clang__ is sufficient.

Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
Contributor Author

Copilot AI commented Feb 28, 2026

@copilot, please fix build errors in CI pipeline like in gcc 14:

/onnxruntime_src/onnxruntime/core/mlas/lib/sqnbitgemm_kernel_avx2_int8_blklen32.h:1673:40: error: missing binary ...

Fixed in 2e038f4. __has_warning is a clang-only built-in, and GCC must still parse the whole #if expression, so it fails on the __has_warning(...) call even behind defined(__clang__). Simplified the guards to #ifdef __clang__; since -Warray-bounds is supported by all clang versions, the __has_warning check was unnecessary.
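For comparison, a feature check like this can be kept portable by nesting it inside a separate clang-only `#ifdef`, because a skipped conditional group is not evaluated by other compilers. The sketch below is not what the PR merged (the PR uses a plain `#ifdef __clang__`), and `DEMO_CAN_SUPPRESS_ARRAY_BOUNDS`, `kCanSuppressArrayBounds`, and `kIsClang` are hypothetical names introduced only for illustration.

```cpp
// Hypothetical macro and constant names; not part of MLAS or ONNX Runtime.

// The inner #if is only evaluated when the outer #ifdef is taken, so a
// compiler without __has_warning never has to parse the call.
#ifdef __clang__
#  if __has_warning("-Warray-bounds")
#    define DEMO_CAN_SUPPRESS_ARRAY_BOUNDS 1
#  endif
#endif

// Fallback for every other case.
#ifndef DEMO_CAN_SUPPRESS_ARRAY_BOUNDS
#  define DEMO_CAN_SUPPRESS_ARRAY_BOUNDS 0
#endif

inline constexpr bool kCanSuppressArrayBounds = DEMO_CAN_SUPPRESS_ARRAY_BOUNDS != 0;

#ifdef __clang__
inline constexpr bool kIsClang = true;
#else
inline constexpr bool kIsClang = false;
#endif
```

The single-line form `#if defined(__clang__) && __has_warning(...)` fails on GCC because the whole expression is on one directive; splitting it across two directives avoids that at the cost of extra nesting.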

@tianleiwu
Contributor

/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).

@tianleiwu tianleiwu merged commit c7c939f into main Mar 3, 2026
90 checks passed
@tianleiwu tianleiwu deleted the copilot/fix-array-index-build-error branch March 3, 2026 00:26
tianleiwu added a commit that referenced this pull request Mar 5, 2026
- Fixes #27497

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
tianleiwu added a commit that referenced this pull request Mar 5, 2026
This cherry-picks the following commits for the release:

| Commit ID | PR Number | Commit Title |
|-----------|-----------|-------------|
| d5387d8 | #27192 | Avoid repetitive creation of fp4/fp8 native-custom-op domains for NvTensorRtRtx EP |
| 0b9906a | #27454 | Suppress spurious Array Out of Bounds warnings produced by GCC 14.2 compiler on Linux builds |
| 4a80b0b | #27471 | Fix double-free in TRT EP custom op domain Release functions |
| c7c939f | #27499 | Fix -Warray-bounds build error in MLAS on clang 17+ |
| f99dcca | #27514 | [Build] Fix pybind11 vcpkg configuration |
| ef04b10 | #27518 | [CXX Lora] Prevent heap OOB from maliciously crafted Lora Adapters. |
| 0b2b6d0 | #27288 | [NvTensorRTRTX EP]: Add missing override specifiers to suppress warnings |
| c1d8f5c | #27522 | Add "library_path" metadata entry to OrtEpDevice instances for plugin and provider bridge EPs |
| fdead1c | #27537 | Account for ORT_NO_EXCEPTIONS builds in Lora test |
| 3d1365e | #27521 | increase kMaxValueLength to 8192 |
| df8f4a7 | #27535 | Add OrtEnv.DisableDllImportResolver to prevent fatal error on resolver conflict |
| bdd672a | #27356 | Add/Update telemetry events |
| 2da1a30 | #27543 | Fix RoiAlign heap out-of-bounds read via unchecked batch_indices |
| 5c3f544 | #27466 | DQ→MatMulNBits fusion transformer for NvTensorRtRtx ep |

---------

Co-authored-by: vishalpandya1990 <vipandya@nvidia.com>
Co-authored-by: Vishnudas Thaniel S <vishnudas.thaniel.s@intel.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Stephan Seitz <sseitz@nvidia.com>
Co-authored-by: Scott McKay <skottmckay@gmail.com>
Co-authored-by: xieofxie <xieofxie@126.com>
Co-authored-by: hualxie <hualxie@microsoft.com>
Co-authored-by: Darshak Bhatti <47045043+dabhattimsft@users.noreply.github.com>
Co-authored-by: Darshak Bhatti <dabhatti@micorsoft.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: anujj <ajalota@nvidia.com>
Co-authored-by: praneshgo <227579474+praneshgo@users.noreply.github.com>