[MLAS] Add kleidiai pad ptr invalidation test case by JonathanC-ARM · Pull Request #27465 · microsoft/onnxruntime

JonathanC-ARM · 2026-02-26T15:56:25Z

Description

This pr introduces some minor code changes which do the following:

Fix copilot header include suggestions from [MLAS] ukernel interface logging changes #27439
Add testcase which covers code path fixed via Fix Conv LHS packing padding/uninitialized ptrs V2 #27215 and test case discussed in Fix Conv LHS packing padding/uninitialized ptrs #27214
Change pointer invalidation to cover only updated pointer in pad structure

Testing

This patch was tested in two ways.

After creating tests which I thought would trigger a previous failure case I reverted the convolve_kleidiai.cpp code to before the initial fix in Hari's change for pad ptr was introduced. Added debug logging and tested for failures to highlight the moving and invalidation of pointer. Example failure below
I reintroduced the current code and then tested multiple times

for i in $(seq 1 2000); do echo "ITER=$i"; ./onnxruntime_mlas_test --long --gtest_filter='*Conv2d*' || break; done

Explanation of Subsequent logs

Padding buffer relocation

KLEIDIAI_CONV_LHS pad_buf MOVED ci=320 padsize=512 old=0x12e80d800 new=0x12e81ac00
Meaning: the internal zero padding buffer used for out-of-bounds pixels was resized and the underlying storage address changed (old → new). Any previously-built indirection table entries that pointed at the old padding buffer are now stale.

Reuse of cached indirection table after the move

KLEIDIAI_CONV_LHS indirection_cache HIT ci=64 m=121 **pad=0x12e81ac00 old_pad=0x12e80d800 (after_pad_move)**
Meaning: for a later convolution (ci=64) the indirection-table cache returned a HIT. The log prints the current pad buffer address (pad=...) and the most recent prior padding-buffer address (old_pad=...) captured during the move. The (after_pad_move) tag indicates that this cache HIT occurred after a pad-buffer relocation event, which is the dangerous case in the pre-fix implementation (cached tables may still contain pointers to old_pad).

In failing runs, the output mismatch occurs immediately after this sequence, showing a clear correlation: pad buffer moved → cached indirection table reused → incorrect results.

one note for the test is I commented out most of the rest of the fixture in the changed file before running for time constraints on the 2000 runs

jonclo01$ ./onnxruntime_mlas_test --long --gtest_filter='*Conv2d*' clear
-------------------------------------------------------
----Total 3066 tests registered programmably!
-------------------------------------------------------
Note: Google Test filter = *Conv2d*
[==========] Running 2 tests from 2 test suites.
[----------] Global test environment set-up.
[----------] 1 test from Conv2d_SingleThread
[ RUN      ] Conv2d_SingleThread.LongExecute
[KLEIDIAI KERNEL]: /Users/jonclo01/kfi-devenv/repos/onnxruntime/onnxruntime/core/mlas/lib/kleidiai/convolve_kleidiai.cpp : 496 : KLEIDIAI_CONV_LHS pad_buf ci=64 padsize=256 addr=0x12e80d800
[KLEIDIAI KERNEL]: /Users/jonclo01/kfi-devenv/repos/onnxruntime/onnxruntime/core/mlas/lib/kleidiai/convolve_kleidiai.cpp : 543 : KLEIDIAI_CONV_LHS indirection_cache MISS ci=64 m=121 pad=0x12e80d800
[KLEIDIAI KERNEL]: /Users/jonclo01/kfi-devenv/repos/onnxruntime/onnxruntime/core/mlas/lib/kleidiai/convolve_kleidiai.cpp : 325 : kai_run_lhs_imatmul_pack_x32p2vlx1_x32p_sme M=121 k_chunk_count=9 k_chunk_length=64
[KLEIDIAI KERNEL]: /Users/jonclo01/kfi-devenv/repos/onnxruntime/onnxruntime/core/mlas/lib/kleidiai/convolve_kleidiai.cpp : 376 : kai_run_rhs_imatmul_pack_kxn_x32p2vlx1b_x32_x32_sme N=32 k_chunk_count=9 k_chunk_length=64 rhs_stride_row=128
[KLEIDIAI KERNEL]: /Users/jonclo01/kfi-devenv/repos/onnxruntime/onnxruntime/core/mlas/lib/kleidiai/convolve_kleidiai.cpp : 653 : kai_run_imatmul_clamp_f32_f32p2vlx1_f32p2vlx1b_2vlx2vl_sme2_mopa M=121 N=32 k_chunk_count=9 k_chunk_length=64
[KLEIDIAI KERNEL]: /Users/jonclo01/kfi-devenv/repos/onnxruntime/onnxruntime/core/mlas/lib/kleidiai/sgemm_kleidiai.cpp : 349 : kai_run_rhs_pack_kxn_f32p2vlx1biasf32_f32_f32_sme Groups=1 N=121 K=576 nr=32 kr=1 sr=1 rhs_stride_row=484
[KLEIDIAI KERNEL]: /Users/jonclo01/kfi-devenv/repos/onnxruntime/onnxruntime/core/mlas/lib/kleidiai/convolve_kleidiai.cpp : 490 : KLEIDIAI_CONV_LHS **pad_buf MOVED ci=320 padsize=512 old=0x12e80d800 new=0x12e81ac00**
[KLEIDIAI KERNEL]: /Users/jonclo01/kfi-devenv/repos/onnxruntime/onnxruntime/core/mlas/lib/kleidiai/convolve_kleidiai.cpp : 543 : KLEIDIAI_CONV_LHS indirection_cache MISS ci=320 m=121 pad=0x12e81ac00
[KLEIDIAI KERNEL]: /Users/jonclo01/kfi-devenv/repos/onnxruntime/onnxruntime/core/mlas/lib/kleidiai/convolve_kleidiai.cpp : 325 : kai_run_lhs_imatmul_pack_x32p2vlx1_x32p_sme M=121 k_chunk_count=9 k_chunk_length=320
[KLEIDIAI KERNEL]: /Users/jonclo01/kfi-devenv/repos/onnxruntime/onnxruntime/core/mlas/lib/kleidiai/convolve_kleidiai.cpp : 376 : kai_run_rhs_imatmul_pack_kxn_x32p2vlx1b_x32_x32_sme N=32 k_chunk_count=9 k_chunk_length=320 rhs_stride_row=128
[KLEIDIAI KERNEL]: /Users/jonclo01/kfi-devenv/repos/onnxruntime/onnxruntime/core/mlas/lib/kleidiai/convolve_kleidiai.cpp : 653 : kai_run_imatmul_clamp_f32_f32p2vlx1_f32p2vlx1b_2vlx2vl_sme2_mopa M=121 N=32 k_chunk_count=9 k_chunk_length=320
[KLEIDIAI KERNEL]: /Users/jonclo01/kfi-devenv/repos/onnxruntime/onnxruntime/core/mlas/lib/kleidiai/sgemm_kleidiai.cpp : 349 : kai_run_rhs_pack_kxn_f32p2vlx1biasf32_f32_f32_sme Groups=1 N=121 K=2880 nr=32 kr=1 sr=1 rhs_stride_row=484
[KLEIDIAI KERNEL]: /Users/jonclo01/kfi-devenv/repos/onnxruntime/onnxruntime/core/mlas/lib/kleidiai/convolve_kleidiai.cpp : 535 : KLEIDIAI_CONV_LHS indirection_cache HIT ci=64 m=121 **pad=0x12e81ac00 old_pad=0x12e80d800 (after_pad_move)**
[KLEIDIAI KERNEL]: /Users/jonclo01/kfi-devenv/repos/onnxruntime/onnxruntime/core/mlas/lib/kleidiai/convolve_kleidiai.cpp : 325 : kai_run_lhs_imatmul_pack_x32p2vlx1_x32p_sme M=121 k_chunk_count=9 k_chunk_length=64
[KLEIDIAI KERNEL]: /Users/jonclo01/kfi-devenv/repos/onnxruntime/onnxruntime/core/mlas/lib/kleidiai/convolve_kleidiai.cpp : 653 : kai_run_imatmul_clamp_f32_f32p2vlx1_f32p2vlx1b_2vlx2vl_sme2_mopa M=121 N=32 k_chunk_count=9 k_chunk_length=64
[KLEIDIAI KERNEL]: /Users/jonclo01/kfi-devenv/repos/onnxruntime/onnxruntime/core/mlas/lib/kleidiai/sgemm_kleidiai.cpp : 349 : kai_run_rhs_pack_kxn_f32p2vlx1biasf32_f32_f32_sme Groups=1 N=121 K=576 nr=32 kr=1 sr=1 rhs_stride_row=484
/Users/jonclo01/kfi-devenv/repos/onnxruntime/onnxruntime/test/mlas/unittest/test_conv2d.h:249: Failure
Expected equality of these values:
  memcmp(Output, OutputReference, OutputElements * sizeof(float))
    Which is: 90
  0
B1/G1/Cpg64/Fpg32/H11/W11/KH3/KW3/Pad1,1,1,1/Dilation1,1/Stride1,1
Stack trace:
  0x10247ba34: MlasConv2DTest<>::ExecuteLong()
  0x102651904: testing::internal::HandleExceptionsInMethodIfSupported<>()
  0x1026517a4: testing::Test::Run()
  0x102652b5c: testing::TestInfo::Run()
  0x102653c84: testing::TestSuite::Run()
... Google Test internal frames ...

[  FAILED  ] Conv2d_SingleThread.LongExecute, where GetParam() = LongExecute (10 ms)

Signed-off-by: Jonathan Clohessy <Jonathan.Clohessy@arm.com>

hariharans29 · 2026-02-26T18:53:14Z

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline

azure-pipelines · 2026-02-26T18:53:35Z

Azure Pipelines successfully started running 4 pipeline(s).

Copilot

Pull request overview

This PR introduces minor improvements to the MLAS KleidiAI implementation, addressing include suggestions from PR #27439 and adding a regression test for the padding buffer pointer invalidation issue fixed in PR #27215.

Changes:

Added missing standard library header includes (<vector>, <cstring>, <cstddef>, <array>) to kleidiai implementation files
Improved cache invalidation strategy to only clear entries associated with the old pad buffer address rather than clearing the entire cache
Added regression test case that exercises pad buffer growth scenario with convolutions of varying channel sizes

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File	Description
onnxruntime/test/mlas/unittest/test_conv2d.h	Added regression test that forces pad buffer growth by running convolutions with different channel sizes to verify cache invalidation works correctly
onnxruntime/core/mlas/lib/kleidiai/sgemm_kleidiai.cpp	Added missing standard library includes (`<vector>`, `<cstring>`, `<cstddef>`) used throughout the file
onnxruntime/core/mlas/lib/kleidiai/qgemm_kleidiai.cpp	Added missing standard library includes (`<array>`, `<cstddef>`, `<vector>`) used throughout the file
onnxruntime/core/mlas/lib/kleidiai/convolve_kleidiai.cpp	Added missing includes and improved cache invalidation to only remove old pad buffer's cache entries rather than clearing all caches, reducing unnecessary cache misses

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

JonathanC-ARM added 2 commits February 26, 2026 09:33

Add hash pad buffer address in LHS cache key & add regression test

2ac49f4

Signed-off-by: Jonathan Clohessy <Jonathan.Clohessy@arm.com>

Fix Lintrunner issues,includes, and invalidate single cache entries only

e47605f

Signed-off-by: Jonathan Clohessy <Jonathan.Clohessy@arm.com>

hariharans29 requested a review from Copilot February 26, 2026 18:57

Copilot started reviewing on behalf of hariharans29 February 26, 2026 18:59 View session

Copilot AI reviewed Feb 26, 2026

View reviewed changes

hariharans29 approved these changes Feb 26, 2026

View reviewed changes

hariharans29 enabled auto-merge (squash) February 26, 2026 19:30

hariharans29 merged commit d1d596c into microsoft:main Feb 26, 2026
97 of 105 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[MLAS] Add kleidiai pad ptr invalidation test case#27465

[MLAS] Add kleidiai pad ptr invalidation test case#27465
hariharans29 merged 2 commits intomicrosoft:mainfrom
JonathanC-ARM:jonclo01/convolve_pad_ptr_invalidation

JonathanC-ARM commented Feb 26, 2026

Uh oh!

hariharans29 commented Feb 26, 2026

Uh oh!

azure-pipelines bot commented Feb 26, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

JonathanC-ARM commented Feb 26, 2026

Description

Testing

Explanation of Subsequent logs

Uh oh!

hariharans29 commented Feb 26, 2026

Uh oh!

azure-pipelines bot commented Feb 26, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants