
Conversation

vraspar (Contributor) commented Dec 1, 2025

Description

This PR introduces a new experimental lookup-table (LUT) based matrix multiplication method for 2-bit MatMulNBits on x64 AVX2, inspired by the [T-MAC paper](https://arxiv.org/abs/2407.00088) and the [T-MAC repository](https://github.com/microsoft/T-MAC), to speed up low-bit LLM inference.

Unlike the existing quant-dequant methods, the LUT-based method directly supports mixed-precision GEMM without dequantization. It uses bit-wise table lookups to eliminate multiplications and reduce the number of additions required in matrix multiplication.

[figure: overview of the LUT-based GEMM method]
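To make the idea concrete, here is a minimal scalar sketch of the bit-plane lookup trick (our illustration under assumptions, not the MLAS kernel; it assumes unsigned 2-bit weights and g = 4 activations per lookup group):

```cpp
#include <array>
#include <cstdint>

constexpr int g = 4;  // activations grouped per LUT index (assumption; T-MAC uses g = 4)

// Precompute, for every g-bit pattern, the sum of the activations whose bit is set.
std::array<float, 1 << g> BuildLut(const float* a) {
    std::array<float, 1 << g> lut{};
    for (int idx = 0; idx < (1 << g); ++idx) {
        for (int i = 0; i < g; ++i) {
            if (idx & (1 << i)) lut[idx] += a[i];
        }
    }
    return lut;
}

// Dot product of K activations with K unsigned 2-bit weights stored as two
// bit-planes: weight w[k] = 2 * bit1[k] + bit0[k]. Each plane packs g weight
// bits into one LUT index, so one lookup replaces g multiply-adds.
float LutDot(const float* a, const uint8_t* plane0, const uint8_t* plane1, int K) {
    float acc = 0.0f;
    for (int k = 0; k < K; k += g) {
        const auto lut = BuildLut(a + k);   // in the real kernel the LUT is built
                                            // once per activation row and reused
                                            // across all N output columns
        acc += lut[plane0[k / g]];          // low bit-plane contribution
        acc += 2.0f * lut[plane1[k / g]];   // high bit-plane, weighted by 2
    }
    return acc;  // per-block scale / zero-point correction still applies
}
```

The AVX2 kernel in this PR vectorizes these lookups, but the arithmetic identity is the same.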

This PR:

  • Adds the `mlas.use_lut_gemm` session option, enabling LUT GEMM inside MatMulNBits when it is available (2-bit weights, BlkLen a multiple of 32, K a multiple of 32, N a multiple of 128, AVX2 present; see the sketch after this list).
  • Introduces the LUT packing + kernel config cache (packs bit-planes, scales, and zero points) and the main `MlasLUTGemm` entry point, which generates per-row LUTs and calls the AVX2 kernel.
  • Implements AVX2 LUT generation (`GenerateLUT_avx2`) and GEMM compute (`TMACComputeGemm_avx2`), and wires up dispatch in MLAS platform init.
  • Updates MatMulNBits PrePack/Compute to use LUT packing/compute when opted in; keeps the existing quant-dequant path as a fallback.
  • Extends the Python quant bindings with a 2-bit QDQ helper for parity with the new path.
  • Adds MLAS unit tests covering LUT GEMM across symmetric/asymmetric quantization and multiple shapes/block sizes.
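For reference, the availability gate described above boils down to a check like this (the helper name and signature are ours, not the actual MLAS code):

```cpp
#include <cstddef>

// Hypothetical restatement of the opt-in conditions from the PR description.
bool LutGemmAvailable(size_t nbits, size_t blk_len, size_t K, size_t N, bool has_avx2) {
    return nbits == 2 &&          // only 2-bit weights for now
           blk_len % 32 == 0 &&   // BlkLen must be a multiple of 32
           K % 32 == 0 &&         // K must be a multiple of 32
           N % 128 == 0 &&        // N must be a multiple of 128
           has_avx2;              // AVX2 must be present
}
```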

Main components (a rough call-flow sketch follows the list):

  • `MlasInitLUTGemmKernelConfig`: configuration for the LUT kernels

  • `MlasLUTGemmPackQuantBData`: pre-packing of the quantized weights

  • `MlasLUTPackScalesAndZeroPoints`: pre-packing of the quantized scales and zero points

  • `MlasLUTGemm`: main entry point

  • `GenerateLUT_avx2`: LUT construction from activations

  • `TMACComputeGemm_avx2`: AVX2 LUT GEMM kernel

  • Session option: `mlas.use_lut_gemm`
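Roughly, these pieces fit together as sketched below (argument lists are omitted because the exact MLAS prototypes are not reproduced here; this is an orientation aid, not the real header):

```cpp
// PrePack time (MatMulNBits::PrePack):
//   MlasInitLUTGemmKernelConfig(...);     // derive kernel config for (N, K, bits, BlkLen)
//   MlasLUTGemmPackQuantBData(...);       // split 2-bit weights into packed bit-planes
//   MlasLUTPackScalesAndZeroPoints(...);  // lay out scales and zero points for the kernel
//
// Compute time (MatMulNBits::Compute):
//   MlasLUTGemm(...);                     // builds per-row LUTs via GenerateLUT_avx2,
//                                         // then runs TMACComputeGemm_avx2
```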

How to test

  • MLAS LUT GEMM unit tests: see `test_sqlutgemm.cpp`.
  • Run MatMulNBits models with the session option `mlas.use_lut_gemm=1` on AVX2 machines; expect fallback to the existing path if the availability checks fail. A minimal setup sketch follows this list.
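For example, opting in from the C++ API looks roughly like this (the model path is a placeholder; `AddConfigEntry` is the standard session-option mechanism):

```cpp
#include <onnxruntime_cxx_api.h>

int main() {
    Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "lut_gemm_test");
    Ort::SessionOptions so;
    // Opt in to the LUT GEMM path for MatMulNBits; if the availability checks
    // fail, the op falls back to the existing quant-dequant path.
    so.AddConfigEntry("mlas.use_lut_gemm", "1");
    Ort::Session session(env, ORT_TSTR("model_2bit.onnx"), so);  // placeholder model
    // ... build inputs and call session.Run(...) as usual
    return 0;
}
```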

Perf

The focus of this PR is functional correctness and kernel bring-up; performance numbers will be reported separately once broader profiling is done.

Future Work

  • Support MLFloat16 (FP16 scales and zero points).
  • Add a NEON kernel for ARM.
  • Add kernels for 4-bit weights, and BitNet kernels.
  • Broader batch (N > 1) support and additional shape coverage.

liqunfu and others added 30 commits January 29, 2025 19:11
Signed-off-by: Liqun Fu <liqun.fu@microsoft.com>
…as kernel not implemented for fp32. Also, I need to write the packing logic for the scales as well.
…ssert issue with the data shuffling in prepack
auto scale_ptr = scales ? scales->DataRaw() : nullptr;
packed_b_ = IAllocator::MakeUniquePtr<void>(alloc, packed_b_size_, true);
MlasQNBitGemmPackQuantBData(N_, K_, nbits_, block_size_, compute_type_, qptr, packed_b_.get(), scale_ptr,
has_zp_input_, nullptr, threadpool_ptr);
Member:
IIUC, the usage of the thread pool in the existing non-LUT path seems like a new addition. Is that intentional (and does it come with appropriate tests)?

Contributor (author):

Initially, I thought the tests in test_sqnbitgemm.cpp should suffice, since they already exercise it with a thread pool. I have changed it to only use the thread pool for the LUT path now.

Once we add tests, I think it might be beneficial to use the thread pool for pre-packing on the other paths as well.

Contributor:

Closing this comment for now to merge, as discussed offline.

}

// Conditional pragma unroll for compiler compatibility
#if defined(__INTEL_COMPILER) || defined(__clang__)
Member:
Why is this compiler-dependent? Is this implementation taken from the T-MAC library as-is?

// Each iteration processes one row of the activation matrix
// TODO(vraspar): Ideally we have to do block parallelism here

MlasTrySimpleParallel(
Member:
If M == 1, can we parallelize on N?

}

size_t n_div = 0;
switch (BlkBitWidth) {
Member:
Why have this switch if BlkBitWidth is guaranteed to be 2 at this stage?

Contributor (author):
I decided to keep it generalized for when we add int4 kernels.

jambayk merged commit 8e050d1 into main Jan 15, 2026
90 checks passed
jambayk deleted the vraspar/lut-gemm branch January 15, 2026 18:56
* @brief Parameters for TMAC kernel
*/
struct MlasTMACKernelParams {
size_t g;
Member:
A brief comment describing what each config field is and what it is used for would help.

)
{
const MlasTMACKernelParams& tmac_params = MlasGetLutGemmKernelParams(N, K, BlkBitWidth, BlkLen, HasZeroPoint);
const size_t PackedQuantBDataSize = (N * BlkBitWidth) * (K / tmac_params.g / tmac_params.ngroups_per_elem);
Member:
Is there an alignment requirement for the packed weights?

assert(bm % mgroup == 0);
assert(bm % bits == 0);

std::unique_ptr<uint8_t[]> buf(new uint8_t[N * bits * (K / g)]);
hariharans29 (Member) commented Jan 16, 2026:
For what is being done here, a standard RAII container like std::vector would do; do we really need a unique_ptr here?
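For instance, the suggestion amounts to something like this (a sketch of the reviewer's point, not the committed code):

```cpp
// std::vector provides the same RAII lifetime, with value-initialized storage:
std::vector<uint8_t> buf(N * bits * (K / g));
```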

alex-spacemit pushed a commit to spacemit-com/onnxruntime that referenced this pull request Jan 20, 2026
…TMAC) (microsoft#26695)

tianleiwu pushed a commit that referenced this pull request Jan 21, 2026
…TMAC) (#26695)

(cherry picked from commit 8e050d1)
tianleiwu added a commit that referenced this pull request Jan 23, 2026
### Description
This PR cherry-picks the following changes for the 1.24.0 release.

### Cherry-picked Commits
| Commit | Commit Title | Author |
|---|---|---|
| 744e7fe | Add type definitions, registration, utilities for INT2/UINT2 support (#26824) | vraspar |
| 530a1fb | [QNN EP] Add BFloat16 dtype support in QNN EP (#26987) | tirupath-qti |
| 8e050d1 | Implement new experimental lookup-based matrix multiplication method(TMAC) (#26695) | vraspar |
| 2d2ba6b | [MLAS/CPU EP] Improve performance of Silu activation path within the QuickGelu CPU kernel (#26753) | Hariharan Seshadri |
| 1c02b79 | [QNN EP] Add support for handling 0-dimension for Concat Op (#27000) | Ashwath Shankarnarayan |
| cc2b01b | Fix ClipQuantFusion crash when Clip has multiple input edges (#27016) | Edward Chen |
| bbd3850 | [QNN EP] Support quantized BatchNorm with per-channel DQ params on QNN HTP (#26959) | qti-yuduo |
| d8f0318 | Add API to get ep graph partitioning info (#26781) | Adrian Lizarraga |
| b912b18 | [OVEP] OpenVINO EP Features and bug-fixes for ORT-1.24 - Follow up (#27007) | Preetha Veeramalai |
| ba11af4 | [QNN-EP] Add MatMulNBits translation for GPU (#26340) | quic-tirupath |
| c03c419 | [MLAS/NEON] Add dedicated kernel for depthwise convolution for ARM64 using NEON intrinsics (#26688) | Hariharan Seshadri |
| e7dfd69 | [QNN-EP] Support alternate Layernorm fusion pattern in QNN preprocess (#26060) | qti-mattsinc |
| 4013dc1 | Implement multithreading in qgemm_kleidi (#26301) | Melike Kaptan |
| 9f06181 | [CXX] Enable users to specify custom OrtSyncStream via RunOptions (#26988) | Dmitri Smirnov |
| cfccd64 | Added support for QMX kernels in MLAS (#26849) | qti-vaiskv |
| 29d9b2f | Tweak external resource importer handle structs (#27040) | Scott McKay |
| 9d108d0 | [QNN EP] Add QuickGELU operator support for QNN provider (#27034) | tirupath-qti |
| b35688f | Add INT2 and UINT2 support for QDQ, transpose and cast ops (#27022) | vraspar |
| 6d34aba | Introducing BF16 Pointwise NCHWc Convolution for Arm64 (#26838) | Rohanjames1997 |
| 36017ad | [EP ABI] Add CreateCustomOpDomains() API for plugin EP to register custom ops (#27050) | Chi Lo |
| 50a03e4 | Add a new pipeline for CUDA 13 nuget builds (#27023) | eserscor |
| a0d4439 | [EP ABI] Update Graph_GetGraphView() implementation (#26711) | Chi Lo |
| 34bb209 | [webgpu] Fix a bug for im2col (#27069) | Wenqin Yang |
| 46e8d45 | [QNN EP] Add FusedMatMul operator support (#27044) | tirupath-qti |
| 5e7e7a3 | Disable Float32_2Bits_Asymmetric_256x256 test (#27046) | vraspar |
| 39f966e | Fix Doxygen documentation build error in onnxruntime_c_api.h (#27083) | Nick Eubanks |
| 8a7a797 | Print tensor for new packed type of 2 bits (#27064) | Tianlei Wu |
| 01f40e6 | Fix GPU JAR testing on Linux (#27011) | eserscor |
| b6ed7f3 | Fix warning around unused code in QNN Android Emulator builds by clang (#27026) | Hariharan Seshadri |
| d7daa45 | Raise the timeout for the ios simulator job (#27045) | Hariharan Seshadri |
| 7e1d818 | upgrade emsdk to 4.0.23 (#27029) | Yulong Wang |
| 347b990 | Fix failing mainline build on Arm64 linux (#27101) | Rohanjames1997 |
| f481b17 | Add dedicated API to support extracting compatibility string from model metadata (#27015) | adrastogi |

---------

Signed-off-by: Liqun Fu <liqun.fu@microsoft.com>
Signed-off-by: bfilipek <bartlomiej.filipek@intel.com>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Jonathan Clohessy <jonathan.clohessy@arm.com>
Signed-off-by: Christian Bourjau <christian.bourjau@quantco.com>
Signed-off-by: melkap01 <melike.kaptan@arm.com>
Co-authored-by: vraspar <vrajang@outlook.com>
Co-authored-by: tirupath-qti <tirupath@qti.qualcomm.com>
Co-authored-by: Ashwath Shankarnarayan <ashwshan@qti.qualcomm.com>
Co-authored-by: Liqun Fu <liqun.fu@microsoft.com>
Co-authored-by: carzh <wolfivyaura@gmail.com>
Co-authored-by: Hector Li <hecli@microsoft.com>
Co-authored-by: carzh <carolinezhu@microsoft.com>
Co-authored-by: Vrajang Parikh <vrparikh@microsoft.com>
Co-authored-by: Hariharan Seshadri <shariharan91@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Co-authored-by: Yuduo Wu <yuduow@qti.qualcomm.com>
Co-authored-by: Adrian Lizarraga <adlizarraga@microsoft.com>
Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com>
Co-authored-by: jatinwadhwa921 <110383850+jatinwadhwa921@users.noreply.github.com>
Co-authored-by: jatinwadhwa921 <jatin.wadhwa@intel.com>
Co-authored-by: saurabh <saurabh1.kale@intel.com>
Co-authored-by: Ankit Maheshkar <ankit.maheshkar@intel.com>
Co-authored-by: sfatimar <sahar.fatima@intel.com>
Co-authored-by: Javier Martinez <javier.e.martinez@intel.com>
Co-authored-by: Bartlomiej Filipek <bartlomiej.filipek@intel.com>
Co-authored-by: bopeng1234 <bo.peng@intel.com>
Co-authored-by: Eric Crawford <eric.r.crawford@intel.com>
Co-authored-by: MayureshV1 <47039074+MayureshV1@users.noreply.github.com>
Co-authored-by: TejalKhade28 <tejal.khade@intel.com>
Co-authored-by: Vishnudas Thaniel S <vishnudas.thaniel.s@intel.com>
Co-authored-by: Yaru Du <yaru.du@intel.com>
Co-authored-by: Ryan Metcalfe <107415876+RyanMetcalfeInt8@users.noreply.github.com>
Co-authored-by: Dvoretckii, Mikhail <mikhail.dvoretckii@intel.com>
Co-authored-by: Pallavi Gupta <pallavi.gupta@intel.com>
Co-authored-by: Jianhui Dai <jianhui.j.dai@intel.com>
Co-authored-by: Jiajia Qin <jiajiaqin@microsoft.com>
Co-authored-by: Changming Sun <chasun@microsoft.com>
Co-authored-by: Fei Chen <feich@microsoft.com>
Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
Co-authored-by: Akupadhye <aupadhye@qti.qualcomm.com>
Co-authored-by: Wang Ning <ning4.wang@intel.com>
Co-authored-by: Maximilian Müller <44298237+gedoensmax@users.noreply.github.com>
Co-authored-by: Chi Lo <54722500+chilo-ms@users.noreply.github.com>
Co-authored-by: George Wu <jywu@microsoft.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Wanming Lin <wanming.lin@intel.com>
Co-authored-by: quic-calvnguy <quic_calvnguy@quicinc.com>
Co-authored-by: Jie Chen <jie.a.chen@intel.com>
Co-authored-by: xhcao <xinghua.cao@intel.com>
Co-authored-by: Wei-Sheng Chin <wschin@outlook.com>
Co-authored-by: quic-hungjuiw <quic_hungjuiw@quicinc.com>
Co-authored-by: Ian Hunter <ianfhunter@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kunal-vaishnavi <115581922+kunal-vaishnavi@users.noreply.github.com>
Co-authored-by: Jeff Kilpatrick <jkilpatrick@qti.qualcomm.com>
Co-authored-by: Jeff Kilpatrick <jkilpat@qti.qualcomm.com>
Co-authored-by: Scott McKay <skottmckay@gmail.com>
Co-authored-by: Nenad Banfic <46795300+nenad1002@users.noreply.github.com>
Co-authored-by: derdeljan-msft <derdeljan@microsoft.com>
Co-authored-by: n1harika <niharika.sathish@intel.com>
Co-authored-by: Ryan Metcalfe <ryan.metcalfe@intel.com>
Co-authored-by: Jaswanth Gannamaneni <jaswanth.gannamaneni@intel.com>
Co-authored-by: Klimenko, Mikhail <mikhail.klimenko@intel.com>
Co-authored-by: liang <gxgaoliang@126.com>
Co-authored-by: Garth Long <garth.long@intel.com>
Co-authored-by: Jonathan Clohessy <jonathan.clohessy@arm.com>
Co-authored-by: Akshay Sonawane <111780983+apsonawane@users.noreply.github.com>
Co-authored-by: Christopher Warrington <chwarr@microsoft.com>
Co-authored-by: Ishwar Raut <iraut@nvidia.com>
Co-authored-by: Gaurav Garg <gaugarg@nvidia.com>
Co-authored-by: Xinpeng Dou <15529241576@163.com>
Co-authored-by: adrastogi <aditya.rastogi@microsoft.com>
Co-authored-by: Aditya Rastogi <adityar@ntdev.microsoft.com>
Co-authored-by: qti-hungjuiw <hungjuiw@qti.qualcomm.com>
Co-authored-by: Pradeep Sakhamoori <psakhamoori@microsoft.com>
Co-authored-by: Adam Pocock <adam.pocock@oracle.com>
Co-authored-by: mingyue <131847423+mingyueliuh@users.noreply.github.com>
Co-authored-by: Susanta Bhattacharjee <susanta.bhattacharjee@intel.com>
Co-authored-by: Jozef Wludzik <jozef.wludzik@intel.com>
Co-authored-by: Rajeev Sekar <rajeevsekar21@gmail.com>
Co-authored-by: Mayuresh M Varerkar <mayuresh.m.varerkar@intel.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Wenqin Yang <wenqin.yang@intel.com>
Co-authored-by: xieofxie <xieofxie@126.com>
Co-authored-by: hualxie <hualxie@microsoft.com>
Co-authored-by: Joshua Lochner <admin@xenova.com>
Co-authored-by: Christian Bourjau <cbourjau@users.noreply.github.com>
Co-authored-by: Xiaofei Han <xiaofeihan@microsoft.com>
Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com>
Co-authored-by: chunghow-qti <chunghow@qti.qualcomm.com>
Co-authored-by: Guenther Schmuelling <guschmue@microsoft.com>
Co-authored-by: Jiawei Shao <jiawei.shao@intel.com>
Co-authored-by: czekun <chen.zekun@intel.com>
Co-authored-by: Jaskaran Singh Nagi <jaskaran.singh.nagi@intel.com>
Co-authored-by: quic-tirupath <quic_tirupath@quicinc.com>
Co-authored-by: qti-mattsinc <mattsinc@qti.qualcomm.com>
Co-authored-by: Melike Kaptan <melike.kaptan@arm.com>
Co-authored-by: Damien Dooley <damien.dooley@arm.com>
Co-authored-by: qti-vaiskv <vaiskv@qti.qualcomm.com>
Co-authored-by: Rohanjames1997 <rohan.james4@gmail.com>
Co-authored-by: eserscor <erscor@microsoft.com>
Co-authored-by: eserscor <247253654+eserscor@users.noreply.github.com>
Co-authored-by: Nick Eubanks <nieubank@microsoft.com>
Co-authored-by: adrastogi <8368026+adrastogi@users.noreply.github.com>
Co-authored-by: Rohanjames1997 <rohanjms@amazon.com>
tianleiwu added the cherry-picked (Cherry-picked for a cherrypicks branch) label and removed the release:1.24.0 label on Jan 23, 2026