
MatmulNBits prepacking scales fix#27412

Merged
hariharans29 merged 11 commits into main from hari/scales_prepack
Feb 24, 2026


@hariharans29
Member

Description

Fix the incorrect scales element count used when pre-packing scales while processing the B input in the PrePack() method of the MatMulNBits operator.

Motivation and Context

Fix potential crash due to incorrect element count

Contributor

Copilot AI left a comment


Pull request overview

This PR fixes a critical bug in the MatMulNBits operator's PrePack method for the MLFloat16 specialization. The bug occurs while prepacking the B input tensor (weights), at the point where the scales must be converted from MLFloat16 to float32. The code used the B tensor's size instead of the scales tensor's size for both the buffer allocation and the conversion, which could lead to buffer overruns or underruns depending on the relative sizes of the B and scales tensors.

Changes:

  • Fixed incorrect size calculation when prepacking scales for MLFloat16 MatMulNBits operator
  • Changed from using B tensor size to scales tensor size for scales conversion buffer allocation
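
To make the size mismatch concrete, here is a minimal, self-contained C++ sketch. It is not the onnxruntime code: `Half`, `HalfToFloat`, `ScalesElementCount`, `PackedBBytes`, and `ConvertScalesToFloat` are hypothetical names, and the layout (one scale per column per block along K, `block_size * bits / 8` bytes per packed block) is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a 16-bit float: only the raw IEEE 754 binary16 bits.
using Half = std::uint16_t;

// Convert binary16 bits to float (zero, subnormal, normal, inf/NaN).
inline float HalfToFloat(Half h) {
  const std::uint32_t sign = static_cast<std::uint32_t>(h & 0x8000u) << 16;
  const std::uint32_t exp = (h >> 10) & 0x1Fu;
  std::uint32_t mant = h & 0x3FFu;
  std::uint32_t bits = 0;
  if (exp == 0) {
    if (mant == 0) {
      bits = sign;  // signed zero
    } else {
      // Subnormal: shift the mantissa until the implicit leading 1 appears.
      std::uint32_t shift = 0;
      while ((mant & 0x400u) == 0) {
        mant <<= 1;
        ++shift;
      }
      mant &= 0x3FFu;
      bits = sign | ((113u - shift) << 23) | (mant << 13);
    }
  } else if (exp == 0x1Fu) {
    bits = sign | 0x7F800000u | (mant << 13);  // inf or NaN
  } else {
    bits = sign | ((exp + 112u) << 23) | (mant << 13);  // rebias 15 -> 127
  }
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}

// Assumed layout: one scale per column (N) per block along K.
inline std::size_t ScalesElementCount(std::size_t N, std::size_t K,
                                      std::size_t block_size) {
  assert(block_size > 0);
  const std::size_t blocks_per_col = (K + block_size - 1) / block_size;
  return N * blocks_per_col;
}

// Assumed layout: each block of `block_size` weights packs into
// block_size * bits / 8 bytes.
inline std::size_t PackedBBytes(std::size_t N, std::size_t K,
                                std::size_t block_size, std::size_t bits) {
  assert(block_size > 0 && (block_size * bits) % 8 == 0);
  const std::size_t blocks_per_col = (K + block_size - 1) / block_size;
  return N * blocks_per_col * (block_size * bits / 8);
}

// Size the float buffer and the conversion loop by the scales element count,
// never by the size of B.
inline std::vector<float> ConvertScalesToFloat(const Half* scales,
                                               std::size_t scales_count) {
  std::vector<float> out(scales_count);
  for (std::size_t i = 0; i < scales_count; ++i) {
    out[i] = HalfToFloat(scales[i]);
  }
  return out;
}
```

Under these assumptions, N = 4, K = 64, block_size = 32 and 4-bit weights give 8 scales but 128 bytes of packed B, so a conversion sized by B would read and write well past the end of the scales data.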


Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 2 out of 3 changed files in this pull request and generated no new comments.



guschmue previously approved these changes Feb 23, 2026
tianleiwu previously approved these changes Feb 23, 2026
@hariharans29 dismissed stale reviews from tianleiwu and guschmue via 9d40b7d February 23, 2026 18:42
guschmue previously approved these changes Feb 23, 2026
tianleiwu previously approved these changes Feb 23, 2026
Contributor

github-actions bot left a comment


You can commit the suggested changes from lintrunner.

Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.



hariharans29 and others added 5 commits February 23, 2026 11:43
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@hariharans29 enabled auto-merge (squash) February 24, 2026 00:08
@hariharans29 merged commit 0982844 into main Feb 24, 2026
90 checks passed
@hariharans29 deleted the hari/scales_prepack branch February 24, 2026 16:17
tianleiwu pushed a commit that referenced this pull request Feb 26, 2026
### Description
Fix the incorrect scales element count used when pre-packing scales while
processing the B input in the PrePack() method of the MatMulNBits operator


### Motivation and Context
Fix potential crash due to incorrect element count

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
tianleiwu added a commit that referenced this pull request Feb 27, 2026
This cherry-picks the following commits for the release:

| Commit ID | PR Number | Commit Title |
|-----------|-----------|-------------|
| decd177 | #27090 | Fix GatherND division by zero when batch dimensions mismatch |
| 55f8234 | #27360 | Fix QMoE CPU Operator |
| df9146f | #27403 | [MLAS] Adding DynamicQGemm function pointers and ukernel interface |
| 0f93853 | #27318 | [js/web] Use embedded WASM module in Blob URL workers when wasmBinary is provided |
| b2a6e69 | #27364 | QMoE CPU Performance Update (Up to 4x on 4-bit) |
| f501e1d | #27413 | Fix refcount bug in map input conversion that caused shutdown segfault |
| b32b205 | #27421 | Fix error where bytes is not assigned for dynamic qgemm pack b size |
| 426b006 | #27397 | Fix DllImportResolver |
| 0982844 | #27412 | MatmulNBits prepacking scales fix |
| 9afb0d2 | #27430 | Fix validation for external data paths for models loaded from bytes |
| 71d2cd0 | #27401 | Enable Python 3.14 CI and Upgrade Dependencies |
| 79e0676 | #27419 | fix: out of bounds access for resize operation |
| 82eb99c | #27459 | Fix SkipLayerNorm fusion incorrectly applied when gamma/beta are not 1D |
| 355278a | #27444 | Fix GatherCopyData Integer Truncation Leading to Heap Out-of-Bounds Read/Write |
| cf96123 | #27411 | [web] fix usage of wasmBinary together with a blob URL for .mjs |
| 1131a86 | #27399 | [web] remove the unhelpful "Unknown CPU vendor" warning. |
| ffbbc4f | #27316 | Build Windows ARM64X binaries as part of packaging pipeline |

---------

Signed-off-by: Jonathan Clohessy <Jonathan.Clohessy@arm.com>
Co-authored-by: patryk-kaiser-ARM <patryk.kaiser@arm.com>
Co-authored-by: don <70039285+0-don@users.noreply.github.com>
Co-authored-by: Jonathan Clohessy <jonathan.clohessy@arm.com>
Co-authored-by: Hariharan Seshadri <shariharan91@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Adrian Lizarraga <adlizarraga@microsoft.com>
Co-authored-by: Lukas Folle <126877803+lukas-folle-snkeos@users.noreply.github.com>
Co-authored-by: Chi Lo <54722500+chilo-ms@users.noreply.github.com>
Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
Co-authored-by: Chaya <cha182350@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Erik <erscor@microsoft.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>

4 participants