Use weight cache for quantized tensor scale data #14448
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14448
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure, 1 Cancelled Job as of commit 0ce8b1e with merge base 07d1092.
NEW FAILURE - The following job has failed:
CANCELLED JOB - The following job was cancelled. Please retry:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@GregoryComer has exported this pull request. If you are a Meta employee, you can view the originating diff in D82862629.
Force-pushed 2cb9258 to 9568209
@GregoryComer has imported this pull request. If you are a Meta employee, you can view this in D82862629.
Adding to 1.0 release blockers, as the weight cache is enabled by default in several presets (including Android). QC8 and QB4 linear are broken when it's enabled, so we'll want to cherry-pick this fix.
Review automatically exported from Phabricator review in Meta.
Can we also run a test with a draft PR, to see what fails (if anything) when we enable this by default?
Summary: When enabling the XNNPACK weight cache and running a model with qb4- or qc8-quantized linear weights, an assertion intended to ensure that all data is in the weight cache is triggered. This can be reproduced by running the XNNPACK backend linear op tests with the weight cache enabled.

The root cause appears to be that tensor scale data was bypassing the weight cache - likely an oversight in the initial implementation. This isn't a correctness issue, but it does cause the aforementioned assert to fail, and it uses marginally more memory than it otherwise needs to.

This PR updates the XNNPACK compileModel call to use the weight cache for scale data (instead of putting it in the unpacked_buffers list). With this change, the linear op tests pass with the weight cache enabled.

Test Plan:
```
buck test -c executorch.xnnpack_weights_cache=1 fbcode//executorch/backends/xnnpack/test:test_xnnpack_ops -- linear
```

Reviewed By: digantdesai
Differential Revision: D82862629
Pulled By: GregoryComer
Force-pushed 9568209 to 391c7b3
@GregoryComer has exported this pull request. If you are a Meta employee, you can view the originating diff in D82862629.
Force-pushed 391c7b3 to 41fc9c4
@GregoryComer has exported this pull request. If you are a Meta employee, you can view the originating diff in D82862629.
Force-pushed 41fc9c4 to 0ce8b1e
@GregoryComer has imported this pull request. If you are a Meta employee, you can view this in D82862629.
@pytorchbot cherry-pick --onto release/1.0 -c fixnewfeature
Cherry picking #14448: the cherry-pick PR is at #14455, and it is recommended to link a fixnewfeature cherry-pick PR with an issue. The following tracker issues are updated:
Details for Dev Infra team: raised by workflow job.
Summary:
When enabling the XNNPACK weight cache and running a model with qb4- or qc8-quantized linear weights, an assertion intended to ensure that all data is in the weight cache is triggered. This can be reproduced by running the XNNPACK backend linear op tests with the weight cache enabled.
The root cause appears to be that tensor scale data is bypassing the weight cache - likely an oversight. This isn't a correctness issue, but it does cause the aforementioned assert to fail, and it uses marginally more memory than it otherwise needs to.
This PR updates the XNNPACK compileModel call to use the weight cache for scale data (instead of putting it in the unpacked_buffers list). With this change, the linear op tests pass with the weight cache enabled.
Differential Revision: D82862629
Pull Request resolved: pytorch#14448