@GregoryComer (Member) commented Sep 19, 2025

Summary:
Enabling the XNNPACK weight cache and running a model with qb4- or qc8-quantized linear weights triggers an assertion intended to ensure that all constant data lives in the weight cache. This can be reproduced by running the XNNPACK backend linear op tests with the weight cache enabled.

The root cause appears to be that tensor scale data bypasses the weight cache, likely an oversight. This is not a correctness issue, but it does trip the aforementioned assert and uses marginally more memory than necessary.

This PR updates the XNNPACK compileModel call to route scale data through the weight cache instead of putting it in the unpacked_buffers list. With this change, the linear op tests pass with the weight cache enabled.
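The shape of the fix can be illustrated with a small, self-contained sketch. The names below (`WeightCache`, `compile_model`, the `"scale"` kind tag) are hypothetical stand-ins for the ExecuTorch/XNNPACK internals, not the actual API; the point is simply that scale data is routed through the cache alongside packed weights, so a "all constant data is in the cache" check can hold.

```python
# Conceptual sketch only -- hypothetical names, not the real ExecuTorch API.

class WeightCache:
    """Records every constant buffer handed to the backend."""
    def __init__(self):
        self._buffers = {}

    def load(self, name, data):
        # Return the cached bytes for `name`, inserting on first use.
        if name not in self._buffers:
            self._buffers[name] = bytes(data)
        return self._buffers[name]

    def contains(self, name):
        return name in self._buffers

def compile_model(tensors, cache):
    """tensors: mapping of name -> (kind, data)."""
    unpacked_buffers = []
    packed = {}
    for name, (kind, data) in tensors.items():
        if kind in ("weight", "scale"):
            # The fix: scale data goes through the cache like weights do,
            # instead of being appended to unpacked_buffers.
            packed[name] = cache.load(name, data)
        else:
            unpacked_buffers.append((name, bytes(data)))
    return packed, unpacked_buffers
```

With this routing, an assertion of the form "every weight and scale buffer is present in the cache" passes, and only genuinely dynamic data remains in the unpacked list.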

Differential Revision: D82862629

@pytorch-bot commented Sep 19, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14448

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Cancelled Job

As of commit 0ce8b1e with merge base 07d1092:

NEW FAILURE - The following job has failed:

CANCELLED JOB - The following job was cancelled. Please retry:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla bot added the `CLA Signed` label (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) Sep 19, 2025
@facebook-github-bot (Contributor) commented:

@GregoryComer has exported this pull request. If you are a Meta employee, you can view the originating diff in D82862629.

@facebook-github-bot (Contributor) commented:

@GregoryComer has imported this pull request. If you are a Meta employee, you can view this in D82862629.

@GregoryComer self-assigned this Sep 19, 2025
@GregoryComer added the `release notes: none` label (do not include this in the release notes) Sep 19, 2025
@GregoryComer (Member, Author) commented:

Adding to 1.0 release blockers, as the weight cache is enabled by default in several presets (including Android). QC8 and QB4 linear are broken when it is enabled, so we'll want to cherry-pick this fix.

@digantdesai (Contributor) left a comment:

Review automatically exported from Phabricator review in Meta.

@digantdesai (Contributor) left a comment:

Can we also run a test with a draft PR to see what fails, if anything, when we enable this by default?

GregoryComer added a commit to GregoryComer/executorch that referenced this pull request Sep 19, 2025
Test Plan:
``` 
buck test -c executorch.xnnpack_weights_cache=1 fbcode//executorch/backends/xnnpack/test:test_xnnpack_ops -- linear
```

Reviewed By: digantdesai

Differential Revision: D82862629

Pulled By: GregoryComer
GregoryComer added a commit to GregoryComer/executorch that referenced this pull request Sep 19, 2025
Reviewed By: lucylq, digantdesai

Differential Revision: D82862629

Pulled By: GregoryComer

@facebook-github-bot merged commit cf1c4bc into pytorch:main Sep 20, 2025
125 of 127 checks passed
@shoumikhin (Contributor) commented:

@pytorchbot cherry-pick --onto release/1.0 -c fixnewfeature

pytorchbot pushed a commit that referenced this pull request Sep 22, 2025
Differential Revision: D82862629

Pull Request resolved: #14448

(cherry picked from commit cf1c4bc)
@pytorchbot (Collaborator) commented:

Cherry picking #14448

The cherry pick PR is at #14455 and it is recommended to link a fixnewfeature cherry pick PR with an issue. The following tracker issues are updated:

StrycekSimon pushed a commit to nxp-upstream/executorch that referenced this pull request Sep 23, 2025
Differential Revision: D82862629

Pull Request resolved: pytorch#14448