Use weight cache for quantized tensor scale data #14448
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14448
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure, 1 Cancelled Job as of commit 0ce8b1e with merge base 07d1092.
NEW FAILURE - The following job has failed:
CANCELLED JOB - The following job was cancelled. Please retry:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@GregoryComer has exported this pull request. If you are a Meta employee, you can view the originating diff in D82862629.
Force-pushed 2cb9258 to 9568209
@GregoryComer has imported this pull request. If you are a Meta employee, you can view this in D82862629.
Adding to 1.0 release blockers, as the weight cache is enabled by default in several presets (including Android). QC8 and QB4 linear are broken when it's enabled, so we'll want to cherry-pick this fix.
Review automatically exported from Phabricator review in Meta.
Can we also run a test with a draft PR, to see what fails (if anything) when we enable this by default?
Summary: When enabling the XNNPACK weight cache and running a model with qb4- or qc8-quantized linear weights, an assertion intended to ensure that all data is in the weight cache is triggered. This can be reproduced by running the XNNPACK backend linear op tests with the weight cache enabled.

The root cause appears to be that tensor scale data was bypassing the weight cache - likely an oversight in the initial implementation. This isn't a correctness issue, but it does cause the aforementioned assert to fail, and it uses marginally more memory than it otherwise needs to.

This PR updates the XNNPACK compileModel call to use the weight cache for scale data (instead of putting it in the unpacked_buffers list). With this change, the linear op tests pass with the weight cache enabled.

Test Plan:
```
buck test -c executorch.xnnpack_weights_cache=1 fbcode//executorch/backends/xnnpack/test:test_xnnpack_ops -- linear
```

Reviewed By: digantdesai
Differential Revision: D82862629
Pulled By: GregoryComer
Force-pushed 9568209 to 391c7b3
@GregoryComer has exported this pull request. If you are a Meta employee, you can view the originating diff in D82862629.
Force-pushed 391c7b3 to 41fc9c4
@GregoryComer has exported this pull request. If you are a Meta employee, you can view the originating diff in D82862629.
Force-pushed 41fc9c4 to 0ce8b1e
@GregoryComer has imported this pull request. If you are a Meta employee, you can view this in D82862629.
@pytorchbot cherry-pick --onto release/1.0 -c fixnewfeature
Cherry picking #14448: the cherry-pick PR is at #14455, and it is recommended to link a fixnewfeature cherry-pick PR with an issue. The following tracker issues are updated:
Details for Dev Infra team: raised by workflow job.
Summary:
When enabling the XNNPACK weight cache and running a model with qb4- or qc8-quantized linear weights, an assertion intended to ensure that all data is in the weight cache is triggered. This can be reproduced by running the XNNPACK backend linear op tests with the weight cache enabled.
The root cause appears to be that tensor scale data is bypassing the weight cache - likely an oversight. This isn't a correctness issue, but it does cause the aforementioned assert to fail, and it uses marginally more memory than it otherwise needs to.
This PR updates the XNNPACK compileModel call to use the weight cache for scale data (instead of putting it in the unpacked_buffers list). With this change, the linear op tests pass with the weight cache enabled.
Differential Revision: D82862629
Pull Request resolved: pytorch#14448