
Conversation

@DannyYuyang-quic (Contributor) commented Nov 13, 2025

Summary

Qualcomm AI Engine Direct - Quantization Recipe for LLM

  • Add a fine-grained quantization annotation mechanism, the quantization recipe
  • Apply it to LLM models with fine-grained quantization configs

Test plan

All LLM CI under TestExampleLLMScript:

python -m backends.qualcomm.tests.test_qnn_delegate.TestExampleLLMScript -s ${device_id} -H ${host_id} -m ${soc} -b build-android

@pytorch-bot bot commented Nov 13, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/15807

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit f5b3916 with merge base 3bbe173:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla bot added the "CLA Signed" label (managed by the Facebook bot; authors must sign the CLA before a PR can be reviewed) on Nov 13, 2025.
@DannyYuyang-quic (Contributor, Author) commented:

@pytorchbot label "release notes: qualcomm"

@pytorch-bot bot added the "release notes: qualcomm" label (Changes to the Qualcomm backend delegate) on Nov 13, 2025.
@DannyYuyang-quic (Contributor, Author) commented:

Hi @cccclai,

This PR includes the Quantization Recipe we went over in today's meeting.
It introduces fine-grained quantization annotation for the LLM models we currently support.
Please have a look.
Thanks!

cc: @haowhsu-quic

@DannyYuyang-quic force-pushed the dev1/danny/per_layer_quant branch from 2d4c061 to e09726d on November 14, 2025 09:01
 - add a fine-grained quantization annotation mechanism – quantization
   recipe
 - applied to Llama3-1B/3B with fine-grained quantization configs
@DannyYuyang-quic force-pushed the dev1/danny/per_layer_quant branch from e09726d to f0f016e on November 17, 2025 14:08
Copilot AI (Contributor) left a comment

Pull Request Overview

This PR introduces a new fine-grained quantization annotation mechanism called "quantization recipe" for LLM models in the Qualcomm AI Engine Direct backend. The new approach replaces the previous custom annotation system with a more flexible and maintainable recipe-based pattern.

Key Changes

  • Added QuantRecipe infrastructure providing a builder pattern for defining quantization strategies
  • Implemented model-specific quantization recipes for 14 LLM variants (Llama, Gemma, Qwen, Phi, etc.)
  • Migrated LLM model configurations from custom_annotation tuples to quant_recipe class references
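As a rough sketch of what a builder-pattern quantization recipe can look like, here is a minimal, self-contained illustration. All names below (QuantRecipe, QuantGranularity, LayerRule, add_rule, rule_for) are illustrative stand-ins, not the actual API introduced by this PR:

```python
from dataclasses import dataclass
from enum import Enum


class QuantGranularity(Enum):
    # Hypothetical stand-in for the granularity enum described above
    PER_TENSOR = "per_tensor"
    PER_CHANNEL = "per_channel"


@dataclass
class LayerRule:
    pattern: str          # substring matched against the layer's module path
    act_bits: int         # activation bit width
    weight_bits: int      # weight bit width
    granularity: QuantGranularity


class QuantRecipe:
    """Hypothetical builder: collect per-layer rules, then look them up."""

    def __init__(self):
        self.rules = []

    def add_rule(self, pattern, act_bits, weight_bits, granularity):
        # Builder style: each call appends a rule and returns self for chaining
        self.rules.append(LayerRule(pattern, act_bits, weight_bits, granularity))
        return self

    def rule_for(self, layer_name):
        # First matching pattern wins; unmatched layers get no override
        for rule in self.rules:
            if rule.pattern in layer_name:
                return rule
        return None


recipe = (
    QuantRecipe()
    .add_rule("qkv_proj", act_bits=16, weight_bits=8,
              granularity=QuantGranularity.PER_CHANNEL)
    .add_rule("down_proj", act_bits=8, weight_bits=4,
              granularity=QuantGranularity.PER_CHANNEL)
)
print(recipe.rule_for("layers.0.attention.qkv_proj").weight_bits)  # 8
```

The point of the pattern is that model-specific recipes (one per LLM variant) reduce to a short, declarative list of rules instead of bespoke annotation functions.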

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.

Summary per file:

  • examples/qualcomm/oss_scripts/llama/static_llm_quant_recipe.py: New file defining the StaticLLMQuantRecipe base class and 14 model-specific recipe implementations
  • examples/qualcomm/oss_scripts/llama/llama.py: Updated to use quant_recipe instead of custom_annotations; simplified the quantization flow
  • examples/qualcomm/oss_scripts/llama/__init__.py: Removed custom annotation imports/configs, added quant_recipe imports, and updated LLMModelConfig to use a quant_recipe field
  • backends/qualcomm/quantizer/quant_recipe.py: New core infrastructure with the QuantRecipe builder, QuantizationStrategy patterns, and the QuantGranularity enum
  • backends/qualcomm/quantizer/quantizer.py: Added recipe support to QnnQuantizer.annotate() and a new use_8a4w QuantDtype
  • backends/qualcomm/quantizer/qconfig.py: Added get_8a4w_qnn_ptq_config() for 8-bit activation, 4-bit weight quantization
  • backends/qualcomm/quantizer/custom_annotation.py: Removed obsolete annotation functions (annotate_down_proj, annotate_output_16a8w, annotate_qkv_proj_sha, StaticLLMQuantConfig)
  • docs/source/llm/build-run-llama3-qualcomm-ai-engine-direct-backend.md: Updated documentation to reference quant_recipe instead of the ptq/group_size configs
  • backends/qualcomm/utils/utils.py: Added a show_nn_module_stack_for_quant_recipe() helper for debugging module stacks
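To make the debugging helper's purpose concrete: a recipe needs layer names to match against, and a module-stack dump shows users exactly what those names look like. Below is a minimal, hypothetical sketch of such a helper; the real show_nn_module_stack_for_quant_recipe() works on an exported graph's nn_module_stack metadata, while this version just formats plain dicts:

```python
def show_module_stack(nodes):
    """Return one formatted line per node: the node name plus its module path.
    The path is what a recipe's layer patterns would be matched against."""
    lines = []
    for node in nodes:
        stack = node.get("nn_module_stack", {})
        # Join the stack entries (outermost first) into a dotted module path
        path = ".".join(stack.values()) if stack else "<root>"
        lines.append(f"{node['name']}: {path}")
    return lines


# Fake nodes standing in for graph nodes with nn_module_stack metadata
fake_nodes = [
    {"name": "linear_q", "nn_module_stack": {"0": "layers.0", "1": "qkv_proj"}},
    {"name": "linear_d", "nn_module_stack": {"0": "layers.0", "1": "down_proj"}},
]
for line in show_module_stack(fake_nodes):
    print(line)
```

Running this prints `linear_q: layers.0.qkv_proj` and `linear_d: layers.0.down_proj`, i.e. the strings a recipe rule such as `"qkv_proj"` would match.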


@abhinaykukkadapu (Contributor) commented Nov 17, 2025

@DannyYuyang-quic thanks for the PR. We have a native executorch.export infra and ExportRecipes (https://github.com/pytorch/executorch/blob/main/export/export.py#L38) that let users easily apply configurations such as these; for example, I added an FP16 recipe for QNN (https://github.com/pytorch/executorch/blob/main/backends/qualcomm/recipes/qnn_recipe_types.py#L24). It would be great if we could expose these quant configs as well for everyone to use, since that would significantly lower the friction of onboarding to QNN.

Also note that if you use ExportRecipes, you don't need to call to_edge_transform_and_lower_to_qnn, since the recipe infra takes care of the transforms before lowering. Let me know if you have any questions. Thanks!

CC: @cccclai

@cccclai (Contributor) commented Nov 18, 2025

@abhinaykukkadapu this PR is different from the export recipe you added. It's about how to add more customization when quantizing a model. The current recipes for different backends don't offer this level of customization, so we need to either expose an API for it or leave it for advanced users only.

@DannyYuyang-quic (Contributor, Author) commented:

Hi @abhinaykukkadapu, @cccclai,
Thanks for the feedback, and thanks Chen for clarifying!
Like Chen said, the goal of this PR is mainly to support more customization to quantize a model.

@abhinaykukkadapu for now, this PR does not use ExportRecipes.
As for exposing these quant configs in ExportRecipes, we're currently refactoring qconfig.py and QnnQuantizer, so we can discuss how to integrate this in a follow-up PR.

@abhinaykukkadapu (Contributor) commented:

@DannyYuyang-quic

And regarding exposing these quant configs in ExportRecipes, we’re currently working on refactoring the qconfig.py and QNNQuantizer, so we can discuss how to integrate this in a follow-up PR.

Thanks for your work and for letting me know. Yes, this would be great: if we expose these complex configs as ExportRecipes in the future, users can lower a model with just a couple of lines of code.
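To illustrate the "couple of lines" workflow being discussed, here is a hypothetical sketch of what a named-recipe registry could look like. The recipe names, config keys, and lower_with_recipe function are all invented for illustration; the real ExportRecipe infra in executorch.export works differently:

```python
# Hypothetical registry mapping a recipe name to a resolved config
RECIPES = {
    "qnn_fp16": {"quantize": False, "dtype": "fp16"},
    "qnn_8a4w": {"quantize": True, "act_bits": 8, "weight_bits": 4},
}


def lower_with_recipe(model_name, recipe_name):
    # Resolve the named recipe; a real implementation would quantize
    # (if cfg["quantize"]) and delegate to the QNN backend here.
    cfg = RECIPES[recipe_name]
    return {"model": model_name, "config": cfg}


# From the user's perspective, lowering collapses to a single call
artifact = lower_with_recipe("llama3_1b", "qnn_8a4w")
print(artifact["config"]["weight_bits"])  # 4
```

The design point is that the complexity lives inside the registered recipe, so end users pick a name rather than hand-assembling quantizer configs.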

@meta-codesync bot commented Nov 18, 2025

@cccclai has imported this pull request. If you are a Meta employee, you can view this in D87349343.

@cccclai cccclai merged commit 101e915 into pytorch:main Nov 18, 2025
145 checks passed
cccclai added a commit to cccclai/executorch-1 that referenced this pull request Nov 20, 2025
Summary:
Forward fix for the test failure in pytorch#15807.

The main reason is that this API is called internally. In this PR, I recovered some of the functions deleted in the previous PRs.

Reviewed By: abhinaykukkadapu

Differential Revision: D87566729