[AWQ][Smooth] mapping shouldn't use ignore #2152
Conversation
Code Review
This pull request correctly fixes a bug where the ignore list was improperly used during module mapping for AWQ smoothing. The change correctly removes the ignore parameter from the match_modules_set call, ensuring it's only used for quantization as intended. The accompanying update to README.md and the new code comment clearly explain this behavior, improving both user documentation and code maintainability. I've added one comment regarding an outdated docstring that this change introduces, which would be good to address for consistency.
Hi Charles, I'm working on the same issue as well. I don't think your change will work correctly for multimodal LLMs. In practice, we usually do not want to quantize the vision encoder. Since the vision encoder usually shares a similar transformer structure, it would also be picked up by the AWQ mapping resolution logic once you remove the `ignore`, which is not desirable. In my draft PR, I added an additional field to the modifier to separate the logic.
I don't think it is an issue; it's just going to do smoothing, which will be mathematically irrelevant, though likely slow. Also, this was tested on `python /home/HDCharles/repos/llm-compressor/examples/awq/qwen3-vl-30b-a3b-Instruct-example.py`, which is multimodal. I left comments on your PR; I think we can make further enhancements on top of this PR since they seem orthogonal, and the changes in this PR restore behaviors we had previously and are expected.
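To illustrate why smoothing an ignored (unquantized) module is "mathematically irrelevant": the smoothing transform is an invertible per-channel rescaling, so in full precision the block's output is unchanged; it only affects error once weights are quantized. The toy sketch below is not llm-compressor code, just a demonstration of that identity.

```python
# Toy demonstration (not llm-compressor code): smoothing divides the
# smooth layer's output channels by s and multiplies the balance
# layer's matching input channels by s, so the composed float output
# is identical.

def matvec(W, v):
    """Plain matrix-vector product over nested lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

x = [1.0, -2.0, 0.5]
W1 = [[0.2, -1.0, 0.7], [1.5, 0.3, -0.4], [-0.6, 0.8, 1.1]]  # "smooth" layer
W2 = [[0.9, 0.1, -0.3], [-1.2, 0.5, 0.6], [0.4, -0.7, 1.0]]  # "balance" layer
s = [0.5, 2.0, 4.0]  # per-channel smoothing scales

y = matvec(W2, matvec(W1, x))

# Scale W1's rows (output channels) down, W2's columns (input channels) up.
W1s = [[w / s[i] for w in W1[i]] for i in range(3)]
W2s = [[W2[i][j] * s[j] for j in range(3)] for i in range(3)]
y_s = matvec(W2s, matvec(W1s, x))

print(all(abs(a - b) < 1e-9 for a, b in zip(y, y_s)))  # True
```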
Hi Charles, thank you for the suggestion. I've made a PR to filter out these redundant mappings: #2179
@ZewenShen-Cohere Sorry, I'm back from the break now. I've added that functionality to this PR and added tests, as well as consolidated the logic for SmoothQuant and AWQ.
brian-dellabetta
left a comment
Updates and tests look good! One nit.
Summary: after the change to match_modules_set I made it so AWQ would take the ignore list into account during module mapping. This was a mistake, since the ignore list should only be used for quantization. This fixes #2151. Test Plan: python /home/HDCharles/repos/llm-compressor/examples/awq/qwen3-vl-30b-a3b-Instruct-example.py Signed-off-by: HDCharles <charlesdavidhernandez@gmail.com>
Summary Signed-off-by: HDCharles <charlesdavidhernandez@gmail.com>
Co-authored-by: Kyle Sayers <kylesayrs@gmail.com> Signed-off-by: HDCharles <39544797+HDCharles@users.noreply.github.com>
Summary:
After the change to use match_modules_set, I made it so AWQ would take the ignore list into account during module mapping. This was a mistake, since the ignore list should only apply to quantization.
A similar issue occurred with SmoothQuant, which is also fixed; new tests were added to verify this behavior.
At the same time, I added behavior so that if all balance layers plus the smooth layer are ignored, that mapping is skipped.
These changes should fix #2151; however, I have not been able to get that example to run end to end. While the matching now works correctly, something is happening with the tracing. I'm investigating that separately, since I'm able to run it when I switch to the basic pipeline, so it isn't an issue with this PR.
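The skip-if-all-ignored behavior can be sketched as follows. This is a hypothetical illustration, not the llm-compressor implementation: the helper names (`matches_ignore`, `skip_mapping`) are made up, and glob-style matching via `fnmatch` stands in for the library's actual pattern resolution.

```python
import fnmatch

def matches_ignore(name, ignore):
    """Return True if a module name matches any ignore pattern."""
    return any(fnmatch.fnmatch(name, pat) for pat in ignore)

def skip_mapping(smooth_name, balance_names, ignore):
    """Skip a resolved mapping only when the smooth layer and every
    balance layer are ignored -- smoothing them would have no effect
    on the final quantized model."""
    return all(matches_ignore(n, ignore) for n in [smooth_name, *balance_names])

# Ignoring the vision tower drops its mappings entirely, while
# language-model mappings are kept despite the same ignore list.
ignore = ["visual.*"]
print(skip_mapping("visual.blocks.0.norm1",
                   ["visual.blocks.0.attn.qkv"], ignore))          # True
print(skip_mapping("model.layers.0.input_layernorm",
                   ["model.layers.0.self_attn.q_proj"], ignore))   # False
```

Note that mapping resolution itself still sees every module; the ignore list only prunes mappings that are entirely ignored and otherwise applies at quantization time.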
Test Plan:
pytest /home/HDCharles/repos/llm-compressor/tests/llmcompressor/modifiers/smoothquant/test_base.py
pytest /home/HDCharles/repos/llm-compressor/tests/llmcompressor/modifiers/awq/test_base.py