[Bugfix] Fix quantization skip modules logic #13562

jeejeelee wants to merge 12 commits into vllm-project:main
Conversation
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
```python
# BitsAndBytes
if (isinstance(quant_config, BitsAndBytesConfig)
        and quant_config.llm_int8_skip_modules):
    quant_config.llm_int8_skip_modules = [
        hf_to_vllm_mapper._map_name(module)
        for module in quant_config.llm_int8_skip_modules
    ]
# AWQ
elif (isinstance(quant_config, AWQConfig)
        and quant_config.modules_to_not_convert):
    quant_config.modules_to_not_convert = [
        hf_to_vllm_mapper._map_name(module)
        for module in quant_config.modules_to_not_convert
    ]
# TODO: Support more quantization types.
```
Maybe we should introduce a common `ignored_modules` or `ignored_prefixes` attribute on `QuantizationConfig`, like `packed_modules_mapping`: https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/layers/quantization/base_config.py#L60-L66

Then each quant config can convert its method-specific `llm_int8_skip_modules`, `modules_to_not_convert`, etc. into a canonical format in `ignored_modules`. This would also allow us to generalize the `is_layer_skipped` function.
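To make the suggestion concrete, here is a minimal sketch of what a generalized `is_layer_skipped` over a common `ignored_modules` field could look like. The class and function names below are illustrative stand-ins, not vLLM's actual API.

```python
# Hypothetical sketch: a canonical, method-agnostic `ignored_modules` list on
# the base config lets one is_layer_skipped serve every quantization method,
# instead of each method checking llm_int8_skip_modules / modules_to_not_convert.
from dataclasses import dataclass, field


@dataclass
class QuantizationConfigSketch:
    # canonical list of ignored module name prefixes
    ignored_modules: list[str] = field(default_factory=list)


def is_layer_skipped(config: QuantizationConfigSketch, prefix: str) -> bool:
    # A layer is skipped when its name matches an ignored module exactly
    # or lives under an ignored parent prefix.
    return any(prefix == m or prefix.startswith(m + ".")
               for m in config.ignored_modules)


cfg = QuantizationConfigSketch(ignored_modules=["visual.merger"])
print(is_layer_skipped(cfg, "visual.merger.mlp"))        # True
print(is_layer_skipped(cfg, "model.layers.0.qkv_proj"))  # False
```

Each method-specific constructor would then only be responsible for translating its own attribute into this canonical list.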
I'd support an implementation like this as well. The current implementation could fail to properly map module names in nested models:

```python
modules_to_not_convert = ["SubModel.A"]
SubModel.hf_to_vllm_mapper = Mapper(orig_to_new_prefix={"A": "B"})
```

Note that "SubModel.A" will not match because "SubModel.A" does not start with "A".
This is a fairly minor issue, but something to keep in mind.
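A tiny runnable illustration of that mismatch, using a stand-in `map_name` helper (illustrative only, not vLLM's `WeightsMapper` API):

```python
# Minimal stand-in for a prefix mapper, to make the mismatch concrete.
def map_name(name: str, orig_to_new_prefix: dict[str, str]) -> str:
    for old, new in orig_to_new_prefix.items():
        if name.startswith(old):
            return new + name[len(old):]
    return name  # unchanged when no prefix matches


# The submodel's mapper only knows names relative to itself ("A" -> "B"),
# so the fully qualified "SubModel.A" falls through unmapped.
print(map_name("SubModel.A", {"A": "B"}))  # SubModel.A  (not remapped)
print(map_name("A.proj", {"A": "B"}))      # B.proj
```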
Another implementation could look like this:

- Add a mutable `ignored_modules` attribute to `QuantizationConfig`
- At construction time, use the method-specific constructor to populate the `ignored_modules` attribute from disk
- At initialize time, within `SupportsQuant`, use the given model prefix and mapper to update the `ignored_modules` list with the proper model-specific mapping:

```python
ignored_modules = [prefix + hf_to_vllm_mapper[module - prefix] for module in ignored_modules]
```

This has the advantage of further standardizing around the `QuantizationConfig` base, as well as supporting mapping with nested models.
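The pseudocode above could be fleshed out roughly as follows; `update_ignored_modules` is a hypothetical helper, and the prefix-map handling is a sketch rather than vLLM's actual mapper logic:

```python
# Strip the model prefix, remap the remainder with the hf->vllm prefix map,
# then re-attach the prefix. Entries outside the submodel are left untouched.
def update_ignored_modules(ignored: list[str], prefix: str,
                           orig_to_new_prefix: dict[str, str]) -> list[str]:
    updated = []
    for module in ignored:
        if not module.startswith(prefix):
            updated.append(module)  # not under this submodel
            continue
        rest = module[len(prefix):]
        for old, new in orig_to_new_prefix.items():
            if rest.startswith(old):
                rest = new + rest[len(old):]
                break
        updated.append(prefix + rest)
    return updated


# With the prefix handled explicitly, the nested case now maps correctly.
print(update_ignored_modules(["SubModel.A"], "SubModel.", {"A": "B"}))
# ['SubModel.B']
```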
@jeejeelee Here's a WIP of what that might look like: #14635
```python
def _configure_packed_modules_mapping():
    """
    Pass packed_modules_mapping by reference to quant_config so that
    quant_config can properly match fused modules

    Note that model attributes are passed by reference to quant_config,
    enabling them to be updated by model_class.__new__ (ex. chatglm, qwen)
    """
    packed_mapping = getattr(model_class, "packed_modules_mapping", None)
    if packed_mapping is not None:
        # pass packed_modules_mapping by reference to quant_config
        quant_config.packed_modules_mapping = packed_mapping
    else:
        logger.warning(
            "The model class %s has not defined `packed_modules_mapping`, "
            "this may lead to incorrect mapping of quantized or ignored "
            "modules", model_class.__name__)
```
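The docstring's "passed by reference" point can be demonstrated in isolation. The class names below are illustrative stand-ins for the model class and quant config:

```python
# Because the dict is handed over by reference, later in-place updates on the
# model side (e.g. in model_class.__new__) stay visible to the quant config.
class QuantConfigSketch:
    packed_modules_mapping: dict = {}


class ModelClassSketch:
    packed_modules_mapping = {
        "qkv_proj": ["q_proj", "k_proj", "v_proj"],
    }


quant_config = QuantConfigSketch()
# pass packed_modules_mapping by reference, as in the snippet above
quant_config.packed_modules_mapping = ModelClassSketch.packed_modules_mapping

# a later in-place update on the model side is seen by quant_config
ModelClassSketch.packed_modules_mapping["gate_up_proj"] = [
    "gate_proj", "up_proj"
]
print("gate_up_proj" in quant_config.packed_modules_mapping)  # True
```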
Why is this needed after we added SupportsQuant (#13104)? I thought getting the `packed_modules_mapping` from the model to the quant config was the main purpose of that. cc @kylesayrs
The `_configure_packed_modules_mapping` function needs to remain in place until `SupportsQuant` has been added to all applicable models.
Closed due to #14635
Motivation
Some models, such as Qwen2.5-VL, have modified their layer hierarchy compared to their original `transformers` implementation. This change causes quantization skip modules to become ineffective, leading to incorrect initialization of linear methods.

Reproduce code
TODO

- Investigate other quantization methods (e.g. AWQ)
- Optimize the implementation logic