15 changes: 9 additions & 6 deletions examples/awq/qwen3_moe_example.py
@@ -53,6 +53,7 @@ def tokenize(sample):
recipe = [
AWQModifier(
ignore=["lm_head", "re:.*mlp.gate$", "re:.*mlp.shared_expert_gate$"],
force_balance=["re:.*mlp.gate$"],
scheme="W4A16",
targets=["Linear"],
),
@@ -67,6 +68,13 @@ def tokenize(sample):
num_calibration_samples=NUM_CALIBRATION_SAMPLES,
)



# Save to disk compressed.
SAVE_DIR = MODEL_ID.rstrip("/").split("/")[-1] + "-awq-sym-new"
model.save_pretrained(SAVE_DIR, save_compressed=True)
tokenizer.save_pretrained(SAVE_DIR)

# Confirm generations of the quantized model look sane.
print("\n\n")
print("========== SAMPLE GENERATION ==============")
@@ -76,9 +84,4 @@ def tokenize(sample):
)
output = model.generate(input_ids, max_new_tokens=100)
print(tokenizer.decode(output[0]))
print("==========================================\n\n")
Contributor Author commented:
I'll revert these changes later


# Save to disk compressed.
SAVE_DIR = MODEL_ID.rstrip("/").split("/")[-1] + "-awq-sym"
model.save_pretrained(SAVE_DIR, save_compressed=True)
tokenizer.save_pretrained(SAVE_DIR)
print("==========================================\n\n")
85 changes: 73 additions & 12 deletions src/llmcompressor/modifiers/awq/base.py
@@ -120,9 +120,15 @@ class AWQModifier(Modifier, QuantizationMixin):
to smoothed) and the second entry is the layer whose output is scaled to
achieve the smoothing.
If regex is used, it matches layers with the largest overlap in module name.
:param ignore: list of layers to ignore, even if they match a regex in mappings.
It should match the name of layers whose outputs are scaled to achieve
smoothing (the second entry of the mappings list).
:param ignore: list of layers to exclude from quantization. By default, layers in
`ignore` are also excluded from AWQ smoothing. Use `force_balance` to override
this and smooth certain layers even though they appear in `ignore` (e.g., MoE gate layers).
:param force_balance: list of layers to include in AWQ smoothing even if they are in
`ignore`. This allows you to smooth, but not quantize, specific layers. For example,
if `ignore=["lm_head", "re:.*mlp.gate"]` and `force_balance=["re:.*mlp.gate"]`,
then mlp.gate will be smoothed but not quantized, while lm_head will be neither
smoothed nor quantized. If None, all layers in `ignore` are excluded from smoothing
(the default, backward-compatible behavior).
:param offload_device: offload cached args to this device, which reduces memory
requirements but requires more time to move data between cpu and execution
device. Defaults to None, so cached args are not offloaded. Consider setting
@@ -145,6 +151,7 @@ class AWQModifier(Modifier, QuantizationMixin):
# User-provided vars (in addition to QuantizationMixin args)
sequential_targets: str | list[str] | None = None
mappings: list[AWQMapping] | None = None
force_balance: list[str] | None = None
offload_device: torch.device | None = None
duo_scaling: bool | Literal["both"] = True
n_grid: int = 20
@@ -281,9 +288,34 @@ def _set_resolved_mappings(self, model: Module) -> None:
"""
resolved_mappings: list[ResolvedMapping] = []
module_to_name = get_module_to_name_dict(model)

# Compute the effective ignore list for AWQ smoothing:
# Start with self.ignore, then remove any layers in force_balance
# If force_balance is None, use self.ignore as-is (backward compatible)
if self.force_balance is not None:
# Validate that all force_balance layers are in ignore
ignore_set = set(self.ignore or [])
force_balance_set = set(self.force_balance)
invalid_force_balance = force_balance_set - ignore_set
if invalid_force_balance:
raise ValueError(
f"force_balance contains layers that are not in ignore: "
f"{invalid_force_balance}. force_balance should only contain "
f"layers that are in ignore but you want to smooth anyway."
)

# Remove force_balance layers from ignore list for AWQ matching
awq_ignore = [
ign for ign in (self.ignore or [])
if ign not in self.force_balance
]
Contributor commented on lines +308 to +311 (medium):
The list comprehension for creating awq_ignore involves checking for membership in self.force_balance, which is a list. This results in a time complexity of O(N*M), where N is the length of self.ignore and M is the length of self.force_balance. You have already computed force_balance_set on line 298, which allows for O(1) average time complexity for membership checking. Using this set would make the operation more efficient, with a total complexity of O(N+M).

Suggested change
awq_ignore = [
ign for ign in (self.ignore or [])
if ign not in self.force_balance
]
awq_ignore = [
ign for ign in (self.ignore or [])
if ign not in force_balance_set
]

else:
# Default: exclude everything in ignore from smoothing
awq_ignore = self.ignore

for mapping in self.mappings:
for smooth_layers, *nested_balance_layers in match_modules_set(
model, (mapping.smooth_layer, *mapping.balance_layers), self.ignore
model, (mapping.smooth_layer, *mapping.balance_layers), awq_ignore
):
if len(smooth_layers) > 1:
raise ValueError(
@@ -356,11 +388,12 @@ def cache_smooth_activations_hook(
args: tuple[torch.Tensor, ...],
_output: torch.Tensor,
):
self._smooth_activation_means[smooth_name] = _accumulate_mean(
# Assume that first argument is the input
args[0].cpu().abs().detach().flatten(0, -2),

act_mean, count = _accumulate_mean(
args[0].abs().detach().flatten(0, -2),
self._smooth_activation_means.get(smooth_name, None),
)
self._smooth_activation_means[smooth_name] = (act_mean.cpu(), count)

return cache_smooth_activations_hook

@@ -525,6 +558,7 @@ def _compute_best_scale(
best_ratio = -1
best_scales = None
best_error = float("inf")
initial_error = None

org_sd = {
k: v.cpu()
@@ -569,7 +603,14 @@ def _compute_best_scale(
for balance_layer in balance_layers_to_patch
],
):
for grid_idx, use_duo_scaling in product(range(n_grid), duo_scalings):
total_iterations = n_grid * len(duo_scalings)
pbar = tqdm(
product(range(n_grid), duo_scalings),
total=total_iterations,
desc=f"Grid search for {mapping.smooth_name}",
leave=False,
)
for grid_idx, use_duo_scaling in pbar:
# create new scales
ratio = grid_idx / n_grid
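For orientation, each grid point above turns `ratio` (and, when duo scaling is enabled, the weight statistics as well) into a candidate per-channel scale vector. The construction itself is outside this hunk; the sketch below shows the standard AWQ formulation as a reference point, with `x_mean` and `w_mean` assumed to be the cached activation and weight channel means, and is not copied from llm-compressor:

```python
import torch

def candidate_scales(
    x_mean: torch.Tensor, w_mean: torch.Tensor, ratio: float, duo_scaling: bool
) -> torch.Tensor:
    """Standard AWQ-style scale proposal for one grid point (reference sketch)."""
    if duo_scaling:
        scales = x_mean.pow(ratio) / (w_mean.pow(1 - ratio) + 1e-4)
    else:
        scales = x_mean.pow(ratio)
    scales = scales.clamp(min=1e-4)
    # Normalize so the scales neither blow up nor vanish overall.
    scales = scales / (scales.max() * scales.min()).sqrt()
    scales[torch.isnan(scales)] = 1
    return scales

x_mean, w_mean = torch.rand(8) + 0.1, torch.rand(8) + 0.1
print(candidate_scales(x_mean, w_mean, ratio=0.5, duo_scaling=True))
```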

@@ -588,14 +629,15 @@ def _compute_best_scale(
scales[torch.isnan(scales)] = 1

# Q(W * s)
for balance_layer in balance_layers_to_patch:
for balance_layer in mapping.balance_layers:
Contributor commented (critical):
Iterating over mapping.balance_layers here appears to be incorrect. This loop is part of a grid search to find the best scaling factor by minimizing quantization error. By including non-quantized layers (which can be in balance_layers due to force_balance), their weights are modified, and the resulting output distortion is included in the loss calculation. This loss should ideally only reflect quantization error from the quantized layers. Using balance_layers_to_patch, which is defined before this block and contains only the layers to be quantized, would be the correct approach. The influence of force_balance layers is already correctly handled in the computation of w_mean, which contributes to the scales.

Contributor Author replied:
Gemini’s answer seems incorrect. We also need to account for nn.Modules that should not be quantized, so that the model produced by AWQ remains functionally equivalent to the original network.
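
To make the point concrete: AWQ smoothing divides the activation leaving the smooth layer by `s` and multiplies the input-channel weights of every balance layer by `s`, which is an exact identity in full precision. A minimal, hypothetical sketch (not code from this PR) of that identity for an unquantized consumer such as an MoE router:

```python
# Hypothetical sketch: names and shapes are assumptions, not PR code.
import torch

torch.manual_seed(0)
x = torch.randn(4, 8)                         # activation from the smooth layer
gate = torch.nn.Linear(8, 2, bias=False)      # e.g. an unquantized MoE router
s = x.abs().mean(dim=0).clamp(min=1e-4)       # per-input-channel smoothing scale

y_ref = gate(x)                                                  # original output
y_smoothed = torch.nn.functional.linear(x / s, gate.weight * s)  # compensated path
print(torch.allclose(y_ref, y_smoothed, atol=1e-5))              # True
```

If a consumer of the smoothed activation were skipped entirely (neither rescaled nor quantized), its inputs would arrive divided by `s` with nothing compensating on the weight side, which is the functional-equivalence concern raised in the reply above.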

balance_layer.weight.mul_(_scalesview)

if not hasattr(balance_layer, "quantization_scheme") or not hasattr(
balance_layer.quantization_scheme, "weights"
):
continue

w_qscheme = balance_layer.quantization_scheme.weights
balance_layer.weight.mul_(_scalesview)
call_observer(
balance_layer,
"weight",
@@ -620,13 +662,21 @@ def _compute_best_scale(
# compute mean squared error (L2 norm)
loss = self._compute_loss(fp16_outputs, int_w_outputs)

# Track initial error for logging
if initial_error is None:
initial_error = loss


history.append(
{"ratio": ratio, "duo_scaling": use_duo_scaling, "error": loss}
)
if loss < best_error:
best_error = loss
best_ratio = ratio
best_scales = scales.clone()

# Update progress bar with best error
pbar.set_postfix({"best_error": f"{best_error:.3e}"})

mapping.parent.load_state_dict(org_sd, strict=False)

@@ -639,6 +689,16 @@ def _compute_best_scale(
"https://github.com/vllm-project/llm-compressor/issues"
)

# Log final results
err_reduction = best_error / initial_error
logger.info(
f"AWQ grid search for {mapping.smooth_name}: "
f"initial error = {initial_error:.3e}, "
f"best error = {best_error:.3e}, "
f"error reduction rate (best/initial) = {err_reduction * 100:.3f}%"
)


assert (
torch.isnan(best_scales).sum() == 0
), f"Nan found in scales: {best_scales}"
@@ -657,7 +717,7 @@ def _compute_loss(
# Compute the MSE loss for each batch
for fp16_batch, int_w_batch in zip(fp16_outputs, int_w_outputs):
loss += torch.nn.functional.mse_loss(
fp16_batch, int_w_batch.to(fp16_batch.device)
fp16_batch, int_w_batch.to(fp16_batch.device), reduction="sum"
).item()
num_elements += fp16_batch.numel()

@@ -863,9 +923,10 @@ def _accumulate_mean(
sum_added = inp.sum(dim=0)
num_added = inp.size(0)
if prev_mean_and_count is None:
return sum_added, num_added
return sum_added / num_added, num_added

prev_mean, prev_count = prev_mean_and_count
prev_mean = prev_mean.to(inp.device)

prev_sum = prev_mean * prev_count
new_count = prev_count + num_added
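A note on the `_accumulate_mean` hunk above: before the fix, the first call returned the raw per-channel sum while later calls returned a mean, so the accumulated statistic was skewed toward the first batch. A small self-contained sketch of the corrected running-mean update (assumed signature, not the library function itself):

```python
import torch

def running_mean(inp: torch.Tensor, prev: tuple[torch.Tensor, int] | None):
    # Combine a new batch with the previous (mean, count) state.
    batch_sum, batch_count = inp.sum(dim=0), inp.size(0)
    if prev is None:
        return batch_sum / batch_count, batch_count   # return a mean, not a sum
    prev_mean, prev_count = prev
    new_count = prev_count + batch_count
    return (prev_mean * prev_count + batch_sum) / new_count, new_count

x = torch.randn(10, 4)
state = running_mean(x[:6], None)
state = running_mean(x[6:], state)
print(torch.allclose(state[0], x.mean(dim=0), atol=1e-6))  # True
```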
2 changes: 1 addition & 1 deletion src/llmcompressor/modifiers/awq/mappings.py
@@ -46,7 +46,7 @@ class AWQMapping:
AWQMapping("re:.*v_proj$", ["re:.*o_proj$"]),
AWQMapping(
"re:.*post_attention_layernorm$",
["re:.*mlp.experts.*.gate_proj$", "re:.*mlp.experts.*.up_proj$"],
["re:.*mlp.experts.*.gate_proj$", "re:.*mlp.experts.*.up_proj$", "re:.*mlp.gate$"],
),
AWQMapping(
"re:.*up_proj$",