[Qwen 3.5][gemma4] Qwen35 and Gemma 4 fast inference (clean-base mirror of #588) by danielhanchen · Pull Request #10 · danielhanchen/unsloth-zoo-staging

danielhanchen · 2026-04-20T14:10:08Z

Staging mirror of unslothai#603

Original PR: unslothai#603
Author: danielhanchen

This is a staging copy for review and editing. Once finalized, changes will be pushed back to the original PR.

Original description

Summary

Clean-base mirror of unslothai#588. Contains only the original author's 9 commits from @Datta0 (7fa143f through b7052b5, 2026-03-30 to 2026-04-17), with none of the later auto-review fix commits that accumulated on that PR's branch.

Why

Provides a fresh starting point to re-run the PR-review pipeline against the author's original diff without interference from prior review-loop commits layered on top of unslothai#588. Useful for validating review-pipeline changes end-to-end.

Branch contents

Nine commits from @Datta0:

7fa143f [WIP] initial fast_inference support for qwen3.5
31c1bc3 [WIP] fixes for rope deltas
112f1b5 cleanup
a31dee8 fix lm_head detection and remove moe
5d07504 [WIP] gemma 4 dense fast inference
a85a4f4 Fix dtype setting
d2f7983 Merge remote-tracking branch 'origin/main' into qwen35-shared-final
43ed24f Fix gemma4 load on vllm 0.19.0
b7052b5 fix bnb loader for gemam4

Note

Not a replacement for unslothai#588. See unslothai#588 for the actively-maintained PR and associated review history.

danielhanchen · 2026-04-20T14:14:03Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces support for Gemma 4 and Qwen 3.5 (GDN) architectures, adding specialized vLLM patches for LoRA and k_eq_v weight handling and a model finalization utility. It also expands layer configuration templates and improves dtype management. Reviewer feedback recommends simplifying helper functions like _get_model_device and _normalize_state_dict_tensor, and streamlining attribute-setting logic to improve code readability.

gemini-code-assist · 2026-04-20T14:17:05Z

+def _get_model_device(model):
+    for tensor in model.parameters():
+        return tensor.device
+    for tensor in model.buffers():
+        return tensor.device
+    return torch.device("cpu")
+pass


The _get_model_device function can be simplified using next() with a generator expression to avoid explicit loops and improve readability.

def _get_model_device(model): return next((t.device for t in model.parameters()), next((t.device for t in model.buffers()), torch.device("cpu"))) pass

gemini-code-assist · 2026-04-20T14:17:05Z

+        try:
+            setattr(config, field, runtime_dtype)
+            success = True
+            continue
+        except Exception:
+            pass
+
+        try:
+            config.__dict__[field] = runtime_dtype
+            success = True
+        except Exception:
+            pass


The nested try-except blocks for setting attributes are redundant. You can simplify this by attempting the standard setattr first and falling back to __dict__ only if it fails, or simply using setattr if the object is a standard Python object.

Suggested change

try:

setattr(config, field, runtime_dtype)

success = True

continue

except Exception:

pass

try:

config.__dict__[field] = runtime_dtype

success = True

except Exception:

pass

for field in target_fields:

try:

setattr(config, field, runtime_dtype)

success = True

break

except Exception:

try:

config.__dict__[field] = runtime_dtype

success = True

break

except Exception:

pass

gemini-code-assist · 2026-04-20T14:17:05Z

+    def _normalize_state_dict_tensor(value):
+        if isinstance(value, torch.nn.Parameter):
+            value = value.detach()
+        if value.is_sparse:
+            value = value.to_dense()
+        return value.contiguous()


The _normalize_state_dict_tensor function can be simplified by chaining the operations directly.

def _normalize_state_dict_tensor(value): if isinstance(value, torch.nn.Parameter): value = value.detach() return value.to_dense().contiguous() if value.is_sparse else value.contiguous()

danielhanchen · 2026-04-20T15:24:47Z

/gemini review

gemini-code-assist

Code Review

This pull request adds support for Gemma 4 and Qwen 3.5 (GDN) architectures, featuring vLLM patches for LoRA and k_eq_v support, a new finalize_huggingface_model utility, and extract_gdn_layers for Gated Delta Net. It also improves dtype configuration and tensor unwrapping across the codebase. Feedback highlights opportunities to improve the robustness of language model detection, ensure safer attribute checks for rotary embeddings, and align with modern PyTorch practices by using .detach() instead of .data.

gemini-code-assist · 2026-04-20T15:27:29Z

+                        device = target_device,
+                        dtype = dtype,
+                    )
+        if hasattr(module, "rotary_pos_emb"):


hasattr(module, "rotary_pos_emb") will return True even if the attribute is set to None. If it is None, the subsequent call to module.rotary_pos_emb.__class__(...) on line 713 will raise a TypeError. It is safer to check if the attribute is not None.

Suggested change

if hasattr(module, "rotary_pos_emb"):

if getattr(module, "rotary_pos_emb", None) is not None:

gemini-code-assist · 2026-04-20T15:27:30Z

+    if original_meta_model is not None:
+        copy_attributes(original_meta_model, new_model)
+
+    language_model = getattr(getattr(new_model, "model", None), "language_model", None)


The detection of language_model here is less robust than the logic used in set_additional_modules (lines 541-549). It may fail to identify the language model component for standard causal LM models or models where language_model is a direct attribute of new_model. Using a more robust detection logic ensures that layer_idx is correctly patched across different model architectures.

Suggested change

language_model = getattr(getattr(new_model, "model", None), "language_model", None)

language_model = getattr(new_model, "language_model", None)

if language_model is None:

language_model = getattr(new_model, "model", None)

if language_model is not None:

language_model = getattr(language_model, "language_model", language_model)

gemini-code-assist · 2026-04-20T15:27:30Z

+    store(f"{prefix}.conv1d.weight", gdn.conv1d.weight.data)
+    store(f"{prefix}.dt_bias", gdn.dt_bias.data)
+    store(f"{prefix}.A_log", gdn.A_log.data)


Using .data is discouraged in modern PyTorch as it can lead to silent bugs by bypassing autograd tracking and safety checks. It is recommended to use .detach() instead when a view of the tensor is needed without gradient tracking.

Suggested change

store(f"{prefix}.conv1d.weight", gdn.conv1d.weight.data)

store(f"{prefix}.dt_bias", gdn.dt_bias.data)

store(f"{prefix}.A_log", gdn.A_log.data)

store(f"{prefix}.conv1d.weight", gdn.conv1d.weight.detach())

store(f"{prefix}.dt_bias", gdn.dt_bias.detach())

store(f"{prefix}.A_log", gdn.A_log.detach())

empty_model.py (patch_gemma4_vllm_lora_support, line 349): Drop the `model=model` keyword from the `lora_manager_cls(...)` call so model is passed only positionally. Blame: 5d07504 "[WIP] gemma 4 dense fast inference" introduced `lora_manager_cls(model = model, *args, **kwargs)`. Upstream vllm.lora.model_manager.create_lora_manager is `(model, max_num_seqs, max_num_batched_tokens, vocab_size, lora_config, vllm_config, device, lora_manager_cls=..., **kwargs)` -- model is positional. When vLLM calls the patched wrapper, `*args` captures the remaining positional arguments. Re-expanding them with a `model=model` keyword raises `TypeError: got multiple values for argument 'model'` because the first positional in `args` (max_num_seqs) also binds to the `model` formal parameter of LoRAModelManager.__init__. Passing model positionally preserves the intended WIP behavior without the double-bind. Not a behavioral deletion -- the call signature is the same, only the argument-passing style changed. vllm_utils.py (assert_same_state_dict._normalize_state_dict_tensor): Add `isinstance(value, torch.Tensor)` early-return guard so non-tensor entries in a state_dict are passed through instead of raising AttributeError on `.is_sparse` / `.contiguous()`. Safe because the outer try/except already treats the call as a lookup and the tensor path below requires a real tensor anyway.

danielhanchen · 2026-04-20T16:27:49Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces support for Gemma 4 and Qwen 3.5 (Gated Delta Net) architectures, featuring vLLM to Hugging Face conversion utilities, LoRA patches, and quantization support. Key additions include the finalize_huggingface_model function for post-conversion cleanup, robust dtype configuration handling, and expanded layer mapping. Feedback identifies a critical issue where skipping v_proj extraction for Gemma 4 k_eq_v layers causes missing keys in the Hugging Face state dict; a suggestion is provided to duplicate the k_proj shard instead. Additionally, the slicing logic in extract_gdn_layers should be validated to prevent potential IndexError when handling different fused projection counts.

gemini-code-assist · 2026-04-20T16:30:25Z

+                if kk not in gemma4_k_eq_v_layers:
+                    get_state_dict(f"{prefix}.v_proj", 2, state_dict, qkv_proj)


For Gemma 4 layers where attention_k_eq_v is enabled, vLLM's qkv_proj typically contains only Q and K shards. However, the Hugging Face model architecture still expects a v_proj weight. Skipping the extraction of v_proj here will result in a missing key in the converted state dictionary, which can cause errors when loading the model in Transformers (e.g., missing keys or uninitialized weights).

Instead of skipping, you should duplicate the k_proj shard (index 1) into v_proj to ensure the Hugging Face model is fully populated and functional.

Suggested change

if kk not in gemma4_k_eq_v_layers:

get_state_dict(f"{prefix}.v_proj", 2, state_dict, qkv_proj)

if kk not in gemma4_k_eq_v_layers:

get_state_dict(f"{prefix}.v_proj", 2, state_dict, qkv_proj)

else:

# For k_eq_v layers, v_proj is a copy of k_proj

get_state_dict(f"{prefix}.v_proj", 1, state_dict, qkv_proj)

gemini-code-assist · 2026-04-20T16:30:26Z

+        qkv_weight = weight[offsets[0]:offsets[3]]
+        z_weight = weight[offsets[3]:offsets[4]]


The slicing logic for qkv_weight and z_weight assumes that offsets has at least 5 elements (derived from 4 output sizes: Q, K, V, and Z). If a model variant has a different number of fused projections, this will raise an IndexError. It would be safer to validate the length of output_sizes or offsets before performing these slices.

empty_model.py extract_gdn_layers (~line 1086): store gdn.norm.weight when the GDN module exposes a norm submodule. Upstream Qwen3_5GatedDeltaNet constructs `self.norm` as Qwen3_5RMSNormGated / FusedRMSNormGated (modeling_qwen3_5.py:391-401) and get_model_layer_config already lists `linear_attn.norm` under layernorms, so the extractor was silently leaving the empty-model placeholder weight in place. Addition only; no existing code removed. empty_model.py get_model_layer_config: add Gemma4 per-layer-input entries that conversion drops. Upstream Gemma4DecoderLayer creates `per_layer_input_gate`, `per_layer_projection`, and `post_per_layer_input_norm` whenever `hidden_size_per_layer_input > 0` (default 256 per configuration_gemma4.py:169). Addition only; no existing entries removed. vllm_utils.py _get_vllm_state_dict per-layer-input extraction (line 1170-1176): reuses the existing `get_state_dict` helper already used by surrounding self_attn / mlp extraction -- no new helper written, no duplicate logic. FA6 self-check: grepped workdir for per_layer_input / _buffers patterns, confirmed get_state_dict is the correct existing helper for Linear extraction, and that no set_additional_modules-style helper for these specific modules already exists. vllm_utils.py convert_vllm_to_huggingface bare-tensor branch (line ~1405): wrap the existing Parameter-assign path in an if/else that first checks `_buffers[attr_name]`. FA3 blame analysis: - line 1401 (the Parameter-wrap call) blames 5d07504 "[WIP] gemma 4 dense fast inference" -- the original code was written to handle Gemma4 layer_scalar as an nn.Parameter, but upstream Gemma4DecoderLayer registers it as a buffer (modeling_gemma4.py:1337: `self.register_buffer("layer_scalar", torch.ones(1))`). The original intent of 5d07504 was to move these tensors onto the model; preserving that intent requires honoring the source module's buffer registration rather than forcing nn.Parameter on all bare tensors. The existing Parameter-wrap path is retained for the non-buffer case and routes through the same exec(...) assignment, so no historical code is deleted -- only a pre-branch was added to select the buffer target when appropriate. - line 1402 (the exec-assign call) blames 2afbcc1 "Fast Inference with vLLM for VLMs (unslothai#202)". This line is moved verbatim into the `else` arm of the new if/else; not deleted. FA4 note: nn.Parameter(...) is still invoked, but only for the non-buffer case; the buffer case uses `_buffers[name] = value`, matching the pattern already used in empty_model.py:706 and 743 inside finalize_huggingface_model. vllm_utils.py convert_vllm_to_huggingface LayerNorm branch (line ~1465): special-case `.conv1d` before the weight-only path. FA3 blame: line 1453 (the "# LayerNorms (including vision norms)" comment) blames 2afbcc1 "Fast Inference with vLLM for VLMs (unslothai#202)". The comment was edited to acknowledge the additional conv1d case; the original LayerNorm weight-only logic is retained for non-conv1d layer_names. Upstream Qwen3_5GatedDeltaNet builds conv1d as a depthwise Conv1d with `kernel_size=linear_conv_kernel_dim` and `groups=conv_dim` (modeling_qwen3_5.py:375-382). The pre-existing LayerNorm-only path only wrote `.weight`, so Conv1d kept placeholder `kernel_size=1 / padding=0 / groups=1` and F.conv1d produced a wrong output length for any `linear_conv_kernel_dim > 1`. Fixing conv1d is the minimum change; the surrounding LayerNorm code is unchanged. rl_replacements.py grpo_accumulated_loss (lines 762-763): fix pre-existing typo `io_same_decice` -> `io_same_device` to match accelerate's AlignDevicesHook attribute (accelerate/hooks.py:266,275, 347). FA3 blame: both lines blame 8ac4171 "GPT OSS RL (unslothai#303)". The original commit introduced the typo, which made the hasattr check always False and the intended hook reset never run. This commit restores the intended behavior and does not remove the guard or the assignment -- only the attribute name is corrected. The PR's newly added `rope_deltas` reset on the immediately following lines places the overlap inside the changed region, which is why the typo is being fixed in this PR rather than in a separate cleanup.

danielhanchen · 2026-04-20T17:10:15Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces support for Gemma4 and Qwen 3.5 (Gated Delta Net) architectures, implementing specialized vLLM patches for LoRA and quantization state management. Key additions include the finalize_huggingface_model function for centralized model post-processing, expanded layer configuration templates for new architectures, and enhanced robustness in dtype configuration and state dict conversion. Review feedback identifies potential TypeError vulnerabilities when re-initializing rotary embeddings if attributes are None, and suggests more robust string parsing for layer names to prevent potential ValueError during model conversion.

gemini-code-assist · 2026-04-20T17:13:46Z

+    vision_config = getattr(config, "vision_config", None)
+
+    for module in new_model.modules():
+        if hasattr(module, "rotary_emb"):


The hasattr check will return True even if module.rotary_emb is None. Accessing module.rotary_emb.__class__ on line 699 when it is None will result in a TypeError because NoneType is not callable. It is safer to check if the attribute is not None.

Suggested change

if hasattr(module, "rotary_emb"):

if getattr(module, "rotary_emb", None) is not None:

gemini-code-assist · 2026-04-20T17:13:46Z

+                        device = target_device,
+                        dtype = dtype,
+                    )
+        if hasattr(module, "rotary_pos_emb"):


Similar to rotary_emb, module.rotary_pos_emb should be checked for None before accessing its __class__ on line 713 to avoid a TypeError.

Suggested change

if hasattr(module, "rotary_pos_emb"):

if getattr(module, "rotary_pos_emb", None) is not None:

gemini-code-assist · 2026-04-20T17:13:46Z

+            assert vision_config is not None, "Unsloth: vision_config is required for models with vision rotary_pos_emb"
+            head_dim = vision_config.hidden_size // vision_config.num_heads
+            module.rotary_pos_emb = module.rotary_pos_emb.__class__(head_dim//2).to(target_device)
+        if hasattr(module, "rotary_emb_local"):


Similar to rotary_emb, module.rotary_emb_local should be checked for None before accessing its __class__ on line 718 to avoid a TypeError.

Suggested change

if hasattr(module, "rotary_emb_local"):

if getattr(module, "rotary_emb_local", None) is not None:

gemini-code-assist · 2026-04-20T17:13:46Z

-                exec(f"new_model.{layer_name_br} = layer")
+                layer_name_br = re.sub(r"\.([\d]{1,})\.", r"[\1].", layer_name)
+                value = _unwrap_tensor(weight)
+                parent_expr, attr_name = layer_name_br.rsplit(".", 1)


If layer_name_br does not contain a dot (e.g., if a top-level attribute is added to layer_names), rsplit(".", 1) will raise a ValueError. While current templates include dots, adding a guard or using a safer way to split would be more robust for future changes.

Summary: four minimal fixes, no logic deleted -- each hunk either adds a missing branch or corrects an index-mismatch while preserving the original author's intent. Blame-cited per line. 1. empty_model.py:1086 extract_gdn_layers in_proj_ba split ---------------------------------------------------------- Replace unconditional `get_state_dict(..., gdn.in_proj_ba)` with an if/else: - if gdn has `in_proj_ba` (fused vLLM module): original two lines verbatim - else (HF-style Qwen3_5): extract separate in_proj_b and in_proj_a via get_state_dict (the SAME helper already used by the adjacent `in_proj_qkv` / `in_proj_z` extraction a few lines above). Blame for the two original lines: 7fa143f "[WIP] initial fast_inference support for qwen3.5". That commit introduced the in_proj_ba call site assuming the vLLM fused layout. Upstream HF Qwen3_5GatedDeltaNet explicitly `del self.in_proj_ba` in transformers modular_qwen3_5.py:196-197 and defines separate in_proj_b / in_proj_a at modeling_qwen3_5.py:419-420. codex4 reproduced AttributeError on the HF path. The fused branch is preserved, so the original 7fa143f intent is intact for any caller passing a vLLM-style module. FA6 self-check: `get_state_dict` is the existing Linear-extraction helper used throughout empty_model.py and vllm_utils.py; no new helper introduced. 2. empty_model.py:934 / 972 merger.linear_fc entries moved -------------------------------------------------------- Blame for the deleted `"model.visual.merger.linear_fc{kk}"` line: 72262f3 "Bug fixes (unslothai#344)". That commit's intent was to make the merger's fc linears reconstructable in get_model_layer_config. The chosen template produced `linear_fc0`, `linear_fc1`, `linear_fc2`, ... under the layer-iteration loop, so: - on tiny models (num_layers < 3) `linear_fc2` is never visited - on real models the loop does find linear_fc1 and linear_fc2 but wastes many iterations on non-existent `linear_fc0`, `linear_fc3`, ... Upstream Qwen3_5VisionPatchMerger (modeling_qwen3_5.py:854-856) and Qwen3VLVisionPatchMerger (modeling_qwen3_vl.py:114-116) have exactly two fc linears at fixed names. I preserve the 72262f3 intent by ADDING two explicit entries `model.visual.merger.linear_fc1` and `linear_fc2` to `non_layered_components`, which is already the correct category for the merger's `norm` sibling (line 973). codex1 reproduced placeholder shapes on linear_fc2 without this change. 3. empty_model.py:346 patched_create_lora_manager decorator --------------------------------------------------------- Add `@functools.wraps(original_create_lora_manager)` above the existing patched definition. Blame for the patched block: 5d07504 "[WIP] gemma 4 dense fast inference". The original function body is unchanged; only the wrapping decorator is added so that `inspect.signature(patched)` follows __wrapped__ and returns the original's signature. Without it, unsloth_zoo/vllm_lora_worker_manager.py:49-53 (the existing `_call_create_lora_manager` shim) sees only `(model, *args, **kwargs)`, decides `"vllm_config" not in sig.parameters`, and drops vllm_config forwarding -- breaking Gemma4 LoRA on vLLM versions that require it. FA6 self-check: `functools.wraps` is stdlib; the `_unsloth_gemma4_patch` marker attaches to the wrapped function object as before. 4. vllm_utils.py:1469 drop dead `.conv1d.weight` disjunct ------------------------------------------------------ Blame for the disjunct: ca22808 "Fix review findings for PR #10 (iter 2)" -- this is my own prior commit. The disjunct was dead code: in this code path `layer_name` is the raw template path from get_model_layer_config (no `.weight` suffix). Evidence: the `if layer_name in quant_state_dict: ... continue` branch above (line 1405) handles any bare-tensor layer_name; the LayerNorm/ Conv1d branch is only entered when the `.weight`-suffixed lookup succeeded. Removing only dead code; correct behavior of the `.conv1d` special-case (added in ca22808) is unchanged. FA6 global note: no new helpers were introduced across these four fixes. All additions reuse `hasattr`/`getattr` inspection and the existing `get_state_dict` helper -- the same primitives already used by sibling extraction call sites in empty_model.py and vllm_utils.py.

danielhanchen · 2026-04-20T17:46:29Z

/gemini review

gemini-code-assist

Code Review

This pull request implements support for Gemma 4 and Qwen 3.5 (Gated Delta Net) architectures, including vLLM integration, LoRA patching, and conversion utilities. Key changes include the addition of finalize_huggingface_model for model post-processing, extract_gdn_layers for linear attention support, and more robust dtype configuration handling. Review feedback highlights opportunities to improve code quality by using isinstance for type checking, adopting more idiomatic Python patterns like next() for device detection, and simplifying conditional checks. Additionally, it is recommended to remove redundant pass statements and avoid the use of eval() for security and maintainability.

gemini-code-assist · 2026-04-20T17:54:06Z

+        def patched_supports_lora(model):
+            if model.__class__.__name__ == "Gemma4ForConditionalGeneration":
+                return True
+            return original_supports_lora(model)


Using isinstance is more robust than comparing class names as strings. It correctly handles subclassing and is generally considered better practice.

Suggested change

def patched_supports_lora(model):

if model.__class__.__name__ == "Gemma4ForConditionalGeneration":

return True

return original_supports_lora(model)

def patched_supports_lora(model):

if isinstance(model, Gemma4ForConditionalGeneration):

return True

return original_supports_lora(model)

gemini-code-assist · 2026-04-20T17:54:07Z

+        def patched_create_lora_manager(model, *args, **kwargs):
+            if model.__class__.__name__ == "Gemma4ForConditionalGeneration":
+                lora_manager_cls = kwargs.pop("lora_manager_cls", vllm_lora_model_manager.LoRAModelManager)
+                return lora_manager_cls(model, *args, **kwargs)
+            return original_create_lora_manager(model, *args, **kwargs)


Similar to the patched_supports_lora function, using isinstance here is more robust than comparing class names as strings.

Suggested change

def patched_create_lora_manager(model, *args, **kwargs):

if model.__class__.__name__ == "Gemma4ForConditionalGeneration":

lora_manager_cls = kwargs.pop("lora_manager_cls", vllm_lora_model_manager.LoRAModelManager)

return lora_manager_cls(model, *args, **kwargs)

return original_create_lora_manager(model, *args, **kwargs)

def patched_create_lora_manager(model, *args, **kwargs):

if isinstance(model, Gemma4ForConditionalGeneration):

lora_manager_cls = kwargs.pop("lora_manager_cls", vllm_lora_model_manager.LoRAModelManager)

return lora_manager_cls(model, *args, **kwargs)

return original_create_lora_manager(model, *args, **kwargs)

gemini-code-assist · 2026-04-20T17:54:07Z

+                if layer_name.endswith(".conv1d"):
+                    target = eval(f"new_model.{layer_name_br}")
+                    w = _unwrap_tensor(weight)
+                    out_channels = w.shape[0]
+                    kernel_size = w.shape[-1]
+                    target.out_channels = out_channels
+                    target.in_channels = out_channels
+                    target.groups = out_channels
+                    target.kernel_size = (kernel_size,)
+                    target.padding = (kernel_size - 1,)
+                    target.weight = weight_param
+                    if bias is not None:
+                        target.bias = bias
+                    continue


The use of eval() can be a security risk if the string being evaluated can be influenced by external input. While layer_name_br is constructed from internal data here, it's a good practice to avoid eval() for maintainability and security. Consider refactoring to use a safer method for accessing nested attributes, even though it might be more complex with attribute names containing [].

gemini-code-assist · 2026-04-20T17:54:07Z

+def _get_model_device(model):
+    for tensor in model.parameters():
+        return tensor.device
+    for tensor in model.buffers():
+        return tensor.device
+    return torch.device("cpu")
+pass


For conciseness, you can use next with a generator expression to find the model's device. This avoids explicit loops and is a more idiomatic way to achieve this in Python.

Suggested change

def _get_model_device(model):

for tensor in model.parameters():

return tensor.device

for tensor in model.buffers():

return tensor.device

return torch.device("cpu")

pass

def _get_model_device(model):

try:

return next(model.parameters()).device

except StopIteration:

try:

return next(model.buffers()).device

except StopIteration:

return torch.device("cpu")

gemini-code-assist · 2026-04-20T17:54:07Z

+    for tensor in model.buffers():
+        return tensor.device
+    return torch.device("cpu")
+pass


The pass statement here is redundant and can be removed for cleaner code.

gemini-code-assist · 2026-04-20T17:54:07Z

+            )
+            del local_rope_config
+
+    if (quantization_config or {}) == {} and bnb_config is None:


This condition can be simplified. Checking for not quantization_config is more Pythonic for checking if a dictionary is empty or None.

Suggested change

if (quantization_config or {}) == {} and bnb_config is None:

if not quantization_config and bnb_config is None:

gemini-code-assist · 2026-04-20T17:54:07Z

@@ -1063,6 +1063,12 @@ def get_state_dict(prefix, kk, state_dict, proj, slice_weights=True, slice_index
            quant_state_dict[prefix + ".bias"] = bias_tensor
    pass


The pass statement here is redundant and can be removed for cleaner code.

gemini-code-assist Bot reviewed Apr 20, 2026

View reviewed changes

danielhanchen force-pushed the pr-603-head branch from 4b80cac to b7052b5 Compare April 20, 2026 15:21

Merge remote-tracking branch 'origin/main' into pr-603-head

20e42fe

gemini-code-assist Bot reviewed Apr 20, 2026

View reviewed changes

danielhanchen force-pushed the main branch from e8e8517 to e0f7e60 Compare June 11, 2026 13:31

	if hasattr(module, "rotary_pos_emb"):
	if getattr(module, "rotary_pos_emb", None) is not None:

-    language_model = getattr(getattr(new_model, "model", None), "language_model", None)
+    language_model = getattr(new_model, "language_model", None)
+    if language_model is None:
+        language_model = getattr(new_model, "model", None)
+        if language_model is not None:
+            language_model = getattr(language_model, "language_model", language_model)

		if kk not in gemma4_k_eq_v_layers:
		get_state_dict(f"{prefix}.v_proj", 2, state_dict, qkv_proj)

		qkv_weight = weight[offsets[0]:offsets[3]]
		z_weight = weight[offsets[3]:offsets[4]]

	if hasattr(module, "rotary_emb"):
	if getattr(module, "rotary_emb", None) is not None:

	if hasattr(module, "rotary_emb_local"):
	if getattr(module, "rotary_emb_local", None) is not None:

	if (quantization_config or {}) == {} and bnb_config is None:
	if not quantization_config and bnb_config is None:

		@@ -1063,6 +1063,12 @@ def get_state_dict(prefix, kk, state_dict, proj, slice_weights=True, slice_index
		quant_state_dict[prefix + ".bias"] = bias_tensor
		pass

Conversation

danielhanchen commented Apr 20, 2026

Staging mirror of unslothai#603

Original description

Summary

Why

Branch contents

Note

Uh oh!

danielhanchen commented Apr 20, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

danielhanchen commented Apr 20, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

danielhanchen commented Apr 20, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

danielhanchen commented Apr 20, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

danielhanchen commented Apr 20, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Apr 20, 2026

Choose a reason for hiding this comment