[Qwen 3.5][gemma4] Qwen35 and Gemma 4 fast inference by Datta0 · Pull Request #588 · unslothai/unsloth-zoo

Datta0 · 2026-04-10T08:23:09Z

fast inference support for Qwen 3.5 with vLLM :)
Tried to make sure the changes are minimal so when we detect linear_attn, we hand it off to a separate function

Signed-off-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>

gemini-code-assist

Code Review

This pull request adds support for Qwen 3.5 and Gated Delta Net (GDN) architectures, including new configuration attributes, layer extraction logic for GDN, and enhanced vLLM-to-Hugging Face conversion. Key changes include the implementation of extract_gdn_layers, improved lm_head detection, and clearing rope_deltas during RL training. Review feedback suggests using .detach() instead of .data for safer tensor extraction and notes a semantic naming issue where conv1d was added to a list of normalization layers.

gemini-code-assist · 2026-04-10T08:25:30Z

+    conv_w = gdn.conv1d.weight.data
+    state_dict[f"{prefix}.conv1d.weight"] = conv_w
+    quant_state_dict[f"{prefix}.conv1d.weight"] = conv_w
+
+    state_dict[f"{prefix}.dt_bias"] = gdn.dt_bias.data
+    quant_state_dict[f"{prefix}.dt_bias"] = gdn.dt_bias.data
+    state_dict[f"{prefix}.A_log"] = gdn.A_log.data
+    quant_state_dict[f"{prefix}.A_log"] = gdn.A_log.data


Using .data is generally discouraged in modern PyTorch as it can bypass safety checks and lead to unexpected behavior. It is recommended to use .detach() instead to obtain a tensor that does not track gradients, which is sufficient for state dict extraction.

Suggested change

conv_w = gdn.conv1d.weight.data

state_dict[f"{prefix}.conv1d.weight"] = conv_w

quant_state_dict[f"{prefix}.conv1d.weight"] = conv_w

state_dict[f"{prefix}.dt_bias"] = gdn.dt_bias.data

quant_state_dict[f"{prefix}.dt_bias"] = gdn.dt_bias.data

state_dict[f"{prefix}.A_log"] = gdn.A_log.data

quant_state_dict[f"{prefix}.A_log"] = gdn.A_log.data

conv_w = gdn.conv1d.weight.detach()

state_dict[f"{prefix}.conv1d.weight"] = conv_w

quant_state_dict[f"{prefix}.conv1d.weight"] = conv_w

state_dict[f"{prefix}.dt_bias"] = gdn.dt_bias.detach()

quant_state_dict[f"{prefix}.dt_bias"] = gdn.dt_bias.detach()

state_dict[f"{prefix}.A_log"] = gdn.A_log.detach()

quant_state_dict[f"{prefix}.A_log"] = gdn.A_log.detach()

gemini-code-assist · 2026-04-10T08:25:30Z

        "norm1",              # Qwen2.5-VL vision encoder
        "norm2",              # Qwen2.5-VL vision encoder
        "norm",
+        "conv1d",


Adding "conv1d" to the layernorm_names list is semantically incorrect as it is a convolutional layer, not a normalization layer. While the current implementation reuses the direct parameter assignment logic in the else block (line 1421) which works for conv1d.weight, this naming is misleading and could lead to maintenance issues. Consider renaming the list to something more generic like simple_parameter_modules or adding a comment to explain this exception.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b7052b55a5

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-17T10:49:54Z

+        qkv_weight = weight[offsets[0]:offsets[3]]
+        z_weight = weight[offsets[3]:offsets[4]]
+        store(f"{prefix}.in_proj_qkv.weight", qkv_weight)
+        store(f"{prefix}.in_proj_z.weight", z_weight)


Preserve quant states when splitting GDN fused qkvz

When linear_attn.in_proj_qkvz is present, this path slices proj.weight into in_proj_qkv/in_proj_z and stores only raw weights, but it never carries over per-shard bnb_quant_state. For 4-bit/BnB checkpoints this means these layers lose their quant metadata, and convert_vllm_to_huggingface will no longer take the Linear4bit branch for them, producing incorrect dense modules (often with uint8-backed weights) and broken inference/training numerics on Qwen3.5 GDN models.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-04-17T10:49:54Z

+    if hasattr(
+        BitsAndBytesModelLoader._stack_quantization_states,
+        "_unsloth_gemma4_k_eq_v_patch",


Guard Gemma4 k_eq_v patch for missing loader method

This accesses BitsAndBytesModelLoader._stack_quantization_states directly before verifying the attribute exists. Because load_vllm now calls this patch unconditionally for Gemma4 vision models, environments with a different vLLM loader API will raise AttributeError during model load (even when not using 4-bit quantization). Add a defensive getattr/version gate and no-op when the method is unavailable.

Useful? React with 👍 / 👎.

danielhanchen · 2026-04-19T10:33:04Z

Possible duplicate of a trusted maintainer's PR. This PR looks like it solves the same underlying problem as unslothai/unsloth-zoo#580 by @danielhanchen (trusted maintainer).

Directly patches Gemma-4 temporary inference code in the same gemma4.py path and fixes the same runtime failure class.

Canonical PR summary: This PR fixes Gemma-4 inference and generation crashes by patching the temporary compatibility layer so zero num_kv_shared_layers builds caches correctly and KV-sharing models still work with use_cache=False. It does this with a config proxy and cache/forward wrappers that inject and then strip temporary caches in unsloth_zoo/temporary_patches/gemma4.py.

The auto-review is still running against this PR — reviewers will factor in the canonical above. If this PR is genuinely different, call out the delta in the review discussion so the maintainer can decide which to merge.

…e_huggingface_model - patch_gemma4_vllm_lora_support: use functools.wraps on patched_create_lora_manager so _call_create_lora_manager's signature inspection still sees vllm_config; pass model positionally to lora_manager_cls to avoid "multiple values for 'model'". - patch_gemma4_vllm_k_eq_v_support: also handle split k_proj/v_proj layout (current upstream Gemma4) by duplicating k quant-state to synthetic v entry; keep packed qkv_proj path as fallback. - load_vllm: gate Gemma4 patches on enable_lora / use_bitsandbytes (not is_vision_model), so text-only Gemma4 + LoRA / BnB also works. - extract_gdn_layers: derive qkvz offsets from gdn.key_dim/value_dim when ColumnParallelLinear has no output_sizes; manually split in_proj_ba into b/a instead of calling get_state_dict with kk=1 (IndexError); preserve BnB quant_state sidecars; handle FP8 weight_scale (not only weight_scale_inv) and dynamic/row-wise FP8; export linear_attn.norm.weight. - finalize_huggingface_model: fix layer_idx for standard causal LMs (not only VLM path); rebuild Gemma4 vision rotary_emb from vision_config with fp32 buffers; guard rotary_pos_emb on vision_config availability; mirror language_model detection from set_additional_modules. - get_model_layer_config: register Gemma4 per_layer_input_gate / per_layer_projection / post_per_layer_input_norm; add Qwen3.5 visual.merger.linear_fc1 / linear_fc2 and drop the broken linear_fc{kk} template. - set_dtype_in_config (hf_utils): prefer the modern 'dtype' field; fall back to 'torch_dtype' only when 'dtype' is absent, avoiding the deprecation warning on current transformers. - vllm_utils state-dict loop: skip layer.mlp extraction for linear-attn-only layers (defensive) while still capturing layer_scalar. - _normalize_state_dict_tensor: guard is_sparse behind isinstance(value, torch.Tensor) so non-tensor state-dict values pass through.

…n, finalize_huggingface_model - patch_gemma4_vllm_lora_support: use functools.wraps on patched_create_lora_manager so _call_create_lora_manager's signature inspection still sees vllm_config; pass model positionally to lora_manager_cls to avoid "multiple values for 'model'". - patch_gemma4_vllm_k_eq_v_support: also handle split k_proj/v_proj layout (current upstream Gemma4) by duplicating k quant-state to synthetic v entry; keep packed qkv_proj path as fallback. - load_vllm: gate Gemma4 patches on enable_lora / use_bitsandbytes (not is_vision_model), so text-only Gemma4 + LoRA / BnB also works. - extract_gdn_layers: derive qkvz offsets from gdn.key_dim/value_dim when ColumnParallelLinear has no output_sizes; manually split in_proj_ba into b/a instead of calling get_state_dict with kk=1 (IndexError); preserve BnB quant_state sidecars; handle FP8 weight_scale (not only weight_scale_inv) and dynamic/row-wise FP8; export linear_attn.norm.weight. - finalize_huggingface_model: fix layer_idx for standard causal LMs (not only VLM path); rebuild Gemma4 vision rotary_emb from vision_config with fp32 buffers; guard rotary_pos_emb on vision_config availability; mirror language_model detection from set_additional_modules. - get_model_layer_config: register Gemma4 per_layer_input_gate / per_layer_projection / post_per_layer_input_norm; add Qwen3.5 visual.merger.linear_fc1 / linear_fc2 and drop the broken linear_fc{kk} template. - set_dtype_in_config (hf_utils): prefer the modern 'dtype' field; fall back to 'torch_dtype' only when 'dtype' is absent, avoiding the deprecation warning on current transformers. - vllm_utils state-dict loop: skip layer.mlp extraction for linear-attn-only layers (defensive) while still capturing layer_scalar. - _normalize_state_dict_tensor: guard is_sparse behind isinstance(value, torch.Tensor) so non-tensor state-dict values pass through.

danielhanchen · 2026-04-19T11:18:10Z

Auto-review verdict: Approved

PR #588 adds vLLM-to-HF fast inference support for Qwen3.5 (Gated Delta Net linear attention) and Gemma4 VLMs (LoRA + k_eq_v BnB patches), plus consolidates post-conversion rotary/device setup into a new finalize_huggingface_model helper. It meaningfully extends Unsloth's model coverage for two important next-gen architectures with correct conversion behavior after the review fixes.

Reason: All 20 accepted findings fixed on pr-588-head (commit e4f530c); 14 consolidated tests pass on fixed code and catch the pre-fix regressions.

# Conflicts: # unsloth_zoo/empty_model.py # unsloth_zoo/vllm_utils.py

danielhanchen · 2026-04-20T01:36:10Z

Auto-review verdict: Approved

PR adds vLLM-to-HF conversion support for Qwen 3.5 Gated Delta Net and Gemma4 (LoRA, k_eq_v BnB, per-layer inputs) plus a unified finalize_huggingface_model helper; it unblocks fast inference for both new model families and, with the applied fixes to rotary handling, GDN dequantize slicing, and lm_head lookup, is correct and ready to merge.

Reason: All real review findings were fixed; no unfixed regressions remain and 43/43 tests pass.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 086aa52c82

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-20T01:46:59Z

+            "model.visual.merger.linear_fc1",
+            "model.visual.merger.linear_fc2",


Keep merger FCs on quant-aware conversion path

Moving model.visual.merger.linear_fc1/linear_fc2 into non_layered_components routes them through set_additional_modules, which assigns raw tensors as nn.Parameter and never consults *.weight.quant_state (see the generic assignment path in set_additional_modules). For prequantized 4-bit checkpoints, these FCs are therefore rebuilt as plain nn.Linear with packed uint8 weights instead of Linear4bit, which can trigger dtype/runtime errors or silently wrong numerics in Qwen3-VL conversion.

Useful? React with 👍 / 👎.

danielhanchen · 2026-04-20T03:07:56Z

Possible duplicate of a trusted maintainer's PR. This PR looks like it solves the same underlying problem as unslothai/unsloth-zoo#580 by @danielhanchen (trusted maintainer).

Gemma4 inference crash fix on the same runtime path; target adds Gemma4 conversion and finalization support.

Canonical PR summary: This PR fixes Gemma-4 inference and generation crashes by patching the temporary compatibility layer so zero num_kv_shared_layers builds caches correctly and KV-sharing models still work with use_cache=False. It does this with a config proxy and cache/forward wrappers that inject and then strip temporary caches in unsloth_zoo/temporary_patches/gemma4.py.

Other related PRs:

unslothai/unsloth-zoo#531 — @danielhanchen [trusted] — Direct overlap on vLLM-to-HF state extraction and empty-model reconstruction for unsupported architectures.

The auto-review is still running against this PR — reviewers will factor in the canonical above. If this PR is genuinely different, call out the delta in the review discussion so the maintainer can decide which to merge.

- vllm_utils.py: convert_vllm_to_huggingface parameter-list path regex now handles trailing-digit segments (e.g. `embed_tokens_per_layer.0`) via the same anchor-or-end pattern used below for `exec` assignment. - empty_model.py: finalize_huggingface_model rotary reinit no longer swallows failures silently; the float32 buffer lift is skipped when the reinit raised, and the exception is logged so wrong-shape rotary state does not propagate. - empty_model.py: patch_gemma4_vllm_lora_support guards both the `gemma4_mm` and `gemma4` imports, does not clobber a pre-existing `embedding_modules` registry, and delegates to the original `create_lora_manager` so vLLM shim kwargs (e.g. `vllm_config`) reach the manager constructor correctly. - empty_model.py: patch_gemma4_vllm_k_eq_v_support uses getattr for `_stack_quantization_states` so absent / renamed private attrs do not crash, and adds `model.language_model` to the k/v name-prefix search so HF-style Gemma4 multimodal parameters match. - empty_model.py + vllm_utils.py: both Gemma4 gates share a new `_is_gemma4_config` helper that accepts `gemma4` and the text-only `gemma4_text` model_type. - vllm_utils.py: Gemma4 patch block runs after the BnB autodetect so that `-bnb-4bit` / `quant_method == bitsandbytes` models trigger the k_eq_v patch even when the caller did not pass use_bitsandbytes=True.

…in set_additional_modules

Adds 12 regression tests covering the iter-1 hardening (trailing-digit regex path, rotary reinit success guard, _is_gemma4_config helper, gemma4 gate migration, gemma4_mm import guard, private loader attr guard, HF-style k_eq_v prefix, lora manager delegation, behavioral no-op tests that stub missing vLLM modules). Updates test_gemma4_lora_patch_preserves_signature_for_inspect and test_gemma4_k_eq_v_set_hoists_constant_check to match the new source shape.

# Conflicts: # unsloth_zoo/empty_model.py # unsloth_zoo/vllm_utils.py

danielhanchen · 2026-04-20T04:14:28Z

Auto-review verdict: Approved

Adds vLLM <-> HF fast-inference support for Qwen 3.5 (Gated Delta Net linear-attention extraction) and Gemma 4 (per-layer input modules, k_eq_v attention, LoRA patch), plus a factored-out finalize_huggingface_model that fixes layer_idx, rotary reinit, and dtype propagation. Useful because it unlocks conversion/inference for two newly-supported architectures while the refactor cleans up rotary / dtype / lm_head paths that were already fragile.

Reason: All 9 accepted review findings fixed and verified by 55/55 passing tests; merge-back landed cleanly on upstream.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ba822fcb87

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-20T04:24:55Z

+            rotary_emb = getattr(module, "rotary_emb", None)
+            if rotary_emb is None:
+                continue
+            for buffer_name, buffer in list(rotary_emb._buffers.items()):
+                if torch.is_tensor(buffer) and buffer.is_floating_point():


Keep rotary_emb_local buffers in float32 after final cast

finalize_huggingface_model reinitializes rotary_emb_local earlier, but in the final non-quantized path it calls new_model.to(..., dtype=dtype) and then only restores float32 buffers for rotary_emb. Models that expose rotary_emb_local (local/global RoPE split variants) will therefore have local RoPE buffers downcast to bf16/fp16, which regresses RoPE precision versus the pre-cast state and can subtly change generation quality.

Useful? React with 👍 / 👎.

danielhanchen · 2026-04-20T08:38:57Z

Auto-review verdict: Approved

Adds vLLM fast-inference support for Qwen3.5 (Gated Delta Net linear-attention extraction/rebuild) and Gemma4 (k_eq_v attention, per-layer input modules, LoRA + BnB vLLM patches), plus a centralized finalize_huggingface_model for dtype/rotary/layer_idx cleanup. Review iterations added defensive GDN quant-state handling, restored in_proj_z emission on the multi-quant path, routed merger.linear_fc1/fc2 through the quantized loop, and extended fp32 lift to rotary_emb_local and rotary_pos_emb.

Reason: All real issues found during review were fixed and verified by tests; no remaining regressions.

Datta0 · 2026-04-21T04:04:37Z

danielhanchen#5
Still have unaddressed comments from here

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 93f1143594

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-21T07:12:12Z

+                if kk not in gemma4_k_eq_v_layers and kk not in gemma4_kv_shared_layers:
+                    get_state_dict(f"{prefix}.v_proj", 2, state_dict, qkv_proj)


Extract v_proj weights for Gemma4 k_eq_v layers

Do not skip v_proj extraction for attention_k_eq_v layers here: Gemma4’s split-attention layout still exposes v_proj weights in many builds, and this guard drops them from quant_state_dict. During convert_vllm_to_huggingface, missing ...self_attn.v_proj.weight entries are silently skipped, so those layers keep the tiny empty-model placeholders instead of real weights, which corrupts converted Gemma4 checkpoints (especially full-attention layers with attention_k_eq_v=True).

Useful? React with 👍 / 👎.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1680006be3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-21T08:42:58Z

+    if _is_gemma4_config(config) and getattr(text_config, "attention_k_eq_v", False):
+        gemma4_k_eq_v_layers = {
+            kk
+            for kk, layer_type in enumerate(getattr(text_config, "layer_types", ()))
+            if layer_type == "full_attention"


Guard Gemma4 layer_types iteration against None

When attention_k_eq_v is enabled, this comprehension iterates text_config.layer_types directly; on Gemma4 configs where layer_types is present but set to None (the documented optional/default form), enumerate(None) raises TypeError and aborts state-dict extraction. This makes conversion fail before any weights are exported for those checkpoints, so this path should coerce None to an empty/default iterable (for example getattr(..., None) or ()).

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-05-21T08:42:58Z

+
+        param_names = set(name for name, _ in model.named_parameters())
+        pairs = []
+        for layer_idx, layer_type in enumerate(getattr(text_config, "layer_types", ())):


Handle None layer_types in Gemma4 k_eq_v loader patch

The Gemma4 bitsandbytes patch assumes text_config.layer_types is iterable whenever attention_k_eq_v is true; if that field is None, this loop throws TypeError during vLLM quant-state stacking, causing Gemma4 4-bit loading to fail before model initialization completes. The loop should defensively treat a None value as empty/default instead of iterating it directly.

Useful? React with 👍 / 👎.

Datta0 added 4 commits March 30, 2026 13:41

[WIP] initial fast_inference support for qwen3.5

7fa143f

[WIP] fixes for rope deltas

31c1bc3

cleanup

112f1b5

fix lm_head detection and remove moe

a31dee8

Signed-off-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>

gemini-code-assist Bot reviewed Apr 10, 2026

View reviewed changes

[WIP] gemma 4 dense fast inference

5d07504

Datta0 mentioned this pull request Apr 14, 2026

Gemma 4 31B GRPO: chunked_hidden_states_selective_log_softmax workspace OOM on 32 GB VRAM unslothai/unsloth#4985

Closed

Datta0 changed the title ~~[Qwen 3.5] Qwen35 fast inference~~ [Qwen 3.5][gemma4] Qwen35 and Gemma 4 fast inference Apr 15, 2026

Datta0 added 4 commits April 17, 2026 06:48

Fix dtype setting

a85a4f4

Merge remote-tracking branch 'origin/main' into qwen35-shared-final

d2f7983

Fix gemma4 load on vllm 0.19.0

43ed24f

fix bnb loader for gemam4

b7052b5

Datta0 marked this pull request as ready for review April 17, 2026 10:44

chatgpt-codex-connector Bot reviewed Apr 17, 2026

View reviewed changes

danielhanchen mentioned this pull request Apr 19, 2026

[Qwen 3.5][gemma4] Qwen35 and Gemma 4 fast inference danielhanchen/unsloth-zoo-staging-1#5

Open

danielhanchen added the auto-reviewing Auto-review in progress label Apr 19, 2026

danielhanchen added a commit to danielhanchen/unsloth-zoo-staging-1 that referenced this pull request Apr 19, 2026

Add review tests for PR unslothai#588

b0ef09e

danielhanchen mentioned this pull request Apr 19, 2026

[Qwen 3.5][gemma4] Qwen35 and Gemma 4 fast inference shimmyshimmer/unsloth-zoo-staging-2#3

Closed

danielhanchen added the auto-has-duplicate Pre-flight: similar to a trusted maintainer's PR label Apr 19, 2026

danielhanchen added 3 commits April 19, 2026 11:07

Add review tests

68a02c3

danielhanchen added a commit to shimmyshimmer/unsloth-zoo-staging-2 that referenced this pull request Apr 19, 2026

Add review tests for PR unslothai#588

a077b67

danielhanchen added 2 commits April 19, 2026 11:15

Consolidate review tests into test_vllm_to_hf_conversion.py

cebdaf3

Merge remote-tracking branch 'staging/pr-588-tests' into pr-588-head

6b97a7c

danielhanchen mentioned this pull request Apr 19, 2026

[tests] Review tests for PR #588 shimmyshimmer/unsloth-zoo-staging-2#4

Closed

danielhanchen removed the auto-reviewing Auto-review in progress label Apr 19, 2026

danielhanchen added the auto-approved Auto-review approved the PR label Apr 19, 2026

Merge remote-tracking branch 'staging/pr-588-tests' into pr-588-head

086aa52

# Conflicts: # unsloth_zoo/empty_model.py # unsloth_zoo/vllm_utils.py

danielhanchen removed the auto-reviewing Auto-review in progress label Apr 20, 2026

chatgpt-codex-connector Bot reviewed Apr 20, 2026

View reviewed changes

Split: keep only 1 file(s)

3584fd0

danielhanchen added the auto-reviewing Auto-review in progress label Apr 20, 2026

danielhanchen added 5 commits April 20, 2026 03:40

Drop redundant section-header comments and stale commented-out block …

4a8dc86

…in set_additional_modules

Rephrase upstream issue reference to avoid bare-hash scan trigger

6ce4a0a

Merge remote-tracking branch 'staging/pr-588-tests' into pr-588-head

ba822fc

# Conflicts: # unsloth_zoo/empty_model.py # unsloth_zoo/vllm_utils.py

danielhanchen mentioned this pull request Apr 20, 2026

[tests] Review tests for PR #588 danielhanchen/unsloth-zoo-staging-1#6

Closed

danielhanchen removed the auto-reviewing Auto-review in progress label Apr 20, 2026

chatgpt-codex-connector Bot reviewed Apr 20, 2026

View reviewed changes

This was referenced Apr 20, 2026

[Qwen 3.5][gemma4] Qwen35 and Gemma 4 fast inference danielhanchen/unsloth-zoo-staging-1#8

Open

[tests] [Qwen 3.5][gemma4] Qwen35 and Gemma 4 fast inference danielhanchen/unsloth-zoo-staging-1#9

Open

danielhanchen added auto-reviewing Auto-review in progress and removed auto-reviewing Auto-review in progress labels Apr 20, 2026

Fix dense vLLM state dict parity

93f1143

chatgpt-codex-connector Bot reviewed May 21, 2026

View reviewed changes

Make exact vLLM state dict test cover dense variants

1680006

chatgpt-codex-connector Bot reviewed May 21, 2026

View reviewed changes

danielhanchen mentioned this pull request May 22, 2026

PR-588 cross-platform CI sandbox (Ubuntu / macOS / Windows) danielhanchen/unsloth-staging-2#140

Open

4 tasks

		"model.visual.merger.linear_fc1",
		"model.visual.merger.linear_fc2",

		if kk not in gemma4_k_eq_v_layers and kk not in gemma4_kv_shared_layers:
		get_state_dict(f"{prefix}.v_proj", 2, state_dict, qkv_proj)

Conversation

Datta0 commented Apr 10, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

danielhanchen commented Apr 19, 2026

Uh oh!

danielhanchen commented Apr 19, 2026

Uh oh!

danielhanchen commented Apr 20, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

danielhanchen commented Apr 20, 2026

Uh oh!

danielhanchen commented Apr 20, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

danielhanchen commented Apr 20, 2026

Uh oh!

Datta0 commented Apr 21, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants