[tests] Review tests for PR #588 by danielhanchen · Pull Request #6 · danielhanchen/unsloth-zoo-staging

danielhanchen · 2026-04-19T09:19:53Z

Automated test files from review process

Signed-off-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>

…e_huggingface_model
- patch_gemma4_vllm_lora_support: use functools.wraps on patched_create_lora_manager so _call_create_lora_manager's signature inspection still sees vllm_config; pass model positionally to lora_manager_cls to avoid "multiple values for 'model'".
- patch_gemma4_vllm_k_eq_v_support: also handle split k_proj/v_proj layout (current upstream Gemma4) by duplicating k quant-state to synthetic v entry; keep packed qkv_proj path as fallback.
- load_vllm: gate Gemma4 patches on enable_lora / use_bitsandbytes (not is_vision_model), so text-only Gemma4 + LoRA / BnB also works.
- extract_gdn_layers: derive qkvz offsets from gdn.key_dim/value_dim when ColumnParallelLinear has no output_sizes; manually split in_proj_ba into b/a instead of calling get_state_dict with kk=1 (IndexError); preserve BnB quant_state sidecars; handle FP8 weight_scale (not only weight_scale_inv) and dynamic/row-wise FP8; export linear_attn.norm.weight.
- finalize_huggingface_model: fix layer_idx for standard causal LMs (not only VLM path); rebuild Gemma4 vision rotary_emb from vision_config with fp32 buffers; guard rotary_pos_emb on vision_config availability; mirror language_model detection from set_additional_modules.
- get_model_layer_config: register Gemma4 per_layer_input_gate / per_layer_projection / post_per_layer_input_norm; add Qwen3.5 visual.merger.linear_fc1 / linear_fc2 and drop the broken linear_fc{kk} template.
- set_dtype_in_config (hf_utils): prefer the modern 'dtype' field; fall back to 'torch_dtype' only when 'dtype' is absent, avoiding the deprecation warning on current transformers.
- vllm_utils state-dict loop: skip layer.mlp extraction for linear-attn-only layers (defensive) while still capturing layer_scalar.
- _normalize_state_dict_tensor: guard is_sparse behind isinstance(value, torch.Tensor) so non-tensor state-dict values pass through.
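The functools.wraps point in the patch_gemma4_vllm_lora_support item can be illustrated with a small standard-library sketch. The names here (create_lora_manager, accepts_vllm_config, the two patch helpers) are hypothetical stand-ins and not the vLLM or unsloth_zoo code; the sketch only shows why signature inspection stops seeing vllm_config on a bare `*args, **kwargs` wrapper and sees it again once functools.wraps is applied.

```python
import functools
import inspect


def create_lora_manager(model, vllm_config=None):
    """Hypothetical stand-in for the function being patched."""
    return ("manager", model, vllm_config)


def patch_without_wraps(fn):
    # The wrapper exposes only (*args, **kwargs) to inspect.signature.
    def patched(*args, **kwargs):
        return fn(*args, **kwargs)
    return patched


def patch_with_wraps(fn):
    # functools.wraps sets __wrapped__, which inspect.signature follows
    # by default, so the original parameter names stay visible.
    @functools.wraps(fn)
    def patched(*args, **kwargs):
        return fn(*args, **kwargs)
    return patched


def accepts_vllm_config(fn):
    """Mimic a caller that checks the signature before passing vllm_config."""
    return "vllm_config" in inspect.signature(fn).parameters


plain = patch_without_wraps(create_lora_manager)
wrapped = patch_with_wraps(create_lora_manager)

print(accepts_vllm_config(plain))    # the unwrapped patch hides the parameter
print(accepts_vllm_config(wrapped))  # the wrapped patch keeps it visible
```

Both wrappers still forward calls unchanged; only what `inspect.signature` reports differs.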


- hf_utils.set_dtype_in_config: store string (JSON-safe, keeps string comparisons in patch_model_and_tokenizer working); fix fallback else-branch that had the HAS_TORCH_DTYPE field selection inverted.
- empty_model.extract_gdn_layers: read bnb_quant_state off the raw Params4bit before unwrapping .data; emit weight.quant_state and FP8 weight_scale(_inv) shards for the in_proj_b / in_proj_a split so quantized Qwen3.5 GDN layers round-trip correctly.
- vllm_utils.convert_vllm_to_huggingface: rebuild linear_attn.conv1d as a grouped Conv1d with real channels/kernel_size/groups/padding instead of treating it as a LayerNorm-style weight swap.
- empty_model.patch_gemma4_vllm_lora_support: soft-import vllm.v1.worker.lora_model_runner_mixin so older supported vLLM layouts keep working.
- vllm_utils._get_vllm_state_dict: extract Gemma4 per_layer_input_gate and per_layer_projection so converted HF models carry the real checkpoint weights.
- empty_model.finalize_huggingface_model: restrict dtype propagation to the top-level config and its known text/vision/audio subconfigs; consolidate the duplicated Gemma4 rotary re-init into one loop while keeping the post-.to(dtype) float32 buffer / attention_scaling restoration.
- vllm_utils.assert_same_state_dict: _normalize_state_dict_tensor now returns None for non-tensor entries (e.g. BnB QuantState dicts) and callers skip those; align tied-embedding fallback tolerances with the outer comparison (atol=1e-4, rtol=1e-3).
- vllm_utils._test_is_same_vlm: cast only floating-point tensors to model.dtype for Gemma3/Gemma4 processors, leaving integer inputs like pixel_values untouched.
- vllm_utils._get_vllm_state_dict: collapse the unreachable lm_head elif chain; hoist the constant model_type/attention_k_eq_v check out of the gemma4_k_eq_v_layers set comprehension.
- empty_model.get_model_layer_config: move model.visual.merger.linear_fc1 / linear_fc2 from additional_layers (which expected a {kk} placeholder) into non_layered_components.
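The set_dtype_in_config behaviour described above (prefer 'dtype', fall back to 'torch_dtype', store a string) might look roughly like the sketch below. This is an assumption-laden illustration, not the hf_utils implementation: set_dtype_field is a hypothetical helper, and a SimpleNamespace stands in for a real transformers config object.

```python
import json
from types import SimpleNamespace


def set_dtype_field(config, dtype):
    """Hypothetical helper: write the dtype as a plain string.

    Uses the modern `dtype` attribute when the config has one and falls
    back to the legacy `torch_dtype` attribute only when it does not.
    """
    field = "dtype" if hasattr(config, "dtype") else "torch_dtype"
    setattr(config, field, str(dtype))  # strings stay JSON-serializable
    return field


# A config that already exposes the modern field.
modern = SimpleNamespace(dtype=None)
# A legacy config that only knows torch_dtype.
legacy = SimpleNamespace(torch_dtype="float32")

print(set_dtype_field(modern, "bfloat16"), json.dumps(vars(modern)))
print(set_dtype_field(legacy, "bfloat16"), json.dumps(vars(legacy)))
```

Storing a string rather than a dtype object keeps the config serializable with `json.dumps` and lets callers compare against string literals.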

# Conflicts: # unsloth_zoo/empty_model.py # unsloth_zoo/hf_utils.py # unsloth_zoo/vllm_utils.py

Apply 16 accepted review fixes across two files:
- set_additional_modules now honors non_layered_components explicitly so Qwen3-VL merger.linear_fc1/2 are restored instead of dropped by the generic "linear" substring filter.
- _get_vllm_state_dict moves layernorm extraction (and layer_scalar capture) above the no-mlp early-continue so layers without an mlp attribute still get their input/post layernorms exported.
- extract_gdn_layers dequantizes per-shard BnB QuantStates before concatenating into the fused in_proj_qkv weight, avoiding K/V being dequantized with Q's scales. The in_proj_ba single-shard merged-layer case now dequantizes and splits instead of silently dropping in_proj_a quant_state.
- Gemma4 top-level per-layer-input modules (embed_tokens_per_layer, per_layer_model_projection, per_layer_projection_norm) are added to non_layered_components and extracted from the vLLM text model.
- patch_gemma4_vllm_lora_support now also patches Gemma4ForCausalLM (when available) and guards class-level supports_lora / embedding_modules writes behind an idempotency flag.
- finalize_huggingface_model reapplies dtype to the live config tree after copy_attributes, switches vision-rotary detection from class equality to identity-based id() membership, and keeps inv_freq buffers at float32 for all archs (matching transformers default).
- convert_vllm_to_huggingface preserves buffer registration for layer_scalar-style entries instead of unconditionally wrapping them in nn.Parameter.
- assert_same_state_dict only relaxes tolerances on the dtype-mismatch / FP8 upcast branch; same-dtype comparisons keep torch defaults.
- Conv1d rebuild branch is qualified with linear_attn substring so it won't silently rebuild future non-GDN conv1d layers as depthwise.
- _test_is_same_vlm falls back to a synthetic PIL image when the remote sloth URL load_image fails, so the test runs offline.
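The switch from class equality to identity-based id() membership for vision-rotary detection can be shown with a toy example. RotaryEmbedding below is a hypothetical placeholder class, not the transformers module; the point is only that two instances of one class are indistinguishable by type but distinct by identity.

```python
class RotaryEmbedding:
    """Hypothetical placeholder: text and vision towers share this class."""

    def __init__(self, name):
        self.name = name


text_rotary = RotaryEmbedding("text")
vision_rotary = RotaryEmbedding("vision")
all_rotaries = [text_rotary, vision_rotary]

# Identity set collected from the modules known to belong to the vision tower.
vision_ids = {id(vision_rotary)}

# Class-based detection: the text rotary is misclassified as vision.
by_class = [m.name for m in all_rotaries if type(m) is type(vision_rotary)]

# Identity-based detection: only the actual vision instance matches.
by_identity = [m.name for m in all_rotaries if id(m) in vision_ids]

print(by_class)
print(by_identity)
```

An id() value is only meaningful while the original object is alive and has not been replaced by a copy, which is presumably why a later commit on this page adds module-path detection as a second signal.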

Append 9 regression tests to tests/test_vllm_to_hf_conversion.py covering the fixes applied during review:
- set_additional_modules now restores visual merger linear_fc1/2.
- _get_vllm_state_dict extracts layernorms even when a decoder layer lacks an mlp attribute.
- finalize_huggingface_model propagates dtype to live config tree after copy_attributes replaces the config object.
- finalize_huggingface_model uses identity-based vision rotary detection so text rotary is not misclassified when text and vision configs share a Python class.
- convert_vllm_to_huggingface preserves buffer registration for layer_scalar-style entries instead of converting them to nn.Parameter.
- assert_same_state_dict uses tight torch defaults for same-dtype comparisons; loose tolerance only applies on the FP8/dtype-mismatch upcast branch.
- Conv1d rebuild branch is qualified with linear_attn substring.
- patch_gemma4_vllm_lora_support now covers both Gemma4ForConditionalGeneration and Gemma4ForCausalLM.
- get_model_layer_config includes Gemma4 top-level per-layer-input modules in non_layered_components.

Also corrects the rotary inv_freq dtype assertion in test_finalize_non_gemma4_rotary_buffers_follow_model_dtype to match the new always-float32 behavior of finalize_huggingface_model.
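The tolerance rule covered by the assert_same_state_dict test can be sketched as a small selection function. comparison_tolerances is a hypothetical helper, dtypes are represented as plain strings, and the FP8 case is folded into the dtype-mismatch case as a simplification; the loose values (atol=1e-4, rtol=1e-3) are the ones quoted elsewhere on this page.

```python
def comparison_tolerances(dtype_a, dtype_b):
    """Hypothetical helper: pick keyword arguments for a closeness check.

    Same-dtype comparisons return no overrides, so the comparison
    function's own defaults apply. A dtype mismatch (for example after an
    upcast) returns the looser tolerances.
    """
    if dtype_a == dtype_b:
        return {}
    return {"atol": 1e-4, "rtol": 1e-3}


print(comparison_tolerances("bfloat16", "bfloat16"))
print(comparison_tolerances("float8_e4m3fn", "bfloat16"))
```

The returned dictionary is meant to be splatted into a closeness check such as `torch.testing.assert_close(a, b, **tolerances)`.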

# Conflicts: # unsloth_zoo/empty_model.py # unsloth_zoo/vllm_utils.py

- finalize_huggingface_model: guard Gemma4 multimodal rotary rebuild with try/except and broaden vision-rotary detection by module path, so a copy-attributes id() drift no longer reroutes a vision rotary through the text_config (which lacks the vision rope_parameters shape and crashed with KeyError 'rope_type' / NoneType ** Tensor).
- finalize_huggingface_model: lift float rotary buffers to float32 on all non-quantized models (not just Gemma4) after new_model.to(dtype), fixing an inv_freq / original_inv_freq downcast regression for e.g. Qwen3.5. Drops the redundant is_gemma4 fresh-rotary clone used only to re-copy attention_scaling (a Python float unaffected by .to).
- finalize_huggingface_model: hoist deepcopy(text_config) out of the rotary_emb_local loop so multi-layer Gemma3/4 models don't deepcopy the text config once per decoder layer.
- extract_gdn_layers: when dequantizing the fused in_proj_ba BnB shard, compute the b/a split midpoint on the dequantized tensor rather than the packed uint8 Params4bit buffer whose shape[0] is numel/2.
- _get_vllm_state_dict: match lm_head by exact name or .lm_head suffix instead of substring so unrelated submodule names containing 'lm_head' cannot shadow the real head.
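The lm_head matching rule in the last item can be sketched with plain string operations. is_lm_head and the module names in the example list are hypothetical, chosen only to contrast substring matching with exact-or-suffix matching.

```python
def is_lm_head(name):
    """Match the head by exact name or by a dotted `.lm_head` suffix."""
    return name == "lm_head" or name.endswith(".lm_head")


def is_lm_head_substring(name):
    """Looser substring check, shown only for comparison."""
    return "lm_head" in name


# Hypothetical module names for illustration.
names = [
    "lm_head",
    "language_model.lm_head",
    "model.lm_head_adapter.proj",
    "model.aux_lm_head",
]

for name in names:
    print(name, is_lm_head(name), is_lm_head_substring(name))
```

The substring check accepts all four names, while the exact-or-suffix check accepts only the first two.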

Trim WHAT-restatement comments and collapse a multi-line rationale to one line stating the load-bearing fact. No behavioural change.

…vation, GDN dequantize midpoint, and lm_head exact match

# Conflicts: # unsloth_zoo/empty_model.py # unsloth_zoo/vllm_utils.py

Adds 12 regression tests covering the iter-1 hardening (trailing-digit regex path, rotary reinit success guard, _is_gemma4_config helper, gemma4 gate migration, gemma4_mm import guard, private loader attr guard, HF-style k_eq_v prefix, lora manager delegation, behavioral no-op tests that stub missing vLLM modules). Updates test_gemma4_lora_patch_preserves_signature_for_inspect and test_gemma4_k_eq_v_set_hoists_constant_check to match the new source shape.

danielhanchen · 2026-04-20T04:13:50Z

Fixes pushed to unslothai#588.

Datta0 and others added 22 commits March 30, 2026 13:41

[WIP] initial fast_inference support for qwen3.5

7fa143f

[WIP] fixes for rope deltas

31c1bc3

cleanup

112f1b5

fix lm_head detection and remove moe

a31dee8

Signed-off-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>

[WIP] gemma 4 dense fast inference

5d07504

Fix dtype setting

a85a4f4

Merge remote-tracking branch 'origin/main' into qwen35-shared-final

d2f7983

Fix gemma4 load on vllm 0.19.0

43ed24f

fix bnb loader for gemma4

b7052b5

Add review tests

68a02c3

Consolidate review tests into test_vllm_to_hf_conversion.py

cebdaf3

Merge remote-tracking branch 'staging/pr-588-tests' into pr-588-head

6b97a7c

Split: keep only 1 file(s)

ae3a9c6

Trim PR review comments in empty_model.py

718b6c1

Consolidate review tests into test_vllm_to_hf_conversion.py

8e18b97

Rephrase upstream issue reference to avoid bare-hash scan trigger

88a2b53

Rephrase upstream issue reference to avoid bare-hash scan trigger

b7e2b9e

Merge remote-tracking branch 'staging/pr-588-tests' into pr-588-head

a6f73ac

# Conflicts: # unsloth_zoo/empty_model.py # unsloth_zoo/hf_utils.py # unsloth_zoo/vllm_utils.py

Split: keep only 1 file(s)

d954d7b

danielhanchen force-pushed the pr-588-tests branch from b0ef09e to d954d7b Compare April 19, 2026 23:35

danielhanchen added 7 commits April 20, 2026 00:12

Rephrase upstream issue reference to avoid bare-hash scan trigger

b005f62

Merge remote-tracking branch 'staging/pr-588-tests' into pr-588-head

e8a5bb6

# Conflicts: # unsloth_zoo/empty_model.py # unsloth_zoo/vllm_utils.py

Split: keep only 1 file(s)

120768b

Comment hygiene pass

4c2dc75

Trim WHAT-restatement comments and collapse a multi-line rationale to one line stating the load-bearing fact. No behavioural change.

danielhanchen added 6 commits April 20, 2026 01:32

Add regression tests for Gemma4 rotary safety, non-Gemma4 fp32 preser…

6df4f73

…vation, GDN dequantize midpoint, and lm_head exact match

Rephrase upstream issue reference to avoid bare-hash scan trigger

822e656

Merge remote-tracking branch 'staging/pr-588-tests' into pr-588-head

086aa52

# Conflicts: # unsloth_zoo/empty_model.py # unsloth_zoo/vllm_utils.py

Split: keep only 1 file(s)

3584fd0

Rephrase upstream issue reference to avoid bare-hash scan trigger

6ce4a0a

danielhanchen closed this Apr 20, 2026

danielhanchen wants to merge 35 commits into main from pr-588-tests
