[tests] Review tests for PR #588 #6
Closed
danielhanchen wants to merge 35 commits into
Signed-off-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>
…e_huggingface_model
- patch_gemma4_vllm_lora_support: use functools.wraps on patched_create_lora_manager so
_call_create_lora_manager's signature inspection still sees vllm_config; pass model
positionally to lora_manager_cls to avoid "multiple values for 'model'".
- patch_gemma4_vllm_k_eq_v_support: also handle split k_proj/v_proj layout (current
upstream Gemma4) by duplicating k quant-state to synthetic v entry; keep packed
qkv_proj path as fallback.
- load_vllm: gate Gemma4 patches on enable_lora / use_bitsandbytes (not is_vision_model),
so text-only Gemma4 + LoRA / BnB also works.
- extract_gdn_layers: derive qkvz offsets from gdn.key_dim/value_dim when
ColumnParallelLinear has no output_sizes; manually split in_proj_ba into b/a instead
of calling get_state_dict with kk=1 (IndexError); preserve BnB quant_state sidecars;
handle FP8 weight_scale (not only weight_scale_inv) and dynamic/row-wise FP8;
export linear_attn.norm.weight.
- finalize_huggingface_model: fix layer_idx for standard causal LMs (not only VLM path);
rebuild Gemma4 vision rotary_emb from vision_config with fp32 buffers; guard
rotary_pos_emb on vision_config availability; mirror language_model detection from
set_additional_modules.
- get_model_layer_config: register Gemma4 per_layer_input_gate / per_layer_projection /
post_per_layer_input_norm; add Qwen3.5 visual.merger.linear_fc1 / linear_fc2 and drop
the broken linear_fc{kk} template.
- set_dtype_in_config (hf_utils): prefer the modern 'dtype' field; fall back to
'torch_dtype' only when 'dtype' is absent, avoiding the deprecation warning on
current transformers.
- vllm_utils state-dict loop: skip layer.mlp extraction for linear-attn-only layers
(defensive) while still capturing layer_scalar.
- _normalize_state_dict_tensor: guard is_sparse behind isinstance(value, torch.Tensor)
so non-tensor state-dict values pass through.
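
The functools.wraps point in the first bullet can be shown in isolation. This is a minimal sketch, assuming a caller that checks for a `vllm_config` parameter via `inspect.signature`; `create_lora_manager`, `accepts_vllm_config`, and both patch helpers are hypothetical stand-ins, not the vLLM or unsloth_zoo functions.

```python
import functools
import inspect


def create_lora_manager(model, vllm_config=None):
    """Hypothetical stand-in for the function being patched."""
    return ("manager", model, vllm_config)


def patch_without_wraps(fn):
    # The wrapper exposes only (*args, **kwargs) to signature inspection.
    def patched(*args, **kwargs):
        return fn(*args, **kwargs)
    return patched


def patch_with_wraps(fn):
    # functools.wraps sets __wrapped__, which inspect.signature follows,
    # so the original parameter names stay visible.
    @functools.wraps(fn)
    def patched(*args, **kwargs):
        return fn(*args, **kwargs)
    return patched


def accepts_vllm_config(fn):
    """Hypothetical check, similar in spirit to a signature-inspecting caller."""
    return "vllm_config" in inspect.signature(fn).parameters


bare = patch_without_wraps(create_lora_manager)
wrapped = patch_with_wraps(create_lora_manager)
print(accepts_vllm_config(bare), accepts_vllm_config(wrapped))
```

Without the decorator, a caller that decides whether to pass `vllm_config` based on the signature would conclude the patched function does not accept it.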
…n, finalize_huggingface_model
- hf_utils.set_dtype_in_config: store string (JSON-safe, keeps string
comparisons in patch_model_and_tokenizer working); fix fallback
else-branch that had the HAS_TORCH_DTYPE field selection inverted.
- empty_model.extract_gdn_layers: read bnb_quant_state off the raw
Params4bit before unwrapping .data; emit weight.quant_state and FP8
weight_scale(_inv) shards for the in_proj_b / in_proj_a split so
quantized Qwen3.5 GDN layers round-trip correctly.
- vllm_utils.convert_vllm_to_huggingface: rebuild linear_attn.conv1d
as a grouped Conv1d with real channels/kernel_size/groups/padding
instead of treating it as a LayerNorm-style weight swap.
- empty_model.patch_gemma4_vllm_lora_support: soft-import
vllm.v1.worker.lora_model_runner_mixin so older supported vLLM
layouts keep working.
- vllm_utils._get_vllm_state_dict: extract Gemma4 per_layer_input_gate
and per_layer_projection so converted HF models carry the real
checkpoint weights.
- empty_model.finalize_huggingface_model: restrict dtype propagation
to the top-level config and its known text/vision/audio subconfigs;
consolidate the duplicated Gemma4 rotary re-init into one loop while
keeping the post-.to(dtype) float32 buffer / attention_scaling
restoration.
- vllm_utils.assert_same_state_dict: _normalize_state_dict_tensor now
returns None for non-tensor entries (e.g. BnB QuantState dicts) and
callers skip those; align tied-embedding fallback tolerances with
the outer comparison (atol=1e-4, rtol=1e-3).
- vllm_utils._test_is_same_vlm: cast only floating-point tensors to
model.dtype for Gemma3/Gemma4 processors, leaving integer inputs
like pixel_values untouched.
- vllm_utils._get_vllm_state_dict: collapse the unreachable lm_head
elif chain; hoist the constant model_type/attention_k_eq_v check
out of the gemma4_k_eq_v_layers set comprehension.
- empty_model.get_model_layer_config: move model.visual.merger.
linear_fc1 / linear_fc2 from additional_layers (which expected a
{kk} placeholder) into non_layered_components.
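
The dtype-field rule described for set_dtype_in_config (prefer `dtype`, fall back to `torch_dtype` only when `dtype` is absent, store a string) can be sketched as follows. This is an illustration under assumptions, not the hf_utils implementation: `set_dtype_field` is a hypothetical name, the config is a plain namespace rather than a transformers config, and stripping the `torch.` prefix is an assumed string format.

```python
import json
from types import SimpleNamespace


def set_dtype_field(config, dtype):
    """Hypothetical helper: store dtype as a string, preferring `dtype`."""
    # Assumed format: "torch.bfloat16" -> "bfloat16" (JSON-safe string).
    value = str(dtype).replace("torch.", "")
    if hasattr(config, "dtype") or not hasattr(config, "torch_dtype"):
        # Modern field exists, or neither field exists: write `dtype`.
        config.dtype = value
    else:
        # Only the legacy field exists: fall back to `torch_dtype`.
        config.torch_dtype = value
    return config


modern = set_dtype_field(SimpleNamespace(dtype=None), "torch.bfloat16")
legacy = set_dtype_field(SimpleNamespace(torch_dtype=None), "torch.float16")
print(json.dumps(vars(modern)), json.dumps(vars(legacy)))
```

Storing a plain string keeps the config serializable with `json.dumps` and lets later string comparisons against the field keep working.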
# Conflicts:
#   unsloth_zoo/empty_model.py
#   unsloth_zoo/hf_utils.py
#   unsloth_zoo/vllm_utils.py
b0ef09e to d954d7b
Apply 16 accepted review fixes across two files:
- set_additional_modules now honors non_layered_components explicitly so Qwen3-VL merger.linear_fc1/2 are restored instead of dropped by the generic "linear" substring filter.
- _get_vllm_state_dict moves layernorm extraction (and layer_scalar capture) above the no-mlp early-continue so layers without an mlp attribute still get their input/post layernorms exported.
- extract_gdn_layers dequantizes per-shard BnB QuantStates before concatenating into the fused in_proj_qkv weight, avoiding K/V being dequantized with Q's scales. The in_proj_ba single-shard merged-layer case now dequantizes and splits instead of silently dropping in_proj_a quant_state.
- Gemma4 top-level per-layer-input modules (embed_tokens_per_layer, per_layer_model_projection, per_layer_projection_norm) are added to non_layered_components and extracted from the vLLM text model.
- patch_gemma4_vllm_lora_support now also patches Gemma4ForCausalLM (when available) and guards class-level supports_lora / embedding_modules writes behind an idempotency flag.
- finalize_huggingface_model reapplies dtype to the live config tree after copy_attributes, switches vision-rotary detection from class equality to identity-based id() membership, and keeps inv_freq buffers at float32 for all archs (matching transformers default).
- convert_vllm_to_huggingface preserves buffer registration for layer_scalar-style entries instead of unconditionally wrapping them in nn.Parameter.
- assert_same_state_dict only relaxes tolerances on the dtype-mismatch / FP8 upcast branch; same-dtype comparisons keep torch defaults.
- Conv1d rebuild branch is qualified with linear_attn substring so it won't silently rebuild future non-GDN conv1d layers as depthwise.
- _test_is_same_vlm falls back to a synthetic PIL image when the remote sloth URL load_image fails, so the test runs offline.
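
The first fix above (an explicit non_layered_components list taking precedence over a generic "linear" substring filter) can be sketched like this. It is a hypothetical illustration only: `select_modules`, its parameters, and the `model.extra.linear_proj` name are made up for the example and do not reflect the actual set_additional_modules code.

```python
def select_modules(names, non_layered_components, drop_substrings=("linear",)):
    """Hypothetical filter: explicit allowlist wins over substring drops."""
    allow = set(non_layered_components)
    kept = []
    for name in names:
        if name in allow:
            # Explicitly registered components are always restored.
            kept.append(name)
        elif not any(sub in name for sub in drop_substrings):
            # Everything else still goes through the generic filter.
            kept.append(name)
    return kept


names = [
    "model.norm",
    "model.visual.merger.linear_fc1",
    "model.visual.merger.linear_fc2",
    "model.extra.linear_proj",  # hypothetical name, still filtered out
]
merger = ["model.visual.merger.linear_fc1", "model.visual.merger.linear_fc2"]

with_allowlist = select_modules(names, merger)
without_allowlist = select_modules(names, [])
print(with_allowlist, without_allowlist)
```

Checking the allowlist first keeps the generic filter unchanged for every other module while guaranteeing that the registered merger layers survive.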
Append 9 regression tests to tests/test_vllm_to_hf_conversion.py covering the fixes applied during review:
- set_additional_modules now restores visual merger linear_fc1/2.
- _get_vllm_state_dict extracts layernorms even when a decoder layer lacks an mlp attribute.
- finalize_huggingface_model propagates dtype to live config tree after copy_attributes replaces the config object.
- finalize_huggingface_model uses identity-based vision rotary detection so text rotary is not misclassified when text and vision configs share a Python class.
- convert_vllm_to_huggingface preserves buffer registration for layer_scalar-style entries instead of converting them to nn.Parameter.
- assert_same_state_dict uses tight torch defaults for same-dtype comparisons; loose tolerance only applies on the FP8/dtype-mismatch upcast branch.
- Conv1d rebuild branch is qualified with linear_attn substring.
- patch_gemma4_vllm_lora_support now covers both Gemma4ForConditionalGeneration and Gemma4ForCausalLM.
- get_model_layer_config includes Gemma4 top-level per-layer-input modules in non_layered_components.
Also corrects the rotary inv_freq dtype assertion in test_finalize_non_gemma4_rotary_buffers_follow_model_dtype to match the new always-float32 behavior of finalize_huggingface_model.
# Conflicts:
#   unsloth_zoo/empty_model.py
#   unsloth_zoo/vllm_utils.py
- finalize_huggingface_model: guard Gemma4 multimodal rotary rebuild with try/except and broaden vision-rotary detection by module path, so a copy-attributes id() drift no longer reroutes a vision rotary through the text_config (which lacks the vision rope_parameters shape and crashed with KeyError 'rope_type' / NoneType ** Tensor).
- finalize_huggingface_model: lift float rotary buffers to float32 on all non-quantized models (not just Gemma4) after new_model.to(dtype), fixing an inv_freq / original_inv_freq downcast regression for e.g. Qwen3.5. Drops the redundant is_gemma4 fresh-rotary clone used only to re-copy attention_scaling (a Python float unaffected by .to).
- finalize_huggingface_model: hoist deepcopy(text_config) out of the rotary_emb_local loop so multi-layer Gemma3/4 models don't deepcopy the text config once per decoder layer.
- extract_gdn_layers: when dequantizing the fused in_proj_ba BnB shard, compute the b/a split midpoint on the dequantized tensor rather than the packed uint8 Params4bit buffer whose shape[0] is numel/2.
- _get_vllm_state_dict: match lm_head by exact name or .lm_head suffix instead of substring so unrelated submodule names containing 'lm_head' cannot shadow the real head.
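
The lm_head matching rule in the last bullet (exact name or `.lm_head` suffix rather than substring) can be sketched as below. `is_lm_head`, `is_lm_head_substring`, and the example module names are hypothetical; the actual check in _get_vllm_state_dict may be written differently.

```python
def is_lm_head_substring(name):
    """Old-style check (assumed): any name containing 'lm_head' matches."""
    return "lm_head" in name


def is_lm_head(name):
    """Stricter check: exact name or a dotted path ending in '.lm_head'."""
    return name == "lm_head" or name.endswith(".lm_head")


candidates = [
    "lm_head",
    "language_model.lm_head",
    "model.aux_lm_head_proj",   # hypothetical unrelated submodule
    "model.lm_head_adapter",    # hypothetical unrelated submodule
]

loose = [name for name in candidates if is_lm_head_substring(name)]
strict = [name for name in candidates if is_lm_head(name)]
print(loose)
print(strict)
```

The substring check accepts all four names, while the exact-or-suffix check accepts only the first two, so an unrelated submodule cannot shadow the real head.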
Trim WHAT-restatement comments and collapse a multi-line rationale to one line stating the load-bearing fact. No behavioural change.
…vation, GDN dequantize midpoint, and lm_head exact match
# Conflicts:
#   unsloth_zoo/empty_model.py
#   unsloth_zoo/vllm_utils.py
Adds 12 regression tests covering the iter-1 hardening (trailing-digit regex path, rotary reinit success guard, _is_gemma4_config helper, gemma4 gate migration, gemma4_mm import guard, private loader attr guard, HF-style k_eq_v prefix, lora manager delegation, behavioral no-op tests that stub missing vLLM modules). Updates test_gemma4_lora_patch_preserves_signature_for_inspect and test_gemma4_k_eq_v_set_hoists_constant_check to match the new source shape.
Fixes pushed to unslothai#588.
Automated test files from review process