
[tests] Review tests for PR #588 #6

Closed: danielhanchen wants to merge 35 commits into main from pr-588-tests



@danielhanchen (Owner)

Automated test files from the review process.

Datta0 and others added 22 commits March 30, 2026 13:41
Signed-off-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>
…e_huggingface_model

- patch_gemma4_vllm_lora_support: use functools.wraps on patched_create_lora_manager so
  _call_create_lora_manager's signature inspection still sees vllm_config; pass model
  positionally to lora_manager_cls to avoid "multiple values for 'model'".
- patch_gemma4_vllm_k_eq_v_support: also handle split k_proj/v_proj layout (current
  upstream Gemma4) by duplicating k quant-state to synthetic v entry; keep packed
  qkv_proj path as fallback.
- load_vllm: gate Gemma4 patches on enable_lora / use_bitsandbytes (not is_vision_model),
  so text-only Gemma4 + LoRA / BnB also works.
- extract_gdn_layers: derive qkvz offsets from gdn.key_dim/value_dim when
  ColumnParallelLinear has no output_sizes; manually split in_proj_ba into b/a instead
  of calling get_state_dict with kk=1 (IndexError); preserve BnB quant_state sidecars;
  handle FP8 weight_scale (not only weight_scale_inv) and dynamic/row-wise FP8;
  export linear_attn.norm.weight.
- finalize_huggingface_model: fix layer_idx for standard causal LMs (not only VLM path);
  rebuild Gemma4 vision rotary_emb from vision_config with fp32 buffers; guard
  rotary_pos_emb on vision_config availability; mirror language_model detection from
  set_additional_modules.
- get_model_layer_config: register Gemma4 per_layer_input_gate / per_layer_projection /
  post_per_layer_input_norm; add Qwen3.5 visual.merger.linear_fc1 / linear_fc2 and drop
  the broken linear_fc{kk} template.
- set_dtype_in_config (hf_utils): prefer the modern 'dtype' field; fall back to
  'torch_dtype' only when 'dtype' is absent, avoiding the deprecation warning on
  current transformers.
- vllm_utils state-dict loop: skip layer.mlp extraction for linear-attn-only layers
  (defensive) while still capturing layer_scalar.
- _normalize_state_dict_tensor: guard is_sparse behind isinstance(value, torch.Tensor)
  so non-tensor state-dict values pass through.
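The functools.wraps fix works because `inspect.signature` follows the `__wrapped__` attribute that `functools.wraps` sets, so a `*args, **kwargs` wrapper still reports the wrapped function's parameters. Below is a minimal stdlib sketch of that mechanism. `LoRAModelManager`, `create_lora_manager`, and `call_create_lora_manager` are hypothetical stand-ins named after the commit message, not vLLM's real API; the sketch also passes `model` positionally, as the fix describes.

```python
import functools
import inspect


# Hypothetical stand-ins: the names echo the commit message, but these are
# NOT vLLM's real classes or functions.
class LoRAModelManager:
    def __init__(self, model, max_num_seqs, vllm_config=None):
        self.model = model
        self.max_num_seqs = max_num_seqs
        self.vllm_config = vllm_config


def create_lora_manager(model, max_num_seqs, vllm_config=None,
                        lora_manager_cls=LoRAModelManager):
    # `model` is passed positionally, so it cannot collide with a `model=` key.
    return lora_manager_cls(model, max_num_seqs, vllm_config=vllm_config)


@functools.wraps(create_lora_manager)
def patched_create_lora_manager(*args, **kwargs):
    # A real patch would adjust the model here before delegating.
    return create_lora_manager(*args, **kwargs)


def call_create_lora_manager(fn, model, max_num_seqs, vllm_config):
    # Caller that forwards vllm_config only if fn's signature declares it.
    params = inspect.signature(fn).parameters
    extra = {"vllm_config": vllm_config} if "vllm_config" in params else {}
    return fn(model, max_num_seqs, **extra)


manager = call_create_lora_manager(patched_create_lora_manager, "model", 8, {"tp": 1})
print(manager.vllm_config)  # {'tp': 1}
```

Without the decorator, the wrapper's signature is just `(*args, **kwargs)`, the signature check fails, and `vllm_config` is silently dropped.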
…n, finalize_huggingface_model

- hf_utils.set_dtype_in_config: store string (JSON-safe, keeps string
  comparisons in patch_model_and_tokenizer working); fix fallback
  else-branch that had the HAS_TORCH_DTYPE field selection inverted.
- empty_model.extract_gdn_layers: read bnb_quant_state off the raw
  Params4bit before unwrapping .data; emit weight.quant_state and FP8
  weight_scale(_inv) shards for the in_proj_b / in_proj_a split so
  quantized Qwen3.5 GDN layers round-trip correctly.
- vllm_utils.convert_vllm_to_huggingface: rebuild linear_attn.conv1d
  as a grouped Conv1d with real channels/kernel_size/groups/padding
  instead of treating it as a LayerNorm-style weight swap.
- empty_model.patch_gemma4_vllm_lora_support: soft-import
  vllm.v1.worker.lora_model_runner_mixin so older supported vLLM
  layouts keep working.
- vllm_utils._get_vllm_state_dict: extract Gemma4 per_layer_input_gate
  and per_layer_projection so converted HF models carry the real
  checkpoint weights.
- empty_model.finalize_huggingface_model: restrict dtype propagation
  to the top-level config and its known text/vision/audio subconfigs;
  consolidate the duplicated Gemma4 rotary re-init into one loop while
  keeping the post-.to(dtype) float32 buffer / attention_scaling
  restoration.
- vllm_utils.assert_same_state_dict: _normalize_state_dict_tensor now
  returns None for non-tensor entries (e.g. BnB QuantState dicts) and
  callers skip those; align tied-embedding fallback tolerances with
  the outer comparison (atol=1e-4, rtol=1e-3).
- vllm_utils._test_is_same_vlm: cast only floating-point tensors to
  model.dtype for Gemma3/Gemma4 processors, leaving integer inputs
  like pixel_values untouched.
- vllm_utils._get_vllm_state_dict: collapse the unreachable lm_head
  elif chain; hoist the constant model_type/attention_k_eq_v check
  out of the gemma4_k_eq_v_layers set comprehension.
- empty_model.get_model_layer_config: move model.visual.merger.
  linear_fc1 / linear_fc2 from additional_layers (which expected a
  {kk} placeholder) into non_layered_components.
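The `set_dtype_in_config` notes describe two rules: prefer the modern `dtype` field, fall back to `torch_dtype` only when `dtype` is absent, and store the value as a string. A minimal sketch of that selection logic follows. It uses a `SimpleNamespace` in place of a real transformers config, and the function body is an illustration of the described behavior, not the actual `hf_utils` code.

```python
import json
from types import SimpleNamespace


def set_dtype_in_config(config, dtype):
    """Hypothetical sketch of the field selection described above.

    Prefers the modern `dtype` attribute and falls back to `torch_dtype`
    only when `dtype` is absent. The value is stored as a plain string such
    as "bfloat16", which stays JSON-serialisable and keeps string
    comparisons working.
    """
    name = str(dtype)
    if name.startswith("torch."):
        name = name[len("torch."):]  # "torch.bfloat16" -> "bfloat16"
    field = "dtype" if hasattr(config, "dtype") else "torch_dtype"
    setattr(config, field, name)
    return field


modern = SimpleNamespace(dtype=None)
legacy = SimpleNamespace(torch_dtype=None)
print(set_dtype_in_config(modern, "torch.bfloat16"), modern.dtype)  # dtype bfloat16
print(set_dtype_in_config(legacy, "float16"), legacy.torch_dtype)   # torch_dtype float16
print(json.dumps(vars(modern)))  # {"dtype": "bfloat16"}
```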
# Conflicts:
#	unsloth_zoo/empty_model.py
#	unsloth_zoo/hf_utils.py
#	unsloth_zoo/vllm_utils.py
Apply 16 accepted review fixes across two files:

- set_additional_modules now honors non_layered_components explicitly so
  Qwen3-VL merger.linear_fc1/2 are restored instead of dropped by the
  generic "linear" substring filter.
- _get_vllm_state_dict moves layernorm extraction (and layer_scalar
  capture) above the no-mlp early-continue so layers without an mlp
  attribute still get their input/post layernorms exported.
- extract_gdn_layers dequantizes per-shard BnB QuantStates before
  concatenating into the fused in_proj_qkv weight, avoiding K/V being
  dequantized with Q's scales. The in_proj_ba single-shard merged-layer
  case now dequantizes and splits instead of silently dropping
  in_proj_a quant_state.
- Gemma4 top-level per-layer-input modules (embed_tokens_per_layer,
  per_layer_model_projection, per_layer_projection_norm) are added to
  non_layered_components and extracted from the vLLM text model.
- patch_gemma4_vllm_lora_support now also patches Gemma4ForCausalLM
  (when available) and guards class-level supports_lora /
  embedding_modules writes behind an idempotency flag.
- finalize_huggingface_model reapplies dtype to the live config tree
  after copy_attributes, switches vision-rotary detection from class
  equality to identity-based id() membership, and keeps inv_freq
  buffers at float32 for all archs (matching transformers default).
- convert_vllm_to_huggingface preserves buffer registration for
  layer_scalar-style entries instead of unconditionally wrapping them
  in nn.Parameter.
- assert_same_state_dict only relaxes tolerances on the dtype-mismatch
  / FP8 upcast branch; same-dtype comparisons keep torch defaults.
- Conv1d rebuild branch is qualified with linear_attn substring so it
  won't silently rebuild future non-GDN conv1d layers as depthwise.
- _test_is_same_vlm falls back to a synthetic PIL image when the
  remote sloth URL load_image fails, so the test runs offline.
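The `assert_same_state_dict` change splits comparisons into two tolerance regimes. The sketch below mirrors the `torch.allclose` criterion, |a - b| <= atol + rtol * |b|, over plain Python lists so it runs without torch. The default tolerances are `torch.allclose`'s documented defaults and the relaxed pair is the atol=1e-4, rtol=1e-3 quoted earlier in this PR; `same_weights` and the dtype strings are hypothetical.

```python
# torch.allclose's default tolerances.
DEFAULT_RTOL, DEFAULT_ATOL = 1e-5, 1e-8
# Relaxed tolerances quoted in this PR for the dtype-mismatch / FP8 upcast branch.
UPCAST_RTOL, UPCAST_ATOL = 1e-3, 1e-4


def allclose(a, b, rtol, atol):
    # Same elementwise criterion as torch.allclose: |a - b| <= atol + rtol * |b|.
    return len(a) == len(b) and all(
        abs(x - y) <= atol + rtol * abs(y) for x, y in zip(a, b)
    )


def same_weights(a, a_dtype, b, b_dtype):
    """Hypothetical comparison helper: loose only when dtypes differ."""
    if a_dtype != b_dtype:
        # Both sides would be upcast before comparing, so allow rounding noise.
        return allclose(a, b, UPCAST_RTOL, UPCAST_ATOL)
    # Same dtype: keep the tight defaults so real mismatches are not hidden.
    return allclose(a, b, DEFAULT_RTOL, DEFAULT_ATOL)


a = [1.0, 2.0]
b = [1.0, 2.0005]
print(same_weights(a, "bfloat16", b, "bfloat16"))       # False
print(same_weights(a, "float8_e4m3fn", b, "bfloat16"))  # True
```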
Append 9 regression tests to tests/test_vllm_to_hf_conversion.py covering
the fixes applied during review:

- set_additional_modules now restores visual merger linear_fc1/2.
- _get_vllm_state_dict extracts layernorms even when a decoder layer
  lacks an mlp attribute.
- finalize_huggingface_model propagates dtype to live config tree after
  copy_attributes replaces the config object.
- finalize_huggingface_model uses identity-based vision rotary detection
  so text rotary is not misclassified when text and vision configs
  share a Python class.
- convert_vllm_to_huggingface preserves buffer registration for
  layer_scalar-style entries instead of converting them to nn.Parameter.
- assert_same_state_dict uses tight torch defaults for same-dtype
  comparisons; loose tolerance only applies on the FP8/dtype-mismatch
  upcast branch.
- Conv1d rebuild branch is qualified with linear_attn substring.
- patch_gemma4_vllm_lora_support now covers both
  Gemma4ForConditionalGeneration and Gemma4ForCausalLM.
- get_model_layer_config includes Gemma4 top-level per-layer-input
  modules in non_layered_components.

Also corrects the rotary inv_freq dtype assertion in
test_finalize_non_gemma4_rotary_buffers_follow_model_dtype to match the
new always-float32 behavior of finalize_huggingface_model.
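The identity-based vision rotary test guards against a specific failure: when the text and vision configs are instances of the same Python class, class equality cannot tell them apart, but `id()` membership can. A minimal sketch, with a hypothetical `RotaryConfig` standing in for the real config classes:

```python
class RotaryConfig:
    # Hypothetical stand-in: text and vision sub-configs share one class here.
    def __init__(self, name):
        self.name = name


def is_vision_rotary_by_class(rotary_config, vision_config):
    # Class equality: any instance of the shared class looks like "vision".
    return type(rotary_config) is type(vision_config)


def is_vision_rotary_by_identity(rotary_config, vision_configs):
    # Identity: only the exact vision config objects count.
    vision_ids = {id(cfg) for cfg in vision_configs}
    return id(rotary_config) in vision_ids


text_cfg = RotaryConfig("text")
vision_cfg = RotaryConfig("vision")
print(is_vision_rotary_by_class(text_cfg, vision_cfg))         # True (misclassified)
print(is_vision_rotary_by_identity(text_cfg, [vision_cfg]))    # False
print(is_vision_rotary_by_identity(vision_cfg, [vision_cfg]))  # True
```

Identity checks have their own weakness: if copy_attributes replaces the config objects, their `id()` values change, which is why a later commit in this PR also checks the module path.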
# Conflicts:
#	unsloth_zoo/empty_model.py
#	unsloth_zoo/vllm_utils.py
- finalize_huggingface_model: guard Gemma4 multimodal rotary rebuild with
  try/except and broaden vision-rotary detection by module path, so a
  copy-attributes id() drift no longer reroutes a vision rotary through
  the text_config (which lacks the vision rope_parameters shape and
  crashed with KeyError 'rope_type' / NoneType ** Tensor).
- finalize_huggingface_model: lift float rotary buffers to float32 on
  all non-quantized models (not just Gemma4) after new_model.to(dtype),
  fixing an inv_freq / original_inv_freq downcast regression for models such as
  Qwen3.5. Drops the redundant is_gemma4 fresh-rotary clone used only
  to re-copy attention_scaling (a Python float unaffected by .to).
- finalize_huggingface_model: hoist deepcopy(text_config) out of the
  rotary_emb_local loop so multi-layer Gemma3/4 models don't deepcopy
  the text config once per decoder layer.
- extract_gdn_layers: when dequantizing the fused in_proj_ba BnB shard,
  compute the b/a split midpoint on the dequantized tensor rather than
  the packed uint8 Params4bit buffer whose shape[0] is numel/2.
- _get_vllm_state_dict: match lm_head by exact name or .lm_head suffix
  instead of substring so unrelated submodule names containing
  'lm_head' cannot shadow the real head.
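The lm_head matching change swaps a substring test for an exact-name-or-suffix test. The module names in this sketch are hypothetical and chosen only to show a name that the substring test wrongly accepts.

```python
def lm_head_by_substring(names):
    # Old matching: any name containing "lm_head" qualifies.
    return [n for n in names if "lm_head" in n]


def is_lm_head(name):
    # New matching: the exact name, or a dotted path ending in ".lm_head".
    return name == "lm_head" or name.endswith(".lm_head")


# Hypothetical module names, for illustration only.
names = ["model.layers.0.mlp", "model.aux.lm_head_proj", "lm_head"]
print(lm_head_by_substring(names))          # ['model.aux.lm_head_proj', 'lm_head']
print([n for n in names if is_lm_head(n)])  # ['lm_head']
```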
Trim WHAT-restatement comments and collapse a multi-line rationale
to one line stating the load-bearing fact. No behavioural change.
…vation, GDN dequantize midpoint, and lm_head exact match
# Conflicts:
#	unsloth_zoo/empty_model.py
#	unsloth_zoo/vllm_utils.py
Adds 12 regression tests covering the iter-1 hardening (trailing-digit
regex path, rotary reinit success guard, _is_gemma4_config helper,
gemma4 gate migration, gemma4_mm import guard, private loader attr
guard, HF-style k_eq_v prefix, lora manager delegation, behavioral
no-op tests that stub missing vLLM modules). Updates
test_gemma4_lora_patch_preserves_signature_for_inspect and
test_gemma4_k_eq_v_set_hoists_constant_check to match the new source
shape.
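Several fixes and tests in this PR rely on the same pattern: soft-import a module that only some vLLM versions ship, turn the patch into a no-op when it is missing, guard class-level writes behind an idempotency flag, and stub the module in `sys.modules` so tests can exercise the patch without vLLM installed. A minimal stdlib sketch of that pattern; `soft_import`, `patch_lora_support`, `_lora_patched`, and the stub names are all hypothetical.

```python
import importlib
import sys
import types


def soft_import(module_name):
    # Return the module, or None when this install does not ship it.
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def patch_lora_support(module_name, class_name):
    """Hypothetical patch: mark a class as LoRA-capable if it exists.

    Returns False (a no-op) when the module or class is missing, and uses a
    flag so repeated calls do not rewrite class attributes.
    """
    module = soft_import(module_name)
    cls = getattr(module, class_name, None)  # getattr(None, ..., None) is None
    if cls is None:
        return False
    if not getattr(cls, "_lora_patched", False):
        cls.supports_lora = True
        cls._lora_patched = True
    return True


# Test-style stubbing: register a fake module so the patch has a target.
stub = types.ModuleType("stub_lora_runner")
stub.Runner = type("Runner", (), {})
sys.modules["stub_lora_runner"] = stub
print(patch_lora_support("stub_lora_runner", "Runner"))       # True
print(patch_lora_support("module_that_does_not_exist", "X"))  # False
```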
@danielhanchen (Owner, Author)
Fixes pushed to unslothai#588.
