Fix for save_steps, save_strategy, and resume_from_checkpoint #5
Closed
TheTsar8756 wants to merge 2 commits into
Author
Since the gradient accumulation fix was added to Hugging Face transformers, there is no need for this PR.
danielhanchen added a commit to Datta0/unsloth-zoo that referenced this pull request Apr 19, 2026
- hf_utils.set_dtype_in_config: store string (JSON-safe, keeps string
comparisons in patch_model_and_tokenizer working); fix fallback
else-branch that had the HAS_TORCH_DTYPE field selection inverted.
- empty_model.extract_gdn_layers: read bnb_quant_state off the raw
Params4bit before unwrapping .data; emit weight.quant_state and FP8
weight_scale(_inv) shards for the in_proj_b / in_proj_a split so
quantized Qwen3.5 GDN layers round-trip correctly.
- vllm_utils.convert_vllm_to_huggingface: rebuild linear_attn.conv1d
as a grouped Conv1d with real channels/kernel_size/groups/padding
instead of treating it as a LayerNorm-style weight swap.
- empty_model.patch_gemma4_vllm_lora_support: soft-import
vllm.v1.worker.lora_model_runner_mixin so older supported vLLM
layouts keep working.
- vllm_utils._get_vllm_state_dict: extract Gemma4 per_layer_input_gate
and per_layer_projection so converted HF models carry the real
checkpoint weights.
- empty_model.finalize_huggingface_model: restrict dtype propagation
to the top-level config and its known text/vision/audio subconfigs;
consolidate the duplicated Gemma4 rotary re-init into one loop while
keeping the post-.to(dtype) float32 buffer / attention_scaling
restoration.
- vllm_utils.assert_same_state_dict: _normalize_state_dict_tensor now
returns None for non-tensor entries (e.g. BnB QuantState dicts) and
callers skip those; align tied-embedding fallback tolerances with
the outer comparison (atol=1e-4, rtol=1e-3).
- vllm_utils._test_is_same_vlm: cast only floating-point tensors to
model.dtype for Gemma3/Gemma4 processors, leaving integer inputs
like pixel_values untouched.
- vllm_utils._get_vllm_state_dict: collapse the unreachable lm_head
elif chain; hoist the constant model_type/attention_k_eq_v check
out of the gemma4_k_eq_v_layers set comprehension.
- empty_model.get_model_layer_config: move
model.visual.merger.linear_fc1 / linear_fc2 from additional_layers
(which expected a {kk} placeholder) into non_layered_components.
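The assert_same_state_dict item above describes two behaviours: non-tensor entries normalize to None and are skipped, and values are compared with atol=1e-4, rtol=1e-3. Below is a minimal, standard-library-only sketch of that idea. It is not the code from the commit: plain lists of numbers stand in for tensors, the helper names `normalize_entry`, `is_close`, and `mismatched_keys` are hypothetical, and the closeness rule `|a - b| <= atol + rtol * |b|` is the usual allclose-style convention, assumed here rather than confirmed by the commit message.

```python
def normalize_entry(value):
    """Return a list of floats for tensor-like entries, or None for anything else.

    Hypothetical stand-in: a list/tuple of numbers plays the role of a tensor;
    dicts (e.g. quantization metadata) and missing entries normalize to None.
    """
    if isinstance(value, (list, tuple)) and all(
        isinstance(x, (int, float)) for x in value
    ):
        return [float(x) for x in value]
    return None


def is_close(actual, expected, atol=1e-4, rtol=1e-3):
    """allclose-style test: |actual - expected| <= atol + rtol * |expected|."""
    return abs(actual - expected) <= atol + rtol * abs(expected)


def mismatched_keys(left, right, atol=1e-4, rtol=1e-3):
    """List keys whose tensor-like values differ beyond the tolerances."""
    bad = []
    for key in left:
        a = normalize_entry(left[key])
        b = normalize_entry(right.get(key))
        if a is None or b is None:
            # Non-tensor entries are skipped rather than compared.
            continue
        same_length = len(a) == len(b)
        if not same_length or not all(
            is_close(x, y, atol, rtol) for x, y in zip(a, b)
        ):
            bad.append(key)
    return bad


left = {"w": [1.0, 2.0], "quant_state": {"blocksize": 64}, "e": [1.0]}
right = {"w": [1.00005, 2.001], "quant_state": {"blocksize": 128}, "e": [1.1]}
result = mismatched_keys(left, right)
```

In this sketch the `quant_state` entries differ but are never compared, `w` passes within tolerance, and only `e` is reported.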
mmathew23 pushed a commit to mmathew23/unsloth-zoo that referenced this pull request May 6, 2026
Feat/quant config
mmathew23 added a commit to mmathew23/unsloth-zoo that referenced this pull request May 22, 2026
Multi-reviewer pass on the autocast wrapper / norm-upcast path:
- Instance-level forward (#2): an instance attribute `model.forward` (Unsloth runtime forward patching) shadows class-method overrides, so mutating __class__ silently bypassed the wrapper -> fp32 norm met a bf16 linear with no autocast and crashed. Now wrap the instance attribute when present; otherwise subclass as before.
- Wrapper gating (unslothai#5, unslothai#7): install the wrapper iff fp32 norm params actually exist (from our upcast, the legacy env upcast, or an external _pre_set_compute_dtype policy) -- not on the upcast DECISION. Fixes the rollback path leaving external fp32 norms exposed, and stops wrapping models with no fp32 norm. Add _unwrap_forward_in_bf16_autocast for re-prepare (unslothai#10).
- config.architectures leak (unslothai#8/unslothai#9): keep the original __name__ on the generated subclass (unique __qualname__ for registration) so save_pretrained records the base architecture.
- Device detection (unslothai#11): recurse into mapping/list/tuple batches and fall back to the model's parameter device instead of defaulting to "cuda".
- Legacy UNSLOTH_UPCAST_LAYERNORM (#1/unslothai#3/unslothai#4): route through the shared _cast_named_module + union matcher and honour the external-policy deferral.
- Recursive external-ownership guard (unslothai#6): record descendants of tagged modules (the external policy casts recursively).
- Fresh-interpreter pickle test (unslothai#12): real subprocess load.
Shared helpers: _find_tensor_device_type, _call_forward_with_bf16_autocast, _canonical_module_name, _cast_named_module.
Unit suite: 25 passed.
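The instance-level forward item above rests on a general Python rule: an attribute stored on the instance is found before a method defined on its class, so swapping `__class__` cannot override it. The following sketch illustrates that rule with plain classes and string-returning methods. It is not the commit's code: `Model`, `Wrapped`, and `wrap_forward` are hypothetical names, and no real autocast or tensors are involved.

```python
class Model:
    def forward(self, x):
        return f"base({x})"


class Wrapped(Model):
    # Stand-in for a generated subclass whose forward adds a wrapper.
    def forward(self, x):
        return f"wrapped[{super().forward(x)}]"


def wrap_forward(model):
    """Wrap the instance attribute when present; otherwise swap the class."""
    if "forward" in vars(model):
        inner = model.forward
        model.forward = lambda x: f"wrapped[{inner(x)}]"
    else:
        model.__class__ = Wrapped
    return model


# Case 1: no instance-level forward, so swapping __class__ takes effect.
plain = Model()
plain.__class__ = Wrapped
plain_result = plain.forward("x")

# Case 2: a runtime patch stored forward on the instance, so the class
# swap is silently bypassed and the wrapper never runs.
patched = Model()
patched.forward = lambda x: f"patched({x})"
patched.__class__ = Wrapped
bypassed_result = patched.forward("x")

# Case 3: wrapping the instance attribute itself keeps the wrapper in play.
fixed = Model()
fixed.forward = lambda x: f"patched({x})"
fixed_result = wrap_forward(fixed).forward("x")
```

Here `plain_result` is `"wrapped[base(x)]"`, `bypassed_result` is `"patched(x)"`, and `fixed_result` is `"wrapped[patched(x)]"`.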
Currently, this seems to fix unsloth_train so that it can save checkpoints and honor save_strategy. A function for resume_from_checkpoint is implemented in the code but does not work yet: in testing on my machine, the checkpoints are saved, but the training_state JSON file is empty.
Also thank you so much for making Unsloth!
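For anyone debugging the empty training-state file described above, here is a minimal sketch of writing and reading such a file so that an empty or partial file is detected rather than silently resumed from. This is not the code in this PR: the file name `training_state.json`, the field names, and both helper functions are assumptions made for illustration, and only the standard library is used.

```python
import json
import os
import tempfile
from pathlib import Path

STATE_FILE = "training_state.json"  # hypothetical file name


def save_training_state(checkpoint_dir, global_step, epoch, log_history):
    """Write the state to a temp file, then rename it into place."""
    directory = Path(checkpoint_dir)
    directory.mkdir(parents=True, exist_ok=True)
    state = {
        "global_step": global_step,
        "epoch": epoch,
        "log_history": log_history,
    }
    tmp_path = directory / (STATE_FILE + ".tmp")
    with open(tmp_path, "w", encoding="utf-8") as handle:
        json.dump(state, handle)
        handle.flush()
        os.fsync(handle.fileno())  # make sure the bytes reach disk
    os.replace(tmp_path, directory / STATE_FILE)


def load_training_state(checkpoint_dir):
    """Return the saved state, or None if it is missing, empty, or invalid."""
    path = Path(checkpoint_dir) / STATE_FILE
    if not path.exists() or path.stat().st_size == 0:
        return None
    try:
        with open(path, encoding="utf-8") as handle:
            state = json.load(handle)
    except json.JSONDecodeError:
        return None
    if not isinstance(state, dict) or "global_step" not in state:
        return None
    return state


with tempfile.TemporaryDirectory() as root:
    good_dir = Path(root) / "checkpoint-10"
    save_training_state(good_dir, global_step=10, epoch=0.5, log_history=[])
    loaded = load_training_state(good_dir)

    empty_dir = Path(root) / "checkpoint-20"
    empty_dir.mkdir()
    (empty_dir / STATE_FILE).write_text("")
    loaded_empty = load_training_state(empty_dir)
```

Writing to a temporary file and renaming it means a crash mid-write leaves either the previous file or none at all, and the loader treats a zero-byte or `{}` file as "no state" so the caller can fall back to starting from step 0.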