Fix for save_steps, save_strategy, and resume_from_checkpoint by TheTsar8756 · Pull Request #5 · unslothai/unsloth-zoo

TheTsar8756 · 2024-10-18T04:55:49Z

This currently appears to fix `unsloth_train` so that it can save checkpoints and honor `save_strategy`. A function to support `resume_from_checkpoint` is implemented in the code, but it does not work yet: in my testing the checkpoints are saved, but the training-state JSON file is empty.
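For anyone debugging the resume problem above: Transformers' Trainer writes each checkpoint to a `checkpoint-<global_step>` folder containing a `trainer_state.json` file, and resuming needs that file to hold the saved state, including `global_step`. The pure-Python sketch below finds the newest checkpoint and reports whether its state file is usable. `find_latest_checkpoint` and `checkpoint_state_is_usable` are hypothetical helpers, not part of Unsloth or Transformers. Transformers ships its own `get_last_checkpoint` in `transformers.trainer_utils`.

```python
import json
import os
import re

# Trainer checkpoint folders are named "checkpoint-<global_step>".
_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def find_latest_checkpoint(output_dir):
    """Return the checkpoint-<step> folder with the highest step, or None."""
    best_step, best_path = -1, None
    for name in os.listdir(output_dir):
        match = _CHECKPOINT_RE.match(name)
        path = os.path.join(output_dir, name)
        if match and os.path.isdir(path):
            step = int(match.group(1))
            if step > best_step:
                best_step, best_path = step, path
    return best_path


def checkpoint_state_is_usable(checkpoint_dir):
    """True if trainer_state.json exists, parses, and records a global_step."""
    state_path = os.path.join(checkpoint_dir, "trainer_state.json")
    if not os.path.isfile(state_path):
        return False
    try:
        with open(state_path, encoding="utf-8") as f:
            state = json.load(f)
    except json.JSONDecodeError:
        # An empty (zero-byte) file lands here.
        return False
    # "{}" parses fine but has no global_step to resume from.
    return isinstance(state, dict) and "global_step" in state
```

Both a zero-byte file and an empty `{}` object count as unusable here, which matches the symptom described in the PR.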

Also, thank you so much for making Unsloth!

TheTsar8756 · 2024-10-19T04:00:23Z

Since the gradient accumulation fix has been merged into Hugging Face Transformers, this PR is no longer needed.
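For context, the gradient accumulation issue concerned loss normalization, as we understand it. Averaging each micro-batch's mean loss does not equal the full-batch loss when micro-batches contain different numbers of non-padded tokens. The correct approach divides the summed token losses by the total token count. Below is a minimal pure-Python illustration that uses plain lists of per-token losses. The function names are illustrative and do not come from Unsloth or Transformers.

```python
def mean_of_micro_batch_means(micro_batches):
    """Naive accumulation: average each micro-batch's per-token mean loss."""
    means = [sum(losses) / len(losses) for losses in micro_batches]
    return sum(means) / len(means)


def token_weighted_mean(micro_batches):
    """Corrected accumulation: sum every token loss, divide by total token count."""
    total_loss = sum(sum(losses) for losses in micro_batches)
    total_tokens = sum(len(losses) for losses in micro_batches)
    return total_loss / total_tokens


# Two micro-batches with different numbers of non-padded tokens.
batches = [[1.0, 1.0, 1.0, 1.0], [3.0]]
print(mean_of_micro_batch_means(batches))  # 2.0
print(token_weighted_mean(batches))        # 1.4
```

The naive version gives the single-token micro-batch as much weight as the four-token one. This is why the accumulated loss drifted away from the equivalent full-batch loss.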

- hf_utils.set_dtype_in_config: store string (JSON-safe, keeps string comparisons in patch_model_and_tokenizer working); fix fallback else-branch that had the HAS_TORCH_DTYPE field selection inverted.
- empty_model.extract_gdn_layers: read bnb_quant_state off the raw Params4bit before unwrapping .data; emit weight.quant_state and FP8 weight_scale(_inv) shards for the in_proj_b / in_proj_a split so quantized Qwen3.5 GDN layers round-trip correctly.
- vllm_utils.convert_vllm_to_huggingface: rebuild linear_attn.conv1d as a grouped Conv1d with real channels/kernel_size/groups/padding instead of treating it as a LayerNorm-style weight swap.
- empty_model.patch_gemma4_vllm_lora_support: soft-import vllm.v1.worker.lora_model_runner_mixin so older supported vLLM layouts keep working.
- vllm_utils._get_vllm_state_dict: extract Gemma4 per_layer_input_gate and per_layer_projection so converted HF models carry the real checkpoint weights.
- empty_model.finalize_huggingface_model: restrict dtype propagation to the top-level config and its known text/vision/audio subconfigs; consolidate the duplicated Gemma4 rotary re-init into one loop while keeping the post-.to(dtype) float32 buffer / attention_scaling restoration.
- vllm_utils.assert_same_state_dict: _normalize_state_dict_tensor now returns None for non-tensor entries (e.g. BnB QuantState dicts) and callers skip those; align tied-embedding fallback tolerances with the outer comparison (atol=1e-4, rtol=1e-3).
- vllm_utils._test_is_same_vlm: cast only floating-point tensors to model.dtype for Gemma3/Gemma4 processors, leaving integer inputs like pixel_values untouched.
- vllm_utils._get_vllm_state_dict: collapse the unreachable lm_head elif chain; hoist the constant model_type/attention_k_eq_v check out of the gemma4_k_eq_v_layers set comprehension.
- empty_model.get_model_layer_config: move model.visual.merger.linear_fc1 / linear_fc2 from additional_layers (which expected a {kk} placeholder) into non_layered_components.
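The assert_same_state_dict change combines two rules. Non-tensor entries (such as quantization-state dicts) are skipped rather than compared, and tensors are compared with atol=1e-4 and rtol=1e-3. Below is a pure-Python sketch of that logic. It assumes plain lists of floats stand in for tensors. The function names are hypothetical and are not the unsloth-zoo code. The closeness test uses the same elementwise rule as `torch.allclose`: |a - b| <= atol + rtol * |b|.

```python
ATOL = 1e-4
RTOL = 1e-3


def normalize_entry(value):
    """Return a list of floats for tensor-like entries, or None for anything else."""
    # Stand-in for a tensor check: "tensor-like" here means a list/tuple of numbers.
    if isinstance(value, (list, tuple)) and all(isinstance(x, (int, float)) for x in value):
        return [float(x) for x in value]
    return None


def all_close(expected, actual, atol=ATOL, rtol=RTOL):
    """Elementwise |expected - actual| <= atol + rtol * |actual|."""
    return len(expected) == len(actual) and all(
        abs(x - y) <= atol + rtol * abs(y) for x, y in zip(expected, actual)
    )


def assert_same_state_dict_sketch(expected, actual):
    """Raise AssertionError listing every tensor entry that differs."""
    mismatches = []
    for key, exp_value in expected.items():
        exp = normalize_entry(exp_value)
        if exp is None:
            continue  # non-tensor entry (e.g. quant-state metadata): skip it
        got = normalize_entry(actual.get(key))
        if got is None or not all_close(exp, got):
            mismatches.append(key)
    if mismatches:
        raise AssertionError(f"Mismatched entries: {mismatches}")
```

Returning None from the normalizer keeps the skip decision in one place, so callers do not need to know which non-tensor types a checkpoint might contain.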

Feat/quant config

Multi-reviewer pass on the autocast wrapper / norm-upcast path:

- Instance-level forward (#2): an instance attribute `model.forward` (Unsloth runtime forward patching) shadows class-method overrides, so mutating __class__ silently bypassed the wrapper -> fp32 norm met a bf16 linear with no autocast and crashed. Now wrap the instance attribute when present; otherwise subclass as before.
- Wrapper gating (unslothai#5, unslothai#7): install the wrapper iff fp32 norm params actually exist (from our upcast, the legacy env upcast, or an external _pre_set_compute_dtype policy) -- not on the upcast DECISION. Fixes the rollback path leaving external fp32 norms exposed, and stops wrapping models with no fp32 norm. Add _unwrap_forward_in_bf16_autocast for re-prepare (unslothai#10).
- config.architectures leak (unslothai#8/unslothai#9): keep the original __name__ on the generated subclass (unique __qualname__ for registration) so save_pretrained records the base architecture.
- Device detection (unslothai#11): recurse into mapping/list/tuple batches and fall back to the model's parameter device instead of defaulting to "cuda".
- Legacy UNSLOTH_UPCAST_LAYERNORM (#1/unslothai#3/unslothai#4): route through the shared _cast_named_module + union matcher and honour the external-policy deferral.
- Recursive external-ownership guard (unslothai#6): record descendants of tagged modules (the external policy casts recursively).
- Fresh-interpreter pickle test (unslothai#12): real subprocess load.

Shared helpers: _find_tensor_device_type, _call_forward_with_bf16_autocast, _canonical_module_name, _cast_named_module.

Unit suite: 25 passed.
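The device-detection item describes a recursive search through a batch, followed by a fallback to the model's parameters. Below is a pure-Python sketch of that idea. It assumes any object with a `device` attribute whose value has a `type` attribute (as with `torch.Tensor.device`) counts as a tensor. `find_tensor_device_type` and `resolve_device_type` are hypothetical stand-ins, not the actual `_find_tensor_device_type` helper. Returning None when nothing is found is also an assumption.

```python
from collections.abc import Mapping


def find_tensor_device_type(obj):
    """Return the device type of the first tensor-like leaf in obj, or None."""
    # Assumption: "tensor-like" = has .device, and that device has a .type string.
    device = getattr(obj, "device", None)
    if device is not None and hasattr(device, "type"):
        return device.type
    if isinstance(obj, Mapping):
        children = obj.values()
    elif isinstance(obj, (list, tuple)):
        children = obj
    else:
        return None
    for child in children:
        found = find_tensor_device_type(child)
        if found is not None:
            return found
    return None


def resolve_device_type(batch, model_parameters):
    """Prefer the batch's device; otherwise use the first model parameter's device."""
    found = find_tensor_device_type(batch)
    if found is not None:
        return found
    for param in model_parameters:
        return param.device.type
    # Assumption: no tensors anywhere, so leave the choice to the caller
    # instead of hard-coding "cuda".
    return None
```

Falling back to the parameters keeps a CPU-only or non-CUDA model from being pushed into a CUDA autocast context just because the batch held no tensors.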

TheTsar8756 added 2 commits October 17, 2024 20:33

Hotfix to fix save_steps, save_strategy, and resume_from_checkpoint.

928aaa0

Fix for error where trainer optimizer and lr_scheduler are empty.

a8fb546

TheTsar8756 closed this Oct 19, 2024

mmathew23 pushed a commit to mmathew23/unsloth-zoo that referenced this pull request May 6, 2026

Merge pull request unslothai#5 from Manan17/feat/quant_config

d25869a

Feat/quant config

TheTsar8756 wants to merge 2 commits into unslothai:main from TheTsar8756:main

1 participant

Conversation

TheTsar8756 commented Oct 18, 2024

Uh oh!

TheTsar8756 commented Oct 19, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant