Fix Hermes chat template validation error #4183
for more information, see https://pre-commit.ci
Replace f-string triple-quoted approach with explicit newline characters for clearer string construction in the grpo_trainer patch.
* Add missing import of inspect
* Update device_type.py
…nslothai#3768)
* Improve error message for fast_inference and full_finetuning
* Refine error message string formatting
* Update unsloth/models/vision.py
---------
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
…nslothai#3780)
* fix(trainer): import psutil to prevent NameError in _prepare_dataset
  Fixes unslothai#3777
* Update rl.py
---------
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: Francesco Bertolotti <francesco.bertolotti@igenius.ai>
* Guard optional trl.experimental.openenv usage in RL patches
* Simplify optional trl.openenv import handling
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
…3790)
* Fix is_contiguous() method call and remove duplicate imports
  - Fix bug in rope_embedding.py where is_contiguous was used without parentheses, causing the method object (always truthy) to be evaluated instead of calling the method. This fixes issue unslothai#3781 where fast rope backpropagation was broken for zero strided/non-contiguous tensors.
  - Remove duplicate `import torch` in rl.py (lines 20 and 25)
  - Remove duplicate `import functools` and `import types` in vision.py
  🤖 Generated with [Claude Code](https://claude.com/claude-code)
  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* Fix Boolean value of Tensor ambiguity error in mistral.py
  Replace `or` operator with explicit `is None` check when getting n_items from kwargs. The `or` operator fails when the value is a Tensor because Python cannot determine the boolean value of a multi-element tensor.
  Fixes unslothai#3766
  🤖 Generated with [Claude Code](https://claude.com/claude-code)
  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* Update rope_embedding.py
---------
Co-authored-by: yurekami <yurekami@users.noreply.github.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
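Both bugs in this commit can be reproduced without PyTorch. The following is a minimal sketch, assuming a hypothetical `FakeTensor` stand-in class (not part of unsloth or torch) that mimics the two relevant behaviours: `is_contiguous` is a method, and truth-testing the object raises, as it does for a multi-element tensor. The kwarg names `num_items_in_batch` and `n_items` are also assumptions chosen for illustration.

```python
class FakeTensor:
    """Hypothetical stand-in for a multi-element torch.Tensor (illustration only)."""

    def __init__(self, contiguous):
        self._contiguous = contiguous

    def is_contiguous(self):
        return self._contiguous

    def __bool__(self):
        # Real multi-element tensors raise a comparable error when truth-tested.
        raise ValueError("Boolean value of Tensor with more than one value is ambiguous")


t = FakeTensor(contiguous=False)

# Bug 1: without parentheses the bound method object itself is evaluated,
# and a method object is always truthy.
buggy_check = bool(t.is_contiguous)  # truthy regardless of memory layout
fixed_check = t.is_contiguous()      # the actual answer


def get_n_items_buggy(kwargs):
    # Bug 2: `or` truth-tests its first operand, which raises for a tensor.
    return kwargs.get("num_items_in_batch") or kwargs.get("n_items")


def get_n_items_fixed(kwargs):
    # An explicit `is None` check never truth-tests the value.
    n_items = kwargs.get("num_items_in_batch")
    if n_items is None:
        n_items = kwargs.get("n_items")
    return n_items
```

The fixed lookup returns the tensor-like object untouched, while the buggy version raises as soon as the first key holds one.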
…lothai#3794) Add "corda" as an allowed value for the init_lora_weights parameter in FastLanguageModel.get_peft_model() and FastBaseModel.get_peft_model(). This enables users to use CorDA (Correlation-aware Decomposed Adaptation) initialization from PEFT, which provides an alternative LoRA initialization strategy for improved finetuning performance. Fixes unslothai#3693 Signed-off-by: majiayu000 <1835304752@qq.com>
…lothai#3811)
* Fix correctness bugs in rl.py, rl_replacements.py, and vision.py
  1. rl_replacements.py (lines 864, 870): Fixed undefined `nanmin`/`nanmax` functions by using `.nan_to_num(nan=inf/-inf).min()/.max()` pattern. PyTorch doesn't have torch.nanmin/nanmax, so we replace NaN values before computing min/max.
  2. vision.py (line 150): Fixed bug where code checked for "input" key but then accessed kwargs["input_ids"] instead of kwargs["input"].
  3. vision.py (line 159): Fixed bug where literal string "key" was used instead of the variable `key` when accessing kwargs.
  4. rl.py (lines 903, 905): Fixed non-existent `MathError` exception by replacing with `ValueError`.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1. cohere.py:347-348 - Fixed wrong variable names in QK normalization. Used `Q`/`K` but variables were named `Qn`/`Kn`. This caused NameError when `use_qk_norm=True` (e.g., c4ai-command-r-plus models).
2. cohere.py:482 - Fixed wrong object reference in inference loop. Used `self.mlp` but should be `decoder_layer.mlp` since we're iterating through decoder layers. Caused AttributeError during inference.
3. falcon_h1.py:459,461 - Fixed wrong attribute names in inference path. Used `post_attention_layernorm` and `mlp` but Falcon H1 uses `pre_ff_layernorm` and `feed_forward`. Caused AttributeError during generation.
4. qwen3_moe.py:210 - Fixed wrong module path with incorrect capitalization. Used `transformers.models.Qwen3Moe` but should be `transformers.models.qwen3_moe`. Caused AttributeError when patching rotary embeddings.
5. qwen3_moe.py:239 - Fixed wrong model_patcher class. Used `FastQwen3Model` but should be `FastQwen3MoeModel` for MoE models. Caused incorrect patching for Qwen3 MoE models.
6. hf_hub.py:21-22 - Fixed floor division and missing return for billion values. Used `//` instead of `/` for millions, and had no return for values >= 1B. Caused incorrect formatting and None return for large numbers.
7. save.py:550 - Fixed self-assignment that did nothing. `sharded_ram_usage = sharded_ram_usage` should be `= max_shard_size`. Caused integer shard sizes to be ignored.
8. rl.py:562-567 - Fixed orphan string not included in length_check. The elif branch for max_seq_length validation was a standalone string expression, not concatenated to length_check. Caused silent skip of the max_seq_length > model_max_seq_length warning.
9. granite.py:49-52 - Fixed wrong model name and version in error message. Said "Gemma2" and "4.42.3" but should be "Granite" and "4.45.0".
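Item 6 above (floor division plus a missing return) is easy to see in isolation. The following is a rough sketch only: `format_count_buggy` and `format_count_fixed` are hypothetical names, and the thresholds and one-decimal suffix format are assumptions, not the actual contents of hf_hub.py.

```python
def format_count_buggy(n):
    """Hypothetical reconstruction of the bug pattern (not the real hf_hub.py code)."""
    if n < 1_000:
        return str(n)
    if n < 1_000_000:
        return f"{n / 1_000:.1f}K"
    if n < 1_000_000_000:
        # Floor division throws away the fractional part before formatting.
        return f"{n // 1_000_000:.1f}M"
    # No return for n >= 1B: the function falls through and returns None.


def format_count_fixed(n):
    """Same sketch with true division and an explicit billions branch."""
    if n < 1_000:
        return str(n)
    if n < 1_000_000:
        return f"{n / 1_000:.1f}K"
    if n < 1_000_000_000:
        return f"{n / 1_000_000:.1f}M"
    return f"{n / 1_000_000_000:.1f}B"
```

With 2,500,000 as input, the buggy version loses the half million while the fixed version keeps it; with 3,200,000,000 the buggy version returns `None`.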
…tmul Fix 3D tensor support for bitsandbytes 8-bit matmul in forward pass
🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
FIX: weight tying for LoRA embeddings and lm_head
Gemma3 models have a large vocabulary (262144 tokens) which causes training loss to explode when using int8 embedding quantization. This fix auto-detects Gemma3 models and switches from int8-int4 (phone-deployment) to int4 weight-only QAT for stable training.
…lity Fix Gemma3 QAT training instability with int8-int4 scheme
When users load a model with fast_inference=False but then try to use vLLM-style arguments with fast_generate, they previously got confusing errors. This adds a wrapper that detects common mistakes and provides helpful guidance:
- Using sampling_params: explains to use HF generate args instead
- Using lora_request: explains LoRA weights are already merged
- Passing text strings: shows how to tokenize input first
Changes:
- Add make_fast_generate_wrapper to _utils.py
- Apply wrapper in llama.py when fast_inference=False
- Apply wrapper in vision.py when fast_inference=False
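The three checks listed above could look roughly like the following. This is a minimal sketch, not the code added to _utils.py: the function name `sketch_fast_generate_wrapper`, the use of `ValueError`, and the message wording are all assumptions made for illustration.

```python
import functools


def sketch_fast_generate_wrapper(generate_fn):
    """Hypothetical wrapper; the real make_fast_generate_wrapper may differ."""

    @functools.wraps(generate_fn)
    def wrapper(*args, **kwargs):
        if "sampling_params" in kwargs:
            raise ValueError(
                "sampling_params is a vLLM argument. With fast_inference=False, "
                "pass Hugging Face generate arguments such as max_new_tokens instead."
            )
        if "lora_request" in kwargs:
            raise ValueError(
                "lora_request is a vLLM argument. With fast_inference=False, "
                "the LoRA weights are already merged, so it is not needed."
            )
        first = args[0] if args else None
        is_text = isinstance(first, str) or (
            isinstance(first, (list, tuple))
            and len(first) > 0
            and all(isinstance(x, str) for x in first)
        )
        if is_text:
            raise ValueError(
                "Text input was passed directly. Tokenize it first and pass "
                "the resulting input_ids."
            )
        # Nothing suspicious: defer to the wrapped generate function.
        return generate_fn(*args, **kwargs)

    return wrapper
```

Raising early with a targeted message keeps the original generate path untouched for valid calls, since the wrapper only inspects arguments and then delegates.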
…ai#4123)
Use 16 warps for RDNA in the chunked cross-entropy forward kernel (large vocab > 65536), matching the existing CDNA optimization.
Benchmarked on W7900 (gfx1100) with actual unsloth kernels (5 trials, median):
- Chunked CE forward (BS=65536): 16 warps = 2.4-2.6x faster than 32
- All other kernels (LayerNorm, RoPE, SwiGLU): default heuristic is already optimal for RDNA; no modification needed.
Depends on: unslothai#4109 (provides is_rdna() detection)
…ads (unslothai#4026) Fix global dequantize buffer dtype mismatch when loading multiple 4-bit models with different dtypes in the same process. Adds dtype check alongside existing None check for WEIGHT_BUFFER in both CUDA/HIP and XPU paths.
…#4034) Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>
)
* Fix auto padding free logic to respect user passed
* Update unsloth/trainer.py
  Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
---------
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* Add Qwen3.5 to FORCE_FLOAT32
* fix vision encoder dtype mismatch
* revert vision cast changes
updates: - [github.com/astral-sh/ruff-pre-commit: v0.15.2 → v0.15.4](astral-sh/ruff-pre-commit@v0.15.2...v0.15.4) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Updated with Qwen3.5 Small models
…nslothai#4123)" (unslothai#4139) This reverts commit 4d3e7d7.
…nslothai#4136)
Current arch.startswith("gfx1") incorrectly matches:
- RDNA1 (gfx10xx) and RDNA2 (gfx103x): not ROCm supported
- gfx1102 (RX 7600), gfx1103 (Phoenix APU): not in ROCm support matrix
- gfx1150/1151/1152 (RDNA3.5 APUs): not in ROCm support matrix
Replace with explicit whitelist aligned to the ROCm Linux support matrix: https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html
- gfx1100 - RDNA3 discrete (RX 7900 series, PRO W7900/W7800)
- gfx1101 - RDNA3 discrete (RX 7800/7700 series, PRO W7700)
- gfx1200 - RDNA4 discrete (RX 9060 series)
- gfx1201 - RDNA4 discrete (RX 9070 series, AI PRO R9700)
Mirrors the existing is_cdna() pattern. Avoids silently applying unverified Triton kernel tuning to unsupported hardware.
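The difference between the prefix match and the whitelist can be sketched as follows. The function names and the constant `SUPPORTED_RDNA_ARCHS` are hypothetical, and stripping a `:feature` suffix from the architecture string is an assumption about how the name may be reported; only the four architecture IDs come from the commit message.

```python
# Architectures listed in the commit message as ROCm-supported RDNA targets.
SUPPORTED_RDNA_ARCHS = frozenset({"gfx1100", "gfx1101", "gfx1200", "gfx1201"})


def is_rdna_prefix(arch):
    """Old behaviour: any architecture starting with gfx1 counts as RDNA."""
    return arch.startswith("gfx1")


def is_rdna_whitelist(arch):
    """Sketch of the new behaviour: exact match against an explicit whitelist."""
    # Assumption: names may carry feature suffixes such as ":xnack-".
    base = arch.split(":", 1)[0]
    return base in SUPPORTED_RDNA_ARCHS


examples = ["gfx1030", "gfx1100", "gfx1102", "gfx1151", "gfx1201"]
comparison = {a: (is_rdna_prefix(a), is_rdna_whitelist(a)) for a in examples}
```

In `comparison`, every example passes the prefix test, but only `gfx1100` and `gfx1201` pass the whitelist, which is the behaviour the commit describes.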
* Fix lm_head lora save
* Fix _need_to_train_embeddings guard for lm_head LoRA targets
  When lm_head is already in final_modules as a LoRA target, the _need_to_train_embeddings block should not also add it to modules_to_save. This prevents dual-wrapping (LoRA + modules_to_save on the same module) which causes assertion failures downstream.
  Check if embed_tokens/lm_head are already being trained as LoRA targets before adding them to modules_to_save. Also prevents duplicate entries with elif guards.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
---------
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* add intel support for torch210
* fix for typo
…support (unslothai#4138)
* fix: update GGUF save paths to use ~/.unsloth/llama.cpp with Windows support
* fix: quote LLAMA_CPP_DEFAULT_DIR in fallback shell commands to handle paths with spaces
* refactor: deduplicate platform-specific build instructions in quantization error message
* chore: remove accidentally committed PR description file
* Fix import safety and f-string bugs in save.py
  - H4: Add defensive try/except for LLAMA_CPP_DEFAULT_DIR and IS_WINDOWS imports with fallback defaults, so save.py works even if zoo PR unslothai#526 is not merged yet
  - H5: Fix Kaggle error path using plain "Error: {e}" instead of f"Error: {e}", so the actual exception is shown to users
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
---------
Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
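The H5 item is a one-character bug that is worth seeing side by side. A small sketch, with hypothetical function names that are not taken from save.py:

```python
def kaggle_error_buggy(e):
    # Plain string: the braces are literal text, so the exception is never shown.
    return "Error: {e}"


def kaggle_error_fixed(e):
    # f-string: {e} is interpolated with the exception's message.
    return f"Error: {e}"


err = RuntimeError("disk quota exceeded")
buggy_message = kaggle_error_buggy(err)
fixed_message = kaggle_error_fixed(err)
```

Here `buggy_message` contains the literal text `{e}`, while `fixed_message` contains the exception text.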
* Fixup mapper issues and resolve properly
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Fix broken wandb import crashing unsloth startup
  When wandb is installed but broken (e.g., wandb < 0.19.11 with protobuf >= 6.0), the import chain unsloth -> trl -> transformers -> is_wandb_available() -> import wandb crashes with:
  ImportError: cannot import name 'Imports' from 'wandb.proto.wandb_telemetry_pb2'
  This happens because transformers' is_wandb_available() has no try/except around `import wandb`. The error propagates up and kills `from unsloth import FastLanguageModel` even though wandb is optional.
  Add disable_broken_wandb() following the same pattern as disable_torchcodec_if_broken(). It proactively tries importing wandb during early init, and if the import fails, patches is_wandb_available() to return False and sets WANDB_DISABLED=true.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
…slothai#4148)
trl/trainer/callbacks.py imports is_wandb_available from accelerate.utils, not from transformers. The original fix in unslothai#4147 only patched the transformers version, so `from trl import GRPOTrainer` still crashed via the callbacks.py -> accelerate -> wandb path.
Must patch both the source module (accelerate.utils.imports) AND the re-export namespace (accelerate.utils) since Python's `from accelerate.utils import X` reads from the latter, which holds its own cached reference.
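Why both namespaces need patching can be shown with two stand-in modules. This is a sketch under stated assumptions: the `fakepkg` module names are invented, and the modules are built with `types.ModuleType` rather than importing accelerate, so it only models the re-export behaviour described above.

```python
import types

# Stand-in for the source module, e.g. a package's utils/imports.py.
source_mod = types.ModuleType("fakepkg.utils.imports")


def _original_is_wandb_available():
    return True


source_mod.is_wandb_available = _original_is_wandb_available

# Stand-in for the re-export namespace, e.g. utils/__init__.py doing
# `from .imports import is_wandb_available`. It stores its own reference.
reexport_mod = types.ModuleType("fakepkg.utils")
reexport_mod.is_wandb_available = source_mod.is_wandb_available


def _disabled():
    return False


# Patching only the source module leaves the re-export pointing at the old function.
source_mod.is_wandb_available = _disabled
after_source_patch = reexport_mod.is_wandb_available()

# Patching the re-export namespace as well closes the gap.
reexport_mod.is_wandb_available = _disabled
after_both_patches = reexport_mod.is_wandb_available()
```

Rebinding a name in one module never updates copies of the reference held by other modules, which is why `after_source_patch` still reflects the original function.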
Add hermes to the list of models that bypass the add_generation_prompt check in fix_chat_template. Hermes models use ChatML templates that don't require this validation, similar to mistral and qwen3guard models.
Code Review
This pull request correctly adds hermes to the list of models that bypass the chat template validation, fixing a RuntimeError for Hermes models that use ChatML-style templates. The change is straightforward and follows the existing pattern for models like mistral. I've added one suggestion to improve the code comment for accuracy and to use a tuple for the list of models, which is a minor style improvement.

      # Ignore mistral type models since they don't have an add_generation_prompt
      if any(
          s in str(getattr(tokenizer, "name_or_path", "")).lower()
    -     for s in ["mistral", "qwen3guard"]
    +     for s in ["mistral", "qwen3guard", "hermes"]

The comment on line 604 is now a bit inaccurate. While mistral models might lack add_generation_prompt, other models in this list (like hermes which uses ChatML) do have it. The more general reason for bypassing is that their templates don't need the automatic fix from fix_chat_template.
For better code style and a minor performance improvement, it's good practice to use a tuple for a fixed collection of items instead of a list.

    - # Ignore mistral type models since they don't have an add_generation_prompt
    - if any(
    -     s in str(getattr(tokenizer, "name_or_path", "")).lower()
    -     for s in ["mistral", "qwen3guard"]
    -     for s in ["mistral", "qwen3guard", "hermes"]
    + # Ignore some models that do not need chat template fixing.
    + if any(
    +     s in str(getattr(tokenizer, "name_or_path", "")).lower()
    +     for s in ("mistral", "qwen3guard", "hermes")

Fixes #4150
Summary
Hermes models use ChatML-style templates that don't require the `{% if add_generation_prompt %}` validation check. This PR adds `hermes` to the list of models that bypass this check, similar to how `mistral` and `qwen3guard` are already handled.
Changes
- Added `hermes` to the model bypass list in the `load_correct_tokenizer` function in `unsloth/tokenizer_utils.py`
Root Cause
When loading a Hermes model (or a LoRA adapter trained with a Hermes/ChatML template), the `fix_chat_template` function was raising a `RuntimeError` because:
- The chat template does not contain `{% if add_generation_prompt %}` in a format the auto-fixer expects
- `_fix_chat_template` couldn't patch it automatically
Testing
The fix follows the existing pattern used for mistral and qwen3guard models, which have similar template structures that don't require this validation.
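The bypass condition can be exercised on its own with a stand-in tokenizer object. The following is a minimal sketch: the helper name `skips_chat_template_fix`, the constant `BYPASS_SUBSTRINGS`, and the example model names are assumptions for illustration, while the matching expression itself mirrors the one shown in the diff.

```python
from types import SimpleNamespace

# Substrings from the diff; a tuple is used as suggested in the review.
BYPASS_SUBSTRINGS = ("mistral", "qwen3guard", "hermes")


def skips_chat_template_fix(tokenizer):
    """Hypothetical helper wrapping the substring check from the diff."""
    name = str(getattr(tokenizer, "name_or_path", "")).lower()
    return any(s in name for s in BYPASS_SUBSTRINGS)


# Stand-in tokenizers: only the name_or_path attribute matters for this check.
hermes_tok = SimpleNamespace(name_or_path="NousResearch/Hermes-3-Llama-3.1-8B")
llama_tok = SimpleNamespace(name_or_path="meta-llama/Llama-3.1-8B")
nameless_tok = object()

results = {
    "hermes": skips_chat_template_fix(hermes_tok),
    "llama": skips_chat_template_fix(llama_tok),
    "nameless": skips_chat_template_fix(nameless_tok),
}
```

Because the check is a case-insensitive substring match on `name_or_path`, it also matches local paths or adapter names that happen to contain `hermes`, which may or may not be the intended scope.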