Falcon H1 dtype float16 update #212
Merged
Datta0 pushed a commit to Datta0/unsloth-zoo that referenced this pull request on Jul 24, 2025
…nslothai#212) appear frequently during training. To handle this situation we can force float32 when the dtype is float 16.
danielhanchen added a commit that referenced this pull request on Sep 16, 2025
* [WIP] use vLLM for vision language models * Streamline vision vllm settings * WIP * WIP vLLM VLM * Make individual dummy model for qwen 2.5vl, llama3.2, gemma3 * fixup norm for vLLM * rework vLLM for VLMs * Cleanup more stuff * Load up remaining modules from state dict * use get_state_dict when possible * Fixup lm_head state dict fetch * add is_vision flag for differentiating VLMs * add is_vision_model flag * Cleanup more stuff * Cleanup vLLM extraction * Fixup device type * Cleanup more stuff * revert vLLM mem usage calc changes * Populate config values properly for VLMs * cleaner attribute copy and check mechanism * Patch siglip empty init * Make additional module loading memory efficient * Let the mini models be really small * Minor cleanup * cleanup vllm_utils by moving out empty model creation * Gemma3 and CausalLM fixes * Respect vLLMs conditions of max_num_batch_tokens vs max_seq_len * Restrict mm per prompt and max batch tokens * Improve config copy overs * Falcon H1 training is fp16 is unstable with the mamba kernels. NaN's (#212) appear frequently during training. To handle this situation we can force float32 when the dtype is float 16. 
* Fix torch compile issues (#213) * Update __init__.py * Update gradient_checkpointing.py * Update compiler.py * Update compiler.py * Fix CE Loss * Update loss_utils.py * requires_grad_ * Update compiler.py * Create gemma3n.py * Update gemma3n.py * Update gemma3n.py * Update gemma3n.py * Update __init__.py * fixup * Update __init__.py * Update peft_utils.py * Update compiler.py * timm compiling * Update peft_utils.py * Update compiler.py * Update compiler.py * Update gemma.py * Update gemma3n.py * Update gemma3n.py * Update gemma3n.py * Update gemma3n.py * Update vllm_utils.py * Update vllm_utils.py * Update vllm_utils.py * Update training_utils.py * Update gemma3n.py * Update gemma3n.py * Update gemma3n.py * Update __init__.py * Update gemma3n.py * Update gemma3n.py * Update gemma.py * More canonicalization * Update gemma.py * Safer patching * Update compiler.py * Update __init__.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update utils.py * Update gemma.py * Update gemma.py * Unpack * Update utils.py * Update utils.py * Update utils.py * Update gemma.py * Update gemma.py * Update utils.py * Update misc.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Retry Gemma * num_items_in_batch * Update loss_utils.py * UNSLOTH_COMPILE_DISABLE * print n_items * Update compiler.py * Update common.py * revert gemma * Update gemma.py * Merge and Save - Windows safetensors mmap open file error fix (#190) * Draft-windows safetensors mmap open file error fix * change 1 * test_3 * removed import duplicates * fixed replacement comment * Update gemma.py * Update compiler.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * 
Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update gemma.py * Update __init__.py * Update gemma.py * Update gemma.py * Fused CE Loss * Update compiler.py * Update loss_utils.py * compiled ce * Update gemma.py * Update gemma.py * Update __init__.py * Update gemma.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update gemma.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update pyproject.toml * Update compiler.py * Update compiler.py * Update compiler.py * Update gradient_checkpointing.py * Update gradient_checkpointing.py * Update gradient_checkpointing.py * Syntax issues * Torch compile updates * Update patching_utils.py * Update loss_utils.py * Update loss_utils.py * Update compiler.py * compiler stance * Update compiler.py * Update loss_utils.py * INFERENCE_RUNS * Update compiler.py * Update loss_utils.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update loss_utils.py * torch_dynamo_eval_frame * Update compiler.py * Update compiler.py * Update compiler.py * torch_compiler_set_stance * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update loss_utils.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update patching_utils.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update compiler.py * Update vllm_utils.py * Update patching_utils.py * Update __init__.py * Update pyproject.toml * Update loss_utils.py * Fix issues * Update loss_utils.py * compile options * compiler * disable multi_kernel * 
Update common.py * Update vllm_utils.py * Update vllm_utils.py * Update vllm_utils.py * Update vllm_utils.py * Update vllm_utils.py * Update vllm_utils.py * Update vllm_utils.py * Update vllm_utils.py * Update common.py * Update vllm_utils.py * Update vllm_utils.py * Update vllm_utils.py * lora request * Update vllm_lora_worker_manager.py * Update vllm_lora_worker_manager.py * retry * Update vllm_utils.py * Update vllm_lora_worker_manager.py * Update vllm_lora_worker_manager.py * Update vllm_utils.py * Update vllm_utils.py * Update vllm_lora_worker_manager.py * Update vllm_utils.py * Update vllm_utils.py * Update vllm_lora_worker_manager.py * Update vllm_lora_worker_manager.py * Update vllm_lora_worker_manager.py * Update vllm_lora_worker_manager.py * Update vllm_lora_worker_manager.py * Update vllm_lora_worker_manager.py * Update vllm_lora_worker_manager.py * Update vllm_lora_worker_manager.py * Update __init__.py * Update llama_cpp.py * Update llama_cpp.py * Update vllm_utils.py * Update vllm_utils.py * Fix `set_stance` * Update __init__.py * Update common.py --------- Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com> * Small fix * fixup norms for causallm * Guard against args change * dont mark as grpo hidden states as dynamic * Refactor to make vision handling easier * [WIP] fixup llama vision * cleanup * 2/n mllama * fixup mllama additional layers * Fixup qwen qknorm * Pad token check and state dict changes * Patch TF protobuf incompatability * Revert "Patch TF protobuf incompatability" This reverts commit fa93268. * Fixup patch_model_and_tokenizer for VLM * reset vllm state dict changes * Cleanup logs * Fixup gemma3 local rope embedding * Fix Qwen 2.5 VL gate_up_proj vLLM vLLM merged them recently. ref jeejeelee/vllm@a71e476 * Wakeup before doing vLLM generate (#259) * Wakeup when generating if needed * Patch vllm only when standby enabled * use logger instead of print. 
Add license header * Increase gpu_emmory_utilisation if in standby * User friendly error message for sleep model with expandable segments * Fixup cumem init for older versions * fixup qwen vl vision rope * do not slice logits for grpo * undo changes to rl_replacements * Fix: (temporary workaround) mem usage calcl for quantized VLMs * fixup comparison attributes * compare and copy dtype * Copy buffers along with comparable attributes --------- Co-authored-by: DoubleMathew <mmathew23@gmail.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
danielhanchen added a commit that referenced this pull request on Sep 22, 2025
Squashed commit message identical to that of the Sep 16, 2025 commit through "Fix: (temporary workaround) mem usage calcl for quantized VLMs", followed by: * Add mistral 3 support * Cleanup and fix for other models * fixup comparison attributes * Testing code * compare and copy dtype * more mistral changes * more mistral changes * Mistral 3 final touches * Fixup mistral3 quantization stuff * Clean up stuff --------- Co-authored-by: DoubleMathew <mmathew23@gmail.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
Falcon H1 training in fp16 is unstable with the mamba kernels: NaNs appear frequently during training. To handle this, we can force float32 when the dtype is float16.
Also needs unslothai/unsloth#3026
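For illustration only, the dtype override described above could be sketched roughly as follows. This is not the code merged in this PR: the helper name `resolve_training_dtype`, the set `FP16_UNSTABLE_MODEL_TYPES`, and the `"falcon_h1"` model-type string are all assumptions, and dtypes are represented as plain strings so the snippet runs without PyTorch installed.

```python
# Hypothetical sketch of "force float32 when the dtype is float16".
# None of these names come from the unsloth-zoo source.

# Assumed identifier for Falcon H1; other model types are left untouched.
FP16_UNSTABLE_MODEL_TYPES = {"falcon_h1"}


def resolve_training_dtype(model_type: str, requested_dtype: str) -> str:
    """Return the dtype name to train in.

    float16 is upcast to float32 for model types listed as fp16-unstable;
    every other combination is returned unchanged.
    """
    if model_type in FP16_UNSTABLE_MODEL_TYPES and requested_dtype == "float16":
        return "float32"
    return requested_dtype


if __name__ == "__main__":
    for model_type, dtype in [
        ("falcon_h1", "float16"),
        ("falcon_h1", "bfloat16"),
        ("llama", "float16"),
    ]:
        print(model_type, dtype, "->", resolve_training_dtype(model_type, dtype))
```

Only the float16 case is overridden in this sketch; bfloat16 requests pass through unchanged, since the description above reports instability for fp16 specifically.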