Falcon H1 dtype float16 update#212

Merged
danielhanchen merged 1 commit into
unslothai:mainfrom
mmathew23:falcon_compile_2
Jul 22, 2025
Conversation

@mmathew23
Collaborator

Falcon H1 training in fp16 is unstable with the Mamba kernels: NaNs appear frequently during training. To handle this, we can force float32 when the dtype is float16.

Also needs unslothai/unsloth#3026
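
For readers who want the idea in code, here is a minimal sketch of the dtype override described above. It is not the actual diff from this PR: the helper name `resolve_compute_dtype`, the `UNSTABLE_IN_FP16` set, and the `"falcon_h1"` key are hypothetical, and dtypes are represented as plain strings so the snippet runs without PyTorch (real code would presumably compare `torch.dtype` values instead).

```python
# Hypothetical sketch: names and the "falcon_h1" key are illustrative
# assumptions, not taken from the unsloth-zoo source.

# Model types assumed to produce NaNs when trained in float16.
UNSTABLE_IN_FP16 = {"falcon_h1"}


def resolve_compute_dtype(model_type: str, requested: str) -> str:
    """Return the dtype to actually use for the given model type.

    float16 is upgraded to float32 for model types listed in
    UNSTABLE_IN_FP16; every other combination is returned unchanged.
    """
    if model_type in UNSTABLE_IN_FP16 and requested == "float16":
        return "float32"
    return requested


if __name__ == "__main__":
    for model_type, requested in [
        ("falcon_h1", "float16"),
        ("falcon_h1", "bfloat16"),
        ("llama", "float16"),
    ]:
        chosen = resolve_compute_dtype(model_type, requested)
        print(f"{model_type}: requested {requested}, using {chosen}")
```

Only the float16 case is overridden in this sketch, so bfloat16 requests and other model types keep their requested dtype. The trade-off is higher memory use and slower steps for the affected model in exchange for numerically stable training.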

@danielhanchen danielhanchen merged commit 2343341 into unslothai:main Jul 22, 2025
Datta0 pushed a commit to Datta0/unsloth-zoo that referenced this pull request Jul 24, 2025
…nslothai#212)

appear frequently during training. To handle this situation we can force
float32 when the dtype is float 16.
danielhanchen added a commit that referenced this pull request Sep 16, 2025
* [WIP] use vLLM for vision language models

* Streamline vision vllm settings

* WIP

* WIP vLLM VLM

* Make individual dummy model for qwen 2.5vl, llama3.2,
gemma3

* fixup norm for vLLM

* rework vLLM for VLMs

* Cleanup more stuff

* Load up remaining modules from state dict

* use get_state_dict when possible

* Fixup lm_head state dict fetch

* add is_vision flag for differentiating VLMs

* add is_vision_model flag

* Cleanup more stuff

* Cleanup vLLM extraction

* Fixup device type

* Cleanup more stuff

* revert vLLM mem usage calc changes

* Populate config values properly for VLMs

* cleaner attribute copy and check mechanism

* Patch siglip empty init

* Make additional module loading memory efficient

* Let the mini models be really small

* Minor cleanup

* cleanup vllm_utils by moving out empty model creation

* Gemma3 and CausalLM fixes

* Respect vLLMs conditions of max_num_batch_tokens vs max_seq_len

* Restrict mm per prompt and max batch tokens

* Improve config copy overs

* Falcon H1 training is fp16 is unstable with the mamba kernels. NaN's (#212)

appear frequently during training. To handle this situation we can force
float32 when the dtype is float 16.

* Fix torch compile issues (#213)

* Update __init__.py

* Update gradient_checkpointing.py

* Update compiler.py

* Update compiler.py

* Fix CE Loss

* Update loss_utils.py

* requires_grad_

* Update compiler.py

* Create gemma3n.py

* Update gemma3n.py

* Update gemma3n.py

* Update gemma3n.py

* Update __init__.py

* fixup

* Update __init__.py

* Update peft_utils.py

* Update compiler.py

* timm compiling

* Update peft_utils.py

* Update compiler.py

* Update compiler.py

* Update gemma.py

* Update gemma3n.py

* Update gemma3n.py

* Update gemma3n.py

* Update gemma3n.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update training_utils.py

* Update gemma3n.py

* Update gemma3n.py

* Update gemma3n.py

* Update __init__.py

* Update gemma3n.py

* Update gemma3n.py

* Update gemma.py

* More canonicalization

* Update gemma.py

* Safer patching

* Update compiler.py

* Update __init__.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update gemma.py

* Update gemma.py

* Unpack

* Update utils.py

* Update utils.py

* Update utils.py

* Update gemma.py

* Update gemma.py

* Update utils.py

* Update misc.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Retry Gemma

* num_items_in_batch

* Update loss_utils.py

* UNSLOTH_COMPILE_DISABLE

* print n_items

* Update compiler.py

* Update common.py

* revert gemma

* Update gemma.py

* Merge and Save - Windows safetensors mmap open file error fix (#190)

* Draft-windows safetensors mmap open file error fix

* change 1

* test_3

* removed import duplicates

* fixed replacement comment

* Update gemma.py

* Update compiler.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update __init__.py

* Update gemma.py

* Update gemma.py

* Fused CE Loss

* Update compiler.py

* Update loss_utils.py

* compiled ce

* Update gemma.py

* Update gemma.py

* Update __init__.py

* Update gemma.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update gemma.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update pyproject.toml

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update gradient_checkpointing.py

* Update gradient_checkpointing.py

* Update gradient_checkpointing.py

* Syntax issues

* Torch compile updates

* Update patching_utils.py

* Update loss_utils.py

* Update loss_utils.py

* Update compiler.py

* compiler stance

* Update compiler.py

* Update loss_utils.py

* INFERENCE_RUNS

* Update compiler.py

* Update loss_utils.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update loss_utils.py

* torch_dynamo_eval_frame

* Update compiler.py

* Update compiler.py

* Update compiler.py

* torch_compiler_set_stance

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update loss_utils.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update patching_utils.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update vllm_utils.py

* Update patching_utils.py

* Update __init__.py

* Update pyproject.toml

* Update loss_utils.py

* Fix issues

* Update loss_utils.py

* compile options

* compiler

* disable multi_kernel

* Update common.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update common.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* lora request

* Update vllm_lora_worker_manager.py

* Update vllm_lora_worker_manager.py

* retry

* Update vllm_utils.py

* Update vllm_lora_worker_manager.py

* Update vllm_lora_worker_manager.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_lora_worker_manager.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_lora_worker_manager.py

* Update vllm_lora_worker_manager.py

* Update vllm_lora_worker_manager.py

* Update vllm_lora_worker_manager.py

* Update vllm_lora_worker_manager.py

* Update vllm_lora_worker_manager.py

* Update vllm_lora_worker_manager.py

* Update vllm_lora_worker_manager.py

* Update __init__.py

* Update llama_cpp.py

* Update llama_cpp.py

* Update vllm_utils.py

* Update vllm_utils.py

* Fix `set_stance`

* Update __init__.py

* Update common.py

---------

Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>

* Small fix

* fixup norms for causallm

* Guard against args change

* dont mark as grpo hidden states as dynamic

* Refactor to make vision handling easier

* [WIP] fixup llama vision

* cleanup

* 2/n mllama

* fixup mllama additional layers

* Fixup qwen qknorm

* Pad token check and state dict changes

* Patch TF protobuf incompatability

* Revert "Patch TF protobuf incompatability"

This reverts commit fa93268.

* Fixup patch_model_and_tokenizer for VLM

* reset vllm state dict changes

* Cleanup logs

* Fixup gemma3 local rope embedding

* Fix Qwen 2.5 VL gate_up_proj vLLM

vLLM merged them recently. ref jeejeelee/vllm@a71e476

* Wakeup before doing vLLM generate (#259)

* Wakeup when generating if needed

* Patch vllm only when standby enabled

* use logger instead of print. Add license header

* Increase gpu_emmory_utilisation if in standby

* User friendly error message for sleep model with expandable segments

* Fixup cumem init for older versions

* fixup qwen vl vision rope

* do not slice logits for grpo

* undo changes to rl_replacements

* Fix: (temporary workaround) mem usage calcl for quantized VLMs

* fixup comparison attributes

* compare and copy dtype

* Copy buffers along with comparable attributes

---------

Co-authored-by: DoubleMathew <mmathew23@gmail.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
danielhanchen added a commit that referenced this pull request Sep 22, 2025
* [WIP] use vLLM for vision language models

* Streamline vision vllm settings

* WIP

* WIP vLLM VLM

* Make individual dummy model for qwen 2.5vl, llama3.2,
gemma3

* fixup norm for vLLM

* rework vLLM for VLMs

* Cleanup more stuff

* Load up remaining modules from state dict

* use get_state_dict when possible

* Fixup lm_head state dict fetch

* add is_vision flag for differentiating VLMs

* add is_vision_model flag

* Cleanup more stuff

* Cleanup vLLM extraction

* Fixup device type

* Cleanup more stuff

* revert vLLM mem usage calc changes

* Populate config values properly for VLMs

* cleaner attribute copy and check mechanism

* Patch siglip empty init

* Make additional module loading memory efficient

* Let the mini models be really small

* Minor cleanup

* cleanup vllm_utils by moving out empty model creation

* Gemma3 and CausalLM fixes

* Respect vLLMs conditions of max_num_batch_tokens vs max_seq_len

* Restrict mm per prompt and max batch tokens

* Improve config copy overs

* Falcon H1 training is fp16 is unstable with the mamba kernels. NaN's (#212)

appear frequently during training. To handle this situation we can force
float32 when the dtype is float 16.

* Fix torch compile issues (#213)

* Update __init__.py

* Update gradient_checkpointing.py

* Update compiler.py

* Update compiler.py

* Fix CE Loss

* Update loss_utils.py

* requires_grad_

* Update compiler.py

* Create gemma3n.py

* Update gemma3n.py

* Update gemma3n.py

* Update gemma3n.py

* Update __init__.py

* fixup

* Update __init__.py

* Update peft_utils.py

* Update compiler.py

* timm compiling

* Update peft_utils.py

* Update compiler.py

* Update compiler.py

* Update gemma.py

* Update gemma3n.py

* Update gemma3n.py

* Update gemma3n.py

* Update gemma3n.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update training_utils.py

* Update gemma3n.py

* Update gemma3n.py

* Update gemma3n.py

* Update __init__.py

* Update gemma3n.py

* Update gemma3n.py

* Update gemma.py

* More canonicalization

* Update gemma.py

* Safer patching

* Update compiler.py

* Update __init__.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update gemma.py

* Update gemma.py

* Unpack

* Update utils.py

* Update utils.py

* Update utils.py

* Update gemma.py

* Update gemma.py

* Update utils.py

* Update misc.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Retry Gemma

* num_items_in_batch

* Update loss_utils.py

* UNSLOTH_COMPILE_DISABLE

* print n_items

* Update compiler.py

* Update common.py

* revert gemma

* Update gemma.py

* Merge and Save - Windows safetensors mmap open file error fix (#190)

* Draft-windows safetensors mmap open file error fix

* change 1

* test_3

* removed import duplicates

* fixed replacement comment

* Update gemma.py

* Update compiler.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update gemma.py

* Update __init__.py

* Update gemma.py

* Update gemma.py

* Fused CE Loss

* Update compiler.py

* Update loss_utils.py

* compiled ce

* Update gemma.py

* Update gemma.py

* Update __init__.py

* Update gemma.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update gemma.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update pyproject.toml

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update gradient_checkpointing.py

* Update gradient_checkpointing.py

* Update gradient_checkpointing.py

* Syntax issues

* Torch compile updates

* Update patching_utils.py

* Update loss_utils.py

* Update loss_utils.py

* Update compiler.py

* compiler stance

* Update compiler.py

* Update loss_utils.py

* INFERENCE_RUNS

* Update compiler.py

* Update loss_utils.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update loss_utils.py

* torch_dynamo_eval_frame

* Update compiler.py

* Update compiler.py

* Update compiler.py

* torch_compiler_set_stance

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update loss_utils.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update patching_utils.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update compiler.py

* Update vllm_utils.py

* Update patching_utils.py

* Update __init__.py

* Update pyproject.toml

* Update loss_utils.py

* Fix issues

* Update loss_utils.py

* compile options

* compiler

* disable multi_kernel

* Update common.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update common.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* lora request

* Update vllm_lora_worker_manager.py

* Update vllm_lora_worker_manager.py

* retry

* Update vllm_utils.py

* Update vllm_lora_worker_manager.py

* Update vllm_lora_worker_manager.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_lora_worker_manager.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_lora_worker_manager.py

* Update vllm_lora_worker_manager.py

* Update vllm_lora_worker_manager.py

* Update vllm_lora_worker_manager.py

* Update vllm_lora_worker_manager.py

* Update vllm_lora_worker_manager.py

* Update vllm_lora_worker_manager.py

* Update vllm_lora_worker_manager.py

* Update __init__.py

* Update llama_cpp.py

* Update llama_cpp.py

* Update vllm_utils.py

* Update vllm_utils.py

* Fix `set_stance`

* Update __init__.py

* Update common.py

---------

Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>

* Small fix

* fixup norms for causallm

* Guard against args change

* dont mark as grpo hidden states as dynamic

* Refactor to make vision handling easier

* [WIP] fixup llama vision

* cleanup

* 2/n mllama

* fixup mllama additional layers

* Fixup qwen qknorm

* Pad token check and state dict changes

* Patch TF protobuf incompatability

* Revert "Patch TF protobuf incompatability"

This reverts commit fa93268.

* Fixup patch_model_and_tokenizer for VLM

* reset vllm state dict changes

* Cleanup logs

* Fixup gemma3 local rope embedding

* Fix Qwen 2.5 VL gate_up_proj vLLM

vLLM merged them recently. ref jeejeelee/vllm@a71e476

* Wakeup before doing vLLM generate (#259)

* Wakeup when generating if needed

* Patch vllm only when standby enabled

* use logger instead of print. Add license header

* Increase gpu_emmory_utilisation if in standby

* User friendly error message for sleep model with expandable segments

* Fixup cumem init for older versions

* fixup qwen vl vision rope

* do not slice logits for grpo

* undo changes to rl_replacements

* Fix: (temporary workaround) mem usage calcl for quantized VLMs

* Add mistral 3 support

* Cleanup and fix for other models

* fixup comparison attributes

* Testing code

* compare and copy dtype

* more mistral changes

* more mistral changes

* Mistral 3 final touches

* Fixup mistral3 quantization stuff

* Clean up stuff

---------

Co-authored-by: DoubleMathew <mmathew23@gmail.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>

2 participants