Remove Gemma-4 from FORCE_FLOAT32 by danielhanchen · Pull Request #4875 · unslothai/unsloth

danielhanchen · 2026-04-06T14:30:33Z

Summary

Remove the 'gemma4,' and 'gemma4_text' entries from the FORCE_FLOAT32 list in unsloth/models/loader.py
Gemma-4 works correctly in both float16 and bfloat16 without forcing float32

Background

The FORCE_FLOAT32 override was forcing Gemma-4 to load in bfloat16/float32 when the user requested float16. This prevented float16 from working on Tesla T4 and other GPUs without bfloat16 support.
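To make the effect of the list concrete, here is a minimal sketch of how a per-model override of this kind can shadow the dtype a user asks for. It is not the code in loader.py: the function `resolve_dtype`, the list contents, the exact-match lookup, and the simplification of the fallback to plain float32 are all assumptions made for illustration.

```python
# Hypothetical illustration only: names and matching logic are assumptions,
# not the actual implementation in unsloth/models/loader.py.

def resolve_dtype(model_type: str, requested: str, force_list: list) -> str:
    """Return the dtype a loader would use, given a force-float32 list.

    If the model type is on the list and the user asked for float16,
    the request is overridden (simplified here to float32).
    """
    if requested == "float16" and model_type in force_list:
        return "float32"
    return requested


# Before the change: the model type is on the list, so float16 is overridden.
before = resolve_dtype("gemma4", "float16", ["gemma4", "gemma4_text"])

# After the change: the entries are gone, so the requested dtype is honored.
after = resolve_dtype("gemma4", "float16", [])

print(before, after)
```

With the entries removed, a float16 request passes through unchanged, which is the behavior this PR is after.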

Testing shows that Gemma-4's activation magnitudes stay well within float16 range (max ~2080 vs fp16 max 65504). The forced float32 path was actually causing training divergence -- with it enabled, the compiled run diverged at step ~28 with grad norms collapsing to near zero and loss plateauing at ~12.4.
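The headroom claim can be checked with a few lines of arithmetic. The sketch below derives the float16 maximum from the format's definition and divides it by the peak activation magnitude quoted above; the ~2080 figure is taken from this description and is approximate, and the helper name `fp16_headroom` is made up for the example.

```python
# Largest finite float16 value: (2 - 2**-10) * 2**15 = 65504.
FP16_MAX = (2 - 2**-10) * 2**15

# Approximate peak activation magnitude reported in this PR description.
REPORTED_MAX_ACTIVATION = 2080.0


def fp16_headroom(max_abs_value: float) -> float:
    """Return how many times larger FP16_MAX is than the observed peak."""
    return FP16_MAX / max_abs_value


headroom = fp16_headroom(REPORTED_MAX_ACTIVATION)
print(f"fp16 max = {FP16_MAX:.0f}, headroom = {headroom:.1f}x")
```

A peak of about 2080 leaves a margin of roughly 31x before float16 overflows.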

Test results

Training (Gemma-4 E2B, 4-bit LoRA, SFT on FineTome-100k, 100 steps, no patches):

| Metric | float16 | bfloat16 |
| --- | --- | --- |
| Final loss (step 100) | 3.048 | 3.065 |
| Min loss | 2.389 (step 76) | 2.396 (step 76) |
| Avg loss (last 20 steps) | 3.198 | 3.211 |
| Grad norms | Healthy (~3.0) | Healthy (~3.0) |

Inference: float16 and bfloat16 produce identical outputs.
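As a quick sanity check on the training numbers, the sketch below compares the reported float16 and bfloat16 losses pairwise. The values are copied from the results above; the 0.02 tolerance is an assumption chosen for illustration, and `max_gap` is a hypothetical helper.

```python
# Reported (float16, bfloat16) values from the 100-step training run.
reported = {
    "final_loss": (3.048, 3.065),
    "min_loss": (2.389, 2.396),
    "avg_loss_last_20": (3.198, 3.211),
}

# Assumed tolerance for calling the two runs "matching".
TOLERANCE = 0.02


def max_gap(metrics: dict) -> float:
    """Return the largest absolute float16/bfloat16 difference."""
    return max(abs(fp16 - bf16) for fp16, bf16 in metrics.values())


gap = max_gap(reported)
print(f"largest gap = {gap:.3f}, within tolerance: {gap < TOLERANCE}")
```

The largest difference is in the final loss (0.017), so all three metrics agree within the assumed tolerance.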

Companion PR

Remove Gemma-4 temporary patches unsloth-zoo#576 -- removes all Gemma-4 temporary patches from gemma4.py

Test plan

Verify float16 inference produces correct output
Verify bfloat16 inference produces correct output
Verify float16 training converges (100 steps)
Verify bfloat16 training converges (100 steps)
Verify losses match between float16 and bfloat16
Test on Tesla T4 (float16-only GPU)
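For the T4 item, the dtype choice follows from the GPU's compute capability: the Tesla T4 is compute capability 7.5, while native bfloat16 arrives with compute capability 8.0 (Ampere). The sketch below encodes that rule as a pure function; `pick_half_dtype` is a hypothetical helper, not part of Unsloth, and in practice the capability tuple could come from `torch.cuda.get_device_capability()`.

```python
def pick_half_dtype(capability: tuple) -> str:
    """Pick a 16-bit dtype name from a CUDA (major, minor) compute capability.

    Assumes bfloat16 is only used on compute capability 8.0 or newer.
    """
    major, _minor = capability
    return "bfloat16" if major >= 8 else "float16"


t4_dtype = pick_half_dtype((7, 5))    # Tesla T4 (Turing)
a100_dtype = pick_half_dtype((8, 0))  # A100 (Ampere)
print(t4_dtype, a100_dtype)
```

On a T4 this selects float16, the path that the forced override previously blocked.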

Gemma-4 does not need FORCE_FLOAT32. Testing shows that both float16 and bfloat16 work correctly without the forced float32 override:

- Inference: identical outputs for float16 and bfloat16 (greedy decoding)
- Training (100 steps, 4-bit LoRA, SFT on FineTome-100k):
  - float16 final loss: 3.048
  - bfloat16 final loss: 3.065
  - Losses converge to within 0.02 by step 60
  - Grad norms healthy and comparable for both dtypes

The FORCE_FLOAT32 path was actually causing training divergence. With it enabled, the compiled float32 run diverged at step ~28 with grad norms collapsing to near zero and loss plateauing at ~12.4. Without it, both dtypes train normally.

This enables float16 on Tesla T4 and other GPUs without bfloat16 support.

gemini-code-assist

Code Review

This pull request removes 'gemma4,' and 'gemma4_text' from the list of model names in unsloth/models/loader.py, which appears to manage model-specific compilation exclusions. There are no review comments to address, and I have no feedback to provide.


danielhanchen requested a review from mmathew23 as a code owner April 6, 2026 14:30

gemini-code-assist Bot reviewed Apr 6, 2026


danielhanchen merged commit 07b6fcc into main Apr 6, 2026
5 checks passed

danielhanchen deleted the gemma4-remove-force-float32 branch April 6, 2026 14:33
