[WIP] Fix confusion on Gemma #121

yundai424 · 2024-08-27T17:36:51Z

Summary

why

From the config point of view, Gemma 1 uses exact GeLU [ref], but Gemma 1.1 and 2 use approximate GeLU (gemma2, gemma1).
Also, Gemma reads the hidden_activation config field and ignores hidden_act (gemma1 code, gemma2 code).

That being said, we should be fine to claim that Gemma 1, 1.1 and 2 are all supported. But for safety I think we can go with 1.1 and 2 first.
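To give a feel for how far apart the two activations mentioned above are, here is a minimal sketch that compares exact GeLU with the tanh approximation using only the standard library. The function names `gelu_exact`, `gelu_tanh` and `max_abs_gap` are made up for illustration and are not part of Liger Kernel or transformers.

```python
import math


def gelu_exact(x: float) -> float:
    """Exact GeLU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Tanh approximation of GeLU (the 'approximate' variant)."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


def max_abs_gap(points) -> float:
    """Largest absolute difference between the two variants on the given points."""
    return max(abs(gelu_exact(x) - gelu_tanh(x)) for x in points)


if __name__ == "__main__":
    # Hypothetical evaluation grid on [-5, 5]; any range of interest works.
    grid = [i / 100.0 for i in range(-500, 501)]
    print(f"max |exact - tanh| on [-5, 5]: {max_abs_gap(grid):.2e}")
```

The gap is small but nonzero, so a kernel that hard-codes one variant will not match a model configured for the other bit-for-bit.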

what

adjust the monkey patch so it works properly with gemma2 code base too
checkstyle
[TODO] add final logit softcapping for gemma2 https://github.com/huggingface/transformers/blob/main/src/transformers/models/gemma2/modeling_gemma2.py#L1054
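For readers unfamiliar with the monkey-patch approach, the idea can be sketched with stand-in classes. Everything below is hypothetical (`ToyGemmaForCausalLM`, `ToyGemma2ForCausalLM`, `toy_lce_forward`, `apply_toy_patch`). It only mirrors the shape of the patch and does not touch transformers or Liger Kernel.

```python
class ToyGemmaForCausalLM:
    """Stand-in for a model class; not the real transformers class."""

    def forward(self, x):
        return ("stock", x)


class ToyGemma2ForCausalLM:
    """Stand-in for a second model class that shares the patched forward."""

    def forward(self, x):
        return ("stock", x)


def toy_lce_forward(self, x):
    """Hypothetical replacement forward; a real one would compute the fused loss."""
    return ("patched", x)


def apply_toy_patch(fused_linear_cross_entropy: bool = True) -> None:
    """Swap the class-level forward, so every instance picks up the new method."""
    if fused_linear_cross_entropy:
        ToyGemmaForCausalLM.forward = toy_lce_forward
        ToyGemma2ForCausalLM.forward = toy_lce_forward


if __name__ == "__main__":
    model = ToyGemma2ForCausalLM()
    print(model.forward(1))  # stock forward
    apply_toy_patch()
    print(model.forward(1))  # patched forward, even on the existing instance
```

Assigning one function to both classes is only safe when the two stock forwards are equivalent, which is why the softcapping TODO above matters for Gemma2.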

Testing Done

Hardware Type:
run make test to ensure correctness
run make checkstyle to ensure code style
run make test-convergence to ensure convergence

test/convergence/test_mini_models_no_logits.py

qingquansong

LGTM! Thanks for the fix! Wondering if we could relax the check here so that Gemma 1 can be used as well?

DocShotgun · 2024-08-27T18:06:18Z

src/liger_kernel/transformers/monkey_patch.py

    if fused_linear_cross_entropy:
        modeling_gemma.GemmaForCausalLM.forward = gemma_lce_forward
+        modeling_gemma2.Gemma2ForCausalLM.forward = gemma_lce_forward


It looks like the same recently added lce_forward is being used for both Gemma and Gemma2 (#111).

It appears that there is a slight difference in the forward between Gemma and Gemma2 in the modeling code in transformers, specifically regarding logit softcapping (https://github.com/huggingface/transformers/blob/main/src/transformers/models/gemma2/modeling_gemma2.py#L1054-L1057), which doesn't seem to be accounted for in the new lce code as far as I can tell. Would this potentially lead to incompatibility?

Yeah exactly, just figured out we need to add softcapping :/ I'll have a follow-up PR to clarify that right now this only works for Gemma 1 and 1.1, and we'll work on changing the kernel correspondingly ASAP.
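For context, the softcapping step discussed above scales the logits down by a cap, applies tanh, and scales back up, as I read the linked lines in modeling_gemma2.py. Here is a minimal standard-library sketch of why skipping it changes the loss. The names `softcap` and `cross_entropy` and the cap value 30.0 are illustrative assumptions, not Liger Kernel code.

```python
import math
from typing import List, Optional, Sequence


def softcap(logits: Sequence[float], cap: Optional[float]) -> List[float]:
    """Squash logits into (-cap, cap) via cap * tanh(logit / cap); no-op if cap is None."""
    if cap is None:
        return list(logits)
    return [cap * math.tanh(v / cap) for v in logits]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Numerically stable cross entropy for a single example."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[target]


if __name__ == "__main__":
    raw = [50.0, 0.0, -50.0]      # made-up logits with a large spread
    capped = softcap(raw, 30.0)   # 30.0 is an illustrative cap value
    print("loss without softcapping:", cross_entropy(raw, 1))
    print("loss with softcapping:   ", cross_entropy(capped, 1))
```

Because the capped and uncapped losses differ, a fused linear + cross-entropy forward that skips this step would not reproduce the Gemma2 loss whenever the cap is set.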

This reverts commit 3d3b604.


yundai424 added 2 commits August 27, 2024 10:31

fix gemma confusion

769966f

checkstyle

1c9df65

yundai424 marked this pull request as ready for review August 27, 2024 17:39

ByronHsu reviewed Aug 27, 2024


test/convergence/test_mini_models_no_logits.py

Update test_mini_models_no_logits.py

8412162

yundai424 changed the title ~~Fix confusion on Gemma~~ [WIP] Fix confusion on Gemma Aug 27, 2024

qingquansong approved these changes Aug 27, 2024


lancerts merged commit 3d3b604 into main Aug 27, 2024
1 check passed

lancerts deleted the yudai/gemma branch August 27, 2024 18:04

DocShotgun reviewed Aug 27, 2024


yundai424 added a commit that referenced this pull request Aug 27, 2024

Revert "[WIP] Fix confusion on Gemma (#121)"

550a61c

This reverts commit 3d3b604.

DocShotgun mentioned this pull request Aug 27, 2024

Update supported models for Liger Kernel axolotl-ai-cloud/axolotl#1875

Merged
