Gemma3 #2233

Merged
regisss merged 9 commits into huggingface:main from imangohari1:ig/gemma3 on Sep 19, 2025

Conversation

@imangohari1
Contributor

What does this PR do?

Adds Gemma3 🚀

Tests

text-gen: CI tests

Tests are added to the CI for three Gemma3 model sizes. All tests pass on both Gaudi2 and Gaudi3.

Gaudi2

======================== 3 passed in 1104.92s (0:18:24) ========================

Gaudi3

========================= 3 passed in 82.58s (0:02:22) =========================

Performance analysis: Lazy vs Eager, with and without KV cache and hpu graphs

Note

These tests are conducted on Gaudi2

| Test | Command | Performance (tokens/second) |
|---|---|---|
| Lazy + hpu_graphs + kv cache | `PT_HPU_LAZY_MODE=1 python examples/text-generation/run_generation.py --model_name_or_path google/gemma-3-4b-it --use_hpu_graphs --use_kv_cache --max_new_tokens 100 --do_sample --prompt "DeepSpeed is a machine learning framework" --sdp_on_bf16` | 66.71125412069311 |
| Lazy + hpu_graphs | `PT_HPU_LAZY_MODE=1 python examples/text-generation/run_generation.py --model_name_or_path google/gemma-3-4b-it --use_hpu_graphs --max_new_tokens 100 --do_sample --prompt "DeepSpeed is a machine learning framework" --sdp_on_bf16` | 61.44039873745102 |
| Eager + kv cache | `PT_HPU_LAZY_MODE=0 python examples/text-generation/run_generation.py --model_name_or_path google/gemma-3-4b-it --use_kv_cache --max_new_tokens 100 --do_sample --prompt "DeepSpeed is a machine learning framework" --sdp_on_bf16` | 15.210596800130373 |
| Eager | `PT_HPU_LAZY_MODE=0 python examples/text-generation/run_generation.py --model_name_or_path google/gemma-3-4b-it --max_new_tokens 100 --do_sample --prompt "DeepSpeed is a machine learning framework" --sdp_on_bf16` | 13.72176136613309 |
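
To put the four measurements in relative terms, the short sketch below divides each reported throughput by the plain eager baseline. The numbers are copied from the table above; the dictionary and function names are illustrative and not part of the PR. Since the commands use `--do_sample` and each figure appears to come from a single run, the ratios should be read as rough indications only.

```python
# Throughput values (tokens/second) copied from the table above
# (Gaudi2, google/gemma-3-4b-it). Names here are illustrative only.
THROUGHPUT = {
    "lazy + hpu_graphs + kv cache": 66.71125412069311,
    "lazy + hpu_graphs": 61.44039873745102,
    "eager + kv cache": 15.210596800130373,
    "eager": 13.72176136613309,
}


def speedups(results, baseline="eager"):
    """Return each configuration's throughput relative to the baseline."""
    base = results[baseline]
    return {name: value / base for name, value in results.items()}


if __name__ == "__main__":
    for name, ratio in speedups(THROUGHPUT).items():
        print(f"{name:30s} {ratio:.2f}x")
```

On these numbers, lazy mode with HPU graphs and KV cache lands a little under 5x the eager baseline, while enabling the KV cache alone in eager mode gives roughly a 1.1x gain.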

Multimodal prompt

Note

These tests are conducted with a modified version of gemma3 multimodal inference here

| HW | Model size | Output |
|---|---|---|
| Gaudi3 | google/gemma-3-4b-it | **Overall Impression:** The image is a close-up, vibrant shot of a garden scene, focusing on a cluster of pink cosmos flowers and a busy bee. It has a slightly soft, natural feel, likely due to the shallow depth of field. |
| Gaudi3 | google/gemma-3-12b-it | **Overall Impression:** The image is a close-up shot of a vibrant garden scene, focusing on pink cosmos flowers and a busy bumblebee. The composition is natural and slightly blurred in the background, drawing attention to the flowers and the bee. |
| Gaudi3 | google/gemma-3-27b-it | **Overall Impression:** The image is a close-up shot of a vibrant pink cosmos flower with a bumblebee actively collecting pollen from its center. The focus is sharp on the flower and bee, with a slightly blurred background of other plants and foliage. |
| Gaudi2 | google/gemma-3-4b-it | **Overall Impression:** The image is a close-up, vibrant shot of a small garden scene, focusing on a cluster of pink cosmos flowers and a busy bee. It has a slightly soft, natural feel, likely captured in daylight. |
| Gaudi2 | google/gemma-3-12b-it | **Overall Impression:** The image is a close-up shot of a vibrant garden scene, focusing on pink cosmos flowers and a busy bumblebee. The composition is natural and slightly blurred in the background, drawing attention to the flowers and the bee. |
| Gaudi2 | google/gemma-3-27b-it | **Overall Impression:** The image is a close-up shot of a vibrant pink cosmos flower with a bumblebee actively foraging on it. The focus is sharp on the flower and bee, with a slightly blurred background of greenery and other flowers. It evokes a sense of nature, pollination, and the beauty of a garden. |

Accuracy

Comparison to base

Note

These tests are conducted on Gaudi2, with gemma-3-4b-it and max_new_token=128

| Variable | Without current PR | With current PR |
|---|---|---|
| acc | 0.7627856365614799 | 0.764417845484222 |
| acc_norm | 0.7720348204570185 | 0.7731229597388466 |
| duration | 489.73700710099365 | 83.4628518190002 |
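
The comparison above can be summarised as two accuracy deltas and one duration ratio. The sketch below computes them from the values in the table; the variable and function names are illustrative, and the unit of `duration` is not stated in the PR, so only the ratio is reported.

```python
# Values copied from the comparison table above (Gaudi2, gemma-3-4b-it).
WITHOUT_PR = {
    "acc": 0.7627856365614799,
    "acc_norm": 0.7720348204570185,
    "duration": 489.73700710099365,
}
WITH_PR = {
    "acc": 0.764417845484222,
    "acc_norm": 0.7731229597388466,
    "duration": 83.4628518190002,
}


def compare(before, after):
    """Return accuracy deltas (after - before) and the duration ratio (before / after)."""
    return {
        "acc_delta": after["acc"] - before["acc"],
        "acc_norm_delta": after["acc_norm"] - before["acc_norm"],
        "duration_ratio": before["duration"] / after["duration"],
    }


if __name__ == "__main__":
    for key, value in compare(WITHOUT_PR, WITH_PR).items():
        print(f"{key}: {value:.4f}")
```

Both accuracy metrics move by less than 0.002, while the run with the PR takes a little under one sixth of the time.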

Different model sizes

Note

These tests are conducted on Gaudi2, with the piqa example here

| Model size | Max token size | Metric |
|---|---|---|
| gemma-3-4b-it | 128 | "acc,none": 0.764417845484222 |
| gemma-3-4b-it | 8192 | "acc,none": 0.764417845484222 |
| gemma-3-27b-it | 128 | "acc,none": 0.809575625680087 |
| gemma-3-27b-it | 8192 | "acc,none": 0.809575625680087 |
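
The `"acc,none"` keys above look like entries from an lm-eval results file. As a minimal sketch, assuming the usual `results -> <task> -> "acc,none"` nesting (the JSON strings below are abbreviated stand-ins, not the actual output files), the per-model values can be pulled out and checked for agreement across the two max token sizes:

```python
import json

# Abbreviated, illustrative stand-ins for results files. Only the
# "acc,none" values are taken from the table above; the nesting is an
# assumption about the results format.
RUNS = {
    ("gemma-3-4b-it", 128): '{"results": {"piqa": {"acc,none": 0.764417845484222}}}',
    ("gemma-3-4b-it", 8192): '{"results": {"piqa": {"acc,none": 0.764417845484222}}}',
    ("gemma-3-27b-it", 128): '{"results": {"piqa": {"acc,none": 0.809575625680087}}}',
    ("gemma-3-27b-it", 8192): '{"results": {"piqa": {"acc,none": 0.809575625680087}}}',
}


def piqa_accuracy(raw):
    """Extract the piqa accuracy from one results JSON string."""
    return json.loads(raw)["results"]["piqa"]["acc,none"]


def accuracy_by_model(runs):
    """Group accuracies as {model: {max_tokens: accuracy}}."""
    grouped = {}
    for (model, max_tokens), raw in runs.items():
        grouped.setdefault(model, {})[max_tokens] = piqa_accuracy(raw)
    return grouped


if __name__ == "__main__":
    for model, by_len in accuracy_by_model(RUNS).items():
        same = len(set(by_len.values())) == 1
        print(model, by_len, "identical across lengths:", same)
```

Identical accuracy at 128 and 8192 tokens is what one would expect if the task is scored by comparing log-likelihoods of fixed answer choices rather than by free-form generation.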

Next

Sliding Window Attention for this model will be enabled after #2210 is merged.

--
co-authored by @skavulya

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Contributor

@dsocek dsocek left a comment

LGTM, see note on RMSNorm

key_states = self.k_proj(hidden_states).view(hidden_shape).transpose(1, 2)
value_states = self.v_proj(hidden_states).view(hidden_shape).transpose(1, 2)

query_states = self.q_norm(query_states)
Contributor

@imangohari1 maybe check if we can use HPU optimized FusedRMSNorm here (see ce888b1 for example)

Contributor Author

@dsocek I tried this and it caused some accuracy issues. I will try this optimization at a later time.
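
When a fused kernel is swapped in for a norm layer, one thing worth checking is whether both sides use the same weight convention. The sketch below is a pure-Python reference, not the PR's code and not a diagnosis of the accuracy issue mentioned above. It assumes, based on the upstream Transformers Gemma implementation, that Gemma-style RMSNorm scales by `(1 + weight)`, whereas a conventional RMSNorm scales by `weight` directly. All function names are illustrative.

```python
import math


def _inv_rms(x, eps):
    """Inverse root-mean-square of a vector, with eps for stability."""
    mean_sq = sum(v * v for v in x) / len(x)
    return 1.0 / math.sqrt(mean_sq + eps)


def gemma_style_rms_norm(x, weight, eps=1e-6):
    """RMSNorm that scales by (1 + weight); assumed Gemma convention."""
    inv = _inv_rms(x, eps)
    return [v * inv * (1.0 + w) for v, w in zip(x, weight)]


def plain_rms_norm(x, weight, eps=1e-6):
    """Conventional RMSNorm that scales by weight directly."""
    inv = _inv_rms(x, eps)
    return [v * inv * w for v, w in zip(x, weight)]


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two vectors."""
    return max(abs(p - q) for p, q in zip(a, b))


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    w = [0.0, 0.1, -0.1, 0.5]
    # Same weights, different conventions: outputs disagree.
    print(max_abs_diff(gemma_style_rms_norm(x, w), plain_rms_norm(x, w)))
    # Shifting the weights by one reconciles the two conventions.
    shifted = [1.0 + wi for wi in w]
    print(max_abs_diff(gemma_style_rms_norm(x, w), plain_rms_norm(x, shifted)))
```

A comparison like `max_abs_diff` against a trusted reference, run on a few real activations, is one simple way to validate a fused replacement before enabling it.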

@imangohari1
Contributor Author

@regisss Could you please review this when you get a chance? Thanks.

@imangohari1
Contributor Author

@schoi-habana @skavulya FYI, this is up for review. I am currently updating it to OH main with the updated HF, but please take a look as needed. I appreciate it.

Collaborator

@regisss regisss left a comment

Very clean PR! I just left one minor comment

"gemma",
"gemma2",
"gemma3",
"gemma3_text",
Collaborator

Are there checkpoints on the HF hub with gemma3_text as the model type?

Contributor Author

Not sure about the checkpoints, but the config file does contain a distinct model type, gemma3_text: https://huggingface.co/google/gemma-3-4b-it/blob/main/config.json#L18
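
To illustrate how two model types can coexist in one config file, the sketch below walks a heavily abbreviated stand-in for such a config. It assumes `gemma3_text` appears as the `model_type` of a nested `text_config` entry under a top-level `gemma3` config; the JSON is illustrative, not a copy of the linked file, and `model_types` is a hypothetical helper.

```python
import json

# Heavily abbreviated, illustrative config. The nesting under
# "text_config" is an assumption about the linked config.json.
CONFIG_JSON = """
{
  "model_type": "gemma3",
  "text_config": {"model_type": "gemma3_text"}
}
"""


def model_types(config):
    """Collect model_type from the top level and from nested sub-configs."""
    found = []
    if "model_type" in config:
        found.append(config["model_type"])
    for value in config.values():
        if isinstance(value, dict) and "model_type" in value:
            found.append(value["model_type"])
    return found


if __name__ == "__main__":
    print(model_types(json.loads(CONFIG_JSON)))
```

Under this assumption, code that dispatches on `model_type` may see either value depending on whether it receives the full config or only the text sub-config, which would explain listing both.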

@regisss
Collaborator

regisss commented Sep 15, 2025

There is a merge conflict to solve too

@github-actions

The code quality check failed, please run make style.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Comment thread optimum/habana/transformers/modeling_utils.py
  assert generation_config.bucket_size >= 0, "please set valid bucket_size to use bucket_internal"

- if self.config.model_type == "gemma2":
+ if self.config.model_type == "gemma2" or self.config.model_type == "gemma3":
Collaborator

Can you confirm that Gemma-3 on Gaudi does not support static/paged (or hybrid) KV cache (same as Gemma-2), which is why we force generation_config.cache_implementation = None here?

Contributor Author

Based on the two models' similarities, they have been treated the same way.
If you have a specific test in mind to confirm this, please share it and I will look into it.
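
The behaviour discussed in this thread (resetting `cache_implementation` to `None` for the Gemma-2 and Gemma-3 model types) can be sketched in isolation as follows. The two dataclasses are hypothetical stand-ins for the real model and generation configs, and the helper name and the set constant are illustrative; only the condition and the assignment come from the diff and the review comment above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeModelConfig:
    """Hypothetical stand-in for the model config."""
    model_type: str


@dataclass
class FakeGenerationConfig:
    """Hypothetical stand-in for the generation config."""
    cache_implementation: Optional[str] = "hybrid"


# Model types handled alike in the diff above.
MODEL_TYPES_WITHOUT_CACHE_IMPL = {"gemma2", "gemma3"}


def reset_cache_implementation(config, generation_config):
    """Force cache_implementation to None for the listed model types."""
    if config.model_type in MODEL_TYPES_WITHOUT_CACHE_IMPL:
        generation_config.cache_implementation = None
    return generation_config


if __name__ == "__main__":
    for model_type in ("gemma2", "gemma3", "llama"):
        gen = reset_cache_implementation(FakeModelConfig(model_type), FakeGenerationConfig())
        print(model_type, gen.cache_implementation)
```

A set membership test behaves the same as the chained `or` comparison in the diff and may be easier to extend if further model types need the same treatment.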

@imangohari1
Contributor Author

> There is a merge conflict to solve too

Hi @regisss
Thanks for the review. Yes, I am aware of this, and some updates are needed for this PR after the HF update. I am working on resolving the conflicts and getting this model updated.

Will ping here when it is done. Thank you!

@imangohari1
Contributor Author

imangohari1 commented Sep 16, 2025

Hi @regisss
This PR is updated with OH main, conflicts are resolved, and it is ready to be merged. Please review. CC: @schoi-habana @skavulya

Note: all the tests listed in the description have been redone up to this point, and they all work as expected.

Comment thread optimum/habana/transformers/models/gemma3/modeling_gemma3.py
Comment thread optimum/habana/transformers/models/gemma3/modeling_gemma3.py
Comment thread optimum/habana/transformers/models/gemma3/modeling_gemma3.py Outdated
Collaborator

@regisss regisss left a comment

Let's also add the changes you brought in #2262. There is also a merge conflict to solve with main, probably linked to #2262.

Comment thread tests/baselines/fixture/tests/test_text_generation_example.json Outdated
Comment thread tests/baselines/fixture/tests/test_text_generation_example.json Outdated
Comment thread tests/baselines/fixture/tests/test_text_generation_example.json Outdated
---
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
@imangohari1
Contributor Author

imangohari1 commented Sep 17, 2025

> Let's also add the changes you brought in #2262. There is also a merge conflict to solve with main, probably linked to #2262.

@regisss done. 1735e26
I merged main as well and added you as co-author.

The CI tests are passing as is now.

PT_HPU_LAZY_MODE=1  RUN_SLOW=true python -m pytest tests/test_text_generation_example.py::test_text_generation_bf16_1x[google/gemma-3-27b-it-1-False-True-False] -s -v 
.
.
.

Input/outputs:
input 1: ('DeepSpeed is a machine learning framework',)
output 1.1: ("DeepSpeed is a machine learning framework that enables you to train models with hundreds of billions or even trillions of parameters. Here's a breakdown of what it is, its key features, and how it compares to other approaches:\n\n**What is DeepSpeed?**\n\nDeveloped by Microsoft, DeepSpeed is a deep learning optimization library designed to make large-scale model training more efficient, accessible, and cost-effective. It's built on PyTorch and is open-source. It's particularly notable for enabling the training of",)


Stats:
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Input tokens
Throughput (including tokenization) = 37.80783933134947 tokens/second
Average first token latency         = 29.88703576847911 ms
Average rest token latency          = 26.18070118378547 ms
Average end to end latency          = 2644.610726973042 ms
Memory allocated                    = 61.6 GB
Max memory allocated                = 61.6 GB
Total memory available              = 126.54 GB
Graph compilation duration          = 9.278187631978653 seconds
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

PASSED

=============================================================================================================================================================== 1 passed in 59.85s =====================================================================================================
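
For anyone comparing runs, the stats block above is regular enough to parse mechanically. The sketch below is one way to do that with a regular expression; the pattern and function name are illustrative and only assume the `name = value unit` layout visible in the log excerpt.

```python
import re

# Excerpt of the stats block printed in the log above.
STATS = """\
Throughput (including tokenization) = 37.80783933134947 tokens/second
Average first token latency         = 29.88703576847911 ms
Average rest token latency          = 26.18070118378547 ms
Average end to end latency          = 2644.610726973042 ms
Memory allocated                    = 61.6 GB
Max memory allocated                = 61.6 GB
Total memory available              = 126.54 GB
Graph compilation duration          = 9.278187631978653 seconds
"""

# name, "=", numeric value, then a unit running to the end of the line.
LINE = re.compile(r"^(?P<name>.+?)\s*=\s*(?P<value>[0-9.]+)\s*(?P<unit>\S.*)$")


def parse_stats(text):
    """Return {name: (value, unit)} for every 'name = value unit' line."""
    stats = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            stats[match.group("name")] = (float(match.group("value")), match.group("unit"))
    return stats


if __name__ == "__main__":
    for name, (value, unit) in parse_stats(STATS).items():
        print(f"{name}: {value} {unit}")
```

Lines that do not follow the layout, such as the dashed separators or `PASSED`, are simply skipped.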

Please do a final review. This PR should be all good now.

@imangohari1 imangohari1 requested a review from regisss September 17, 2025 16:37
Collaborator

@regisss regisss left a comment

LGTM, let's just wait a bit if @schoi-habana wants to reply to your comment about "None"

Comment thread tests/baselines/fixture/tests/test_text_generation_example.json Outdated
@schoi-habana
Collaborator

@regisss moving softmax_mode is going to be a minor change for @imangohari1. Other than that, it looks good to me.

@imangohari1
Contributor Author

imangohari1 commented Sep 18, 2025

> @regisss moving softmax_mode is going to be a minor change for @imangohari1. Other than that, it looks good to me.

Thanks @schoi-habana. I updated the softmax_mode definition. dac020a

I ran a subset of the tests in the description, including the CI tests, and all are passing.

@regisss please review. thank you.

@regisss regisss merged commit da97a14 into huggingface:main Sep 19, 2025
3 of 5 checks passed
astachowiczhabana pushed commits that referenced this pull request on Sep 22, 23, 25, 26, and 29 and on Oct 1 (two commits), 3 (two commits), 7, 9, 13, 15, 20, 22 (two commits), 23, 28, and 29, 2025, each with the trailer:
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
gplutop7 pushed a commit to HabanaAI/optimum-habana-fork that referenced this pull request Oct 15, 2025
Co-authored-by: Iman Gohari <s.m.iman.gohari@intel.com>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
6 participants