Gemma3 by imangohari1 · Pull Request #2233 · huggingface/optimum-habana

imangohari1 · 2025-08-29T00:50:13Z

What does this PR do?

Adds Gemma3 🚀

Tests

text-gen: CI tests

Tests are added to the CI for three Gemma3 model sizes. All tests pass on both Gaudi2 and Gaudi3.

Gaudi2

======================== 3 passed in 1104.92s (0:18:24) ========================

Gaudi3

========================= 3 passed in 82.58s (0:02:22) =========================

Performance analysis: Lazy vs Eager, with and without KV cache and hpu graphs

Note

These tests were conducted on Gaudi2.

Test	Command	Performance
Lazy + hpu_graphs + kv cache	PT_HPU_LAZY_MODE=1 python examples/text-generation/run_generation.py --model_name_or_path google/gemma-3-4b-it --use_hpu_graphs --use_kv_cache --max_new_tokens 100 --do_sample --prompt "DeepSpeed is a machine learning framework" --sdp_on_bf16	66.71125412069311 tokens/second
Lazy + hpu_graphs	PT_HPU_LAZY_MODE=1 python examples/text-generation/run_generation.py --model_name_or_path google/gemma-3-4b-it --use_hpu_graphs --max_new_tokens 100 --do_sample --prompt "DeepSpeed is a machine learning framework" --sdp_on_bf16	61.44039873745102 tokens/second
Eager + kv cache	PT_HPU_LAZY_MODE=0 python examples/text-generation/run_generation.py --model_name_or_path google/gemma-3-4b-it --use_kv_cache --max_new_tokens 100 --do_sample --prompt "DeepSpeed is a machine learning framework" --sdp_on_bf16	15.210596800130373 tokens/second
Eager	PT_HPU_LAZY_MODE=0 python examples/text-generation/run_generation.py --model_name_or_path google/gemma-3-4b-it --max_new_tokens 100 --do_sample --prompt "DeepSpeed is a machine learning framework" --sdp_on_bf16	13.72176136613309 tokens/second
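
To make the table easier to compare, the throughput figures can be normalized against the eager baseline. The sketch below is only illustrative arithmetic over the numbers reported above; the dictionary keys and the helper name `speedup_vs` are labels chosen here and are not part of the PR.

```python
# Throughput (tokens/second) copied from the table above; Gaudi2, google/gemma-3-4b-it.
throughput = {
    "Lazy + hpu_graphs + kv cache": 66.71125412069311,
    "Lazy + hpu_graphs": 61.44039873745102,
    "Eager + kv cache": 15.210596800130373,
    "Eager": 13.72176136613309,
}


def speedup_vs(baseline: str, results: dict[str, float]) -> dict[str, float]:
    """Return each configuration's throughput divided by the baseline's (hypothetical helper)."""
    base = results[baseline]
    return {name: value / base for name, value in results.items()}


speedups = speedup_vs("Eager", throughput)
for name, ratio in speedups.items():
    print(f"{name:<30} {ratio:.2f}x")
```

On these reported numbers, lazy mode with HPU graphs and KV cache runs at roughly 4.9x the eager baseline, while enabling the KV cache alone in eager mode gives about 1.1x.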

Multimodal prompt

Note

These tests were conducted with a modified version of the Gemma3 multimodal inference example (linked in the original PR description).

HW	model size	output
Gaudi3	google/gemma-3-4b-it	Overall Impression: The image is a close-up, vibrant shot of a garden scene, focusing on a cluster of pink cosmos flowers and a busy bee. It has a slightly soft, natural feel, likely due to the shallow depth of field.
Gaudi3	google/gemma-3-12b-it	Overall Impression: The image is a close-up shot of a vibrant garden scene, focusing on pink cosmos flowers and a busy bumblebee. The composition is natural and slightly blurred in the background, drawing attention to the flowers and the bee.
Gaudi3	google/gemma-3-27b-it	Overall Impression: The image is a close-up shot of a vibrant pink cosmos flower with a bumblebee actively collecting pollen from its center. The focus is sharp on the flower and bee, with a slightly blurred background of other plants and foliage.
Gaudi2	google/gemma-3-4b-it	Overall Impression: The image is a close-up, vibrant shot of a small garden scene, focusing on a cluster of pink cosmos flowers and a busy bee. It has a slightly soft, natural feel, likely captured in daylight.
Gaudi2	google/gemma-3-12b-it	Overall Impression: The image is a close-up shot of a vibrant garden scene, focusing on pink cosmos flowers and a busy bumblebee. The composition is natural and slightly blurred in the background, drawing attention to the flowers and the bee.
Gaudi2	google/gemma-3-27b-it	Overall Impression: The image is a close-up shot of a vibrant pink cosmos flower with a bumblebee actively foraging on it. The focus is sharp on the flower and bee, with a slightly blurred background of greenery and other flowers. It evokes a sense of nature, pollination, and the beauty of a garden.

Accuracy

Comparison to base

Note

These tests were conducted on Gaudi2, with gemma-3-4b-it and max_new_tokens=128.

Variable	without current PR	with current PR
`acc`	`0.7627856365614799`	`0.764417845484222`
`acc_norm`	`0.7720348204570185`	`0.7731229597388466`
`duration`	`489.73700710099365`	`83.4628518190002`
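
The comparison above can be summarized as an accuracy delta and a duration ratio. This is a small arithmetic sketch over the reported values; the variable names are chosen here, and since the unit of `duration` is not stated in the PR, only the unit-independent ratio is computed.

```python
# Values copied from the comparison table above (Gaudi2, gemma-3-4b-it).
without_pr = {
    "acc": 0.7627856365614799,
    "acc_norm": 0.7720348204570185,
    "duration": 489.73700710099365,
}
with_pr = {
    "acc": 0.764417845484222,
    "acc_norm": 0.7731229597388466,
    "duration": 83.4628518190002,
}

# Absolute change in each accuracy metric (with PR minus without PR).
acc_delta = with_pr["acc"] - without_pr["acc"]
acc_norm_delta = with_pr["acc_norm"] - without_pr["acc_norm"]

# How many times shorter the run is with the PR.
duration_ratio = without_pr["duration"] / with_pr["duration"]

print(f"acc delta:      {acc_delta:+.4f}")
print(f"acc_norm delta: {acc_norm_delta:+.4f}")
print(f"duration ratio: {duration_ratio:.2f}x")
```

Accuracy is essentially unchanged (about +0.002 for `acc` and +0.001 for `acc_norm`), while the reported duration drops by a factor of about 5.9.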

Different model sizes

Note

These tests were conducted on Gaudi2, with the piqa example (linked in the original PR description).

Model size	Max token size	Metric
`gemma-3-4b-it`	`128`	`"acc,none": 0.764417845484222`
`gemma-3-4b-it`	`8192`	`"acc,none": 0.764417845484222`
`gemma-3-27b-it`	`128`	`"acc,none": 0.809575625680087`
`gemma-3-27b-it`	`8192`	`"acc,none": 0.809575625680087`

PT_HPU_LAZY_MODE=1  RUN_SLOW=true python -m pytest tests/test_text_generation_example.py::test_text_generation_bf16_1x[google/gemma-3-27b-it-1-False-True-False] -s -v 
.
.
.

Input/outputs:
input 1: ('DeepSpeed is a machine learning framework',)
output 1.1: ("DeepSpeed is a machine learning framework that enables you to train models with hundreds of billions or even trillions of parameters. Here's a breakdown of what it is, its key features, and how it compares to other approaches:\n\n**What is DeepSpeed?**\n\nDeveloped by Microsoft, DeepSpeed is a deep learning optimization library designed to make large-scale model training more efficient, accessible, and cost-effective. It's built on PyTorch and is open-source. It's particularly notable for enabling the training of",)


Stats:
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Input tokens
Throughput (including tokenization) = 37.80783933134947 tokens/second
Average first token latency         = 29.88703576847911 ms
Average rest token latency          = 26.18070118378547 ms
Average end to end latency          = 2644.610726973042 ms
Memory allocated                    = 61.6 GB
Max memory allocated                = 61.6 GB
Total memory available              = 126.54 GB
Graph compilation duration          = 9.278187631978653 seconds
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

PASSED

=============================================================================================================================================================== 1 passed in 59.85s =====================================================================================================
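
The Stats block above is regular enough to parse for a quick sanity check. The following is a minimal sketch, not part of the PR: the regular expression and the `parse_stats` helper are assumptions made here for illustration. It also assumes that throughput and end-to-end latency describe the same generation call with a batch size of one, in which case their product approximates the number of generated tokens.

```python
import re

# Stats lines copied from the log above.
STATS = """\
Throughput (including tokenization) = 37.80783933134947 tokens/second
Average first token latency         = 29.88703576847911 ms
Average rest token latency          = 26.18070118378547 ms
Average end to end latency          = 2644.610726973042 ms
Memory allocated                    = 61.6 GB
Max memory allocated                = 61.6 GB
Total memory available              = 126.54 GB
Graph compilation duration          = 9.278187631978653 seconds
"""

# Matches "<name> = <number> <unit>"; lines without this shape are skipped.
LINE = re.compile(r"^(?P<name>.+?)\s*=\s*(?P<value>[0-9.]+)\s*(?P<unit>\S+)\s*$")


def parse_stats(text: str) -> dict[str, tuple[float, str]]:
    """Map each metric name to a (value, unit) pair (hypothetical helper)."""
    stats = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            stats[match["name"]] = (float(match["value"]), match["unit"])
    return stats


stats = parse_stats(STATS)
throughput, _ = stats["Throughput (including tokenization)"]
latency_ms, _ = stats["Average end to end latency"]

# tokens/second * seconds approximates the number of generated tokens.
approx_tokens = throughput * latency_ms / 1000
print(f"approx. generated tokens: {approx_tokens:.1f}")
```

Under those assumptions the product comes out at about 100 tokens, which gives a quick way to check that the throughput and latency lines of a run agree with each other.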

Please do a final review. This PR should be all good now.

regisss

LGTM, let's just wait a bit if @schoi-habana wants to reply to your comment about "None"

schoi-habana · 2025-09-18T16:07:29Z

@regisss moving softmax_mode is going to be a minor change for @imangohari1. Other than that, it looks good to me.

imangohari1 · 2025-09-18T17:13:17Z

Thanks @schoi-habana. I updated the softmax_mode def in dac020a.

I ran the subset of tests listed in the description, including the CI tests, and all are passing.

@regisss please review. Thank you.

Co-authored-by: Iman Gohari <s.m.iman.gohari@intel.com>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

Gemma3

2e4681f

imangohari1 requested review from regisss and vivekgoe as code owners August 29, 2025 00:50

dsocek approved these changes Aug 29, 2025

karol-brejna-i assigned gplutop7 Sep 4, 2025

regisss reviewed Sep 15, 2025

gplutop7 reviewed Sep 15, 2025

Iman Gohari added 2 commits September 15, 2025 19:54

fea(): merged main on 2880ad4

8cbb907

fea(gemma3): upgraded to HF 4.55.4 and cleaned up

b003391

Merge branch 'huggingface:main' into ig/gemma3

eedbd66

schoi-habana reviewed Sep 16, 2025

Comment thread optimum/habana/transformers/models/gemma3/modeling_gemma3.py

Comment thread optimum/habana/transformers/models/gemma3/modeling_gemma3.py

Comment thread optimum/habana/transformers/models/gemma3/modeling_gemma3.py Outdated

regisss reviewed Sep 17, 2025

Comment thread tests/baselines/fixture/tests/test_text_generation_example.json Outdated

Comment thread tests/baselines/fixture/tests/test_text_generation_example.json Outdated

Comment thread tests/baselines/fixture/tests/test_text_generation_example.json Outdated

merged main and added PR#2262 changes

1735e26

--- Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

revert unnecessary test_text_generation_example.py changes

720de26

imangohari1 requested a review from regisss September 17, 2025 16:37

Merge branch 'main' into ig/gemma3

1fbc02e

regisss reviewed Sep 18, 2025

Comment thread tests/baselines/fixture/tests/test_text_generation_example.json Outdated

remove unnecessary blank line

db271b6

minor change in softmax_mode def

dac020a

imangohari1 requested a review from schoi-habana September 18, 2025 17:13

regisss approved these changes Sep 19, 2025

regisss merged commit da97a14 into huggingface:main Sep 19, 2025
3 of 5 checks passed

astachowiczhabana pushed the following commits that referenced this pull request, each titled "Gemma3 (#2233)" and carrying the trailer "Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>":

Date	Commit
Sep 22, 2025	4507b03
Sep 23, 2025	7f1d0ef
Sep 25, 2025	18e97d5
Sep 26, 2025	ac31934
Sep 29, 2025	aab110b
Oct 1, 2025	14f0c04
Oct 1, 2025	de54868
Oct 3, 2025	acf0d9d
Oct 3, 2025	51aa34a
Oct 7, 2025	b54ef93
Oct 9, 2025	26c59f8
Oct 13, 2025	341c4ca
Oct 15, 2025	61bcdc1
Oct 20, 2025	dde6a82
Oct 22, 2025	a8f8ed9
Oct 22, 2025	e3d44ba
Oct 23, 2025	e6f8682
Oct 28, 2025	6aa19b3
Oct 29, 2025	da8d021

                   "gemma",
                   "gemma2",
+                  "gemma3",
+                  "gemma3_text",

+                      key_states = self.k_proj(hidden_states).view(hidden_shape).transpose(1, 2)
+                      value_states = self.v_proj(hidden_states).view(hidden_shape).transpose(1, 2)
+                      query_states = self.q_norm(query_states)

                               assert generation_config.bucket_size >= 0, "please set valid bucket_size to use bucket_internal"
-                      if self.config.model_type == "gemma2":
+                      if self.config.model_type == "gemma2" or self.config.model_type == "gemma3":
