Gemma3#2233
dsocek
left a comment
LGTM, see note on RMSNorm
key_states = self.k_proj(hidden_states).view(hidden_shape).transpose(1, 2)
value_states = self.v_proj(hidden_states).view(hidden_shape).transpose(1, 2)

query_states = self.q_norm(query_states)
@imangohari1 maybe check if we can use HPU optimized FusedRMSNorm here (see ce888b1 for example)
@dsocek I tried this and it caused some accuracy issues. I will try this optimization at a later time.
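One way to investigate this kind of accuracy question is to compare a candidate kernel's output against a plain reference RMSNorm. The following is only a minimal sketch, not the code from this PR: `rms_norm` and `max_abs_diff` are hypothetical helper names, the `offset=1.0` default assumes the Gemma-style `(1 + weight)` scaling, and the tolerance is an arbitrary placeholder.

```python
import math


def rms_norm(x, weight, eps=1e-6, offset=1.0):
    """Reference RMSNorm over a single vector.

    offset=1.0 assumes Gemma-style (1 + weight) scaling (an assumption here);
    offset=0.0 gives the plain `weight * normalized_x` variant.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * (offset + w) for v, w in zip(x, weight)]


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two vectors."""
    return max(abs(p - q) for p, q in zip(a, b))


reference = rms_norm([1.0, -2.0, 3.0, -4.0], [0.0, 0.0, 0.0, 0.0])
# The output of a fused kernel would go here; the reference is reused as a stand-in.
candidate = list(reference)
within_tolerance = max_abs_diff(reference, candidate) <= 1e-3  # placeholder tolerance
```

If the two variants disagree beyond the tolerance, the scaling convention (`weight` versus `1 + weight`) and the dtype used for the reduction are two things worth checking first.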
@regisss Could you please review this when you get a chance? Thanks.
@schoi-habana @skavulya FYI, please review this. I am updating it to OH main with the updated HF now, but please take a look as needed. I appreciate it.
regisss
left a comment
Very clean PR! I just left one minor comment
"gemma",
"gemma2",
"gemma3",
"gemma3_text",
Are there checkpoints on the HF hub with gemma3_text as the model type?
Not sure about the checkpoints, but the config file has a distinct model type, gemma3_text: https://huggingface.co/google/gemma-3-4b-it/blob/main/config.json#L18
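To illustrate the point about the nested model type, here is a minimal sketch that collects every `model_type` found in a config dictionary and checks it against the four entries shown in the diff. The `example_config` dictionary is a hypothetical, trimmed-down stand-in for the linked config.json, and `collect_model_types` is an illustrative helper, not part of the library.

```python
# The four entries visible in the diff; the real list may contain more.
SUPPORTED_MODEL_TYPES = {"gemma", "gemma2", "gemma3", "gemma3_text"}


def collect_model_types(config):
    """Return the model_type of a config dict and of any nested sub-config dicts."""
    found = []
    if "model_type" in config:
        found.append(config["model_type"])
    for value in config.values():
        if isinstance(value, dict):
            found.extend(collect_model_types(value))
    return found


# Hypothetical, trimmed-down config in the shape of the linked file.
example_config = {
    "model_type": "gemma3",
    "text_config": {"model_type": "gemma3_text"},
}

model_types = collect_model_types(example_config)
unsupported = [t for t in model_types if t not in SUPPORTED_MODEL_TYPES]
```

With a config shaped like this, the text sub-config reports its own type, which is why both `gemma3` and `gemma3_text` appear in the list.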
There is a merge conflict to solve too
The code quality check failed, please run
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
  assert generation_config.bucket_size >= 0, "please set valid bucket_size to use bucket_internal"

- if self.config.model_type == "gemma2":
+ if self.config.model_type == "gemma2" or self.config.model_type == "gemma3":
Can you confirm that Gemma-3 on Gaudi does not support static/paged (or hybrid) KV cache (same as Gemma-2), which is why we force generation_config.cache_implementation = None here?
Based on the two models' similarities, they have been treated the same way.
If you have a specific test in mind to confirm this, please share it and I will look into it.
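The behaviour under discussion can be sketched in isolation as follows. This is an illustrative stand-alone version, not the code in the PR: `reset_cache_implementation` and the tuple name are hypothetical, and `SimpleNamespace` objects stand in for the real config classes.

```python
from types import SimpleNamespace

# Model types treated alike in the diff above.
NO_CACHE_IMPLEMENTATION_MODEL_TYPES = ("gemma2", "gemma3")


def reset_cache_implementation(config, generation_config):
    """Force cache_implementation to None for the listed model types."""
    if config.model_type in NO_CACHE_IMPLEMENTATION_MODEL_TYPES:
        generation_config.cache_implementation = None
    return generation_config


# Stand-in objects; "hybrid" is just a sample starting value.
gemma3_generation_config = reset_cache_implementation(
    SimpleNamespace(model_type="gemma3"),
    SimpleNamespace(cache_implementation="hybrid"),
)
other_generation_config = reset_cache_implementation(
    SimpleNamespace(model_type="llama"),
    SimpleNamespace(cache_implementation="static"),
)
```

A membership test against a tuple keeps the condition short if more model types need the same treatment later.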
Hi @regisss, I will ping here when it is done. Thank you!
Hi @regisss Note: all the tests listed in the description have been redone up to here and they all work as expected.
--- Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
@regisss done. 1735e26 The CI tests are passing as is now. Please do a final review. This PR should be all good now.
regisss
left a comment
LGTM, let's just wait a bit if @schoi-habana wants to reply to your comment about "None"
@regisss moving softmax_mode is going to be a minor change for @imangohari1. Other than that it looks good to me.
Thanks @schoi-habana. I updated the I ran the subset of the tests in the description, including the CI tests, and all are passing. @regisss please review. Thank you.
Co-authored-by: Iman Gohari <s.m.iman.gohari@intel.com> Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
What does this PR do?
Adds Gemma3 🚀
Tests
text-gen: CI tests
Tests are added to the CI for 3 Gemma3 model sizes. All tests are passing on both Gaudi2 and 3.
Gaudi2
Gaudi3
Performance analysis: Lazy vs Eager, with and without KV cache and hpu graphs
Note
These tests are conducted on Gaudi2
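The four runs in this section differ only in the lazy-mode environment variable and in the `--use_hpu_graphs` and `--use_kv_cache` flags. As a convenience, the command lines can be generated rather than typed by hand. This is a minimal sketch: `build_command` is a hypothetical helper, and pairing `--use_hpu_graphs` with lazy mode only reflects what the listed commands do, not a documented rule.

```python
import itertools
import shlex

BASE_ARGS = [
    "python", "examples/text-generation/run_generation.py",
    "--model_name_or_path", "google/gemma-3-4b-it",
]
COMMON_ARGS = [
    "--max_new_tokens", "100",
    "--do_sample",
    "--prompt", "DeepSpeed is a machine learning framework",
    "--sdp_on_bf16",
]


def build_command(lazy_mode, use_kv_cache):
    """Build one run_generation.py command line for the given configuration."""
    args = list(BASE_ARGS)
    if lazy_mode:
        # In the listed commands, HPU graphs are only used in lazy mode.
        args.append("--use_hpu_graphs")
    if use_kv_cache:
        args.append("--use_kv_cache")
    args += COMMON_ARGS
    return f"PT_HPU_LAZY_MODE={int(lazy_mode)} " + shlex.join(args)


# Order: lazy + KV cache, lazy, eager + KV cache, eager.
commands = [
    build_command(lazy, kv)
    for lazy, kv in itertools.product((True, False), repeat=2)
]
```

Printing `commands` gives one shell-ready line per configuration, with the prompt quoted by `shlex.join`.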
PT_HPU_LAZY_MODE=1 python examples/text-generation/run_generation.py --model_name_or_path google/gemma-3-4b-it --use_hpu_graphs --use_kv_cache --max_new_tokens 100 --do_sample --prompt "DeepSpeed is a machine learning framework" --sdp_on_bf16
PT_HPU_LAZY_MODE=1 python examples/text-generation/run_generation.py --model_name_or_path google/gemma-3-4b-it --use_hpu_graphs --max_new_tokens 100 --do_sample --prompt "DeepSpeed is a machine learning framework" --sdp_on_bf16
PT_HPU_LAZY_MODE=0 python examples/text-generation/run_generation.py --model_name_or_path google/gemma-3-4b-it --use_kv_cache --max_new_tokens 100 --do_sample --prompt "DeepSpeed is a machine learning framework" --sdp_on_bf16
PT_HPU_LAZY_MODE=0 python examples/text-generation/run_generation.py --model_name_or_path google/gemma-3-4b-it --max_new_tokens 100 --do_sample --prompt "DeepSpeed is a machine learning framework" --sdp_on_bf16
Multimodal prompt
Note
These tests are conducted with a modified version of gemma3 multimodal inference here
Accuracy
Comparison to base
Note
These tests are conducted on Gaudi2, with gemma-3-4b-it and max_new_token=128
acc | 0.7627856365614799 | 0.764417845484222
acc_norm | 0.7720348204570185 | 0.7731229597388466
duration | 489.7370071009936 | 583.4628518190002
Different model sizes
Note
These tests are conducted on Gaudi2, with the piqa example here
gemma-3-4b-it | 128 | "acc,none": 0.764417845484222
gemma-3-4b-it | 8192 | "acc,none": 0.764417845484222
gemma-3-27b-it | 128 | "acc,none": 0.809575625680087
gemma-3-27b-it | 8192 | "acc,none": 0.809575625680087
Next
The Sliding Window Attention for this model will be enabled after the merge of #2210
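For readers unfamiliar with the term, sliding window attention restricts each token to attending over a fixed number of recent positions. The following is a generic sketch of such a mask, not the implementation from #2210: `sliding_window_causal_mask` is a hypothetical helper, and counting the current token as part of the window is an assumed convention.

```python
def sliding_window_causal_mask(seq_len, window):
    """Boolean mask where entry [i][j] is True if query i may attend to key j.

    A key is visible when it is not in the future (j <= i) and lies within
    the last `window` positions, counting the current token (assumption).
    """
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


mask = sliding_window_causal_mask(seq_len=4, window=2)
```

With `window` at least as large as `seq_len`, this reduces to an ordinary causal mask.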
--
co-authored by @skavulya