Skip to content

Make Gemma and Gemma 2 accept inputs_embeds like Gemma 3#36787

Merged
DarkLight1337 merged 3 commits intovllm-project:mainfrom
hmellor:fix-gemma
Mar 11, 2026
Merged

Make Gemma and Gemma 2 accept inputs_embeds like Gemma 3#36787
DarkLight1337 merged 3 commits intovllm-project:mainfrom
hmellor:fix-gemma

Conversation

@hmellor
Copy link
Copy Markdown
Member

@hmellor hmellor commented Mar 11, 2026

Gemma3Model.forward accepts pre-scaled inputs_embeds which are scaled by Gemma3Model.embed_input_ids.

Before this PR GemmaModel and Gemma2Model did the scaling inside forward.

After this PR the scaling for the earlier Gemma variants happens in embed_input_ids. This is consistent with:

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
@DarkLight1337
Copy link
Copy Markdown
Member

Btw there are some other models that do a similar thing as well. How should we handle them?

@hmellor
Copy link
Copy Markdown
Member Author

hmellor commented Mar 11, 2026

This was only really needed for the basic correctness test because it uses HF to generate the embeds and then passes them to vLLM. So the change in behaviour on the HF side was a problem.

For the rest of vLLM where we use vLLM to generate the embeds this should be a non-issue.

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) March 11, 2026 13:23
@DarkLight1337 DarkLight1337 added the ready ONLY add when PR is ready to merge/full CI is needed label Mar 11, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the embedding scaling for Gemma and Gemma 2 models to align with Gemma 3 and recent changes in the transformers library. The scaling logic is correctly moved from the forward method to embed_input_ids. The accompanying test changes are also appropriate, but they contain an incorrect version string for transformers which should be corrected for accuracy and maintainability.

Comment thread tests/basic_correctness/test_basic_correctness.py
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
@hmellor hmellor requested a review from ywang96 as a code owner March 11, 2026 15:12
@DarkLight1337 DarkLight1337 merged commit 65986db into vllm-project:main Mar 11, 2026
53 checks passed
@hmellor hmellor deleted the fix-gemma branch March 12, 2026 09:28
wendyliu235 pushed a commit to wendyliu235/vllm-public that referenced this pull request Mar 18, 2026
…ect#36787)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
khairulkabir1661 pushed a commit to khairulkabir1661/vllm that referenced this pull request Mar 27, 2026
…ect#36787)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
mtparet pushed a commit to blackfuel-ai/vllm that referenced this pull request Apr 9, 2026
…ect#36787)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready ONLY add when PR is ready to merge/full CI is needed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants