
Fix accessing final norm for Gemma-3 models#1687

Merged
kunal-vaishnavi merged 2 commits into main from kvaishnavi/gemma3-v4.55 on Aug 20, 2025


@kunal-vaishnavi
Contributor

### Description

This PR fixes how the final norm is identified for the Gemma-3 models. The fix works with the latest version of Hugging Face's `transformers` (v4.55.2).

### Motivation and Context

Previous versions of `transformers` introduced breaking changes to the class structure of the Gemma-3 models. Since `transformers` has, for now, settled on a stable way to load multi-modal models with `AutoModelForCausalLM`, the current approach is to look up the final norm at `model.model.language_model.norm` for the multi-modal Gemma-3 models.

Gemma-3 1B's final norm is accessible at `model.model.norm`, while Gemma-3 4B's final norm is accessible at `model.model.language_model.norm`. For decoder-only models wrapped by PEFT, the core model is accessible at `model.base_model.model`, and the final norm is usually accessible at `model.base_model.model.model.norm`.
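To illustrate the two layouts, here is a minimal sketch that tries a list of candidate attribute paths in order and returns the first one that resolves. The paths are the ones named above; the helper names (`FINAL_NORM_PATHS`, `resolve_attr_path`, `find_final_norm`), the lookup order, and the `SimpleNamespace` stand-ins for real `nn.Module` objects are all hypothetical and are not the model builder's actual code.

```python
from functools import reduce
from types import SimpleNamespace

# Hypothetical candidate paths to the final norm, relative to the top-level
# model object. Trying the multi-modal path first is an assumption.
FINAL_NORM_PATHS = (
    ("model", "language_model", "norm"),  # e.g. Gemma-3 4B (multi-modal)
    ("model", "norm"),                    # e.g. Gemma-3 1B (text-only)
)


def resolve_attr_path(obj, path):
    """Follow a chain of attribute names; return None if any hop is missing."""
    try:
        return reduce(getattr, path, obj)
    except AttributeError:
        return None


def find_final_norm(model):
    """Return (norm, dotted_path) for the first candidate path that resolves."""
    for path in FINAL_NORM_PATHS:
        norm = resolve_attr_path(model, path)
        if norm is not None:
            return norm, ".".join(("model",) + path)
    raise AttributeError("could not locate the final norm")


# Stand-ins for the two layouts (strings replace real norm modules).
gemma3_1b = SimpleNamespace(model=SimpleNamespace(norm="norm-1b"))
gemma3_4b = SimpleNamespace(
    model=SimpleNamespace(language_model=SimpleNamespace(norm="norm-4b"))
)
```

Here `find_final_norm(gemma3_4b)` resolves through `language_model`, while `gemma3_1b` falls through to the shorter text-only path.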

We can read the class name of the outermost (parent-most) model object to identify whether a model comes from PEFT. One advantage of this approach is that any change to the path of the final norm in a Transformers model is still found in the PEFT version of that model.
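A rough sketch of the PEFT check, assuming that PEFT wrapper classes (such as `PeftModel` or `PeftModelForCausalLM`) have names starting with `Peft` and that only the outermost object's class is inspected. The functions `is_peft_model` and `unwrap_core_model` and the stand-in class below are hypothetical illustrations, not the real `peft` classes or the model builder's code; the `base_model.model` hop follows the description above.

```python
from types import SimpleNamespace


def is_peft_model(model) -> bool:
    # Assumption: PEFT wrapper class names start with "Peft".
    # Only the outermost (parent-most) object's class name is read.
    return type(model).__name__.startswith("Peft")


def unwrap_core_model(model):
    """Return the underlying Transformers model, unwrapping a PEFT wrapper."""
    if is_peft_model(model):
        # Per the PR description, PEFT exposes the core model here.
        return model.base_model.model
    return model


# Hypothetical stand-in for a PEFT wrapper; not the real peft class.
class PeftModelForCausalLM:
    def __init__(self, core):
        self.base_model = SimpleNamespace(model=core)


core = SimpleNamespace(model=SimpleNamespace(norm="final-norm"))
wrapped = PeftModelForCausalLM(core)
```

Because any final-norm lookup can then run on `unwrap_core_model(model)`, a path that works for a Transformers model is also found for its PEFT-wrapped version without listing PEFT-specific paths separately.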

Comment thread src/python/py/models/builder.py Fixed
nenad1002 previously approved these changes Aug 19, 2025
Comment thread src/python/py/models/builder.py
Comment thread src/python/py/models/builder.py
@kunal-vaishnavi kunal-vaishnavi merged commit bee5dca into main Aug 20, 2025
14 checks passed
@kunal-vaishnavi kunal-vaishnavi deleted the kvaishnavi/gemma3-v4.55 branch August 20, 2025 16:43
kunal-vaishnavi added a commit that referenced this pull request Sep 18, 2025
### Description

This PR replaces any references to `NvTensorRtRtx` with `trt-rtx` except
in the GenAI config file.

### Motivation and Context

This abbreviation reduces [potential
bugs](#1687 (comment))
that can arise when using the EP name in the model builder. It also
preserves the original intention to keep EP names short.
kunal-vaishnavi added a commit that referenced this pull request Dec 20, 2025
### Description

This PR adds a tutorial to show how to create the ONNX models for
Gemma-3 vision (4B, 12B, 27B).

### Motivation and Context

This PR requires the changes from the following PRs.
- #1374
- #1687
- #1701
- #1786

This PR also resolves the following issues.
- #1329
- #1536
- #1655
- #1698

3 participants