
Untangle config inheritance#41541

Merged
ArthurZucker merged 65 commits into huggingface:main from zucchini-nlp:config-inheritance
Jan 16, 2026
Conversation

@zucchini-nlp
Member

@zucchini-nlp zucchini-nlp commented Oct 13, 2025

What does this PR do?

As per title, this PR deletes base config attributes that are not actually universal across models (e.g., token IDs are used only by text models, and weight tying is possible only when a model has embedding layers).

After this PR, we need to clean up generation-related params from the config classes (#41695), and then we can easily turn the config classes into dataclasses. Using dataclasses is one of the requirements for the hf hub type validation I am currently working on.
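The direction of the cleanup can be sketched as follows. This is a hypothetical illustration, not the actual transformers code: the class names and attributes here are stand-ins for the idea that per-modality attributes move off the shared base config and onto the subclasses that actually use them.

```python
# Hypothetical sketch (not the real transformers classes): after the cleanup,
# token ids and embedding tying live on the text-model config subclass rather
# than on the shared base config.

class BaseConfig:
    """Base config keeps only attributes that are truly universal."""

    def __init__(self, **kwargs):
        for key, value in kwargs.items():
            setattr(self, key, value)


class TextConfig(BaseConfig):
    """Text models opt in to token ids and embedding tying explicitly."""

    def __init__(self, pad_token_id=None, bos_token_id=None,
                 tie_word_embeddings=True, **kwargs):
        super().__init__(**kwargs)
        self.pad_token_id = pad_token_id
        self.bos_token_id = bos_token_id
        self.tie_word_embeddings = tie_word_embeddings


class VisionConfig(BaseConfig):
    """Vision models no longer carry token ids they never use."""


vision = VisionConfig(hidden_size=768)
text = TextConfig(pad_token_id=0)
print(hasattr(vision, "pad_token_id"))  # False
print(text.pad_token_id)                # 0
```

Code that previously assumed every config exposes `pad_token_id` now has to check the concrete config class (or use `getattr` with a default), which is what several downstream fixes below do.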

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@zucchini-nlp
Member Author

zucchini-nlp commented Oct 17, 2025

Oh no, I caused 100+ merge conflicts with another PR 🙃

@zucchini-nlp zucchini-nlp changed the title [WIP] Untangle config inheritance Untangle config inheritance Oct 17, 2025
Comment on lines -191 to -192
pad_token_id: Optional[int] = None,
bos_token_id: Optional[int] = None,
Member Author


The modular file shows that these were supposed to be deleted with `del bos_token_id`, but the conversion script could not handle it correctly.
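The "delete an inherited attribute" pattern the modular file intends can be illustrated like this. The class and attribute names below are hypothetical; the point is just that a subclass drops an inherited attribute it does not use.

```python
# Hedged illustration of dropping an inherited config attribute with `del`;
# names here are made up, not the actual transformers classes.

class ParentConfig:
    def __init__(self):
        self.bos_token_id = 1
        self.hidden_size = 64


class ChildConfig(ParentConfig):
    def __init__(self):
        super().__init__()
        # The subclass removes an attribute that does not apply to it.
        del self.bos_token_id


child = ChildConfig()
print(hasattr(child, "bos_token_id"))  # False
```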

@zucchini-nlp
Member Author

Full CI didn't show any test errors, so I will be wrapping this up today or tomorrow

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: afmoe, aimv2, albert, align, altclip, apertus, arcee, aria, audioflamingo3, auto, bamba, bark, bart, bert, bert_generation, big_bird

@github-actions
Contributor

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=41541&sha=0e9d3d

@ArthurZucker ArthurZucker merged commit aec03e5 into huggingface:main Jan 16, 2026
20 of 25 checks passed
@zucchini-nlp
Member Author

Thanks, now I will go back to the hub validation and making configs a dataclass 🎉

SangbumChoi pushed a commit to SangbumChoi/transformers that referenced this pull request Jan 23, 2026
* remove from base

* delete

* fetcher fix

* missing values

* update

* is decoder missing

* forgot to add

* add special tokens with default `None` in text models

* fsmt has unused subconfig, fix it!

* update

* fix

* add missing token id defaults

* fix more tests

* tie_word_embeddings

* tiny fixes

* more test fixes

* fix docstrings

* fix copies

* fix style?

* fix copied again

* fix copies

* fix examples

* delete left over print stmt

* splitnter

* this def will fix a bunch of decoder-only models

* make style

* fix copies

* not all models are supposed to have an attr for `tie_word_embeddings`!

* comment out

* fix

* more fixes

* fix copies

* docstring and non-model tests

* update

* fix repo consistency

* style

* fix

* revert

* remove unused attr

* fix repo

* fix test

* fix a few tests, more tests

* fix gemma & llava

* style

* gemma3n also

* new models as well

* skip the test

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
githubnemo pushed a commit to githubnemo/transformers that referenced this pull request Feb 2, 2026
PR huggingface#41541 refactored `tie_word_embeddings` handling (among other things)
which subtly broke the detection of T5 v1.1 vs. the original T5. As a
consequence, decoder output scaling was always applied, regardless of
T5 version.

This is resolved by using the correct value for `tie_word_embeddings`.

**Testing:**

This was not covered by the tests since the tests instantiate the config
once and modify attributes on the config. This is problematic since all
the decision logic is happening in `T5Config.__init__`. This was addressed
by having a specific `get_config_v1_1` method that initializes the
config as if it were coming from a v1.1 model (e.g., flan-t5).
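The decision the fix restores can be sketched as follows. This is a hedged, simplified illustration of the logic described in the commit message (scale decoder outputs only when word embeddings are tied, as in original T5, and skip the scaling for v1.1-style configs); the function name and numbers are illustrative, not the actual modeling code.

```python
# Hedged sketch: original T5 ties word embeddings and rescales the decoder
# output by d_model**-0.5 before the LM head; T5 v1.1 (e.g. flan-t5) unties
# the embeddings and must skip the rescaling.

def scale_decoder_output(hidden, d_model, tie_word_embeddings):
    if tie_word_embeddings:
        return hidden * (d_model ** -0.5)
    return hidden


print(scale_decoder_output(4.0, 4, tie_word_embeddings=True))   # 2.0
print(scale_decoder_output(4.0, 4, tie_word_embeddings=False))  # 4.0
```

The bug described above is equivalent to always taking the first branch, which is why reading the correct `tie_word_embeddings` value fixes it.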
vaibhavjindal added a commit to linkedin/Liger-Kernel that referenced this pull request Feb 3, 2026
## Summary
<!--- This is a required section; please describe the main purpose of
this proposed code change. --->
Fix multiple failing monkey_patch tests for transformers v5. The tests
failed because of
huggingface/transformers#41541.

These changes are backward compatible.

This change fixes the following failing tests:
```
FAILED test/transformers/test_monkey_patch.py::test_apply_liger_kernel_to_instance_for_qwen3_vl_moe_for_conditional_generation - AttributeError: 'Qwen3VLMoeTextConfig' object has no attribute 'pad_token_id'
FAILED test/transformers/test_monkey_patch.py::test_apply_liger_kernel_to_instance_for_qwen3_vl_moe - AttributeError: 'Qwen3VLMoeTextConfig' object has no attribute 'pad_token_id'
FAILED test/transformers/test_monkey_patch.py::test_apply_liger_kernel_to_instance_for_qwen3_vl_moe_text - AttributeError: 'Qwen3VLMoeTextConfig' object has no attribute 'pad_token_id'
FAILED test/transformers/test_monkey_patch.py::test_apply_liger_kernel_to_instance_for_llama4_for_conditional_generation - AttributeError: 'Llama4Config' object has no attribute 'pad_token_id'
FAILED test/transformers/test_monkey_patch.py::test_apply_liger_kernel_to_instance_for_glm4v - AttributeError: 'Glm4vTextConfig' object has no attribute 'pad_token_id'
```
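A backward-compatible way to handle the `AttributeError` above is to read the attribute defensively, so code works whether or not the config still defines `pad_token_id`. The config classes below are stand-ins for illustration, not the real transformers classes.

```python
# Hedged sketch of a backward-compatible accessor: configs that dropped
# pad_token_id in transformers v5 yield None instead of raising
# AttributeError. The classes here are hypothetical stand-ins.

class OldStyleConfig:
    pad_token_id = 0


class NewStyleConfig:
    # No pad_token_id attribute at all after the refactor.
    pass


def get_pad_token_id(config):
    return getattr(config, "pad_token_id", None)


print(get_pad_token_id(OldStyleConfig()))  # 0
print(get_pad_token_id(NewStyleConfig()))  # None
```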


Fixes #1059.

<!---
## Details
This is an optional section; is there anything specific that reviewers
should be aware of?
--->

## Testing Done
<!--- This is a required section; please describe how this change was
tested. --->

<!-- 
Replace BLANK with your device type. For example, A100-80G-PCIe

Complete the following tasks before sending your PR, and replace `[ ]`
with
`[x]` to indicate you have done them. 
-->

- Hardware Type: <BLANK>
- [ ] run `make test` to ensure correctness
- [ ] run `make checkstyle` to ensure code style
- [ ] run `make test-convergence` to ensure convergence
ArthurZucker pushed a commit that referenced this pull request Feb 5, 2026
* Fix T5 v1.1 detection

PR #41541 refactored `tie_word_embeddings` handling (among other things)
which subtly broke the detection of T5 v1.1 vs. the original T5. As a
consequence, decoder output scaling was always applied, regardless of
T5 version.

This is resolved by using the correct value for `tie_word_embeddings`.

**Testing:**

This was not covered by the tests since the tests instantiate the config
once and modify attributes on the config. This is problematic since all
the decision logic is happening in `T5Config.__init__`. This was addressed
by having a specific `get_config_v1_1` method that initializes the
config as if it were coming from a v1.1 model (e.g., flan-t5).

* Make repo consistent

* Make repo consistent

* mt5 isn't copied from t5 anymore

---------

Co-authored-by: nemo <git@ningu.net>
Co-authored-by: raushan <raushan@huggingface.co>
tarekziade pushed a commit that referenced this pull request Feb 5, 2026
tarekziade pushed a commit that referenced this pull request Feb 5, 2026
ArthurZucker pushed a commit that referenced this pull request Feb 5, 2026
jiosephlee pushed a commit to jiosephlee/transformers_latest that referenced this pull request Feb 11, 2026
