
Untangle config inheritance#41541

Merged
ArthurZucker merged 65 commits into huggingface:main from zucchini-nlp:config-inheritance
Jan 16, 2026
Conversation

@zucchini-nlp
Member

@zucchini-nlp zucchini-nlp commented Oct 13, 2025

What does this PR do?

As per title, this PR deletes base config attributes that are not actually universal across models (e.g., token IDs are used only by text models, and weight tying is possible only when a model has embedding layers).

After this PR, we need to clean up generation-related params from the config classes (#41695), and then we can easily turn the config classes into dataclasses. Using dataclasses is one of the requirements for the hf hub type validation I am currently working on.
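The direction of the cleanup can be sketched as follows. This is a hypothetical illustration, not the actual transformers code: the class names and attributes here are stand-ins for the idea that per-modality attributes move off the shared base config and onto the subclasses that actually use them.

```python
# Hypothetical sketch (not the real transformers classes): after the cleanup,
# token ids and embedding tying live on the text-model config subclass rather
# than on the shared base config.

class BaseConfig:
    """Base config keeps only attributes that are truly universal."""

    def __init__(self, **kwargs):
        for key, value in kwargs.items():
            setattr(self, key, value)


class TextConfig(BaseConfig):
    """Text models opt in to token ids and embedding tying explicitly."""

    def __init__(self, pad_token_id=None, bos_token_id=None,
                 tie_word_embeddings=True, **kwargs):
        super().__init__(**kwargs)
        self.pad_token_id = pad_token_id
        self.bos_token_id = bos_token_id
        self.tie_word_embeddings = tie_word_embeddings


class VisionConfig(BaseConfig):
    """Vision models no longer carry token ids they never use."""


vision = VisionConfig(hidden_size=768)
text = TextConfig(pad_token_id=0)
print(hasattr(vision, "pad_token_id"))  # False
print(text.pad_token_id)                # 0
```

Code that previously assumed every config exposes `pad_token_id` now has to check the concrete config class (or use `getattr` with a default), which is what several downstream fixes below do.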

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@zucchini-nlp
Member Author

zucchini-nlp commented Oct 17, 2025

Oh no, I caused 100+ merge conflicts with another PR 🙃

@zucchini-nlp zucchini-nlp changed the title [WIP] Untangle config inheritance Untangle config inheritance Oct 17, 2025
Comment on lines -191 to -192
pad_token_id: Optional[int] = None,
bos_token_id: Optional[int] = None,
Member Author


The modular file shows that these were supposed to be deleted with `del bos_token_id`, but the conversion script could not handle it correctly.
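The "delete an inherited attribute" pattern the modular file intends can be illustrated like this. The class and attribute names below are hypothetical; the point is just that a subclass drops an inherited attribute it does not use.

```python
# Hedged illustration of dropping an inherited config attribute with `del`;
# names here are made up, not the actual transformers classes.

class ParentConfig:
    def __init__(self):
        self.bos_token_id = 1
        self.hidden_size = 64


class ChildConfig(ParentConfig):
    def __init__(self):
        super().__init__()
        # The subclass removes an attribute that does not apply to it.
        del self.bos_token_id


child = ChildConfig()
print(hasattr(child, "bos_token_id"))  # False
```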

@zucchini-nlp
Member Author

Full CI didn't show any test errors, so I will be wrapping this up today or tomorrow

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: afmoe, aimv2, albert, align, altclip, apertus, arcee, aria, audioflamingo3, auto, bamba, bark, bart, bert, bert_generation, big_bird

@github-actions
Contributor

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=41541&sha=0e9d3d

@ArthurZucker ArthurZucker merged commit aec03e5 into huggingface:main Jan 16, 2026
20 of 25 checks passed
@zucchini-nlp
Member Author

Thanks, now I will go back to the hub validation and making configs a dataclass 🎉

SangbumChoi pushed a commit to SangbumChoi/transformers that referenced this pull request Jan 23, 2026
* remove from base

* delete

* fetcher fix

* missing values

* update

* is decoder missing

* forgot to add

* add special tokens with default `None` in text models

* fsmt has unused subconfig, fix it!

* update

* fix

* add missing token id defaults

* fix more tests

* tie_word_embeddings

* tiny fixes

* more test fixes

* fix docstrings

* fix copies

* fix style?

* fix copied again

* fix copies

* fix examples

* delete left over print stmt

* splitnter

* this def will fix a bunch of decoder-only models

* make style

* fix copies

* not all models are supposed to have an attr for `tie_word_embeddings`!

* comment out

* fix

* more fixes

* fix copies

* docstring and non-model tests

* update

* fix repo consistency

* style

* fix

* revert

* remove unused attr

* fix repo

* fix test

* fix a few tests, more tests

* fix gemma & llava

* style

* gemma3n also

* new models as well

* skip the test

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
githubnemo pushed a commit to githubnemo/transformers that referenced this pull request Feb 2, 2026
PR huggingface#41541 refactored `tie_word_embeddings` handling (among other things)
which subtly broke the detection of T5 v1.1 vs. the original T5. As a
consequence, decoder output scaling was always applied, regardless of
T5 version.

This is resolved by using the correct value for `tie_word_embeddings`.

**Testing:**

This was not covered by the tests since the tests instantiate the config
once and modify attributes on the config. This is problematic since all
the decision logic is happening in `T5Config.__init__`. This was addressed
by having a specific `get_config_v1_1` method that initializes the
config as if it were coming from a v1.1 model (e.g., flan-t5).
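The decision the fix restores can be sketched as follows. This is a hedged, simplified illustration of the logic described in the commit message (scale decoder outputs only when word embeddings are tied, as in original T5, and skip the scaling for v1.1-style configs); the function name and numbers are illustrative, not the actual modeling code.

```python
# Hedged sketch: original T5 ties word embeddings and rescales the decoder
# output by d_model**-0.5 before the LM head; T5 v1.1 (e.g. flan-t5) unties
# the embeddings and must skip the rescaling.

def scale_decoder_output(hidden, d_model, tie_word_embeddings):
    if tie_word_embeddings:
        return hidden * (d_model ** -0.5)
    return hidden


print(scale_decoder_output(4.0, 4, tie_word_embeddings=True))   # 2.0
print(scale_decoder_output(4.0, 4, tie_word_embeddings=False))  # 4.0
```

The bug described above is equivalent to always taking the first branch, which is why reading the correct `tie_word_embeddings` value fixes it.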
vaibhavjindal added a commit to linkedin/Liger-Kernel that referenced this pull request Feb 3, 2026
## Summary
<!--- This is a required section; please describe the main purpose of
this proposed code change. --->
Fix multiple failing monkey_patch tests for transformers v5. The tests
failed because of
huggingface/transformers#41541.

These changes are backward compatible.

This change fixes the following failing tests:
```
FAILED test/transformers/test_monkey_patch.py::test_apply_liger_kernel_to_instance_for_qwen3_vl_moe_for_conditional_generation - AttributeError: 'Qwen3VLMoeTextConfig' object has no attribute 'pad_token_id'
FAILED test/transformers/test_monkey_patch.py::test_apply_liger_kernel_to_instance_for_qwen3_vl_moe - AttributeError: 'Qwen3VLMoeTextConfig' object has no attribute 'pad_token_id'
FAILED test/transformers/test_monkey_patch.py::test_apply_liger_kernel_to_instance_for_qwen3_vl_moe_text - AttributeError: 'Qwen3VLMoeTextConfig' object has no attribute 'pad_token_id'
FAILED test/transformers/test_monkey_patch.py::test_apply_liger_kernel_to_instance_for_llama4_for_conditional_generation - AttributeError: 'Llama4Config' object has no attribute 'pad_token_id'
FAILED test/transformers/test_monkey_patch.py::test_apply_liger_kernel_to_instance_for_glm4v - AttributeError: 'Glm4vTextConfig' object has no attribute 'pad_token_id'
```
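A backward-compatible way to handle the `AttributeError` above is to read the attribute defensively, so code works whether or not the config still defines `pad_token_id`. The config classes below are stand-ins for illustration, not the real transformers classes.

```python
# Hedged sketch of a backward-compatible accessor: configs that dropped
# pad_token_id in transformers v5 yield None instead of raising
# AttributeError. The classes here are hypothetical stand-ins.

class OldStyleConfig:
    pad_token_id = 0


class NewStyleConfig:
    # No pad_token_id attribute at all after the refactor.
    pass


def get_pad_token_id(config):
    return getattr(config, "pad_token_id", None)


print(get_pad_token_id(OldStyleConfig()))  # 0
print(get_pad_token_id(NewStyleConfig()))  # None
```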


Fixes #1059.

<!---
## Details
This is an optional section; is there anything specific that reviewers
should be aware of?
--->

## Testing Done
<!--- This is a required section; please describe how this change was
tested. --->

<!-- 
Replace BLANK with your device type. For example, A100-80G-PCIe

Complete the following tasks before sending your PR, and replace `[ ]`
with
`[x]` to indicate you have done them. 
-->

- Hardware Type: <BLANK>
- [ ] run `make test` to ensure correctness
- [ ] run `make checkstyle` to ensure code style
- [ ] run `make test-convergence` to ensure convergence
ArthurZucker pushed a commit that referenced this pull request Feb 5, 2026
* Fix T5 v1.1 detection

PR #41541 refactored `tie_word_embeddings` handling (among other things)
which subtly broke the detection of T5 v1.1 vs. the original T5. As a
consequence, decoder output scaling was always applied, regardless of
T5 version.

This is resolved by using the correct value for `tie_word_embeddings`.

**Testing:**

This was not covered by the tests since the tests instantiate the config
once and modify attributes on the config. This is problematic since all
the decision logic is happening in `T5Config.__init__`. This was addressed
by having a specific `get_config_v1_1` method that initializes the
config as if it were coming from a v1.1 model (e.g., flan-t5).

* Make repo consistent

* Make repo consistent

* mt5 isn't copied from t5 anymore

---------

Co-authored-by: nemo <git@ningu.net>
Co-authored-by: raushan <raushan@huggingface.co>
tarekziade pushed a commit that referenced this pull request Feb 5, 2026
tarekziade pushed a commit that referenced this pull request Feb 5, 2026
ArthurZucker pushed a commit that referenced this pull request Feb 5, 2026
jiosephlee pushed a commit to jiosephlee/transformers_latest that referenced this pull request Feb 11, 2026
