🚨 Validate config attributes #41250
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Blocked by #41541 (comment) for now
Time to revive this branch
Nice, much better and easier to maintain BC with remote code now!
```python
# Keys are always strings in JSON so convert ids to int here for id2label and pruned_heads
if self.id2label is None:
    self._create_id_label_maps(kwargs.get("num_labels", 2))
else:
    if kwargs.get("num_labels") is not None and len(self.id2label) != kwargs.get("num_labels"):
        logger.warning(
            f"You passed `num_labels={kwargs.get('num_labels')}` which is incompatible to "
            f"the `id2label` map of length `{len(self.id2label)}`."
        )
    self.id2label = {int(key): value for key, value in self.id2label.items()}
```
Is it a good time to get rid of these general attributes and only have them for models that actually require them?
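The JSON round trip that motivates the int-cast in the diff above can be reproduced in a few lines (a standalone sketch; the `id2label` contents are made up):

```python
import json

# JSON object keys are always strings, so an int-keyed id2label map
# comes back with str keys after serialization + deserialization.
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
roundtripped = json.loads(json.dumps(id2label))
assert list(roundtripped) == ["0", "1"]  # keys became strings

# The fix mirrored in the diff: cast keys back to int on load.
restored = {int(k): v for k, v in roundtripped.items()}
assert restored == id2label
```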
…GLM-5 nightly tests

Transformers PR huggingface/transformers#41250 (merged Mar 16) converts PretrainedConfig subclasses to @dataclass via __init_subclass__, which breaks sglang's DeepseekVL2Config (non-default field ordering) and prevents the server from starting at all.

Remove `pip install git+https://github.com/huggingface/transformers.git` from all Qwen 3.5 and GLM-5 CI jobs (MI30x, MI35x, ROCm 7.0 and 7.2). Use the stable transformers shipped in the docker image instead, matching all other nightly jobs (Grok2, DeepSeek-V3.2, etc.). Keep mistral-common and lm-eval[api] for the Qwen 3.5 tests that need them.
- Remove torch_dtype="auto" from docs (now default)
- Simplify modular_sarvam_mla.py to only override defaults that differ from DeepseekV3Config (no __init__, no workarounds)
- Add @strict(accept_kwargs=True) for config validation (huggingface#41250)
- Regenerate configuration_sarvam_mla.py with dataclass fields and __post_init__ pattern
- Hub config.json changes needed: remove head_dim/q_head_dim, change rope_scaling.type to "yarn", update architectures

Made-with: Cursor
Resolves the failing tests on transformers main branch. After the change in huggingface/transformers#41250, the num_hidden_layers attribute is no longer part of the model config when serialized to a dict. The _prepare_prompt_learning_config function was using this attribute. Therefore, we now pass the config before converting it into a dict and extract the attribute from it.
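A minimal sketch of that workaround, with a stand-in `DummyConfig` and a simplified `prepare_prompt_learning_config` (both names illustrative, not the actual peft code):

```python
# Hypothetical sketch of the fix described above: read num_hidden_layers
# off the config *object* before serializing it, since to_dict() may no
# longer include that attribute after transformers#41250.
class DummyConfig:
    num_hidden_layers = 12

    def to_dict(self):
        return {"hidden_size": 64}  # num_hidden_layers no longer serialized

def prepare_prompt_learning_config(model_config):
    # Extract the attribute from the config object first ...
    num_layers = getattr(model_config, "num_hidden_layers", None)
    # ... then convert to a dict and re-attach it if needed.
    config_dict = model_config.to_dict()
    if num_layers is not None:
        config_dict["num_hidden_layers"] = num_layers
    return config_dict

result = prepare_prompt_learning_config(DummyConfig())
print(result)  # includes num_hidden_layers again
```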
Transformers PR huggingface/transformers#41250 (merged Mar 16) converts PretrainedConfig subclasses to @DataClass via __init_subclass__, which breaks sglang's DeepseekVL2Config and prevents the server from starting. For Qwen 3.5: remove git+transformers entirely — stable version in the docker image is sufficient (verified passing). For GLM-5: pin to commit 96f807a33b75 (last commit before the breaking change) since GLM-5 needs the glm_moe_dsa model type which is only in the transformers dev branch, not in stable releases yet.
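The field-ordering failure described here can be reproduced with a toy version of the `__init_subclass__` conversion (an assumed simplification, not the transformers implementation):

```python
import dataclasses

class AutoDataclassBase:
    # Assumed simplification of the pattern described above: every
    # subclass is converted to a dataclass automatically at definition time.
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        dataclasses.dataclass(cls)  # convert in place

class GoodConfig(AutoDataclassBase):
    hidden_size: int = 64
    num_layers: int = 2

cfg = GoodConfig(hidden_size=128)  # generated __init__, defaults intact

# A subclass that declares a non-default field after a default one now
# fails at class-definition time -- the DeepseekVL2Config-style breakage.
err = None
try:
    class BadConfig(AutoDataclassBase):
        a: int = 1
        b: int  # non-default after default -> TypeError
except TypeError as exc:
    err = exc
print(type(err).__name__)
```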
I'm getting failures here due to this PR:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("tiny-random/glm-4v")
print(type(config))
```

Note that the remote config uses the old-style top-level rope parameters, including rope_theta: https://huggingface.co/tiny-random/glm-4v/blob/main/config.json#L28-L37
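To illustrate the shape mismatch, here is a hypothetical `normalize_rope` helper (not a transformers API; the key names are assumptions for illustration) that folds an old-style top-level `rope_theta` into a nested `rope_scaling` dict:

```python
# Hypothetical illustration only: a remote config with old-style top-level
# rope fields vs. the nested layout newer validation code expects.
def normalize_rope(config_dict):
    cfg = dict(config_dict)
    scaling = dict(cfg.get("rope_scaling") or {})
    if "rope_theta" in cfg and "rope_theta" not in scaling:
        # move the top-level field under the nested dict
        scaling["rope_theta"] = cfg.pop("rope_theta")
    if scaling:
        cfg["rope_scaling"] = scaling
    return cfg

old_style = {"hidden_size": 64, "rope_theta": 10000.0}
print(normalize_rope(old_style))
```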
Hey @zucchini-nlp 👋 Just a quick heads-up that the changes in this PR (shipped in the 5.4.0 release) introduced a breaking change for us. Specifically, the introduction of the automatic dataclass conversion. Over at lerobot, we were decorating the classes that inherit from PreTrainedConfig with @dataclass ourselves, and that now breaks.

Cheers!
Interesting, I hadn't thought that users might already be using a dataclass. TIL that we have subclasses which are dataclasses and also some that are pydantic BaseModel subclasses. Yeah, I think it is best to remove the decorator. We've been making all subclasses a dataclass via __init_subclass__ anyway.
Vendor the HyperCLOVAX Vision config into vLLM to fix transformers v5 compatibility. The upstream remote code config does not handle empty initialization (text_config=None), which breaks v5's @strict config validation added in huggingface/transformers#41250.

Fixes: vllm-project#38387
TODO: Remove vendored config once HyperCLOVAX is upstreamed to transformers. Tracking PR: huggingface/transformers#44956

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Fang Han <fhan0520@gmail.com>
What does this PR do?
As per title. Continues from #40793 and supersedes #36534
NOTE: config classes can't accept positional args anymore! I don't think anyone would use positional args anyway, but marking the PR as breaking.
Note
High Risk
Refactors `PreTrainedConfig` and many model config classes to `@dataclass` + huggingface_hub `@strict` validation, which can change initialization/serialization behavior and reject previously-accepted configs. Also enforces save-time validation and updates defaults/deprecations (e.g., `use_return_dict`), risking backward compatibility across model loading and downstream integrations.

Overview
Adds strict config validation. `PreTrainedConfig` is converted to a `@dataclass` with huggingface_hub's `@strict`, introduces built-in validators (architecture consistency, special token id ranges, layer type checks, `output_attentions` vs `attn_implementation`), and runs `validate()` automatically on `save_pretrained`.

Modernizes and standardizes model configs. Many model configuration classes are migrated from custom `__init__` logic to dataclass fields + `__post_init__`, moving compatibility logic (e.g., defaulting sub-configs, key/value casting for JSON) into post-init and adding model-specific `validate_architecture` where needed.

API/behavior tweaks. Deprecates `use_return_dict` in favor of `return_dict` (and updates multiple model forward paths accordingly), adjusts RoPE validation ignore-key handling, narrows AutoTokenizer fallback exception handling, and bumps the minimum huggingface-hub requirement to `>=1.5.0`.

Written by Cursor Bugbot for commit 07095f3. This will update automatically on new commits.
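The special-token-id range check mentioned in the summary can be sketched with a plain dataclass and `__post_init__` (a hypothetical stand-in, not the huggingface_hub `@strict` API):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TinyConfig:
    vocab_size: int = 100
    pad_token_id: Optional[int] = None

    def __post_init__(self):
        # Stand-in validator: special token ids must fall inside the vocabulary.
        if self.pad_token_id is not None and not (0 <= self.pad_token_id < self.vocab_size):
            raise ValueError(
                f"pad_token_id={self.pad_token_id} is out of range "
                f"for vocab_size={self.vocab_size}"
            )

ok = TinyConfig(pad_token_id=3)    # accepted
err = None
try:
    TinyConfig(pad_token_id=500)   # rejected at construction time
except ValueError as exc:
    err = exc
print(ok.pad_token_id, type(err).__name__)
```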