
🚨 Validate config attributes#41250

Merged
zucchini-nlp merged 114 commits into huggingface:main from zucchini-nlp:config-validation
Mar 16, 2026
Conversation

@zucchini-nlp
Member

@zucchini-nlp zucchini-nlp commented Oct 1, 2025

What does this PR do?

As per title. Continues from #40793 and supersedes #36534

NOTE: config classes can't accept positional arguments anymore! I don't think anyone uses positional args anyway, but marking the PR as breaking
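For illustration, here is a toy keyword-only `__init__` of the kind this change enforces (a sketch with a made-up class, not the actual transformers code):

```python
# Toy sketch: keyword-only config init, as enforced after this PR
# (illustrative class, not the transformers implementation).
class ToyConfig:
    def __init__(self, *, hidden_size: int = 64, num_layers: int = 2):
        self.hidden_size = hidden_size
        self.num_layers = num_layers

cfg = ToyConfig(hidden_size=128)   # ok: keyword arguments
try:
    ToyConfig(128)                 # positional args are rejected
except TypeError:
    print("positional args rejected")
```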


Note

High Risk
Refactors PreTrainedConfig and many model config classes to @dataclass + huggingface_hub @strict validation, which can change initialization/serialization behavior and reject previously-accepted configs. Also enforces save-time validation and updates defaults/deprecations (e.g., use_return_dict), risking backward-compatibility across model loading and downstream integrations.

Overview
Adds strict config validation. PreTrainedConfig is converted to a @dataclass with huggingface_hub’s @strict, introduces built-in validators (architecture consistency, special token id ranges, layer type checks, output_attentions vs attn_implementation), and runs validate() automatically on save_pretrained.

Modernizes and standardizes model configs. Many model configuration classes are migrated from custom __init__ logic to dataclass fields + __post_init__, moving compatibility logic (e.g., defaulting sub-configs, key/value casting for JSON) into post-init and adding model-specific validate_architecture where needed.

API/behavior tweaks. Deprecates use_return_dict in favor of return_dict (and updates multiple model forward paths accordingly), adjusts RoPE validation ignore-key handling, narrows AutoTokenizer fallback exception handling, and bumps the minimum huggingface-hub requirement to >=1.5.0.

Written by Cursor Bugbot for commit 07095f3.
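For readers unfamiliar with the pattern, here is a minimal stdlib-only sketch of what `@strict`-style post-init validation does (a toy stand-in, not huggingface_hub's implementation):

```python
from dataclasses import dataclass

def strict_validate(cls):
    """Toy stand-in for huggingface_hub's @strict: run validate() after __init__."""
    original_init = cls.__init__

    def init_with_validate(self, *args, **kwargs):
        original_init(self, *args, **kwargs)
        self.validate()  # invalid configs are rejected at construction time

    cls.__init__ = init_with_validate
    return cls

@strict_validate
@dataclass
class TinyConfig:
    hidden_size: int = 64
    num_attention_heads: int = 8

    def validate(self):
        if self.hidden_size % self.num_attention_heads != 0:
            raise ValueError("hidden_size must be divisible by num_attention_heads")

TinyConfig(hidden_size=64, num_attention_heads=8)      # passes validation
try:
    TinyConfig(hidden_size=65, num_attention_heads=8)  # rejected at init
except ValueError as exc:
    print(exc)
```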

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@zucchini-nlp
Member Author

Blocked by #41541 (comment) for now

@zucchini-nlp
Member Author

Time to revive this branch

@zucchini-nlp
Member Author

Nice, much better and easier to maintain BC with remote code now!

Collaborator

@ArthurZucker ArthurZucker left a comment


Very very nice!

Comment on lines +196 to +205
# Keys are always strings in JSON so convert ids to int here for id2label and pruned_heads
if self.id2label is None:
    self._create_id_label_maps(kwargs.get("num_labels", 2))
else:
    if kwargs.get("num_labels") is not None and len(self.id2label) != kwargs.get("num_labels"):
        logger.warning(
            f"You passed `num_labels={kwargs.get('num_labels')}` which is incompatible to "
            f"the `id2label` map of length `{len(self.id2label)}`."
        )
    self.id2label = {int(key): value for key, value in self.id2label.items()}
Collaborator


is it a good time to get rid of these general attributes and only have them for models that actually require them?

@zucchini-nlp zucchini-nlp restored the config-validation branch March 16, 2026 19:08
michaelzhang-ai added a commit to michaelzhang-ai/sglang that referenced this pull request Mar 17, 2026
…GLM-5 nightly tests

Transformers PR huggingface/transformers#41250 (merged Mar 16) converts
PretrainedConfig subclasses to @dataclass via __init_subclass__, which
breaks sglang's DeepseekVL2Config (non-default field ordering) and
prevents the server from starting at all.

Remove `pip install git+https://github.com/huggingface/transformers.git`
from all Qwen 3.5 and GLM-5 CI jobs (MI30x, MI35x, ROCm 7.0 and 7.2).
Use the stable transformers shipped in the docker image instead, matching
all other nightly jobs (Grok2, DeepSeek-V3.2, etc.).

Keep mistral-common and lm-eval[api] for Qwen 3.5 tests that need them.
aashay-sarvam added a commit to aashay-sarvam/transformers that referenced this pull request Mar 17, 2026
- Remove torch_dtype="auto" from docs (now default)
- Simplify modular_sarvam_mla.py to only override defaults that differ
  from DeepseekV3Config (no __init__, no workarounds)
- Add @strict(accept_kwargs=True) for config validation (huggingface#41250)
- Regenerate configuration_sarvam_mla.py with dataclass fields and
  __post_init__ pattern
- Hub config.json changes needed: remove head_dim/q_head_dim, change
  rope_scaling.type to "yarn", update architectures

Made-with: Cursor
BenjaminBossan added a commit to BenjaminBossan/peft that referenced this pull request Mar 17, 2026
Resolves the failing tests on transformers main branch.

After the change in
huggingface/transformers#41250, the
num_hidden_layers attribute is no longer part of the model config when
serialized to a dict. The _prepare_prompt_learning_config function was
using this attribute. Therefore, we now pass the config before
converting it into a dict and extract the attribute from it.
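The workaround pattern described in the commit message can be sketched with a dummy config (hypothetical names, not the actual peft code):

```python
class DummyConfig:
    """Stand-in for a transformers config whose to_dict() omits some attributes."""
    num_hidden_layers = 12

    def to_dict(self):
        return {"hidden_size": 64}  # num_hidden_layers is not serialized

def prepare_prompt_learning_config(config):
    # Read the attribute from the config *object* before converting it to a
    # dict, since serialization may drop non-init fields after this PR.
    num_layers = getattr(config, "num_hidden_layers")
    config_dict = config.to_dict()
    config_dict["num_layers"] = num_layers
    return config_dict

print(prepare_prompt_learning_config(DummyConfig()))
# → {'hidden_size': 64, 'num_layers': 12}
```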
michaelzhang-ai added a commit to sgl-project/sglang that referenced this pull request Mar 17, 2026
Transformers PR huggingface/transformers#41250 (merged Mar 16) converts
PretrainedConfig subclasses to @dataclass via __init_subclass__, which
breaks sglang's DeepseekVL2Config and prevents the server from starting.

For Qwen 3.5: remove git+transformers entirely — stable version in the
docker image is sufficient (verified passing).

For GLM-5: pin to commit 96f807a33b75 (last commit before the breaking
change) since GLM-5 needs the glm_moe_dsa model type which is only in
the transformers dev branch, not in stable releases yet.
@Sai-Suraj-27 Sai-Suraj-27 mentioned this pull request Mar 18, 2026
@tomaarsen
Member

I'm getting failures here due to this PR:

from transformers import AutoConfig

config = AutoConfig.from_pretrained("tiny-random/glm-4v")
print(type(config))
Traceback (most recent call last):
  File "[sic]\demo_glm4v_config.py", line 4, in <module>
    config = AutoConfig.from_pretrained("tiny-random/glm-4v")
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "[sic]\src\transformers\models\auto\configuration_auto.py", line 1484, in from_pretrained
    return config_class.from_dict(config_dict, **unused_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "[sic]\src\transformers\configuration_utils.py", line 757, in from_dict
    config = cls(**config_dict)
             ^^^^^^^^^^^^^^^^^^
  File "[sic]\transformers\Lib\site-packages\huggingface_hub\dataclasses.py", line 280, in init_with_validate
    cls.validate(self)  # type: ignore [attr-defined]
    ^^^^^^^^^^^^^^^^^^
  File "[sic]\transformers\Lib\site-packages\huggingface_hub\dataclasses.py", line 255, in validate
    validator(self)
  File "[sic]\src\transformers\modeling_rope_utils.py", line 723, in validate_rope
    validation_fn(rope_parameters, ignore_keys=self.ignore_keys_at_rope_validation)
  File "[sic]\src\transformers\modeling_rope_utils.py", line 733, in _validate_default_rope_parameters
    self._check_received_keys(rope_type, received_keys, required_keys, ignore_keys=ignore_keys)
  File "[sic]\src\transformers\modeling_rope_utils.py", line 921, in _check_received_keys
    raise KeyError(f"Missing required keys in `rope_parameters` for 'rope_type'='{rope_type}': {missing_keys}")
KeyError: "Missing required keys in `rope_parameters` for 'rope_type'='default': {'rope_theta'}"

Note that the remote config uses the old-style top-level rope parameters, including rope_theta: https://huggingface.co/tiny-random/glm-4v/blob/main/config.json#L28-L37

  • Tom Aarsen
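The KeyError above boils down to a required-keys check on the nested `rope_parameters` dict; a minimal sketch of that kind of check (hypothetical helper, not transformers' `_check_received_keys`):

```python
def check_rope_keys(rope_parameters, required_keys, ignore_keys=frozenset()):
    # Every required key must be present in the dict (minus ignored keys);
    # hypothetical helper mirroring the check that raises above.
    received = set(rope_parameters) - set(ignore_keys)
    missing = set(required_keys) - received
    if missing:
        raise KeyError(f"Missing required keys in `rope_parameters`: {missing}")

# Old-style configs kept rope_theta at the top level, so the nested dict lacks it:
try:
    check_rope_keys({"rope_type": "default"}, {"rope_type", "rope_theta"})
except KeyError as exc:
    print(exc)
```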

BenjaminBossan added a commit to huggingface/peft that referenced this pull request Mar 27, 2026
Resolves the failing tests on transformers main branch.

After the change in
huggingface/transformers#41250, the
num_hidden_layers attribute is no longer part of the model config when
serialized to a dict. The _prepare_prompt_learning_config function was
using this attribute. Therefore, we now pass the config before
converting it into a dict and extract the attribute from it.
@imstevenpmwork
Contributor

imstevenpmwork commented Mar 27, 2026

Hey @zucchini-nlp 👋 Just a quick heads-up that the changes in this PR (shipped in the 5.4.0 release) introduced a breaking change for us. Specifically, the introduction of the @dataclass ... decorators to the PreTrainedConfig class.

Over at lerobot, we were decorating the classes that inherit from PreTrainedConfig with @dataclass, but now that PretrainedConfig.__init_subclass__ applies the decorator itself, the field(init=False) objects have already been consumed by the time of the second pass, so those fields appear as non-default arguments after the parent's default arguments. Just wanted to flag this to save some debugging time in case anyone else is suddenly staring at a red CI!

Maybe we can add it to the Release Notes: subclasses of PretrainedConfig should no longer be decorated with dataclass. -> My bad, it was already included in the release notes, thanks!

Cheers!
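The double-decoration failure described above can be reproduced with the stdlib alone (a minimal sketch, not the lerobot or transformers code):

```python
from dataclasses import dataclass, field

@dataclass
class Child:
    a: int = 1
    b: int = field(init=False)  # sentinel consumed by the first @dataclass pass

    def __post_init__(self):
        self.b = self.a * 2

# Re-applying @dataclass (as a parent __init_subclass__ might) no longer sees
# the field(init=False) sentinel: `b` now looks like a plain annotated field
# with no default, following the defaulted `a`.
try:
    dataclass(Child)
except TypeError as exc:
    print(type(exc).__name__)  # → TypeError
```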

@zucchini-nlp
Member Author

Interesting, I hadn't considered that users might already be using a dataclass. TIL that we have subclasses which are dataclasses and also some that are pydantic BaseModels

Yeah, I think it's best to remove the decorator. We've been making all subclasses dataclasses, since it makes little sense for them to be non-dataclass objects after this PR

nludd25 pushed a commit to nludd25/peft that referenced this pull request Mar 28, 2026
Resolves the failing tests on transformers main branch.

After the change in
huggingface/transformers#41250, the
num_hidden_layers attribute is no longer part of the model config when
serialized to a dict. The _prepare_prompt_learning_config function was
using this attribute. Therefore, we now pass the config before
converting it into a dict and extract the attribute from it.
HanFa added a commit to HanFa/vllm that referenced this pull request Mar 29, 2026
Vendor the HyperCLOVAX Vision config into vLLM to fix transformers v5
compatibility. The upstream remote code config does not handle empty
initialization (text_config=None), which breaks v5's @strict config
validation added in huggingface/transformers#41250.

Fixes: vllm-project#38387

TODO: Remove vendored config once HyperCLOVAX is upstreamed to
transformers. Tracking PR: huggingface/transformers#44956

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
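A hedged sketch of the kind of defensive defaulting the vendored config needs so that empty initialization survives strict validation (hypothetical classes, not vLLM's code):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TextConfig:
    hidden_size: int = 64

@dataclass
class VisionLMConfig:
    # Strict validation rejects text_config=None, so default it in __post_init__
    # (illustrative sketch of the fix, not the actual vendored config).
    text_config: Optional[TextConfig] = None

    def __post_init__(self):
        if self.text_config is None:
            self.text_config = TextConfig()
        elif isinstance(self.text_config, dict):
            self.text_config = TextConfig(**self.text_config)

cfg = VisionLMConfig()  # empty initialization now succeeds
print(cfg.text_config.hidden_size)  # → 64
```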
HanFa added a commit to HanFa/vllm that referenced this pull request Mar 31, 2026
Vendor the HyperCLOVAX Vision config into vLLM to fix transformers v5
compatibility. The upstream remote code config does not handle empty
initialization (text_config=None), which breaks v5's @strict config
validation added in huggingface/transformers#41250.

Fixes: vllm-project#38387

TODO: Remove vendored config once HyperCLOVAX is upstreamed to
transformers. Tracking PR: huggingface/transformers#44956

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Fang Han <fhan0520@gmail.com>