🚨 Validate config attributes #41250
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Blocked by #41541 (comment) for now
Time to revive this branch
Nice, much better and easier to maintain BC with remote code now!
```python
# Keys are always strings in JSON so convert ids to int here for id2label and pruned_heads
if self.id2label is None:
    self._create_id_label_maps(kwargs.get("num_labels", 2))
else:
    if kwargs.get("num_labels") is not None and len(self.id2label) != kwargs.get("num_labels"):
        logger.warning(
            f"You passed `num_labels={kwargs.get('num_labels')}` which is incompatible to "
            f"the `id2label` map of length `{len(self.id2label)}`."
        )
    self.id2label = {int(key): value for key, value in self.id2label.items()}
```
Is it a good time to get rid of these general attributes and only have them for models that actually require them?
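The JSON round trip that motivates the int-cast in the diff above can be reproduced in a few lines (a standalone sketch; the `id2label` contents are made up):

```python
import json

# JSON object keys are always strings, so an int-keyed id2label map
# comes back with str keys after serialization + deserialization.
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
roundtripped = json.loads(json.dumps(id2label))
assert list(roundtripped) == ["0", "1"]  # keys became strings

# The fix mirrored in the diff: cast keys back to int on load.
restored = {int(k): v for k, v in roundtripped.items()}
assert restored == id2label
```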
…GLM-5 nightly tests

Transformers PR huggingface/transformers#41250 (merged Mar 16) converts PretrainedConfig subclasses to @dataclass via __init_subclass__, which breaks sglang's DeepseekVL2Config (non-default field ordering) and prevents the server from starting at all.

Remove `pip install git+https://github.com/huggingface/transformers.git` from all Qwen 3.5 and GLM-5 CI jobs (MI30x, MI35x, ROCm 7.0 and 7.2). Use the stable transformers shipped in the docker image instead, matching all other nightly jobs (Grok2, DeepSeek-V3.2, etc.). Keep mistral-common and lm-eval[api] for the Qwen 3.5 tests that need them.
- Remove torch_dtype="auto" from docs (now default)
- Simplify modular_sarvam_mla.py to only override defaults that differ from DeepseekV3Config (no __init__, no workarounds)
- Add @strict(accept_kwargs=True) for config validation (huggingface#41250)
- Regenerate configuration_sarvam_mla.py with dataclass fields and __post_init__ pattern
- Hub config.json changes needed: remove head_dim/q_head_dim, change rope_scaling.type to "yarn", update architectures

Made-with: Cursor
Resolves the failing tests on transformers main branch. After the change in huggingface/transformers#41250, the num_hidden_layers attribute is no longer part of the model config when serialized to a dict. The _prepare_prompt_learning_config function was using this attribute. Therefore, we now pass the config before converting it into a dict and extract the attribute from it.
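A minimal sketch of that workaround, with a stand-in `DummyConfig` and a simplified `prepare_prompt_learning_config` (both names illustrative, not the actual peft code):

```python
# Hypothetical sketch of the fix described above: read num_hidden_layers
# off the config *object* before serializing it, since to_dict() may no
# longer include that attribute after transformers#41250.
class DummyConfig:
    num_hidden_layers = 12

    def to_dict(self):
        return {"hidden_size": 64}  # num_hidden_layers no longer serialized

def prepare_prompt_learning_config(model_config):
    # Extract the attribute from the config object first ...
    num_layers = getattr(model_config, "num_hidden_layers", None)
    # ... then convert to a dict and re-attach it if needed.
    config_dict = model_config.to_dict()
    if num_layers is not None:
        config_dict["num_hidden_layers"] = num_layers
    return config_dict

result = prepare_prompt_learning_config(DummyConfig())
print(result)  # includes num_hidden_layers again
```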
Transformers PR huggingface/transformers#41250 (merged Mar 16) converts PretrainedConfig subclasses to @DataClass via __init_subclass__, which breaks sglang's DeepseekVL2Config and prevents the server from starting. For Qwen 3.5: remove git+transformers entirely — stable version in the docker image is sufficient (verified passing). For GLM-5: pin to commit 96f807a33b75 (last commit before the breaking change) since GLM-5 needs the glm_moe_dsa model type which is only in the transformers dev branch, not in stable releases yet.
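The field-ordering failure described here can be reproduced with a toy version of the `__init_subclass__` conversion (an assumed simplification, not the transformers implementation):

```python
import dataclasses

class AutoDataclassBase:
    # Assumed simplification of the pattern described above: every
    # subclass is converted to a dataclass automatically at definition time.
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        dataclasses.dataclass(cls)  # convert in place

class GoodConfig(AutoDataclassBase):
    hidden_size: int = 64
    num_layers: int = 2

cfg = GoodConfig(hidden_size=128)  # generated __init__, defaults intact

# A subclass that declares a non-default field after a default one now
# fails at class-definition time -- the DeepseekVL2Config-style breakage.
err = None
try:
    class BadConfig(AutoDataclassBase):
        a: int = 1
        b: int  # non-default after default -> TypeError
except TypeError as exc:
    err = exc
print(type(err).__name__)
```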
I'm getting failures here due to this PR:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("tiny-random/glm-4v")
print(type(config))
```

Note that the remote config uses the old-style top-level rope parameters, including rope_theta: https://huggingface.co/tiny-random/glm-4v/blob/main/config.json#L28-L37
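To illustrate the shape mismatch, here is a hypothetical `normalize_rope` helper (not a transformers API; the key names are assumptions for illustration) that folds an old-style top-level `rope_theta` into a nested `rope_scaling` dict:

```python
# Hypothetical illustration only: a remote config with old-style top-level
# rope fields vs. the nested layout newer validation code expects.
def normalize_rope(config_dict):
    cfg = dict(config_dict)
    scaling = dict(cfg.get("rope_scaling") or {})
    if "rope_theta" in cfg and "rope_theta" not in scaling:
        # move the top-level field under the nested dict
        scaling["rope_theta"] = cfg.pop("rope_theta")
    if scaling:
        cfg["rope_scaling"] = scaling
    return cfg

old_style = {"hidden_size": 64, "rope_theta": 10000.0}
print(normalize_rope(old_style))
```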
Hey @zucchini-nlp 👋 Just a quick heads-up that the changes in this PR (shipped in the 5.4.0 release) introduced a breaking change for us. Specifically, the introduction of the automatic dataclass conversion. Over at lerobot, we were decorating the classes that inherit from PreTrainedConfig with @dataclass ourselves, and that now breaks.

Cheers!
Interesting, I hadn't thought that users might already be using a dataclass. TIL that we have subclasses which are dataclasses and also some that are pydantic BaseModel subclasses. Yeah, I think it is best to remove the decorator. We've been making all subclasses a dataclass via __init_subclass__ anyway.
Vendor the HyperCLOVAX Vision config into vLLM to fix transformers v5 compatibility. The upstream remote code config does not handle empty initialization (text_config=None), which breaks v5's @strict config validation added in huggingface/transformers#41250.

Fixes: vllm-project#38387
TODO: Remove vendored config once HyperCLOVAX is upstreamed to transformers. Tracking PR: huggingface/transformers#44956

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Fang Han <fhan0520@gmail.com>
What does this PR do?
As per title. Continues from #40793 and supersedes #36534
NOTE: config classes can't accept positional args anymore! I don't think anyone would use positional args anyway, but marking the PR as breaking.
Note
High Risk
Refactors `PreTrainedConfig` and many model config classes to `@dataclass` + huggingface_hub `@strict` validation, which can change initialization/serialization behavior and reject previously-accepted configs. Also enforces save-time validation and updates defaults/deprecations (e.g., `use_return_dict`), risking backward compatibility across model loading and downstream integrations.

Overview
Adds strict config validation. `PreTrainedConfig` is converted to a `@dataclass` with huggingface_hub's `@strict`, introduces built-in validators (architecture consistency, special token id ranges, layer type checks, `output_attentions` vs `attn_implementation`), and runs `validate()` automatically on `save_pretrained`.

Modernizes and standardizes model configs. Many model configuration classes are migrated from custom `__init__` logic to dataclass fields + `__post_init__`, moving compatibility logic (e.g., defaulting sub-configs, key/value casting for JSON) into post-init and adding model-specific `validate_architecture` where needed.

API/behavior tweaks. Deprecates `use_return_dict` in favor of `return_dict` (and updates multiple model forward paths accordingly), adjusts RoPE validation ignore-key handling, narrows AutoTokenizer fallback exception handling, and bumps the minimum huggingface-hub requirement to `>=1.5.0`.

Written by Cursor Bugbot for commit 07095f3. This will update automatically on new commits.
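The special-token-id range check mentioned in the summary can be sketched with a plain dataclass and `__post_init__` (a hypothetical stand-in, not the huggingface_hub `@strict` API):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TinyConfig:
    vocab_size: int = 100
    pad_token_id: Optional[int] = None

    def __post_init__(self):
        # Stand-in validator: special token ids must fall inside the vocabulary.
        if self.pad_token_id is not None and not (0 <= self.pad_token_id < self.vocab_size):
            raise ValueError(
                f"pad_token_id={self.pad_token_id} is out of range "
                f"for vocab_size={self.vocab_size}"
            )

ok = TinyConfig(pad_token_id=3)    # accepted
err = None
try:
    TinyConfig(pad_token_id=500)   # rejected at construction time
except ValueError as exc:
    err = exc
print(ok.pad_token_id, type(err).__name__)
```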