fix(rope): read original_max_position_embeddings from yarn validator's argument#45887
Merged
zucchini-nlp merged 4 commits intoMay 12, 2026
Merged
Conversation
…s argument
`_validate_yarn_rope_parameters` is called by `validate_rope` once per
per-attention-type sub-dict, with the sub-dict passed as the `rope_parameters`
argument. The `factor` consistency check inside the function however reads
`original_max_position_embeddings` from `self.rope_parameters[...]` instead
of from the argument, which raises `KeyError` for any config that keeps the
nested `{full_attention, sliding_attention, ...}` shape — the per-type
sub-dicts are inside one of those keys, not at the top level.
Other rope validators in the same file (`_validate_default_rope_parameters`,
`_validate_linear_rope_parameters`, etc.) all read from the function argument,
so this matches their pattern.
3612aa3 to
fa4ef81
Compare
Contributor
Author
|
@zucchini-nlp Thanks for the review! Test added and no, a human :) I used Claude Code only as a drafting aid. |
Comment on lines
+139
to
+153
| def test_yarn_validation_with_per_attention_type_nested_rope(self): | ||
| """A yarn entry inside nested per-attention-type `rope_parameters` validates cleanly.""" | ||
| config = LlamaConfig() | ||
| config.layer_types = ["full_attention", "sliding_attention"] | ||
| config.rope_parameters = { | ||
| "full_attention": { | ||
| "rope_type": "yarn", | ||
| "rope_theta": 10000.0, | ||
| "factor": 2.0, | ||
| "original_max_position_embeddings": int(config.max_position_embeddings / 2.0), | ||
| }, | ||
| "sliding_attention": {"rope_type": "default", "rope_theta": 10000.0}, | ||
| } | ||
| config.validate_rope() | ||
|
|
Member
There was a problem hiding this comment.
Niiice, lets extend it to all rope types. Above we have a test for validation without layer types, so we can mimic but this time add config.layer_types
Contributor
Author
There was a problem hiding this comment.
Extended in 4a08efc, mirroring test_rope_validation's two loops (missing-key and exclusive-param) with config.layer_types set.
fa4ef81 to
1654714
Compare
1654714 to
4a08efc
Compare
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
jp1924
pushed a commit
to jp1924/transformers
that referenced
this pull request
May 18, 2026
…s argument (huggingface#45887) * fix(rope): read original_max_position_embeddings from yarn validator's argument `_validate_yarn_rope_parameters` is called by `validate_rope` once per per-attention-type sub-dict, with the sub-dict passed as the `rope_parameters` argument. The `factor` consistency check inside the function however reads `original_max_position_embeddings` from `self.rope_parameters[...]` instead of from the argument, which raises `KeyError` for any config that keeps the nested `{full_attention, sliding_attention, ...}` shape — the per-type sub-dicts are inside one of those keys, not at the top level. Other rope validators in the same file (`_validate_default_rope_parameters`, `_validate_linear_rope_parameters`, etc.) all read from the function argument, so this matches their pattern. * test(rope): mirror test_rope_validation for per-attention-type nested rope_parameters * test(rope): apply ruff format to nested-rope test --------- Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What does this PR do?
_validate_yarn_rope_parametersis called byvalidate_ropeonce per per-attention-type sub-dict, with the sub-dict passed as therope_parametersargument. Thefactorconsistency check inside the function however readsoriginal_max_position_embeddingsfromself.rope_parameters[...]instead of from the argument:This raises
KeyErrorfor any config that keeps the nested{full_attention: ..., sliding_attention: ...}shape — the per-type sub-dict containingoriginal_max_position_embeddingsis inside one of those top-level keys, not at the top level.The fix is to read from the function argument that
validate_ropealready populates correctly.Why no in-tree model hits this today
Searched
src/transformers/models/*/configuration_*.pyfor the nested shape (grep -l '"full_attention":'):None of them combine the nested shape with
rope_type=yarn, so the bug stays dormant inside the repo. It surfaces for downstream models that do — e.g. one I encountered usesyarnforfull_attentionlayers anddefaultforsliding_attentionto apply YaRN only to global-attention layers while keeping unscaled RoPE on sliding-attention layers (Gemma3-style split with the YaRN twist).Reproducer
Before the fix:
After the fix:
validate_ropereturns cleanly (only the existing factor-mismatch info-warning fires, which is unrelated and preserved).Changes
src/transformers/modeling_rope_utils.py: 1-character change —self.rope_parameters→rope_parametersinside_validate_yarn_rope_parameters. All sibling validators in the same file (_validate_default_rope_parameters,_validate_linear_rope_parameters,_validate_dynamic_rope_parameters,_validate_longrope_rope_parameters,_validate_llama3_rope_parameters) already read from the argument, so this brings yarn in line.Tests
No new test added — the reproducer requires constructing a custom config with the nested shape, and there is no existing test fixture in
tests/utils/test_modeling_rope_utils.pythat exercises that path (no in-tree model uses nested + yarn). Happy to add a test if you'd prefer; let me know which directory you'd want it in (tests/utils/or alongside one of the gemma3/modernbert configs that use the nested shape).I ran the reproducer above against
main(KeyError) and against this branch (clean) to confirm.AI assistance disclosure
I used Claude Code to help draft this PR and the reproducer. I diagnosed the bug myself from a downstream model load failure, verified the fix in-place with the reproducer above, and reviewed the change line-by-line.
Who can review?
@Cyrilvallez @ArthurZucker — RoPE / config validation