fix(rope): read original_max_position_embeddings from yarn validator's argument by bzantium · Pull Request #45887 · huggingface/transformers

bzantium · 2026-05-11T08:07:56Z

What does this PR do?

_validate_yarn_rope_parameters is called by validate_rope once per per-attention-type sub-dict, with the sub-dict passed as the rope_parameters argument. The factor consistency check inside the function, however, reads original_max_position_embeddings from self.rope_parameters[...] instead of from the argument:

def _validate_yarn_rope_parameters(self, rope_parameters: dict, ignore_keys=None):
    ...
    original_max_position_embeddings = self.rope_parameters["original_max_position_embeddings"]
    #                                  ^^^^^^^^^^^^^^^^^^^^
    # Full nested dict here, not the per-type sub-dict.

This raises KeyError for any config that keeps the nested {full_attention: ..., sliding_attention: ...} shape — the per-type sub-dict containing original_max_position_embeddings is inside one of those top-level keys, not at the top level.

The fix is to read from the function argument that validate_rope already populates correctly.
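
The failure mode can be reproduced outside transformers with a toy class. This is only a sketch: ToyRopeConfig, BuggyToyRopeConfig, and _validate_yarn are hypothetical names for illustration, and only the nested dict shape and the self-versus-argument read mirror the description above.

```python
class ToyRopeConfig:
    """Toy stand-in for a config class; not the transformers implementation."""

    def __init__(self, rope_parameters):
        self.rope_parameters = rope_parameters

    def validate_rope(self):
        # Call the yarn validator once per per-attention-type sub-dict.
        for sub_dict in self.rope_parameters.values():
            if sub_dict.get("rope_type") == "yarn":
                self._validate_yarn(sub_dict)

    def _validate_yarn(self, rope_parameters):
        # Fixed pattern: read from the argument (the per-type sub-dict).
        return rope_parameters["original_max_position_embeddings"]


class BuggyToyRopeConfig(ToyRopeConfig):
    def _validate_yarn(self, rope_parameters):
        # Buggy pattern: read from the full nested dict stored on self,
        # whose top-level keys are the attention types.
        return self.rope_parameters["original_max_position_embeddings"]


nested = {
    "full_attention": {
        "rope_type": "yarn",
        "factor": 40.0,
        "original_max_position_embeddings": 4096,
    },
    "sliding_attention": {"rope_type": "default"},
}

ToyRopeConfig(nested).validate_rope()  # reads from the sub-dict, no error

try:
    BuggyToyRopeConfig(nested).validate_rope()
except KeyError as err:
    print(f"KeyError: {err}")
```

The buggy variant fails because the top level of the nested dict only has the keys full_attention and sliding_attention.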

Why no in-tree model hits this today

Searched src/transformers/models/*/configuration_*.py for the nested shape (grep -l '"full_attention":'):

gemma3, gemma3n, gemma4, laguna, modernbert, modernbert_decoder, t5gemma2

None of them combine the nested shape with rope_type=yarn, so the bug stays dormant inside the repo. It surfaces for downstream models that do — e.g. one I encountered uses yarn for full_attention layers and default for sliding_attention to apply YaRN only to global-attention layers while keeping unscaled RoPE on sliding-attention layers (Gemma3-style split with the YaRN twist).
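
The grep search above can be approximated in Python. This is a sketch under the assumption of the same layout (one sub-directory per model containing configuration_*.py files); find_nested_rope_configs is a hypothetical helper, not part of the repository, and the demo files are made up.

```python
import tempfile
from pathlib import Path


def find_nested_rope_configs(models_dir, needle='"full_attention":'):
    """Return model directory names whose configuration files contain `needle`."""
    hits = set()
    for path in Path(models_dir).glob("*/configuration_*.py"):
        if needle in path.read_text(encoding="utf-8", errors="ignore"):
            hits.add(path.parent.name)
    return sorted(hits)


# Demo on a throwaway directory tree with hypothetical file contents.
with tempfile.TemporaryDirectory() as tmp:
    for name, body in [
        ("model_a", 'rope = {"full_attention": {}, "sliding_attention": {}}'),
        ("model_b", 'rope = {"rope_type": "default"}'),
    ]:
        model_dir = Path(tmp) / name
        model_dir.mkdir()
        (model_dir / f"configuration_{name}.py").write_text(body, encoding="utf-8")
    demo_hits = find_nested_rope_configs(tmp)

print(demo_hits)
```

Against a checkout, the call would be find_nested_rope_configs("src/transformers/models"). Like the grep, this only finds the nested shape; checking for rope_type=yarn in the same file remains a manual step.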

Reproducer

from transformers import PreTrainedConfig


class _NestedRopeConfig(PreTrainedConfig):
    model_type = "_repro"

    def __init__(self, **kwargs):
        self.layer_types = ["full_attention", "sliding_attention"]
        self.num_hidden_layers = 2
        self.max_position_embeddings = 35000
        self.head_dim = 128
        self.hidden_size = 1280
        self.num_attention_heads = 32
        nested = {
            "full_attention": {
                "rope_type": "yarn",
                "rope_theta": 10000.0,
                "factor": 40.0,
                "original_max_position_embeddings": 4096,
            },
            "sliding_attention": {
                "rope_type": "default",
                "rope_theta": 10000.0,
            },
        }
        self.rope_parameters = nested
        # Snapshot before super().__init__ so convert_rope_params_to_dict
        # cannot pollute the top level with a `rope_theta` sibling key.
        snapshot = {k: dict(v) for k, v in nested.items()}
        super().__init__(**kwargs)
        self.rope_parameters = snapshot


_NestedRopeConfig().validate_rope()

Before the fix:

KeyError: 'original_max_position_embeddings'
  at src/transformers/modeling_rope_utils.py:879

After the fix: validate_rope returns cleanly (only the existing factor-mismatch info-warning fires, which is unrelated and preserved).
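
A little arithmetic shows why the factor-mismatch warning still fires for the reproducer. The sketch assumes the consistency check compares factor against max_position_embeddings / original_max_position_embeddings; implied_yarn_factor is a hypothetical helper, and the actual comparison in modeling_rope_utils.py may differ in detail.

```python
def implied_yarn_factor(max_position_embeddings, original_max_position_embeddings):
    """Ratio of post-scaling to pre-scaling context length (assumed check)."""
    return max_position_embeddings / original_max_position_embeddings


# Reproducer values: 35000 / 4096 is roughly 8.54, but factor is 40.0.
repro_implied = implied_yarn_factor(35000, 4096)
repro_mismatch = repro_implied != 40.0

# Illustrative consistent values (not from the PR): 8192 / 4096 == 2.0.
consistent_mismatch = implied_yarn_factor(8192, 4096) != 2.0

print(repro_implied, repro_mismatch, consistent_mismatch)
```

Under this assumption the reproducer's values disagree, so a warning is expected, while a config whose factor equals the ratio of the two context lengths validates silently.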

Changes

src/transformers/modeling_rope_utils.py: one-token change, self.rope_parameters → rope_parameters, inside _validate_yarn_rope_parameters. All sibling validators in the same file (_validate_default_rope_parameters, _validate_linear_rope_parameters, _validate_dynamic_rope_parameters, _validate_longrope_rope_parameters, _validate_llama3_rope_parameters) already read from the argument, so this brings yarn in line.

Tests

No new test added — the reproducer requires constructing a custom config with the nested shape, and there is no existing test fixture in tests/utils/test_modeling_rope_utils.py that exercises that path (no in-tree model uses nested + yarn). Happy to add a test if you'd prefer; let me know which directory you'd want it in (tests/utils/ or alongside one of the gemma3/modernbert configs that use the nested shape).

I ran the reproducer above against main (KeyError) and against this branch (clean) to confirm.

AI assistance disclosure

I used Claude Code to help draft this PR and the reproducer. I diagnosed the bug myself from a downstream model load failure, verified the fix in-place with the reproducer above, and reviewed the change line-by-line.

Who can review?

@Cyrilvallez @ArthurZucker — RoPE / config validation

zucchini-nlp

Good catch, can you adjust tests in tests/utils/test_modeling_rope_utils (if you aren't an AI code agent)?

bzantium · 2026-05-11T13:05:41Z

@zucchini-nlp Thanks for the review! Test added and no, a human :) I used Claude Code only as a drafting aid.

zucchini-nlp · 2026-05-11T15:48:35Z

+    def test_yarn_validation_with_per_attention_type_nested_rope(self):
+        """A yarn entry inside nested per-attention-type `rope_parameters` validates cleanly."""
+        config = LlamaConfig()
+        config.layer_types = ["full_attention", "sliding_attention"]
+        config.rope_parameters = {
+            "full_attention": {
+                "rope_type": "yarn",
+                "rope_theta": 10000.0,
+                "factor": 2.0,
+                "original_max_position_embeddings": int(config.max_position_embeddings / 2.0),
+            },
+            "sliding_attention": {"rope_type": "default", "rope_theta": 10000.0},
+        }
+        config.validate_rope()
+


Niiice, let's extend it to all rope types. Above we have a test for validation without layer types, so we can mimic it, but this time add config.layer_types

Extended in 4a08efc, mirroring test_rope_validation's two loops (missing-key and exclusive-param) with config.layer_types set.

zucchini-nlp

Thank you!

HuggingFaceDocBuilderDev · 2026-05-12T07:55:18Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

…s argument (huggingface#45887)

* fix(rope): read original_max_position_embeddings from yarn validator's argument

  `_validate_yarn_rope_parameters` is called by `validate_rope` once per per-attention-type sub-dict, with the sub-dict passed as the `rope_parameters` argument. The `factor` consistency check inside the function, however, reads `original_max_position_embeddings` from `self.rope_parameters[...]` instead of from the argument, which raises `KeyError` for any config that keeps the nested `{full_attention, sliding_attention, ...}` shape; the per-type sub-dicts are inside one of those keys, not at the top level. Other rope validators in the same file (`_validate_default_rope_parameters`, `_validate_linear_rope_parameters`, etc.) all read from the function argument, so this matches their pattern.

* test(rope): mirror test_rope_validation for per-attention-type nested rope_parameters

* test(rope): apply ruff format to nested-rope test

---------

Co-authored-by: Raushan Turganbay <raushan@huggingface.co>

zucchini-nlp reviewed May 11, 2026

bzantium force-pushed the fix-yarn-validator-nested-rope branch from 3612aa3 to fa4ef81 on May 11, 2026 13:05

zucchini-nlp reviewed May 11, 2026

bzantium force-pushed the fix-yarn-validator-nested-rope branch from fa4ef81 to 1654714 on May 12, 2026 05:54

test(rope): mirror test_rope_validation for per-attention-type nested rope_parameters

4a08efc

bzantium force-pushed the fix-yarn-validator-nested-rope branch from 1654714 to 4a08efc on May 12, 2026 05:54

test(rope): apply ruff format to nested-rope test

5bd9811

zucchini-nlp approved these changes May 12, 2026

Merge branch 'main' into fix-yarn-validator-nested-rope

4751236

zucchini-nlp added this pull request to the merge queue May 12, 2026

Merged via the queue into huggingface:main with commit dfcb1a3 May 12, 2026
29 checks passed

bzantium deleted the fix-yarn-validator-nested-rope branch May 14, 2026 00:55

Xarbirus mentioned this pull request Jun 1, 2026

model: add Mellum architecture ggml-org/llama.cpp#23966

Merged

zucchini-nlp merged 4 commits into huggingface:main from bzantium:fix-yarn-validator-nested-rope
