
Fix vllm cis #45139

Merged
ArthurZucker merged 29 commits into main from fix-vllm-cis
Apr 8, 2026

Conversation

@ArthurZucker
Collaborator

What does this PR do?

More fixes

This is very important: most embeddings are tied, and they are huge parameters (vocabularies are often 256k entries), so running inits on them is very costly.

Before:
for tied_param in self.all_tied_weights_keys.keys():
After:
for tied_param in getattr(self, "all_tied_weights_keys", {}).keys():
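As a standalone illustration of the `getattr` fallback (the class names below are hypothetical stand-ins, not the actual transformers classes):

```python
class LegacyRemoteModel:
    """Hypothetical remote-code model class that predates all_tied_weights_keys."""


class CurrentModel:
    """Hypothetical current model exposing the tied-weights mapping."""

    all_tied_weights_keys = {"lm_head.weight": "model.embed_tokens.weight"}


def tied_param_names(model):
    # getattr with a {} fallback keeps older remote-code classes working
    # instead of raising AttributeError on the missing attribute.
    return list(getattr(model, "all_tied_weights_keys", {}).keys())


print(tied_param_names(CurrentModel()))       # ['lm_head.weight']
print(tied_param_names(LegacyRemoteModel()))  # []
```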
Collaborator Author

fix remote code

Before:
token_value = self._special_tokens_map.get(key_without_id)
After:
# Use __dict__.get to avoid recursive __getattr__ when _special_tokens_map
# is not yet initialized (e.g. during fast tokenizer __init__)
token_value = self.__dict__.get("_special_tokens_map", {}).get(key_without_id)
Collaborator Author

same easy fix for remote code
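The recursion hazard is easy to reproduce in isolation; `MiniTokenizer` is a hypothetical sketch of the lookup pattern, not the real tokenizer class:

```python
class MiniTokenizer:
    """Hypothetical sketch of the special-token lookup; not the real tokenizer."""

    def __init__(self):
        self._special_tokens_map = {"bos_token": "<s>"}

    def __getattr__(self, name):
        # Reading self._special_tokens_map here would re-enter __getattr__
        # (and recurse forever) whenever the map is not yet set, e.g. on a
        # half-initialized instance. __dict__.get sidesteps that.
        token_value = self.__dict__.get("_special_tokens_map", {}).get(name)
        if token_value is not None:
            return token_value
        raise AttributeError(name)


tok = MiniTokenizer()
print(tok.bos_token)  # <s>

# A half-initialized instance fails cleanly instead of hitting RecursionError:
half_built = MiniTokenizer.__new__(MiniTokenizer)
try:
    half_built.bos_token
except AttributeError:
    print("AttributeError, not RecursionError")
```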

Comment thread src/transformers/modeling_rope_utils.py Outdated
# from the model config. You can append new {'rope_type': callable} pairs to this rope_parameters to enable custom RoPE
# parameterizations, as long as the callable has the same signature.
ROPE_INIT_FUNCTIONS = {
"default": _compute_default_rope_parameters,
Collaborator Author

remote code BC?

Contributor

cc @zucchini-nlp

It is true that remote code won't have it, but we would likely also need to refactor a lot of models, which seems risky; especially for models that do have a different default init, we need to check whether some code exists first and then use it as a fallback.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ArthurZucker ArthurZucker marked this pull request as ready for review April 2, 2026 15:15
@ArthurZucker
Collaborator Author

run-slow: aimv2, altclip, audioflamingo3, bart, beit, bert, bert_generation, big_bird, bigbird_pegasus, biogpt, blenderbot, blenderbot_small, blip, bridgetower, camembert, clip

@github-actions
Contributor

github-actions bot commented Apr 2, 2026

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/aimv2", "models/altclip", "models/audioflamingo3", "models/bart", "models/beit", "models/bert", "models/bert_generation", "models/big_bird", "models/bigbird_pegasus", "models/biogpt", "models/blenderbot", "models/blenderbot_small", "models/blip", "models/bridgetower", "models/camembert", "models/clip"]
quantizations: []

Contributor

@vasqu vasqu left a comment

LGTM, mostly nits except for two:

  • Whisper/AudioFlamingo seems weird to me re consuming bsz / leaving bsz optional
  • RoPE default init --> does this still call e.g. the Ernie VL default compute? Would likely need to check what happens in def _init_weights here

Comment thread src/transformers/models/audioflamingo3/modeling_audioflamingo3.py Outdated
Comment thread src/transformers/models/audioflamingo3/modeling_audioflamingo3.py
Comment thread src/transformers/models/dinat/modeling_dinat.py
Comment thread src/transformers/models/emu3/modeling_emu3.py Outdated
Comment thread src/transformers/models/eomt/modeling_eomt.py Outdated
Comment thread src/transformers/models/vjepa2/modeling_vjepa2.py Outdated
Comment thread src/transformers/models/whisper/modeling_whisper.py Outdated
Comment thread src/transformers/modeling_rope_utils.py Outdated

def _validate_default_rope_parameters(self, rope_parameters: dict, ignore_keys: set | None = None):
Before:
required_keys = {"rope_type", "rope_theta"}
After:
required_keys = {"rope_type"}
optional_keys = {"rope_theta"}
Contributor

Likely needs to be done elsewhere too, then? I think they all need rope_theta?
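A minimal sketch of what required-vs-optional validation with a theta fallback could look like (hypothetical helper, assuming the 10,000.0 default from RotaryEmbeddingConfigMixin; not the actual implementation):

```python
import logging

logger = logging.getLogger("rope_validation_sketch")

DEFAULT_THETA = 10_000.0  # assumed default, mirroring default_theta in the mixin


def validate_default_rope(rope_parameters: dict) -> float:
    required_keys = {"rope_type"}
    optional_keys = {"rope_theta"}

    missing = required_keys - rope_parameters.keys()
    if missing:
        raise KeyError(f"Missing required RoPE keys: {sorted(missing)}")

    unknown = rope_parameters.keys() - required_keys - optional_keys
    if unknown:
        # Unknown keys warn rather than raise, so loose configs keep loading.
        logger.warning("Unrecognized RoPE keys: %s", sorted(unknown))

    # rope_theta is optional: fall back to the default when it is omitted.
    return rope_parameters.get("rope_theta", DEFAULT_THETA)


print(validate_default_rope({"rope_type": "default"}))                          # 10000.0
print(validate_default_rope({"rope_type": "default", "rope_theta": 500000.0}))  # 500000.0
```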

class ConfigSubclassKwOnlyTest(unittest.TestCase):
"""Test that config subclasses with non-default fields following parent default fields
no longer raise TypeError (fixed by kw_only=True in __init_subclass__). Regression
test for https://github.com/huggingface/transformers/issues/XXXX."""
Contributor

Is there an issue we could link, or an example?

@github-actions
Contributor

github-actions bot commented Apr 2, 2026

CI Results

Workflow Run ⚙️

Commit Info

Context Commit Description
RUN 3c8d59b1 workflow commit (merge commit)
PR 08893759 branch commit (from PR)
main a594e09e base commit (on main)

✅ No failing test specific to this PR 🎉 👏 !

Comment thread tests/utils/test_modeling_rope_utils.py Outdated
Comment on lines +526 to +529
config = LlamaConfig()
config.rope_parameters = {"rope_type": "default", "rope_theta": 10000.0, "unknown_param": 1}
with self.assertLogs("transformers.modeling_rope_utils", level="WARNING"):
config.validate_rope()
Collaborator Author

cc @zucchini-nlp IDK if that's okay with you?

Member

Yep, tbh I thought it was already a warning instead of an error in the past. Maybe something changed recently.
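The warn-instead-of-raise behaviour is straightforward to pin down with `assertLogs`; the `validate_rope` below is a hypothetical stand-in for the real validator:

```python
import logging
import unittest

logger = logging.getLogger("transformers.modeling_rope_utils")


def validate_rope(rope_parameters: dict) -> None:
    # Hypothetical stand-in: unknown keys emit a warning instead of raising.
    known_keys = {"rope_type", "rope_theta"}
    unknown = rope_parameters.keys() - known_keys
    if unknown:
        logger.warning("Unrecognized RoPE parameters: %s", sorted(unknown))


class RopeWarningTest(unittest.TestCase):
    def test_unknown_param_warns(self):
        params = {"rope_type": "default", "rope_theta": 10000.0, "unknown_param": 1}
        # assertLogs fails the test if no WARNING record is emitted.
        with self.assertLogs("transformers.modeling_rope_utils", level="WARNING") as cm:
            validate_rope(params)
        self.assertIn("unknown_param", cm.output[0])


suite = unittest.defaultTestLoader.loadTestsFromTestCase(RopeWarningTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```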

@ArthurZucker
Collaborator Author

run-slow: aimv2, altclip, audioflamingo3, bart, beit, bert, bert_generation, big_bird, bigbird_pegasus, biogpt, blenderbot, blenderbot_small, blip, bridgetower, camembert, clip

@github-actions
Contributor

github-actions bot commented Apr 7, 2026

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/aimv2", "models/altclip", "models/audioflamingo3", "models/bart", "models/beit", "models/bert", "models/bert_generation", "models/big_bird", "models/bigbird_pegasus", "models/biogpt", "models/blenderbot", "models/blenderbot_small", "models/blip", "models/bridgetower", "models/camembert", "models/clip"]
quantizations: []

Comment on lines +804 to +805
required_keys = {"rope_type"}
optional_keys = {"rope_theta"}
Member

yep, this kinda defeats the point of validation because a RoPE dict with no theta isn't valid for our modules

Collaborator Author

Yeah, but we always default to default_theta if it's not there, no?

Member

Validation always happens after the defaults are set, so ideally it shouldn't raise an error. Do we know why the theta was missing?

Comment on lines +42 to +43
if rope_type == "default":
continue # "default" is always valid with just rope_theta
Member

default isn't in all_rope_types anyway, no?

@github-actions
Contributor

github-actions bot commented Apr 7, 2026

CI Results

Workflow Run ⚙️

Commit Info

Context Commit Description
RUN 5a8a8309 workflow commit (merge commit)
PR f924677b branch commit (from PR)
main 1897dd05 base commit (on main)

✅ No failing test specific to this PR 🎉 👏 !

ArthurZucker and others added 3 commits April 8, 2026 07:56
- Add _compute_default_rope_parameters and register as ROPE_INIT_FUNCTIONS["default"]
- Accept ignore_keys kwarg in validate_rope for backward compat with vllm
- Remove RopeDefaultTypeTest class (redundant with existing tests)
- Fix test_rope_validation to skip "default" rope type in KeyError checks

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
docs: mark rope_theta as optional in RopeParameters and compute function docstrings

rope_theta defaults to default_theta when omitted from serialized configs.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@github-actions
Contributor

github-actions bot commented Apr 8, 2026

[For maintainers] Suggested jobs to run (before merge)

run-slow: aimv2, altclip, audioflamingo3, bart, beit, bert, bert_generation, big_bird, bigbird_pegasus, biogpt, blenderbot, blenderbot_small, blip, bridgetower, camembert, clip

Collaborator Author

@ArthurZucker ArthurZucker left a comment

Checked:

class RotaryEmbeddingConfigMixin:
    """
    A Mixin containing the functionality to standardize and validate RoPE parameters.
    """

    default_theta = 10_000.0

Fair to say it's optional.
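In other words, the fallback can be sketched like this (`get_rope_theta` is a hypothetical helper for illustration, not a method of the real mixin):

```python
class RotaryEmbeddingConfigMixinSketch:
    """Hypothetical sketch of the optional-theta behaviour."""

    default_theta = 10_000.0

    def get_rope_theta(self) -> float:
        # rope_theta may be omitted from serialized configs; fall back to
        # the class-level default when it is missing.
        rope_parameters = getattr(self, "rope_parameters", None) or {}
        return rope_parameters.get("rope_theta", self.default_theta)


class TinyConfig(RotaryEmbeddingConfigMixinSketch):
    def __init__(self, rope_parameters=None):
        self.rope_parameters = rope_parameters


print(TinyConfig().get_rope_theta())  # 10000.0
print(TinyConfig({"rope_type": "default", "rope_theta": 1000000.0}).get_rope_theta())  # 1000000.0
```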

@ArthurZucker ArthurZucker merged commit d081c71 into main Apr 8, 2026
28 of 30 checks passed
@ArthurZucker ArthurZucker deleted the fix-vllm-cis branch April 8, 2026 11:19
ArthurZucker added a commit that referenced this pull request Apr 9, 2026
* fixes

* safe linspace?

* up

* up

* remove batch

* nits

* style

* fix repo

* more

* update

* update

* revert

* up

* fix copy

* work?

* adress some of the comments

* update

* fixes

* fix-repo!

* styling post merge

* don't pass default

* up

* up

* push

* fix vllm compat: add default rope type and ignore_keys support

- Add _compute_default_rope_parameters and register as ROPE_INIT_FUNCTIONS["default"]
- Accept ignore_keys kwarg in validate_rope for backward compat with vllm
- Remove RopeDefaultTypeTest class (redundant with existing tests)
- Fix test_rope_validation to skip "default" rope type in KeyError checks

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* revert rope utils additions, keep only test removal

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* docs: mark rope_theta as optional in RopeParameters and compute function docstrings

rope_theta defaults to default_theta when omitted from serialized configs.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
bigshanedogg pushed a commit to bigshanedogg/transformers that referenced this pull request Apr 9, 2026

Development

Successfully merging this pull request may close these issues.

[Transformers v5] IsaacForConditionalGeneration

4 participants