Conversation (commits 8318200 to f36ea56)
```diff
     This is very important as most embeddings are tied, and they are huge params (vocabularies are often 256k), so
     running inits on them is very costly."""
-    for tied_param in self.all_tied_weights_keys.keys():
+    for tied_param in getattr(self, "all_tied_weights_keys", {}).keys():
```
fix remote code
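A minimal sketch of why the `getattr` fallback matters here (the classes below are hypothetical stand-ins, not transformers code): remote-code models written before `all_tied_weights_keys` existed simply never define the attribute, so direct access raises.

```python
# Hedged sketch, not the actual transformers source. Illustrates why
# getattr with a {} default is safer than direct attribute access for
# remote-code classes that predate `all_tied_weights_keys`.

class OldRemoteModel:
    # An older remote-code model: no `all_tied_weights_keys` attribute at all.
    pass

class NewModel:
    all_tied_weights_keys = {"lm_head.weight": "embed_tokens.weight"}

def tied_params(model):
    # Direct access (`model.all_tied_weights_keys`) would raise AttributeError
    # on OldRemoteModel; getattr with a {} default degrades gracefully.
    return list(getattr(model, "all_tied_weights_keys", {}).keys())

print(tied_params(NewModel()))       # ['lm_head.weight']
print(tied_params(OldRemoteModel())) # []
```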
```diff
-        token_value = self._special_tokens_map.get(key_without_id)
+        # Use __dict__.get to avoid recursive __getattr__ when _special_tokens_map
+        # is not yet initialized (e.g. during fast tokenizer __init__)
+        token_value = self.__dict__.get("_special_tokens_map", {}).get(key_without_id)
```
same easy fix for remote code
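A minimal sketch of the recursion hazard being fixed (hypothetical class, not the actual tokenizer): if `__getattr__` reads `self._special_tokens_map` directly while that attribute is still unset, each failed lookup re-enters `__getattr__` and recurses forever; reading `self.__dict__` directly breaks the cycle.

```python
# Hedged sketch of the hazard, with a toy class instead of the real tokenizer.

class TokenizerSketch:
    def __getattr__(self, key):
        # Using `self._special_tokens_map` here would re-enter __getattr__
        # endlessly while the attribute is still missing (e.g. during __init__).
        # `self.__dict__.get` never triggers __getattr__.
        token_map = self.__dict__.get("_special_tokens_map", {})
        if key.endswith("_token"):
            return token_map.get(key)
        raise AttributeError(key)

tok = TokenizerSketch()            # _special_tokens_map never set
print(tok.bos_token)               # None, instead of RecursionError
tok._special_tokens_map = {"bos_token": "<s>"}
print(tok.bos_token)               # <s>
```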
```diff
 # from the model config. You can append new {'rope_type': callable} pairs to this rope_parameters to enable custom RoPE
 # parameterizations, as long as the callable has the same signature.
 ROPE_INIT_FUNCTIONS = {
     "default": _compute_default_rope_parameters,
```
remote code BC?
It is true that remote code won't have it, but we would likely also need to refactor a lot of models, which seems risky; especially for models that do have a different default init, we'd need to check whether some model-specific code exists first and only then use this as a fallback
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

run-slow: aimv2, altclip, audioflamingo3, bart, beit, bert, bert_generation, big_bird, bigbird_pegasus, biogpt, blenderbot, blenderbot_small, blip, bridgetower, camembert, clip

This comment contains models: ["models/aimv2", "models/altclip", "models/audioflamingo3", "models/bart", "models/beit", "models/bert", "models/bert_generation", "models/big_bird", "models/bigbird_pegasus", "models/biogpt", "models/blenderbot", "models/blenderbot_small", "models/blip", "models/bridgetower", "models/camembert", "models/clip"]
vasqu left a comment
LGTM, most of this is nits except for 2:
- Whisper/audio flamingo seems weird to me re consuming bsz / leaving bsz optional
- RoPE default init: does this still call, for example, the Ernie VL default compute? Would likely need to check `def _init_weights` here to see what happens
```diff
     def _validate_default_rope_parameters(self, rope_parameters: dict, ignore_keys: set | None = None):
-        required_keys = {"rope_type", "rope_theta"}
+        required_keys = {"rope_type"}
+        optional_keys = {"rope_theta"}
```
likely needs to be done elsewhere then too? I think all need rope theta?
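The required/optional split under discussion could be sketched like this (simplified, not the library code; `DEFAULT_THETA` and the function name are assumptions), assuming `rope_theta` is backfilled from the default before modules consume the dict:

```python
import logging

logger = logging.getLogger("rope_validation_sketch")

DEFAULT_THETA = 10_000.0  # assumption: mirrors the mixin-level default_theta

def validate_default_rope(rope_parameters, ignore_keys=None):
    # Hedged sketch: theta is optional because it is backfilled from the
    # default, so only rope_type is strictly required.
    required_keys = {"rope_type"}
    optional_keys = {"rope_theta"}
    ignore_keys = ignore_keys or set()
    missing = required_keys - rope_parameters.keys()
    if missing:
        raise KeyError(f"Missing required RoPE keys: {missing}")
    rope_parameters.setdefault("rope_theta", DEFAULT_THETA)
    unknown = rope_parameters.keys() - required_keys - optional_keys - ignore_keys
    if unknown:
        # Unknown keys warn rather than raise.
        logger.warning("Unrecognized RoPE keys: %s", sorted(unknown))
    return rope_parameters

params = validate_default_rope({"rope_type": "default"})
print(params["rope_theta"])  # 10000.0
```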
```diff
 class ConfigSubclassKwOnlyTest(unittest.TestCase):
     """Test that config subclasses with non-default fields following parent default fields
     no longer raise TypeError (fixed by kw_only=True in __init_subclass__). Regression
     test for https://github.com/huggingface/transformers/issues/XXXX."""
```
Any issue there we could link or an example?
```diff
         config = LlamaConfig()
         config.rope_parameters = {"rope_type": "default", "rope_theta": 10000.0, "unknown_param": 1}
         with self.assertLogs("transformers.modeling_rope_utils", level="WARNING"):
             config.validate_rope()
```
cc @zucchini-nlp IDK if that's okay with you?
yep, tbh I thought it was already a warning instead of an error in the past. Maybe something changed recently
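A self-contained sketch of the warn-instead-of-raise behavior the test asserts (logger name and function here are hypothetical stand-ins for `transformers.modeling_rope_utils`, not the real module):

```python
import logging
import unittest

def validate_rope_sketch(rope_parameters):
    # Hedged sketch: unknown keys produce a warning rather than an error,
    # which is what the assertLogs-based test above checks.
    known = {"rope_type", "rope_theta"}
    unknown = rope_parameters.keys() - known
    if unknown:
        logging.getLogger("rope_sketch").warning("Unrecognized keys: %s", sorted(unknown))

class WarningNotErrorTest(unittest.TestCase):
    def test_unknown_key_warns(self):
        # assertLogs fails the test if no WARNING is emitted on this logger.
        with self.assertLogs("rope_sketch", level="WARNING"):
            validate_rope_sketch({"rope_type": "default", "rope_theta": 10000.0, "unknown_param": 1})

suite = unittest.defaultTestLoader.loadTestsFromTestCase(WarningNotErrorTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())  # True
```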
```diff
+        required_keys = {"rope_type"}
+        optional_keys = {"rope_theta"}
```
yep, this kinda defeats the point of validation because a RoPE dict with no theta isn't valid for our modules
yeah, but we always default to default_theta if it's not there, no?
validation always happens after the defaults are set, so ideally it shouldn't raise an error. Do we know why the theta was missing?
```diff
+        if rope_type == "default":
+            continue  # "default" is always valid with just rope_theta
```
default isn't in all_rope_types anyway, no?
- Add _compute_default_rope_parameters and register as ROPE_INIT_FUNCTIONS["default"]
- Accept ignore_keys kwarg in validate_rope for backward compat with vllm
- Remove RopeDefaultTypeTest class (redundant with existing tests)
- Fix test_rope_validation to skip "default" rope type in KeyError checks

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

…ion docstrings: rope_theta defaults to default_theta when omitted from serialized configs.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
[For maintainers] Suggested jobs to run (before merge)

run-slow: aimv2, altclip, audioflamingo3, bart, beit, bert, bert_generation, big_bird, bigbird_pegasus, biogpt, blenderbot, blenderbot_small, blip, bridgetower, camembert, clip
ArthurZucker left a comment
Checked:
```python
class RotaryEmbeddingConfigMixin:
    """
    A Mixin containing the functionality to standardize and validate RoPE parameters.
    """

    default_theta = 10_000.0
```

fair to say it's optional
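The quoted mixin can be sketched roughly like this (simplified, not the actual transformers implementation; `standardize_rope` and `TinyConfig` are hypothetical) to show how a class-level `default_theta` makes `rope_theta` optional in serialized configs:

```python
# Hedged sketch of the mixin pattern, not the real library code.

class RotaryEmbeddingConfigMixin:
    """A Mixin containing the functionality to standardize and validate RoPE parameters."""

    default_theta = 10_000.0

    def standardize_rope(self):
        rope = dict(getattr(self, "rope_parameters", None) or {"rope_type": "default"})
        # rope_theta is optional in serialized configs: fall back to the
        # class-level default when it was omitted.
        rope.setdefault("rope_theta", self.default_theta)
        return rope

class TinyConfig(RotaryEmbeddingConfigMixin):
    rope_parameters = {"rope_type": "default"}  # no rope_theta on disk

print(TinyConfig().standardize_rope())  # {'rope_type': 'default', 'rope_theta': 10000.0}
```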
* fixes
* safe linspace?
* up
* up
* remove batch
* nits
* style
* fix repo
* more
* update
* update
* revert
* up
* fix copy
* work?
* adress some of the comments
* update
* fixes
* fix-repo!
* styling post merge
* don't pass default
* up
* up
* push
* fix vllm compat: add default rope type and ignore_keys support
  - Add _compute_default_rope_parameters and register as ROPE_INIT_FUNCTIONS["default"]
  - Accept ignore_keys kwarg in validate_rope for backward compat with vllm
  - Remove RopeDefaultTypeTest class (redundant with existing tests)
  - Fix test_rope_validation to skip "default" rope type in KeyError checks
* revert rope utils additions, keep only test removal
* docs: mark rope_theta as optional in RopeParameters and compute function docstrings (rope_theta defaults to default_theta when omitted from serialized configs)

Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
What does this PR do?
More fixes