🚨 Move rotary_partial_emb to RopeParams and delete unnecessary code 🔪 #42255
ArthurZucker merged 25 commits into huggingface:main
```python
def get_standardized_rope_params(config, rope_theta=None):
    """
    Helper to standardize the config's rope params field by ensuring the params are defined for each
    layer type. For old models the fn will duplicate a single rope param in each layer type (backward compatibility).
    """
    rope_parameters = getattr(config, "rope_parameters", None)
    layer_types = getattr(config, "layer_types", None)
    if rope_theta is None:
        rope_theta = getattr(config, "rope_theta", None)
```
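For context, a minimal sketch of the backward-compatibility duplication the docstring describes; the helper name, dict shapes, and layer type names below are assumptions for illustration, not the merged implementation:

```python
def duplicate_rope_params_per_layer_type(rope_parameters, layer_types):
    """Hypothetical helper: give each layer type its own copy of a flat rope dict."""
    if "rope_type" in rope_parameters:
        # Flat, single-config case (old models): replicate across all layer types.
        return {layer_type: dict(rope_parameters) for layer_type in layer_types}
    # Already nested per layer type: nothing to do.
    return rope_parameters


flat = {"rope_type": "default", "rope_theta": 10000.0}
print(duplicate_rope_params_per_layer_type(flat, ["full_attention", "sliding_attention"]))
# {'full_attention': {'rope_type': 'default', 'rope_theta': 10000.0},
#  'sliding_attention': {'rope_type': 'default', 'rope_theta': 10000.0}}
```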
This could have been simplified if we made a few assumptions, and we can make assumptions because only 2 models have a nested rope parameterization.
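For illustration, the kind of assumption this enables: since only per-layer-type configs nest dicts inside the rope dict, a single structural check can tell the two shapes apart. A hedged sketch (the function name is hypothetical, not from the codebase):

```python
def is_nested_rope(rope_parameters):
    # Assumption from this comment: only per-layer-type configs nest dicts inside
    # the rope dict, so one isinstance check distinguishes the two shapes.
    return bool(rope_parameters) and all(isinstance(v, dict) for v in rope_parameters.values())


print(is_nested_rope({"rope_type": "default", "rope_theta": 10000.0}))  # False
print(is_nested_rope({"full_attention": {"rope_type": "default"}}))     # True
```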
src/transformers/models/efficientloftr/configuration_efficientloftr.py (review thread resolved)
ArthurZucker left a comment:
In general, if we can put stuff in PreTrainedConfig I am also happy, but it's fine this way as well.
ArthurZucker left a comment:
RotaryEmbeddingConfigMixin is my Christmas gift! Ty, it's a lot better.
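A minimal sketch of the mixin pattern being praised here; only the RotaryEmbeddingConfigMixin name comes from this thread, and the method name and body below are assumptions for illustration (the `setdefault` / don't-overwrite behavior is mentioned in the commit list):

```python
class RotaryEmbeddingConfigMixin:
    """Sketch: rope helpers shared by all model configs through the base config class."""

    def standardize_rope_params(self):
        # Illustrative only: fold a legacy flat `rope_theta` attribute into the
        # `rope_parameters` dict without overwriting a value that is already there.
        rope_theta = self.__dict__.pop("rope_theta", None)
        rope_parameters = getattr(self, "rope_parameters", None) or {}
        if rope_theta is not None:
            rope_parameters.setdefault("rope_theta", rope_theta)
        self.rope_parameters = rope_parameters
```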
run-slow: llama, gemma3, qwen2, qwen2_vl, mistral, mixtral, modernbert, llava
[For maintainers] Suggested jobs to run (before merge):
run-slow: apertus, arcee, aria, bamba, bitnet, blt, chameleon, cohere, cohere2, csm, cwm, dbrx, deepseek_v2, deepseek_v3, dia, diffllama
This comment contains models: ["models/gemma3", "models/llama", "models/llava", "models/mistral", "models/mixtral", "models/modernbert", "models/qwen2", "models/qwen2_vl"]
Doc-builder and weight tying tests will fail but are not related.

CI Results: ✅ No failing test specific to this PR 🎉!
… 🔪 (huggingface#42255)

* tmp
* batch push
* maybe better pop and break, and we'll have one theta per config in the rope dict
* update a few models?
* fix tests that are easu first
* dont overwrite if already present!!!
* partial rotary factor
* more fixes to the god of fixes
* setdefault
* fix copies
* Update src/transformers/modeling_rope_utils.py
  Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
* Update src/transformers/models/efficientloftr/configuration_efficientloftr.py
  Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
* attempt one
* update all models
* fix tests
* fix tests
* oops
* fix slow tests with nested rope models
* fix copies
* deal with circular import and move the mixin to base config class
* fix copies
* fix a few tests
* update the migration guide

---------

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
What does this PR do?
To finalize the work on the rope config, I am moving `rotary_partial_emb` into the rope parameter dict as well. Along with it, I did some clean-up on standardization, because we can make a few assumptions with the models we have.

Note: this PR breaks BC completely, and users will no longer have access to `config.rope_theta`, since I pop it from the config kwargs manually. That way is clearer imo than having two rope thetas in different places.
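A hedged sketch of the resulting access pattern; the key names follow the snippets in this PR, but the exact API may differ:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Qwen/Qwen2-0.5B")

# Before this PR: theta lived directly on the config.
# theta = config.rope_theta

# After this PR (assumed key names, per the snippets above): rope settings,
# including the partial rotary factor, live in the rope parameters dict.
theta = config.rope_parameters["rope_theta"]
partial = config.rope_parameters.get("partial_rotary_factor", 1.0)
```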