[dependencies] Upgrade transformers to >=5.0.0,<=5.3.0 #1426
```diff
@@ -288,6 +288,14 @@ def init_configs(
     if hasattr(provider, "q_lora_rank") and hasattr(hf_config, "q_lora_rank"):
         provider.q_lora_rank = hf_config.q_lora_rank

+    # Workaround for transformers v5 moving rope_theta into rope_parameters
+    # (previously it was a top-level config attribute). megatron-bridge's
+    # CONFIG_MAPPING reads config.rope_theta which no longer exists in v5,
+    # causing it to fall back to the default rotary_base of 10000.
+    rope_params = getattr(hf_config, "rope_parameters", None) or getattr(hf_config, "rope_scaling", None)
+    if isinstance(rope_params, dict) and "rope_theta" in rope_params:
+        provider.rotary_base = rope_params["rope_theta"]
+
     provider.tensor_model_parallel_size = megatron_config.tensor_model_parallel_size
     provider.pipeline_model_parallel_size = megatron_config.pipeline_model_parallel_size
     provider.pipeline_dtype = torch.bfloat16 if bf16 else torch.float32
```

**Collaborator:** Curious why this is needed, since megatron-bridge already updated NVIDIA-NeMo/Megatron-Bridge#2068 -- if this is still needed, should we raise an issue against megatron-bridge so we can remove this workaround going forward?
**Collaborator:** I'm curious, do you know why upgrading transformers necessitates this change? Seems a little surprising :)

**Author:** This was the Claude writeup of why this was needed:

**Collaborator:** Thanks for sharing :)
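The lookup added by the diff can be sketched in isolation as follows. This is a minimal, self-contained illustration: `resolve_rotary_base`, `DEFAULT_ROTARY_BASE`, and the `SimpleNamespace` stand-in configs are hypothetical, not part of the PR; only the `getattr` chain mirrors the actual change.

```python
from types import SimpleNamespace

# megatron-bridge falls back to this when rope_theta cannot be found
# (hypothetical constant for this sketch).
DEFAULT_ROTARY_BASE = 10000


def resolve_rotary_base(hf_config):
    """Find rope_theta on a HF config, whether v5-nested or pre-v5 top-level."""
    # transformers v5 nests rope_theta under rope_parameters; some configs
    # carry a similar rope_scaling dict instead.
    rope_params = getattr(hf_config, "rope_parameters", None) or getattr(
        hf_config, "rope_scaling", None
    )
    if isinstance(rope_params, dict) and "rope_theta" in rope_params:
        return rope_params["rope_theta"]
    # Pre-v5 configs exposed rope_theta as a top-level attribute.
    return getattr(hf_config, "rope_theta", DEFAULT_ROTARY_BASE)


# v5-style config: rope_theta lives inside rope_parameters.
v5_config = SimpleNamespace(rope_parameters={"rope_theta": 500000.0})
# Pre-v5 config: rope_theta is a top-level attribute.
legacy_config = SimpleNamespace(rope_theta=500000.0)

print(resolve_rotary_base(v5_config))      # 500000.0
print(resolve_rotary_base(legacy_config))  # 500000.0
```

Without the nested-dict check, a v5 config would silently hit the 10000 fallback, which is the bug the workaround in the diff guards against.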