Conversation
Signed-off-by: Maanu Grover <maanug@nvidia.com>
📝 Walkthrough

The pull request simplifies YARN rope scaling parameter handling by narrowing the `rope_scaling` field mappings to a single factor mapping in the conversion module and removing the related configuration fields from provider classes.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ 3 passed | ❌ 1 failed (1 warning)
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/megatron/bridge/models/conversion/model_bridge.py (1)
Lines 498-508: ⚠️ Potential issue | 🟠 Major

Round-trip conversion breaks for YARN rope scaling: mismatched field names between mapping and provider definitions.
`YARN_ROPE_SCALING_MAPPING` (line 290) maps `"factor"` → `"rotary_scaling_factor"` (without the `yarn_` prefix), but provider classes like `gpt_oss_provider.py` define the field as `yarn_rotary_scaling_factor` (with the prefix). This creates two failure modes:
1. Native YARN provider → HF conversion: line 499 correctly guards by checking `yarn_rotary_scaling_factor`, but lines 505–506 then attempt to read `rotary_scaling_factor`, which doesn't exist on YARN providers, so `rope_scaling["factor"]` remains unset.
2. HF → provider → HF round-trip: forward conversion (line 386) sets `rotary_scaling_factor` via `setattr`, but reverse conversion at line 499 checks for `yarn_rotary_scaling_factor`, so the entire `rope_scaling` block is skipped.

The fix is to update `YARN_ROPE_SCALING_MAPPING` to use the correct provider field name.

Proposed fix:

```diff
 YARN_ROPE_SCALING_MAPPING = [
-    ("factor", "rotary_scaling_factor"),
+    ("factor", "yarn_rotary_scaling_factor"),
 ]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/megatron/bridge/models/conversion/model_bridge.py` around lines 498 - 508, YARN rope-scaling round-trip fails because YARN_ROPE_SCALING_MAPPING maps HF keys to provider attribute names without the required "yarn_" prefix; update the mapping so the provider-side names match the actual provider attributes (e.g., use "yarn_rotary_scaling_factor" instead of "rotary_scaling_factor") so the detection check in model_bridge.py (the yarn_rotary_scaling_factor guard), the loop that reads getattr(provider, megatron_key, None), and the forward conversion that uses setattr all reference the same provider attribute names and the hf_config["rope_scaling"] keys are populated consistently.
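The mismatch described in the review can be sketched in a few lines. The provider class, mapping list, and conversion loop below are simplified stand-ins inferred from the review comment, not the actual code in `model_bridge.py`; the attribute names (`yarn_rotary_scaling_factor`, `rotary_scaling_factor`) are the ones the review discusses.

```python
# Sketch of a mapping-driven reverse (provider -> HF) conversion, assuming a
# getattr-based loop like the one the review describes at lines 505-506.

class YarnProvider:
    # Provider field carries the "yarn_" prefix, as in gpt_oss_provider.py.
    yarn_rotary_scaling_factor = 32.0

# Broken mapping: the provider-side name is missing the "yarn_" prefix.
YARN_ROPE_SCALING_MAPPING = [("factor", "rotary_scaling_factor")]

# Fixed mapping, matching the proposed diff.
FIXED_MAPPING = [("factor", "yarn_rotary_scaling_factor")]

def to_hf_rope_scaling(provider, mapping):
    """Build hf_config["rope_scaling"] from provider attributes."""
    # Detection guard (line 499 in the review): passes for YARN providers.
    if getattr(provider, "yarn_rotary_scaling_factor", None) is None:
        return None
    rope_scaling = {"rope_type": "yarn"}
    for hf_key, megatron_key in mapping:
        # With the broken mapping this getattr returns None, so
        # rope_scaling["factor"] is silently never populated.
        value = getattr(provider, megatron_key, None)
        if value is not None:
            rope_scaling[hf_key] = value
    return rope_scaling

provider = YarnProvider()
print(to_hf_rope_scaling(provider, YARN_ROPE_SCALING_MAPPING))  # "factor" missing
print(to_hf_rope_scaling(provider, FIXED_MAPPING))              # "factor" present
```

With the broken mapping the guard passes but the loop reads a nonexistent attribute, so the emitted `rope_scaling` dict lacks `"factor"`; with the fixed mapping the same loop picks it up, which is why aligning the mapping's provider-side names with the real provider attributes resolves both failure modes at once.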
What does this PR do?
Add a one line overview of what this PR aims to accomplish.
Changelog
GitHub Actions CI
See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.
Before your PR is "Ready for review"
Pre checks:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Additional Information
Summary by CodeRabbit