add qwen3 to megatron conversion #802
Signed-off-by: Abdalgader Abubaker <[email protected]>
|
Thank you for the contribution! It looks like the NeMo submodule change touched a lot of unrelated files. Would it be possible to revert those unrelated changes and only keep the relevant ones? |
|
Thanks for reviewing, @ashors1. My apologies, it seems I mistakenly merged other changes. I have now raised a new PR (#14378) in the NeMo submodule with only the relevant changes, and closed the wrong PR. |
    exporter_cls = HFQwen2Exporter
elif hf_config.model_type == "qwen3":
Suggested change:
- elif hf_config.model_type == "qwen3":
+ elif hf_config.model_type in ("qwen3", "qwen3_moe"):
Ah, actually I just tested this and it looks like it doesn't work with MoE models at the moment. Could we extend this to support MoE? If that's too much work, we could always add MoE support in a follow-up PR.
|
@abdalgader-a let me know if you'd like to extend this PR to support Qwen3 MoE. I'm happy to help out if not |
|
@ashors1 let me try to extend it. If it ends up taking too long, we can raise another PR. I'll get back to you on this ASAP. |
|
@abdalgader-a I went ahead and started this process because we received some internal requests for the Qwen3 MoE exporter: https://github.com/NVIDIA-NeMo/RL/tree/ashors/qwen3-moe-export. I can raise the new PR, and I'll be sure to add you as co-author. |
|
No worries @ashors1! I can help out in the new PR too. I'll let you choose whether to merge this PR or have everything in the new one. |
|
@abdalgader-a I went ahead and opened a new PR which adds MoE export support on top of your commits: #873. Please take a look. If this looks good to you, let's close this PR and focus on #873. |
IMPORTANT:
The NeMo submodule should be updated as in this PR: NVIDIA-NeMo/NeMo#14378
What does this PR do ?
This PR allows exporting the qwen3 model type from Megatron format to HF.
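For readers unfamiliar with the change, the following is a minimal, self-contained sketch of the kind of `model_type` dispatch discussed in the review thread. It is not the actual NeMo or NeMo RL code: the classes are empty stand-ins, `HFQwen3Exporter` and `select_exporter` are hypothetical names, and the `"qwen2"` key is an assumption about the preceding branch. It also reflects the review finding that `qwen3_moe` is not supported by this PR.

```python
from types import SimpleNamespace


# Empty stand-ins for illustration only; the real exporters live in NeMo.
class HFQwen2Exporter:
    pass


class HFQwen3Exporter:  # hypothetical name, not confirmed by this PR
    pass


# Assumed mapping from Hugging Face `model_type` to exporter class.
# "qwen3_moe" is deliberately absent: MoE export is left to a follow-up PR.
EXPORTERS = {
    "qwen2": HFQwen2Exporter,
    "qwen3": HFQwen3Exporter,
}


def select_exporter(hf_config):
    """Return the exporter class for a config, or fail loudly if unsupported."""
    model_type = hf_config.model_type
    if model_type not in EXPORTERS:
        raise ValueError(f"No exporter registered for model_type={model_type!r}")
    return EXPORTERS[model_type]


# Usage with a stand-in config object exposing only `model_type`.
exporter_cls = select_exporter(SimpleNamespace(model_type="qwen3"))
```

Raising on an unknown `model_type` instead of falling through makes an unsupported case such as `qwen3_moe` fail at selection time rather than partway through conversion.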
Test
Tested locally by running the Megatron conversion script on Qwen3-1.7B.
Issues
List issues that this PR closes (syntax):
Usage
# Add a code snippet demonstrating how to use this
Before your PR is "Ready for review"
Pre checks:
Additional Information