Fix(qwenmoe): Fix SP issue in Qwen MoE #2741
Thanks for your contribution!
Codecov Report: ❌ Patch coverage is
Additional details and impacted files
@@             Coverage Diff             @@
##             develop    #2741   +/-   ##
==========================================
  Coverage           ?   29.08%
==========================================
  Files              ?      334
  Lines              ?    55916
  Branches           ?        0
==========================================
  Hits               ?    16261
  Misses             ?    39655
  Partials           ?        0
shared_expert_output = self.shared_expert(hidden_states)
shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
shared_expert_output = self.shared_expert(residuals)
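The gated shared-expert line above scales the shared expert's output by a per-token sigmoid gate. The following is a minimal pure-Python sketch of that formula, not PaddleNLP code: it assumes, as in common Qwen2-MoE implementations, that `shared_expert_gate` projects each token's hidden state to a single logit, and the helpers `linear`, `sigmoid`, and `gated_shared_expert` are hypothetical stand-ins. In this sketch the gate and the expert read the same `hidden_states`, so the per-token gates line up with the expert outputs row by row.

```python
import math


def linear(x, w):
    """Bias-free projection: x is [tokens][in], w is [in][out]."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gated_shared_expert(hidden_states, w_gate, shared_expert):
    """Sketch of sigmoid(shared_expert_gate(h)) * shared_expert(h).

    Assumes (hypothetically) that w_gate is a [hidden][1] matrix, i.e. one
    gate logit per token, and that shared_expert maps
    [tokens][hidden] -> [tokens][hidden].
    """
    expert_out = shared_expert(hidden_states)
    gate_logits = linear(hidden_states, w_gate)  # [tokens][1]
    # Scale every feature of a token's expert output by that token's gate.
    return [
        [sigmoid(logit[0]) * v for v in row]
        for logit, row in zip(gate_logits, expert_out)
    ]


# Zero gate weights give sigmoid(0) = 0.5, so an identity "expert" is halved.
h = [[1.0, -2.0], [3.0, 4.0]]
print(gated_shared_expert(h, [[0.0], [0.0]], lambda x: x))
# [[0.5, -1.0], [1.5, 2.0]]
```

Because the gate is a single scalar per token, it can only rescale a token's shared-expert output; it never mixes information across tokens.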
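The shared expert is not an SP layer, and its input is not gathered, so each card processes only part of the data. The input to a TP layer should be the full sequence, so the shared expert's input should be gathered as well.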
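The reviewer's point can be checked with a small pure-Python simulation. This is a sketch under stated assumptions, not the PaddleNLP implementation: the shared expert is modeled as a two-layer SiLU MLP split tensor-parallel (column-parallel up projection, row-parallel down projection); sequence parallelism is modeled by splitting tokens across ranks; and the collectives (all-gather, cross-rank sum, reduce-scatter) are simulated with list operations. The names `sp_tp_mlp`, `reference_mlp`, and `gather_input` are hypothetical. With gathered input the sharded result matches the single-device reference; with only the local shard, the cross-rank sum adds partial products computed on different tokens. The simulation does not reproduce the hang itself.

```python
import math
import random


def matmul(a, b):
    """[n][k] x [k][m] -> [n][m] using plain lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def silu(m):
    return [[v / (1.0 + math.exp(-v)) for v in row] for row in m]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def split_rows(m, n):
    k = len(m) // n
    return [m[i * k:(i + 1) * k] for i in range(n)]


def split_cols(m, n):
    k = len(m[0]) // n
    return [[row[i * k:(i + 1) * k] for row in m] for i in range(n)]


def reference_mlp(x, w_up, w_down):
    """Single device: full sequence, full weights."""
    return matmul(silu(matmul(x, w_up)), w_down)


def sp_tp_mlp(x_shards, w_up, w_down, gather_input=True):
    """Hypothetical TP MLP whose inputs arrive as SP sequence shards."""
    world = len(x_shards)
    up_shards = split_cols(w_up, world)      # column-parallel: slice of intermediate dim
    down_shards = split_rows(w_down, world)  # row-parallel: matching rows
    full_x = [row for shard in x_shards for row in shard]  # simulated all-gather
    partials = []
    for rank in range(world):
        local_in = full_x if gather_input else x_shards[rank]
        partials.append(matmul(silu(matmul(local_in, up_shards[rank])), down_shards[rank]))
    summed = partials[0]
    for p in partials[1:]:
        summed = add(summed, p)  # simulated cross-rank sum of partial products
    if gather_input:
        out_shards = split_rows(summed, world)  # reduce-scatter: each rank keeps its tokens
    else:
        out_shards = [summed] * world  # every rank gets the same mixed-token sum
    return [row for shard in out_shards for row in shard]


def allclose(a, b, tol=1e-9):
    return len(a) == len(b) and all(
        abs(x - y) <= tol for ra, rb in zip(a, b) for x, y in zip(ra, rb)
    )


rng = random.Random(0)
seq, hidden, inter, world = 4, 3, 4, 2
x = [[rng.gauss(0, 1) for _ in range(hidden)] for _ in range(seq)]
w_up = [[rng.gauss(0, 1) for _ in range(inter)] for _ in range(hidden)]
w_down = [[rng.gauss(0, 1) for _ in range(hidden)] for _ in range(inter)]
shards = split_rows(x, world)
ref = reference_mlp(x, w_up, w_down)
print(allclose(sp_tp_mlp(shards, w_up, w_down, gather_input=True), ref))   # True
print(allclose(sp_tp_mlp(shards, w_up, w_down, gather_input=False), ref))  # False
```

The gathered path is correct because SiLU is element-wise: splitting the up projection by columns and the down projection by the matching rows turns the full product into a sum of per-rank products, which only holds when every rank works on the same tokens.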
done
LGTM
PR types
Bug fixes
PR changes
Description
Fix the hang in Qwen2/3 MoE series models when SP (sequence parallelism) is enabled.