Fix dpsk-r1-fp4 tp8 by reverting two commits (#13162 and #13341)#13348
Fix dpsk-r1-fp4 tp8 by reverting two commits (#13162 and #13341)#13348Kangyan-Zhou merged 3 commits intosgl-project:mainfrom
Conversation
…-project#13162) Co-authored-by: Kangyan-Zhou <zky314343421@gmail.com>
…project#13341)" This reverts commit 0d116b9.
Summary of ChangesHello @Qiaolin-Yu, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request primarily focuses on resolving a bug related to Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request appears to be a revert to fix issues with DeepSeek models, specifically related to expert location mapping and MoE kernel parameters.
The changes to expert_location.py aim to correct a bug in how the nearest physical expert is handled. However, the new logic introduces a different issue where the condition to update the expert list is never met, making the optimization ineffective. I've provided a suggestion to fix this.
The other changes consistently add the tile_tokens_dim parameter to MoE kernel calls across several files. This seems to be a necessary fix to align with an underlying API change, likely from the reverted commits. These changes look correct.
Overall, the PR is a step in the right direction, but the expert location logic needs a further fix to work as intended.
| mapped_physical_experts = logical_to_all_physical_map[layer_id][ | ||
| logical_expert_id | ||
| ] | ||
| if ( | ||
| nearest_expert != -1 | ||
| and nearest_expert not in mapped_physical_experts | ||
| ): | ||
| mapped_physical_experts[0] = nearest_expert |
There was a problem hiding this comment.
The condition nearest_expert not in mapped_physical_experts will always be False when nearest_expert != -1. This is because _find_nearest_expert is called with mapped_physical_experts as candidate_physical_expert_ids, and it returns an element from that list (or -1). As a result, this if block is effectively dead code, and the nearest expert is never prioritized.
This defeats the purpose of finding the nearest expert for locality optimization.
To correctly prioritize the nearest expert, it should be moved to the front of the mapped_physical_experts list. A swap with the first element is a safe way to do this while preserving the list length and avoiding duplicates.
| mapped_physical_experts = logical_to_all_physical_map[layer_id][ | |
| logical_expert_id | |
| ] | |
| if ( | |
| nearest_expert != -1 | |
| and nearest_expert not in mapped_physical_experts | |
| ): | |
| mapped_physical_experts[0] = nearest_expert | |
| mapped_physical_experts = logical_to_all_physical_map[layer_id][ | |
| logical_expert_id | |
| ] | |
| if nearest_expert != -1: | |
| # The `_find_nearest_expert` function returns an expert from `mapped_physical_experts`. | |
| # We should move it to the front to prioritize it for dispatch. | |
| try: | |
| current_idx = mapped_physical_experts.index(nearest_expert) | |
| if current_idx > 0: | |
| # Swap with the first element to preserve list length and avoid duplicates. | |
| mapped_physical_experts[0], mapped_physical_experts[current_idx] = ( | |
| mapped_physical_experts[current_idx], | |
| mapped_physical_experts[0], | |
| ) | |
| except ValueError: | |
| # This path should ideally not be taken. | |
| # It implies `_find_nearest_expert` returned an expert not in the candidate list. | |
| # As a fallback, we replace the first expert. | |
| mapped_physical_experts[0] = nearest_expert |
|
Tested B200 CI on a different machine works, will merge this PR for now |
… and sgl-project#13341) (sgl-project#13348)" This reverts commit 78a4b44.
Motivation
Before this pr, this command will hang/fail.
Modifications
Accuracy Tests
Benchmarking and Profiling
Checklist