[NPU][Doc] Update GLM-5 docs, enabling deepep by default #23708
Code Review
This pull request updates the Ascend NPU GLM5 documentation by removing specific environment variables and increasing the maximum batch size for CUDA graphs. A review comment correctly identifies that the 'deepep' backend is incompatible with Ascend NPU hardware and suggests using 'ascend_fuseep' instead, while also recommending the removal of the 'deepep-mode' flag.
--moe-a2a-backend deepep \
--deepep-mode auto \
For Ascend NPU, the optimized MoE All-to-All backend is ascend_fuseep. The deepep backend is specifically designed for NVIDIA GPUs using the deep_ep library and will not work on NPU. Additionally, --deepep-mode is not used by the ascend_fuseep backend and should be removed.
Suggested change:
- --moe-a2a-backend deepep \
- --deepep-mode auto \
+ --moe-a2a-backend ascend_fuseep \
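As a rough illustration of what the suggestion above does to an argument list, the following sketch rewrites a set of flags the way the reviewer proposes: the value `deepep` after `--moe-a2a-backend` becomes `ascend_fuseep`, and `--deepep-mode` is dropped together with its value. The function name `apply_review_suggestion` is hypothetical and not part of the PR or of any launcher. The sketch only transforms strings and takes the reviewer's claim about backend compatibility as given rather than verifying it.

```shell
# Hypothetical helper: prints the rewritten arguments, one per line.
# The body runs in a subshell "( ... )" so its variables stay local.
apply_review_suggestion() (
  prev=""
  skip_next=0
  for arg in "$@"; do
    if [ "$skip_next" -eq 1 ]; then
      skip_next=0                      # drop the value that followed --deepep-mode
    elif [ "$arg" = "--deepep-mode" ]; then
      skip_next=1                      # drop the flag itself
    elif [ "$prev" = "--moe-a2a-backend" ] && [ "$arg" = "deepep" ]; then
      printf '%s\n' "ascend_fuseep"    # swap the backend value
    else
      printf '%s\n' "$arg"             # pass every other argument through
    fi
    prev="$arg"
  done
)

# The two flags from the reviewed diff:
apply_review_suggestion --moe-a2a-backend deepep --deepep-mode auto
```

Arguments other than these two flags pass through unchanged, so the same function could be applied to a longer flag list.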
/tag-and-rerun-ci
Hi @cen121212, we've moved our documentation under
Motivation
Running without DeepEP can cause accuracy issues, so this PR enables DeepEP by default in the GLM-5 docs.
Modifications
docs/platforms/ascend/ascend_npu_glm5_examples.md
Accuracy Tests
Speed Tests and Profiling
Checklist
Review and Merge Process
/tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci