Revert "support mtp with deepseek r1 nvfp4 model (#13115)" #14790

HaiShaw merged 1 commit into sgl-project:amd_mori
Conversation
This reverts commit ea17737.
Summary of Changes

Hello @Duyi-Wang, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.

This pull request fully reverts a previous commit (ea17737) that introduced an optimization for Multi-Token Prediction (MTP) with the DeepSeek R1 NVFP4 model. The rollback is necessary because the optimization was found to conflict with the existing MORI integration code. The revert primarily removes specific speculative Mixture-of-Experts (MoE) All-to-All (A2A) backend configurations and related batch-processing adjustments. The team plans to reintroduce this functionality in a future commit once the conflict with the MORI integration is resolved.
Code Review
This pull request reverts a previous commit related to MTP support for DeepSeek models. While most of the changes are removals of code, I've identified a few issues that need attention.
Most critically, a change in python/sglang/srt/models/deepseek_v2.py removes a None check, which could lead to a TypeError at runtime.
Additionally, there are a couple of inconsistencies:

- The command-line argument choices for `--moe-a2a-backend` in `python/sglang/srt/server_args.py` do not match the updated documentation.
- There is a minor formatting issue in the documentation table in `docs/advanced_features/server_arguments.md`.
Please review the detailed comments for suggestions on how to address these points.
```diff
- has_extend_prefix = forward_batch.extend_prefix_lens_cpu is not None and any(
-     forward_batch.extend_prefix_lens_cpu
- )
+ has_extend_prefix = any(forward_batch.extend_prefix_lens_cpu)
```
This change introduces a potential TypeError. The extend_prefix_lens_cpu attribute on forward_batch can be None, and calling any(None) will raise an exception. The previous implementation correctly handled this by checking for None before calling any(). Please restore the None check to prevent this runtime error.
Suggested change:

```diff
- has_extend_prefix = any(forward_batch.extend_prefix_lens_cpu)
+ has_extend_prefix = forward_batch.extend_prefix_lens_cpu is not None and any(forward_batch.extend_prefix_lens_cpu)
```
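To make the failure mode concrete, here is a minimal, self-contained sketch of the issue (using a simplified stand-in for sglang's `ForwardBatch`, so the class below is illustrative, not the real one): `any(None)` raises a `TypeError`, which is exactly what the reverted one-liner risks when `extend_prefix_lens_cpu` has not been populated.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ForwardBatch:
    """Simplified stand-in for sglang's ForwardBatch (illustrative only)."""
    extend_prefix_lens_cpu: Optional[List[int]] = None


def has_extend_prefix_unsafe(fb: ForwardBatch) -> bool:
    # Raises TypeError when extend_prefix_lens_cpu is None,
    # because any() requires an iterable.
    return any(fb.extend_prefix_lens_cpu)


def has_extend_prefix_safe(fb: ForwardBatch) -> bool:
    # Short-circuit on None before iterating, as in the pre-revert code.
    return fb.extend_prefix_lens_cpu is not None and any(fb.extend_prefix_lens_cpu)


print(has_extend_prefix_safe(ForwardBatch()))        # False
print(has_extend_prefix_safe(ForwardBatch([0, 3])))  # True
try:
    has_extend_prefix_unsafe(ForwardBatch())
except TypeError:
    print("any(None) raises TypeError")
```

The guarded version costs one extra comparison and avoids the runtime crash entirely, which is why the review asks for the `None` check to be restored.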
```diff
  | `--speculative-attention-mode` | Attention backend for speculative decoding operations (both target verify and draft extend). Can be one of 'prefill' (default) or 'decode'. | `prefill` | `prefill`, `decode` |
- | `--speculative-moe-runner-backend` | MOE backend for EAGLE speculative decoding, see --moe-runner-backend for options. Same as moe runner backend if unset. | `None` | |
- | `--speculative-moe-a2a-backend` | MOE A2A backend for EAGLE speculative decoding, see --moe-a2a-backend for options. Same as moe a2a backend if unset. | `None` | |
+ | `--speculative-moe-runner-backend` | MOE backend for EAGLE speculative decoding, see --moe-runner-backend for options. Same as moe runner backend if unset. | None |
```
The formatting for this line seems inconsistent with the rest of the table. The Defaults value should be wrapped in backticks (e.g., `None`), and an empty Options column should be present to maintain table alignment.
Suggested change:

```diff
- | `--speculative-moe-runner-backend` | MOE backend for EAGLE speculative decoding, see --moe-runner-backend for options. Same as moe runner backend if unset. | None | |
+ | `--speculative-moe-runner-backend` | MOE backend for EAGLE speculative decoding, see --moe-runner-backend for options. Same as moe runner backend if unset. | `None` | |
```
```diff
  "--moe-a2a-backend",
  type=str,
- choices=MOE_A2A_BACKEND_CHOICES,
+ choices=["none", "deepep", "mooncake", "ascend_fuseep"],
```
There's an inconsistency between the argument choices defined here and the documentation. The documentation (docs/advanced_features/server_arguments.md) was updated to only list none and deepep as options for --moe-a2a-backend, but this list still includes mooncake and ascend_fuseep. To ensure consistency, please update the choices here to match the documentation.
Suggested change:

```diff
- choices=["none", "deepep", "mooncake", "ascend_fuseep"],
+ choices=["none", "deepep"],
```
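The reason this mismatch matters is that argparse's `choices` list is the actual enforcement point: any value outside it is rejected at parse time, so the CLI and the documentation must list the same options. A minimal sketch (the choices list below reflects the reviewer's suggested post-revert state and is an assumption, not the repository's real constant):

```python
import argparse

# Assumed post-revert choices, matching the updated documentation.
MOE_A2A_BACKEND_CHOICES = ["none", "deepep"]

parser = argparse.ArgumentParser()
parser.add_argument(
    "--moe-a2a-backend",
    type=str,
    choices=MOE_A2A_BACKEND_CHOICES,
    default="none",
    help="MOE A2A backend.",
)

# A documented value parses normally.
args = parser.parse_args(["--moe-a2a-backend", "deepep"])
print(args.moe_a2a_backend)  # deepep

# An undocumented value such as "mooncake" would make argparse
# print an error and exit, since it is not in `choices`.
```

Keeping the choices in a single shared constant (as the pre-revert `MOE_A2A_BACKEND_CHOICES` did) is one way to stop the CLI and the docs from drifting apart again.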
Yes, that's how it should be. For now, this is just a temporary workaround. The changes are being pushed to the amd_mori branch for a quick run. When we merge back to main, we'll include a proper fix.
Oh, I just thought you were pushing to the main branch. If you are pushing to the amd_mori branch, then it looks good to me.
This reverts commit ea17737.
Motivation
This optimization conflicts with the current MORI integration code, so we are rolling it back. A fix for the MORI integration will be included in a future commit to main.
Modifications
Accuracy Tests
Benchmarking and Profiling
Checklist