Revert "support mtp with deepseek r1 nvfp4 model (#13115)" #14790

HaiShaw merged 1 commit into sgl-project:amd_mori
Conversation
This reverts commit ea17737.
Summary of Changes

Hello @Duyi-Wang, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.

This pull request fully reverts a previous commit (ea17737) that introduced an optimization for Multi-Token Prediction (MTP) with the DeepSeek R1 NVFP4 model. The rollback is necessary because the optimization was found to conflict with the existing MORI integration code. The revert primarily removes specific speculative Mixture-of-Experts (MoE) All-to-All (A2A) backend configurations and related batch-processing adjustments. The team plans to reintroduce this functionality in a future commit once the conflict with the MORI integration is resolved.
Code Review
This pull request reverts a previous commit related to MTP support for DeepSeek models. While most of the changes are removals of code, I've identified a few issues that need attention.
Most critically, a change in python/sglang/srt/models/deepseek_v2.py removes a None check, which could lead to a TypeError at runtime.
Additionally, there are a couple of inconsistencies:

- The command-line argument choices for `--moe-a2a-backend` in `python/sglang/srt/server_args.py` do not match the updated documentation.
- There is a minor formatting issue in the documentation table in `docs/advanced_features/server_arguments.md`.
Please review the detailed comments for suggestions on how to address these points.
```diff
- has_extend_prefix = forward_batch.extend_prefix_lens_cpu is not None and any(
-     forward_batch.extend_prefix_lens_cpu
- )
+ has_extend_prefix = any(forward_batch.extend_prefix_lens_cpu)
```
This change introduces a potential TypeError. The extend_prefix_lens_cpu attribute on forward_batch can be None, and calling any(None) will raise an exception. The previous implementation correctly handled this by checking for None before calling any(). Please restore the None check to prevent this runtime error.
Suggested change:

```diff
- has_extend_prefix = any(forward_batch.extend_prefix_lens_cpu)
+ has_extend_prefix = forward_batch.extend_prefix_lens_cpu is not None and any(forward_batch.extend_prefix_lens_cpu)
```
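To make the failure mode concrete, here is a minimal, self-contained sketch of the issue (using a simplified stand-in for sglang's `ForwardBatch`, so the class below is illustrative, not the real one): `any(None)` raises a `TypeError`, which is exactly what the reverted one-liner risks when `extend_prefix_lens_cpu` has not been populated.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ForwardBatch:
    """Simplified stand-in for sglang's ForwardBatch (illustrative only)."""
    extend_prefix_lens_cpu: Optional[List[int]] = None


def has_extend_prefix_unsafe(fb: ForwardBatch) -> bool:
    # Raises TypeError when extend_prefix_lens_cpu is None,
    # because any() requires an iterable.
    return any(fb.extend_prefix_lens_cpu)


def has_extend_prefix_safe(fb: ForwardBatch) -> bool:
    # Short-circuit on None before iterating, as in the pre-revert code.
    return fb.extend_prefix_lens_cpu is not None and any(fb.extend_prefix_lens_cpu)


print(has_extend_prefix_safe(ForwardBatch()))        # False
print(has_extend_prefix_safe(ForwardBatch([0, 3])))  # True
try:
    has_extend_prefix_unsafe(ForwardBatch())
except TypeError:
    print("any(None) raises TypeError")
```

The guarded version costs one extra comparison and avoids the runtime crash entirely, which is why the review asks for the `None` check to be restored.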
```diff
  | `--speculative-attention-mode` | Attention backend for speculative decoding operations (both target verify and draft extend). Can be one of 'prefill' (default) or 'decode'. | `prefill` | `prefill`, `decode` |
- | `--speculative-moe-runner-backend` | MOE backend for EAGLE speculative decoding, see --moe-runner-backend for options. Same as moe runner backend if unset. | `None` | |
- | `--speculative-moe-a2a-backend` | MOE A2A backend for EAGLE speculative decoding, see --moe-a2a-backend for options. Same as moe a2a backend if unset. | `None` | |
+ | `--speculative-moe-runner-backend` | MOE backend for EAGLE speculative decoding, see --moe-runner-backend for options. Same as moe runner backend if unset. | None |
```
The formatting for this line seems inconsistent with the rest of the table. The Defaults value should be wrapped in backticks (e.g., `None`), and an empty Options column should be present to maintain table alignment.
Suggested change:

```diff
- | `--speculative-moe-runner-backend` | MOE backend for EAGLE speculative decoding, see --moe-runner-backend for options. Same as moe runner backend if unset. | None | |
+ | `--speculative-moe-runner-backend` | MOE backend for EAGLE speculative decoding, see --moe-runner-backend for options. Same as moe runner backend if unset. | `None` | |
```
```diff
  "--moe-a2a-backend",
  type=str,
- choices=MOE_A2A_BACKEND_CHOICES,
+ choices=["none", "deepep", "mooncake", "ascend_fuseep"],
```
There's an inconsistency between the argument choices defined here and the documentation. The documentation (docs/advanced_features/server_arguments.md) was updated to only list none and deepep as options for --moe-a2a-backend, but this list still includes mooncake and ascend_fuseep. To ensure consistency, please update the choices here to match the documentation.
Suggested change:

```diff
- choices=["none", "deepep", "mooncake", "ascend_fuseep"],
+ choices=["none", "deepep"],
```
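The reason this mismatch matters is that argparse's `choices` list is the actual enforcement point: any value outside it is rejected at parse time, so the CLI and the documentation must list the same options. A minimal sketch (the choices list below reflects the reviewer's suggested post-revert state and is an assumption, not the repository's real constant):

```python
import argparse

# Assumed post-revert choices, matching the updated documentation.
MOE_A2A_BACKEND_CHOICES = ["none", "deepep"]

parser = argparse.ArgumentParser()
parser.add_argument(
    "--moe-a2a-backend",
    type=str,
    choices=MOE_A2A_BACKEND_CHOICES,
    default="none",
    help="MOE A2A backend.",
)

# A documented value parses normally.
args = parser.parse_args(["--moe-a2a-backend", "deepep"])
print(args.moe_a2a_backend)  # deepep

# An undocumented value such as "mooncake" would make argparse
# print an error and exit, since it is not in `choices`.
```

Keeping the choices in a single shared constant (as the pre-revert `MOE_A2A_BACKEND_CHOICES` did) is one way to stop the CLI and the docs from drifting apart again.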
Yes, that's how it should be. For now, this is just a temporary workaround. The changes are being pushed to the amd_mori branch for a quick run. When we merge back to main, we'll include a proper fix.
Oh, I just thought you were pushing to the main branch. If you are pushing to the amd_mori branch, then it looks good to me.
This reverts commit ea17737.
Motivation
This optimization conflicts with the current MORI integration code, so we are rolling it back. A fix for the MORI integration will be included in a future commit to main.
Modifications
Accuracy Tests
Benchmarking and Profiling
Checklist