
Revert "support mtp with deepseek r1 nvfp4 model (#13115)"#14790

Merged
HaiShaw merged 1 commit into sgl-project:amd_mori from Duyi-Wang:amd_mori_revert on Dec 10, 2025

Conversation

@Duyi-Wang
Contributor

This reverts commit ea17737.

Motivation

This optimization conflicts with the current MORI integration code, so we are rolling it back. A fix for the MORI integration will be included in a future commit to main.

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

@github-actions bot added the documentation (Improvements or additions to documentation), deepseek, and blackwell (SM100/SM120) labels on Dec 10, 2025
@gemini-code-assist
Contributor

Summary of Changes

Hello @Duyi-Wang, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request performs a full revert of a previous commit (ea17737) that introduced an optimization for Multi-Token Prediction (MTP) with the DeepSeek R1 NVFP4 model. The rollback is necessary because the optimization was found to conflict with the existing MORI integration code. The revert primarily removes specific speculative Mixture-of-Experts (MoE) All-to-All (A2A) backend configurations and related batch-processing adjustments. The team plans to reintroduce this functionality in a future commit once the conflict with the MORI integration is resolved.

Highlights

  • Revert of Speculative MoE A2A Backend: The primary change is the complete removal of the speculative_moe_a2a_backend argument, its associated configuration, global variables, utility functions, and context management from the codebase. This functionality was introduced in the reverted commit.
  • Simplified Batch Processing Logic: Several files saw simplifications in how batch metadata and forward modes are handled. This includes removing checks and re-initializations related to batch_size in attention backend logic and streamlining conditions for prepare_mlp_sync_batch.
  • Documentation and Argument Cleanup: The documentation for server arguments (docs/advanced_features/server_arguments.md) was updated to reflect the removal of --speculative-moe-a2a-backend. Additionally, certain MoE A2A backend choices (mooncake, ascend_fuseep) were removed from the available options for --moe-a2a-backend.
  • Refactored MoE Backend Checks: Utility functions that previously relied on get_moe_a2a_backend() were updated to directly check server_args.moe_a2a_backend string values, simplifying the logic.
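
The backend-check refactor described in the last bullet can be sketched roughly as follows. This is a hypothetical illustration only; the real sglang helpers and `ServerArgs` fields differ from these names:

```python
# Hypothetical sketch of the refactor described above: replacing an
# enum-returning accessor (get_moe_a2a_backend()) with a direct string
# comparison on the server_args field. Names are illustrative, not
# sglang's actual API.
from dataclasses import dataclass

@dataclass
class ServerArgs:
    moe_a2a_backend: str = "none"

def use_deepep(server_args: ServerArgs) -> bool:
    # After the revert: check the raw string value directly
    return server_args.moe_a2a_backend == "deepep"

print(use_deepep(ServerArgs("deepep")))  # True
print(use_deepep(ServerArgs()))          # False
```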

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist bot left a comment


Code Review

This pull request reverts a previous commit related to MTP support for DeepSeek models. While most of the changes are removals of code, I've identified a few issues that need attention.

Most critically, a change in python/sglang/srt/models/deepseek_v2.py removes a None check, which could lead to a TypeError at runtime.

Additionally, there are a couple of inconsistencies:

  • The command-line argument choices for --moe-a2a-backend in python/sglang/srt/server_args.py do not match the updated documentation.
  • There's a minor formatting issue in the documentation table in docs/advanced_features/server_arguments.md.

Please review the detailed comments for suggestions on how to address these points.

```diff
-has_extend_prefix = forward_batch.extend_prefix_lens_cpu is not None and any(
-    forward_batch.extend_prefix_lens_cpu
-)
+has_extend_prefix = any(forward_batch.extend_prefix_lens_cpu)
```

critical

This change introduces a potential TypeError. The extend_prefix_lens_cpu attribute on forward_batch can be None, and calling any(None) will raise an exception. The previous implementation correctly handled this by checking for None before calling any(). Please restore the None check to prevent this runtime error.

Suggested change:

```diff
-has_extend_prefix = any(forward_batch.extend_prefix_lens_cpu)
+has_extend_prefix = forward_batch.extend_prefix_lens_cpu is not None and any(forward_batch.extend_prefix_lens_cpu)
```
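
To see why the guard matters, here is a minimal standalone reproduction (variable names borrowed from the snippet above; not the actual sglang module):

```python
# Minimal reproduction of the hazard flagged above: any(None) raises
# TypeError because None is not iterable, so the None guard must
# short-circuit before any() is called.
def has_extend_prefix(extend_prefix_lens_cpu):
    return extend_prefix_lens_cpu is not None and any(extend_prefix_lens_cpu)

print(has_extend_prefix(None))      # False (no TypeError)
print(has_extend_prefix([0, 0]))    # False
print(has_extend_prefix([0, 5]))    # True

try:
    any(None)                       # the unguarded call
except TypeError as e:
    print(f"TypeError: {e}")
```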

```diff
 | `--speculative-attention-mode` | Attention backend for speculative decoding operations (both target verify and draft extend). Can be one of 'prefill' (default) or 'decode'. | `prefill` | `prefill`, `decode` |
-| `--speculative-moe-runner-backend` | MOE backend for EAGLE speculative decoding, see --moe-runner-backend for options. Same as moe runner backend if unset. | `None` | |
-| `--speculative-moe-a2a-backend` | MOE A2A backend for EAGLE speculative decoding, see --moe-a2a-backend for options. Same as moe a2a backend if unset. | `None` | |
+| `--speculative-moe-runner-backend` | MOE backend for EAGLE speculative decoding, see --moe-runner-backend for options. Same as moe runner backend if unset. | None |
```

medium

The formatting for this line seems inconsistent with the rest of the table. The Defaults value should be wrapped in backticks (e.g., `None`), and an empty Options column should be present to maintain table alignment.

Suggested change:

```diff
-| `--speculative-moe-runner-backend` | MOE backend for EAGLE speculative decoding, see --moe-runner-backend for options. Same as moe runner backend if unset. | None |
+| `--speculative-moe-runner-backend` | MOE backend for EAGLE speculative decoding, see --moe-runner-backend for options. Same as moe runner backend if unset. | `None` | |
```

```diff
 "--moe-a2a-backend",
 type=str,
-choices=MOE_A2A_BACKEND_CHOICES,
+choices=["none", "deepep", "mooncake", "ascend_fuseep"],
```

medium

There's an inconsistency between the argument choices defined here and the documentation. The documentation (docs/advanced_features/server_arguments.md) was updated to only list none and deepep as options for --moe-a2a-backend, but this list still includes mooncake and ascend_fuseep. To ensure consistency, please update the choices here to match the documentation.

Suggested change:

```diff
-choices=["none", "deepep", "mooncake", "ascend_fuseep"],
+choices=["none", "deepep"],
```
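
For context, argparse's `choices` enforces this list at parse time, which is why drift between the code and the documented options is user-visible. A small standalone sketch (not sglang's actual parser setup):

```python
# Standalone sketch of how argparse `choices` rejects values outside
# the allowed list; illustrative only, not sglang's real parser.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--moe-a2a-backend",
    type=str,
    choices=["none", "deepep"],
    default="none",
)

args = parser.parse_args(["--moe-a2a-backend", "deepep"])
print(args.moe_a2a_backend)  # deepep

try:
    # A value removed from choices now fails at the CLI boundary
    parser.parse_args(["--moe-a2a-backend", "mooncake"])
except SystemExit:
    print("rejected: mooncake is not in choices")
```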

@Fridge003's comment was marked as outdated.

@Duyi-Wang
Contributor Author

> The changes in this PR are used in other places, so they cannot be directly reverted. Is it possible to create a fixing PR?

Yes, that's how it should be. For now, this is just a temporary workaround. The changes are being pushed to the amd_mori branch for a quick run. When we merge back to main, we'll include a proper fix.

@Fridge003
Collaborator

Oh, I just thought you were pushing to the main branch. If you are pushing to the amd branch, then it looks good to me.

@HaiShaw merged commit 78fff1b into sgl-project:amd_mori on Dec 10, 2025
1 check passed