Revert "support mtp with deepseek r1 nvfp4 model (#13115)" #14790
    @@ -2520,9 +2520,7 @@ def forward_normal_chunked_kv_prepare(
             )
     
         def forward_normal_chunked_kv_core(self, q, k, v, forward_batch):
    -        has_extend_prefix = forward_batch.extend_prefix_lens_cpu is not None and any(
    -            forward_batch.extend_prefix_lens_cpu
    -        )
    +        has_extend_prefix = any(forward_batch.extend_prefix_lens_cpu)
Contributor
This change introduces a potential
Suggested change
             # Only initialize the info once
             if has_extend_prefix and forward_batch.num_prefix_chunks is None:
                 forward_batch.prepare_chunked_prefix_cache_info(q.device)
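The two forms of `has_extend_prefix` in this hunk only differ when `extend_prefix_lens_cpu` is `None`. The following is a minimal sketch of that difference, not code from the repository: `make_batch` and the two helper functions are hypothetical stand-ins, and whether the attribute can actually be `None` on this code path is not shown in the diff.

```python
from types import SimpleNamespace


def has_extend_prefix_guarded(batch):
    # Pre-revert form: check for None before iterating.
    lens = batch.extend_prefix_lens_cpu
    return lens is not None and any(lens)


def has_extend_prefix_unguarded(batch):
    # Post-revert form: any() iterates its argument directly,
    # so a None value raises TypeError.
    return any(batch.extend_prefix_lens_cpu)


def make_batch(prefix_lens):
    # Hypothetical stand-in for forward_batch; only the one attribute is modeled.
    return SimpleNamespace(extend_prefix_lens_cpu=prefix_lens)


if __name__ == "__main__":
    for lens in ([0, 0], [0, 4], None):
        batch = make_batch(lens)
        print("guarded:", lens, has_extend_prefix_guarded(batch))
        try:
            print("unguarded:", lens, has_extend_prefix_unguarded(batch))
        except TypeError as exc:
            print("unguarded:", lens, "raised TypeError:", exc)
```

For list inputs the two helpers agree; only the `None` case separates them.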
    @@ -168,8 +168,6 @@
         "cutlass",
     ]
     
    -MOE_A2A_BACKEND_CHOICES = ["none", "deepep", "mooncake", "ascend_fuseep"]
    -
     MAMBA_SSM_DTYPE_CHOICES = ["float32", "bfloat16"]
     
     
    @@ -397,7 +395,6 @@ class ServerArgs:
         speculative_token_map: Optional[str] = None
         speculative_attention_mode: str = "prefill"
         speculative_moe_runner_backend: Optional[str] = None
    -    speculative_moe_a2a_backend: Optional[str] = None
     
         # Speculative decoding (ngram)
         speculative_ngram_min_match_window_size: int = 1
    @@ -3039,13 +3036,6 @@ def add_cli_args(parser: argparse.ArgumentParser):
                 default=ServerArgs.speculative_moe_runner_backend,
                 help="Choose the runner backend for MoE in speculative decoding.",
             )
    -        parser.add_argument(
    -            "--speculative-moe-a2a-backend",
    -            type=str,
    -            choices=MOE_A2A_BACKEND_CHOICES,
    -            default=ServerArgs.speculative_moe_a2a_backend,
    -            help="Choose the backend for MoE A2A in speculative decoding",
    -        )
     
             # Speculative decoding (ngram)
             parser.add_argument(
    @@ -3104,7 +3094,7 @@ def add_cli_args(parser: argparse.ArgumentParser):
             parser.add_argument(
                 "--moe-a2a-backend",
                 type=str,
    -            choices=MOE_A2A_BACKEND_CHOICES,
    +            choices=["none", "deepep", "mooncake", "ascend_fuseep"],
Contributor
There's an inconsistency between the argument choices defined here and the documentation. The documentation (
Suggested change
                 default=ServerArgs.moe_a2a_backend,
                 help="Choose the backend for MoE A2A.",
             )
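For context, the pre-revert code fed one module-level list into the `choices` of both A2A flags, while the reverted code inlines the list for the single remaining flag. The sketch below illustrates the shared-constant arrangement with plain `argparse`; `build_parser` is a hypothetical helper, and the `"none"` default for `--moe-a2a-backend` is an assumption, since the diff only shows `ServerArgs.moe_a2a_backend` without its value.

```python
import argparse

# Single source of truth for the accepted backend names (values taken from the diff).
MOE_A2A_BACKEND_CHOICES = ["none", "deepep", "mooncake", "ascend_fuseep"]


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical helper; the real project registers these inside add_cli_args.
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--moe-a2a-backend",
        type=str,
        choices=MOE_A2A_BACKEND_CHOICES,
        default="none",  # assumption: the actual default is not visible in the diff
        help="Choose the backend for MoE A2A.",
    )
    parser.add_argument(
        "--speculative-moe-a2a-backend",
        type=str,
        choices=MOE_A2A_BACKEND_CHOICES,
        default=None,  # matches Optional[str] = None in the removed field
        help="Choose the backend for MoE A2A in speculative decoding",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--moe-a2a-backend", "deepep"])
    print(args.moe_a2a_backend, args.speculative_moe_a2a_backend)
```

With a shared constant, adding a backend name updates every flag that references it; with an inlined list, each copy has to be kept in sync by hand.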
The formatting for this line seems inconsistent with the rest of the table. The `Defaults` value should be wrapped in backticks (e.g., `None`), and an empty `Options` column should be present to maintain table alignment.