[Model Refactoring] Migrate DeepSeek V4 to vllm/models/ [1/N] #43004
Changes from all commits
87ae4ca
7b685c3
8264ba7
75ca963
b5903fd
22a286a
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| # SPDX-License-Identifier: Apache-2.0 | ||
| # SPDX-FileCopyrightText: Copyright contributors to the vLLM project |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,30 @@ | ||
| # SPDX-License-Identifier: Apache-2.0 | ||
| # SPDX-FileCopyrightText: Copyright contributors to the vLLM project | ||
| """DeepSeek V4 model — hardware-isolated entry point. | ||
| | ||
| The actual implementation lives under ``nvidia/`` and ``amd/``; this module | ||
| picks the right one for the current platform and re-exports the public | ||
| classes used by the model registry and quantization config lookup. | ||
| """ | ||
| | ||
| from typing import TYPE_CHECKING | ||
| | ||
| from vllm.platforms import current_platform | ||
| | ||
| from .quant_config import DeepseekV4FP8Config | ||
| | ||
| # Pick the per-platform implementation. The NVIDIA branch is the static | ||
| # default that mypy sees; the ROCm branch overrides it at runtime and is | ||
| # kept type-compatible via ``# type: ignore[assignment]``. | ||
| if TYPE_CHECKING or not current_platform.is_rocm(): | ||
| from .nvidia.deepseek_v4 import DeepseekV4ForCausalLM | ||
| from .nvidia.deepseek_v4_mtp import DeepSeekV4MTP | ||
| else: | ||
| from .amd.deepseek_v4 import DeepseekV4ForCausalLM # type: ignore[assignment] | ||
| from .amd.deepseek_v4_mtp import DeepSeekV4MTP # type: ignore[assignment] | ||
| | ||
| __all__ = [ | ||
| "DeepSeekV4MTP", | ||
| "DeepseekV4FP8Config", | ||
| "DeepseekV4ForCausalLM", | ||
| ] | ||
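
The `if TYPE_CHECKING or not current_platform.is_rocm()` guard in the file above can be tried in isolation. The following is a minimal sketch, not vLLM code: `FakePlatform`, `NvidiaModel`, `AmdModel`, and `select_model` are hypothetical stand-ins, and the selection is wrapped in a function rather than done with import statements at module import time as in the diff.

```python
from typing import TYPE_CHECKING


class FakePlatform:
    """Hypothetical stand-in for vllm.platforms.current_platform."""

    def __init__(self, rocm: bool) -> None:
        self._rocm = rocm

    def is_rocm(self) -> bool:
        return self._rocm


class NvidiaModel:
    """Hypothetical stand-in for the class exported from nvidia/."""

    backend = "nvidia"


class AmdModel:
    """Hypothetical stand-in for the class exported from amd/."""

    backend = "amd"


def select_model(platform: FakePlatform) -> type:
    # TYPE_CHECKING is False at runtime, so only is_rocm() decides here.
    # A static type checker treats TYPE_CHECKING as True, so it always
    # analyses the first branch and sees the NVIDIA class as the default.
    if TYPE_CHECKING or not platform.is_rocm():
        return NvidiaModel
    return AmdModel


print(select_model(FakePlatform(rocm=False)).backend)
print(select_model(FakePlatform(rocm=True)).backend)
```

Because the check runs once per call here, the sketch is easy to test with both platform values. The module-level form in the diff makes the same choice once, when the package is first imported.
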
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| # SPDX-License-Identifier: Apache-2.0 | ||
| # SPDX-FileCopyrightText: Copyright contributors to the vLLM project |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| ../nvidia/deepseek_v4.py | ||

Member
We would want AMD-specific versions of the fused MoE and attention classes, right?
Collaborator
Author
Yes for attention, but I'm not sure about MoE. To keep this PR small, I didn't include those changes.

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| ../nvidia/deepseek_v4_mtp.py |
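
The one-line file bodies above (`../nvidia/deepseek_v4.py` and `../nvidia/deepseek_v4_mtp.py`) look like relative symlinks from `amd/` into `nvidia/`, since a Git diff shows a symlink as a single line containing its target path. That reading is an assumption; the PR text does not state it. Under that assumption, the layout can be sketched with hypothetical file names in a temporary directory:

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as root:
    nvidia = Path(root, "nvidia")
    amd = Path(root, "amd")
    nvidia.mkdir()
    amd.mkdir()

    # The implementation lives in exactly one place (hypothetical file name).
    target = nvidia / "model.py"
    target.write_text("BACKEND = 'shared'\n")

    # A relative symlink, analogous to amd/deepseek_v4.py -> ../nvidia/deepseek_v4.py.
    link = amd / "model.py"
    os.symlink(os.path.join("..", "nvidia", "model.py"), link)

    is_link = link.is_symlink()
    link_target = os.readlink(link)
    same_file = link.resolve() == target.resolve()
    content = link.read_text()

print(is_link, link_target, same_file)
```

With this layout the `amd/` path adds no duplicated source. Replacing a link with a real file later, for example for an AMD-specific attention class as discussed in the review comments, would not change the import path.
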
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| # SPDX-License-Identifier: Apache-2.0 | ||
| # SPDX-FileCopyrightText: Copyright contributors to the vLLM project |
We can extract this out once more models are written this way, right?
Good point. Will do it later.