Enable vLLM PP and EP for DSV3 #908

@yuki-97

Description

Currently we use TP64 for DSV3 inference, which is not optimal. We need to enable pipeline parallelism (PP) and expert parallelism (EP).

  1. PP support: fix: fix async vllm nccl fail on dsv3 tp16pp2 and non-colocated on single node #898
  2. EP support:
    1. add vLLM enable-expert-parallel: feat: add vllm enable_expert_parallel #997
    2. move DP into vllm: feat: support DP inside vLLM for EP #1081
    3. add pplx (for single-node), DeepEP (for multi-node): chore: add DeepEP dependencies #1045
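For reference, here is a rough sketch of how the GPU budget splits once PP and EP are on. It is not taken from any of the PRs above: `ParallelLayout` and `check_layout` are hypothetical helpers, the `tp=16, pp=2, dp=2` layout is only an illustrative way to fill the same 64 GPUs that TP64 uses, and the keyword names in `engine_kwargs()` are assumed to match the vLLM engine arguments (`tensor_parallel_size`, `pipeline_parallel_size`, `data_parallel_size`, `enable_expert_parallel`) in the vLLM version being used. The EP size of TP x DP is likewise an assumption about how vLLM shards experts when EP is enabled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelLayout:
    """Hypothetical helper: how GPUs are split across parallel dimensions."""

    tp: int = 1       # tensor-parallel size
    pp: int = 1       # pipeline-parallel size
    dp: int = 1       # data-parallel size handled inside vLLM
    ep: bool = False  # whether expert parallelism is enabled

    @property
    def world_size(self) -> int:
        # Total GPUs needed by this layout.
        return self.tp * self.pp * self.dp

    @property
    def expert_parallel_size(self) -> int:
        # Assumption: with EP on, experts are sharded across the TP x DP ranks
        # of each pipeline stage; with EP off, experts are not sharded this way.
        return self.tp * self.dp if self.ep else 1

    def engine_kwargs(self) -> dict:
        # Assumed to mirror vLLM engine argument names; verify against the
        # installed vLLM version before passing these to the engine.
        kwargs = {
            "tensor_parallel_size": self.tp,
            "pipeline_parallel_size": self.pp,
            "enable_expert_parallel": self.ep,
        }
        if self.dp > 1:
            kwargs["data_parallel_size"] = self.dp
        return kwargs


def check_layout(layout: ParallelLayout, num_gpus: int) -> ParallelLayout:
    """Raise if the layout does not use exactly num_gpus GPUs."""
    if layout.world_size != num_gpus:
        raise ValueError(
            f"layout needs {layout.world_size} GPUs, but {num_gpus} are available"
        )
    return layout


if __name__ == "__main__":
    current = check_layout(ParallelLayout(tp=64), num_gpus=64)
    candidate = check_layout(ParallelLayout(tp=16, pp=2, dp=2, ep=True), num_gpus=64)
    print("current:", current.engine_kwargs())
    print("candidate:", candidate.engine_kwargs())
    print("candidate EP size:", candidate.expert_parallel_size)
```

The point of the check is to catch a layout that silently over- or under-subscribes the cluster before launching workers. The choice of all-to-all backend (pplx on a single node, DeepEP across nodes) is separate from this arithmetic and is not modelled here.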
