Enable vLLM PP and EP for DSV3

Currently we use TP64 (tensor parallel size 64) for DSV3 inference, which is not optimal. We need to enable pipeline parallelism (PP) and expert parallelism (EP).

1. PP support: https://github.com/NVIDIA-NeMo/RL/pull/898
2. EP support:
    1. add vLLM enable-expert-parallel: https://github.com/NVIDIA-NeMo/RL/pull/997
    2. move DP into vLLM: https://github.com/NVIDIA-NeMo/RL/pull/1081
    3. add ~~pplx (for single-node),~~ DeepEP (for multi-node): https://github.com/NVIDIA-NeMo/RL/pull/1045
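
To make the target concrete, here is a minimal sketch of how the GPU budget could be split once PP and EP are available. It does not import vLLM. It only builds a dict whose keys mirror the vLLM engine arguments `tensor_parallel_size`, `pipeline_parallel_size`, `data_parallel_size` and `enable_expert_parallel`. The helper names (`ParallelLayout`, `plan_layout`), the 64-GPU budget split, and the rule that the EP group spans TP x DP ranks are assumptions for illustration, not what the linked PRs implement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelLayout:
    """Hypothetical description of one inference deployment."""
    tensor_parallel_size: int
    pipeline_parallel_size: int
    data_parallel_size: int
    enable_expert_parallel: bool

    @property
    def world_size(self) -> int:
        # Total GPUs consumed by this layout.
        return (self.tensor_parallel_size
                * self.pipeline_parallel_size
                * self.data_parallel_size)

    @property
    def expert_parallel_size(self) -> int:
        # Assumption: with EP on, experts are sharded across the TP x DP ranks.
        if not self.enable_expert_parallel:
            return 1
        return self.tensor_parallel_size * self.data_parallel_size

    def engine_kwargs(self) -> dict:
        # Keys mirror vLLM engine argument names; values come from this layout.
        return {
            "tensor_parallel_size": self.tensor_parallel_size,
            "pipeline_parallel_size": self.pipeline_parallel_size,
            "data_parallel_size": self.data_parallel_size,
            "enable_expert_parallel": self.enable_expert_parallel,
        }


def plan_layout(total_gpus: int, tp: int, pp: int = 1,
                enable_ep: bool = False) -> ParallelLayout:
    """Give the GPUs left over after TP x PP to data parallelism."""
    if tp < 1 or pp < 1:
        raise ValueError("tp and pp must be positive")
    per_replica = tp * pp
    if total_gpus % per_replica != 0:
        raise ValueError(
            f"{total_gpus} GPUs cannot be split into replicas of {per_replica}")
    return ParallelLayout(tp, pp, total_gpus // per_replica, enable_ep)


# Current setup from this issue: everything in tensor parallelism.
current = plan_layout(64, tp=64)
# Two hypothetical alternatives on the same 64 GPUs.
with_pp = plan_layout(64, tp=8, pp=8)
with_ep = plan_layout(64, tp=8, pp=1, enable_ep=True)
```

In this sketch `with_pp` keeps TP inside a group of 8 GPUs and stacks 8 pipeline stages, while `with_ep` runs 8 data-parallel replicas of TP8 and shards the experts across all of them. Which split performs best for DSV3 would have to be measured.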

Enable vLLM PP and EP for DSV3 #908
