feat: enable EPLB for NVFP4 compressed-tensors ML3 checkpoint #35187
hypdeb wants to merge into vllm-project:main
Conversation
Code Review
This pull request enables EPLB for NVFP4 compressed tensors and fixes an issue where Mistral Large 3 was not recognized as a Mixture-of-Experts model.
The changes in vllm/model_executor/layers/fused_moe/router/base_router.py to relax EPLB state validation during initialization and in vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors_moe.py to enable EPLB support for NVFP4 are correct and well-implemented.
In vllm/model_executor/models/pixtral.py, the changes correctly proxy the MoE interface from the underlying language model. However, the current implementation of copying attributes is fragile and could lead to stale state. I've suggested a more robust approach using @property decorators to ensure the wrapper always reflects the true state of the language model.
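The suggested `@property` approach could look like the sketch below. The class and attribute names (`LanguageModel`, `MultimodalWrapper`, `num_logical_experts`) are illustrative stand-ins, not the actual vLLM/pixtral identifiers:

```python
class LanguageModel:
    """Stand-in for the wrapped MoE language model (illustrative)."""
    def __init__(self) -> None:
        self.num_logical_experts = 8


class MultimodalWrapper:
    """Proxies the MoE interface via properties instead of copying
    attributes once at construction, so values never go stale."""
    def __init__(self, language_model: LanguageModel) -> None:
        self.language_model = language_model

    @property
    def num_logical_experts(self) -> int:
        # Read through to the underlying model on every access.
        return self.language_model.num_logical_experts
```

With this pattern, any later change to the language model's MoE state is immediately visible through the wrapper, which avoids the stale-state hazard of attribute copying.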
Force-pushed 098d2e7 to ecca0b6
```diff
 def _apply_eplb_mapping(self, topk_ids: torch.Tensor) -> torch.Tensor:
     """Apply EPLB mapping to convert logical expert IDs to physical expert IDs."""
-    if self.enable_eplb:
+    if self.enable_eplb and self._is_eplb_state_ready():
```
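A minimal sketch of the readiness gate in the diff above; `_is_eplb_state_ready` and the `expert_map` buffer name are assumptions for illustration, not the actual vLLM internals:

```python
class BaseRouter:
    """Stand-in router; uses a plain list in place of a tensor buffer."""
    def __init__(self, enable_eplb: bool) -> None:
        self.enable_eplb = enable_eplb
        self.expert_map = None  # populated later by EPLB initialization

    def _is_eplb_state_ready(self) -> bool:
        # The mapping can only be applied once the buffer has been set.
        return self.expert_map is not None

    def _apply_eplb_mapping(self, topk_ids):
        """Remap logical expert IDs to physical ones once EPLB is ready."""
        if self.enable_eplb and self._is_eplb_state_ready():
            return [self.expert_map[i] for i in topk_ids]
        return topk_ids
```

Until initialization completes, the router simply passes the logical IDs through unchanged rather than asserting.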
Don't we check the same thing in _validate_eplb_state? Do the asserts still make sense if we do the same readiness check? Maybe we should return a readiness flag from validate and actually skip the EPLB mapping until the EPLB state is ready.
I've adjusted the modelling of the EPLB state in the router to be more explicit, which removes the need for these asserts. There are now three possible states: None, uninitialized, and initialized.
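The explicit three-state model could be sketched as below; the enum and method names are hypothetical and only illustrate the idea of replacing asserts with an explicit state machine:

```python
from enum import Enum, auto


class EplbState(Enum):
    DISABLED = auto()       # "None": EPLB not enabled at all
    UNINITIALIZED = auto()  # enabled, mapping buffers not yet set
    INITIALIZED = auto()    # mapping is ready to apply


class Router:
    def __init__(self, enable_eplb: bool) -> None:
        self.state = (
            EplbState.UNINITIALIZED if enable_eplb else EplbState.DISABLED
        )
        self.logical_to_physical = None

    def initialize_eplb(self, mapping) -> None:
        self.logical_to_physical = mapping
        self.state = EplbState.INITIALIZED

    def apply_mapping(self, expert_id: int) -> int:
        # No asserts: pass the logical ID through until INITIALIZED.
        if self.state is not EplbState.INITIALIZED:
            return expert_id
        return self.logical_to_physical[expert_id]
```

Making the states explicit lets callers distinguish "EPLB is off" from "EPLB is on but not yet set up" without relying on runtime assertions.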
The scope of changes is larger now. I will test and mark the PR as ready when it's done.
Force-pushed 979778a to 33a4089
While testing, I discovered that the PR is effectively blocked by #32564. Without it, I would have to add some hacks.

This pull request has merge conflicts that must be resolved before it can be merged.
Force-pushed 58c9344 to 0f64eea
Force-pushed 7150b15 to cc303ce
Force-pushed 0ed8338 to 48e376a
Purpose
Allow enabling `eplb` for the NVFP4 MoE compressed-tensors path. Also, fix Mistral Large 3 not being recognized as an MoE model.
In the process, the following additional changes were made:
Test Plan
Testing end-to-end with EPLB enabled.
Test Result
TODO