
[Performance] Add --enable-ep-weight-filter CLI option #37351

Merged
esmeetu merged 1 commit into main from opt-ep-weights-filter
Mar 18, 2026
Conversation

Member

@esmeetu esmeetu commented Mar 17, 2026

Summary

Usage

vllm serve model \
  --enable-expert-parallel \
  --enable-ep-weight-filter

Without --enable-ep-weight-filter, loading behavior is identical to main.

Test plan

  • vllm serve without --enable-ep-weight-filter — no behavior change
  • vllm serve --enable-expert-parallel --enable-ep-weight-filter on per-expert MoE — correct loading, reduced I/O
  • Non-MoE model with flag — no effect
  • 3D fused-expert model with flag — no effect (filter returns None)

🤖 Generated with Claude Code

Add opt-in flag to skip non-local expert weights during model loading
when expert parallelism is active. Each rank only reads its own expert
shard from disk, reducing storage I/O for MoE models with per-expert
weight tensors.
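The mechanism described above can be sketched as a name-based predicate applied during checkpoint reading. This is an illustrative sketch, not the actual vLLM implementation: the function name make_ep_weight_filter, the contiguous expert partition, and the ".experts.N." name pattern are all assumptions for demonstration. It also mirrors the fused-expert case from the test plan, where the filter returns None and loading proceeds unfiltered.

```python
import re

def make_ep_weight_filter(ep_rank: int, ep_size: int, num_experts: int,
                          fused_experts: bool = False):
    """Build a predicate deciding whether this EP rank should load a
    given checkpoint tensor, or None when filtering does not apply."""
    if fused_experts:
        # 3D fused-expert tensors pack every expert into one weight,
        # so there is nothing to skip by tensor name.
        return None

    # Contiguous partition of experts across EP ranks (illustrative).
    per_rank = num_experts // ep_size
    local = range(ep_rank * per_rank, (ep_rank + 1) * per_rank)
    expert_pat = re.compile(r"\.experts\.(\d+)\.")

    def keep(weight_name: str) -> bool:
        m = expert_pat.search(weight_name)
        if m is None:
            return True  # non-expert weights load on every rank
        return int(m.group(1)) in local

    return keep
```

With ep_size=4 and 8 experts, rank 1 would keep only experts 2 and 3, skipping the disk reads for the other six expert shards while still loading all attention and shared weights.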

Signed-off-by: esmeetu <esmeetu@gmail.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: esmeetu <jasonailu87@gmail.com>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces an opt-in command-line flag --enable-ep-weight-filter to optimize model loading for Mixture-of-Experts models with expert parallelism. The changes correctly add the new configuration option and integrate it into the model loading logic. My main feedback is to add a validation check to ensure enable_expert_parallel is active when enable_ep_weight_filter is used, to prevent silent failures from misconfiguration and improve user experience.

"""Whether the deployed model is MoE (if known)."""
enable_expert_parallel: bool = False
"""Use expert parallelism instead of tensor parallelism for MoE layers."""
enable_ep_weight_filter: bool = False
Contributor


high

To improve robustness and prevent user confusion from misconfiguration, it's good practice to validate that enable_expert_parallel is enabled when enable_ep_weight_filter is used. Currently, if a user enables enable_ep_weight_filter without enable_expert_parallel, the flag is silently ignored.

Consider adding a validation check in the _validate_parallel_config method of this class, similar to how enable_eplb is validated. This would raise an error for invalid combinations.

Example:

if self.enable_ep_weight_filter and not self.enable_expert_parallel:
    raise ValueError(
        "enable_expert_parallel must be True to use enable_ep_weight_filter."
    )

@esmeetu esmeetu added the ready ONLY add when PR is ready to merge/full CI is needed label Mar 17, 2026
@khluu khluu added this to the v0.18.0 cherry picks milestone Mar 18, 2026
@esmeetu esmeetu merged commit 761e0aa into main Mar 18, 2026
69 checks passed
@esmeetu esmeetu deleted the opt-ep-weights-filter branch March 18, 2026 01:36
khluu pushed a commit that referenced this pull request Mar 18, 2026
Signed-off-by: esmeetu <jasonailu87@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
(cherry picked from commit 761e0aa)
wendyliu235 pushed a commit to wendyliu235/vllm-public that referenced this pull request Mar 18, 2026
…37351)

Signed-off-by: esmeetu <jasonailu87@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fxdawnn pushed a commit to fxdawnn/vllm that referenced this pull request Mar 19, 2026
…37351)

Signed-off-by: esmeetu <jasonailu87@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
maoxx241 pushed a commit to maoxx241/vllm that referenced this pull request Mar 24, 2026
…37351)

Signed-off-by: esmeetu <jasonailu87@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
(cherry picked from commit 761e0aa)
SouthWest7 pushed a commit to SouthWest7/vllm that referenced this pull request Mar 27, 2026
…37351)

Signed-off-by: esmeetu <jasonailu87@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
khairulkabir1661 pushed a commit to khairulkabir1661/vllm that referenced this pull request Mar 27, 2026
…37351)

Signed-off-by: esmeetu <jasonailu87@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Labels

ready ONLY add when PR is ready to merge/full CI is needed



3 participants