
[EPLB] Offline eplb support #26176

Open
PatrykSaffer wants to merge 41 commits into vllm-project:main from PatrykSaffer:patryk/offline-eplb

Conversation

@PatrykSaffer
Contributor

@PatrykSaffer PatrykSaffer commented Oct 3, 2025

Purpose

Offline EPLB: make it possible to rearrange experts once, at engine start.
With static enabled, experts are rebalanced only once, when the engine starts.
This improves expert balance without the overhead of runtime metrics tracking and rearrangement.
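
To illustrate what a one-time rearrangement from a recorded load window could look like, here is a minimal sketch. It is not the algorithm used by this PR or by vLLM's EPLB: it is a plain greedy heuristic (heaviest expert first, onto the least-loaded GPU) that ignores expert replication and per-GPU slot limits. The function name place_experts_once and the sample loads are hypothetical.

```python
from typing import List


def place_experts_once(expert_load: List[float], num_gpus: int) -> List[List[int]]:
    """Greedy one-shot placement (illustrative only, not vLLM's EPLB policy).

    Experts are visited from heaviest to lightest, and each one is assigned
    to the GPU with the smallest accumulated load so far.
    """
    placement: List[List[int]] = [[] for _ in range(num_gpus)]
    gpu_load = [0.0] * num_gpus
    heaviest_first = sorted(range(len(expert_load)), key=lambda e: -expert_load[e])
    for expert in heaviest_first:
        target = min(range(num_gpus), key=lambda g: gpu_load[g])
        placement[target].append(expert)
        gpu_load[target] += expert_load[expert]
    return placement


# Hypothetical per-expert token counts taken from a saved load window.
recorded_load = [90.0, 10.0, 40.0, 60.0]
placement = place_experts_once(recorded_load, num_gpus=2)
print(placement)
```

Because the placement is computed once from saved statistics, no load tracking or weight movement is needed while the engine is serving requests.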

EPLB has more options now:

  • load_initial_load_window and load_path:
    Whether to load an initial load window, and the path to load it from.
  • save_load_window and load_path:
    Whether to save the load window, and the corresponding path.
  • static:
    Whether to rebalance experts only once at engine start instead of actively rebalancing them during runtime.
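
The options above could be modeled roughly as follows. This is a sketch under assumptions, not the config class added by this PR: the field names are taken from the description, while the OfflineEplbOptions container, the path validation, the should_rearrange helper, and its interval parameter are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OfflineEplbOptions:
    """Hypothetical container mirroring the option names in the description."""

    load_initial_load_window: bool = False
    save_load_window: bool = False
    load_path: Optional[str] = None
    static: bool = False

    def __post_init__(self) -> None:
        # Assumption: loading or saving a load window requires a path.
        needs_path = self.load_initial_load_window or self.save_load_window
        if needs_path and self.load_path is None:
            raise ValueError("load_path must be set to load or save a load window")


def should_rearrange(step: int, interval: int, opts: OfflineEplbOptions) -> bool:
    """Static mode rearranges only at step 0; otherwise every `interval` steps."""
    if opts.static:
        return step == 0
    return step % interval == 0


offline = OfflineEplbOptions(
    load_initial_load_window=True, load_path="load_window.pt", static=True
)
online = OfflineEplbOptions()
```

With static set, should_rearrange(0, 100, offline) is true and every later step is skipped, whereas the default options keep rebalancing at each interval.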

Test Plan

Integration test test_eplb_offline.py added.

Test Result

Tests pass

Signed-off-by: Patryk Saffer <patryk.saffer99@gmail.com>
Signed-off-by: Patryk Saffer <patryk.saffer99@gmail.com>
Signed-off-by: Patryk Saffer <patryk.saffer99@gmail.com>
@mergify mergify Bot added the deepseek (Related to DeepSeek models), qwen (Related to Qwen models), and v1 labels Oct 3, 2025
Signed-off-by: Patryk Saffer <patryk.saffer99@gmail.com>
Signed-off-by: Patryk Saffer <patryk.saffer99@gmail.com>
@mergify
Contributor

mergify Bot commented Oct 4, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @PatrykSaffer.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify Bot added the needs-rebase label Oct 4, 2025
@PatrykSaffer PatrykSaffer changed the title from "Patryk/offline eplb" to "Offline eplb support" Oct 6, 2025
@PatrykSaffer PatrykSaffer changed the title from "Offline eplb support" to "[EPLB] Offline eplb support" Oct 6, 2025
Signed-off-by: PatrykSaffer <patryk.saffer@mistral.ai>
PatrykSaffer and others added 2 commits December 4, 2025 10:59
Signed-off-by: Patryk Saffer <patryk.saffer99@gmail.com>
@PatrykSaffer PatrykSaffer requested a review from abmfy December 4, 2025 13:59
Signed-off-by: PatrykSaffer <patryk.saffer@mistral.ai>
@github-actions

github-actions Bot commented Mar 6, 2026

This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!

@github-actions github-actions Bot added the stale (Over 90 days of inactivity) label Mar 6, 2026
# Conflicts:
#	vllm/config/parallel.py
#	vllm/distributed/eplb/eplb_state.py
#	vllm/model_executor/layers/fused_moe/fused_moe.py
#	vllm/model_executor/layers/fused_moe/fused_moe_method_base.py
#	vllm/model_executor/layers/fused_moe/fused_moe_modular_method.py
#	vllm/model_executor/layers/fused_moe/layer.py
#	vllm/model_executor/layers/fused_moe/unquantized_fused_moe_method.py
#	vllm/model_executor/layers/quantization/awq_marlin.py
#	vllm/model_executor/layers/quantization/bitsandbytes.py
#	vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors_moe.py
#	vllm/model_executor/layers/quantization/experts_int8.py
#	vllm/model_executor/layers/quantization/fp8.py
#	vllm/model_executor/layers/quantization/gguf.py
#	vllm/model_executor/layers/quantization/gptq_marlin.py
#	vllm/model_executor/layers/quantization/ipex_quant.py
#	vllm/model_executor/layers/quantization/modelopt.py
#	vllm/model_executor/layers/quantization/moe_wna16.py
#	vllm/model_executor/layers/quantization/mxfp4.py
#	vllm/model_executor/layers/quantization/quark/quark_moe.py
#	vllm/model_executor/layers/quantization/rtn.py
#	vllm/model_executor/models/qwen3_moe.py
#	vllm/v1/worker/gpu_model_runner.py
Signed-off-by: Patryk Saffer <patryk.saffer99@gmail.com>
Signed-off-by: Patryk Saffer <patryk.saffer99@gmail.com>
Signed-off-by: Patryk Saffer <patryk.saffer99@gmail.com>
Signed-off-by: Patryk Saffer <patryk.saffer99@gmail.com>
Signed-off-by: Patryk Saffer <patryk.saffer99@gmail.com>
Signed-off-by: Patryk Saffer <patryk.saffer99@gmail.com>
Signed-off-by: Patryk Saffer <patryk.saffer99@gmail.com>
@PatrykSaffer
Contributor Author

Hello @abmfy could you please review?

@github-actions github-actions Bot added the unstale (Received activity after being labelled stale) label and removed the stale (Over 90 days of inactivity) label Mar 11, 2026
arpera added a commit to arpera/vllm that referenced this pull request May 8, 2026
Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com>

Labels

ci/build, deepseek (Related to DeepSeek models), documentation (Improvements or additions to documentation), frontend, gpt-oss (Related to GPT-OSS models), kv-connector, multi-modality (Related to multi-modality (#4194)), performance (Performance-related issues), qwen (Related to Qwen models), speculative-decoding, tool-calling, unstale (Received activity after being labelled stale), v1

Projects

Status: No status
Status: To Triage

4 participants