
[EPLB] Fix balancedness metric computation and add verbose reporting #39178

Open

arpera wants to merge 10 commits into vllm-project:main from arpera:artem/fix-eplb-balancedness

Conversation

@arpera
Contributor

@arpera arpera commented Apr 7, 2026

Purpose

This PR fixes the balancedness metric computation in EPLB to correctly reflect the load imbalance per MoE layer.

Previously, balancedness was calculated by computing the average and max load across layers (dim=0) instead of across ranks (dim=1), and then summing those values.

This PR changes the balancedness metric to:

  1. Compute the mean/max ratio of tokens per rank independently for each MoE layer.
  2. Average these per-layer ratios to produce the final balancedness metric.

This change ensures the logged metric accurately represents the average severity of the bottleneck at each MoE layer.
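The corrected computation can be sketched as follows. This is an illustrative NumPy version, not the actual vLLM code (which operates on torch tensors); the array name `tokens` and its shape are assumptions for the example:

```python
import numpy as np

# Hypothetical per-layer token counts, shape (num_moe_layers, num_ep_ranks).
tokens = np.array([
    [100.0, 100.0, 100.0, 100.0],  # perfectly balanced layer
    [400.0,   0.0,   0.0,   0.0],  # one rank handles all the tokens
])

# Step 1: for each layer, mean load across ranks / max load across ranks
# (dim=1). A value of 1.0 means perfect balance; values near 0 mean one
# rank is a severe bottleneck. (Layers with zero tokens would need masking.)
per_layer = tokens.mean(axis=1) / tokens.max(axis=1)  # [1.0, 0.25]

# Step 2: average the per-layer ratios into a single balancedness metric.
balancedness = per_layer.mean()  # 0.625
```

The previous code reduced along dim=0 (across layers), which mixes loads from different layers together instead of measuring each layer's bottleneck.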

Additional changes in this PR:

  • Added a log_balancedness_verbose configuration flag (default: False). When enabled, EPLB logs a detailed multi-line report per logging interval, which includes a per-layer / per-rank token table to help debug expert routing.
  • Documented the log_balancedness_verbose flag in docs/serving/expert_parallel_deployment.md.
  • Updated the docs/serving/expert_parallel_deployment.md documentation to include previously added EPLB configuration fields that were missing from the docs:
    • log_balancedness_interval (originally introduced in #29499)
    • communicator (introduced in #33176)
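For illustration, enabling the new flag at launch might look like the sketch below. The `--eplb-config` JSON flag exists in vLLM, but the model name is a placeholder and the exact field names are taken from this PR and could change before merge:

```shell
# Hypothetical launch: enable periodic balancedness logging plus the
# verbose per-layer / per-rank report added in this PR.
vllm serve deepseek-ai/DeepSeek-V3 \
  --enable-expert-parallel \
  --eplb-config '{"log_balancedness": true,
                  "log_balancedness_interval": 100,
                  "log_balancedness_verbose": true}'
```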

Test Plan

Manual vLLM local launches to verify the logs.

Validation Result

Should not affect production performance since balancedness is calculated and printed only when the log_balancedness option is set in the vLLM config.

Example of the new verbose logging output:

(screenshot: eplb_dump)
Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

The comment says "for each layer: (mean load across ranks) / (max load
across ranks)" but the code was using dim=0 (averaging/maxing across
layers) instead of dim=1 (across ranks within each layer).

Fix to match the documented intent: compute mean/max across EP ranks
for each MoE layer independently, then average the per-layer ratios
over active layers.

Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com>
@mergify
Contributor

mergify Bot commented Apr 7, 2026

Hi @arpera, the pre-commit checks have failed. Please run:

uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?
mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request refactors the balancedness calculation in eplb_state.py to use an average of per-layer ratios instead of a global ratio. The feedback suggests using float64 for these calculations to prevent potential precision loss as model depth or batch sizes increase.

Compute per-layer mean and max in float64 to prevent precision loss
when summing token counts across many MoE layers.

Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com>
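The motivation for the float64 commit above can be shown in a few lines. This is a NumPy illustration of the general float32 limit, not vLLM code:

```python
import numpy as np

# float32 has a 24-bit significand: at 2**24 and above it can no longer
# represent every integer, so accumulating large token counts in float32
# silently drops increments. float64 keeps the counts exact.
big = np.float32(2**24)                    # 16_777_216
lost = (big + np.float32(1) == big)        # True: the +1 is rounded away
kept = (np.float64(big) + 1.0 == 2**24 + 1)  # True: float64 is exact here
```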
@arpera
Contributor Author

arpera commented Apr 7, 2026

Valid points. Would it be OK if I heavily refactor this logging function and, instead of avg_tokens, max_tokens, and balancedness, introduce more information about each layer? For example, I'll print the hottest and coldest experts at the current step, the distribution of tokens among ranks at each layer, the distribution of experts among ranks, etc. There won't be a single aggregated metric (like balancedness now); instead, I'll output detailed information about each step. @ilmarkov, what do you think about this idea?

@tlrmchlsmth
Member

Valid points. Would it be OK if I heavily refactor this logging function and, instead of avg_tokens, max_tokens, and balancedness, introduce more information about each layer? For example, I'll print the hottest and coldest experts at the current step, the distribution of tokens among ranks at each layer, the distribution of experts among ranks, etc. There won't be a single aggregated metric (like balancedness now); instead, I'll output detailed information about each step. @ilmarkov, what do you think about this idea?

I think that would be great to expose. Maybe guard it behind a verbosity that we set in the eplb config?

@arpera
Contributor Author

arpera commented Apr 7, 2026

Maybe guard it behind a verbosity that we set in the eplb config?

Yes, great! Could you please tell me one more detail: what is the name of this flag in the eplb config?

@tlrmchlsmth
Member

Could you please tell me one more detail: what is the name of this flag in the eplb config?

Oh, sorry for the confusion: I was suggesting that you add a flag to the eplb config (verbose_logging?)

@arpera
Contributor Author

arpera commented Apr 7, 2026

Yes, I was also thinking about this idea. Now work in progress, stay tuned.

Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com>
@mergify
Contributor

mergify Bot commented Apr 9, 2026

Documentation preview: https://vllm--39178.org.readthedocs.build/en/39178/

@mergify mergify Bot added the documentation Improvements or additions to documentation label Apr 9, 2026
@arpera arpera changed the title from "[EPLB] Fix balancedness computation: use per-layer mean/max across ranks" to "[EPLB] Fix balancedness metric computation and add verbose reporting" on Apr 9, 2026
Contributor

@ilmarkov ilmarkov left a comment


I see this is still in progress — apologies if some of my comments are on things you're already planning to change. Happy to re-review once you're ready.

Nice touch with the ANSI heat coloring. Note, though, that in practice most users won't see it: in production, vLLM logs are typically collected from containers (Docker, Kubernetes). I would suggest saving the verbose stats in a structured format (maybe not to stdout/stderr) to ease analysis.

Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com>
@vadiklyutiy
Collaborator

Friendly ping

Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com>
@arpera
Contributor Author

arpera commented Apr 30, 2026

@ilmarkov @tlrmchlsmth, the PR is ready for a final review. Please have a look.

Contributor

@ilmarkov ilmarkov left a comment


Thanks for the update! I'd still insist on saving the verbose log into a file for easier analysis. Also added some comments on improving the logged info.

Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com>
@mergify
Contributor

mergify Bot commented Apr 30, 2026

Documentation preview: https://vllm--39178.org.readthedocs.build/en/39178/

@mergify
Contributor

mergify Bot commented Apr 30, 2026

Hi @arpera, the pre-commit checks have failed. Please run:

uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?
mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

@arpera
Contributor Author

arpera commented Apr 30, 2026

@ilmarkov I presume that if everything else is OK, we can approve this PR. The issue from mergify is spurious; see #41377.

Contributor

@ilmarkov ilmarkov left a comment


Thank you for the update!

The non-verbose changes look good to me. For the verbose part, I'd want some changes if we still keep this piece after the expert_load_dump_dir introduction. That said, I don't see a case where we'd manually check stderr, given that we have machine-parsable dumping at the same frequency. Maybe for single-node debugging only.

@tlrmchlsmth what do you think?

Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com>
@arpera
Contributor Author

arpera commented May 6, 2026

@vadiklyutiy, can we merge this? I think I have fixed all the issues, and the change is ready to merge now.
