[EPLB] Add log Info for moe_load Imbalance Ratio #4482

Merged
weijinqian0 merged 6 commits into vllm-project:main from dsxsteven:main_1127_moeload_logInfo
Dec 8, 2025

@dsxsteven dsxsteven commented Nov 27, 2025

What this PR does / why we need it?

Add info-level logging for the MoE load imbalance ratio.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Signed-off-by: daishixun <dsxsteven@sina.com>
@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by other future PRs.
  • Write the commit message by filling in the PR description to help reviewers and future developers understand the change.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@dsxsteven dsxsteven changed the title log info for moe_load imbalance ratio [EPLB] Add Log Info for MOE_load Imbalance Ratio Nov 27, 2025

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces logging for the Mixture-of-Experts (MoE) load imbalance ratio, which is a useful metric for performance analysis. The implementation correctly calculates the peak-to-average load ratio for each layer and then provides a summary across all layers. The logic is executed only on rank 0 to prevent excessive logging in a distributed setup. I have identified a minor robustness issue that could, under specific circumstances, lead to a ZeroDivisionError. I've provided a suggestion to make the code more resilient.
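A peak-to-average ratio of the kind described above can be sketched as follows. This is a minimal illustration and not the code from this PR: the function names `imbalance_ratio` and `per_layer_imbalance`, and the input layout (one list of per-expert token counts per layer), are assumptions made for the example.

```python
def imbalance_ratio(expert_loads):
    """Peak-to-average load ratio for one layer (1.0 means perfectly balanced).

    `expert_loads` is assumed to be a non-empty sequence of token counts,
    one entry per expert. Returns None when no load was recorded.
    """
    total = sum(expert_loads)
    if total == 0:
        return None  # ratio is undefined when nothing was routed
    average = total / len(expert_loads)
    return max(expert_loads) / average


def per_layer_imbalance(moe_load):
    """Map layer index -> imbalance ratio, skipping layers without load."""
    result = {}
    for layer_idx, expert_loads in enumerate(moe_load):
        ratio = imbalance_ratio(expert_loads)
        if ratio is not None:
            result[layer_idx] = ratio
    return result


if __name__ == "__main__":
    # Hypothetical loads: layer 0 balanced, layer 1 skewed, layer 2 idle.
    print(per_layer_imbalance([[10, 10], [30, 10], [0, 0]]))
```

A ratio of 1.0 means every expert received the same load; larger values mean the busiest expert is carrying proportionally more than the average.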

Comment thread vllm_ascend/eplb/eplb_updator.py Outdated
Comment on lines +201 to +207
if len(self.moe_imbalance_dict) == 0:
    logger.info("[MOE_load_stats] No data available.")
    return

values = list(self.moe_imbalance_dict.values())

avg_imbalance = sum(values) / len(values)

high

There is a potential for a ZeroDivisionError on line 207. Although self.moe_imbalance_dict is checked for emptiness on line 201, if it were to be modified by another thread between the check and when values is used, it could become empty, leading to a crash. While this is not an issue with the current single-threaded usage, it's safer to fetch the values first and then check for emptiness. This makes the code more robust against future changes and potential concurrency, and is also more idiomatic Python.

Suggested change
- if len(self.moe_imbalance_dict) == 0:
-     logger.info("[MOE_load_stats] No data available.")
-     return
- values = list(self.moe_imbalance_dict.values())
- avg_imbalance = sum(values) / len(values)
+ values = list(self.moe_imbalance_dict.values())
+ if not values:
+     logger.info("[MOE_load_stats] No data available.")
+     return
+ avg_imbalance = sum(values) / len(values)
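
The snapshot-then-check pattern from this suggestion can be tried in isolation with a sketch like the one below. It is an assumption-laden illustration, not the merged code: `summarize_imbalance`, the logger name, and the returned summary keys are hypothetical, and only the `[MOE_load_stats]` message text is taken from the diff.

```python
import logging

logger = logging.getLogger("moe_load_stats_sketch")  # hypothetical logger name


def summarize_imbalance(moe_imbalance_dict):
    """Summarize per-layer imbalance ratios from a {layer: ratio} mapping.

    The values are copied into a list first, so the emptiness check and the
    division below both operate on the same snapshot.
    """
    values = list(moe_imbalance_dict.values())
    if not values:
        logger.info("[MOE_load_stats] No data available.")
        return None
    avg_imbalance = sum(values) / len(values)
    return {"avg": avg_imbalance, "max": max(values), "min": min(values)}


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    print(summarize_imbalance({}))                # logs the "No data" message
    print(summarize_imbalance({0: 1.0, 1: 2.0}))  # summary over two layers
```

Because `len(values)` is taken from the local list rather than from the dictionary, the divisor cannot be zero once the `if not values` check has passed.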

@dsxsteven dsxsteven changed the title [EPLB] Add Log Info for MOE_load Imbalance Ratio [EPLB] Add log Info for MOE_load Imbalance Ratio Nov 27, 2025
Signed-off-by: daishixun <dsxsteven@sina.com>
Signed-off-by: daishixun <dsxsteven@sina.com>
Signed-off-by: daishixun <dsxsteven@sina.com>
@dsxsteven dsxsteven changed the title [EPLB] Add log Info for MOE_load Imbalance Ratio [EPLB] Add log Info for moe_load Imbalance Ratio Nov 27, 2025
@wangxiyuan wangxiyuan added the ready (read for review) and ready-for-test (start test by label for PR) labels Dec 4, 2025
@weijinqian0 weijinqian0 merged commit 96ea0e0 into vllm-project:main Dec 8, 2025
16 of 18 checks passed
Clorist33 pushed a commit to Clorist33/vllm-ascend that referenced this pull request Dec 9, 2025
### What this PR does / why we need it?
Add log Info for MOE_load Imbalance Ratio

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

- vLLM version: v0.12.0
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.12.0

---------

Signed-off-by: daishixun <dsxsteven@sina.com>
Co-authored-by: weijinqian0 <1184188277@qq.com>
Signed-off-by: tanqingshan (A) <50050625@china.huawei.com>
weijinqian0 added a commit to weijinqian0/vllm-ascend that referenced this pull request Dec 9, 2025
Signed-off-by: daishixun <dsxsteven@sina.com>
Co-authored-by: weijinqian0 <1184188277@qq.com>
Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
Clorist33 pushed a commit to Clorist33/vllm-ascend that referenced this pull request Dec 10, 2025
Signed-off-by: daishixun <dsxsteven@sina.com>
Co-authored-by: weijinqian0 <1184188277@qq.com>
Mercykid-bash pushed a commit to Mercykid-bash/vllm-ascend that referenced this pull request Dec 10, 2025
Signed-off-by: daishixun <dsxsteven@sina.com>
Co-authored-by: weijinqian0 <1184188277@qq.com>
@dsxsteven dsxsteven deleted the main_1127_moeload_logInfo branch March 10, 2026 03:35