[CI]Add EPLB CI. by offline893 · Pull Request #3568 · vllm-project/vllm-ascend

offline893 · 2025-10-20T14:55:16Z

What this PR does / why we need it?

1. Add EPLB CI to catch changes to the EPLB feature.
2. Add parameter checking for the EPLB parameters.

Does this PR introduce any user-facing change?

How was this patch tested?

Tested with Qwen on A3.

vLLM version: v0.11.0rc3
vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

Signed-off-by: offline0806 <3337230449@qq.com>

wangxiyuan · 2025-10-21T08:45:46Z

vllm_ascend/patch/platform/patch_common/__init__.py

+                                     "EXPERT_MAP_RECORD", "false") == "true"
        if dynamic_eplb:
            import vllm_ascend.patch.platform.patch_common.patch_multiproc_executor  # noqa
+            logger.warning(


logger.info

Revised according to the review comments.
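For readers following this thread, here is a minimal sketch of the pattern under review: a string-valued environment flag compared against `"true"`, with the resulting notice logged at info level as the reviewer suggested. The hunk above is truncated, so the use of `os.getenv` is an assumption, and the helper names `env_flag` and `maybe_enable` and the logger name are hypothetical, not code from this PR.

```python
import logging
import os

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("eplb_sketch")


def env_flag(name: str, default: str = "false") -> bool:
    """Return True only when the variable is exactly the string "true"."""
    # Assumption: the flag is read with os.getenv; the original hunk is cut off.
    return os.getenv(name, default) == "true"


def maybe_enable(flag_name: str = "EXPERT_MAP_RECORD") -> bool:
    """Read the flag and log at info level when it is switched on."""
    enabled = env_flag(flag_name)
    if enabled:
        # Enabling a feature is informational rather than a problem,
        # so info is used here instead of warning.
        logger.info("%s is enabled", flag_name)
    return enabled
```

Note that the comparison is case-sensitive, so a value such as `True` leaves the flag off in this sketch.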

wangxiyuan · 2025-10-21T08:46:27Z

vllm_ascend/worker/model_runner_v1.py

            dtype=torch.bool,
            device=self.device,
        )
+        self.dynamic_eplb = self.ascend_config.dynamic_eplb


duplicate with L482

Revised according to the review comments.

wangxiyuan · 2025-10-21T08:47:21Z

vllm_ascend/worker/model_runner_v1.py

        )
+        self.dynamic_eplb = self.ascend_config.dynamic_eplb
+        self.expert_map_record_path = self.ascend_config.expert_map_record_path
+        EPLBParamUtils.check_dynamic_eplb(self.ascend_config.dynamic_eplb)


if self.ascend_config.dynamic_eplb:
    EPLBParamUtils.check_dynamic_eplb(self.ascend_config.dynamic_eplb)

Revised according to the review comments.
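The suggestion above runs the check only when the option is switched on. A rough sketch of that guard follows. The body of `check_dynamic_eplb` here is a hypothetical stand-in (a plain type check), since the real `EPLBParamUtils.check_dynamic_eplb` implementation is not shown in this conversation, and `init_eplb` is an illustrative name.

```python
def check_dynamic_eplb(value) -> None:
    """Hypothetical stand-in for the validator: accept only real booleans."""
    if not isinstance(value, bool):
        raise TypeError(
            f"dynamic_eplb must be a bool, got {type(value).__name__}")


def init_eplb(dynamic_eplb) -> bool:
    """Validate the option only when it is truthy, as the reviewer suggested."""
    if dynamic_eplb:
        check_dynamic_eplb(dynamic_eplb)
    return bool(dynamic_eplb)
```

One consequence of the guard in this sketch: falsy values such as `None` or `0` skip validation entirely, while a truthy non-boolean such as the string `"true"` is still rejected.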

github-actions · 2025-10-21T09:38:32Z

This pull request has conflicts, please resolve those before we can evaluate the pull request.

wangxiyuan · 2025-10-21T11:04:20Z

tests/e2e/nightly/models/test_deepseek_r1_w8a8_eplb.py

+        str(tp_size), "--data-parallel-size",
+        str(dp_size), "--port",
+        str(port), "--max-model-len", "36864", "--max-num-batched-tokens",
+        "36864", "--block-size", "128", "--trust-remote-code", "quantization",


--quantization

Revised according to the review comments.
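A missing `--` prefix like the one flagged above is easy to overlook in a long argument list. The following is a small, heuristic sketch of a check that could catch it. The helper `find_bare_flags`, the port value, and the `<method>` placeholder are hypothetical; only the flag names are taken from the hunk and the review comment.

```python
def find_bare_flags(args, known_flags):
    """Return tokens that name a known flag but lack the leading dashes."""
    names = {flag.lstrip("-") for flag in known_flags}
    return [tok for tok in args if not tok.startswith("-") and tok in names]


# Flag names as they appear in this review thread.
KNOWN_FLAGS = [
    "--data-parallel-size", "--port", "--max-model-len",
    "--max-num-batched-tokens", "--block-size", "--trust-remote-code",
    "--quantization",
]

# Hypothetical argument list reproducing the mistake: "quantization" has no "--".
bad_args = [
    "--port", "8000", "--max-model-len", "36864",
    "--block-size", "128", "--trust-remote-code", "quantization", "<method>",
]

print(find_bare_flags(bad_args, KNOWN_FLAGS))
```

This is only a heuristic: a legitimate option value that happens to equal a flag name would also be reported.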

Yikun

I don't understand why this PR was merged. @wangxiyuan

This PR removes many existing tests, such as qwen3-32b-in8-a3 and qwen3-32b-in8-a2.
It was merged without the tests passing, so unless there is a special reason, I would prefer to revert this PR.

wangxiyuan · 2025-10-22T01:10:14Z

This was a mistake made during the rebase. @offline0806 will create a new PR to add the tests back.
The test failure was caused by another problem that is not related to this PR, and it has been fixed on main now.


offline0806 added 30 commits October 9, 2025 16:24

[bugfix]change log2phy map to npu.

3cf7290

Signed-off-by: offline0806 <3337230449@qq.com>

[ut]add patch to test_torch_air_init.

e2d177c

Signed-off-by: offline0806 <3337230449@qq.com>

[ut]add patch to test torch air fused moe.

a73699a

Signed-off-by: offline0806 <3337230449@qq.com>

[ut]add return value to patch.

08a22fa

Signed-off-by: offline0806 <3337230449@qq.com>

[ut]add patch of torch.npu.is_available to mock dis env.

44d09dd

Signed-off-by: offline0806 <3337230449@qq.com>

[ut]add pytest.fixture to test torch air fused moe.

6d43490

Signed-off-by: offline0806 <3337230449@qq.com>

[bugfix]init expert map of mtp layer when using expert_map_path.

edb7593

Signed-off-by: offline0806 <3337230449@qq.com>

[bugfix]add switch to enable whether using gmm swiglu or gmm.

c836dd1

Signed-off-by: offline0806 <3337230449@qq.com>

[bugfix]fix nz tensor can not use gmm swiglu by clone buffer tensor.

702857c

Signed-off-by: offline0806 <3337230449@qq.com>

[bugfix]clone src tensor when prepare p2p task.

c0e7d7c

Signed-off-by: offline0806 <3337230449@qq.com>

[bugfix]add log2phy to apply.

549eab4

Signed-off-by: offline0806 <3337230449@qq.com>

[log]change d2d log level to warning.

4cc5cf0

Signed-off-by: offline0806 <3337230449@qq.com>

Merge remote-tracking branch 'upstream_gitee/main' into main_1009

befd600

[log]add log when init expert map of mtp.

a349723

Signed-off-by: offline0806 <3337230449@qq.com>

[ci]fix pre lint.

404a765

Signed-off-by: offline0806 <3337230449@qq.com>

[ut]mock tensor to expert param per layer.

9ef551c

Signed-off-by: offline0806 <3337230449@qq.com>

[ut]change test generate task.

1cb817c

Signed-off-by: offline0806 <3337230449@qq.com>

[ut]fix ci.

4dade01

Signed-off-by: offline0806 <3337230449@qq.com>

[ut]fix ut.

acc6aef

Signed-off-by: offline0806 <3337230449@qq.com>

[ut]fix ut.

976ba59

Signed-off-by: offline0806 <3337230449@qq.com>

[log]add log info when init new expert_map.

fbaecb9

Signed-off-by: offline0806 <3337230449@qq.com>

[log]add log info when init new expert_map.

4b357ad

Signed-off-by: offline0806 <3337230449@qq.com>

[env]add tmp env to enable eplb.

4d13920

Signed-off-by: offline0806 <3337230449@qq.com>

Merge remote-tracking branch 'upstream_gitee/main' into main_1009

59a4b92

# Conflicts: # vllm_ascend/ops/common_fused_moe.py # vllm_ascend/torchair/ops/torchair_fused_moe.py # vllm_ascend/worker/model_runner_v1.py

[EPLB]record expert map without dynamic eplb.

dfd431b

Signed-off-by: offline0806 <3337230449@qq.com>

[doc]change doc of eplb.

45084b8

Signed-off-by: offline0806 <3337230449@qq.com>

Merge remote-tracking branch 'upstream_gitee/main' into main_1009

83e926b

[EPLB]Merge the judgement for dynamic eplb and expert map record path.

4cae86f

Signed-off-by: offline0806 <3337230449@qq.com>

[ci]fix ci.

8260dac

Signed-off-by: offline0806 <3337230449@qq.com>

[log]if expert map is tensor, log once.

6cf8676

Signed-off-by: offline0806 <3337230449@qq.com>

wangxiyuan added the ready-for-test label (start test by label for PR) Oct 21, 2025

offline0806 added 5 commits October 21, 2025 12:23

[CI]change eplb ci to nightly.

8662b0f

Signed-off-by: offline0806 <3337230449@qq.com>

[CI]fix pre commit.

3939a52

Signed-off-by: offline0806 <3337230449@qq.com>

[CI]fix pre commit.

d048543

Signed-off-by: offline0806 <3337230449@qq.com>

[CI]fix pre commit.

6606e35

Signed-off-by: offline0806 <3337230449@qq.com>

[CI]fix pre commit.

16581b3

Signed-off-by: offline0806 <3337230449@qq.com>

MengqingCao added the ready label (read for review) Oct 21, 2025

wangxiyuan reviewed Oct 21, 2025

github-actions bot added the merge-conflicts label Oct 21, 2025

wangxiyuan reviewed Oct 21, 2025

offline0806 added 3 commits October 21, 2025 20:46

[CI]fix ci.

1e213e7

Signed-off-by: offline0806 <3337230449@qq.com>

Merge remote-tracking branch 'upstream_gitee/main' into main_1009

3f2ef47

# Conflicts: # .github/workflows/vllm_ascend_test_nightly.yaml # vllm_ascend/patch/platform/patch_common/__init__.py

[patch]change patch of v1 executor.

ef0c94f

Signed-off-by: offline0806 <3337230449@qq.com>

github-actions bot removed the merge-conflicts label Oct 21, 2025

offline0806 added 3 commits October 21, 2025 21:16

[CI]fix pre commit.

6f52bbd

Signed-off-by: offline0806 <3337230449@qq.com>

[CI]fix ci.

033eb79

Signed-off-by: offline0806 <3337230449@qq.com>

[ci]skip mooncake ut.

1325287

Signed-off-by: offline0806 <3337230449@qq.com>

wangxiyuan approved these changes Oct 21, 2025

wangxiyuan merged commit e916265 into vllm-project:main Oct 21, 2025
16 of 17 checks passed

Yikun reviewed Oct 21, 2025

MrZ20 mentioned this pull request Mar 2, 2026

[Nightly][Refactor]Migrate nightly single-node model tests from .py to .yaml #6503

Merged

[CI]Add EPLB CI. #3568
wangxiyuan merged 77 commits into vllm-project:main from offline893:main_1009

5 participants
