[CI] Add Async Eplb nightly CI tests#29385
Conversation
Signed-off-by: David Chen <530634352@qq.com>
Signed-off-by: David Chen <530634352@qq.com>
Signed-off-by: David Chen <530634352@qq.com>
Signed-off-by: David Chen <530634352@qq.com>
There was a problem hiding this comment.
Code Review
This pull request introduces nightly CI tests for asynchronous EPLB for the DeepSeek-V2-Lite and Qwen3-Next models. The changes are well-structured, adding new test scripts and updating the Buildkite pipeline. However, I've identified a critical issue in the new test scripts: the use_async flag in the EPLB configuration is incorrectly passed as a string ("true") instead of a boolean (true). This will prevent the async feature from being enabled, meaning the tests would not cover the intended functionality. I've provided suggestions to correct this in the review comments.
| --data-parallel-size 2 \ | ||
| --enable-expert-parallel \ | ||
| --enable-eplb \ | ||
| --eplb-config '{"window_size":200,"step_interval":600,"use_async":"true"}' \ |
There was a problem hiding this comment.
The use_async parameter in the JSON for --eplb-config is specified as the string "true". For it to be correctly parsed as a boolean value by Pydantic, it should be the JSON boolean literal true (without quotes). With the current value, the async feature will not be enabled, which defeats the purpose of this test.
| --eplb-config '{"window_size":200,"step_interval":600,"use_async":"true"}' \ | |
| --eplb-config '{"window_size":200,"step_interval":600,"use_async":true}' \ |
| --tensor-parallel-size 4 \ | ||
| --enable-expert-parallel \ | ||
| --enable-eplb \ | ||
| --eplb-config '{"window_size":200,"step_interval":600,"use_async":"true"}' \ |
There was a problem hiding this comment.
The use_async parameter in the JSON for --eplb-config is specified as the string "true". For it to be correctly parsed as a boolean value by Pydantic, it should be the JSON boolean literal true (without quotes). With the current value, the async feature will not be enabled, which defeats the purpose of this test.
| --eplb-config '{"window_size":200,"step_interval":600,"use_async":"true"}' \ | |
| --eplb-config '{"window_size":200,"step_interval":600,"use_async":true}' \ |
|
@tlrmchlsmth @DarkLight1337 PTAL, thx |
Signed-off-by: David Chen <530634352@qq.com>
Signed-off-by: David Chen <530634352@qq.com>
|
Ci failed test_eagle_correctness is not related about this pr. |
|
Merging, sorry for the delay! |
Waiting to be merged |
|
I don't have permission to force-merge this PR, need to wait for @simon-mo @WoosukKwon @youkaichao |
Signed-off-by: WeiQing Chen <40507679+david6666666@users.noreply.github.com>
Signed-off-by: WeiQing Chen <40507679+david6666666@users.noreply.github.com>
|
@DarkLight1337 CI passed, PTAL, thx. |
tlrmchlsmth
left a comment
There was a problem hiding this comment.
Thank you for adding the test!
| --data-parallel-size 2 \ | ||
| --enable-expert-parallel \ | ||
| --enable-eplb \ | ||
| --eplb-config '{"window_size":200,"step_interval":600}' \ |
There was a problem hiding this comment.
Nice catch, not sure the default step interval actually triggering eplb here
This reverts commit 7fe9c1a.
This reverts commit 7fe9c1a.
This reverts commit 7fe9c1a.
Signed-off-by: David Chen <530634352@qq.com> Signed-off-by: WeiQing Chen <40507679+david6666666@users.noreply.github.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> Signed-off-by: Micah Williamson <micah.williamson@amd.com>
…llm-project#30431) Signed-off-by: Ubuntu <mjtaheri68@gmail.com>
Signed-off-by: David Chen <530634352@qq.com> Signed-off-by: WeiQing Chen <40507679+david6666666@users.noreply.github.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
This reverts commit 7bbd491.
Signed-off-by: David Chen <530634352@qq.com> Signed-off-by: WeiQing Chen <40507679+david6666666@users.noreply.github.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
This reverts commit 718ccbb.
Signed-off-by: David Chen <530634352@qq.com> Signed-off-by: WeiQing Chen <40507679+david6666666@users.noreply.github.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Purpose
#22179 comment Add Async Eplb nightly CI
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.