Revert "[CI] Add Async Eplb nightly CI tests" #30086
This reverts commit 7fe9c1a.
Code Review
This pull request aims to revert the addition of asynchronous EPLB nightly CI tests, which are reportedly causing out-of-memory issues. The changes largely consist of removing the problematic test scripts and their corresponding CI pipeline configurations. This is a reasonable step to stabilize the CI. However, the PR also includes an unrelated addition of comments in vllm/distributed/eplb/rebalance_execute.py. For maintainability and a clean Git history, revert PRs should be atomic. I've recommended moving this unrelated change to a separate PR.
# A buffer to hold the expert weights in one layer during the exchange.
# NOTE: Currently we assume the same weights across different layers
# have the same shape.
This PR is exclusively a reversion of #29385. I have made no other changes.
I will take a look: https://buildkite.com/vllm/ci/builds/41883/steps/canvas?sid=019ae82a-231f-4893-aa07-bd72e4ed5bbf
It was green on the nightly, so this feels environmental 🤔 https://buildkite.com/vllm/ci/builds/42071
Reverts #29385
This test appears to be OOMing in CI. Let's revert until we figure out what's going on. CC @david6666666