[CI/Build] Add CI config for Sonic MoE kernel tests on H100 #31606
clocksmith wants to merge 1 commit into vllm-project:main
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a reduced subset runs, and you can ask your reviewers to trigger select CI tests on top of it. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀
Code Review
This pull request adds a new CI configuration to run Sonic MoE kernel tests on H100 GPUs. The change is straightforward, but I've found a potential issue in the pytest command path that could prevent the tests from running correctly. I've also suggested adding a working_dir to ensure consistency with other kernel tests.
Signed-off-by: X <x@simulatte.world>
Force-pushed from 6f9f2d6 to 24c08c3
kernels/moe/test_sonic_moe.py does not exist until #31548 is merged, so it's better to add this to the PR that introduces the test. Why does this need its own build step? 45 min is a very long timeout for a basic kernel test; does it actually take this long? If so, we should try to shorten it to save on CI costs.
OK, closing this PR. I misunderstood the CI workflow: I thought the sonicmoe dependency had to be installed before the test could run, not realizing that the CI config itself handles pip install sonicmoe in the commands block. Anyway, I moved the config into #31548 so the test file and CI land together, and reduced the timeout to 10 min. Happy to combine with the DeepGEMM H100 step (or another) if you prefer to avoid a separate build step. However, I think the isolation is worth the cost, since there are already several separate H100 build steps. Thanks!
Purpose
Adds Buildkite CI configuration to run Sonic MoE kernel tests on H100 GPUs.
This config installs the sonicmoe package and executes tests/kernels/test_sonic_moe.py when relevant source files change.
Unblocks #31548 (Sonic MoE integration for Hopper GPUs) by enabling CI validation.
Unblocks #31039.
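For orientation, a step of the kind described above might look roughly like the following. This is a minimal sketch, not the actual diff: the label, the working_dir value, and the dependency paths are illustrative assumptions, and only the keys and commands named in this PR (source_file_dependencies, working_dir, commands, pip install sonicmoe, the pytest invocation, and the timeout) come from the discussion.

```yaml
# Hypothetical sketch of the Buildkite step; not copied from the PR diff.
- label: Sonic MoE Kernel Test (H100)    # illustrative label
  timeout_in_minutes: 10                 # 45 in this PR; reduced to 10 after review
  gpu: h100
  working_dir: "/vllm-workspace/tests"   # assumed value; the review suggested setting working_dir
  source_file_dependencies:
    # Placeholder entry; the real list matches the files added in #31548.
    - path/to/sonic_moe/sources/
  commands:
    - pip install sonicmoe
    # Path as written in the test plan; the reviewer refers to
    # kernels/moe/test_sonic_moe.py, so the final path may differ.
    - pytest -v -s kernels/test_sonic_moe.py
```

Because the pytest path is relative, it only resolves if the working directory is the directory that contains kernels/, which is presumably why the review asked for an explicit working_dir.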
Test Plan
CI-only change. The test configuration will trigger when #31548 is rebased/merged after this PR merges:
source_file_dependencies matches files added in [Kernel] Add Sonic MoE integration for Hopper GPUs #31548
pip install sonicmoe, then pytest -v -s kernels/test_sonic_moe.py
Test Result
N/A - CI configuration change. Validation occurs when #31548 runs against updated CI.