[CI/Build] Add CI config for Sonic MoE kernel tests on H100 #31606

Closed
clocksmith wants to merge 1 commit into vllm-project:main from clocksmith:ci/add-sonicmoe-test
@clocksmith clocksmith commented Jan 1, 2026

Purpose

Adds buildkite CI configuration to run Sonic MoE kernel tests on H100 GPUs.

This config installs the sonicmoe package and runs tests/kernels/test_sonic_moe.py when relevant source files change.
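For illustration only, a step of this kind might look roughly like the sketch below. This is not the config from this PR: the label, the key names, the GPU count, and the dependency path are all assumptions, and the review discussion refers to the test as kernels/moe/test_sonic_moe.py, so the real path may differ.

```yaml
# Hypothetical sketch of a CI test step; every value here is an assumption
# unless it is mentioned in the PR description or the review thread.
- label: Sonic MoE Kernel Test (H100)   # assumed label
  timeout_in_minutes: 45                # review mentions 45 min; later reduced to 10 in #31548
  gpu: h100
  num_gpus: 1                           # assumed
  source_file_dependencies:             # assumed: paths whose changes trigger the step
    - tests/kernels/test_sonic_moe.py
  commands:
    - pip install sonicmoe              # install the dependency inside the step itself
    - pytest -v -s tests/kernels/test_sonic_moe.py
```

Installing the package in the commands list keeps the dependency local to this one step, which is why the step does not need to land before the test file does.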

Unblocks #31548 (Sonic MoE integration for Hopper GPUs) by enabling CI validation.

Unblocks #31039.

Test Plan

CI-only change. The test configuration will trigger when #31548 is rebased or merged after this PR merges.

Test Result

N/A - CI configuration change. Validation occurs when #31548 runs against updated CI.


Essential Elements of an Effective PR Description Checklist

  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results

github-actions bot commented Jan 1, 2026

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, which covers a small, essential subset of CI tests to quickly catch errors.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

@mergify mergify bot added the ci/build label Jan 1, 2026
@clocksmith clocksmith marked this pull request as ready for review January 1, 2026 19:08
@gemini-code-assist gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request adds a new CI configuration to run Sonic MoE kernel tests on H100 GPUs. The change is straightforward, but I've found a potential issue in the pytest command path that could prevent the tests from running correctly. I've also suggested adding a working_dir to ensure consistency with other kernel tests.

Signed-off-by: X <x@simulatte.world>
@clocksmith clocksmith force-pushed the ci/add-sonicmoe-test branch from 6f9f2d6 to 24c08c3 Compare January 1, 2026 19:35
@LucasWilkinson LucasWilkinson (Collaborator) left a comment

kernels/moe/test_sonic_moe.py does not exist until #31548 is merged; it's better to just add this to the PR that introduces the test. Why does this need its own build step? 45min is a very long timeout for a basic kernel test. Does it actually take this long? If so, we should try to shorten this to save on CI costs.

@clocksmith clocksmith (Author) commented

> kernels/moe/test_sonic_moe.py does not exist until #31548 is merged; it's better to just add this to the PR that introduces the test. Why does this need its own build step? 45min is a very long timeout for a basic kernel test. Does it actually take this long? If so, we should try to shorten this to save on CI costs.

OK, closing this PR. I misunderstood the CI workflow and thought the sonicmoe dependency had to be installed before the test could run, not realizing the CI config itself handles pip install sonicmoe in the commands block.

Anywho, moved the config into #31548 so the test file and CI land together. Reduced timeout to 10min. Happy to combine with the DeepGEMM H100 step (or another) if you prefer to avoid a separate build step. However, I think the isolation is worth the cost, since there are already several separate build steps w/ H100. Thanks!

@clocksmith clocksmith closed this Jan 3, 2026
@clocksmith clocksmith deleted the ci/add-sonicmoe-test branch January 20, 2026 22:43