test: Skip unsupported SM Archs for newly added trtllm MoE test #2060
Conversation
/bot run
**Note**: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

**Walkthrough**

The pull request adds GPU compute capability checks to test functions in the TRTLLM MOE test suite. Tests are conditionally skipped on GPUs with SM versions other than SM100/SM103 (Blackwell-class GPUs) by detecting the compute capability and triggering early returns when requirements are not met.
**Estimated code review effort**: 🎯 1 (Trivial) | ⏱️ ~5 minutes
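The SM-version gate the walkthrough describes can be sketched without a GPU. This is an illustrative sketch, not the PR's code: the helper name `is_supported_sm_arch` is hypothetical, and the `(major, minor)` tuple is the shape returned by `torch.cuda.get_device_capability` / flashinfer's `get_compute_capability`.

```python
# Illustrative sketch of the SM-version gate described above; the helper name
# `is_supported_sm_arch` is hypothetical, not from the PR.
SUPPORTED_SM_MAJOR = 10  # SM100 and SM103 both report compute-capability major 10


def is_supported_sm_arch(compute_capability):
    """Return True when the (major, minor) capability tuple is an SM 10.x arch."""
    major, _minor = compute_capability
    return major == SUPPORTED_SM_MAJOR


print(is_supported_sm_arch((10, 0)))  # SM100 (Blackwell) -> True
print(is_supported_sm_arch((9, 0)))   # SM90 (Hopper) -> False
```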
**Pre-merge checks and finishing touches**: ❌ Failed checks (1 warning, 1 inconclusive); ✅ Passed checks (1 passed)
**Summary of Changes**

Hello @bkryu, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses an issue where a recently introduced TRT-LLM MoE test was causing failures on GPU devices with unsupported Streaming Multiprocessor (SM) architectures. By incorporating a compute capability check, the test now gracefully skips execution on non-SM10X GPUs, ensuring test stability and preventing unnecessary failures on incompatible hardware.
Code Review
This pull request correctly adds a check to skip a new MoE test on unsupported SM architectures, fixing a test failure bug. My review suggests a refactoring to improve the implementation of this check. Instead of an inline skip, I recommend using a reusable pytest marker. This improves maintainability by avoiding code duplication and adheres to pytest best practices, making the test's requirements more explicit.
```python
compute_capability = get_compute_capability(torch.device(device="cuda"))
if compute_capability[0] not in [10]:
    pytest.skip("These tests are only guaranteed to work on SM100 and SM103 GPUs.")
```
This compute capability check is duplicated from tests/moe/test_trtllm_gen_fused_moe.py. To improve maintainability and follow pytest best practices, it's better to use a custom marker with skipif. This avoids runtime checks inside the test body and makes the requirement explicit in the test declaration.
You could define a marker in a shared `conftest.py` file:

```python
# tests/conftest.py or a shared test utility file
import pytest
import torch

sm10x_only = pytest.mark.skipif(
    not torch.cuda.is_available() or torch.cuda.get_device_capability()[0] != 10,
    reason="Test requires SM 10.x architecture (e.g., SM100, SM103)",
)
```

Then apply it to the test function. This would allow you to remove these lines, and also the `get_compute_capability` import on line 39.
```python
# tests/moe/test_trtllm_gen_routed_fused_moe.py
# from tests.conftest import sm10x_only

@sm10x_only
@pytest.mark.parametrize(...)
# ...
def test_trtllm_gen_routed_fused_moe(...):
    # No need for the manual skip check here
    torch.manual_seed(42)
    # ...
```

This approach is cleaner, reusable, and evaluated at test collection time.
Actionable comments posted: 0
🧹 Nitpick comments (2)
`tests/moe/test_trtllm_gen_routed_fused_moe.py` (2)
**Lines 39-40: Consolidate imports from `flashinfer.utils`.**

Line 31 already imports from `flashinfer.utils`. Consider consolidating both imports into a single statement for better organization. Apply this diff:

```diff
-from flashinfer.utils import device_support_pdl
+from flashinfer.utils import device_support_pdl, get_compute_capability
-from flashinfer.utils import get_compute_capability
```
**Lines 65-67: Consider using the same device for capability check and test execution.**

The compute capability is checked on a generic `cuda` device (line 65), but the test executes on `cuda:0` (line 69). While this works in most cases, using the same device for both ensures consistency, especially in multi-GPU systems. Apply this diff:

```diff
+    device = torch.device("cuda:0")
-    compute_capability = get_compute_capability(torch.device(device="cuda"))
+    compute_capability = get_compute_capability(device)
     if compute_capability[0] not in [10]:
         pytest.skip("These tests are only guaranteed to work on SM100 and SM103 GPUs.")
     torch.manual_seed(42)
-    device = torch.device("cuda:0")
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
`tests/moe/test_trtllm_gen_routed_fused_moe.py` (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/moe/test_trtllm_gen_routed_fused_moe.py (1)
flashinfer/utils.py (1)
`get_compute_capability` (252-255)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Deploy Docs
[SUCCESS] Pipeline #38051844: 13/17 passed
📌 Description
`tests/moe/test_trtllm_gen_routed_fused_moe.py` was newly added in #2049, but did not have an SM arch check, which caused unit test failures on non-SM10X devices. The current PR adds skips on unsupported SM architectures.
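The inline skip the PR adds can be exercised without a GPU by factoring the check into a helper. This is a hedged sketch, not the PR's exact code: it substitutes the stdlib `unittest.SkipTest` for `pytest.skip` so it runs standalone, and the helper name `require_sm10x` is hypothetical.

```python
import unittest


def require_sm10x(compute_capability):
    # Mirrors the PR's inline check: skip unless the SM major version is 10.
    # The real test calls pytest.skip(); unittest.SkipTest is the stdlib analog.
    if compute_capability[0] not in [10]:
        raise unittest.SkipTest(
            "These tests are only guaranteed to work on SM100 and SM103 GPUs."
        )


require_sm10x((10, 0))  # SM100: passes through, the test body would run
try:
    require_sm10x((9, 0))  # SM90: raises SkipTest, the test would be skipped
except unittest.SkipTest as exc:
    print("skipped:", exc)
```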
🔍 Related Issues
🚀 Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.
✅ Pre-commit Checks

- I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method).
- I have installed the hooks with `pre-commit install`.
- I have run the hooks manually with `pre-commit run --all-files` and fixed any reported issues.

🧪 Tests

- Tests have been added or updated as needed (unittest, etc.).

Reviewer Notes