[CI] [FlashInfer v0.6.7] Use offline quantized checkpoint for MXFP8 Gemm tests#21625
Fridge003 merged 3 commits into sgl-project:main from
Code Review
This pull request updates the test suite to use a specific MXFP8 model path and simplifies the test setup by removing the redundant quantization flag. It also temporarily disables the Triton-based MXFP8 GEMM test. A review comment identifies a likely typo in the pull request number referenced in the skip message, which should be corrected to ensure the reason for skipping the test is properly documented.
```python
@unittest.skip(
    "Temporarily disabled until https://github.com/sgl-project/sglang/pull/19835 is merged"
)
```
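`unittest.skip` is the standard-library decorator used in the snippet above. Applied to a `TestCase` class, it marks every test method in that class as skipped and records the reason string instead of running the test. A minimal sketch of that behavior, assuming a hypothetical stand-in class (`TestMXFP8GemmTritonSketch` is illustrative, not the real SGLang test) that is run in-process so the result can be inspected:

```python
import unittest

# Same reason string as in the reviewed snippet.
SKIP_REASON = (
    "Temporarily disabled until "
    "https://github.com/sgl-project/sglang/pull/19835 is merged"
)


# Hypothetical stand-in for the disabled Triton MXFP8 GEMM test class.
@unittest.skip(SKIP_REASON)
class TestMXFP8GemmTritonSketch(unittest.TestCase):
    def test_gemm(self):
        # Never executed while the class-level skip is in place.
        raise RuntimeError("should not run")


def run_suite():
    """Load and run the sketch class, returning the unittest result object."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(TestMXFP8GemmTritonSketch)
    result = unittest.TestResult()
    suite.run(result)
    return result


if __name__ == "__main__":
    res = run_suite()
    for test, reason in res.skipped:
        print(test.id(), "->", reason)
```

Skipped tests do not count as failures (`wasSuccessful()` stays `True`), so the reason string is where the justification for the skip is recorded; that is why a wrong PR number in it matters.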
Hi @Fridge003,
CC @wolfcomos

Thanks! I'm now working on the PR to improve the CUDA graph capture time.
/rerun-stage stage-c-test-4-gpu-b200

✅ Triggered
Motivation
@HumansAnd
Use an offline-quantized MXFP8 checkpoint for CI stability.
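As background for the offline-versus-online distinction: in the OCP Microscaling (MX) format, MXFP8 stores a tensor as blocks of 32 FP8 E4M3 elements that share one power-of-two (E8M0) scale. An offline-quantized checkpoint ships those elements and scales on disk, while online quantization typically computes them from higher-precision weights when the model is loaded. The pure-Python sketch below shows the per-block scale-and-round step. It is illustrative only, not SGLang's or FlashInfer's implementation; the function names are hypothetical, and it assumes saturation to ±448 for out-of-range elements and ignores NaN handling.

```python
import math

BLOCK_SIZE = 32      # MX block size: 32 elements share one scale
E4M3_MAX = 448.0     # largest finite FP8 E4M3 value (1.75 * 2**8)
E4M3_EMAX = 8        # exponent of E4M3_MAX
E4M3_MIN_EXP = -6    # smallest normal exponent; below it values are subnormal
E4M3_MANT_BITS = 3   # explicit mantissa bits in E4M3


def round_to_e4m3(v: float) -> float:
    """Round to the nearest E4M3 value (ties to even), saturating at +/-448 (assumption)."""
    if v == 0.0:
        return 0.0
    sign = -1.0 if v < 0 else 1.0
    a = abs(v)
    _, e = math.frexp(a)                    # a = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, E4M3_MIN_EXP)          # floor(log2(a)), clamped into the subnormal range
    quantum = 2.0 ** (exp - E4M3_MANT_BITS) # spacing of representable values at this exponent
    q = round(a / quantum) * quantum        # Python's round() ties to even
    return sign * min(q, E4M3_MAX)


def quantize_block(block):
    """Return (shared_exponent, e4m3_values) for one block using the MX shared-scale rule."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 0, [0.0] * len(block)
    _, e = math.frexp(amax)
    shared_exp = (e - 1) - E4M3_EMAX              # floor(log2(amax)) - emax_elem
    shared_exp = max(-127, min(127, shared_exp))  # clamp to the E8M0 scale range
    scale = 2.0 ** shared_exp
    return shared_exp, [round_to_e4m3(x / scale) for x in block]


def quantize_mxfp8(values):
    """Split a flat list into 32-element blocks and quantize each block."""
    return [quantize_block(values[i:i + BLOCK_SIZE])
            for i in range(0, len(values), BLOCK_SIZE)]


def dequantize_mxfp8(blocks):
    """Multiply each element by its block's power-of-two scale."""
    out = []
    for shared_exp, elems in blocks:
        out.extend(x * 2.0 ** shared_exp for x in elems)
    return out


if __name__ == "__main__":
    weights = [0.01 * (i - 16) for i in range(64)]  # toy higher-precision weights
    blocks = quantize_mxfp8(weights)
    restored = dequantize_mxfp8(blocks)
    print([exp for exp, _ in blocks])
    print(max(abs(a - b) for a, b in zip(weights, restored)))
```

Storing this step's output in the checkpoint means quantization no longer runs inside the CI job, and the online quantization path is what the investigation identified as the source of instability.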
The MXFP8 GEMM CI became unstable after the FlashInfer v0.6.7 update:
After investigation, the root cause turned out to be instability in the online quantization code path itself, not FlashInfer v0.6.7:
Note: `TestMXFP8GemmTriton` is temporarily disabled until ~~a fix in #19835 is merged~~ the long PCG capture time is fixed.

Modifications
Accuracy Tests
Speed Tests and Profiling
Checklist
Review and Merge Process
/tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci