[AMD] Fix AMD CI test of TestToolChoiceLfm2Moe#19113
- Relax LoRA multi-batch ROUGE-L tolerance from 1.0 to 0.95 to account for minor numerical non-determinism on ROCm
- Fix aiter attention backend crashing on hybrid Mamba+attention models (e.g. LFM2-MoE): use get_v_head_dim() for hybrid KV pools instead of the hardcoded get_value_buffer(0), which fails when layer 0 is not an attention layer
- Skip TestToolChoiceLfm2Moe on AMD: the sgl_kernel ROCm build lacks the causal_conv1d_update op needed by Mamba layers
Upstream already has a proper v_head_dim fix (handling MLA, hybrid_gdn, kimi_linear models) so our hasattr-based version is no longer needed.
The aiter RoPE backend has lower precision (as warned by apex), causing consistent single-token differences between SRT and HF reference outputs (ROUGE-L 0.9774 vs required 1.0). Disable it for the LoRA multi-batch test to produce exact matches.
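One way to keep the LoRA multi-batch test on the higher-precision RoPE path is to scope an environment override to the test class. This is only an illustrative sketch: SGLANG_ROCM_DISABLE_AITER_ROPE is a hypothetical variable name, not an actual sglang flag.

```python
# Sketch of scoping a backend override to a single test class.
# SGLANG_ROCM_DISABLE_AITER_ROPE is a hypothetical name used only to
# illustrate the pattern; it is not the real sglang configuration flag.
import os
import unittest

class TestLoraMultiBatch(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Save any pre-existing value so other tests are unaffected.
        cls._saved = os.environ.get("SGLANG_ROCM_DISABLE_AITER_ROPE")
        os.environ["SGLANG_ROCM_DISABLE_AITER_ROPE"] = "1"

    @classmethod
    def tearDownClass(cls):
        # Restore the original environment after the class finishes.
        if cls._saved is None:
            os.environ.pop("SGLANG_ROCM_DISABLE_AITER_ROPE", None)
        else:
            os.environ["SGLANG_ROCM_DISABLE_AITER_ROPE"] = cls._saved

    def test_override_is_scoped(self):
        self.assertEqual(os.environ["SGLANG_ROCM_DISABLE_AITER_ROPE"], "1")
```

The save/restore in setUpClass/tearDownClass keeps the override from leaking into other test classes in the same CI job.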
The existing check only covers hybrid_gdn_config and kimi_linear_config, but LFM2 models use HybridLinearKVPool without either config. Use hasattr(get_v_head_dim) to cover all hybrid KV pool types, matching triton_backend.py.
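The hasattr-based dispatch described above can be sketched as follows. The pool classes and the resolve_v_head_dim helper here are illustrative stand-ins, not the actual sglang code: they only show why probing for get_v_head_dim() covers hybrid pools that the config-based check misses.

```python
# Sketch of selecting the value head dim for both regular and hybrid KV
# pools. RegularKVPool / HybridKVPool are illustrative stand-ins, not the
# real sglang memory-pool classes.

class RegularKVPool:
    """Pool where layer 0 is guaranteed to be an attention layer."""
    def __init__(self, v_head_dim):
        self._v_head_dim = v_head_dim

    def get_value_buffer(self, layer_id):
        # Returns a buffer whose last dimension is the value head dim.
        return [[0.0] * self._v_head_dim]

class HybridKVPool:
    """Hybrid Mamba+attention pool: layer 0 may be a Mamba layer,
    so get_value_buffer(0) is not safe to call."""
    def __init__(self, v_head_dim):
        self._v_head_dim = v_head_dim

    def get_v_head_dim(self):
        return self._v_head_dim

def resolve_v_head_dim(kv_pool):
    # Hybrid pools (LFM2's HybridLinearKVPool, hybrid_gdn, kimi_linear, ...)
    # expose get_v_head_dim(); hardcoding get_value_buffer(0) would fail
    # when layer 0 is not an attention layer.
    if hasattr(kv_pool, "get_v_head_dim"):
        return kv_pool.get_v_head_dim()
    return len(kv_pool.get_value_buffer(0)[0])

print(resolve_v_head_dim(HybridKVPool(64)))    # 64
print(resolve_v_head_dim(RegularKVPool(128)))  # 128
```

Probing the method rather than enumerating model configs means any future hybrid pool type that implements get_v_head_dim() is handled without touching the backend again, which is the same approach triton_backend.py takes.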
The stage-b LoRA test will be fixed in the next PR. cc: @yctseng0211, @bingxche, @sogalin, @HaiShaw
The install-dependency timeout error is unrelated to this PR. Could you please take a look? Thanks in advance. @alisonshao @Kangyan-Zhou
@hubertlu-tw Could you have another look? The PR now only changes TestToolChoiceLfm2Moe, using the Triton implementation. Thanks! Cc: @HaiShaw
Co-authored-by: michaelzhang-ai <michaelzhang-ai@users.noreply.github.com> Co-authored-by: bingxche <Bingxu.Chen@amd.com> Co-authored-by: yctseng0211 <yctseng@amd.com>


Motivation
Fix pre-existing stage-b AMD CI test failures:
https://github.com/sgl-project/sglang/actions/runs/22327699165/job/64603707599#step:6:14389
ValueError: layer_id=0 not in full attention layers: dict_keys([2, 6, 10, 14, 18, 21])
raised at aiter_backend.py line 126. The aiter attention backend hardcodes layer_id=0 to get the value head dim, but LFM2-MoE is a hybrid Mamba+attention model where only layers [2, 6, 10, 14, 18, 21] are attention layers; layer 0 is a Mamba layer.
On CUDA, the model uses flashinfer which handles this correctly. On AMD, aiter is auto-selected and crashes.
Please help review: @yctseng0211, @bingxche, @HaiShaw, @sogalin . Thanks!
Modifications
- aiter_backend.py: Use hasattr(get_v_head_dim) to cover all hybrid KV pool types including LFM2, matching triton_backend.py. The previous check only covered hybrid_gdn_config / kimi_linear_config.
- causal_conv1d.py: Fall back to Triton kernels when the sgl_kernel causal_conv1d ops are unavailable on ROCm. (by @bingxche)
- test_tool_choice.py: Remove the TestToolChoiceLfm2 AMD skip since causal_conv1d now falls back to Triton. (by @bingxche)
- pr-test-amd.yml, pr-test-amd-rocm720.yml: Add a new AMD stage-b job. (by @yctseng0211)
- run_suite.py, slash_command_handler.py: CI plumbing for the new stage-b job. (by @yctseng0211)
Test plan
- TestToolChoiceLfm2Moe no longer crashes with ValueError: layer_id=0