
Revert "[Bugfix][MLA] Change default SM100 MLA prefill backend back to TRT-LLM" (#38562)#38598

Draft
vllm-agent wants to merge 1 commit into vllm-project:main from vllm-agent:auto-revert/pr-38562

Conversation

@vllm-agent

Revert of #38562

This reverts commit 2c734ed.

Original PR: #38562
Failure count: 1 new failure in build #58877

Failed tests

  • GPQA Eval (GPT-OSS) (B200) — Evaluation timed out (30 min) on test_gpqa_correctness[gpt-oss-20b-sm100-fi-mxfp4-mxfp8-trtllm]. Switching the SM100 MLA prefill backend to TRT-LLM appears to have caused a significant performance regression on B200, leading to timeout.

Auto-generated by CI failure analyzer

@mergify mergify bot added the bug Something isn't working label Mar 31, 2026
Contributor

@gemini-code-assist (bot) left a comment


Code Review

This pull request updates the AttentionConfig in vllm/config/attention.py, changing the default value of use_trtllm_ragged_deepseek_prefill from True to False. There are no review comments, so I have no feedback to provide.
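The change the review describes is a single default flip. A minimal sketch of what the reverted field looks like, assuming `AttentionConfig` is a dataclass and showing only the one field named in the PR (the real class in vllm/config/attention.py has many more fields):

```python
from dataclasses import dataclass


@dataclass
class AttentionConfig:
    # Hypothetical sketch of the field touched by this revert.
    # Reverted default: the TRT-LLM ragged DeepSeek prefill backend is
    # opt-in again, since making it the SM100 MLA prefill default caused
    # a GPQA eval timeout on B200 (build #58877).
    use_trtllm_ragged_deepseek_prefill: bool = False


cfg = AttentionConfig()
print(cfg.use_trtllm_ragged_deepseek_prefill)  # False after the revert
```

Users who still want the TRT-LLM prefill path on SM100 can opt back in by setting the flag explicitly rather than relying on the default.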

