Fridge003 (Collaborator) commented Apr 3, 2025

Motivation

Follow-up of #5005, this PR adds a --mla-backend argument to better distinguish the normal attention backend from the MLA backend. By default, --attention-backend is set to flashinfer and --mla-backend is set to triton. A launch sketch is shown below.
The existing arguments --enable-flashinfer-mla and --enable-flashmla remain usable.
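
A minimal launch sketch, assuming the standard sglang.launch_server entry point; the model path is a placeholder, and the backend values shown are the defaults stated above:

```bash
# Launch the server with both backends set explicitly.
# <your-mla-model> is a placeholder, not part of this PR.
python -m sglang.launch_server \
  --model-path <your-mla-model> \
  --attention-backend flashinfer \
  --mla-backend triton
```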

Modifications

Checklist
