
Add aiter attention support in prefill-attention-backend of gpt-oss #18282

Merged
HaiShaw merged 3 commits into sgl-project:main from HaiShaw:gpt-oss-with-aiter-prefill-swa
Mar 2, 2026

Conversation

@kkHuang-amd
Collaborator

@kkHuang-amd kkHuang-amd commented Feb 5, 2026

Motivation

Improve attention performance on the ROCm platform by allowing gpt-oss to use the aiter backend for prefill.

Modifications

aiter_backend.py: support sliding-window attention with attention sinks
server_args.py: add "aiter" to the list of supported prefill attention backends for gpt-oss
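
For readers unfamiliar with the term, here is a minimal pure-Python reference of what "sliding window attention with sink" computes for one head. It is a sketch, not the aiter kernel or the code in aiter_backend.py. It assumes the gpt-oss style sink: an extra per-head logit that joins the softmax denominator but contributes no value. It also assumes a window that counts the current token. The function name and argument names are hypothetical.

```python
import math


def swa_sink_attention(q, k, v, window, sink_logit):
    """Single-head causal sliding-window attention with a sink logit.

    q, k, v: lists of T vectors (lists of floats); q and k share a head dim.
    window: number of most recent keys, including the current one, per query.
    sink_logit: extra logit added to the softmax denominator only (assumption).
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i, qi in enumerate(q):
        lo = max(0, i - window + 1)  # first key still inside the window
        keys = range(lo, i + 1)      # causal: never look past position i
        scores = [scale * sum(a * b for a, b in zip(qi, k[j])) for j in keys]
        m = max(scores + [sink_logit])  # subtract the max for stability
        weights = [math.exp(s - m) for s in scores]
        # The sink absorbs probability mass but has no value vector.
        denom = sum(weights) + math.exp(sink_logit - m)
        row = [0.0] * len(v[0])
        for w, j in zip(weights, keys):
            for d, vd in enumerate(v[j]):
                row[d] += (w / denom) * vd
        out.append(row)
    return out
```

Setting sink_logit to float("-inf") recovers plain sliding-window attention, which makes a reference like this convenient for checking a fused kernel against both variants.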

Accuracy Tests

Server command

SGLANG_USE_AITER=1 python3 -m sglang.launch_server --model-path /dockerx/data/models/openai/gpt-oss-120b/ --tp 8 --trust-remote-code --chunked-prefill-size 130172 --max-running-requests 128 --mem-fraction-static 0.85 --prefill-attention-backend aiter --decode-attention-backend triton --port 8000

Client command

/sglang# python3 benchmark/gsm8k/bench_sglang.py --num-questions 2000 --parallel 2000 --port 8000
100%|██████████| 1319/1319 [00:50<00:00, 26.00it/s]
Accuracy: 0.829
Invalid: 0.014
Latency: 86.113 s
Output throughput: 5072.316 token/s
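
As a rough sanity check on the figures reported above, the sketch below backs out the approximate number of generated tokens. It assumes that the benchmark's output throughput is total output tokens divided by end-to-end latency, which the PR does not state. The progress bar shows 1319 items, which matches the size of the GSM8K test split, so --num-questions 2000 appears to be capped at that number.

```python
def total_output_tokens(throughput_tok_per_s, latency_s):
    # Assumption: throughput = total output tokens / end-to-end latency.
    return throughput_tok_per_s * latency_s


latency_s = 86.113     # "Latency" from the client output
throughput = 5072.316  # "Output throughput" from the client output
questions = 1319       # items shown in the progress bar

total = total_output_tokens(throughput, latency_s)
per_question = total / questions
print(f"~{total:,.0f} output tokens, ~{per_question:.0f} per question")
```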

Benchmarking and Profiling

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.


@kkHuang-amd kkHuang-amd requested a review from HaiShaw as a code owner March 2, 2026 00:55
@HaiShaw HaiShaw merged commit 15af26d into sgl-project:main Mar 2, 2026
157 of 172 checks passed
@kkHuang-amd kkHuang-amd deleted the gpt-oss-with-aiter-prefill-swa branch March 2, 2026 08:26
Kangyan-Zhou pushed a commit to Kangyan-Zhou/sglang that referenced this pull request Mar 4, 2026
magicYang1573 pushed a commit to magicYang1573/sglang that referenced this pull request Mar 9, 2026
Wangzheee pushed a commit to Wangzheee/sglang that referenced this pull request Mar 21, 2026
