
[refactor] Move trtllm_fp8_kv_kernel to triton_ops directory#15044

Merged
Fridge003 merged 2 commits into sgl-project:main from harvenstar:move-trtllm-fp8-kv-to-triton-ops on Dec 14, 2025
Conversation

@harvenstar (Collaborator) commented Dec 13, 2025

PR Description:

Move trtllm_fp8_kv_kernel.py to python/sglang/srt/layers/attention/triton_ops/ for
better code organization, as suggested in #14553.

@b8zhong @Fridge003

Updated all import references accordingly.
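The move itself is mechanical; as a rough sketch (the caller file and the imported symbol below are hypothetical stand-ins, not actual sglang call sites), updating the import references amounts to a path rewrite like this:

```shell
# Sketch only: how import call sites get rewritten after moving a module.
# The caller file and imported symbol are illustrative stand-ins.
set -eu
tmp=$(mktemp -d)
cat > "$tmp/caller.py" <<'EOF'
from sglang.srt.layers.attention.trtllm_fp8_kv_kernel import some_kernel
EOF
# Rewrite the old module path to the new triton_ops location in all callers.
sed -i.bak \
  's/attention\.trtllm_fp8_kv_kernel/attention.triton_ops.trtllm_fp8_kv_kernel/g' \
  "$tmp"/*.py
rewritten=$(grep -c 'triton_ops\.trtllm_fp8_kv_kernel' "$tmp/caller.py")
echo "rewritten imports: $rewritten"
rm -rf "$tmp"
```

In a real repo you would run the same `sed` over `git grep -l 'trtllm_fp8_kv_kernel'` output rather than a temp directory.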

Testing

Tested with the serving benchmark; no regression observed.

Command:
CUDA_LAUNCH_BLOCKING=1 python3 -m sglang.launch_server \
  --model-path Qwen/Qwen3-235B-A22B-Instruct-2507-FP8 \
  --trust-remote-code \
  --tp 4 \
  --attention-backend trtllm_mha \
  --kv-cache-dtype fp8_e4m3

python3 -m sglang.bench_serving \
  --backend sglang-oai \
  --dataset-name random \
  --random-input-len 1024 \
  --random-output-len 1024 \
  --random-range-ratio 0.98 \
  --num-prompts 80 \
  --max-concurrency 16

Results:
============ Serving Benchmark Result ============
Backend:                                 sglang-oai
Traffic request rate:                    inf
Max request concurrency:                 16
Successful requests:                     80
Benchmark duration (s):                  79.99
Total input tokens:                      81050
Total input text tokens:                 81050
Total input vision tokens:               0
Total generated tokens:                  81085
Total generated tokens (retokenized):    10185
Request throughput (req/s):              1.00
Input token throughput (tok/s):          1013.21
Output token throughput (tok/s):         1013.65
Peak output token throughput (tok/s):    1200.00
Peak concurrent requests:                23
Total token throughput (tok/s):          2026.85
Concurrency:                             15.92
----------------End-to-End Latency----------------
Mean E2E Latency (ms):                   15922.70
Median E2E Latency (ms):                 16067.41
---------------Time to First Token----------------
Mean TTFT (ms):                          414.21
Median TTFT (ms):                        282.49
P99 TTFT (ms):                           839.00
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          15.31
Median TPOT (ms):                        15.47
P99 TPOT (ms):                           16.62
---------------Inter-Token Latency----------------
Mean ITL (ms):                           15.32
Median ITL (ms):                         13.45
P95 ITL (ms):                            13.82
P99 ITL (ms):                            28.76
Max ITL (ms):                            454.27
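As a quick consistency check, the headline throughput figures above follow from the raw counts divided by the benchmark duration (small differences arise because the duration is displayed rounded to two decimals):

```python
# Recompute the reported throughputs from the raw benchmark counts above.
duration_s = 79.99      # benchmark duration (displayed, rounded)
requests = 80           # successful requests
input_tokens = 81050    # total input tokens
output_tokens = 81085   # total generated tokens

req_per_s = requests / duration_s           # reported: 1.00 req/s
in_tok_per_s = input_tokens / duration_s    # reported: 1013.21 tok/s
out_tok_per_s = output_tokens / duration_s  # reported: 1013.65 tok/s

print(f"{req_per_s:.2f} {in_tok_per_s:.2f} {out_tok_per_s:.2f}")
```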

@gemini-code-assist (Contributor) commented:

Warning: You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@github-actions github-actions bot added the blackwell SM100/SM120 label Dec 13, 2025
@harvenstar (Collaborator, Author) commented:

/tag-and-rerun-ci

@harvenstar (Collaborator, Author) commented:

A small follow-up PR to #14553 (file location adjustment); please review. @Fridge003, thanks!

@Fridge003 (Collaborator) commented:

/tag-and-rerun-ci

@b8zhong (Collaborator) commented Dec 14, 2025:

[screenshot attached: 2025-12-14 at 1:00:15 PM]
@b8zhong b8zhong enabled auto-merge (squash) December 14, 2025 21:00
@Fridge003 Fridge003 disabled auto-merge December 14, 2025 23:22
@Fridge003 Fridge003 merged commit 99cb2ed into sgl-project:main Dec 14, 2025
70 of 75 checks passed