[JIT Kernel] Migrate store_kv_cache to JIT kernel#19298
Closed
Johnsonms wants to merge 2 commits into sgl-project:main
Conversation
Adds JIT-compiled `store_kv_cache` as the primary implementation in `sgl_kernel.memory`, with fallback to the AOT `sgl_kernel` op.
- `csrc/memory/store.cuh`: CUDA kernels + TVM FFI wrapper adapted from `sgl-kernel/csrc/memory/store.cu`; dispatches on int32/int64 index dtype and on 256/128-byte-aligned head dim
- `jit_kernel/store.py`: Python JIT loader exposing `store_kv_cache`
- `tests/test_store_kv_cache.py`: correctness tests across dtypes, index dtypes, batch sizes, and head dims (120 cases)
- `sgl_kernel/memory.py`: try JIT first, fall back to the sgl_kernel AOT op
- `benchmark/bench_store_kv_cache.py`: latency benchmark comparing JIT vs AOT `store_kv_cache` across item sizes and batch sizes
- `csrc/memory/store.cuh`: apply clang-format to kernel launch call sites
Contributor
Warning: You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!
Collaborator
Thanks for the PR. Unfortunately, we already have this in https://github.com/sgl-project/sglang/blob/main/python/sglang/jit_kernel/csrc/elementwise/kvcache.cuh
Collaborator
Closed due to duplicate of #16273. Feel free to reopen if this PR is mistakenly closed.
Motivation
#17865
`store_kv_cache` is currently implemented as an AOT (ahead-of-time) compiled kernel in `sgl_kernel`. This PR migrates it to the JIT kernel system, consistent with the ongoing effort to slim down `sgl_kernel` and move kernels to JIT compilation. The JIT approach compiles for the exact target architecture at runtime, reducing package size and improving maintainability.
Modifications
Adds JIT-compiled `store_kv_cache` as the primary implementation in `sgl_kernel.memory`, with fallback to the AOT `sgl_kernel` op.
- `csrc/memory/store.cuh`: CUDA kernels + TVM FFI wrapper adapted from `sgl-kernel/csrc/memory/store.cu`; dispatches on int32/int64 index dtype and on 256/128-byte-aligned head dim
- `jit_kernel/store.py`: Python JIT loader exposing `store_kv_cache`
- `tests/test_store_kv_cache.py`: correctness tests across dtypes, index dtypes, batch sizes, and head dims (120 cases)
- `benchmark/bench_store_kv_cache.py`: latency benchmark comparing JIT vs AOT across item sizes and batch sizes
- `sgl_kernel/memory.py`: try JIT first, fall back to AOT `torch.ops.sgl_kernel.store_kv_cache`

Accuracy Tests
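The operation under test amounts to a row scatter: each new K/V row is written into the cache buffer at the slot index given by `loc`, which is what the CUDA kernel implements with vectorized copies. A pure-PyTorch reference of those semantics (a sketch for illustration, not the PR's test code; the function name here is hypothetical) looks like:

```python
import torch


def store_kv_cache_ref(k_buffer: torch.Tensor, v_buffer: torch.Tensor,
                       k: torch.Tensor, v: torch.Tensor,
                       loc: torch.Tensor) -> None:
    # Scatter the new K/V rows into the cache slots named by `loc`.
    # The JIT kernel performs the same copy, choosing 256- or 128-byte
    # vectorized loads/stores based on head-dim alignment.
    k_buffer[loc] = k
    v_buffer[loc] = v


# Tiny example: an 8-slot cache with head dim 4, storing 3 new rows.
k_buf = torch.zeros(8, 4)
v_buf = torch.zeros(8, 4)
k_new = torch.ones(3, 4)
v_new = torch.full((3, 4), 2.0)
loc = torch.tensor([1, 5, 7], dtype=torch.int64)
store_kv_cache_ref(k_buf, v_buf, k_new, v_new, loc)
```

The correctness tests can then compare the JIT and AOT outputs against this kind of reference across the dtype/shape grid.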
Verified in `python/sglang/jit_kernel/tests/test_store_kv_cache.py`, 120 cases, all passing:

```bash
python -m pytest python/sglang/jit_kernel/tests/test_store_kv_cache.py -v
```

... 120 passed
Covers: float16 / bfloat16 / float32 × int32 / int64 indices × batch sizes [1, 4, 16, 64, 128] ×
head dims [64, 128, 256, 512].
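The 120-case count follows directly from the Cartesian product of those parameters (3 dtypes × 2 index dtypes × 5 batch sizes × 4 head dims). A quick sketch of such a grid, with the same values as above:

```python
import itertools

# Parameter grid matching the coverage described above:
# 3 dtypes x 2 index dtypes x 5 batch sizes x 4 head dims = 120 cases.
DTYPES = ["float16", "bfloat16", "float32"]
INDEX_DTYPES = ["int32", "int64"]
BATCH_SIZES = [1, 4, 16, 64, 128]
HEAD_DIMS = [64, 128, 256, 512]

CASES = list(itertools.product(DTYPES, INDEX_DTYPES, BATCH_SIZES, HEAD_DIMS))
print(len(CASES))  # 120
```

In a pytest file this grid would typically be expressed with `@pytest.mark.parametrize` over the same four lists.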
Benchmarking and Profiling
Benchmarked on H200, bfloat16, 8 layers, comparing JIT vs AOT (`sgl_kernel.set_kv_buffer_kernel`):

```bash
python python/sglang/jit_kernel/benchmark/bench_store_kv_cache.py
```
JIT and AOT show comparable latency (direct port of the same algorithm), confirming no regression.
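For reference, a minimal latency-measurement harness of the kind such a benchmark uses might look like the sketch below (the function name and parameters are illustrative; the real benchmark on GPU would time with CUDA events and synchronize rather than wall-clock timing):

```python
import time


def bench(fn, warmup: int = 10, iters: int = 100) -> float:
    """Return the mean latency of `fn` in microseconds.

    CPU wall-clock sketch: a GPU benchmark would instead record
    torch.cuda.Event pairs and call torch.cuda.synchronize().
    """
    for _ in range(warmup):   # warm caches / trigger JIT compilation
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters * 1e6


# Example with a trivial stand-in workload.
latency_us = bench(lambda: sum(range(1000)))
```

Running both the JIT and AOT kernels through the same harness across item sizes and batch sizes is what supports the "comparable latency" conclusion above.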
Checklist
Review Process
`/tag-run-ci-label`, `/rerun-failed-ci`, `/tag-and-rerun-ci`