[Kernel] Add JIT activation#18401

Closed
weimin023 wants to merge 7 commits into sgl-project:main from weimin023:jit-activation

@weimin023 weimin023 commented Feb 7, 2026

Motivation

Add JIT-compiled CUDA kernels for the activation functions.

Modifications

  • Migrate the activation kernels from ahead-of-time (AOT) compilation in sgl_kernel to the JIT compilation framework under python/sglang/jit_kernel/
  • Port the CUDA source file (activation.cu) into python/sglang/jit_kernel/csrc/elementwise/activation.cuh with minimal modifications
  • Add a Python wrapper (python/sglang/jit_kernel/activation.py)
  • Add correctness tests (python/sglang/jit_kernel/tests/test_activation.py) that check the JIT kernels' outputs against Torch reference results
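
For readers unfamiliar with this kind of test, here is a minimal sketch of the reference-comparison idea. It assumes the kernels are gated activations of the `silu_and_mul` kind (an assumption; the PR text does not list the functions), and it uses plain Python lists rather than Torch tensors so it runs without a GPU. The names `silu_and_mul_ref` and `assert_close` are hypothetical helpers for illustration, not the functions in test_activation.py.

```python
import math


def silu(x: float) -> float:
    """SiLU (swish): x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def silu_and_mul_ref(row: list[float]) -> list[float]:
    """Reference for a gated activation: silu(first half) * second half.

    Assumption: the last dimension is split into a gate half and an up half.
    """
    if len(row) % 2 != 0:
        raise ValueError("last dimension must be even")
    d = len(row) // 2
    return [silu(gate) * up for gate, up in zip(row[:d], row[d:])]


def assert_close(actual, expected, rtol=1e-3, atol=1e-3):
    """Elementwise check: |actual - expected| <= atol + rtol * |expected|."""
    assert len(actual) == len(expected)
    for a, e in zip(actual, expected):
        assert abs(a - e) <= atol + rtol * abs(e), (a, e)


# In a real test, `kernel_out` would come from the JIT kernel under test;
# here the reference output stands in for it.
row = [0.0, 1.0, 2.0, 3.0]
kernel_out = silu_and_mul_ref(row)
assert_close(kernel_out, silu_and_mul_ref(row))
```

A real test would typically parametrize over dtypes and shapes and loosen the tolerances for half-precision inputs.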

Accuracy Tests

pytest /sgl-workspace/sglang/python/sglang/jit_kernel/tests/test_activation.py

platform linux -- Python 3.12.3, pytest-9.0.2, pluggy-1.6.0
rootdir: /sgl-workspace/sglang/python
configfile: pyproject.toml
plugins: anyio-4.12.1, typeguard-4.4.4
collected 48 items

python/sglang/jit_kernel/tests/test_activation.py ................................................ [100%]

============================== 48 passed in 29.37s ==============================

Benchmarking and Profiling

Test the accuracy:
python3 -m sglang.test.few_shot_gsm8k --num-questions 200

Accuracy: 0.820
Invalid: 0.000
Latency: 10.294 s
Output throughput: 2829.049 token/s

Benchmark the speed:

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@weimin023
Contributor Author

Hi @BBuf, could you please take a look and let me know which benchmark I should run?
Thanks!

"csrc/allreduce/custom_all_reduce.hip",
"csrc/allreduce/deterministic_all_reduce.hip",
"csrc/allreduce/quick_all_reduce.cu",
"csrc/common_extension_rocm.cc",
Collaborator

The JIT kernel framework does not support ROCm yet. Maybe just keep the original HIP code for now?

Contributor Author

Hi @DarkSharpness, I've kept the HIP code and added a _IS_ROCM guard to allow non-NVIDIA GPUs to use the original AOT kernel. Please let me know if this meets the requirements.
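
A guard of this kind might look roughly like the sketch below. This is an illustration under stated assumptions, not the code in the PR: `select_activation_backend` is a hypothetical helper, and the only real API used is `torch.version.hip`, which is a version string on ROCm builds of PyTorch and `None` otherwise.

```python
try:
    import torch

    # torch.version.hip is set on ROCm builds and None on CUDA/CPU builds.
    _IS_ROCM = torch.version.hip is not None
except ImportError:
    # Fallback so the sketch also runs where PyTorch is not installed.
    _IS_ROCM = False


def select_activation_backend(is_rocm: bool = _IS_ROCM) -> str:
    """Hypothetical helper: choose which kernel path to call.

    ROCm keeps the original ahead-of-time (AOT) kernel; every other
    platform uses the JIT-compiled kernel.
    """
    return "aot" if is_rocm else "jit"


print(select_activation_backend())
```

Evaluating the flag once at import time keeps the per-call dispatch down to a single branch.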

@DarkSharpness
Collaborator

@weimin023 Thanks for the contribution! I adapted this PR in #21766 (which should be a superset of this PR) and added you as a co-author. Feel free to reopen the PR if something is still missing.
