Merged
29 commits
5dfce1f
feat: fuse QNorm KNorm and RoPE into a single cuda kernel
izhuhaoran Sep 23, 2025
3997660
refactor: use const for q k weight in fused_qk_norm_rope
izhuhaoran Oct 16, 2025
e296264
refactor: update the note comment for fusedQKNormRopeKernel
izhuhaoran Oct 16, 2025
3e3e622
refactor: fix lint error
izhuhaoran Oct 16, 2025
c706c60
Merge branch 'main' into fuse-qknorm-rope
izhuhaoran Oct 18, 2025
8dc3e03
feat: add QKNormRoPEFusionPass for torch.compile
izhuhaoran Oct 17, 2025
d2154f8
add test for qknorm rope fuse pass
izhuhaoran Oct 18, 2025
9a88181
update qknorm_rope fusion pass and its unit test
izhuhaoran Oct 18, 2025
cf67619
lint: fix lint error
izhuhaoran Oct 19, 2025
33c23ed
fix: fix QKNormRoPEFusionPass pattern match
izhuhaoran Oct 22, 2025
f266b28
Merge remote-tracking branch 'origin/main' into fuse-qknorm-rope-compile
izhuhaoran Oct 23, 2025
6d23412
fix: attempt to support rope native forward match
izhuhaoran Oct 23, 2025
f847f1e
refactor: clean code
izhuhaoran Oct 23, 2025
960c240
lint: fix lint error
izhuhaoran Oct 23, 2025
d46dfa7
fix: adjust TODO comment in rms_norm
izhuhaoran Oct 23, 2025
504f0a3
fix: add dtype check for QKNormRoPEFusionPass and use warning_once
izhuhaoran Oct 23, 2025
738ae83
fix: update pytest for qk_norm_rope_fusion
izhuhaoran Oct 23, 2025
17e7c2f
fix: fix non-neox style precision issue
izhuhaoran Oct 23, 2025
b77f220
fix: add is_neox false in pytest for qknorm rope fusion
izhuhaoran Oct 23, 2025
1a6a20b
ci: add test_qk_norm_rope_fusion in ci test-pipeline
izhuhaoran Oct 23, 2025
a10327f
feat: support float16 for qkv and support float32 for cos_sin_cache
izhuhaoran Oct 24, 2025
e033c00
Merge branch 'main' into fuse-qknorm-rope-compile
izhuhaoran Oct 30, 2025
49fcfbc
refactor: clean qk_norm_rope_fusion code
izhuhaoran Nov 4, 2025
dd0d307
test: add unit test for fused_qk_norm_rope kernel
izhuhaoran Nov 4, 2025
034e99a
chore: add linked issue for todo of rotary_embedding
izhuhaoran Nov 4, 2025
89edb10
fix: use vllm dispatch macro and support rocm hip
izhuhaoran Nov 10, 2025
b23467c
fix: fix compile error for sm75 and remove amd support
izhuhaoran Nov 10, 2025
ed09102
fix: use define to avoid amd compile
izhuhaoran Nov 10, 2025
6b6cf2f
Merge remote-tracking branch 'origin/main' into fuse-qknorm-rope-compile
izhuhaoran Nov 11, 2025
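The first commit fuses three per-head steps (RMSNorm on the query heads, RMSNorm on the key heads, and rotary embedding) into one CUDA kernel. To make concrete what such a kernel has to reproduce, here is an unfused, pure-Python reference for a single token. It is a sketch, not the PR's kernel or a vLLM API: the function names are hypothetical, and it assumes one RMSNorm weight vector of length head_dim shared by all heads, rotary_dim equal to head_dim, and cos/sin computed from base 10000 rather than read from a precomputed cos_sin_cache. The is_neox flag switches between NeoX-style pairing (i with i + head_dim/2) and GPT-J-style pairing (2i with 2i + 1), the two layouts the later is_neox commits exercise.

```python
import math
from typing import List, Tuple

Vector = List[float]


def rms_norm(x: Vector, weight: Vector, eps: float) -> Vector:
    # RMSNorm over a single head: x / sqrt(mean(x^2) + eps), scaled elementwise by weight.
    inv_rms = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def rope_cos_sin(position: int, head_dim: int, base: float = 10000.0) -> Tuple[Vector, Vector]:
    # One (cos, sin) value per rotated pair; assumes rotary_dim == head_dim (hypothetical helper).
    half = head_dim // 2
    angles = [position * base ** (-2.0 * i / head_dim) for i in range(half)]
    return [math.cos(a) for a in angles], [math.sin(a) for a in angles]


def apply_rope(x: Vector, cos: Vector, sin: Vector, is_neox: bool) -> Vector:
    half = len(x) // 2
    out = list(x)
    for i in range(half):
        # NeoX style rotates element i with i + half; GPT-J style rotates 2i with 2i + 1.
        a, b = (i, i + half) if is_neox else (2 * i, 2 * i + 1)
        out[a] = x[a] * cos[i] - x[b] * sin[i]
        out[b] = x[b] * cos[i] + x[a] * sin[i]
    return out


def qk_norm_rope_reference(
    q_heads: List[Vector],
    k_heads: List[Vector],
    q_weight: Vector,
    k_weight: Vector,
    position: int,
    eps: float = 1e-6,
    is_neox: bool = True,
) -> Tuple[List[Vector], List[Vector]]:
    # Unfused reference: per-head RMSNorm on q and k, then RoPE at the token's position.
    # A fused kernel would produce the same values in one pass instead of three separate ops.
    cos, sin = rope_cos_sin(position, len(q_weight))
    q_out = [apply_rope(rms_norm(h, q_weight, eps), cos, sin, is_neox) for h in q_heads]
    k_out = [apply_rope(rms_norm(h, k_weight, eps), cos, sin, is_neox) for h in k_heads]
    return q_out, k_out


# Usage: one token, one query head and one key head, head_dim = 4.
q, k = qk_norm_rope_reference(
    [[3.0, 4.0, 0.0, 0.0]], [[1.0, 1.0, 1.0, 1.0]],
    q_weight=[1.0] * 4, k_weight=[2.0] * 4, position=0, eps=0.0,
)
print(q, k)
```

At position 0 every angle is zero, so the rotation is the identity: q comes out as the normalized head (about [1.2, 1.6, 0.0, 0.0]) and k as [2.0, 2.0, 2.0, 2.0]. Because RoPE is a rotation of each pair, it preserves each head's sum of squares, which gives a cheap sanity check when comparing a fused kernel's output against a reference like this for both is_neox settings.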
1 change: 1 addition & 0 deletions .buildkite/test-pipeline.yaml
@@ -450,6 +450,7 @@ steps:
- pytest -v -s compile/test_decorator.py
- pytest -v -s compile/test_noop_elimination.py
- pytest -v -s compile/test_aot_compile.py
+ - pytest -v -s compile/test_qk_norm_rope_fusion.py

- label: PyTorch Fullgraph Smoke Test # 15min
timeout_in_minutes: 30
1 change: 1 addition & 0 deletions CMakeLists.txt
@@ -265,6 +265,7 @@ set(VLLM_EXT_SRC
"csrc/pos_encoding_kernels.cu"
"csrc/activation_kernels.cu"
"csrc/layernorm_kernels.cu"
+ "csrc/fused_qknorm_rope_kernel.cu"
"csrc/layernorm_quant_kernels.cu"
"csrc/sampler.cu"
"csrc/cuda_view.cu"