Closed
Changes from 68 commits
Commits
71 commits
8024a5f
working integration with flash attention with sinks
LiuXiaoxuanPKU Jun 10, 2025
49fb54f
mxfp4 triton kernel integration
zyongye Jun 11, 2025
4d45846
code clean up
simon-mo Jul 4, 2025
60f6b5a
ran use_existing_torch.py
simon-mo Jul 8, 2025
6cb5e41
new weight format works for tp1 and non-swizzling
simon-mo Jul 10, 2025
53165c9
unpadded tp2 works
zyongye Jul 10, 2025
7b380a1
new format swizzling
zyongye Jul 14, 2025
5b012bf
add non-uniform tp sharding for moe layer
zyongye Jul 14, 2025
e6ca55e
formatting
zyongye Jul 14, 2025
ab2f28e
comply with upstream triton_kernel refactor
zyongye Jul 16, 2025
4ad40a7
working on blackwell
zyongye Jul 18, 2025
2a05b31
adding constraints to use persistent kernel
zyongye Jul 18, 2025
88b2e85
add epilogue_subtile constraint
zyongye Jul 21, 2025
3d1da66
update on latest triton change, use non-swizzling temporarily for nan…
zyongye Jul 22, 2025
4e2a184
rms default value
LiuXiaoxuanPKU Jul 22, 2025
9653279
swiglu limit
LiuXiaoxuanPKU Jul 23, 2025
18f2b76
triton kernel fixed, enable swizzling
zyongye Jul 24, 2025
7707b9d
Remove activation padding
zyongye Jul 25, 2025
c85eb70
new model path and make it configurable
simon-mo Jul 25, 2025
1eb375c
fix lint
simon-mo Jul 25, 2025
81ffa34
Support huggingface format (#10)
heheda12345 Jul 28, 2025
c652a92
Basic Harmony Integration (#9)
WoosukKwon Jul 28, 2025
b499376
Enable open weight model on ROCm
hongxiayang Jul 29, 2025
75be1af
batched ep working with pplx kernels
zyongye Jul 29, 2025
0c190f7
change num_experts name in config
zyongye Jul 30, 2025
cd077b5
deep_ep working, need padding hacks
zyongye Jul 30, 2025
1254dab
Refinement on padding for ROCm case (#13)
hongxiayang Jul 31, 2025
b922d33
Supports built-in web search & python (#12)
WoosukKwon Jul 31, 2025
e8d69e4
move swizzle_mxfp4 to mxfp4_util.py
zyongye Jul 30, 2025
edc111e
add swiglu_alpha and limits as parameter
zyongye Jul 31, 2025
720398f
bugfix
zyongye Aug 1, 2025
6ad314c
update model name
zyongye Aug 1, 2025
b8aff90
fix more config
zyongye Aug 1, 2025
ceb0967
use tool override config
simon-mo Aug 1, 2025
ab42dfc
add vllm logo
zyongye Aug 2, 2025
894bed8
update logo
zyongye Aug 2, 2025
40e947d
move logo to engine core init
zyongye Aug 3, 2025
d86896c
fix the msg related model mismatch (#18)
hongxiayang Aug 3, 2025
0215871
add warning
heheda12345 Aug 4, 2025
c30562f
move reasoning text to reasoning_content field
simon-mo Aug 4, 2025
69e9413
Use empty url for now (#19)
WoosukKwon Aug 4, 2025
1f7dabf
Fix import & type annotation (#22)
WoosukKwon Aug 4, 2025
42cdb41
Support external tool call in response API + harmony (#16)
heheda12345 Aug 5, 2025
7935701
rebase
WoosukKwon Aug 5, 2025
ff7fc8f
minor
WoosukKwon Aug 5, 2025
f991ad2
minor
WoosukKwon Aug 5, 2025
43f09a6
add back
WoosukKwon Aug 5, 2025
31542ef
fix
WoosukKwon Aug 5, 2025
72000e3
fix
WoosukKwon Aug 5, 2025
a702b83
requirements
WoosukKwon Aug 5, 2025
f9ac670
fix
WoosukKwon Aug 5, 2025
14468df
Fix use_harmony
WoosukKwon Aug 5, 2025
291b9b1
Support chat api (#20)
WoosukKwon Aug 5, 2025
808f4db
simple lints
simon-mo Aug 5, 2025
12df518
skip lint queue
simon-mo Aug 5, 2025
076cfce
fix pre-commit lint for rc1 (#27)
simon-mo Aug 5, 2025
b775a39
Support Responses Streaming (#21)
simon-mo Aug 5, 2025
30569a7
Increase the CUDA graph capture sizes (#28)
WoosukKwon Aug 5, 2025
78e69f6
Add TRT-LLM Attention Sink and MXFP4 MoE (#17)
minseokl Aug 5, 2025
c3baa17
fix usage
simon-mo Aug 5, 2025
e0dfe6f
Fix truncated output (#29)
WoosukKwon Aug 5, 2025
258cf6f
ux log (#30)
zyongye Aug 5, 2025
a99f3c1
Fix truncated output for Responses API (#32)
WoosukKwon Aug 5, 2025
d9c54da
Remove basic_oai.py (#33)
WoosukKwon Aug 5, 2025
594861e
MCP Tool Server (#26)
heheda12345 Aug 5, 2025
e0bf571
Move `responses_api.py` to examples (#31)
heheda12345 Aug 5, 2025
a60c273
full export load can only used for mxfp4 (#34)
zyongye Aug 5, 2025
8260948
update FA3 tag with sink
zyongye Aug 5, 2025
8faff36
fix dependency and address mxfp4 nits
zyongye Aug 6, 2025
20388ad
update registry order
zyongye Aug 6, 2025
6a70830
one more registry
zyongye Aug 6, 2025
4 changes: 3 additions & 1 deletion .github/workflows/pre-commit.yml
@@ -14,7 +14,7 @@ permissions:
 
 jobs:
   pre-commit:
-    runs-on: ubuntu-latest
+    runs-on: self-hosted
     steps:
     - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
     - uses: actions/setup-python@42375524e23c412d93fb67b49958b491fce71c38 # v5.4.0
@@ -26,3 +26,5 @@ jobs:
     - uses: pre-commit/action@2c7b3805fd2a0fd8c1884dcaebf91fc102a13ecd # v3.0.1
       with:
         extra_args: --all-files --hook-stage manual
+      env:
+        SKIP: shellcheck
2 changes: 1 addition & 1 deletion cmake/external_projects/vllm_flash_attn.cmake
@@ -38,7 +38,7 @@ else()
   FetchContent_Declare(
           vllm-flash-attn
           GIT_REPOSITORY https://github.com/vllm-project/flash-attention.git
-          GIT_TAG 1c2624e53c078854e0637ee566c72fe2107e75f4
+          GIT_TAG b99f8c821771fd11feb66d5c89661e9858fde359
           GIT_PROGRESS TRUE
           # Don't share the vllm-flash-attn build between build types
           BINARY_DIR ${CMAKE_BINARY_DIR}/vllm-flash-attn