Merged
89 commits
845ca32
[Bugfix] Fix test_whisper distributed test process handling (#42038)
dzhengAP May 9, 2026
e934e45
[CI][Bugfix] Make test_gpt2_cache_hit observable across V1 EngineCore…
haosdent May 9, 2026
97cc768
Add @Harry-Chen in CODEOWNERS (#42130)
Harry-Chen May 9, 2026
df2636a
[Bugfix] Fix LOGITPROC_SOURCE_ENTRYPOINT test to use spawn-compatible…
dzhengAP May 9, 2026
e8f9038
[ROCm][Bugfix] Re-tag AITER MoE weights as preshuffled after replace_…
maeehart May 9, 2026
a2812be
[Models] Cohere Eagle + fix to Cohere MoE (#42078)
Terrencezzj May 9, 2026
f6490a2
[Bugfix] Preserve leading/trailing whitespace in GLM non-streaming to…
rishaps May 9, 2026
d6563d6
Require C++20 for compatibility with PyTorch (#40380)
r-barnes May 9, 2026
ecd0b60
[LoRA] Initial EP support for LoRA (#40867)
jeejeelee May 9, 2026
34ab4f2
[ROCm] Upgrade aiter to v0.1.13-rc5 (#42113)
micah-wil May 9, 2026
530d371
[DSv4] Improved fused Indexer Q quant kernel (#41428)
gau-nernst May 9, 2026
adb6d96
[Bugfix] Fix GDN KKT precision loss on Hopper GPUs by aligning tl.dot…
Kermit-C May 9, 2026
3dda9ae
[Bugfix] Remove nested torch.compile in GDN rearrange_mixed_qkv causi…
tdoublep May 9, 2026
171d59a
[Bugfix][PD] Fix DSv4 Disaggregated (#41957)
NickLucche May 9, 2026
25abddc
[BugFix] Fix Gemma4 'layers.0.moe.experts.0.down_proj_packed' KeyErro…
SoluMilken May 9, 2026
cd74911
[Model] use AutoWeightsLoader for DeepSeekV2 (#41706)
SoluMilken May 9, 2026
2ee8c2a
[SpecDecoding] extend mtp support for mimo 2.5 (#41905)
ZJY0516 May 9, 2026
7a2b596
[Quantization] Add ModelOpt NVFP4 W4A16 (4-bit weights, fp16/bf16 act…
juhi10071998 May 9, 2026
f80aa53
[Refactor] Nixl util using lazy init (#41392)
yewentao256 May 9, 2026
ea0e501
[KV Connector] Remove compat support for pre-v0.12.0 constructor sign…
yewentao256 May 9, 2026
006af4b
[Bugfix] Skip routed-experts hot path when disabled (#42148)
aoshen02 May 10, 2026
bc5fdc1
Add NVFP4 all-gather GEMM fusion for AsyncTP (#41882)
baonudesifeizhai May 10, 2026
dcb3135
Fix: Nemotron 3 rescue whitespace-only final_content, not just None (…
Naveassaf May 10, 2026
0b272a6
[Bugfix] Fix SP pass for multimodal models and PP+SP residual handlin…
wangxingran222 May 10, 2026
1029e5e
[CI/Build] Use modelscope's international site for regression test (#…
Isotr0py May 10, 2026
0d382ec
Handle optional bool-or-string CLI args in get_kwargs (#40951)
cvan20191 May 10, 2026
00b0618
Use CU_MEMCPY_SRC_ACCESS_ORDER_ANY for batch KV cache swaps (#39306)
Etelis May 10, 2026
27d3bac
docs: clarify Gemma 4 assistant speculative decoding (#42180)
AbhiOnGithub May 10, 2026
986edc8
[Bugfix] Fix DeepSeek v4 topk numerical issue for unaligned max-model…
wzhao18 May 10, 2026
fb1ac80
[ROCm][CI] Stabilize ROCm shutdown and distributed compile CI (#41573)
AndreasKaratzas May 10, 2026
3f5bd48
[Bugfix][KV Transfer][NIXL] Notify P node on pre-admission rejection …
Dao007forever May 10, 2026
f284012
[ROCm][CI] Fix NIXL spec-decode acceptance startup and diagnostics (#…
AndreasKaratzas May 10, 2026
a5d0a5a
[Frontend][Bugfix] Abort ASR engine requests on cancellation (#41266)
abdulrahman-cohere May 10, 2026
efd0e77
Fix mypy failure on main (#42197)
mmangkad May 10, 2026
301305c
Add @zyongye to CODEOWNERS (#42200)
zyongye May 10, 2026
a2c9d54
[Docs] Fix broken local links (#42160)
chfeng-cs May 10, 2026
84f7a55
[CI] Trigger LoRA test when changing MoE code. (#42196)
jeejeelee May 10, 2026
0a309b5
[ROCm] Cap Triton paged attention block size to fix ROCm shared memor…
AndreasKaratzas May 10, 2026
48698b1
[Bugfix] Fuse Qwen3.5 in_qkvz_proj forwarding with LoRA enabled (#37912)
Isotr0py May 10, 2026
a54f0d1
[CPU] Fix spec decode kernel signatures for synthetic mode compatibil…
jmamou May 10, 2026
e175192
[KV Offload] Pass ReqContext to touch(), complete_load(), and complet…
ronensc May 10, 2026
215e2f7
[Bugfix][Mamba] IMA in causal_conv1d kernel for long sequences (#41617)
Flink-ddd May 10, 2026
f396bee
[DSV4] Add PP support for deepseek-v4 (#41694)
Isotr0py May 10, 2026
21943d4
[Performance] Make safetensors checkpoint prefetch settings configura…
mmangkad May 10, 2026
1b57eb4
[MoE] Move various experts classes to fused_moe/experts/ (#41979)
bnellnm May 10, 2026
879a8c3
Fix Molmo2 image token metadata (#42162)
hqhq1025 May 11, 2026
171019a
add fused mhc_post_pre kernel (#41536)
gnovack May 11, 2026
b168752
[Bugfix] Gemma 4 chat template crash with missing tool name and tool …
yzong-rh May 11, 2026
7f95e66
[ROCm][Bugfix]: dynamically align BLOCK_DMODEL with Lv in MLA decode …
vllmellm May 11, 2026
5536fc0
[Misc] Replace mamba_type string literals with MambaAttentionBackendE…
wangxiyuan May 11, 2026
581b5e9
[Frontend] Return rendered prompt text in chat completion response (#…
princepride May 11, 2026
05d610e
[CI/Build] Reduce LoRA model tests. (#42266)
jeejeelee May 11, 2026
5cba683
Document MolmoWeb hf_overrides (#42163)
hqhq1025 May 11, 2026
f9f770c
fix nixl side-channel host selection (#41806)
shaharmor98 May 11, 2026
b1b5972
bugfix(flashinfer,dcp): remove kv_cache_layout for BatchDCPPrefillWra…
pisceskkk May 11, 2026
9efdddc
[Model] Fix missing `maybe_prefix` (#42280)
DarkLight1337 May 11, 2026
770e9bd
[Nixl][PD] Lease renewal TTL KV blocks on P (#41383)
NickLucche May 11, 2026
5672d10
[KV Connector][NIXL][Bugfix] Fix NIXL handshake failures not honoring…
NickLucche May 11, 2026
17ed5e6
[CI] Make Python-only Installation optional (#42293)
haosdent May 11, 2026
27ae676
Fix EXAONE-4.5 to align with Transformers update (#42246)
lkm2835 May 11, 2026
617239b
[Frontend]Responses API supports chat_template_kwargs (#42272)
chaunceyjiang May 11, 2026
ac06214
Avoid silent weights corruption when loading Nemotron Nano VL with re…
noa-neria May 11, 2026
8415bf2
[kv_offload] Set offloading connector to prefer HND layout (#41928)
hickeyma May 11, 2026
a51376b
[Performance][DSR1]: Fused RoPE+KVCache+q_concat for MLA (#40392)
Rohan138 May 11, 2026
724ed2f
[DSv4] Improved dequant gather K cache kernel (#42236)
gau-nernst May 11, 2026
5f1b313
[ROCm] Clean up a bit the AITER FA backend (#41942)
pschlan-amd May 11, 2026
4b64fc2
[Refactor] Cleanup batch invariant dead code (#41993)
yewentao256 May 11, 2026
4955990
[kv_offload] Move `FilterReusedOffloadingManager` logic to `CPUOffloa…
hickeyma May 11, 2026
a2e776d
[Bugfix] Accept canonicalized `modelopt_*` quant_method in `_extract_…
vadiklyutiy May 11, 2026
3f9c0c2
[Bug] Fix kimi dtype issue with `mm_projector_forward` (#42081)
yewentao256 May 11, 2026
0d453e2
[Perf] Batch invariance with Cutlass fp8 support, 28.9% E2E latency i…
yewentao256 May 11, 2026
7863fff
[ROCm][DSv4] implement flash sparse mla with triton kernels (#41812)
whx-sjtu May 11, 2026
9af6a5e
[Model Runner V2] Fix `seq_lens_cpu_upper_bound` (#42202)
njhill May 11, 2026
5497ffb
Add documentation about vLLM FIPS compliance (#42190)
vrdn-23 May 11, 2026
cf0d279
[Docs] Add Apple Silicon documentation for vLLM-Metal GPU support (#4…
alexagriffith May 11, 2026
6fdb493
[Bugfix] Fix int32 overflow in DeepGEMM SiLU/mul FP8 Triton kernel (#…
Flink-ddd May 11, 2026
a721315
[ROCm][Perf] Fix RMSNorm+Quant fusion for gfx950 (non-fnuz) (#41825)
frida-andersson May 11, 2026
639cbfd
[CI] Add tests/parser to CI coverage (#41877)
sfeng33 May 11, 2026
56e5810
[BugFix] Prevent orphaned process on NCCL destroy (#39846)
jeffreywang-anyscale May 11, 2026
a0dc7a0
[CI] Consolidate Speech to Text tests (#42274)
noooop May 11, 2026
5318138
[Bugfix] Fix DSV4 swiglu_limit on marlin backend (#42287)
jeejeelee May 11, 2026
bbee532
[Perf][1/n] Eliminate various GPU<->CPU syncs (#41429)
njhill May 11, 2026
d7af6b3
[Model Runner V2] Bug fix: logprob dtype int64/int32 issue (#41761)
yewentao256 May 11, 2026
39dff5f
Add VLLM_USE_SPINLOOP_EXT to use more efficient busy polling (#36517)
pschlan-amd May 11, 2026
920bf3e
[Bugifx] [Qwen3CoderTool] Restore supports_required_and_named for req…
chaunceyjiang May 12, 2026
630492d
[Fix] Gemma4 Mixed-Resolution Image Co-Batching Crash (#42217)
skyloevil May 12, 2026
0a291a2
Merge upstream/main into deepseekv4-rocm (May 2026)
lcskrishna May 12, 2026
e46c4cf
mhc: add ROCm fallback for the fused mhc_post_pre op
lcskrishna May 12, 2026
75a8e1b
add dsv4 flash mla triton validation
lcskrishna May 12, 2026
63 changes: 47 additions & 16 deletions .buildkite/test-amd.yaml
@@ -460,7 +460,7 @@ steps:
- tests/lora
- vllm/platforms/rocm.py
commands:
- pytest -v -s lora --shard-id=$$BUILDKITE_PARALLEL_JOB --num-shards=$$BUILDKITE_PARALLEL_JOB_COUNT --ignore=lora/test_chatglm3_tp.py --ignore=lora/test_llama_tp.py --ignore=lora/test_llm_with_multi_loras.py --ignore=lora/test_olmoe_tp.py --ignore=lora/test_deepseekv2_tp.py --ignore=lora/test_gptoss_tp.py --ignore=lora/test_qwen3moe_tp.py --ignore=lora/test_qwen35_densemodel_lora.py
- pytest -v -s lora --shard-id=$$BUILDKITE_PARALLEL_JOB --num-shards=$$BUILDKITE_PARALLEL_JOB_COUNT --ignore=lora/test_chatglm3_tp.py --ignore=lora/test_llama_tp.py --ignore=lora/test_qwen3_with_multi_loras.py --ignore=lora/test_olmoe_tp.py --ignore=lora/test_deepseekv2_tp.py --ignore=lora/test_gptoss_tp.py --ignore=lora/test_qwen3moe_tp.py --ignore=lora/test_qwen35_densemodel_lora.py

#------------------------------------------------------ mi250 · model_executor -------------------------------------------------------#

@@ -880,7 +880,7 @@ steps:
- vllm/platforms/rocm.py
commands:
- uv pip install --system -r /vllm-workspace/requirements/kv_connectors_rocm.txt
- ROCM_ATTN=1 bash v1/kv_connector/nixl_integration/spec_decode_acceptance_test.sh
- ATTENTION_BACKEND=ROCM_ATTN bash v1/kv_connector/nixl_integration/spec_decode_acceptance_test.sh

- label: V1 e2e (2 GPUs) # TBD
timeout_in_minutes: 180
@@ -929,6 +929,7 @@ steps:
- tests/tokenizers_
- tests/reasoning
- tests/tool_parsers
- tests/parser
- tests/transformers_utils
- tests/config
commands:
@@ -942,6 +943,7 @@ steps:
- pytest -v -s tokenizers_
- pytest -v -s reasoning --ignore=reasoning/test_seedoss_reasoning_parser.py --ignore=reasoning/test_glm4_moe_reasoning_parser.py
- pytest -v -s tool_parsers
- pytest -v -s parser
- pytest -v -s transformers_utils
- pytest -v -s config

@@ -1100,13 +1102,13 @@ steps:
- vllm/compilation/
- vllm/model_executor/layers
- tests/compile/passes/distributed/
- tests/compile/fusions_e2e/
- vllm/_aiter_ops.py
- vllm/platforms/rocm.py
commands:
- export VLLM_TEST_CLEAN_GPU_MEMORY=1
- VLLM_TEST_CLEAN_GPU_MEMORY=1 pytest -v -s tests/compile/passes/distributed/test_async_tp.py
- pytest -v -s tests/compile/passes/distributed/test_sequence_parallelism.py
- pytest -v -s tests/compile/passes/distributed/test_tp2_ar_rms.py::test_tp2_ar_rms_fusions
- pytest -v -s tests/compile/fusions_e2e/test_tp2_ar_rms.py::test_tp2_ar_rms_fusions

#----------------------------------------------------------- mi300 · cuda ------------------------------------------------------------#

@@ -1320,7 +1322,6 @@ steps:
commands:
- export VLLM_WORKER_MULTIPROC_METHOD=spawn
- pytest -v -s entrypoints/openai/completion --ignore=entrypoints/openai/completion/test_tensorizer_entrypoint.py
- pytest -v -s entrypoints/openai/speech_to_text/
- pytest -v -s entrypoints/test_chat_utils.py

- label: Entrypoints Integration (API Server openai - Part 3) # TBD
@@ -1336,7 +1337,21 @@ steps:
- tests/entrypoints/test_chat_utils
commands:
- export VLLM_WORKER_MULTIPROC_METHOD=spawn
- pytest -v -s entrypoints/openai --ignore=entrypoints/openai/chat_completion --ignore=entrypoints/openai/completion --ignore=entrypoints/openai/speech_to_text/ --ignore=entrypoints/openai/correctness/ --ignore=entrypoints/openai/tool_parsers/ --ignore=entrypoints/openai/responses --ignore=entrypoints/openai/test_multi_api_servers.py
- pytest -v -s entrypoints/openai --ignore=entrypoints/openai/chat_completion --ignore=entrypoints/openai/completion --ignore=entrypoints/openai/correctness/ --ignore=entrypoints/openai/tool_parsers/ --ignore=entrypoints/openai/responses --ignore=entrypoints/openai/test_multi_api_servers.py

- label: Entrypoints Integration (Speech to Text) # TBD
timeout_in_minutes: 180
mirror_hardwares: [amdexperimental, amdproduction, amdgfx942nightly, amdmi300]
agent_pool: mi300_1
fast_check: true
torch_nightly: true
working_dir: "/vllm-workspace/tests"
source_file_dependencies:
- vllm/
- tests/entrypoints/speech_to_text
commands:
- export VLLM_WORKER_MULTIPROC_METHOD=spawn
- pytest -v -s entrypoints/speech_to_text

- label: Entrypoints Integration (LLM) # TBD
timeout_in_minutes: 180
@@ -1760,7 +1775,7 @@ steps:
- export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
- pytest -v -s -x lora/test_chatglm3_tp.py
- pytest -v -s -x lora/test_llama_tp.py
- pytest -v -s -x lora/test_llm_with_multi_loras.py
- pytest -v -s -x lora/test_qwen3_with_multi_loras.py
- pytest -v -s -x lora/test_olmoe_tp.py
- pytest -v -s -x lora/test_gptoss_tp.py
- pytest -v -s -x lora/test_qwen35_densemodel_lora.py
@@ -1803,9 +1818,10 @@ steps:
- tests/models/multimodal/generation
- tests/models/multimodal/test_mapping.py
commands:
- pip install git+https://github.com/TIGER-AI-Lab/Mantis.git
- pytest -v -s models/multimodal/generation -m 'not core_model' --ignore models/multimodal/generation/test_common.py
- pytest -v -s models/multimodal/test_mapping.py
- uv pip install --system --no-build-isolation 'git+https://github.com/AndreasKaratzas/mamba@rocm-7.0-v2.3.0'
- uv pip install --system --no-build-isolation 'git+https://github.com/Dao-AILab/causal-conv1d@v1.6.0'
- pytest -v -s models/language/generation -m hybrid_model --num-shards=$$BUILDKITE_PARALLEL_JOB_COUNT --shard-id=$$BUILDKITE_PARALLEL_JOB


- label: Multi-Modal Models (Extended Generation 2) # TBD
timeout_in_minutes: 180
@@ -1817,8 +1833,10 @@ steps:
- vllm/
- tests/models/multimodal/generation
commands:
- pip install git+https://github.com/TIGER-AI-Lab/Mantis.git
- pytest -v -s models/multimodal/generation/test_common.py -m 'split(group=0) and not core_model'
- uv pip install --system --no-build-isolation 'git+https://github.com/AndreasKaratzas/mamba@rocm-7.0-v2.3.0'
- uv pip install --system --no-build-isolation 'git+https://github.com/Dao-AILab/causal-conv1d@v1.6.0'
- pytest -v -s models/language/generation -m '(not core_model) and (not hybrid_model)'


- label: Multi-Modal Models (Extended Generation 3) # TBD
timeout_in_minutes: 180
@@ -2763,7 +2781,6 @@ steps:
commands:
- export VLLM_WORKER_MULTIPROC_METHOD=spawn
- pytest -v -s entrypoints/openai/completion --ignore=entrypoints/openai/completion/test_tensorizer_entrypoint.py
- pytest -v -s entrypoints/openai/speech_to_text/
- pytest -v -s entrypoints/test_chat_utils.py

- label: Entrypoints Integration (API Server openai - Part 3) # TBD
@@ -2779,7 +2796,21 @@ steps:
- tests/entrypoints/test_chat_utils
commands:
- export VLLM_WORKER_MULTIPROC_METHOD=spawn
- pytest -v -s entrypoints/openai --ignore=entrypoints/openai/chat_completion --ignore=entrypoints/openai/completion --ignore=entrypoints/openai/speech_to_text/ --ignore=entrypoints/openai/correctness/ --ignore=entrypoints/openai/tool_parsers/ --ignore=entrypoints/openai/responses --ignore=entrypoints/openai/test_multi_api_servers.py
- pytest -v -s entrypoints/openai --ignore=entrypoints/openai/chat_completion --ignore=entrypoints/openai/completion --ignore=entrypoints/openai/correctness/ --ignore=entrypoints/openai/tool_parsers/ --ignore=entrypoints/openai/responses --ignore=entrypoints/openai/test_multi_api_servers.py

- label: Entrypoints Integration (Speech to Text) # TBD
timeout_in_minutes: 180
mirror_hardwares: [amdexperimental, amdproduction, amdgfx942nightly, amdmi355]
agent_pool: mi355_1
fast_check: true
torch_nightly: true
working_dir: "/vllm-workspace/tests"
source_file_dependencies:
- vllm/
- tests/entrypoints/speech_to_text
commands:
- export VLLM_WORKER_MULTIPROC_METHOD=spawn
- pytest -v -s entrypoints/speech_to_text

- label: Entrypoints Integration (Pooling) # TBD
timeout_in_minutes: 180
@@ -3043,7 +3074,7 @@ steps:
- vllm/
- tests/models/language/generation
commands:
- uv pip install --system --no-build-isolation 'git+https://github.com/AndreasKaratzas/mamba@fix-rocm-7.0-warp-size-constexpr'
- uv pip install --system --no-build-isolation 'git+https://github.com/AndreasKaratzas/mamba@rocm-7.0-v2.3.0'
- uv pip install --system --no-build-isolation 'git+https://github.com/Dao-AILab/causal-conv1d@v1.6.0'
- pytest -v -s models/language/generation -m '(not core_model) and (not hybrid_model)'

@@ -3318,7 +3349,7 @@ steps:
- vllm/platforms/rocm.py
commands:
- uv pip install --system -r /vllm-workspace/requirements/kv_connectors_rocm.txt
- ROCM_ATTN=1 bash v1/kv_connector/nixl_integration/spec_decode_acceptance_test.sh
- ATTENTION_BACKEND=ROCM_ATTN bash v1/kv_connector/nixl_integration/spec_decode_acceptance_test.sh

- label: Distributed NixlConnector PD accuracy (4 GPUs) # TBD
timeout_in_minutes: 180
19 changes: 14 additions & 5 deletions .buildkite/test_areas/entrypoints.yaml
@@ -11,7 +11,7 @@ steps:
- tests/entrypoints/
commands:
- pytest -v -s entrypoints/openai/tool_parsers
- pytest -v -s entrypoints/ --ignore=entrypoints/llm --ignore=entrypoints/rpc --ignore=entrypoints/sleep --ignore=entrypoints/serve/instrumentator --ignore=entrypoints/openai --ignore=entrypoints/offline_mode --ignore=entrypoints/test_chat_utils.py --ignore=entrypoints/pooling
- pytest -v -s entrypoints/ --ignore=entrypoints/llm --ignore=entrypoints/rpc --ignore=entrypoints/sleep --ignore=entrypoints/serve/instrumentator --ignore=entrypoints/openai --ignore=entrypoints/offline_mode --ignore=entrypoints/test_chat_utils.py --ignore=entrypoints/pooling --ignore=entrypoints/speech_to_text

- label: Entrypoints Integration (LLM)
key: entrypoints-integration-llm
@@ -44,7 +44,6 @@ steps:
- export VLLM_WORKER_MULTIPROC_METHOD=spawn
- pytest -v -s entrypoints/openai/chat_completion --ignore=entrypoints/openai/chat_completion/test_chat_with_tool_reasoning.py --ignore=entrypoints/openai/chat_completion/test_oot_registration.py


- label: Entrypoints Integration (API Server openai - Part 2)
key: entrypoints-integration-api-server-openai-part-2
timeout_in_minutes: 50
@@ -55,7 +54,6 @@ steps:
- tests/entrypoints/test_chat_utils
commands:
- pytest -v -s entrypoints/openai/completion --ignore=entrypoints/openai/completion/test_tensorizer_entrypoint.py
- pytest -v -s entrypoints/openai/speech_to_text/
- pytest -v -s entrypoints/test_chat_utils.py

- label: Entrypoints Integration (API Server openai - Part 3)
@@ -69,7 +67,7 @@ steps:
- tests/entrypoints/test_chat_utils
commands:
- export VLLM_WORKER_MULTIPROC_METHOD=spawn
- pytest -v -s entrypoints/openai --ignore=entrypoints/openai/chat_completion --ignore=entrypoints/openai/completion --ignore=entrypoints/openai/speech_to_text/ --ignore=entrypoints/openai/correctness/ --ignore=entrypoints/openai/tool_parsers/ --ignore=entrypoints/openai/responses --ignore=entrypoints/openai/test_multi_api_servers.py
- pytest -v -s entrypoints/openai --ignore=entrypoints/openai/chat_completion --ignore=entrypoints/openai/completion --ignore=entrypoints/openai/correctness/ --ignore=entrypoints/openai/tool_parsers/ --ignore=entrypoints/openai/responses --ignore=entrypoints/openai/test_multi_api_servers.py

- label: Entrypoints Integration (API Server 2)
key: entrypoints-integration-api-server-2
@@ -86,6 +84,17 @@ steps:
- PYTHONPATH=/vllm-workspace pytest -v -s entrypoints/rpc
- pytest -v -s tool_use

- label: Entrypoints Integration (Speech to Text)
key: entrypoints-integration-speech_to_text
timeout_in_minutes: 50
working_dir: "/vllm-workspace/tests"
source_file_dependencies:
- vllm/
- tests/entrypoints/speech_to_text
commands:
- export VLLM_WORKER_MULTIPROC_METHOD=spawn
- pytest -v -s entrypoints/speech_to_text

- label: Entrypoints Integration (Pooling)
key: entrypoints-integration-pooling
timeout_in_minutes: 50
@@ -115,5 +124,5 @@ steps:
- csrc/
- vllm/entrypoints/openai/
- vllm/model_executor/models/whisper.py
commands: # LMEval+Transcription WER check
commands: # LMEval
- pytest -s entrypoints/openai/correctness/
5 changes: 3 additions & 2 deletions .buildkite/test_areas/lora.yaml
@@ -9,7 +9,7 @@ steps:
- vllm/lora
- tests/lora
commands:
- pytest -v -s lora --shard-id=$$BUILDKITE_PARALLEL_JOB --num-shards=$$BUILDKITE_PARALLEL_JOB_COUNT --ignore=lora/test_chatglm3_tp.py --ignore=lora/test_llama_tp.py --ignore=lora/test_llm_with_multi_loras.py --ignore=lora/test_olmoe_tp.py --ignore=lora/test_deepseekv2_tp.py --ignore=lora/test_gptoss_tp.py --ignore=lora/test_qwen3moe_tp.py --ignore=lora/test_qwen35_densemodel_lora.py
- pytest -v -s lora --shard-id=$$BUILDKITE_PARALLEL_JOB --num-shards=$$BUILDKITE_PARALLEL_JOB_COUNT --ignore=lora/test_chatglm3_tp.py --ignore=lora/test_llama_tp.py --ignore=lora/test_qwen3_with_multi_loras.py --ignore=lora/test_olmoe_tp.py --ignore=lora/test_deepseekv2_tp.py --ignore=lora/test_gptoss_tp.py --ignore=lora/test_qwen3moe_tp.py --ignore=lora/test_qwen35_densemodel_lora.py
parallelism: 4


@@ -19,6 +19,7 @@ steps:
num_devices: 4
source_file_dependencies:
- vllm/lora
- vllm/model_executor/layers/fused_moe/
- tests/lora
commands:
# FIXIT: find out which code initialize cuda before running the test
@@ -30,7 +31,7 @@ steps:
# requires multi-GPU testing for validation.
- pytest -v -s -x lora/test_chatglm3_tp.py
- pytest -v -s -x lora/test_llama_tp.py
- pytest -v -s -x lora/test_llm_with_multi_loras.py
- pytest -v -s -x lora/test_qwen3_with_multi_loras.py
- pytest -v -s -x lora/test_olmoe_tp.py
- pytest -v -s -x lora/test_gptoss_tp.py
- pytest -v -s -x lora/test_qwen35_densemodel_lora.py
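
The sharded LoRA step above passes `--shard-id=$$BUILDKITE_PARALLEL_JOB` and `--num-shards=$$BUILDKITE_PARALLEL_JOB_COUNT` with `parallelism: 4`, so each of the four parallel jobs runs a disjoint slice of the suite. The diff does not show how the flags pick tests. The sketch below assumes a stable hash of the test ID taken modulo the shard count, which is one common approach; `shard_of` and `select_for_shard` are hypothetical names used only for illustration.

```python
import hashlib


def shard_of(test_id: str, num_shards: int) -> int:
    """Map a test ID to a shard index in [0, num_shards) using a stable hash."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    # sha256 is stable across processes and machines, unlike Python's hash().
    digest = hashlib.sha256(test_id.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_shards


def select_for_shard(test_ids, shard_id: int, num_shards: int):
    """Return the tests this shard should run, preserving collection order."""
    if num_shards < 1 or not 0 <= shard_id < num_shards:
        raise ValueError("shard_id must satisfy 0 <= shard_id < num_shards")
    return [t for t in test_ids if shard_of(t, num_shards) == shard_id]


if __name__ == "__main__":
    # Hypothetical test IDs; the real suite lives under tests/lora.
    collected = [f"lora/test_case_{i}.py::test_x" for i in range(8)]
    for job in range(4):  # mirrors parallelism: 4
        print(job, select_for_shard(collected, job, 4))
```

Under this scheme every test lands in exactly one shard regardless of which job evaluates it, which is why the files listed with `--ignore` can be excluded from the sharded step and run once in the multi-GPU step instead.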
3 changes: 3 additions & 0 deletions .buildkite/test_areas/misc.yaml
@@ -210,6 +210,7 @@ steps:
- label: Python-only Installation
key: python-only-installation
depends_on: ~
optional: true
timeout_in_minutes: 20
source_file_dependencies:
- tests/standalone_tests/python_only_compile.sh
@@ -282,6 +283,7 @@ steps:
- tests/tokenizers_
- tests/reasoning
- tests/tool_parsers
- tests/parser
- tests/transformers_utils
- tests/config
device: cpu-small
@@ -296,6 +298,7 @@ steps:
- pytest -v -s tokenizers_
- pytest -v -s reasoning --ignore=reasoning/test_seedoss_reasoning_parser.py --ignore=reasoning/test_glm4_moe_reasoning_parser.py
- pytest -v -s tool_parsers
- pytest -v -s parser
- pytest -v -s transformers_utils
- pytest -v -s config

21 changes: 16 additions & 5 deletions .github/CODEOWNERS
@@ -6,8 +6,8 @@
/vllm/distributed/kv_transfer @NickLucche @ApostaC @orozery @xuechendi
/vllm/lora @jeejeelee
/vllm/model_executor/layers/attention @LucasWilkinson @MatthewBonanni
/vllm/model_executor/layers/fused_moe @mgoin @pavanimajety
/vllm/model_executor/layers/quantization @mgoin @robertgshaw2-redhat @tlrmchlsmth @yewentao256 @pavanimajety
/vllm/model_executor/layers/fused_moe @mgoin @pavanimajety @zyongye
/vllm/model_executor/layers/quantization @mgoin @robertgshaw2-redhat @tlrmchlsmth @yewentao256 @pavanimajety @zyongye
/vllm/model_executor/layers/mamba @tdoublep @tomeras91
/vllm/model_executor/layers/mamba/gdn_linear_attn.py @tdoublep @ZJY0516 @vadiklyutiy
/vllm/model_executor/layers/rotary_embedding.py @vadiklyutiy
@@ -18,7 +18,8 @@
/vllm/kernels/helion @ProExpertProg @zou3519
/vllm/multimodal @DarkLight1337 @ywang96 @NickLucche @tjtanaa
/vllm/vllm_flash_attn @LucasWilkinson @MatthewBonanni
CMakeLists.txt @tlrmchlsmth @LucasWilkinson
/CMakeLists.txt @tlrmchlsmth @LucasWilkinson @Harry-Chen
/cmake @tlrmchlsmth @LucasWilkinson @Harry-Chen

# Any change to the VllmConfig changes can have a large user-facing impact,
# so spam a lot of people
@@ -70,18 +71,22 @@ CMakeLists.txt @tlrmchlsmth @LucasWilkinson
/vllm/v1/worker/gpu @WoosukKwon @njhill
/vllm/v1/worker/gpu/kv_connector.py @orozery

# CI & building
/.buildkite @Harry-Chen
/docker/Dockerfile @Harry-Chen

# Test ownership
/.buildkite/lm-eval-harness @mgoin
/tests/distributed/test_multi_node_assignment.py @youkaichao
/tests/distributed/test_pipeline_parallel.py @youkaichao
/tests/distributed/test_same_node.py @youkaichao
/tests/entrypoints @DarkLight1337 @robertgshaw2-redhat @aarnphm @NickLucche
/tests/evals @mgoin @vadiklyutiy
/tests/kernels @mgoin @tlrmchlsmth @WoosukKwon @yewentao256
/tests/kernels @mgoin @tlrmchlsmth @WoosukKwon @yewentao256 @zyongye
/tests/kernels/ir @ProExpertProg @tjtanaa
/tests/models @DarkLight1337 @ywang96
/tests/multimodal @DarkLight1337 @ywang96 @NickLucche
/tests/quantization @mgoin @robertgshaw2-redhat @yewentao256 @pavanimajety
/tests/quantization @mgoin @robertgshaw2-redhat @yewentao256 @pavanimajety @zyongye
/tests/test_inputs.py @DarkLight1337 @ywang96
/tests/entrypoints/llm/test_struct_output_generate.py @mgoin @russellb @aarnphm
/tests/v1/structured_output @mgoin @russellb @aarnphm
@@ -147,6 +152,12 @@ mkdocs.yaml @hmellor
# MTP-specific files
/vllm/model_executor/models/deepseek_mtp.py @luccafong

# DeepseekV4-specific files
/vllm/v1/attention/ops/deepseek_v4_ops @zyongye
/vllm/model_executor/layers/deepseek_compressor.py @zyongye
/vllm/model_executor/layers/deepseek_v4_attention.py @zyongye
/vllm/model_executor/layers/sparse_attn_indexer.py @zyongye

# Mistral-specific files
/vllm/model_executor/models/mistral*.py @patrickvonplaten
/vllm/model_executor/models/mixtral*.py @patrickvonplaten
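
In a CODEOWNERS file the last matching pattern wins, so rule order matters: the new `/.buildkite @Harry-Chen` entry sits above `/.buildkite/lm-eval-harness @mgoin`, and the more specific rule keeps its owner. The sketch below illustrates that resolution with a deliberately simplified matcher. It handles only root-anchored patterns like the ones in this file, and `fnmatchcase` lets `*` cross `/`, which real CODEOWNERS matching does not. `matches` and `owners_for` are hypothetical helpers, not part of any GitHub tooling.

```python
from fnmatch import fnmatchcase


def matches(pattern: str, path: str) -> bool:
    """Simplified match for root-anchored CODEOWNERS patterns (illustration only)."""
    pat = pattern.strip("/")
    if "*" in pat:
        # Glob patterns such as mistral*.py are compared against the whole path.
        return fnmatchcase(path, pat)
    # Otherwise the pattern names a single file or a directory prefix.
    return path == pat or path.startswith(pat + "/")


def owners_for(path: str, rules) -> list:
    """Return the owners from the LAST rule that matches, or [] if none does."""
    owners = []
    for pattern, rule_owners in rules:
        if matches(pattern, path):
            owners = rule_owners  # later rules override earlier ones
    return owners


# A few rules copied from the diff, in file order.
RULES = [
    ("/vllm/model_executor/layers/mamba", ["@tdoublep", "@tomeras91"]),
    ("/vllm/model_executor/layers/mamba/gdn_linear_attn.py",
     ["@tdoublep", "@ZJY0516", "@vadiklyutiy"]),
    ("/.buildkite", ["@Harry-Chen"]),
    ("/.buildkite/lm-eval-harness", ["@mgoin"]),
    ("/vllm/model_executor/models/mistral*.py", ["@patrickvonplaten"]),
]

if __name__ == "__main__":
    for p in (".buildkite/test-amd.yaml", ".buildkite/lm-eval-harness/run.sh"):
        print(p, owners_for(p, RULES))
```

The same ordering logic explains the `CMakeLists.txt` change: the leading `/` anchors the pattern to the repository root instead of matching a file of that name in any directory.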
24 changes: 23 additions & 1 deletion CMakeLists.txt
@@ -13,8 +13,12 @@ cmake_minimum_required(VERSION 3.26)
# cmake --install . --component _C
project(vllm_extensions LANGUAGES CXX)

set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_CUDA_STANDARD 20)
set(CMAKE_CUDA_STANDARD_REQUIRED ON)
set(CMAKE_HIP_STANDARD 20)
set(CMAKE_HIP_STANDARD_REQUIRED ON)


# CUDA by default, can be overridden by using -DVLLM_TARGET_DEVICE=... (used by setup.py)
@@ -105,6 +109,24 @@ else()
set(CUDA_SUPPORTED_ARCHS "7.0;7.5;8.0;8.6;8.7;8.9;9.0")
endif()

#
# spinloop extension (pure CXX; must stay above the non-CUDA device branch so
# CPU builds define the target before the early return)
#
set(VLLM_SPINLOOP_EXT_SRC "csrc/spinloop.cpp")
set(SPINLOOP_COMPILE_FLAGS "")
if(CMAKE_SYSTEM_PROCESSOR MATCHES "x86_64|amd64")
list(APPEND SPINLOOP_COMPILE_FLAGS "-mmwaitx")
endif()
define_extension_target(
spinloop
DESTINATION vllm
LANGUAGE CXX
SOURCES ${VLLM_SPINLOOP_EXT_SRC}
COMPILE_FLAGS ${SPINLOOP_COMPILE_FLAGS}
USE_SABI 3.11
WITH_SOABI)

#
# Forward the non-CUDA device extensions to external CMake scripts.
#
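
The spinloop block adds `-mmwaitx` only when `CMAKE_SYSTEM_PROCESSOR MATCHES "x86_64|amd64"`. CMake's `MATCHES` is an unanchored, case-sensitive regular-expression search, so the exact spelling reported by the host decides whether the flag is applied. The sketch below mimics that check in Python for a few processor strings; `gets_mwaitx_flag` is a hypothetical helper, and the sample strings are illustrative rather than a list of platforms this PR targets.

```python
import re


def gets_mwaitx_flag(system_processor: str) -> bool:
    """Mimic `if(CMAKE_SYSTEM_PROCESSOR MATCHES "x86_64|amd64")`."""
    # re.search is unanchored and case-sensitive by default, like CMake's MATCHES.
    return re.search(r"x86_64|amd64", system_processor) is not None


if __name__ == "__main__":
    for proc in ("x86_64", "amd64", "aarch64", "AMD64"):
        print(f"{proc}: {gets_mwaitx_flag(proc)}")
```

An upper-case `AMD64` does not match, so a host reporting that spelling would build the extension without the flag.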
2 changes: 1 addition & 1 deletion cmake/cpu_extension.cmake
@@ -1,7 +1,7 @@
include(FetchContent)

set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_EXTENSIONS ON)
set(CMAKE_EXPORT_COMPILE_COMMANDS ON)

1 change: 0 additions & 1 deletion cmake/external_projects/deepgemm.cmake
@@ -76,7 +76,6 @@ if(DEEPGEMM_ARCHS)
"${deepgemm_SOURCE_DIR}/third-party/fmt/include")

target_compile_options(_deep_gemm_C PRIVATE
$<$<COMPILE_LANGUAGE:CXX>:-std=c++17>
$<$<COMPILE_LANGUAGE:CXX>:-O3>
$<$<COMPILE_LANGUAGE:CXX>:-Wno-psabi>
$<$<COMPILE_LANGUAGE:CXX>:-Wno-deprecated-declarations>)