Closed
Changes from all commits
75 commits
e86fe9d
Squashed merge PR #23624
ivanium Dec 2, 2025
f645d95
feat: for sliding window attention, only allocate tokens within the w…
ivanium Dec 2, 2025
ee81aa1
fix: skip outside sliding window tokens when touch and save cached bl…
ivanium Dec 4, 2025
7acbe08
fix: make interfaces consistent and remove debug prints
ivanium Dec 4, 2025
0b2218c
nits: remove test scripts
ivanium Dec 4, 2025
15ef476
fix: revert `cache_block()` changes as we have already handled the nu…
ivanium Dec 4, 2025
eb7bfcf
fix: revert KVCacheManager.allocate_slots() interface changes; revisi…
ivanium Dec 5, 2025
327e472
revert unrelated changes
ivanium Dec 5, 2025
580efd4
revert `blocks_to_touch` changes
ivanium Dec 5, 2025
37d7c3b
fix: update test cases
ivanium Dec 6, 2025
6169524
doc string nits
ivanium Dec 6, 2025
ccfc676
ignore mypy errors
ivanium Dec 6, 2025
ad761e6
fix: resolve comments; mainly merge local_computed_tokens and externa…
ivanium Dec 13, 2025
89af30c
fix: simplify return values of get_num_blocks_to_allocate
ivanium Dec 13, 2025
cf666cd
test: update test cases
ivanium Dec 13, 2025
75593ea
fix: num_new_tokens can be 0 when load_kv_async is enabled
ivanium Dec 13, 2025
fd34b51
fix: revert changes to factory.py
ivanium Dec 14, 2025
75c27e3
nits
ivanium Dec 15, 2025
76855bc
workaround lmcache new interfaces
ivanium Dec 15, 2025
188f661
fix: avoid memory leak in remove_skipped_blocks; workaround gemma3 pr…
ivanium Dec 16, 2025
0bed603
Squashed merge main
ivanium Dec 16, 2025
fe99429
Merge branch 'main' into feat/partial_ext_token_hit
ivanium Dec 16, 2025
ae264a1
Merge branch 'main' into feat/partial_ext_token_hit
ivanium Dec 16, 2025
6c9afb3
nits: revise function name and comments
ivanium Dec 16, 2025
00ac78e
[CI] Generalize gsm8k test args and add Qwen3-Next MTP B200 test (#30…
mgoin Dec 16, 2025
b81b822
[Frontend] Add `max-completion-token` option to transcription/transla…
NickLucche Dec 16, 2025
03adf8a
[Refactor] Small refactor for group topk (#30562)
yewentao256 Dec 16, 2025
15110e1
[Perf] Do FP4 quant before All gather on flashinfer trtllmgen MOE (#…
jiahanc Dec 16, 2025
dfbea50
[Attention] Cache attention metadata builds across hybrid KV-cache gr…
LucasWilkinson Dec 16, 2025
ecf0943
[Core][MM] Optimize encoder cache manager by operating with embedding…
ywang96 Dec 16, 2025
602c268
[Bugfix][DSV32] Fix overflow in topk. (#30754)
dcampora Dec 16, 2025
4317b41
[Kernel][Quantization][MoE] add marlin kernel support for turing (sm7…
jinzhen-lin Dec 16, 2025
b19c650
[CI] Skip ci failure test (#30804)
yewentao256 Dec 16, 2025
5177d06
[Perf][Kernels] Vectorize `csrc/activations_kernels.cu` (#29512)
mgoin Dec 16, 2025
d518996
[ROCm] [Bugfix] Fix torch sdpa hallucination (#30789)
tjtanaa Dec 16, 2025
8d71371
Replace deprecated enable_fusion with fuse_norm_quant in test_rms_gro…
mgoin Dec 16, 2025
f9068d3
[MM] Pass FA version in ViT Attn (#30756)
NickLucche Dec 16, 2025
481d63f
[docker] Allow kv_connectors install to fail on arm64 (#30806)
Dec 17, 2025
c4abe59
Fix nemotron_nas intermediate_size computation (#30795)
grzegorz-k-karch Dec 17, 2025
ff0c04b
Update model-hosting-container-standards to 0.1.10 (#30815)
mgoin Dec 17, 2025
1ec8d58
[CPU] Add action to automatically label CPU related PRs (#30678)
fadara01 Dec 17, 2025
35f5f6f
[CI/Build] Fix compatibility between #30244 and #30396 (#30787)
DarkLight1337 Dec 17, 2025
63a120b
bump up compressed tensors version to 0.13.0 (#30799)
shanjiaz Dec 17, 2025
4c7a8bc
Update note comment for flashinfer attention warmup (#30711)
mgoin Dec 17, 2025
b1997df
[Bugfix][CPU] Fix CPU backend ROPE dispatch for VL models (#30829)
bigPYJ1151 Dec 17, 2025
254881d
[XPU] fix broken fp8 online quantization for XPU platform (#30831)
yma11 Dec 17, 2025
a4e36fb
[Bugfix][Frontend] Prevent IndexError in MiniMax M2 tool parser durin…
WangErXiao Dec 17, 2025
8862b21
[Mamba] Removed disable cascade attn in MambaModelConfig (#30712)
Josephasafg Dec 17, 2025
d3ebe34
CustomOp: grouped topk (#29575)
xinyu-intel Dec 17, 2025
42c55ea
[NIXL][Bugfix] Fix NIXL/RDMA registration failure over CuMemAllocator…
Somoku Dec 17, 2025
89173de
[Doc][ResponsesAPI] add documentation (#30840)
qandrew Dec 17, 2025
095b981
[Kernels][FI] Skip trtllm attention when num_kv_heads=1 (#30842)
yeqcharlotte Dec 17, 2025
28a2c91
[UX] Make `vllm bench serve` discover model by default and use --inpu…
mgoin Dec 17, 2025
bc7110b
[compile] Disable aot when eager backend is used. (#30810)
zhxchen17 Dec 17, 2025
15a8514
[compile] Ignore VLLM_FORCE_AOT_LOAD from cache factors (#30809)
zhxchen17 Dec 17, 2025
87c1dd9
[Fix]Load kv-cache dtype from hf_quant_config.json automatically (fix…
danielafrimi Dec 17, 2025
1cc37cf
[compile] Recompile graph module during Dynamo cache loading. (#30743)
zhxchen17 Dec 17, 2025
5b17c00
[Bug] Fix AttributeError: 'ColumnParallelLinear' object has no attrib…
yewentao256 Dec 17, 2025
55e260a
[Refactor] [4/N] Move VLLM_SERVER_DEV endpoints into the serve direct…
chaunceyjiang Dec 17, 2025
b0a5c93
[ci] Sync test areas yaml file with test-pipeline (#30862)
khluu Dec 17, 2025
0919e25
[Bugfix] deepseek-V3.2 self.weights_proj has no bias (#30841)
baoqian426 Dec 17, 2025
e59ce3b
Fix lazy import (#30858)
hmellor Dec 17, 2025
c8d9bd8
chores: adjust the attn register param order (#30688)
ILikeIneine Dec 17, 2025
674ce4d
[Fix] uniform decode batch check (#30747)
Jialin Dec 17, 2025
2069fe8
[Docs] fix function name (#30748)
lengrongfu Dec 17, 2025
507e816
Adapt the old parameter enable_thinking in chat_template_kwargs (#30852)
SongDI911 Dec 17, 2025
a9c0e32
[Model] Gemma3: Support untied word embeddings (#30827)
www-spam Dec 17, 2025
d35b32a
wip
NickLucche Nov 27, 2025
13d0e7c
wip
NickLucche Dec 17, 2025
49b509f
wip
NickLucche Dec 18, 2025
99d0268
fix: remove skipped blocks before passing them to the connector when …
ivanium Dec 19, 2025
ae80edf
is_null instead of 0 check
NickLucche Dec 22, 2025
2eb9904
get_sw_clippped_blocks to fix over-allocation for swa on D
NickLucche Jan 6, 2026
7576e55
fix issue with null blocks on P being one extra (17) by clipping
NickLucche Jan 6, 2026
4f17655
remove llama4 opt
NickLucche Jan 6, 2026
1 change: 0 additions & 1 deletion .buildkite/scripts/hardware_ci/run-amd-test.sh
Original file line number Diff line number Diff line change
@@ -141,7 +141,6 @@ if [[ $commands == *" entrypoints/openai "* ]]; then
--ignore=entrypoints/openai/test_audio.py \
--ignore=entrypoints/openai/test_shutdown.py \
--ignore=entrypoints/openai/test_completion.py \
--ignore=entrypoints/openai/test_sleep.py \
--ignore=entrypoints/openai/test_models.py \
--ignore=entrypoints/openai/test_lora_adapters.py \
--ignore=entrypoints/openai/test_return_tokens_as_ids.py \
37 changes: 22 additions & 15 deletions .buildkite/test-amd.yaml
@@ -128,7 +128,7 @@ steps:
- tests/entrypoints/
commands:
- pytest -v -s entrypoints/openai/tool_parsers
- pytest -v -s entrypoints/ --ignore=entrypoints/llm --ignore=entrypoints/openai --ignore=entrypoints/offline_mode --ignore=entrypoints/test_chat_utils.py --ignore=entrypoints/pooling
- pytest -v -s entrypoints/ --ignore=entrypoints/llm --ignore=entrypoints/openai --ignore=entrypoints/rpc --ignore=entrypoints/sleep --ignore=entrypoints/instrumentator --ignore=entrypoints/offline_mode --ignore=entrypoints/test_chat_utils.py --ignore=entrypoints/pooling

- label: Entrypoints Integration Test (LLM) # 30min
timeout_in_minutes: 40
@@ -148,7 +148,7 @@ steps:
- pytest -v -s entrypoints/llm/test_generate.py # it needs a clean process
- pytest -v -s entrypoints/offline_mode # Needs to avoid interference with other tests

- label: Entrypoints Integration Test (API Server) # 100min
- label: Entrypoints Integration Test (API Server 1) # 100min
timeout_in_minutes: 130
mirror_hardwares: [amdexperimental]
agent_pool: mi325_1
@@ -162,10 +162,28 @@
- tests/entrypoints/test_chat_utils
commands:
- export VLLM_WORKER_MULTIPROC_METHOD=spawn
- PYTHONPATH=/vllm-workspace pytest -v -s entrypoints/openai/test_collective_rpc.py # PYTHONPATH is needed to import custom Worker extension
- pytest -v -s entrypoints/openai --ignore=entrypoints/openai/test_chat_with_tool_reasoning.py --ignore=entrypoints/openai/test_oot_registration.py --ignore=entrypoints/openai/test_tensorizer_entrypoint.py --ignore=entrypoints/openai/correctness/ --ignore=entrypoints/openai/test_collective_rpc.py --ignore=entrypoints/openai/tool_parsers/
- pytest -v -s entrypoints/openai --ignore=entrypoints/openai/test_chat_with_tool_reasoning.py --ignore=entrypoints/openai/test_oot_registration.py --ignore=entrypoints/openai/test_tensorizer_entrypoint.py --ignore=entrypoints/openai/correctness/ --ignore=entrypoints/openai/tool_parsers/
- pytest -v -s entrypoints/test_chat_utils.py

- label: Entrypoints Integration Test (API Server 2)
timeout_in_minutes: 50
mirror_hardwares: [amdexperimental]
agent_pool: mi325_1
# grade: Blocking
working_dir: "/vllm-workspace/tests"
fast_check: true
torch_nightly: true
source_file_dependencies:
- vllm/
- tests/entrypoints/sleep
- tests/entrypoints/rpc
- tests/tool_use
commands:
- export VLLM_WORKER_MULTIPROC_METHOD=spawn
- pytest -v -s entrypoints/sleep
- pytest -v -s tool_use
- PYTHONPATH=/vllm-workspace pytest -v -s entrypoints/rpc

- label: Entrypoints Integration Test (Pooling)
timeout_in_minutes: 50
mirror_hardwares: [amdexperimental]
@@ -751,17 +769,6 @@ steps:
# Transcription WER check is skipped because encoder-decoder models are not supported on ROCm, see https://github.com/vllm-project/vllm/issues/27442
- pytest -s entrypoints/openai/correctness/

- label: OpenAI-Compatible Tool Use # 23 min
timeout_in_minutes: 35
mirror_hardwares: [amdexperimental, amdproduction]
agent_pool: mi325_1
# grade: Blocking
fast_check: false
source_file_dependencies:
- vllm/
- tests/tool_use
commands:
- pytest -v -s tool_use

##### models test #####

38 changes: 22 additions & 16 deletions .buildkite/test-pipeline.yaml
@@ -114,7 +114,7 @@ steps:
- tests/entrypoints/
commands:
- pytest -v -s entrypoints/openai/tool_parsers
- pytest -v -s entrypoints/ --ignore=entrypoints/llm --ignore=entrypoints/openai --ignore=entrypoints/offline_mode --ignore=entrypoints/test_chat_utils.py --ignore=entrypoints/pooling
- pytest -v -s entrypoints/ --ignore=entrypoints/llm --ignore=entrypoints/rpc --ignore=entrypoints/sleep --ignore=entrypoints/instrumentator --ignore=entrypoints/openai --ignore=entrypoints/offline_mode --ignore=entrypoints/test_chat_utils.py --ignore=entrypoints/pooling

- label: Entrypoints Integration Test (LLM) # 30min
timeout_in_minutes: 40
@@ -132,7 +132,7 @@
- pytest -v -s entrypoints/llm/test_generate.py # it needs a clean process
- pytest -v -s entrypoints/offline_mode # Needs to avoid interference with other tests

- label: Entrypoints Integration Test (API Server) # 100min
- label: Entrypoints Integration Test (API Server 1) # 100min
timeout_in_minutes: 130
mirror_hardwares: [amdexperimental]
working_dir: "/vllm-workspace/tests"
@@ -144,10 +144,26 @@
- tests/entrypoints/test_chat_utils
commands:
- export VLLM_WORKER_MULTIPROC_METHOD=spawn
- PYTHONPATH=/vllm-workspace pytest -v -s entrypoints/openai/test_collective_rpc.py # PYTHONPATH is needed to import custom Worker extension
- pytest -v -s entrypoints/openai --ignore=entrypoints/openai/test_chat_with_tool_reasoning.py --ignore=entrypoints/openai/test_oot_registration.py --ignore=entrypoints/openai/test_tensorizer_entrypoint.py --ignore=entrypoints/openai/correctness/ --ignore=entrypoints/openai/test_collective_rpc.py --ignore=entrypoints/openai/tool_parsers/
- pytest -v -s entrypoints/openai --ignore=entrypoints/openai/test_chat_with_tool_reasoning.py --ignore=entrypoints/openai/test_oot_registration.py --ignore=entrypoints/openai/test_tensorizer_entrypoint.py --ignore=entrypoints/openai/correctness/ --ignore=entrypoints/openai/tool_parsers/
- pytest -v -s entrypoints/test_chat_utils.py

- label: Entrypoints Integration Test (API Server 2)
timeout_in_minutes: 50
mirror_hardwares: [amdexperimental]
working_dir: "/vllm-workspace/tests"
fast_check: true
torch_nightly: true
source_file_dependencies:
- vllm/
- tests/entrypoints/sleep
- tests/entrypoints/rpc
- tests/tool_use
commands:
- export VLLM_WORKER_MULTIPROC_METHOD=spawn
- pytest -v -s entrypoints/sleep
- PYTHONPATH=/vllm-workspace pytest -v -s entrypoints/rpc
- pytest -v -s tool_use

- label: Entrypoints Integration Test (Pooling)
timeout_in_minutes: 50
mirror_hardwares: [amdexperimental]
@@ -654,7 +670,7 @@ steps:
- vllm/model_executor/layers/quantization
autorun_on_main: true
commands:
- pytest -s -v evals/gsm8k/test_gsm8k_correctness.py --config-list-file=configs/models-small.txt --tp-size=1
- pytest -s -v evals/gsm8k/test_gsm8k_correctness.py --config-list-file=configs/models-small.txt

- label: OpenAI API correctness # 22min
timeout_in_minutes: 30
@@ -666,16 +682,6 @@
commands: # LMEval+Transcription WER check
- pytest -s entrypoints/openai/correctness/

- label: OpenAI-Compatible Tool Use # 23 min
timeout_in_minutes: 35
mirror_hardwares: [amdexperimental]
fast_check: false
source_file_dependencies:
- vllm/
- tests/tool_use
commands:
- pytest -v -s tool_use

##### models test #####

- label: Basic Models Tests (Initialization)
@@ -1064,7 +1070,7 @@ steps:
- csrc/
- vllm/model_executor/layers/quantization
commands:
- pytest -s -v evals/gsm8k/test_gsm8k_correctness.py --config-list-file=configs/models-blackwell.txt --tp-size=1
- pytest -s -v evals/gsm8k/test_gsm8k_correctness.py --config-list-file=configs/models-blackwell.txt

##### 1 GPU test #####
##### multi gpus test #####
19 changes: 1 addition & 18 deletions .buildkite/test_areas/e2e_integration.yaml
@@ -32,28 +32,11 @@ steps:
- label: Prime-RL Integration (2 GPUs)
timeout_in_minutes: 30
optional: true
soft_fail: true
num_gpus: 2
working_dir: "/vllm-workspace"
source_file_dependencies:
- vllm/
- .buildkite/scripts/run-prime-rl-test.sh
commands:
- bash .buildkite/scripts/run-prime-rl-test.sh

- label: DeepSeek V2-Lite Async EPLB Accuracy
timeout_in_minutes: 60
gpu: h100
optional: true
num_gpus: 4
working_dir: "/vllm-workspace"
commands:
- bash .buildkite/scripts/scheduled_integration_test/deepseek_v2_lite_ep_async_eplb.sh 0.25 1319 8030

- label: Qwen3-Next-80B-A3B-Instruct MTP Async EPLB Accuracy
timeout_in_minutes: 60
gpu: h100
optional: true
num_gpus: 4
working_dir: "/vllm-workspace"
commands:
- bash .buildkite/scripts/scheduled_integration_test/qwen3_next_mtp_async_eplb.sh 0.8 1319 8040
23 changes: 19 additions & 4 deletions .buildkite/test_areas/entrypoints.yaml
@@ -10,7 +10,7 @@ steps:
- tests/entrypoints/
commands:
- pytest -v -s entrypoints/openai/tool_parsers
- pytest -v -s entrypoints/ --ignore=entrypoints/llm --ignore=entrypoints/openai --ignore=entrypoints/offline_mode --ignore=entrypoints/test_chat_utils.py --ignore=entrypoints/pooling
- pytest -v -s entrypoints/ --ignore=entrypoints/llm --ignore=entrypoints/rpc --ignore=entrypoints/sleep --ignore=entrypoints/instrumentator --ignore=entrypoints/openai --ignore=entrypoints/offline_mode --ignore=entrypoints/test_chat_utils.py --ignore=entrypoints/pooling

- label: Entrypoints Integration (LLM)
timeout_in_minutes: 40
@@ -25,7 +25,7 @@
- pytest -v -s entrypoints/llm/test_generate.py # it needs a clean process
- pytest -v -s entrypoints/offline_mode # Needs to avoid interference with other tests

- label: Entrypoints Integration (API Server)
- label: Entrypoints Integration (API Server 1)
timeout_in_minutes: 130
working_dir: "/vllm-workspace/tests"
source_file_dependencies:
@@ -34,11 +34,26 @@
- tests/entrypoints/test_chat_utils
commands:
- export VLLM_WORKER_MULTIPROC_METHOD=spawn
- PYTHONPATH=/vllm-workspace pytest -v -s entrypoints/openai/test_collective_rpc.py # PYTHONPATH is needed to import custom Worker extension
- pytest -v -s entrypoints/openai --ignore=entrypoints/openai/test_chat_with_tool_reasoning.py --ignore=entrypoints/openai/test_oot_registration.py --ignore=entrypoints/openai/test_tensorizer_entrypoint.py --ignore=entrypoints/openai/correctness/ --ignore=entrypoints/openai/test_collective_rpc.py --ignore=entrypoints/openai/tool_parsers/
- pytest -v -s entrypoints/openai --ignore=entrypoints/openai/test_chat_with_tool_reasoning.py --ignore=entrypoints/openai/test_oot_registration.py --ignore=entrypoints/openai/test_tensorizer_entrypoint.py --ignore=entrypoints/openai/correctness/ --ignore=entrypoints/openai/tool_parsers/
- pytest -v -s entrypoints/test_chat_utils.py


- label: Entrypoints Integration (API Server 2)
timeout_in_minutes: 130
working_dir: "/vllm-workspace/tests"
source_file_dependencies:
- vllm/
- tests/tool_use
- tests/entrypoints/sleep
- tests/entrypoints/instrumentator
- tests/entrypoints/rpc
commands:
- export VLLM_WORKER_MULTIPROC_METHOD=spawn
- PYTHONPATH=/vllm-workspace pytest -v -s entrypoints/rpc
- pytest -v -s entrypoints/instrumentator
- pytest -v -s entrypoints/sleep
- pytest -v -s tool_use

- label: Entrypoints Integration (Pooling)
timeout_in_minutes: 50
working_dir: "/vllm-workspace/tests"
4 changes: 2 additions & 2 deletions .buildkite/test_areas/lm_eval.yaml
@@ -9,7 +9,7 @@ steps:
- vllm/model_executor/layers/quantization
autorun_on_main: true
commands:
- pytest -s -v evals/gsm8k/test_gsm8k_correctness.py --config-list-file=configs/models-small.txt --tp-size=1
- pytest -s -v evals/gsm8k/test_gsm8k_correctness.py --config-list-file=configs/models-small.txt

- label: LM Eval Large Models (4 GPUs)(A100)
gpu: a100
@@ -43,4 +43,4 @@
- csrc/
- vllm/model_executor/layers/quantization
commands:
- pytest -s -v evals/gsm8k/test_gsm8k_correctness.py --config-list-file=configs/models-blackwell.txt --tp-size=1
- pytest -s -v evals/gsm8k/test_gsm8k_correctness.py --config-list-file=configs/models-blackwell.txt
2 changes: 2 additions & 0 deletions .buildkite/test_areas/lora.yaml
@@ -22,6 +22,8 @@ steps:
# FIXIT: find out which code initialize cuda before running the test
# before the fix, we need to use spawn to test it
- export VLLM_WORKER_MULTIPROC_METHOD=spawn
# Alot of these tests are on the edge of OOMing
- export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
# There is some Tensor Parallelism related processing logic in LoRA that
# requires multi-GPU testing for validation.
- pytest -v -s -x lora/test_chatglm3_tp.py
2 changes: 2 additions & 0 deletions .buildkite/test_areas/models_basic.yaml
@@ -9,6 +9,7 @@ steps:
source_file_dependencies:
- vllm/
- tests/models/test_initialization.py
- tests/models/registry.py
commands:
# Run a subset of model initialization tests
- pytest -v -s models/test_initialization.py::test_can_initialize_small_subset
@@ -20,6 +21,7 @@ steps:
source_file_dependencies:
- vllm/model_executor/models/
- tests/models/test_initialization.py
- tests/models/registry.py
commands:
# Only when vLLM model source is modified - test initialization of a large
# subset of supported models (the complement of the small subset in the above
4 changes: 3 additions & 1 deletion .buildkite/test_areas/pytorch.yaml
@@ -13,7 +13,9 @@ steps:
# tests covered elsewhere.
# Use `find` to launch multiple instances of pytest so that
# they do not suffer from https://github.com/vllm-project/vllm/issues/28965
- "find compile/ -maxdepth 1 -name 'test_*.py' -exec pytest -s -v {} \\;"
# However, find does not normally propagate error codes, so we combine it with xargs
# (using -0 for proper path handling)
- "find compile/ -maxdepth 1 -name 'test_*.py' -print0 | xargs -0 -n1 -I{} pytest -s -v '{}'"

- label: PyTorch Fullgraph Smoke Test
timeout_in_minutes: 30
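The comment in the pytorch.yaml hunk above motivates the `find | xargs` change: `find -exec` does not reflect the exit status of the commands it runs, while `xargs` exits non-zero when any invocation fails. A minimal sketch of the difference (using a hypothetical `demo/` directory, not part of this PR):

```shell
# Create a throwaway "test" script that always fails.
mkdir -p demo && printf 'exit 1\n' > demo/test_fail.sh

# -exec swallows the failure: find itself still exits 0,
# so CI would report success even though the test failed.
find demo -maxdepth 1 -name 'test_*.sh' -exec sh {} \;
echo "find -exec exit code: $?"   # 0

# xargs propagates it: exit status 123 if any invocation
# fails with status 1-125. -print0/-0 handle odd filenames.
find demo -maxdepth 1 -name 'test_*.sh' -print0 | xargs -0 -n1 sh
echo "xargs exit code: $?"        # 123
```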
13 changes: 0 additions & 13 deletions .buildkite/test_areas/tool_use.yaml

This file was deleted.

14 changes: 14 additions & 0 deletions .github/mergify.yml
@@ -235,6 +235,20 @@ pull_request_rules:
add:
- rocm

- name: label-cpu
description: Automatically apply cpu label
conditions:
- label != stale
- files~=^(?!.*kv_offload)(?!.*cpu_offload).*\bcpu.*
actions:
label:
add:
- cpu
assign:
users:
- "fadara01"
- "aditew01"

- name: label-structured-output
description: Automatically apply structured-output label
conditions:
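The `files~=` pattern in the `label-cpu` rule above relies on two negative lookaheads to keep offload-related paths out of the `cpu` label. A quick sketch of how that regex behaves, evaluated with Python's `re` on hypothetical file paths (assuming mergify's matcher has comparable lookahead and word-boundary semantics):

```python
import re

# Pattern copied from the label-cpu mergify rule above.
cpu_files = re.compile(r"^(?!.*kv_offload)(?!.*cpu_offload).*\bcpu.*")

# Paths with a standalone "cpu" component match (hypothetical paths):
assert cpu_files.match("vllm/platforms/cpu.py")
assert cpu_files.match("tests/entrypoints/cpu/test_basic.py")

# ...unless the path also mentions kv_offload or cpu_offload,
# which the lookaheads exclude up front:
assert not cpu_files.match("vllm/v1/kv_offload/worker/cpu.py")
assert not cpu_files.match("docs/design/cpu_offload.md")
```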