Reduce kernel overhead when the number of active LoRAs is smaller than max LoRAs. Multiple CUDA graphs are captured, one for each number of active LoRAs. #32005
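The title describes the technique only at a high level. Below is a minimal, hypothetical sketch of the dispatch idea in plain Python. It is not vLLM's implementation and does not use CUDA. A "graph" is simulated by a closure that does one unit of work per LoRA slot it was captured for. All names (`capture_graph`, `capture_all`, `run_step`) are illustrative assumptions. The sketch shows the contrast the title implies: a single graph sized for `max_loras` pays for every slot on every step, while one graph per active-LoRA count pays only for the adapters in use.

```python
from typing import Callable, Dict

# Hypothetical stand-in for a captured CUDA graph: replaying it runs the
# (simulated) LoRA kernels over a fixed number of adapter slots and returns
# the number of slots processed as a proxy for kernel work.
Graph = Callable[[], int]


def capture_graph(num_slots: int) -> Graph:
    """Simulate capturing a graph specialized for `num_slots` LoRA slots."""

    def replay() -> int:
        work = 0
        for _ in range(num_slots):  # one unit of work per LoRA slot
            work += 1
        return work

    return replay


def capture_all(max_loras: int) -> Dict[int, Graph]:
    """Capture one graph per possible active-LoRA count, 1..max_loras."""
    return {n: capture_graph(n) for n in range(1, max_loras + 1)}


def run_step(graphs: Dict[int, Graph], num_active_loras: int) -> int:
    """Replay the graph specialized for the current number of active LoRAs."""
    graph = graphs.get(num_active_loras)
    if graph is None:
        raise ValueError(f"no graph captured for {num_active_loras} active LoRAs")
    return graph()


max_loras = 8
single_graph = capture_graph(max_loras)  # baseline: always sized for max_loras
per_count_graphs = capture_all(max_loras)

print(single_graph())                 # 8: work is fixed at max_loras
print(run_step(per_count_graphs, 2))  # 2: work scales with active LoRAs
```

The trade-off, under these assumptions, is that capturing one graph per count costs extra capture time and graph memory in exchange for cheaper steps when few adapters are active.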

@ProExpertProg

3 rules match and 25 potential rules

⚠️ The pull request has been merged by @ProExpertProg

Rule: label-documentation (comment, label)

Rule: comment-pre-commit-failure (comment)

-closed
status-failure=pre-commit
-draft

Rule: comment-dco-failure (comment)

-closed
status-failure=dco
-draft

Rule: label-ci-build (label)

Rule: label-deepseek (label)

any of:
- files~=^examples/.*deepseek.*\.py
- files~=^tests/.*deepseek.*\.py
- files~=^vllm/entrypoints/openai/tool_parsers/.*deepseek.*\.py
- files~=^vllm/model_executor/models/.*deepseek.*\.py
- files~=^vllm/reasoning/.*deepseek.*\.py
- files~=^vllm/transformers_utils/.*deepseek.*\.py
- title~=(?i)DeepSeek
label != stale
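To make the "any of" semantics of the label-deepseek rule concrete, here is a small Python approximation. It assumes that a `files~=` condition holds when at least one changed file matches (Python `re.search` semantics), that top-level conditions are combined with AND, and that `label != stale` means no label is exactly `stale`. These are assumptions about Mergify's evaluation, not its actual code, and the example PRs and paths are hypothetical.

```python
import re
from typing import Iterable

# Patterns copied from the label-deepseek rule above.
DEEPSEEK_FILE_PATTERNS = [
    r"^examples/.*deepseek.*\.py",
    r"^tests/.*deepseek.*\.py",
    r"^vllm/entrypoints/openai/tool_parsers/.*deepseek.*\.py",
    r"^vllm/model_executor/models/.*deepseek.*\.py",
    r"^vllm/reasoning/.*deepseek.*\.py",
    r"^vllm/transformers_utils/.*deepseek.*\.py",
]
DEEPSEEK_TITLE_PATTERN = r"(?i)DeepSeek"


def deepseek_rule(title: str, files: Iterable[str], labels: Iterable[str]) -> bool:
    # Assumptions: `files~=R` holds if at least one changed file matches R
    # (re.search semantics); top-level conditions are ANDed; and
    # `label != stale` means no label is exactly "stale".
    changed = list(files)
    file_hit = any(
        re.search(pattern, path)
        for pattern in DEEPSEEK_FILE_PATTERNS
        for path in changed
    )
    title_hit = re.search(DEEPSEEK_TITLE_PATTERN, title) is not None
    # "any of": one matching file pattern or a matching title is enough.
    return (file_hit or title_hit) and "stale" not in set(labels)


# Hypothetical pull requests.
print(deepseek_rule("Fix MLA bug", ["vllm/model_executor/models/deepseek_v2.py"], []))  # True
print(deepseek_rule("Add deepseek-v3 docs", ["docs/models.md"], []))  # True: title, case-insensitive
print(deepseek_rule("Fix MLA bug", ["vllm/model_executor/models/deepseek_v2.py"], ["stale"]))  # False
print(deepseek_rule("Fix MLA bug", ["tests/deepseek_notes.txt"], []))  # False: not a .py file
```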

Rule: label-frontend (label)

files~=^vllm/entrypoints/
label != stale

Rule: label-llama (label)

any of:
- files~=^examples/.*llama.*\.py
- files~=^tests/.*llama.*\.py
- files~=^vllm/entrypoints/openai/tool_parsers/llama.*\.py
- files~=^vllm/model_executor/models/.*llama.*\.py
- files~=^vllm/transformers_utils/configs/.*llama.*\.py
- title~=(?i)llama
label != stale

Rule: label-multi-modality (label)

Rule: label-new-model (label)

all of:
- files=vllm/model_executor/models/registry.py
- files~=^vllm/model_executor/models/
label != stale
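The label-new-model rule uses "all of" instead of "any of". A sketch under the same assumptions as before: `files=X` holds when `X` is one of the changed files, `files~=R` holds when at least one changed file matches `R`, and top-level conditions are ANDed. The file `my_model.py` is a hypothetical example.

```python
import re
from typing import Iterable

# Values copied from the label-new-model rule above.
REGISTRY = "vllm/model_executor/models/registry.py"
MODELS_PATTERN = r"^vllm/model_executor/models/"


def new_model_rule(files: Iterable[str], labels: Iterable[str]) -> bool:
    # Assumption: `files=X` means X is among the changed files, and
    # `files~=R` means at least one changed file matches R (re.search).
    changed = list(files)
    touches_registry = REGISTRY in changed
    touches_models_dir = any(re.search(MODELS_PATTERN, f) for f in changed)
    # "all of" requires every condition; `label != stale` is ANDed on top.
    return touches_registry and touches_models_dir and "stale" not in set(labels)


print(new_model_rule([REGISTRY, "vllm/model_executor/models/my_model.py"], []))  # True
print(new_model_rule(["vllm/model_executor/models/my_model.py"], []))  # False: registry untouched
print(new_model_rule([REGISTRY], []))  # True: registry.py also matches MODELS_PATTERN
```

Under these assumptions, `registry.py` itself satisfies the directory pattern, so editing the registry alone is enough to trigger the label.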

Rule: label-performance (label)

Rule: label-qwen (label)

Rule: label-gpt-oss (label)

✅ Rule: label-nvidia (label)

Rule: label-rocm (label)

Rule: label-cpu (assign, label)

files~=^(?!.*kv_offload)(?!.*cpu_offload).*\bcpu.*
label != stale
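The negative lookaheads and the word boundary in the label-cpu pattern are easy to misread. The following Python check exercises the pattern verbatim. It assumes Mergify's `~=` behaves like Python's `re.search` on each changed file path, and the sample paths are hypothetical.

```python
import re

# Pattern copied verbatim from the label-cpu rule above.
CPU_PATTERN = re.compile(r"^(?!.*kv_offload)(?!.*cpu_offload).*\bcpu.*")


def matches_cpu_rule(path: str) -> bool:
    # Assumption: Mergify's `~=` behaves like Python's re.search.
    # Both lookaheads are evaluated at position 0 (right after `^`), so a path
    # containing "kv_offload" or "cpu_offload" anywhere is rejected outright.
    # `\bcpu` then needs "cpu" to start a word: it must follow a non-word
    # character such as "/" or "." (or the start of the path). "_" is a word
    # character, so "foo_cpu.py" does NOT match.
    return CPU_PATTERN.search(path) is not None


# Hypothetical example paths.
for path in [
    "vllm/platforms/cpu.py",               # True
    "csrc/cpu/attention.cpp",              # True
    "vllm/v1/kv_offload/cpu.py",           # False: kv_offload lookahead
    "vllm/model_executor/cpu_offload.py",  # False: cpu_offload lookahead
    "vllm/utils/foo_cpu.py",               # False: no word boundary before "cpu"
    "vllm/v1/worker/gpu_model_runner.py",  # False: no "cpu" at all
]:
    print(path, matches_cpu_rule(path))
```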

Rule: label-structured-output (label)

Rule: label-speculative-decoding (label)

any of:
- files=vllm/model_executor/models/mlp_speculator.py
- files~=^examples/.*(spec_decode|mlpspeculator|eagle|speculation).*\.py
- files~=^tests/v1/spec_decode/
- files~=^vllm/model_executor/models/.*eagle.*\.py
- files~=^vllm/transformers_utils/configs/(eagle|medusa|mlp_speculator)\.py
- files~=^vllm/v1/spec_decode/
label != stale

✅ Rule: label-v1 (label)

label != stale
any of:
- files~=^tests/v1/
- files~=^vllm/v1/

Rule: label-tpu (label)

✅ Rule: label-tpu-remove (label)

Rule: label-tool-calling (label)

Rule: auto-rebase if approved, ready, and 40 commits behind main (rebase)

Rule: ping author on conflicts and add 'needs-rebase' label (comment, label)

-closed
conflict
label != stale

Rule: assign reviewer for tensorizer changes (assign)

any of:
- files~=^tests/entrypoints/openai/test_tensorizer_entrypoint.py
- files~=^tests/model_executor/model_loader/tensorizer_loader/
- files~=^vllm/model_executor/model_loader/tensorizer.py
- files~=^vllm/model_executor/model_loader/tensorizer_loader.py
label != stale

Rule: assign reviewer for modelopt changes (assign)

any of:
- files~=^docs/features/quantization/modelopt\.md$
- files~=^tests/models/quantization/test_modelopt\.py$
- files~=^tests/models/quantization/test_nvfp4\.py$
- files~=^tests/quantization/test_modelopt\.py$
- files~=^vllm/model_executor/layers/quantization/__init__\.py$
- files~=^vllm/model_executor/layers/quantization/modelopt\.py$
label != stale

Rule: remove 'needs-rebase' label when conflict is resolved (label)

-closed
-conflict

Rule: label-bug (label)

any of:
- title~=(?i)\bbug\b
- title~=(?i)\bbugfix\b
label != stale
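The label-bug rule needs two title patterns because `\bbug\b` requires a word boundary after "bug", so it does not match "Bugfix" on its own. A short Python check, assuming `title~=` behaves like `re.search` on the PR title; the titles are hypothetical.

```python
import re
from typing import Iterable

# Patterns copied from the label-bug rule above.
BUG_TITLE_PATTERNS = [r"(?i)\bbug\b", r"(?i)\bbugfix\b"]


def bug_rule(title: str, labels: Iterable[str]) -> bool:
    # Assumption: `title~=R` behaves like re.search(R, title), and
    # `label != stale` means no label is exactly "stale".
    any_of = any(re.search(p, title) for p in BUG_TITLE_PATTERNS)
    return any_of and "stale" not in set(labels)


for title in [
    "[Bug] LoRA crash with CUDA graphs",  # True: first pattern
    "[Bugfix] Handle empty batches",      # True: only the second pattern
    "Fix bugs in the scheduler",          # False: no boundary after "bug" in "bugs"
    "Add debug logging",                  # False: no boundary before "bug" in "debug"
]:
    print(title, bug_rule(title, []))
```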

Rule: label-kv-connector (label)

💖 Mergify is proud to provide this service for free to open source projects.

🚀 You can help us by becoming a sponsor!

Mergify commands and options

More conditions and actions can be found in the documentation.

You can also trigger Mergify actions by commenting on this pull request:

@Mergifyio refresh will re-evaluate the rules
@Mergifyio rebase will rebase this PR on its base branch
@Mergifyio update will merge the base branch into this PR
@Mergifyio backport <destination> will backport this PR on <destination> branch

Additionally, on the Mergify dashboard you can:

look at your merge queues
generate the Mergify configuration with the config editor.

Finally, you can contact us on https://mergify.com

Fix errors