Skip to content

fix memory for online fp8 quantization with streaming weight load

5bf77bd
Select commit
Loading
Failed to load commit list.
Merged

fix memory for online fp8 quantization with streaming weight load #31914

fix memory for online fp8 quantization with streaming weight load
5bf77bd
Select commit
Loading
Failed to load commit list.
Mergify / Summary succeeded Feb 2, 2026 in 1s

1 rule matches and 27 potential rules

⚠️ The pull request has been merged by @mgoin

Rule: label-documentation (comment, label)

  • any of:
    • files~=^[^/]+\.md$
    • files~=^docs/
    • files~=^examples/
  • label != stale

Rule: comment-pre-commit-failure (comment)

  • -closed
  • status-failure=pre-commit
  • -draft

Rule: comment-dco-failure (comment)

  • -closed
  • status-failure=dco
  • -draft

Rule: label-ci-build (label)

  • any of:
    • files=CMakeLists.txt
    • files=setup.py
    • files~=\.buildkite/
    • files~=^\.github/
    • files~=^cmake/
    • files~=^docker/Dockerfile
    • files~=^requirements.*\.txt
  • label != stale

Rule: label-deepseek (label)

  • any of:
    • files~=^examples/.*deepseek.*\.py
    • files~=^tests/.*deepseek.*\.py
    • files~=^vllm/entrypoints/openai/tool_parsers/.*deepseek.*\.py
    • files~=^vllm/model_executor/models/.*deepseek.*\.py
    • files~=^vllm/reasoning/.*deepseek.*\.py
    • files~=^vllm/transformers_utils/.*deepseek.*\.py
    • title~=(?i)DeepSeek
  • label != stale

Rule: label-frontend (label)

  • files~=^vllm/entrypoints/
  • label != stale

Rule: label-llama (label)

  • any of:
    • files~=^examples/.*llama.*\.py
    • files~=^tests/.*llama.*\.py
    • files~=^vllm/entrypoints/openai/tool_parsers/llama.*\.py
    • files~=^vllm/model_executor/models/.*llama.*\.py
    • files~=^vllm/transformers_utils/configs/.*llama.*\.py
    • title~=(?i)llama
  • label != stale

Rule: label-multi-modality (label)

  • any of:
    • files=tests/models/test_vision.py
    • files~=^tests/models/multimodal/
    • files~=^tests/multimodal/
    • files~=^vllm/multimodal/
  • label != stale

Rule: label-new-model (label)

  • all of:
    • files=vllm/model_executor/models/registry.py
    • files~=^vllm/model_executor/models/
  • label != stale

Rule: label-performance (label)

  • any of:
    • files~=^\.buildkite/performance-benchmarks/
    • files~=^benchmarks/
    • files~=^tests/benchmarks/
    • files~=^vllm/benchmarks/
  • label != stale

Rule: label-qwen (label)

  • any of:
    • files~=^examples/.*qwen.*\.py
    • files~=^tests/.*qwen.*\.py
    • files~=^vllm/model_executor/models/.*qwen.*\.py
    • files~=^vllm/reasoning/.*qwen.*\.py
    • title~=(?i)Qwen
  • label != stale

Rule: label-gpt-oss (label)

  • any of:
    • files~=^examples/.*gpt[-_]?oss.*\.py
    • files~=^tests/.*gpt[-_]?oss.*\.py
    • files~=^tests/entrypoints/openai/test_response_api_with_harmony.py
    • files~=^tests/entrypoints/test_context.py
    • files~=^vllm/entrypoints/context.py
    • files~=^vllm/entrypoints/openai/parser/harmony_utils.py
    • files~=^vllm/entrypoints/tool.py
    • files~=^vllm/entrypoints/tool_server.py
    • files~=^vllm/model_executor/layers/.*gpt[-_]?oss.*\.py
    • files~=^vllm/model_executor/models/.*gpt[-_]?oss.*\.py
    • title~=(?i)gpt[-_]?oss
    • title~=(?i)harmony
  • label != stale

Rule: label-nvidia (label)

  • any of:
    • files~=cuda
    • files~=cutlass
    • files~=flashinfer
    • files~=trtllm
    • title~=(?i)CUDA
    • title~=(?i)CUTLASS
    • title~=(?i)NVIDIA
  • label != stale

Rule: label-rocm (label)

  • any of:
    • files=vllm/platforms/rocm.py
    • files~=^csrc/rocm/
    • files~=^docker/Dockerfile.rocm
    • files~=^requirements/rocm.*\.txt
    • files~=^tests/kernels/.*_rocm.*\.py
    • files~=^vllm/model_executor/layers/fused_moe/rocm.*\.py
    • files~=^vllm/v1/attention/backends/mla/rocm.*\.py
    • files~=^vllm/v1/attention/backends/rocm.*\.py
    • files~=^vllm/v1/attention/ops/rocm.*\.py
    • title~=(?i)AMD
    • title~=(?i)ROCm
  • label != stale

Rule: label-cpu (assign, label)

  • files~=^(?!.*kv_offload)(?!.*cpu_offload).*\bcpu.*
  • label != stale

Rule: label-structured-output (label)

  • any of:
    • files=benchmarks/benchmark_serving_structured_output.py
    • files=benchmarks/run_structured_output_benchmark.sh
    • files=docs/features/structured_outputs.md
    • files=examples/offline_inference/structured_outputs.py
    • files=examples/online_serving/openai_chat_completion_structured_outputs.py
    • files=examples/online_serving/openai_chat_completion_structured_outputs_with_reasoning.py
    • files=tests/v1/entrypoints/llm/test_struct_output_generate.py
    • files~=^benchmarks/structured_schemas/
    • files~=^tests/v1/structured_output/
    • files~=^vllm/v1/structured_output/
  • label != stale

Rule: label-speculative-decoding (label)

  • any of:
    • files=vllm/model_executor/models/mlp_speculator.py
    • files~=^examples/.*(spec_decode|mlpspeculator|eagle|speculation).*\.py
    • files~=^tests/v1/spec_decode/
    • files~=^vllm/model_executor/models/.*eagle.*\.py
    • files~=^vllm/transformers_utils/configs/(eagle|medusa|mlp_speculator)\.py
    • files~=^vllm/v1/spec_decode/
  • label != stale

Rule: label-v1 (label)

  • any of:
    • files~=^tests/v1/
    • files~=^vllm/v1/
  • label != stale

Rule: label-tpu (label)

  • any of:
    • files~=/tpu/
    • files~=_tpu
    • files~=pallas
    • files~=tpu.py
    • files~=tpu_
  • label != stale

✅ Rule: label-tpu-remove (label)

  • label != stale
  • all of:
    • -files~=/tpu/
    • -files~=_tpu
    • -files~=pallas
    • -files~=tpu.py
    • -files~=tpu_

Rule: label-tool-calling (label)

  • any of:
    • files=docs/features/tool_calling.md
    • files=examples/offline_inference/chat_with_tools.py
    • files=examples/online_serving/openai_chat_completion_client_with_tools.py
    • files=examples/online_serving/openai_chat_completion_client_with_tools_required.py
    • files=examples/online_serving/openai_chat_completion_tool_calls_with_reasoning.py
    • files=tests/entrypoints/openai/test_chat_with_tool_reasoning.py
    • files~=^examples/tool_chat_*
    • files~=^tests/entrypoints/openai/tool_parsers/
    • files~=^tests/tool_use/
    • files~=^vllm/entrypoints/openai/tool_parsers/
  • label != stale

Rule: auto-rebase if approved, ready, and 40 commits behind main (rebase)

  • #commits-behind >= 40
  • -closed
  • -closed [📌 rebase requirement]
  • any of:
    • #commits-behind > 0 [📌 rebase requirement]
    • -linear-history [📌 rebase requirement]
  • #approved-reviews-by >= 1
  • -conflict
  • -conflict [📌 rebase requirement]
  • -draft
  • base = main
  • label=ready
  • queue-position = -1 [📌 rebase requirement]

Rule: ping author on conflicts and add 'needs-rebase' label (comment, label)

  • -closed
  • conflict
  • label != stale

Rule: assign reviewer for tensorizer changes (assign)

  • any of:
    • files~=^tests/entrypoints/openai/test_tensorizer_entrypoint.py
    • files~=^tests/model_executor/model_loader/tensorizer_loader/
    • files~=^vllm/model_executor/model_loader/tensorizer.py
    • files~=^vllm/model_executor/model_loader/tensorizer_loader.py
  • label != stale

Rule: assign reviewer for modelopt changes (assign)

  • any of:
    • files~=^docs/features/quantization/modelopt\.md$
    • files~=^tests/models/quantization/test_modelopt\.py$
    • files~=^tests/models/quantization/test_nvfp4\.py$
    • files~=^tests/quantization/test_modelopt\.py$
    • files~=^vllm/model_executor/layers/quantization/__init__\.py$
    • files~=^vllm/model_executor/layers/quantization/modelopt\.py$
  • label != stale

Rule: remove 'needs-rebase' label when conflict is resolved (label)

  • -closed
  • -conflict

Rule: label-bug (label)

  • any of:
    • title~=(?i)\bbug\b
    • title~=(?i)\bbugfix\b
  • label != stale

Rule: label-kv-connector (label)

  • any of:
    • files~=^examples/offline_inference/disaggregated[^/]*/.*
    • files~=^examples/online_serving/disaggregated[^/]*/.*
    • files~=^examples/others/lmcache/
    • files~=^tests/v1/kv_connector/
    • files~=^vllm/distributed/kv_transfer/
    • title~=(?i)LMCache
    • title~=(?i)NIXL
    • title~=(?i)\bP/?D\b
  • label != stale

💖  Mergify is proud to provide this service for free to open source projects.

🚀  You can help us by becoming a sponsor!


Mergify commands and options

More conditions and actions can be found in the documentation.

You can also trigger Mergify actions by commenting on this pull request:

  • @Mergifyio refresh will re-evaluate the rules
  • @Mergifyio rebase will rebase this PR on its base branch
  • @Mergifyio update will merge the base branch into this PR
  • @Mergifyio backport <destination> will backport this PR on <destination> branch

Additionally, on Mergify dashboard you can:

  • look at your merge queues
  • generate the Mergify configuration with the config editor.

Finally, you can contact us on https://mergify.com