dependabot bot commented on behalf of github, Mar 12, 2025

Bumps vllm from 0.7.2 to 0.7.3.
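For a pip-managed project, a bump like this typically amounts to a one-line pin change. A sketch assuming the dependency is pinned in a requirements.txt file (the actual manifest in this repository is not shown):

```diff
# Hypothetical requirements.txt change for this bump
-vllm==0.7.2
+vllm==0.7.3
```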

Release notes

Sourced from vllm's releases.

v0.7.3

Highlights

🎉 253 commits from 93 contributors, including 29 new contributors!

  • Deepseek enhancements:
    • Support for DeepSeek Multi-Token Prediction, 1.69x speedup in low QPS scenarios (#12755)
    • AMD support: DeepSeek tunings, yielding 17% latency reduction (#13199)
    • Using FlashAttention3 for MLA (#12807)
    • Align the expert selection code path with official implementation (#13474)
    • Optimize moe_align_block_size for deepseek_v3 (#12850)
    • Expand MLA to support most types of quantization (#13181)
  • V1 Engine:
    • LoRA Support (#10957, #12883)
    • Logprobs and prompt logprobs support (#9880), min_p sampling support (#13191), logit_bias in v1 Sampler (#13079)
    • Use msgpack for core request serialization (#12918)
    • Pipeline parallelism support (#12996, #13353, #13472, #13417, #13315)
    • Metrics enhancements: GPU prefix cache hit rate % gauge (#12592), iteration_tokens_total histogram (#13288), several request timing histograms (#12644)
    • Initial speculative decoding support with ngrams (#12193, #13365)

Model Support

  • Enhancement to Qwen2.5-VL: BNB support (#12944), LoRA (#13261), Optimizations (#13155)
  • Support GPTQModel Dynamic [2,3,4,8]bit GPTQ quantization (#7086)
  • Support Unsloth Dynamic 4bit BnB quantization (#12974)
  • IBM/NASA Prithvi Geospatial model (#12830)
  • Support Mamba2 (Codestral Mamba) (#9292), Bamba Model (#10909)
  • Ultravox Model: Support v0.5 Release (#12912)
  • transformers backend
    • Enable quantization support for transformers backend (#12960)
    • Set torch_dtype in TransformersModel (#13088)
  • VLM:
    • Implement merged multimodal processor for Mllama (#11427), GLM4V (#12449), Molmo (#12966)
    • Separate text-only and vision variants of the same model architecture (#13157)

Hardware Support

  • Pluggable platform-specific scheduler (#13161)
  • NVIDIA: Support nvfp4 quantization (#12784)
  • AMD:
    • Per-Token-Activation Per-Channel-Weight FP8 (#12501)
    • Tuning for Mixtral on MI325 and Qwen MoE on MI300 (#13503), Mixtral8x7B on MI300 (#13577)
    • Add initial ROCm support to V1 (#12790)
  • TPU: V1 Support (#13049)
  • Neuron: Support Longer Sequences in NKI-based Flash PagedAttention and Improve Efficiency (#12921)
  • Gaudi:
    • Support Contiguous Cache Fetch (#12139)
    • Enable long-contexts + LoRA support (#12812)

Engine Features

  • Add sleep and wake up endpoint and v1 support (#12987)
  • Add /v1/audio/transcriptions OpenAI API endpoint (#12909)

... (truncated)

Changelog

Sourced from vllm's changelog.

Releasing vLLM

vLLM releases offer a reliable version of the code base, packaged into a binary format that can be conveniently accessed via PyPI. These releases also serve as key milestones for the development team to communicate with the community about newly available features, improvements, and upcoming changes that could affect users, including potential breaking changes.

Release Versioning

vLLM uses a “right-shifted” versioning scheme in which a new patch release ships every two weeks. Patch releases contain both features and bug fixes (unlike semver, where a patch release contains only backwards-compatible bug fixes). When critical fixes are needed, a special post1 release is published.

  • major: major architectural milestones and incompatible API changes, similar to PyTorch 2.0
  • minor: major features
  • patch: features and backwards-compatible bug fixes
  • post1 (or patch-1): backwards-compatible bug fixes, either an explicit or implicit post release
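The ordering implied by this scheme can be sketched in code. A minimal, hypothetical parser (not from the vLLM code base) that sorts versions under the right-shifted scheme, treating a post1 release as newer than the patch it follows:

```python
import re

def parse_version(v: str):
    """Parse 'X.Y.Z' or 'X.Y.Z.postN' into a sortable tuple.

    Hypothetical helper illustrating the right-shifted scheme:
    major = architectural milestone, minor = major features,
    patch = features + fixes, postN = follow-up bug fixes.
    """
    m = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)(?:\.post(\d+))?", v)
    if not m:
        raise ValueError(f"unrecognized version: {v}")
    major, minor, patch, post = m.groups()
    return (int(major), int(minor), int(patch), int(post or 0))

# A post release sorts after the patch it follows:
assert parse_version("0.7.3.post1") > parse_version("0.7.3")
# Under this scheme 0.7.3 may contain new features, unlike strict semver:
assert parse_version("0.7.3") > parse_version("0.7.2")
```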

Release Cadence

Patch releases ship on a bi-weekly basis. A post release, when needed, follows 1-3 days after the patch release and reuses the patch release's branch. The release cadence for 2025 is listed below. All future release dates are tentative, and post releases are optional.

Release Date   Patch release versions   Post release versions
Jan 2025       0.7.0                    ---
Feb 2025       0.7.1, 0.7.2, 0.7.3      ---
Mar 2025       0.7.4, 0.7.5             ---
Apr 2025       0.7.6, 0.7.7             ---
May 2025       0.7.8, 0.7.9             ---
Jun 2025       0.7.10, 0.7.11           ---
Jul 2025       0.7.12, 0.7.13           ---
Aug 2025       0.7.14, 0.7.15           ---
Sep 2025       0.7.16, 0.7.17           ---
Oct 2025       0.7.18, 0.7.19           ---
Nov 2025       0.7.20, 0.7.21           ---
Dec 2025       0.7.22, 0.7.23           ---

Release branch

Each release is built from a dedicated release branch.

  • For major, minor, and patch releases, the release branch cut is performed 1-2 days before the release goes live.
  • For post releases, the previously cut release branch is reused.
  • Release builds are triggered by pushing an RC tag such as vX.Y.Z-rc1. This enables building and testing multiple RCs for each release.
  • The final tag vX.Y.Z does not trigger a build; it is used for the release notes and assets.
  • After the branch cut, we monitor the main branch for reverts and apply them to the release branch.
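The tag convention above can be expressed as a small filter. A hypothetical CI-side check (not vLLM's actual build configuration) that distinguishes build-triggering RC tags from the final release tag:

```python
import re

# RC tags like v0.7.3-rc1 trigger release builds; the final tag vX.Y.Z does not.
RC_TAG = re.compile(r"^v\d+\.\d+\.\d+-rc\d+$")
FINAL_TAG = re.compile(r"^v\d+\.\d+\.\d+$")

def triggers_build(tag: str) -> bool:
    """Return True for tags that should kick off a release build."""
    return RC_TAG.match(tag) is not None

assert triggers_build("v0.7.3-rc1")
assert triggers_build("v0.7.3-rc2")
assert not triggers_build("v0.7.3")  # final tag: release notes and assets only
assert FINAL_TAG.match("v0.7.3") is not None
```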

Release Cherry-Pick Criteria

After the branch cut, we finalize the release branch using clear criteria for which cherry-picks are allowed in. (A cherry-pick is the process of landing a PR in the release branch after the branch cut.) Cherry-picks are typically limited to ensure the team has sufficient time to complete a thorough round of testing on a stable code base.

  • Regression fixes that address a functional or performance regression against the most recent release (e.g. 0.7.0 for the 0.7.1 release)
  • Critical fixes for severe issues such as silent incorrectness, backwards incompatibility, crashes, deadlocks, and (large) memory leaks
  • Fixes to new features introduced in the most recent release (e.g. 0.7.0 for the 0.7.1 release)

... (truncated)


You can trigger a rebase of this PR by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
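The ignore commands above have a declarative counterpart: the same effect can be configured in .github/dependabot.yml. A sketch (this file is not part of this PR, and the values shown are hypothetical) that would suppress minor-version update PRs for vllm:

```yaml
# .github/dependabot.yml -- hypothetical example; not taken from this repository
version: 2
updates:
  - package-ecosystem: "pip"
    directory: "/"
    schedule:
      interval: "weekly"
    ignore:
      - dependency-name: "vllm"
        # Skip minor-version bumps; patch updates still open PRs
        update-types: ["version-update:semver-minor"]
```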

Note
Automatic rebases have been disabled on this pull request as it has been open for over 30 days.

Bumps [vllm](https://github.com/vllm-project/vllm) from 0.7.2 to 0.7.3.
- [Release notes](https://github.com/vllm-project/vllm/releases)
- [Changelog](https://github.com/vllm-project/vllm/blob/main/RELEASE.md)
- [Commits](vllm-project/vllm@v0.7.2...v0.7.3)

---
updated-dependencies:
- dependency-name: vllm
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
dependabot bot added the dependencies (Pull requests that update a dependency file) and python (Pull requests that update Python code) labels on Mar 12, 2025

dependabot bot commented on behalf of github, May 7, 2025

Superseded by #62.

dependabot bot closed this on May 7, 2025
dependabot bot deleted the dependabot/pip/vllm-0.7.3 branch on May 7, 2025 at 05:26