[do-not-merge] Ibm 20241105 #218
Co-authored-by: Varun Sundar Rabindranath <[email protected]> Co-authored-by: Michael Goin <[email protected]>
…vllm-project#8704) Removing the block manager v1. This is the initial piece of prefix-caching-centric design. In order to achieve prefix-caching-centric design, we need to simplify the code path so that we only use v2 block manager (which has much higher performance on prefix caching).
…cript (vllm-project#9013) Co-authored-by: Isotr0py <[email protected]>
…ect#9056) Signed-off-by: Max de Bayser <[email protected]> Signed-off-by: Max de Bayser <[email protected]> Co-authored-by: Andrew Feldman <[email protected]> Co-authored-by: afeldman-nm <[email protected]> Co-authored-by: Woosuk Kwon <[email protected]> Co-authored-by: laishzh <[email protected]> Co-authored-by: Max de Bayser <[email protected]> Co-authored-by: Max de Bayser <[email protected]> Co-authored-by: Cyrus Leung <[email protected]>
…sage (vllm-project#9352) Signed-off-by: Joe Runde <[email protected]>
Signed-off-by: Russell Bryant <[email protected]>
…ation and embedding (vllm-project#9424)
Co-authored-by: Kunjan Patel <kunjanp_google_com@vllm.us-central1-a.c.kunjanp-gke-dev-2.internal>
…r cores (vllm-project#9497) Signed-off-by: Thomas Parnell <[email protected]> Co-authored-by: Chih-Chieh Yang <[email protected]> Co-authored-by: Cody Yu <[email protected]>
Signed-off-by: Linkun Chen <[email protected]> Co-authored-by: Linkun Chen <[email protected]> Co-authored-by: Linkun Chen <[email protected]>
Signed-off-by: Tomer Asida <[email protected]>
Signed-off-by: Bill Nell <[email protected]>
…0007) Signed-off-by: youkaichao <[email protected]>
…ct#9895) Signed-off-by: mgoin <[email protected]>
…m-project#9994) Signed-off-by: Tyler Michael Smith <[email protected]>
…osable (vllm-project#9604) Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: Tyler Michael Smith <[email protected]>
…llm-project#10017) Signed-off-by: chaunceyjiang <[email protected]>
…20241105 Signed-off-by: Jefferson Fialho <[email protected]>
Signed-off-by: Jefferson Fialho <[email protected]>
Signed-off-by: Jefferson Fialho <[email protected]>
Signed-off-by: Jefferson Fialho <[email protected]>
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: fialhocoelho. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Signed-off-by: Jefferson Fialho <[email protected]>
Signed-off-by: Jefferson Fialho <[email protected]>
Pin tiktoken >=0.7.0, <0.8.0
Signed-off-by: Jefferson Fialho <[email protected]>
Pin tiktoken==0.7.0
Signed-off-by: Jefferson Fialho <[email protected]>
Pin pillow==10.4.0
Signed-off-by: Jefferson Fialho <[email protected]>
pin pytorch in cmake list
Signed-off-by: Jefferson Fialho <[email protected]>
Signed-off-by: Jefferson Fialho <[email protected]>
Signed-off-by: Jefferson Fialho <[email protected]>
48e916c to 9b51d3d
@fialhocoelho: The following tests failed, say
…218)
* Automatically set rpd env var with profile flag
* Add readme
* Fix lint errors

---------

Co-authored-by: AdrianAbeyta <[email protected]>
Syncing midstream NM fork to Upstream tag of [v0.8.5.post1](https://github.com/vllm-project/vllm/tree/v0.8.5.post1)
+ cherry pick of vllm-project@be633fb needed for benchmarks
+ [CP](neuralmagic/nm-vllm-ent@1fe447d) for compressed tensor bump
+ [CP](vllm-project#17677) for lora on AMD
+ [CP](vllm-project#17315) for llama4 w/ pure dense layers

```
commit 31c73ba (HEAD -> upstream-v0.8.5, nm-fork/upstream-v0.8.5)
Author: Chauncey <[email protected]>
Date:   Wed Apr 30 15:11:04 2025 +0800

    [Bugfix] Fix AttributeError: 'State' object has no attribute 'engine_client' (vllm-project#17434)

    Signed-off-by: chaunceyjiang <[email protected]>

commit f8db0bd
Author: Lucas Wilkinson <[email protected]>
Date:   Fri May 2 14:01:38 2025 -0400

    [BugFix][Attention] Fix sliding window attention in V1 giving incorrect results (vllm-project#17574)

    Signed-off-by: Lucas Wilkinson <[email protected]>

commit e335c34
Author: Robert Shaw <[email protected]>
Date:   Fri May 2 04:07:03 2025 -0400

    [BugFix] Fix Memory Leak (vllm-project#17567)

    Signed-off-by: [email protected] <[email protected]>

commit cc463fe
Merge: 1e358ff ba41cc9
Author: Selbi Nuryyeva <[email protected]>
Date:   Tue Apr 29 12:34:57 2025 -0400

    Merge branch 'tag-upstream-v0.8.5' into upstream-v0.8.5

commit ba41cc9 (tag: v0.8.5, tag-upstream-v0.8.5)
Author: Michael Goin <[email protected]>
Date:   Mon Apr 28 16:20:24 2025 -0600

    [Model] Add tuned triton fused_moe configs for Qwen3Moe (vllm-project#17328)

    Signed-off-by: mgoin <[email protected]>

commit dcbac4c
Author: Simon Mo <[email protected]>
Date:   Mon Apr 28 14:12:01 2025 -0700

    [Model] Qwen3 Dense FP8 Compat Fixes (vllm-project#17318)

    Signed-off-by: simon-mo <[email protected]>

[...]
```

Commands

```
git fetch upstream
git checkout -b upstream-v0.8.5
git merge upstream/v0.8.5
git cherry-pick be633fb
```

TEST PLAN
accept sync: https://github.com/neuralmagic/nm-cicd/actions/runs/14841223552
related PR in cicd: neuralmagic/nm-cicd#99
release workflow: https://github.com/neuralmagic/nm-cicd/actions/runs/14845693864
Image build.