Conversation
Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
We were previously reusing the GPU SamplingMetadata class, but there have been incompatible changes upstream (PR vllm-project/vllm#16728). Since it's not clear for now whether we want to, should, or can reuse the LogitsProcessor implementation as-is, I'm making a copy of the old version of the class for the Spyre backend. This won't affect any features for now, since the vLLM change was an internal refactoring with no UX impact.

Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
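The shape of this workaround can be sketched as pinning a local, backend-owned copy of the dataclass so upstream refactors can't break it. The class and field names below are illustrative assumptions, not the actual vLLM `SamplingMetadata` definition:

```python
# Hypothetical sketch: a frozen, backend-local copy of sampling metadata.
# Freezing it makes accidental divergence from the pinned snapshot loud.
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpyreSamplingMetadata:
    """Local copy pinned for the Spyre backend (illustrative fields only)."""
    temperature: Optional[list[float]] = None
    top_p: Optional[list[float]] = None
    top_k: Optional[list[int]] = None
    all_greedy: bool = True


meta = SpyreSamplingMetadata(temperature=[0.7], all_greedy=False)
```

Because the copy lives in the backend's own tree, upstream renames or field removals surface as ordinary merge decisions rather than runtime breakage.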
👋 Hi! Thank you for contributing to vLLM support on Spyre. Now you are good to go 🚀
Quick question before reviewing this one thoroughly:
I think the general consensus is that we want to pin to
@yannicks1 Yeah, we talked very briefly about it last Thursday. Our assertion here is that our current delivery mechanism is images that bundle vllm-spyre with vllm, so we're the ones who have to worry about building images with compatible versions. The small risk is that there could be a regression in vllm 0.9.2 with some model or models that we then can't easily roll back, but we haven't matured this stack to GA support anywhere yet, so there's currently no chance of that causing a product regression.
# Description

Fixes a bug introduced in #283 where the decode pass was moved out of the warmup context. AKA: @prashantgupta24's editor likes to unindent things, and @joerunde likes to view diffs with whitespace changes ignored, so here we are 😅

---------

Signed-off-by: Joe Runde <joe@joerun.de>
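The bug class here can be sketched with a toy context manager: both the prefill and the decode warmup passes must run inside the same `with` block, so an editor that unindents the decode call silently moves it outside the context. `warmup_mode` and the log strings below are hypothetical stand-ins, not the actual vllm-spyre warmup API:

```python
# Toy illustration (assumed names): anything unindented out of the `with`
# block no longer runs between "enter" and "exit".
from contextlib import contextmanager


@contextmanager
def warmup_mode(log: list[str]):
    log.append("enter")
    try:
        yield
    finally:
        log.append("exit")


log: list[str] = []
with warmup_mode(log):
    log.append("prefill")
    log.append("decode")  # must stay indented inside the context
```

Viewing the diff with whitespace changes ignored hides exactly this kind of regression, which is why it slipped through review.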
# Description

Fixes a bug introduced in #283 where the non-driver workers did not cache the output tokens for the next decode iteration. This also allows TP tests with TP=2 to run on CPU, so that we can catch these bugs on GHA runs.

---------

Signed-off-by: Joe Runde <joe@joerun.de>
Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
Co-authored-by: Prashant Gupta <prashantgupta@us.ibm.com>
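A minimal sketch of this bug class (not the actual vLLM worker code, and without real `torch.distributed` plumbing): in a tensor-parallel run, every rank, not just the driver, must append the sampled token id to its local cache, or the next decode iteration runs with divergent inputs across ranks:

```python
# Hypothetical simplified worker: the driver samples and broadcasts the
# token id; each rank must cache it locally for the next decode step.
class Worker:
    def __init__(self, is_driver: bool):
        self.is_driver = is_driver
        self.cached_tokens: list[int] = []

    def step(self, broadcast_token: int) -> None:
        # The buggy version appended only when self.is_driver was True,
        # leaving non-driver ranks with stale caches.
        self.cached_tokens.append(broadcast_token)


workers = [Worker(is_driver=(rank == 0)) for rank in range(2)]
for w in workers:
    w.step(42)
```

Running the TP=2 tests on CPU makes this class of divergence reproducible in CI without needing accelerator hardware.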
# Description

This branch has a fix for:

- Caching sampled token ids in `execute_model` instead of `update_states`. This is because of [Optimization] Cache sampled token ids in model runner (vllm#20291).
- Using shared `CachedRequestData` as on vllm:main (✨ Use shared CachedRequestData as vllm:main #273).

## Related Issues

Fix for #271
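The first fix can be sketched as moving the cache write to the point of sampling. All names below (`ModelRunner`, `req_output_tokens`, `_sample`) are hypothetical illustrations, not the actual vllm-spyre model runner:

```python
# Sketch of the change described above: cache the sampled token id right
# after sampling inside execute_model, instead of waiting for the
# scheduler to echo it back into update_states.
class ModelRunner:
    def __init__(self):
        self.req_output_tokens: dict[str, list[int]] = {}

    def execute_model(self, req_id: str) -> int:
        token = self._sample(req_id)
        # Cache immediately: per vllm#20291 the scheduler no longer
        # resends sampled ids, so update_states can't do this anymore.
        self.req_output_tokens.setdefault(req_id, []).append(token)
        return token

    def _sample(self, req_id: str) -> int:
        return 7  # stand-in for the real sampler


runner = ModelRunner()
runner.execute_model("req-0")
```

The design point is that after vllm#20291 the model runner owns the sampled ids, so caching them anywhere downstream of sampling would depend on data the scheduler no longer provides.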