
vllm main updates#283

Merged
joerunde merged 34 commits into main from vllm-main-updates
Jul 8, 2025
Conversation

@prashantgupta24 (Collaborator) commented Jul 4, 2025

Description

This branch has a fix for the issue linked below.

Related Issues

Fix for #271

prashantgupta24 and others added 21 commits July 1, 2025 10:34
Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
We were previously reusing the GPU SamplingMetadata
class but there have been incompatible changes upstream
(PR vllm-project/vllm#16728)

Since it's not clear for now whether we want, should
or can reuse the LogitsProcessor implementation as is,
I'm making a copy of the old version of the class for
the spyre backend.

This won't affect any features for now since the vllm
change was an internal refactoring without UX impact.

Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
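The commit message above describes vendoring a frozen copy of the old GPU SamplingMetadata class so upstream refactors (like vllm-project/vllm#16728) can't break the spyre backend. A minimal sketch of that pattern, with illustrative field names that are not the actual vLLM definition:

```python
# Hypothetical sketch of the vendoring pattern described above: the backend
# keeps its own frozen copy of a class instead of importing the upstream one,
# so incompatible upstream changes can't break it. Field names are
# illustrative only, not the real vLLM SamplingMetadata interface.
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpyreSamplingMetadata:
    """Local copy pinned to the old interface the spyre backend expects."""
    temperature: Optional[list[float]] = None
    top_p: Optional[list[float]] = None
    top_k: Optional[list[int]] = None


# Backend code constructs the local copy and never touches the upstream class.
meta = SpyreSamplingMetadata(temperature=[0.7], top_p=[0.9])
```

The tradeoff, as the commit notes, is that the copy must be maintained by hand until it's clear whether the upstream LogitsProcessor implementation can be reused as-is.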
@github-actions bot commented Jul 4, 2025

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: make sure that your code passes all the linting checks, otherwise your PR can't be merged. To do so, first install the linting requirements, then run format.sh and commit the changes. This can be done with uv directly:

uv sync --frozen --group lint --active --inexact

Or this can be done with pip:

uv pip compile --group lint > requirements-lint.txt
pip install -r requirements-lint.txt
bash format.sh

Now you are good to go 🚀

@prashantgupta24 prashantgupta24 changed the title [WIP don't look] vllm main updates vllm main updates Jul 5, 2025
@prashantgupta24 prashantgupta24 marked this pull request as ready for review July 5, 2025 19:24
@yannicks1 (Collaborator) commented:
Quick question before reviewing this one thoroughly: do we want to ensure backwards compatibility? (I see a couple of files copied from upstream, but also a failing test with a missing make_empty attribute on CachedRequestData.) If yes, down to which version do we ensure compatibility? If not, we should enforce a minimum version that does not fail the test.

@prashantgupta24 (Collaborator, Author) replied:
Quick question before reviewing this one thoroughly: do we want to ensure backwards compatibility? (I see a couple of files copied from upstream, but also a failing test with a missing make_empty attribute on CachedRequestData.) If yes, down to which version do we ensure compatibility? If not, we should enforce a minimum version that does not fail the test.

I think the general consensus is that we want to pin to vllm>=0.9.2; we were just waiting for vllm to release that version.
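The pin discussed here would look something like the following in vllm-spyre's pyproject.toml. This is a sketch only; the project's actual dependency block and other entries may differ.

```toml
# Hypothetical fragment: pin the minimum vLLM version so older, incompatible
# releases (missing e.g. CachedRequestData.make_empty) are never installed.
[project]
name = "vllm-spyre"
dependencies = [
    "vllm>=0.9.2",
]
```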

@joerunde (Collaborator) commented Jul 8, 2025

@yannicks1 Yeah, we talked very briefly about it last Thursday. Our assertion here is that our current delivery mechanism is images that bundle vllm-spyre with vllm, so we're the ones who have to worry about building images with compatible versions.

The small risk here is that there could be a regression in vllm 0.9.2 with some model or models that we then can't easily roll back. But we haven't matured this stack to GA support anywhere yet, so there's currently no chance of that causing a product regression.

]

[[package]]
name = "mlx-lm"
🍎🍎🍎

@joerunde joerunde merged commit 7e7ee37 into main Jul 8, 2025
18 checks passed
@joerunde joerunde deleted the vllm-main-updates branch July 8, 2025 18:04
yannicks1 pushed a commit that referenced this pull request Jul 9, 2025
# Description

Fixes a bug introduced in #283 where the decode pass was moved out of
the warmup context.

AKA: @prashantgupta24's editor likes to unindent things, and @joerunde
likes to view diffs with whitespace changes ignored, so here we are 😅

---------

Signed-off-by: Joe Runde <joe@joerun.de>
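The unindent bug described in this follow-up commit can be sketched with hypothetical names: in Python, moving a call one indent level out of a `with` block changes when it runs relative to the context manager, with no other visible diff if whitespace changes are ignored.

```python
# Minimal sketch (hypothetical names, not the actual vllm-spyre code) of the
# indentation bug fixed above: a decode step that gets unindented out of the
# `with` block no longer runs under the warmup context.
from contextlib import contextmanager

calls = []


@contextmanager
def warmup_context():
    calls.append("enter")
    yield
    calls.append("exit")


def prefill():
    calls.append("prefill")


def decode():
    calls.append("decode")


with warmup_context():
    prefill()
    decode()  # correct: still inside the context; unindenting this one line
              # would silently run decode() after the context has exited

print(calls)  # → ['enter', 'prefill', 'decode', 'exit']
```

This is exactly the kind of change that disappears when reviewing a diff with whitespace ignored, which is how it slipped through.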
prashantgupta24 added a commit that referenced this pull request Jul 11, 2025
# Description

Fixes a bug introduced in #283 where the non-driver workers did not
cache the output tokens for the next decode iteration.

This also allows TP tests with TP=2 to run on cpu, so that we can catch
these bugs on GHA runs.

---------

Signed-off-by: Joe Runde <joe@joerun.de>
Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
Co-authored-by: Prashant Gupta <prashantgupta@us.ibm.com>
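The worker-caching bug described in this commit can be sketched as follows (hypothetical names, not the actual vllm-spyre worker code): under tensor parallelism, every rank must cache the sampled token for the next decode iteration, not just the driver, or non-driver workers run the next step on stale input.

```python
# Hypothetical sketch of the fix described above: cache the broadcast token
# on every worker rank, driver or not, so all ranks agree on the next
# decode iteration's input.
class Worker:
    def __init__(self, is_driver: bool):
        self.is_driver = is_driver
        self.cached_tokens: list[int] = []

    def step(self, broadcast_token: int) -> None:
        # The fix: append on every rank. The buggy version only cached
        # when self.is_driver was True.
        self.cached_tokens.append(broadcast_token)


# Simulate TP=2: one driver and one non-driver worker over two decode steps.
workers = [Worker(is_driver=True), Worker(is_driver=False)]
for token in (11, 12):
    for worker in workers:
        worker.step(token)

# All ranks now hold identical caches.
assert workers[0].cached_tokens == workers[1].cached_tokens == [11, 12]
```

Running a TP=2 configuration on CPU, as the commit enables, lets this divergence be caught by assertions like the one above in ordinary GHA runs.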

Labels: none yet
Projects: none yet
4 participants