
✨ Use shared CachedRequestData as vllm:main #273

Closed
prashantgupta24 wants to merge 11 commits into main from fix-upstream

Conversation

@prashantgupta24
Collaborator

@prashantgupta24 prashantgupta24 commented Jul 1, 2025

Description

This is more complicated than I thought originally :)

Alright, all tests are passing locally. There seems to be another breaking change in vllm:main that will have to be addressed to make the main tests pass. The default tests fail because this code is not backward compatible yet 😅
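To make the compatibility problem concrete, here is a minimal sketch of handling both the old per-request shape and the new shared, batched `CachedRequestData` from vllm:main (where `req_ids` is now a list). The class and field names below are illustrative stand-ins, not the exact upstream definitions.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the new shared CachedRequestData in vllm:main;
# field names are illustrative, not the exact upstream definition.
@dataclass
class CachedRequestData:
    req_ids: list        # one id per request (was a single id per object before)
    new_token_ids: list  # one list of new tokens per request

def iter_cached_requests(cached_reqs):
    """Yield (req_id, new_tokens) pairs from either layout:
    the new shared/batched shape (req_ids is a list) or the old
    per-request shape (a scalar req_id attribute)."""
    req_ids = getattr(cached_reqs, "req_ids", None)
    if req_ids is not None:
        # New layout: one shared object covering the whole batch.
        yield from zip(req_ids, cached_reqs.new_token_ids)
    else:
        # Old layout: one object per request.
        yield cached_reqs.req_id, cached_reqs.new_token_ids

batch = CachedRequestData(req_ids=["r1", "r2"], new_token_ids=[[5], [7, 8]])
pairs = list(iter_cached_requests(batch))
```

A shim like this is one way a plugin can stay backward compatible while upstream settles; the actual fix in this PR may differ.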

Related Issues

fix #271

Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
@github-actions

github-actions bot commented Jul 1, 2025

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: Make sure that your code passes all the linting checks, otherwise your PR won't be able to be merged. To do so, first install the linting requirements, then run format.sh and commit the changes. This can be done with uv directly:

uv sync --frozen --group lint --active --inexact

Or this can be done with pip:

uv pip compile --group lint > requirements-lint.txt
pip install -r requirements-lint.txt
bash format.sh

Now you are good to go 🚀

Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
@prashantgupta24 prashantgupta24 changed the title 🐛 req_ids is now a list in vllm:main 🐛 Use shared CachedRequestData as vllm:main Jul 1, 2025
Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
@prashantgupta24 prashantgupta24 changed the title 🐛 Use shared CachedRequestData as vllm:main ✨ Use shared CachedRequestData as vllm:main Jul 2, 2025
Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
Comment thread vllm_spyre/v1/worker/spyre_worker.py
Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
@maxdebayser
Collaborator

@prashantgupta24 , I think the other breaking change that you mentioned is the sampling metadata one, right? I've opened a hacky PR to temporarily fix this: #278

@prashantgupta24
Collaborator Author

@prashantgupta24 , I think the other breaking change that you mentioned is the sampling metadata one, right? I've opened a hacky PR to temporarily fix this: #278

Yep, thanks!

Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
@prashantgupta24 prashantgupta24 mentioned this pull request Jul 4, 2025
@prashantgupta24
Copy link
Copy Markdown
Collaborator Author

closing in favor of #283

joerunde pushed a commit that referenced this pull request Jul 8, 2025
# Description

This branch has a fix for:
- Caching the token_ids: new tokens are now cached in `execute_model` instead of `update_states`, because of vllm-project/vllm#20291.
- Changes from the shared `CachedRequestData` (#273)

## Related Issues

Fix for #271

---------

Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Max de Bayser <mbayser@br.ibm.com>
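The first fix in the commit message can be sketched roughly as follows. This is an illustrative mock, not the actual spyre_model_runner code; the class and method bodies are assumptions showing only where the caching moves.

```python
# Illustrative sketch of moving token caching out of update_states and
# into execute_model, as the commit message describes. Names and
# signatures are hypothetical, not the real vllm-spyre model runner.
class ModelRunnerSketch:
    def __init__(self):
        self.cached_token_ids = {}  # req_id -> all token ids seen so far

    def update_states(self, scheduler_output):
        # After vllm-project/vllm#20291, the new tokens are no longer
        # available at this point, so no token caching happens here.
        pass

    def execute_model(self, req_ids, sampled_token_ids):
        # Cache newly sampled tokens where they are actually produced.
        for req_id, token_id in zip(req_ids, sampled_token_ids):
            self.cached_token_ids.setdefault(req_id, []).append(token_id)
        return sampled_token_ids

runner = ModelRunnerSketch()
runner.execute_model(["r1", "r2"], [11, 22])
runner.execute_model(["r1"], [12])
```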
@prashantgupta24 prashantgupta24 deleted the fix-upstream branch March 5, 2026 17:01
rafvasq pushed a commit to rafvasq/sendnn-inference that referenced this pull request Mar 11, 2026


Development

Successfully merging this pull request may close these issues.

Fix plugin code against vLLM main

2 participants