Conversation

ehsk (Collaborator) commented Dec 23, 2025

This PR upgrades vLLM from 0.8.5.post1 to 0.11.2. Other notable upgrades resulting from this change: torch to 2.9.0, transformers to 4.57.x, and flash-attention to 2.8.3.

The vLLM upgrade is needed for Apriel multi-modal training (#111), for using new tool parsers, and for supporting newer models.

For weight updates in vLLM v1, I followed https://github.com/vllm-project/vllm/blob/v0.11.2/examples/offline_inference/rlhf_utils.py.
I also found similar code in TRL.
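
For context, a condensed sketch of that pattern (following vLLM's `examples/offline_inference/rlhf_utils.py`; exact signatures may vary across vLLM versions):

```python
import torch


def stateless_init_process_group(master_address, master_port, rank,
                                 world_size, device):
    # Build a NCCL group spanning the trainer (rank 0) and all vLLM workers,
    # without touching torch.distributed's global state.
    from vllm.distributed.device_communicators.pynccl import PyNcclCommunicator
    from vllm.distributed.utils import StatelessProcessGroup
    pg = StatelessProcessGroup.create(host=master_address, port=master_port,
                                      rank=rank, world_size=world_size)
    return PyNcclCommunicator(pg, device=device)


class WorkerExtension:
    """Mixed into the vLLM worker via the worker_extension_cls engine arg."""

    def init_weight_update_group(self, master_address, master_port,
                                 rank_offset, world_size):
        from vllm.distributed.parallel_state import get_world_group
        rank = get_world_group().rank + rank_offset
        self.model_update_group = stateless_init_process_group(
            master_address, master_port, rank, world_size, self.device)

    def update_weight(self, name, dtype, shape):
        # Receive one parameter broadcast from the trainer (rank 0) and load
        # it into the live model.
        weight = torch.empty(shape, dtype=dtype, device="cuda")
        self.model_update_group.broadcast(weight, src=0,
                                          stream=torch.cuda.current_stream())
        self.model_runner.model.load_weights(weights=[(name, weight)])
        del weight
```

On the trainer side, the example iterates over named parameters, calls `llm.collective_rpc("update_weight", args=(name, p.dtype, p.shape))`, and broadcasts each tensor from rank 0 over the same group.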

ehsk requested a review from rafapi on Dec 23, 2025 at 14:57
ehsk commented Jan 18, 2026

The results don't match the old vLLM:

GSPO (blue = v0)

[Figures: Reward, Entropy, AIME'24, and MATH-500 training curves]

GRPO (orange = v0)

[Figures: Reward, Entropy, AIME'24, and MATH-500 training curves]

A main difference between vLLM v0 and v1: in v0, new requests are blocked until a weight-update request is fulfilled, whereas in v1, new requests proceed during a weight update and can be served with a mix of old and new weights.
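
If v0-style blocking is needed on top of v1, one option is to gate requests in the serving layer so a weight update only runs once in-flight generations have drained. A minimal asyncio sketch (all names here are hypothetical, not vLLM APIs):

```python
import asyncio


class WeightUpdateGate:
    """Blocks new generation requests while a weight update is in flight,
    approximating vLLM v0's blocking behavior on top of v1."""

    def __init__(self):
        self._lock = asyncio.Lock()     # held for the duration of an update
        self._inflight = 0
        self._idle = asyncio.Event()
        self._idle.set()

    async def generate(self, engine_generate, *args, **kwargs):
        async with self._lock:          # wait here if an update is running
            self._inflight += 1
            self._idle.clear()
        try:
            return await engine_generate(*args, **kwargs)
        finally:
            self._inflight -= 1
            if self._inflight == 0:
                self._idle.set()

    async def update_weights(self, do_update):
        async with self._lock:          # stop new requests from starting
            await self._idle.wait()     # drain in-flight requests first
            await do_update()           # no request sees mixed weights
```

In a synchronous RL loop the same effect falls out naturally from strictly alternating generation and weight updates.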

The logprobs are the same at the beginning but start to diverge (blue and green are v0):

[Figure: logprob curves over training]
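
One plausible way to track this is the mean log-probability of the sampled tokens per step; a hypothetical helper (the name and shapes are my assumption, not from the repo):

```python
import torch


def mean_sampled_logprob(logprobs: torch.Tensor, token_ids: torch.Tensor) -> float:
    """Average log-probability of the tokens that were actually sampled.

    logprobs:  [num_tokens, vocab_size] log-softmax over the vocabulary
    token_ids: [num_tokens] ids of the sampled tokens
    """
    picked = logprobs.gather(-1, token_ids.unsqueeze(-1)).squeeze(-1)
    return picked.mean().item()
```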

Leaving this PR open for now. Instead, we'll upgrade vLLM to a recent version that still uses v0; see #122.

ehsk mentioned this pull request on Jan 19, 2026
ehsk closed this pull request by merging all changes into main in 64073e3 on Jan 21, 2026