
Conversation

@zhandaz
Contributor

@zhandaz zhandaz commented Jul 28, 2025

What does this PR do ?

TL;DR: Support top-k and top-p for the dtensor worker with vLLM v0. This PR only supports tp==1; the PR for tp>1 is #774.

  1. By default, Nemo-RL uses vLLM v1. According to https://docs.vllm.ai/en/v0.8.1/getting_started/v1_user_guide.html#semantic-changes-and-deprecated-features, vLLM v1 returns the model’s raw output (i.e. before applying any logits post-processing such as temperature scaling or penalty adjustments), so we don't need to support temperature, top-k, or top-p there.
  2. Supporting top-k and top-p in vLLM v0 means the training logic differs between v0 and v1: in v0 the loss is calculated from the processed logits, while in v1 it is calculated from the raw logits.
  3. This PR assumes we want to make that possible. Supporting top-k and top-p with tp_size > 1, however, requires changes to the distributed functions, because top-k/top-p filtering needs to see the full vocabulary to determine which tokens to keep. From what I can tell, we would need to change DistributedLogprob or _compute_distributed_log_softmax to support p and k. The tp>1 case lives in a separate PR so that it can easily be reverted if the changes to the distributed functions are considered inelegant.

The implementation is based on vLLM's: https://github.com/vllm-project/vllm/blob/34a20c49b3f81f64133428b3a0d62309db1256f9/vllm/v1/sample/ops/topk_topp_sampler.py.
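
For reference, a minimal sketch of the filtering logic, mirroring the vLLM helper linked above; the actual `apply_top_k_top_p` added in `nemo_rl/models/policy/utils.py` may differ in details, and the per-row tensor signature is an assumption:

```python
import torch


def apply_top_k_top_p(
    logits: torch.Tensor,
    k: torch.Tensor | None,
    p: torch.Tensor | None,
) -> torch.Tensor:
    """Mask logits outside the top-k / top-p (nucleus) sets with -inf.

    logits: [batch, vocab]; k and p: per-row tensors of shape [batch].
    Sorting-based approach, following vLLM's topk_topp_sampler.
    """
    if k is None and p is None:
        return logits
    logits_sort, logits_idx = logits.sort(dim=-1, descending=False)

    if k is not None:
        # Keep only the k largest logits per row.
        top_k_idx = logits_sort.size(1) - k.to(torch.long)          # [batch]
        top_k_threshold = logits_sort.gather(1, top_k_idx.unsqueeze(1))
        logits_sort.masked_fill_(logits_sort < top_k_threshold, -float("inf"))

    if p is not None:
        # Keep the smallest set of top tokens whose cumulative prob >= p.
        probs_sort = logits_sort.softmax(dim=-1)
        probs_sum = probs_sort.cumsum(dim=-1)
        top_p_mask = probs_sum <= 1 - p.unsqueeze(1)
        # Always keep at least the highest-probability token per row.
        top_p_mask[:, -1] = False
        logits_sort.masked_fill_(top_p_mask, -float("inf"))

    # Scatter the filtered logits back to the original token order.
    return torch.empty_like(logits_sort).scatter_(1, logits_idx, logits_sort)
```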

Issues

Related Issue: #69

@zhandaz zhandaz requested a review from Copilot July 28, 2025 21:09
@zhandaz zhandaz self-assigned this Jul 28, 2025
@zhandaz zhandaz added the enhancement New feature or request label Jul 28, 2025
Contributor

Copilot AI left a comment


Pull Request Overview

This PR adds support for top-k and top-p sampling in the dtensor worker when using the vLLM v0 engine with a tensor parallel size of 1. The implementation includes the sampling logic and comprehensive test coverage.

Key changes:

  • Implements apply_top_k_top_p and apply_top_k_only functions for filtering sampling logits
  • Updates the dtensor policy worker to apply sampling parameters after temperature scaling for vLLM v0 (see the sketch after this list)
  • Adds comprehensive unit tests for the sampling utilities
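
For illustration, a hedged sketch of that ordering (temperature first, then top-k/top-p masking, then log-softmax). The helper name `processed_token_logprobs` is made up for this sketch, the `apply_top_k_top_p` signature is assumed to mirror vLLM's helper, and the actual code in dtensor_policy_worker.py may differ:

```python
import torch
import torch.nn.functional as F

# Assumes the PR's helper mirrors vLLM's: apply_top_k_top_p(logits, k, p)
# with per-row tensors for k and p.
from nemo_rl.models.policy.utils import apply_top_k_top_p


def processed_token_logprobs(
    logits: torch.Tensor,       # [batch, seq, vocab] raw model output
    token_ids: torch.Tensor,    # [batch, seq] generated token ids
    temperature: float,
    top_k: int | None,
    top_p: float | None,
) -> torch.Tensor:
    """Illustrative only: per-token logprobs from processed logits, matching
    what the vLLM v0 sampler saw at generation time (temperature > 0 assumed)."""
    b, s, v = logits.shape
    flat = logits.reshape(-1, v) / temperature                    # temperature first
    k = torch.full((flat.size(0),), top_k, device=flat.device) if top_k else None
    p = torch.full((flat.size(0),), top_p, device=flat.device) if top_p else None
    flat = apply_top_k_top_p(flat, k, p)                          # then top-k / top-p
    logprobs = F.log_softmax(flat, dim=-1).reshape(b, s, v)
    return logprobs.gather(-1, token_ids.unsqueeze(-1)).squeeze(-1)
```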

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

  • nemo_rl/models/policy/utils.py: Implements top-k and top-p sampling functions based on vLLM's approach
  • nemo_rl/models/policy/dtensor_policy_worker.py: Integrates sampling into logits post-processing with the TP=1 restriction
  • tests/unit/models/policy/test_utils.py: Adds a comprehensive test suite for the sampling utilities

@SahilJain314
Contributor

thanks for adding this! With the upcoming vllm 0.10.0, we should be able to get v1 to output post-processed logprobs as well.

@zhandaz
Contributor Author

zhandaz commented Jul 28, 2025

@SahilJain314 Thanks for pointing this out. In that case, I will wait for the PR upgrading to vllm==0.10.0 to be merged, and then add an option to choose the logprob processing method.

@wangshangsam wangshangsam linked an issue Jul 29, 2025 that may be closed by this pull request
@zhandaz zhandaz marked this pull request as draft July 30, 2025 15:08
@zhandaz zhandaz changed the title [1/2] Top-k and Top-p support for dtensor worker with vLLM V0 when TP==1 feat: [1/2] Top-k and Top-p support for dtensor worker with vLLM V0 when TP==1 Aug 5, 2025
@zhandaz zhandaz marked this pull request as ready for review August 5, 2025 19:30
@zhandaz
Contributor Author

zhandaz commented Aug 5, 2025

@SahilJain314 @YUki-666

Since vllm has been upgraded to 0.10.0 in #766, I am looking into continuing this PR to support logprobs_mode.

However, I found that the meaning of logprobs_mode seems different from what we had understood: regardless of the value of logprobs_mode, the returned logprobs are calculated before temperature, top_k, and top_p are applied.

Specifically, in https://github.com/vllm-project/vllm/blob/v0.10.0/vllm/v1/sample/sampler.py#L28-L91, if processed_logprobs is set, the returned logprobs are computed after allowed_token_ids_processing, bad_words_exclusion, logits_processors, and penalties. However, temperature, top_k, and top_p are applied in the sample method, so they do not affect the returned logprobs.

From my perspective, it is not possible to re-apply sampling parameters like temperature, top_k, or top_p to the returned logprobs. These sampling methods require the full probability distribution over the entire vocabulary. Since the returned log probability covers only the single token chosen during the generation sampling step, the information about the rest of the distribution has been irreversibly lost.
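
As a standalone illustration of this point (toy PyTorch, not tied to the vLLM code): the processed logprob of the sampled token depends on the whole row of logits through the softmax normalizer, so it cannot be recovered from the raw per-token logprob alone.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(1, 8)   # toy vocabulary of 8 tokens
token = 3                    # pretend this token was sampled
tau = 0.7                    # temperature

raw_lp = F.log_softmax(logits, dim=-1)[0, token]
processed_lp = F.log_softmax(logits / tau, dim=-1)[0, token]

# Naively rescaling the returned raw logprob does NOT recover the processed one:
naive = raw_lp / tau
print(raw_lp.item(), processed_lp.item(), naive.item())
# processed_lp differs from naive because the normalizer logsumexp(logits / tau)
# runs over the *full* vocabulary, and that information is absent from raw_lp.
```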

Please correct me if I'm wrong. What are your thoughts on how we should proceed? The most effective solution would be to request a feature that returns the after-sampling logprobs.

cc: @wangshangsam

@yuki-97
Contributor

yuki-97 commented Aug 6, 2025

@Dazz993 I think you are right that in the vllm==0.10.0 v1 engine:

  1. none of the logprobs_mode options (raw_logprobs, raw_logits, processed_logprobs, processed_logits) include the sampling stage, so they don't match what we want.
  2. top_k & top_p need the entire vocabulary. It would be too expensive to pull the full-vocabulary logits out of vllm and compute top_k & top_p on the training side, so we won't do that (rough numbers below).
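
For a rough sense of scale (back-of-envelope with assumed shapes, not measurements):

```python
# Rough cost of shipping full-vocabulary logits out of vllm per batch
# (assumed numbers; adjust for your model/config):
vocab = 151_936        # e.g. a Qwen-style tokenizer
seq_len = 4096
batch = 32
bytes_per_elem = 2     # bf16
full_logits_gib = vocab * seq_len * batch * bytes_per_elem / 1024**3
per_token_logprobs_kib = seq_len * batch * bytes_per_elem / 1024
print(f"{full_logits_gib:.1f} GiB vs {per_token_logprobs_kib:.0f} KiB")
# ~37.1 GiB of full logits vs ~256 KiB of per-token logprobs
```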

The approach I can think of for now is a vllm patch that returns the processed_logprobs we really want:

  1. Replace Sampler.forward and Sampler.sample in sampler.py.
  2. Replace TopKTopPSampler in topk_topp_sampler.py so that it also returns the processed logits (a rough skeleton of the patching idea is sketched below).
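
Not a working patch, just a skeleton of the shape such a monkey-patch could take. The import path comes from the sampler.py file cited above; everything else, including the assumption that forward receives (logits, sampling_metadata), is unverified against the installed vLLM version:

```python
# Skeleton only: assumes vLLM v1's Sampler.forward takes (logits, sampling_metadata);
# verify against the installed vllm version before using.
from vllm.v1.sample.sampler import Sampler

_orig_forward = Sampler.forward


def _patched_forward(self, logits, sampling_metadata):
    out = _orig_forward(self, logits, sampling_metadata)
    # A real patch would recompute logprobs here from the logits *after*
    # temperature / top-k / top-p (i.e. what Sampler.sample samples from)
    # and write them into `out`, so the engine returns post-sampling-params
    # logprobs instead of the pre-sampling ones.
    return out


Sampler.forward = _patched_forward
```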

We could also ask the vllm team whether they could support this in the future.

wdyt? @Dazz993 @SahilJain314

@yuki-97
Contributor

yuki-97 commented Aug 6, 2025

BTW, two questions:

  1. Will these affect the logits? As far as I know we don't apply them on the training side:
    • allowed_token_ids_processing, bad_words_exclusion, logits_processors, and penalties
  2. What logits or logprobs does the vllm v0 engine return? Is that what we want?

@zhandaz
Contributor Author

zhandaz commented Aug 6, 2025

@YUki-666 Thanks for your comments!

top_k & top_p need the entire vocabulary

Yes. Temperature also requires the entire vocabulary, since $\mathrm{logprob}_i = \log \dfrac{\exp(x_i / \tau)}{\sum_j \exp(x_j / \tau)} = x_i/\tau - \log \sum_j \exp(x_j / \tau)$, and the normalizer sums over the whole vocabulary.

the way I can think for now is to have a vllm patch to get the processed_logprobs we really want.

  1. replace Sampler.forward and Sampler.sample in sampler.py.
  2. replace TopKTopPSampler in topk_topp_sampler.py to return logit.

Yes, that is exactly what I am thinking.

will these things affect logits? as far as I know we don't measure them at training side. allowed_token_ids_processing, bad_words_exclusion, logits_processors, and penalties

I'm not sure about the exact behavior for different recipes. In our GenerationConfig, we don't expose fields that modify these SamplingMetadata settings, so if vLLM doesn't apply them by default, we should be fine. I can investigate further if needed.

what logits or logprobs does vllm v0 engine return? is that what we want?

According to https://github.com/vllm-project/vllm/blob/v0.10.0/vllm/model_executor/layers/sampler.py#L219-L328, I think it returns the final logprobs (with the sampling metadata applied).

@zhandaz
Contributor Author

zhandaz commented Aug 7, 2025

Update: I found this PR to get the final logprobs from vllm: vllm-project/vllm#22387

@22quinn

22quinn commented Aug 7, 2025

Update: I found this PR to get the final logprobs from vllm: vllm-project/vllm#22387

Thank you for the interest! That PR will apply not only top-k & top-p, but also temperature & min-p. Does that work for your use case?

@zhandaz
Contributor Author

zhandaz commented Aug 7, 2025

Update: I found this PR to get the final logprobs from vllm: vllm-project/vllm#22387

Thank you for the interest! That PR will apply not only top-k & top-p, but also temperature & min-p. Does that work for your use case?

Yep! We want the final logprobs just before the sampling. Thank you for your PR!

@wangshangsam
Contributor

@zhandaz Just to confirm - the state of this PR right now is that it's waiting for the vLLM 0.11 release?

@zhandaz
Contributor Author

zhandaz commented Aug 20, 2025

@wangshangsam

Just to confirm - the state of this PR right now is that it's waiting for the vLLM 0.11 release?

Yep. Unless we want to write another, more complicated file patch, we have to wait for vLLM to support this first.

@terrykong
Contributor

@zhandaz is 0.10.2 new enough?

