
Conversation

@zhandaz
Contributor

@zhandaz zhandaz commented Jul 28, 2025

What does this PR do ?

TL;DR: Support top-k and top-p for the dtensor worker with vLLM v0. This PR only supports tp==1; the PR for tp>1 is #774.

  1. By default, Nemo-RL uses vLLM v1. According to https://docs.vllm.ai/en/v0.8.1/getting_started/v1_user_guide.html#semantic-changes-and-deprecated-features, vLLM v1 returns the model’s raw output (i.e. before applying any logits post-processing such as temperature scaling or penalty adjustments), so we don't need to support temperature, top-k, or top-p there.
  2. Supporting top-k and top-p in vLLM v0 means the training logic differs between v0 and v1: in v0 the loss is calculated from the processed logits, while in v1 it is calculated from the raw logits.
  3. This PR assumes we want to make that possible. Supporting top-k and top-p with tp_size > 1, however, requires changes to the distributed functions, because top-k/top-p filtering needs to see the full vocabulary to determine which tokens to keep. From what I can tell, we would need to change DistributedLogprob or _compute_distributed_log_softmax to support p and k. The tp>1 case lives in a separate PR so that it can easily be reverted if the changes to the distributed functions are considered inelegant.

The implementation is based on vLLM's: https://github.com/vllm-project/vllm/blob/34a20c49b3f81f64133428b3a0d62309db1256f9/vllm/v1/sample/ops/topk_topp_sampler.py.
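
For reference, a minimal sketch of the filtering logic, mirroring the vLLM helper linked above; the actual `apply_top_k_top_p` added in `nemo_rl/models/policy/utils.py` may differ in details, and the per-row tensor signature is an assumption:

```python
import torch


def apply_top_k_top_p(
    logits: torch.Tensor,
    k: torch.Tensor | None,
    p: torch.Tensor | None,
) -> torch.Tensor:
    """Mask logits outside the top-k / top-p (nucleus) sets with -inf.

    logits: [batch, vocab]; k and p: per-row tensors of shape [batch].
    Sorting-based approach, following vLLM's topk_topp_sampler.
    """
    if k is None and p is None:
        return logits
    logits_sort, logits_idx = logits.sort(dim=-1, descending=False)

    if k is not None:
        # Keep only the k largest logits per row.
        top_k_idx = logits_sort.size(1) - k.to(torch.long)          # [batch]
        top_k_threshold = logits_sort.gather(1, top_k_idx.unsqueeze(1))
        logits_sort.masked_fill_(logits_sort < top_k_threshold, -float("inf"))

    if p is not None:
        # Keep the smallest set of top tokens whose cumulative prob >= p.
        probs_sort = logits_sort.softmax(dim=-1)
        probs_sum = probs_sort.cumsum(dim=-1)
        top_p_mask = probs_sum <= 1 - p.unsqueeze(1)
        # Always keep at least the highest-probability token per row.
        top_p_mask[:, -1] = False
        logits_sort.masked_fill_(top_p_mask, -float("inf"))

    # Scatter the filtered logits back to the original token order.
    return torch.empty_like(logits_sort).scatter_(1, logits_idx, logits_sort)
```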

Issues

Related Issue: #69

@zhandaz zhandaz requested a review from Copilot July 28, 2025 21:09
@zhandaz zhandaz self-assigned this Jul 28, 2025
@zhandaz zhandaz added the enhancement New feature or request label Jul 28, 2025
Contributor

Copilot AI left a comment


Pull Request Overview

This PR adds support for top-k and top-p sampling in the dtensor worker when using the vLLM v0 engine with a tensor parallel size of 1. The implementation includes the sampling logic and comprehensive test coverage.

Key changes:

  • Implements apply_top_k_top_p and apply_top_k_only functions for filtering sampling logits
  • Updates the dtensor policy worker to apply sampling parameters after temperature scaling for vLLM v0 (see the sketch after this list)
  • Adds comprehensive unit tests for the sampling utilities
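
For illustration, a hedged sketch of that ordering (temperature first, then top-k/top-p masking, then log-softmax). The helper name `processed_token_logprobs` is made up for this sketch, the `apply_top_k_top_p` signature is assumed to mirror vLLM's helper, and the actual code in dtensor_policy_worker.py may differ:

```python
import torch
import torch.nn.functional as F

# Assumes the PR's helper mirrors vLLM's: apply_top_k_top_p(logits, k, p)
# with per-row tensors for k and p.
from nemo_rl.models.policy.utils import apply_top_k_top_p


def processed_token_logprobs(
    logits: torch.Tensor,       # [batch, seq, vocab] raw model output
    token_ids: torch.Tensor,    # [batch, seq] generated token ids
    temperature: float,
    top_k: int | None,
    top_p: float | None,
) -> torch.Tensor:
    """Illustrative only: per-token logprobs from processed logits, matching
    what the vLLM v0 sampler saw at generation time (temperature > 0 assumed)."""
    b, s, v = logits.shape
    flat = logits.reshape(-1, v) / temperature                    # temperature first
    k = torch.full((flat.size(0),), top_k, device=flat.device) if top_k else None
    p = torch.full((flat.size(0),), top_p, device=flat.device) if top_p else None
    flat = apply_top_k_top_p(flat, k, p)                          # then top-k / top-p
    logprobs = F.log_softmax(flat, dim=-1).reshape(b, s, v)
    return logprobs.gather(-1, token_ids.unsqueeze(-1)).squeeze(-1)
```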

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

  • nemo_rl/models/policy/utils.py: Implements top-k and top-p sampling functions based on vLLM's approach
  • nemo_rl/models/policy/dtensor_policy_worker.py: Integrates sampling into logits post-processing with the TP=1 restriction
  • tests/unit/models/policy/test_utils.py: Adds a comprehensive test suite for the sampling utilities

@SahilJain314
Contributor

thanks for adding this! With the upcoming vllm 0.10.0, we should be able to get v1 to output post-processed logprobs as well.

@zhandaz
Contributor Author

zhandaz commented Jul 28, 2025

@SahilJain314 Thanks for pointing this out. In that case, I will wait for the PR upgrading to vllm==0.10.0 to be merged, and then add an option to choose the logprob processing method.

@wangshangsam wangshangsam linked an issue Jul 29, 2025 that may be closed by this pull request
@zhandaz zhandaz marked this pull request as draft July 30, 2025 15:08
@zhandaz zhandaz changed the title [1/2] Top-k and Top-p support for dtensor worker with vLLM V0 when TP==1 feat: [1/2] Top-k and Top-p support for dtensor worker with vLLM V0 when TP==1 Aug 5, 2025
@zhandaz zhandaz marked this pull request as ready for review August 5, 2025 19:30
@zhandaz
Contributor Author

zhandaz commented Aug 5, 2025

@SahilJain314 @YUki-666

Since vllm has been upgraded to 0.10.0 in #766, I am looking into continuing this PR to support logprobs_mode.

However, I found that the meaning of logprobs_mode seems different from what we had understood: regardless of the value of logprobs_mode, the returned logprobs are calculated before temperature, top_k, and top_p are applied.

Specifically, in https://github.com/vllm-project/vllm/blob/v0.10.0/vllm/v1/sample/sampler.py#L28-L91, if processed_logprobs is set, the returned logprobs are computed after allowed_token_ids_processing, bad_words_exclusion, logits_processors, and penalties. However, temperature, top_k, and top_p are applied in the sample method, so they do not affect the returned logprobs.

From my perspective, it is not possible to re-apply sampling parameters like temperature, top_k, or top_p to the returned logprobs. These sampling methods require the full probability distribution over the entire vocabulary. Since the returned log probability covers only the single token chosen during the generation sampling step, the information about the rest of the distribution has been irreversibly lost.
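
As a standalone illustration of this point (toy PyTorch, not tied to the vLLM code): the processed logprob of the sampled token depends on the whole row of logits through the softmax normalizer, so it cannot be recovered from the raw per-token logprob alone.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(1, 8)   # toy vocabulary of 8 tokens
token = 3                    # pretend this token was sampled
tau = 0.7                    # temperature

raw_lp = F.log_softmax(logits, dim=-1)[0, token]
processed_lp = F.log_softmax(logits / tau, dim=-1)[0, token]

# Naively rescaling the returned raw logprob does NOT recover the processed one:
naive = raw_lp / tau
print(raw_lp.item(), processed_lp.item(), naive.item())
# processed_lp differs from naive because the normalizer logsumexp(logits / tau)
# runs over the *full* vocabulary, and that information is absent from raw_lp.
```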

Please correct me if I'm wrong. What are your thoughts on how we should proceed? The most effective solution would be to request a feature that returns the after-sampling logprobs.

cc: @wangshangsam

@yuki-97
Contributor

yuki-97 commented Aug 6, 2025

@Dazz993 I think you are right that in the vllm==0.10.0 v1 engine:

  1. none of the logprobs_mode options (raw_logprobs, raw_logits, processed_logprobs, processed_logits) include the sampling stage, so they don't match what we want.
  2. top_k & top_p need the entire vocabulary. It would be too expensive to pull the full-vocabulary logits out of vllm and compute top_k & top_p on the training side, so we won't do that (rough numbers below).
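
For a rough sense of scale (back-of-envelope with assumed shapes, not measurements):

```python
# Rough cost of shipping full-vocabulary logits out of vllm per batch
# (assumed numbers; adjust for your model/config):
vocab = 151_936        # e.g. a Qwen-style tokenizer
seq_len = 4096
batch = 32
bytes_per_elem = 2     # bf16
full_logits_gib = vocab * seq_len * batch * bytes_per_elem / 1024**3
per_token_logprobs_kib = seq_len * batch * bytes_per_elem / 1024
print(f"{full_logits_gib:.1f} GiB vs {per_token_logprobs_kib:.0f} KiB")
# ~37.1 GiB of full logits vs ~256 KiB of per-token logprobs
```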

The approach I can think of for now is a vllm patch that returns the processed_logprobs we really want:

  1. Replace Sampler.forward and Sampler.sample in sampler.py.
  2. Replace TopKTopPSampler in topk_topp_sampler.py so that it also returns the processed logits (a rough skeleton of the patching idea is sketched below).
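
Not a working patch, just a skeleton of the shape such a monkey-patch could take. The import path comes from the sampler.py file cited above; everything else, including the assumption that forward receives (logits, sampling_metadata), is unverified against the installed vLLM version:

```python
# Skeleton only: assumes vLLM v1's Sampler.forward takes (logits, sampling_metadata);
# verify against the installed vllm version before using.
from vllm.v1.sample.sampler import Sampler

_orig_forward = Sampler.forward


def _patched_forward(self, logits, sampling_metadata):
    out = _orig_forward(self, logits, sampling_metadata)
    # A real patch would recompute logprobs here from the logits *after*
    # temperature / top-k / top-p (i.e. what Sampler.sample samples from)
    # and write them into `out`, so the engine returns post-sampling-params
    # logprobs instead of the pre-sampling ones.
    return out


Sampler.forward = _patched_forward
```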

We could also ask the vllm team whether they could support this in the future.

wdyt? @Dazz993 @SahilJain314

@yuki-97
Contributor

yuki-97 commented Aug 6, 2025

BTW, two questions:

  1. Will these affect the logits? As far as I know we don't apply them on the training side:
    • allowed_token_ids_processing, bad_words_exclusion, logits_processors, and penalties
  2. What logits or logprobs does the vllm v0 engine return? Is that what we want?

@zhandaz
Contributor Author

zhandaz commented Aug 6, 2025

@YUki-666 Thanks for your comments!

top_k & top_p need the entire vocabulary

Yes. Temperature also requires the entire vocabulary, since $\mathrm{logprob}_i = \log \dfrac{\exp(x_i / \tau)}{\sum_j \exp(x_j / \tau)} = x_i/\tau - \log \sum_j \exp(x_j / \tau)$, and the normalizer sums over the whole vocabulary.

the way I can think for now is to have a vllm patch to get the processed_logprobs we really want.

  1. replace Sampler.forward and Sampler.sample in sampler.py.
  2. replace TopKTopPSampler in topk_topp_sampler.py to return logit.

Yes, that is exactly what I am thinking.

will these things affect logits? as far as I know we don't measure them at training side. allowed_token_ids_processing, bad_words_exclusion, logits_processors, and penalties

I'm not sure about the exact behavior for different recipes. In our GenerationConfig, we don't expose fields that modify these SamplingMetadata settings, so if vLLM doesn't apply them by default, we should be fine. I can investigate further if needed.

what logits or logprobs does vllm v0 engine return? is that what we want?

According to https://github.com/vllm-project/vllm/blob/v0.10.0/vllm/model_executor/layers/sampler.py#L219-L328, I think it returns the final logprobs (with the sampling metadata applied).

@zhandaz
Contributor Author

zhandaz commented Aug 7, 2025

Update: I found this PR to get the final logprobs from vllm: vllm-project/vllm#22387

@22quinn

22quinn commented Aug 7, 2025

Update: I found this PR to get the final logprobs from vllm: vllm-project/vllm#22387

Thank you for the interest! That PR will apply not only top-k & top-p, but also temperature & min-p. Does that work for your use case?

@zhandaz
Contributor Author

zhandaz commented Aug 7, 2025

Update: I found this PR to get the final logprobs from vllm: vllm-project/vllm#22387

Thank you for the interest! That PR will apply not only top-k & top-p, but also temperature & min-p. Does that work for your use case?

Yep! We want the final logprobs just before the sampling. Thank you for your PR!

@wangshangsam
Contributor

@zhandaz Just to confirm - the state of this PR right now is that it's waiting for the vLLM 0.11 release?

@zhandaz
Contributor Author

zhandaz commented Aug 20, 2025

@wangshangsam

Just to confirm - the state of this PR right now is that it's waiting for the vLLM 0.11 release?

Yep. Unless we want to write another, more complicated file patch, we have to wait for vLLM to support this first.

@terrykong
Contributor

@zhandaz is 0.10.2 new enough?

