
Aligning top_p and top_k Sampling#1885

Merged
Yard1 merged 5 commits into vllm-project:main from
chenxu2048:align_top_k_top_p
Jan 12, 2024

Conversation

@chenxu2048
Contributor

We noticed some small differences between the top_p and top_k implementation in the vLLM sampler and Hugging Face's. We have aligned vLLM with the implementation details of TopPLogitsWarper and TopKLogitsWarper in Hugging Face transformers.

1. Sampling Order

In Hugging Face transformers and FasterTransformer, top_k is applied first, followed by top_p. In vLLM, the order is reversed. Therefore, when both are specified, the probability distribution generated by vLLM may differ.

# top_k = 2
# top_p = 0.5

# top_k first
[ 0.1, 0.2, 0.3, 0.4 ] -> [ -inf, -inf, -inf, 0.4 ] -> [ 0, 0, 0, 1 ]
# top_p first
[ 0.1, 0.2, 0.3, 0.4 ] -> [ -inf, -inf, 0.3, 0.4 ] -> [ 0, 0, 0.475, 0.525 ]
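Treating the example values as logits, the order difference can be sketched in plain Python. This is a minimal illustration, not the actual vLLM or HF code, and the helper names are made up; masking to -inf followed by softmax is what effectively renormalizes between the two steps.

```python
import math

def softmax(logits):
    # masked entries are -inf, so math.exp(-inf) contributes 0 mass
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def apply_top_k(logits, k):
    # keep the k largest logits; mask the rest with -inf
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    keep = set(order[:k])
    return [x if i in keep else float("-inf") for i, x in enumerate(logits)]

def apply_top_p(logits, p):
    # keep the smallest set of highest-probability tokens whose mass >= p
    probs = softmax(logits)
    order = sorted(range(len(logits)), key=lambda i: probs[i], reverse=True)
    keep, mass = set(), 0.0
    for i in order:
        keep.add(i)
        mass += probs[i]
        if mass >= p:
            break
    return [x if i in keep else float("-inf") for i, x in enumerate(logits)]

logits = [0.1, 0.2, 0.3, 0.4]
k_first = softmax(apply_top_p(apply_top_k(logits, 2), 0.5))  # [0, 0, 0, 1]
p_first = softmax(apply_top_k(apply_top_p(logits, 0.5), 2))  # [0, 0, ~0.475, ~0.525]
```

Applying top_k first leaves only two logits whose renormalized probabilities are roughly 0.475/0.525, so top_p = 0.5 then cuts down to a single token; applying top_p first keeps both, reproducing the two distributions in the example above.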

2. Sorting Order

Hugging Face transformers' top_p uses ascending sort order, while vLLM uses descending order. When the logits of tokens are equal, the chosen token may be inconsistent (torch uses stable sorting).

# top_p = 0.3

# descending
[ 0.2, 0.2, 0.3, 0.3 ] -> [ -inf, -inf, 0.3, -inf ]
# ascending
[ 0.2, 0.2, 0.3, 0.3 ] -> [ -inf, -inf, -inf, 0.3 ]
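With probabilities [0.2, 0.2, 0.3, 0.3] and top_p = 0.3, the tie-breaking difference can be sketched as follows. This is a toy illustration using Python's stable sort (the function names are hypothetical, and a small epsilon absorbs float rounding):

```python
def top_p_descending(probs, p, eps=1e-9):
    # descending direction: stable sort, largest first;
    # keep the shortest prefix whose cumulative mass reaches p
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    keep, mass = set(), 0.0
    for i in order:
        keep.add(i)
        mass += probs[i]
        if mass >= p - eps:
            break
    return keep

def top_p_ascending(probs, p, eps=1e-9):
    # ascending direction (HF-style): stable sort, smallest first;
    # drop tokens while the cumulative mass from the bottom stays within 1 - p
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    keep, mass = set(order), 0.0
    for i in order:
        mass += probs[i]
        if mass <= 1 - p + eps:
            keep.discard(i)
    return keep

probs = [0.2, 0.2, 0.3, 0.3]
kept_desc = top_p_descending(probs, 0.3)  # the first of the tied 0.3s
kept_asc = top_p_ascending(probs, 0.3)    # the last of the tied 0.3s
```

Both directions keep exactly one token with probability 0.3, but because the stable sort preserves the original order among ties, the descending pass keeps index 2 while the ascending pass keeps index 3, matching the example above.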

3. TopK Selection

In Hugging Face transformers, top_k selection keeps all logits greater than or equal to the k-th largest, rather than exactly the top k items.

# top_k = 1

# huggingface
[ 0.1, 0.3, 0.3, 0.3 ] -> [ -inf, 0.3, 0.3, 0.3 ]
# vllm
[ 0.1, 0.3, 0.3, 0.3 ] -> [ -inf, -inf, 0.3, -inf ]
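The two selection rules can be sketched like this (a toy comparison with hypothetical names; note that with Python's stable sort the exact-k variant keeps the first tied token, whereas the torch sort used in practice may break ties differently, as in the vLLM line above):

```python
def top_k_threshold(logits, k):
    # threshold rule (HF-style): keep every logit >= the k-th largest value,
    # so all tokens tied at the threshold survive
    kth = sorted(logits, reverse=True)[k - 1]
    return [x if x >= kth else float("-inf") for x in logits]

def top_k_exact(logits, k):
    # exact-k rule: keep exactly k entries; ties are broken by sort order
    order = sorted(range(len(logits)), key=lambda i: -logits[i])
    keep = set(order[:k])
    return [x if i in keep else float("-inf") for i, x in enumerate(logits)]

logits = [0.1, 0.3, 0.3, 0.3]
neg_inf = float("-inf")
# the threshold rule keeps all three tied 0.3s; the exact rule keeps one
survivors_threshold = sum(x != neg_inf for x in top_k_threshold(logits, 1))  # 3
survivors_exact = sum(x != neg_inf for x in top_k_exact(logits, 1))          # 1
```

With top_k = 1 the threshold rule can thus leave more than one candidate whenever the k-th largest logit is tied.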

@Yard1
Collaborator

Yard1 commented Dec 1, 2023

@chenxu2048 In order to keep parity moving forwards, do you think it would make sense to add a simple unit test comparing outputs of vLLM and HF implementations? Also, we set top_k to be vocab_size-1 by default, is that still going to mean "all possible tokens" with the new implementation?

@Yard1 Yard1 mentioned this pull request Dec 1, 2023
@chenxu2048
Contributor Author

chenxu2048 commented Dec 4, 2023

In order to keep parity moving forwards, do you think it would make sense to add a simple unit test comparing outputs of vLLM and HF implementations?

Ok, we will provide a simple unit test to compare the sampler with HF. Should we add the script into the repo or provide it in the PR comments?

Also, we set top_k to be vocab_size-1 by default, is that still going to mean "all possible tokens" with the new implementation?

top_k = vocab_size - 1 gathers the (vocab_size - 1)-th element of logits_sorted, which is the last one, so all of top_k_mask would be true in this case. We can run a unit test to check it.
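A toy check of this claim, assuming descending-sorted logits and a keep-mask that compares against the element gathered at 0-based index top_k (a hypothetical sketch of just this edge case, not the actual vLLM kernel):

```python
def top_k_keep_mask(logits_sorted_desc, top_k):
    # gather the cutoff at 0-based index top_k (clamped to the last slot);
    # with top_k = vocab_size - 1 this index is the last position,
    # so the cutoff is the smallest logit
    idx = min(top_k, len(logits_sorted_desc) - 1)
    cutoff = logits_sorted_desc[idx]
    return [x >= cutoff for x in logits_sorted_desc]

logits_sorted = [3.0, 2.0, 1.0, 0.5]  # already sorted descending
mask = top_k_keep_mask(logits_sorted, len(logits_sorted) - 1)
# every entry of the mask is True: no token is filtered out
```

Under these assumptions the default top_k = vocab_size - 1 indeed keeps all possible tokens.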

@Yard1 PTAL

@Yard1
Collaborator

Yard1 commented Dec 4, 2023

@chenxu2048 please put the test in the repo (tests/samplers would be great), thanks!

@zhuohan123
Member

@Yard1 Has the issue in this PR been fixed by the refactor #1889?

@Yard1
Collaborator

Yard1 commented Dec 18, 2023

No, the refactor did not change the logic, unlike this PR @zhuohan123

@chenxu2048
Contributor Author

No, the refactor did not change the logic, unlike this PR @zhuohan123

Hi @Yard1 @zhuohan123, I'll rebase my work with tests onto #1889 and open the PR again this weekend.

@chenxu2048
Contributor Author

Hi @Yard1 @zhuohan123, I'll rebase my work with tests onto #1889 and open the PR again this weekend.

This PR is ready for review. PTAL.

@chenxu2048
Contributor Author

chenxu2048 commented Dec 25, 2023

It seems that _get_prompt_and_output_tokens is unused in the sampler, and I think we could remove it.

@chenxu2048
Contributor Author

Here are the results of testing on the main branch and on this PR.
main.test_sampler_top_k_top_p.txt
pr.test_sampler_top_k_top_p.txt

@Yard1 Yard1 merged commit 218dc2c into vllm-project:main Jan 12, 2024
@Yard1
Collaborator

Yard1 commented Jan 12, 2024

Thanks!

xjpang pushed a commit to xjpang/vllm that referenced this pull request Jan 14, 2024
* Align top_p and top_k with huggingface

* remove _get_prompt_and_output_tokens

* rename _apply_top_p_top_k

* compare top_p top_k with hf

* fix test errors
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Jan 18, 2024
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024
jinyouzhi pushed a commit to jinyouzhi/vllm that referenced this pull request Sep 12, 2025
WeNeedMoreCode pushed a commit to WeNeedMoreCode/vllm that referenced this pull request Dec 15, 2025
There is a lot of torchair-specific logic in the common code, which makes the code hard to maintain. We will create a new torchair module and move the torchair-related logic there. I plan to add 4 PRs:

1. Refactor worker (this PR)
- create the torchair module and move torchair-related code in the worker to the new module
2. Refactor utils
3. Refactor model_runner
4. Refactor attention


- vLLM version: v0.9.2
- vLLM main:
vllm-project@8188196

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>