
Aligning top_p and top_k Sampling#1885

Merged
Yard1 merged 5 commits into vllm-project:main from
chenxu2048:align_top_k_top_p
Jan 12, 2024

Conversation

@chenxu2048
Contributor

We noticed some small differences between the top_p and top_k implementation in the vLLM sampler and Hugging Face's. We have aligned vLLM with the implementation details of TopPLogitsWarper and TopKLogitsWarper in Hugging Face transformers.

1. Sampling Order

In Hugging Face transformers and FasterTransformer, top_k is applied first, followed by top_p. In vLLM, the order is reversed. Therefore, when both are specified, the probability distribution generated by vLLM may differ.

# top_k = 2
# top_p = 0.5

# top_k first
[ 0.1, 0.2, 0.3, 0.4 ] -> [ -inf, -inf, -inf, 0.4 ] -> [ 0, 0, 0, 1 ]
# top_p first
[ 0.1, 0.2, 0.3, 0.4 ] -> [ -inf, -inf, 0.3, 0.4 ] -> [ 0, 0, 0.475, 0.525 ]
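Treating the example values as logits, the order difference can be sketched in plain Python. This is a minimal illustration, not the actual vLLM or HF code, and the helper names are made up; masking to -inf followed by softmax is what effectively renormalizes between the two steps.

```python
import math

def softmax(logits):
    # masked entries are -inf, so math.exp(-inf) contributes 0 mass
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def apply_top_k(logits, k):
    # keep the k largest logits; mask the rest with -inf
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    keep = set(order[:k])
    return [x if i in keep else float("-inf") for i, x in enumerate(logits)]

def apply_top_p(logits, p):
    # keep the smallest set of highest-probability tokens whose mass >= p
    probs = softmax(logits)
    order = sorted(range(len(logits)), key=lambda i: probs[i], reverse=True)
    keep, mass = set(), 0.0
    for i in order:
        keep.add(i)
        mass += probs[i]
        if mass >= p:
            break
    return [x if i in keep else float("-inf") for i, x in enumerate(logits)]

logits = [0.1, 0.2, 0.3, 0.4]
k_first = softmax(apply_top_p(apply_top_k(logits, 2), 0.5))  # [0, 0, 0, 1]
p_first = softmax(apply_top_k(apply_top_p(logits, 0.5), 2))  # [0, 0, ~0.475, ~0.525]
```

Applying top_k first leaves only two logits whose renormalized probabilities are roughly 0.475/0.525, so top_p = 0.5 then cuts down to a single token; applying top_p first keeps both, reproducing the two distributions in the example above.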

2. Sorting Order

Hugging Face transformers' top_p uses ascending sort order, while vLLM uses descending order. When the logits of tokens are equal, the chosen token may be inconsistent (torch uses stable sorting).

# top_p = 0.3

# descending
[ 0.2, 0.2, 0.3, 0.3 ] -> [ -inf, -inf, 0.3, -inf ]
# ascending
[ 0.2, 0.2, 0.3, 0.3 ] -> [ -inf, -inf, -inf, 0.3 ]
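With probabilities [0.2, 0.2, 0.3, 0.3] and top_p = 0.3, the tie-breaking difference can be sketched as follows. This is a toy illustration using Python's stable sort (the function names are hypothetical, and a small epsilon absorbs float rounding):

```python
def top_p_descending(probs, p, eps=1e-9):
    # descending direction: stable sort, largest first;
    # keep the shortest prefix whose cumulative mass reaches p
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    keep, mass = set(), 0.0
    for i in order:
        keep.add(i)
        mass += probs[i]
        if mass >= p - eps:
            break
    return keep

def top_p_ascending(probs, p, eps=1e-9):
    # ascending direction (HF-style): stable sort, smallest first;
    # drop tokens while the cumulative mass from the bottom stays within 1 - p
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    keep, mass = set(order), 0.0
    for i in order:
        mass += probs[i]
        if mass <= 1 - p + eps:
            keep.discard(i)
    return keep

probs = [0.2, 0.2, 0.3, 0.3]
kept_desc = top_p_descending(probs, 0.3)  # the first of the tied 0.3s
kept_asc = top_p_ascending(probs, 0.3)    # the last of the tied 0.3s
```

Both directions keep exactly one token with probability 0.3, but because the stable sort preserves the original order among ties, the descending pass keeps index 2 while the ascending pass keeps index 3, matching the example above.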

3. TopK Selection

In Hugging Face transformers, top_k selection keeps all logits greater than or equal to the k-th largest, rather than exactly the top k items.

# top_k = 1

# huggingface
[ 0.1, 0.3, 0.3, 0.3 ] -> [ -inf, 0.3, 0.3, 0.3 ]
# vllm
[ 0.1, 0.3, 0.3, 0.3 ] -> [ -inf, -inf, 0.3, -inf ]
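The two selection rules can be sketched like this (a toy comparison with hypothetical names; note that with Python's stable sort the exact-k variant keeps the first tied token, whereas the torch sort used in practice may break ties differently, as in the vLLM line above):

```python
def top_k_threshold(logits, k):
    # threshold rule (HF-style): keep every logit >= the k-th largest value,
    # so all tokens tied at the threshold survive
    kth = sorted(logits, reverse=True)[k - 1]
    return [x if x >= kth else float("-inf") for x in logits]

def top_k_exact(logits, k):
    # exact-k rule: keep exactly k entries; ties are broken by sort order
    order = sorted(range(len(logits)), key=lambda i: -logits[i])
    keep = set(order[:k])
    return [x if i in keep else float("-inf") for i, x in enumerate(logits)]

logits = [0.1, 0.3, 0.3, 0.3]
neg_inf = float("-inf")
# the threshold rule keeps all three tied 0.3s; the exact rule keeps one
survivors_threshold = sum(x != neg_inf for x in top_k_threshold(logits, 1))  # 3
survivors_exact = sum(x != neg_inf for x in top_k_exact(logits, 1))          # 1
```

With top_k = 1 the threshold rule can thus leave more than one candidate whenever the k-th largest logit is tied.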

@Yard1
Collaborator

Yard1 commented Dec 1, 2023

@chenxu2048 In order to keep parity moving forwards, do you think it would make sense to add a simple unit test comparing outputs of vLLM and HF implementations? Also, we set top_k to be vocab_size-1 by default, is that still going to mean "all possible tokens" with the new implementation?

@Yard1 Yard1 mentioned this pull request Dec 1, 2023
@chenxu2048
Contributor Author

chenxu2048 commented Dec 4, 2023

In order to keep parity moving forwards, do you think it would make sense to add a simple unit test comparing outputs of vLLM and HF implementations?

Ok, we will provide a simple unit test to compare the sampler with HF. Should we add the script into the repo or provide it in the PR comments?

Also, we set top_k to be vocab_size-1 by default, is that still going to mean "all possible tokens" with the new implementation?

top_k = vocab_size - 1 gathers the (vocab_size - 1)-th element of logits_sorted, which is the last one, so all of top_k_mask would be true in this case. We can run a unit test to check it.
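A toy check of this claim, assuming descending-sorted logits and a keep-mask that compares against the element gathered at 0-based index top_k (a hypothetical sketch of just this edge case, not the actual vLLM kernel):

```python
def top_k_keep_mask(logits_sorted_desc, top_k):
    # gather the cutoff at 0-based index top_k (clamped to the last slot);
    # with top_k = vocab_size - 1 this index is the last position,
    # so the cutoff is the smallest logit
    idx = min(top_k, len(logits_sorted_desc) - 1)
    cutoff = logits_sorted_desc[idx]
    return [x >= cutoff for x in logits_sorted_desc]

logits_sorted = [3.0, 2.0, 1.0, 0.5]  # already sorted descending
mask = top_k_keep_mask(logits_sorted, len(logits_sorted) - 1)
# every entry of the mask is True: no token is filtered out
```

Under these assumptions the default top_k = vocab_size - 1 indeed keeps all possible tokens.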

@Yard1 PTAL

@Yard1
Collaborator

Yard1 commented Dec 4, 2023

@chenxu2048 please put the test in the repo (tests/samplers would be great), thanks!

@zhuohan123
Member

@Yard1 Has the issue in this PR been fixed by the refactor #1889?

@Yard1
Collaborator

Yard1 commented Dec 18, 2023

No, the refactor did not change the logic, unlike this PR @zhuohan123

@chenxu2048
Contributor Author

No, the refactor did not change the logic, unlike this PR @zhuohan123

Hi @Yard1 @zhuohan123, I'll rebase my work with tests onto #1889 and open the PR again this weekend.

@chenxu2048
Contributor Author

Hi @Yard1 @zhuohan123, I'll rebase my work with tests onto #1889 and open the PR again this weekend.

This PR is ready for review. PTAL.

@chenxu2048
Contributor Author

chenxu2048 commented Dec 25, 2023

It seems that _get_prompt_and_output_tokens is unused in the sampler, and I think we could remove it.

@chenxu2048
Contributor Author

Here are the results of testing on the main branch and on this PR.
main.test_sampler_top_k_top_p.txt
pr.test_sampler_top_k_top_p.txt

@Yard1 Yard1 merged commit 218dc2c into vllm-project:main Jan 12, 2024
@Yard1
Collaborator

Yard1 commented Jan 12, 2024

Thanks!

xjpang pushed a commit to xjpang/vllm that referenced this pull request Jan 14, 2024
* Align top_p and top_k with huggingface

* remove _get_prompt_and_output_tokens

* rename _apply_top_p_top_k

* compare top_p top_k with hf

* fix test errors
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Jan 18, 2024
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024
jinyouzhi pushed a commit to jinyouzhi/vllm that referenced this pull request Sep 12, 2025
WeNeedMoreCode pushed a commit to WeNeedMoreCode/vllm that referenced this pull request Dec 15, 2025
There is a lot of torchair-specific logic in the common code, which makes the code hard to maintain. We will create a new torchair module and move the torchair-related logic there. I plan to add 4 PRs:

1. Refactor worker (this PR)
- create the torchair module and move torchair-related code in the worker to the new module
2. Refactor utils
3. Refactor model_runner
4. Refactor attention


- vLLM version: v0.9.2
- vLLM main:
vllm-project@8188196

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>