[BugFix] fix greedy temperature in eagle #5423
bnbryan wants to merge 1 commit into vllm-project:main from bnbryan:fix-eagle-greedy-temperature
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request addresses a critical bug in the rejection sampler for greedy decoding. By changing the GREEDY_TEMPERATURE constant from -1 to 0, it correctly identifies greedy sampling requests (where temperature is 0). This prevents a potential division-by-zero error during logit processing and ensures the is_greedy flag is set correctly, resolving a crash that occurred when mixing greedy and non-greedy requests in a batch. The change is correct and effectively fixes the described issue.
This pull request has conflicts; please resolve them before we can evaluate the pull request.
The issue has already been resolved in #5417 |
What this PR does / why we need it?
Updated the GREEDY_TEMPERATURE value in rejection_sampler.py to 0.
Commit vllm-project/vllm@a676e66 changed the greedy temperature sentinel from -1 to 0 in gpu_input_batch.py. Leaving the constant at -1 in rejection_sampler.py therefore compromised the correctness of the is_greedy flag, causing a crash whenever a batch contained a mix of sequences with temperature > 0 and temperature = 0.
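The failure mode can be illustrated with a small, hypothetical sketch (the function and variable names below are illustrative, not the actual vLLM code): greedy requests now arrive with temperature 0, so a sampler that still compares against the stale -1 sentinel fails to flag them as greedy and divides their logits by zero.

```python
GREEDY_TEMPERATURE_OLD = -1.0  # stale sentinel the rejection sampler assumed
GREEDY_TEMPERATURE_NEW = 0.0   # value gpu_input_batch.py now stores for greedy requests

def apply_temperature(logits, temperatures, greedy_sentinel):
    """Scale each request's logits by its temperature, skipping greedy rows."""
    scaled, is_greedy = [], []
    for row, temp in zip(logits, temperatures):
        greedy = (temp == greedy_sentinel)
        is_greedy.append(greedy)
        # Greedy rows keep their raw logits; others are scaled by 1/temperature.
        # A misclassified greedy row (temp = 0) hits a division by zero here.
        scaled.append(row if greedy else [x / temp for x in row])
    return scaled, is_greedy

logits = [[1.0, 2.0], [1.0, 2.0]]
temperatures = [0.0, 0.8]  # mixed batch: one greedy, one sampled request

# Stale sentinel (-1): the greedy row is not flagged, so scaling divides by zero.
try:
    apply_temperature(logits, temperatures, GREEDY_TEMPERATURE_OLD)
    crashed = False
except ZeroDivisionError:
    crashed = True

# Fixed sentinel (0): the greedy row is flagged and left untouched; no crash.
scaled, is_greedy = apply_temperature(logits, temperatures, GREEDY_TEMPERATURE_NEW)
```

With the fixed sentinel, the mixed batch processes cleanly: the greedy row is skipped and only the sampled row is rescaled, which matches the crash scenario this PR describes.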
Does this PR introduce any user-facing change?
No.
How was this patch tested?
vLLM version: release/v0.13.0
vLLM main: vllm-project/vllm@be2a947