[BugFix] Fix mixed penalties batch with async scheduling #27910
njhill merged 1 commit into vllm-project:main
Conversation
Code Review
This pull request addresses a bug in async scheduling where a batch containing a mix of requests with and without penalties could fail. The fix involves replacing placeholder -1 token IDs with a valid token ID to prevent errors in downstream operations. The approach is sound. I've suggested a minor improvement to make the fix more robust by using vocab_size as the replacement value, which is already used as a padding/ignore value, instead of 0.
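As a minimal sketch of the suggested improvement (my own illustration with made-up shapes and values, not vLLM code): rows belonging to requests without penalties carry `-1` placeholder token IDs, and replacing them with `vocab_size` keeps them in the out-of-vocabulary padding slot that the penalties path already ignores, whereas `0` is a real token ID that could be wrongly penalized.

```python
import torch

# Hypothetical values for illustration; vLLM's real tensors differ.
vocab_size = 8
output_tokens_t = torch.tensor(
    [[3, 5, -1, -1],   # row for a request without penalties: -1 placeholders
     [1, 2, 4, 6]]     # row for a request with penalties: real token ids
)
# In-place replacement, as in the fix: placeholders become vocab_size,
# an out-of-vocabulary index treated as padding downstream.
output_tokens_t.masked_fill_(output_tokens_t == -1, vocab_size)
```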
Signed-off-by: Nick Hill <nhill@redhat.com>
Force-pushed from 36e9424 to 1aecaef
…t#27910) Signed-off-by: Nick Hill <nhill@redhat.com>
# scatter done in apply_penalties is valid.
# NOTE(nick): The penalties implementation is currently quite inefficient and
# will be reworked anyhow.
output_tokens_t.masked_fill_(output_tokens_t == -1, vocab_size)
Hi, I’d like to ask why the actual draft token isn’t used here to replace the placeholder? Thanks.
@hidva this line should only apply to requests/rows in the batch which don't require the output tokens (i.e. those which don't use penalties sampling parameters). The other rows should not contain any placeholder tokens at this point.
Also async scheduling + spec decode + penalties isn't yet supported (any help with that appreciated though - see discussion in #30122).
#26467 fixed compatibility of penalties sampling parameters with async scheduling, but it has a flaw: it breaks when the batch contains a mix of requests with and without penalties, specifically when a request with a penalties param starts while a batch without penalties is already running.
This is a fix for that case.