2 changes: 2 additions & 0 deletions vllm/v1/worker/gpu_input_batch.py
@@ -287,6 +287,8 @@ def add_request(
             req_index = self.num_reqs
 
         req_id = request.req_id
+        if req_id in self.req_id_to_index:
+            req_index = self.req_id_to_index[req_id]
Comment on lines +290 to +291
Contributor

critical

While this change correctly prevents duplicate request entries, it introduces an issue for non-pooling models by creating an inconsistent state for logits processors.

The _register_add_request function is called for non-pooling models before this check. It adds an entry to self.batch_update_builder.added with a newly allocated req_index. Your change then overwrites req_index with the existing request's index, but the batch_update_builder is left with a stale entry pointing to the wrong index. This can lead to incorrect behavior during logits processing.
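
To make the stale entry concrete, here is a minimal toy sketch of the register-then-check ordering. It is not vLLM code: ToyBatch, its added list (a stand-in for self.batch_update_builder.added), and the next_free counter are hypothetical simplifications, and the real index allocation is likely more involved.

```python
class ToyBatch:
    """Hypothetical stand-in for the input batch; not the vLLM class."""

    def __init__(self):
        self.req_id_to_index = {}
        self.added = []     # stand-in for batch_update_builder.added
        self.next_free = 0  # assumed: the next index to allocate

    def _register_add_request(self, req_id):
        # Allocates a fresh index and records the request as "added".
        idx = self.next_free
        self.next_free += 1
        self.added.append((idx, req_id))
        return idx

    def add_request_register_first(self, req_id):
        # Ordering under review: register first, then check for a duplicate.
        req_index = self._register_add_request(req_id)
        if req_id in self.req_id_to_index:
            req_index = self.req_id_to_index[req_id]
        self.req_id_to_index[req_id] = req_index
        return req_index


batch = ToyBatch()
first = batch.add_request_register_first("req-a")
second = batch.add_request_register_first("req-a")  # duplicate request id

# "added" entries whose index no longer matches where the request lives.
stale = [idx for idx, rid in batch.added if batch.req_id_to_index[rid] != idx]
```

In this toy model the duplicate call reuses the existing index, but the "added" list still carries an entry for the freshly allocated one, which is the inconsistency described above.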

A safer approach is to check for req_id's existence before calling _register_add_request. This ensures a request is only registered as "added" when it is truly new.

Consider this alternative structure to replace lines 283-291:

        req_id = request.req_id
        if req_id in self.req_id_to_index:
            req_index = self.req_id_to_index[req_id]
        elif not self.is_pooling_model:
            # New request index bookkeeping for autoregressive models.
            req_index = self._register_add_request(request)
        else:
            req_index = self.num_reqs
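
As a rough check of the structure suggested above, the same idea can be exercised in a toy model. ToyBatchCheckFirst and its fields are hypothetical simplifications (the added list stands in for self.batch_update_builder.added), so this only illustrates the ordering, not the actual bookkeeping in gpu_input_batch.py.

```python
class ToyBatchCheckFirst:
    """Hypothetical stand-in for the input batch; not the vLLM class."""

    def __init__(self, is_pooling_model=False):
        self.is_pooling_model = is_pooling_model
        self.req_id_to_index = {}
        self.added = []    # stand-in for batch_update_builder.added
        self.num_reqs = 0

    def _register_add_request(self, req_id):
        # Assumed simplification: a new request takes the next free slot.
        idx = self.num_reqs
        self.added.append((idx, req_id))
        return idx

    def add_request(self, req_id):
        if req_id in self.req_id_to_index:
            # Known request: reuse its index and do not register it again.
            req_index = self.req_id_to_index[req_id]
        elif not self.is_pooling_model:
            req_index = self._register_add_request(req_id)
        else:
            req_index = self.num_reqs
        if req_id not in self.req_id_to_index:
            self.req_id_to_index[req_id] = req_index
            self.num_reqs += 1
        return req_index


fixed = ToyBatchCheckFirst()
indices = [fixed.add_request(r) for r in ("req-a", "req-a", "req-b")]
```

With the existence check first, the duplicate "req-a" call leaves the "added" list untouched, so every recorded index matches the slot the request actually occupies.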

         if req_index == len(self._req_ids):
             self._req_ids.append(req_id)
             self.req_output_token_ids.append(request.output_token_ids)