[train][fullyAsync] Fix abort/pause broken after vllm 0.16.0 bump#1250
Merged
In vllm 0.16.0, InputProcessor.assign_request_id() now creates internal request IDs (with random suffix) separate from the user-provided external request IDs. The output_processor.request_states dict is keyed by these internal IDs, but engine.abort() with internal=False (default) looks them up in the external_req_ids mapping. Since internal IDs don't exist as external keys, the abort silently did nothing and requests completed normally with finish_reason="length" instead of "abort". Add _get_unfinished_request_ids() helper that uses external_req_ids when available (vllm 0.16.0+) and falls back to request_states keys for older versions. Apply to abort_generation() and both sync/async sleep() methods. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
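The failure mode described above can be reproduced in miniature. This is a toy sketch, not vllm's real classes; only the attribute and parameter names (`request_states`, `external_req_ids`, `internal=False`) mirror the behavior the commit message describes.

```python
# Toy reproduction of the silent-abort bug; ToyEngine is a stand-in for the
# vllm engine/output-processor pair, everything else is illustrative.
import secrets

class ToyEngine:
    def __init__(self):
        self.request_states = {}    # keyed by INTERNAL IDs in vllm 0.16.0+
        self.external_req_ids = {}  # external ID -> internal ID

    def add_request(self, external_id):
        internal_id = f"{external_id}-{secrets.token_hex(4)}"  # random suffix
        self.request_states[internal_id] = "RUNNING"
        self.external_req_ids[external_id] = internal_id

    def abort(self, request_ids, internal=False):
        aborted = []
        for rid in request_ids:
            if not internal:
                rid = self.external_req_ids.get(rid)  # external -> internal
                if rid is None:
                    continue  # unknown ID silently skipped: the bug path
            if self.request_states.get(rid) == "RUNNING":
                self.request_states[rid] = "ABORTED"
                aborted.append(rid)
        return aborted

engine = ToyEngine()
engine.add_request("req-0")

# Buggy call: internal IDs passed where external ones are expected -> no-op.
no_op = engine.abort(list(engine.request_states))
# Fixed call: external IDs resolve through the mapping and the abort lands.
fixed = engine.abort(list(engine.external_req_ids))
```

Passing the `request_states` keys with `internal=False` aborts nothing, while passing the external IDs aborts the request, which is exactly the mismatch the fix resolves.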
Code Review
This pull request addresses an issue with abort_generation() and sleep() in the vLLM engine, which was caused by a change in request ID handling in vllm version 0.16.0. The fix introduces a _get_unfinished_request_ids helper method to correctly retrieve request IDs for aborting, ensuring backward compatibility. The changes are logical and well-contained. My feedback includes a minor suggestion to improve type hinting for the new helper method.
SumanthRH approved these changes Mar 2, 2026
## Summary

- Fix `abort_generation()` and `sleep()` abort logic that broke silently after the vllm 0.16.0 bump (Bump vLLM to 0.16.0 with required dep updates #1240)
- Add `_get_unfinished_request_ids()` helper to resolve internal vs external request ID mismatch

## Root Cause
In vllm 0.16.0, `InputProcessor.assign_request_id()` now creates internal request IDs (with a random suffix) that are distinct from the user-provided external request IDs.

Our code was reading request IDs from `output_processor.request_states.keys()` (which are now internal IDs) and passing them to `engine.abort()` with `internal=False` (the default). The abort looked them up in the `external_req_ids` mapping, found nothing, and silently did nothing. Requests completed normally with `finish_reason="length"` instead of `"abort"`.

This broke fully async RL's pause/resume flow, which relies on abort returning partial outputs with `finish_reason="abort"` so the retry loop can re-submit with accumulated tokens.

Related vllm changes:
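The pause/resume dependency can be illustrated with a toy retry loop. Everything here is hypothetical scaffolding; only the `finish_reason` values (`"abort"`, `"length"`) come from the PR.

```python
# Toy illustration of why pause/resume needs finish_reason == "abort":
# an aborted request returns its partial tokens, and the retry loop feeds
# them back in so no generated progress is lost across a pause.
from dataclasses import dataclass

@dataclass
class Output:
    token_ids: list
    finish_reason: str

class FakeEngine:
    """Replays a script of (tokens, finish_reason) pairs, one per call."""
    def __init__(self, script):
        self.script = list(script)

    def generate(self, tokens, max_new_tokens):
        new_tokens, reason = self.script.pop(0)
        return Output(new_tokens, reason)

def generate_with_resume(engine, prompt, max_new_tokens):
    accumulated = []
    while True:
        out = engine.generate(prompt + accumulated,
                              max_new_tokens - len(accumulated))
        accumulated += out.token_ids
        if out.finish_reason != "abort":       # "stop"/"length": truly done
            return accumulated, out.finish_reason
        # "abort": engine was paused; loop re-submits with partial progress

# First call is aborted mid-generation (pause); the retry resumes from there.
engine = FakeEngine([([1, 2, 3], "abort"), ([4, 5], "length")])
tokens, reason = generate_with_resume(engine, [0], 8)
```

When the abort silently no-ops, the engine instead runs to completion and reports `"length"`, so the loop above never gets the chance to resume with accumulated tokens.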
## Fix

Add a `_get_unfinished_request_ids()` static method on `BaseVLLMInferenceEngine` that:

- uses `output_processor.external_req_ids.keys()` when available (vllm 0.16.0+)
- falls back to `output_processor.request_states.keys()` for older vllm versions

Applied to all three abort call sites:
- `AsyncVLLMInferenceEngine.abort_generation()` — used by fully async pause/resume
- `AsyncVLLMInferenceEngine.sleep()` — cleanup before sleep
- `VLLMInferenceEngine.sleep()` — sync engine cleanup before sleep

## Test plan
- `test_abort_generation_vllm_engine` — passes (was failing with `assert 'length' == 'abort'`)
- `test_continue_generation_vllm_engine_chat_completion` — passes
- `test_continue_generation_generate_vllm_engine_generation` — passes
- Fully async run (`gsm8k_fully_async_ci` project) — ran ~12 training steps successfully with pause/resume working correctly

Light blue is the run after this fix (our nightly gsm8k fully async CI): https://wandb.ai/sky-posttraining-uc-berkeley/gsm8k_fully_async_ci