[vllm, rollout] feat: support reset prefix cache after abort#4519
[vllm, rollout] feat: support reset prefix cache after abort#4519PeterSH6 merged 2 commits intoverl-project:mainfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a reset_prefix_cache parameter to allow resetting the prefix cache after aborting requests. While this is a good addition, there is a critical issue that needs to be addressed: the new parameter is not propagated through the vLLMReplica methods (abort_all_requests and abort_request in verl/workers/rollout/vllm_rollout/vllm_async_server.py). This makes the feature non-functional for callers of vLLMReplica. Please update these methods to accept and pass the reset_prefix_cache parameter to the underlying server calls. Additionally, I've noted an opportunity to improve maintainability by refactoring duplicated code for resetting the cache.
| # Try to reset prefix cache to ensure clean state | ||
| if reset_prefix_cache: | ||
| try: | ||
| await self.engine.reset_prefix_cache() | ||
| logger.info("Prefix cache reset after abort") | ||
| except Exception as cache_error: | ||
| logger.warning(f"Error resetting prefix cache: {cache_error}") |
There was a problem hiding this comment.
This block of code for resetting the prefix cache is also present in the abort_request method (lines 598-604). This code duplication goes against the DRY (Don't Repeat Yourself) principle and can lead to maintenance issues where one instance is updated but the other is missed. I recommend extracting this logic into a private helper method, for instance _try_reset_prefix_cache, and calling it from both abort_all_requests and abort_request.
| self.engine.output_processor.abort_requests(request_ids) | ||
| await self.engine.engine_core.abort_requests_async(request_ids) | ||
|
|
||
| # Try to reset prefix cache to ensure clean state |
There was a problem hiding this comment.
We should always reset_prefix_cache when abort request?
There was a problem hiding this comment.
Yes, we should reset prefix cache after abort as the rollout weights are usually updated after aborting.
There was a problem hiding this comment.
If not reset, it may degenerate to a paradigm similar to pipelineRL
| return {"aborted_count": 0, "request_ids": [], "error": str(e)} | ||
|
|
||
| async def abort_request(self, request_id: str) -> dict[str, Any]: | ||
| async def abort_request(self, request_id: str, reset_prefix_cache: bool = True) -> dict[str, Any]: |
There was a problem hiding this comment.
When will we want to abort single request?
There was a problem hiding this comment.
This may not be used quite often, but we can leave it as an API for fine-grained control
…oject#4519) ### What does this PR do? Add a reset option after calling the abort request in vLLM. Default reset prefix cache after abort. If not reset, it may become pipelinerl. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
…oject#4519) ### What does this PR do? Add a reset option after calling the abort request in vLLM. Default reset prefix cache after abort. If not reset, it may become pipelinerl. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
What does this PR do?
Add a reset option after calling the abort request in vLLM.
Default reset prefix cache after abort.
If not reset, it may become pipelinerl.
Checklist Before Starting
[{modules}] {type}: {description}(This will be checked by the CI){modules}includefsdp,megatron,sglang,vllm,rollout,trainer,ci,training_utils,recipe,hardware,deployment,ray,worker,single_controller,misc,perf,model,algo,env,tool,ckpt,doc,data,like[megatron, fsdp, doc]{type}is infeat,fix,refactor,chore,test[BREAKING]to the beginning of the title.[BREAKING][fsdp, megatron] feat: dynamic batchingTest
API and Usage Example
# Add code snippet or script demonstrating how to use thisDesign & Code Changes
Checklist Before Submitting
Important
Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.
pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=alwaysci-requestchannel in theverlSlack workspace. (If not accessible, please try the Feishu group (飞书群).)