fix vllm dp & reset_encoder_cache & fix vllm init with zero3 #9295
Conversation
Code Review
This pull request introduces several enhancements to the vLLM engine integration, including new methods for resetting encoder and multimodal (MM) caches and improved device visibility management for data parallel ranks on both CUDA and NPU platforms. Review feedback identifies potential AttributeError risks for the reset_mm_cache call if it is not supported by the underlying vLLM engine. Additionally, there are suggestions to refactor duplicated logic in the cache reset broadcasting methods and a request to verify the '0.16' version check, which may be a typo for '0.6.x'.
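One way to address the AttributeError concern is to look the method up defensively before calling it. The sketch below is illustrative only: `call_if_supported`, `_OldEngine`, and `_NewEngine` are hypothetical names that are not taken from this PR, and the stand-in classes merely mimic an engine that does or does not expose `reset_mm_cache`.

```python
from typing import Any


def call_if_supported(engine: Any, method_name: str) -> bool:
    """Call engine.<method_name>() only if it exists and is callable.

    Returns True when the call was made, False when the engine does not
    provide the method (for example, an older engine version).
    """
    method = getattr(engine, method_name, None)
    if not callable(method):
        return False
    method()
    return True


class _OldEngine:
    """Hypothetical stand-in for an engine without reset_mm_cache."""


class _NewEngine:
    """Hypothetical stand-in for an engine that provides reset_mm_cache."""

    def __init__(self) -> None:
        self.resets = 0

    def reset_mm_cache(self) -> None:
        self.resets += 1


old_engine = _OldEngine()
new_engine = _NewEngine()
old_called = call_if_supported(old_engine, "reset_mm_cache")
new_called = call_if_supported(new_engine, "reset_mm_cache")
```

Returning a boolean lets the caller log or report that the reset was skipped instead of failing the whole request.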
/gemini review
Code Review
This pull request enhances the vLLM engine integration by increasing the DP coordinator timeout to 180 seconds and improving device visibility management for data-parallel rollout workers. It also introduces functionality to reset encoder and multimodal caches, adding corresponding FastAPI endpoints and client-side methods. The review feedback points out that the implementation of these new asynchronous endpoints uses blocking I/O, which could negatively impact the event loop's performance, and recommends offloading these operations to an executor.
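The executor recommendation can be sketched with the standard library alone. This is a minimal illustration, not the PR's code: `blocking_reset` and `reset_endpoint` are hypothetical names, and `time.sleep` stands in for whatever blocking call the real endpoint makes. In a FastAPI app the same `await` would sit inside the `async def` route handler.

```python
import asyncio
import time


def blocking_reset() -> str:
    """Hypothetical blocking operation, e.g. a synchronous cache reset."""
    time.sleep(0.05)  # stands in for blocking I/O
    return "ok"


async def reset_endpoint() -> dict:
    """Async handler that keeps the event loop free while the reset runs."""
    loop = asyncio.get_running_loop()
    # Passing None uses the loop's default ThreadPoolExecutor.
    status = await loop.run_in_executor(None, blocking_reset)
    return {"status": status}


async def main() -> tuple:
    """Run the handler next to another coroutine to show both make progress."""
    ticks = []

    async def ticker() -> None:
        for i in range(3):
            ticks.append(i)
            await asyncio.sleep(0.01)

    result, _ = await asyncio.gather(reset_endpoint(), ticker())
    return result, ticks


result, ticks = asyncio.run(main())
```

Calling `blocking_reset()` directly inside the coroutine would stall every other request on the same event loop for the duration of the call; handing it to the executor avoids that at the cost of one worker thread.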