fix vllm dp & reset_encoder_cache & fix vllm init with zero3#9295

Merged
hjh0119 merged 4 commits into modelscope:main from hjh0119:fix-vllm-dp
May 11, 2026
Conversation

@hjh0119 hjh0119 commented May 9, 2026

Collaborator
  1. Fix vLLM data parallelism (DP) with dense models. Related: "Major speed degradation in GRPO with async external server after vllm update" (#9221) and "[BugFix] Support online dense model DP without overhead" (vllm-project/vllm#30739).
  2. Support reset_encoder_cache to clear the vision embedding cache. Related: "Support clear mm and encoder cache" (vllm-project/vllm#33452).
  3. Fix a bug where, under zero3 + colocate, vLLM model initialization may fail because of the zero3 context introduced by training (e.g. gemma4). Related: "Gemma4 Support FFT + Zero3" (#9255).
  4. Fix a timeout issue in async DP introduced by "Fix DP coordinator ZMQ port TOCTOU" (vllm-project/vllm#37452).

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request introduces several enhancements to the vLLM engine integration, including new methods for resetting encoder and multimodal (MM) caches and improved device visibility management for data parallel ranks on both CUDA and NPU platforms. Review feedback identifies potential AttributeError risks for the reset_mm_cache call if it is not supported by the underlying vLLM engine. Additionally, there are suggestions to refactor duplicated logic in the cache reset broadcasting methods and a request to verify the '0.16' version check, which may be a typo for '0.6.x'.
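Two of the review points above can be illustrated with a minimal sketch. It is not the code from this PR: `version_tuple` and `call_if_supported` are hypothetical helper names, and the engine object is assumed to be any Python object that may or may not expose a given reset method. The first helper shows why comparing version strings numerically matters when telling '0.16' apart from '0.6.x'; the second shows one way to avoid an AttributeError when a method such as reset_mm_cache is missing.

```python
import re
from typing import Any, Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """Parse the leading numeric components of a version string.

    For example, '0.16.0rc1' becomes (0, 16, 0). Parsing stops at the
    first component that does not start with a digit.
    """
    parts = []
    for piece in version.split('.'):
        match = re.match(r'\d+', piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def call_if_supported(engine: Any, method_name: str) -> bool:
    """Call engine.<method_name>() only if the attribute exists and is callable.

    Returns True if the method was called, False if it was skipped.
    """
    method = getattr(engine, method_name, None)
    if not callable(method):
        return False
    method()
    return True
```

Compared as plain strings, '0.16' sorts before '0.6.0', whereas `version_tuple('0.16') > version_tuple('0.6.0')` holds, so a tuple comparison (or an established version-parsing library) keeps the check unambiguous whichever release the author intended.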

Comment thread swift/megatron/trainers/rollout_mixin.py
Comment thread swift/rlhf_trainers/rollout_mixin.py
Comment thread swift/rlhf_trainers/vllm_client.py
Comment thread swift/pipelines/infer/rollout.py
Comment thread swift/rlhf_trainers/rollout_mixin.py
Comment thread swift/megatron/trainers/rollout_mixin.py
hjh0119 commented May 11, 2026

Collaborator Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request enhances the vLLM engine integration by increasing the DP coordinator timeout to 180 seconds and improving device visibility management for data-parallel rollout workers. It also introduces functionality to reset encoder and multimodal caches, adding corresponding FastAPI endpoints and client-side methods. The review feedback points out that the implementation of these new asynchronous endpoints uses blocking I/O, which could negatively impact the event loop's performance, and recommends offloading these operations to an executor.
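The executor recommendation in this review can be sketched as follows. This is an illustration using only the standard library, not the endpoint code from the PR: `blocking_reset` stands in for whatever synchronous cache-reset call the handler makes, and `reset_endpoint` is a hypothetical handler name. The heartbeat task exists only to show that the event loop keeps running while the blocking call executes in a worker thread.

```python
import asyncio
import time


def blocking_reset(delay: float = 0.2) -> str:
    """Stand-in for a synchronous cache reset that blocks its calling thread."""
    time.sleep(delay)
    return 'ok'


async def reset_endpoint() -> dict:
    """Hypothetical async handler: run the blocking call in the default thread pool."""
    loop = asyncio.get_running_loop()
    status = await loop.run_in_executor(None, blocking_reset)
    return {'status': status}


async def demo() -> tuple:
    """Run the handler next to a heartbeat task and count the heartbeat ticks."""
    ticks = 0

    async def heartbeat() -> None:
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    beat = asyncio.create_task(heartbeat())
    result = await reset_endpoint()
    beat.cancel()
    try:
        await beat
    except asyncio.CancelledError:
        pass
    return result, ticks
```

If `blocking_reset()` were called directly inside the coroutine, the heartbeat would not tick until the call returned. With `run_in_executor`, `asyncio.run(demo())` returns a positive tick count, which is the behavior the review asks for in the new endpoints.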

Comment thread swift/pipelines/infer/rollout.py
Comment thread swift/pipelines/infer/rollout.py
@hjh0119 hjh0119 merged commit cab4169 into modelscope:main May 11, 2026
2 of 3 checks passed
@hjh0119 hjh0119 deleted the fix-vllm-dp branch May 11, 2026 06:54
@hjh0119 hjh0119 mentioned this pull request May 11, 2026
1 task
2 participants