fix vllm dp & reset_encoder_cache & fix vllm init with zero3 by hjh0119 · Pull Request #9295 · modelscope/ms-swift

hjh0119 · 2026-05-09T07:45:33Z

- Fix vLLM DP with dense models. Related: #9221 (Major speed degradation in GRPO with async external server after vllm update); upstream change: vllm-project/vllm#30739 ([BugFix] Support online dense model DP without overhead).
- Support reset_encoder_cache to clear the vision embedding cache. Upstream change: vllm-project/vllm#33452 (Support clear mm and encoder cache).
- Fix a bug where, under zero3 + colocate, vLLM model initialization may fail because of the zero3 context introduced by training (e.g. Gemma4). Related: #9255 (Gemma4 Support FFT + Zero3).
- Fix a timeout issue in async DP introduced by vllm-project/vllm#37452 (Fix DP coordinator ZMQ port TOCTOU).
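The DP fix and the reviews below both touch on which devices each data-parallel rank can see. As a hedged sketch of the general pattern only (not the code merged in this PR), each DP rank can be assigned a contiguous block of `tp_size` devices and restricted to them through `CUDA_VISIBLE_DEVICES` on CUDA or `ASCEND_RT_VISIBLE_DEVICES` on Ascend NPU. The contiguous layout and the helper names `devices_for_dp_rank` and `visibility_env` are assumptions for illustration.

```python
from typing import Dict, List, Sequence

# Assumed mapping from device type to the env var its runtime reads.
VISIBILITY_VARS = {
    'cuda': 'CUDA_VISIBLE_DEVICES',
    'npu': 'ASCEND_RT_VISIBLE_DEVICES',
}


def devices_for_dp_rank(dp_rank: int, tp_size: int, all_devices: Sequence[int]) -> List[int]:
    """Hypothetical helper: the contiguous block of tp_size devices owned by dp_rank."""
    start = dp_rank * tp_size
    devices = list(all_devices[start:start + tp_size])
    if len(devices) != tp_size:
        raise ValueError(f'not enough devices for dp_rank={dp_rank}, tp_size={tp_size}')
    return devices


def visibility_env(dp_rank: int, tp_size: int, all_devices: Sequence[int],
                   device_type: str = 'cuda') -> Dict[str, str]:
    """Hypothetical helper: env entry a DP worker should get before its runtime initializes."""
    var = VISIBILITY_VARS[device_type]
    devices = devices_for_dp_rank(dp_rank, tp_size, all_devices)
    return {var: ','.join(str(d) for d in devices)}


if __name__ == '__main__':
    # 4 devices with TP=2: DP rank 0 sees devices 0,1 and DP rank 1 sees devices 2,3.
    for rank in range(2):
        print(rank, visibility_env(rank, tp_size=2, all_devices=[0, 1, 2, 3]))
```

A launcher would merge the returned entry into the worker's environment (for example `{**os.environ, **env}`) before starting the process, because the device runtime reads the variable only at initialization.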

gemini-code-assist

Code Review

This pull request introduces several enhancements to the vLLM engine integration, including new methods for resetting encoder and multimodal (MM) caches and improved device visibility management for data parallel ranks on both CUDA and NPU platforms. Review feedback identifies potential AttributeError risks for the reset_mm_cache call if it is not supported by the underlying vLLM engine. Additionally, there are suggestions to refactor duplicated logic in the cache reset broadcasting methods and a request to verify the '0.16' version check, which may be a typo for '0.6.x'.
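The first review's AttributeError and duplication points can be illustrated with a small, hedged sketch. It assumes engine objects that may or may not expose `reset_encoder_cache` / `reset_mm_cache` (method names taken from the review); the helpers `call_if_supported`, `broadcast_reset`, `reset_encoder_cache_all`, `reset_mm_cache_all`, and `version_tuple` are hypothetical. The last helper shows why a version gate should compare parsed numbers rather than strings: lexicographically, `'0.16' < '0.6'`.

```python
from typing import Any, Iterable, List, Tuple


def call_if_supported(engine: Any, method_name: str) -> bool:
    """Call engine.<method_name>() only if the engine exposes it.

    Avoids an AttributeError on engine builds that lack the method.
    """
    method = getattr(engine, method_name, None)
    if not callable(method):
        return False
    method()
    return True


def broadcast_reset(engines: Iterable[Any], method_name: str) -> List[bool]:
    """One shared fan-out reused by every cache-reset entry point."""
    return [call_if_supported(engine, method_name) for engine in engines]


def reset_encoder_cache_all(engines: Iterable[Any]) -> List[bool]:
    return broadcast_reset(engines, 'reset_encoder_cache')


def reset_mm_cache_all(engines: Iterable[Any]) -> List[bool]:
    return broadcast_reset(engines, 'reset_mm_cache')


def version_tuple(version: str) -> Tuple[int, ...]:
    """Parse leading numeric components, e.g. '0.11.0rc1' -> (0, 11, 0)."""
    parts = []
    for piece in version.split('.'):
        digits = ''
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) != len(piece):
            break  # stop at a suffix such as 'rc1' or '+cu121'
    return tuple(parts)
```

Here the fan-out is a plain loop over in-process objects; in a DP or server setup the same "check, then call" logic would run wherever each engine actually lives.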

hjh0119 · 2026-05-11T06:31:08Z

/gemini review

gemini-code-assist

Code Review

This pull request enhances the vLLM engine integration by increasing the DP coordinator timeout to 180 seconds and improving device visibility management for data-parallel rollout workers. It also introduces functionality to reset encoder and multimodal caches, adding corresponding FastAPI endpoints and client-side methods. The review feedback points out that the implementation of these new asynchronous endpoints uses blocking I/O, which could negatively impact the event loop's performance, and recommends offloading these operations to an executor.
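The second review's concern about blocking I/O inside async endpoints can be sketched with plain asyncio (FastAPI is left out so the example stays self-contained). `blocking_cache_reset` is a hypothetical stand-in for whatever synchronous call the endpoint makes; `loop.run_in_executor(None, fn)` hands it to the default thread pool so the event loop keeps serving other work meanwhile.

```python
import asyncio
import time
from typing import Any, Callable, List, Tuple


def blocking_cache_reset() -> str:
    """Hypothetical stand-in for a synchronous call that waits on I/O."""
    time.sleep(0.1)
    return 'ok'


async def reset_cache_endpoint(fn: Callable[[], Any] = blocking_cache_reset) -> Any:
    """Async handler that runs the blocking call in the default thread pool."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, fn)


async def demo() -> Tuple[Any, int]:
    ticks: List[float] = []

    async def heartbeat() -> None:
        # Keeps ticking while the reset is in flight, since the loop is not blocked.
        for _ in range(5):
            ticks.append(time.monotonic())
            await asyncio.sleep(0.02)

    result, _ = await asyncio.gather(reset_cache_endpoint(), heartbeat())
    return result, len(ticks)


if __name__ == '__main__':
    print(asyncio.run(demo()))
```

`asyncio.to_thread(fn)` (Python 3.9+) is an equivalent shorthand. Another option is declaring the FastAPI route with plain `def` rather than `async def`, since FastAPI runs sync path operations in a thread pool.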

hjh0119 added 2 commits May 9, 2026 15:29

- fix vllm dp & reset_encoder_cache & fix vllm init with zero3 (450ef45)
- fix comment (3c677d1)

hjh0119 mentioned this pull request May 9, 2026

Major speed degradation in GRPO with async external server after vllm update #9221 (closed)

gemini-code-assist Bot reviewed May 9, 2026

hjh0119 added 2 commits May 9, 2026 16:39

- fix dp (55f5c35)
- fix async dp timeout (e6d824f)

gemini-code-assist Bot reviewed May 11, 2026

2 comment threads on swift/pipelines/infer/rollout.py

Jintao-Huang approved these changes May 11, 2026

hjh0119 merged commit cab4169 into modelscope:main May 11, 2026
2 of 3 checks passed

hjh0119 deleted the fix-vllm-dp branch May 11, 2026 06:54

hjh0119 mentioned this pull request May 11, 2026

Gemma4 Support FFT + Zero3 #9255 (closed)

Jintao-Huang pushed a commit that referenced this pull request May 17, 2026

fix vllm dp & reset_encoder_cache & fix vllm init with zero3 (#9295)

9f27c8e

hjh0119 merged 4 commits into modelscope:main from hjh0119:fix-vllm-dp

2 participants
