[Model] enable data parallel for Llama4 vision encoder #18368
houseroad merged 5 commits into vllm-project:main
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
sarckk left a comment
This looks good to me overall. Could you also add MM eval results with DP and TP?
cc @houseroad
ywang96 left a comment
Sorry for the delayed review, and thank you for the contribution! Overall this looks good; I left some comments!
This is great! As a follow-up, I think it makes sense to rewrite this into a separate function to be shared by other models, since this is not specific to the mllama4 vision encoder in particular!
Maybe create an issue to follow up?
Force-pushed from 70aacd3 to ac307ed
vllm/multimodal/utils.py (Outdated)
Wondering if we need a `clone()` here?
vllm/multimodal/utils.py (Outdated)
We can add some unit tests for this function. (Good for a follow-up PR.)
nit: we can add some sanity checks to ensure _consolidate_qkv_weights operates appropriately.
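For illustration, a minimal sketch of what such a QKV consolidation helper with sanity checks could look like. This is a hypothetical standalone version, not vLLM's actual `_consolidate_qkv_weights`; the function name, argument layout, and checks here are assumptions for the sketch:

```python
import torch


def consolidate_qkv_weights(q: torch.Tensor, k: torch.Tensor,
                            v: torch.Tensor, hidden_size: int) -> torch.Tensor:
    """Hypothetical sketch: fuse separate q/k/v projection weights into a
    single qkv weight tensor, with the sanity checks suggested above."""
    # Each projection weight must be 2-D and share the input dimension.
    for name, w in (("q", q), ("k", k), ("v", v)):
        assert w.dim() == 2, f"{name} weight must be 2-D, got {tuple(w.shape)}"
        assert w.shape[1] == hidden_size, (
            f"{name} weight input dim {w.shape[1]} != hidden_size {hidden_size}"
        )
    # Stack along the output dimension: [q_out + k_out + v_out, hidden_size].
    qkv = torch.cat([q, k, v], dim=0)
    assert qkv.shape == (q.shape[0] + k.shape[0] + v.shape[0], hidden_size)
    return qkv
```

The shape assertions catch the most common loading bug (weights transposed or sharded along the wrong axis) before the fused tensor is handed to the attention layer.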
@ywang96, could you give another pass?
ywang96 left a comment
Please address Lu's comment otherwise LGTM!
Force-pushed from 4a0e16e to e2d3ee5
Signed-off-by: yzhen <yzhen@devgpu093.cco2.facebook.com>
Summary: Add unit test for run_dp_sharded_vision_model, following up on vllm-project#18368.
pytest tests/multimodal/test_utils.py -k "test_run_dp_sharded_vision_model"
3 passed, 44 deselected, 5 warnings in 37.76s
Signed-off-by: Siqi Yan <siqi@meta.com>
…18368) Signed-off-by: yzhen <yzhen@devgpu093.cco2.facebook.com> Co-authored-by: yZhen <yZhen@fb.com> Co-authored-by: yzhen <yzhen@devgpu093.cco2.facebook.com>
Summary:
The Llama4 vision encoder in DP8 is ~3x as fast as in TP8, especially when handling a large number of input images (e.g. 9 images per request).
This PR adds an enable_vision_encoder_data_parallel option to allow using different parallelism for the vision model and the language model.
Unit test:
pytest tests/models/multimodal/generation/test_common.py -k "llama4"
MM Eval:
Baseline TP8
DP8
Perf result:
DP8
Baseline TP8
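The core idea behind the speedup above is sharding the *batch of images* across ranks instead of sharding the encoder *weights*. A simplified single-process sketch of that scatter/encode/gather pattern (not vLLM's actual run_dp_sharded_vision_model, which uses real collective ops; the function name and loop structure here are assumptions for illustration):

```python
import torch


def run_dp_sharded_vision_sketch(vision_model, images: torch.Tensor,
                                 world_size: int) -> torch.Tensor:
    """Hypothetical single-process sketch of data-parallel vision encoding:
    each simulated "rank" encodes a contiguous slice of the image batch,
    and the per-rank outputs are gathered back in order."""
    num_images = images.shape[0]
    # Ceil-divide so every image is assigned to exactly one rank.
    per_rank = (num_images + world_size - 1) // world_size
    outputs = []
    for rank in range(world_size):
        shard = images[rank * per_rank:(rank + 1) * per_rank]
        if shard.shape[0] > 0:  # trailing ranks may receive no images
            outputs.append(vision_model(shard))
    # In a real implementation this concatenation is an all-gather.
    return torch.cat(outputs, dim=0)
```

With many images per request (e.g. 9), each rank runs the full (unsharded) encoder on only a couple of images, avoiding the per-layer communication that tensor parallelism incurs, which matches the ~3x DP8-vs-TP8 result reported above.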