[Model] enable data parallel for Llama4 vision encoder #18368
houseroad merged 5 commits into vllm-project:main
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
sarckk left a comment
This looks good to me overall. Could you also add MM eval results with DP and TP?
cc @houseroad
ywang96 left a comment
Sorry for the delayed review, and thank you for the contribution! Overall this looks good; I left some comments!
This is great! As a follow-up, I think it makes sense to rewrite this into a separate function to be shared by other models, since this is not specific to the mllama4 vision encoder in particular!
Maybe create an issue to follow up?
Force-pushed from 70aacd3 to ac307ed
vllm/multimodal/utils.py (Outdated)
Wondering if we need a `clone()` here?
vllm/multimodal/utils.py (Outdated)
We can add some unit tests for this function. (Good for a follow-up PR.)
nit: we can add some sanity checks to ensure _consolidate_qkv_weights operates appropriately.
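For illustration, a minimal sketch of what such a QKV consolidation helper with sanity checks could look like. This is a hypothetical standalone version, not vLLM's actual `_consolidate_qkv_weights`; the function name, argument layout, and checks here are assumptions for the sketch:

```python
import torch


def consolidate_qkv_weights(q: torch.Tensor, k: torch.Tensor,
                            v: torch.Tensor, hidden_size: int) -> torch.Tensor:
    """Hypothetical sketch: fuse separate q/k/v projection weights into a
    single qkv weight tensor, with the sanity checks suggested above."""
    # Each projection weight must be 2-D and share the input dimension.
    for name, w in (("q", q), ("k", k), ("v", v)):
        assert w.dim() == 2, f"{name} weight must be 2-D, got {tuple(w.shape)}"
        assert w.shape[1] == hidden_size, (
            f"{name} weight input dim {w.shape[1]} != hidden_size {hidden_size}"
        )
    # Stack along the output dimension: [q_out + k_out + v_out, hidden_size].
    qkv = torch.cat([q, k, v], dim=0)
    assert qkv.shape == (q.shape[0] + k.shape[0] + v.shape[0], hidden_size)
    return qkv
```

The shape assertions catch the most common loading bug (weights transposed or sharded along the wrong axis) before the fused tensor is handed to the attention layer.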
@ywang96, could you give another pass?
ywang96 left a comment
Please address Lu's comment otherwise LGTM!
Force-pushed from 4a0e16e to e2d3ee5
Signed-off-by: yzhen <yzhen@devgpu093.cco2.facebook.com>
Summary: Add unit test for run_dp_sharded_vision_model, following up on vllm-project#18368.
pytest tests/multimodal/test_utils.py -k "test_run_dp_sharded_vision_model"
3 passed, 44 deselected, 5 warnings in 37.76s
Signed-off-by: Siqi Yan <siqi@meta.com>
…18368) Signed-off-by: yzhen <yzhen@devgpu093.cco2.facebook.com> Co-authored-by: yZhen <yZhen@fb.com> Co-authored-by: yzhen <yzhen@devgpu093.cco2.facebook.com>
Summary:
The Llama4 vision encoder in DP8 is ~3x as fast as in TP8, especially when handling a large number of input images (e.g. 9 images per request).
This PR adds an enable_vision_encoder_data_parallel option to allow using different parallelism for the vision model and the language model.
Unit test:
pytest tests/models/multimodal/generation/test_common.py -k "llama4"
MM Eval:
Baseline TP8
DP8
Perf result:
DP8
Baseline TP8
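The core idea behind the speedup above is sharding the *batch of images* across ranks instead of sharding the encoder *weights*. A simplified single-process sketch of that scatter/encode/gather pattern (not vLLM's actual run_dp_sharded_vision_model, which uses real collective ops; the function name and loop structure here are assumptions for illustration):

```python
import torch


def run_dp_sharded_vision_sketch(vision_model, images: torch.Tensor,
                                 world_size: int) -> torch.Tensor:
    """Hypothetical single-process sketch of data-parallel vision encoding:
    each simulated "rank" encodes a contiguous slice of the image batch,
    and the per-rank outputs are gathered back in order."""
    num_images = images.shape[0]
    # Ceil-divide so every image is assigned to exactly one rank.
    per_rank = (num_images + world_size - 1) // world_size
    outputs = []
    for rank in range(world_size):
        shard = images[rank * per_rank:(rank + 1) * per_rank]
        if shard.shape[0] > 0:  # trailing ranks may receive no images
            outputs.append(vision_model(shard))
    # In a real implementation this concatenation is an all-gather.
    return torch.cat(outputs, dim=0)
```

With many images per request (e.g. 9), each rank runs the full (unsharded) encoder on only a couple of images, avoiding the per-layer communication that tensor parallelism incurs, which matches the ~3x DP8-vs-TP8 result reported above.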