
[Kernel] Enhance MoE benchmarking & tuning script #4921

Merged
WoosukKwon merged 17 commits into main from bench-moe on Jun 4, 2024
Conversation

@WoosukKwon (Collaborator) commented May 20, 2024

This PR enhances the MoE tuning & benchmarking script, which is a bit hacky at the moment. It also enables using multiple GPUs for benchmarking via Ray.
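The tuning loop behind such a script can be sketched as follows: enumerate candidate kernel configurations, time each one, and keep the fastest. The PR distributes this work across GPUs with Ray; the sketch below uses the stdlib `concurrent.futures` thread pool as a stand-in, and a dummy timing function in place of actually launching Triton MoE kernels. All names (`benchmark_config`, `tune`, the block-size parameters) are illustrative assumptions, not the script's real API.

```python
# Hypothetical sketch of a kernel-tuning loop. The real script times
# Triton MoE kernels on GPUs via Ray workers; here a thread pool and a
# fake latency function stand in for both.
import itertools
from concurrent.futures import ThreadPoolExecutor

def benchmark_config(config):
    """Dummy stand-in for timing one kernel config.

    Returns a fake latency; real code would launch the kernel and
    measure it with CUDA events.
    """
    bm, bn, bk, warps = config
    # Pretend 64x64x32 tiles with 4 warps are fastest (illustrative only).
    return abs(bm - 64) + abs(bn - 64) + abs(bk - 32) + abs(warps - 4)

def tune(search_space, workers=4):
    """Time every config in parallel and return the fastest one."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        latencies = list(pool.map(benchmark_config, search_space))
    _, best_config = min(zip(latencies, search_space))
    return best_config

if __name__ == "__main__":
    space = list(itertools.product([16, 32, 64],   # BLOCK_SIZE_M
                                   [32, 64, 128],  # BLOCK_SIZE_N
                                   [32, 64],       # BLOCK_SIZE_K
                                   [4, 8]))        # num_warps
    print(tune(space))  # -> (64, 64, 32, 4)
```

In the real script each Ray worker owns one GPU and tunes a slice of the batch-size space, which is what makes multi-GPU tuning worthwhile: the configs are independent, so the search parallelizes trivially.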

@pcmoritz pcmoritz self-assigned this May 20, 2024
@WoosukKwon (Collaborator, Author)

@pcmoritz The PR is not ready yet. I will ping you once it's ready.

@pcmoritz (Collaborator)

Sounds great, thank you :)

@WoosukKwon WoosukKwon marked this pull request as ready for review June 3, 2024 18:22
@WoosukKwon (Collaborator, Author)

@pcmoritz This PR is ready now. Sorry for the delay.

@pcmoritz (Collaborator) commented Jun 3, 2024

One small gotcha I ran into while trying this out: currently fp8 can't be benchmarked with an FP16 checkpoint, e.g.

python benchmark_moe.py --dtype fp8

errors out, since mistralai/Mixtral-8x7B-Instruct-v0.1 is FP16. I think what we should do here is:

diff --git a/benchmarks/kernels/benchmark_moe.py b/benchmarks/kernels/benchmark_moe.py
index 6796ea401..3f3005e20 100644
--- a/benchmarks/kernels/benchmark_moe.py
+++ b/benchmarks/kernels/benchmark_moe.py
@@ -46,6 +46,8 @@ def benchmark_config(
         w2_scale = torch.randn(num_experts, dtype=torch.float32)
         a1_scale = torch.randn(1, dtype=torch.float32)
         a2_scale = torch.randn(1, dtype=torch.float32)
+        w1 = w1.to(torch.float8_e4m3fn)
+        w2 = w2.to(torch.float8_e4m3fn)
 
     input_gating = torch.empty(num_tokens, num_experts, dtype=torch.float32)
 

since FP8 checkpoints are not widely available yet and also for vLLM FP8 we support running FP16 checkpoints in FP8 :)
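The cast in the diff works because the fp8 path quantizes FP16 weights on the fly with a per-tensor scale. A minimal pure-Python simulation of that scheme is below; it only models the scale computation and clamping (448.0 is the largest finite `float8_e4m3fn` value), while real code would cast with torch's float8 dtype as in the diff. The helper names are assumptions for illustration.

```python
# Hypothetical simulation of per-tensor fp8 (e4m3fn) quantization.
# Real code uses torch.float8_e4m3fn; this sketch only models the
# per-tensor scale and clamping, not fp8 rounding.
FP8_E4M3_MAX = 448.0  # largest finite float8_e4m3fn value

def quantize_per_tensor(weights):
    """Scale weights into the fp8 range and clamp; return (q, scale)."""
    scale = max(abs(w) for w in weights) / FP8_E4M3_MAX
    q = [max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, w / scale)) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [x * scale for x in q]

w = [-2.0, 0.5, 4.0]
q, s = quantize_per_tensor(w)
out = dequantize(q, s)
# Round-trips to within float error here, since no fp8 rounding is modeled.
assert all(abs(a - b) < 1e-9 for a, b in zip(out, w))
```

This is also why a random FP16 tensor can serve as a stand-in for a "real" FP8 checkpoint in the benchmark: the kernel only needs weights in the fp8 dtype plus the scales, both of which can be derived on the fly.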

@WoosukKwon (Collaborator, Author)

@pcmoritz I addressed your comments. PTAL.

@WoosukKwon WoosukKwon requested a review from pcmoritz June 4, 2024 02:47
@pcmoritz (Collaborator) left a comment

Thanks! I've been using the new script to do some tuning for FP8, and it works like a charm. Thanks a lot for improving it! I'll open a PR with the new configs shortly, once I have tested them.

Btw, in order to get progress bars, I've been using this modification:

from ray.experimental.tqdm_ray import tqdm

and then where we iterate over the configs:

for config in tqdm(search_space):

This prints progress bars without messing up stdout; it works as described here: https://docs.ray.io/en/latest/ray-observability/user-guides/configure-logging.html#distributed-progress-bars-tqdm

Feel free to add it (don't worry that it is currently in the experimental namespace; I think it is one of the APIs that should be stabilized, and I'll look into that).

@WoosukKwon (Collaborator, Author)

@pcmoritz Ray tqdm is really cool! I actually wanted exactly this feature. Happy to add it!

@WoosukKwon WoosukKwon merged commit 3a434b0 into main Jun 4, 2024
@WoosukKwon WoosukKwon deleted the bench-moe branch June 4, 2024 03:07
chengzhi-lu pushed a commit to chengzhi-lu/vllm-infersche that referenced this pull request Jun 6, 2024
robertgshaw2-redhat pushed a commit to neuralmagic/nm-vllm that referenced this pull request Jun 11, 2024
pcmoritz pushed a commit that referenced this pull request Jun 13, 2024
Tune Qwen2-57B-A14B configs based on #4921

Throughput Performance
command: python benchmarks/benchmark_throughput.py --model=Qwen/Qwen2-57B-A14B-Instruct --input-len 1000 --output-len 50 -tp 2

A100 GPU

benchmark | no config                            | w/ PR
tp=2      | 10.53 requests/s, 11058.17 tokens/s  | 12.47 requests/s, 13088.57 tokens/s
tp=4      | 17.77 requests/s, 18662.95 tokens/s  | 20.20 requests/s, 21212.32 tokens/s
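The two throughput columns are internally consistent: tokens/s is just requests/s times the tokens per request (1000 input + 50 output = 1050). A quick check of the table's numbers, and of the speedup the tuned configs deliver, assuming that decomposition:

```python
# Sanity-check the throughput table: tokens/s should equal
# requests/s * (input_len + output_len) = requests/s * 1050.
tokens_per_request = 1000 + 50  # --input-len 1000 --output-len 50

rows = {
    # tp: (req/s no config, tok/s no config, req/s w/ PR, tok/s w/ PR)
    2: (10.53, 11058.17, 12.47, 13088.57),
    4: (17.77, 18662.95, 20.20, 21212.32),
}

for tp, (r0, t0, r1, t1) in rows.items():
    # Reported tokens/s match req/s * 1050 to within rounding (<0.1%).
    assert abs(r0 * tokens_per_request - t0) / t0 < 1e-3
    assert abs(r1 * tokens_per_request - t1) / t1 < 1e-3
    speedup = r1 / r0 - 1
    print(f"tp={tp}: {speedup:.1%} higher throughput with tuned configs")
# tp=2: ~18.4% faster; tp=4: ~13.7% faster
```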
robertgshaw2-redhat pushed a commit to neuralmagic/nm-vllm that referenced this pull request Jun 16, 2024
joerunde pushed a commit to joerunde/vllm that referenced this pull request Jun 17, 2024
xjpang pushed a commit to xjpang/vllm that referenced this pull request Jun 27, 2024
xjpang pushed a commit to xjpang/vllm that referenced this pull request Jul 8, 2024
xjpang pushed a commit to xjpang/vllm that referenced this pull request Jul 24, 2024