[Performance]: GPU utilization is low when running large batches on H100 #6560
Comments
Definitely, it is listed in #5805.
@sleepwalker2017 By the way, could you please share the command you use when profiling vLLM with Nsight?
The command is nothing special, I think it's only:
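The exact command was not preserved above. For reference, a typical Nsight Systems invocation for profiling a standalone vLLM script might look like the following sketch; the script name and the choice of trace domains are illustrative, not taken from the thread:

```shell
# Trace CUDA kernels and NVTX ranges while running an offline inference script
# (offline_inference.py is a placeholder for whatever script is being profiled)
nsys profile -t cuda,nvtx -o vllm_report --force-overwrite true \
    python offline_inference.py
```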
Thanks for your quick reply. I see: it seems you were using nsys to profile a single .py script. I thought you were benchmarking the server.
For benchmarking the server, I will add NVTX markers to delimit the profiled region, just before and after the .step call.
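A minimal sketch of how such NVTX markers could be wrapped around the engine step. The `engine.step()` call and the no-op fallback for machines without working NVTX bindings are assumptions for illustration, not the thread author's actual code:

```python
import contextlib

try:
    # PyTorch's NVTX bindings; a probe push/pop verifies they actually work
    from torch.cuda import nvtx as _nvtx
    _nvtx.range_push("probe")
    _nvtx.range_pop()
except Exception:
    _nvtx = None  # fall back to no-op markers on machines without NVTX


@contextlib.contextmanager
def nvtx_range(name):
    """Wrap a code region in an NVTX range so it shows up in Nsight Systems."""
    if _nvtx is not None:
        _nvtx.range_push(name)
    try:
        yield
    finally:
        if _nvtx is not None:
            _nvtx.range_pop()


def run_one_step(engine):
    # Hypothetical serving loop body: mark each engine step so the scheduler
    # and post-processing time between steps is visible in the timeline.
    with nvtx_range("llm_engine.step"):
        return engine.step()
```

With markers like this, the gaps between successive `llm_engine.step` ranges in the Nsight timeline directly expose how much wall time goes to scheduling and post-processing rather than GPU work.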
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
Proposal to improve performance
Hi all, I'm running Vicuna 13B on an H100 using FP8, and I find that when the batch size is large, say 64 or 96, GPU utilization is low, around 60%. This is an important cause of the low performance.
I did some analysis; part of this is caused by the scheduling and post-processing of requests.
Do you have any plans for improving this?
Report of performance regression
No response
Misc discussion on performance
No response
Your current environment (if you think it is necessary)