[core][torch.compile] not compile for profiling #7796
Conversation
My measurement on Gemma-2B shows this can reduce the Dynamo overhead by about 0.1~0.2 ms.
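For context, one minimal way to take this kind of measurement (illustrative only; `run_step` is a hypothetical stand-in for a single decode iteration, not a vLLM API):

```python
import time

import torch


def average_step_time(run_step, warmup: int = 10, iters: int = 100) -> float:
    """Average wall-clock time of one decode step, in seconds.

    `run_step` is a hypothetical zero-argument callable that executes a
    single decode iteration; the real vLLM entry point differs.
    """
    for _ in range(warmup):
        run_step()
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        run_step()
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters
```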
Thanks for the PR! Left minor comments.
@youkaichao On Gemma-2B, this PR reduced the KV cache space from 34552 to 25752, a 25% drop.
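The drop follows from how the cache is sized: vLLM runs one profiling forward pass and allocates the GPU memory left over to the KV cache, so a profiling run whose memory footprint differs from the real execution path shifts the result. A simplified sketch of that sizing logic (names and the 0.9 utilization default are illustrative, not vLLM's actual code):

```python
import torch


def estimate_kv_cache_blocks(profile_run, block_bytes: int,
                             gpu_memory_utilization: float = 0.9) -> int:
    """Sketch: size the KV cache from memory left after a profiling pass.

    `profile_run` is a hypothetical callable that executes the one-off
    profiling forward pass with a maximal dummy batch.
    """
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    profile_run()
    torch.cuda.synchronize()
    peak_bytes = torch.cuda.max_memory_allocated()
    total_bytes = torch.cuda.get_device_properties(0).total_memory
    free_for_cache = total_bytes * gpu_memory_utilization - peak_bytes
    # If the profiling pass is not compiled, peak_bytes can differ from
    # the compiled execution path, changing the block count computed here.
    return int(free_for_cache // block_bytes)
```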
@youkaichao Can we do this instead?

```python
# Compiled model only used for initial memory profiling.
tmp_for_profile = torch.compile(model, ...)
# Compiled model used for actual execution.
self.model = torch.compile(model, ...)
```
You mean you also want the compilation for the profiling stage?
Yeah, while I'm not sure, there's a chance that not using torch.compile for the profiling run makes the profiled memory usage differ from the compiled execution path.
@WoosukKwon how do you like the current implementation? Compilation and optimization still happen for the profiling run, but the result is discarded and does not affect later runs.
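A self-contained sketch of that pattern on a toy module (illustrative only; the actual PR wires this into the vLLM worker, and `torch._dynamo.reset()` here stands in for however the guards are actually dropped):

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 16)
x = torch.randn(8, 16)

# Profiling run: compile, so memory use matches the optimized path.
compiled_for_profile = torch.compile(model)
compiled_for_profile(x)

# Discard that compilation so its guards and code cache do not affect
# later runs.
del compiled_for_profile
torch._dynamo.reset()

# Fresh compilation for actual execution.
compiled_model = torch.compile(model)
compiled_model(x)
```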
@youkaichao Looks very good to me! Thanks for the quick fix! |
For Gemma-2B, I didn't see much performance difference: the latency decreased from 1.59 s to 1.58 s (batch size 8, input len 1024, output len 128).
I'm measuring the time spent in the decode run, and I see a clear overhead reduction. Previously, it took 8 ms for every step; now it takes only 7.5~7.6 ms.
@youkaichao I see. How does it work without recompilation?
Yes, you already marked the shapes as symbolic, and all shapes match the symbolic shapes, so no recompilation is triggered.
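For illustration, this is the kind of behavior being described, using `torch._dynamo.mark_dynamic` on a toy module (not the vLLM code itself):

```python
import torch
import torch.nn as nn

model = torch.compile(nn.Linear(16, 16))

x = torch.randn(8, 16)
torch._dynamo.mark_dynamic(x, 0)  # mark the batch dimension as symbolic
model(x)  # compiled once, with a symbolic batch size

# A different batch size still matches the symbolic shape, so this call
# reuses the existing compiled code instead of recompiling.
model(torch.randn(32, 16))
```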
We only profile once, to determine the space for the KV cache, and never run the profiling pass again. Compiling this run only adds guards and code-cache entries for code that is never reused. Removing this compilation reduces the overhead of Dynamo.
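In toy form, the idea the description proposes (before the review discussion above led to compiling a temporary copy instead) could look like this; the module is illustrative, not vLLM code:

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 16)

# One-off profiling pass: plain eager execution, so Dynamo records no
# guards or code-cache entries for a code path that never runs again.
model(torch.randn(8, 16))

# Only the model used for the actual decode loop is compiled.
compiled_model = torch.compile(model)
compiled_model(torch.randn(8, 16))
```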