[Misc] Add CustomOp interface for device portability by WoosukKwon · Pull Request #5255 · vllm-project/vllm

WoosukKwon · 2024-06-04T17:27:21Z

Currently, the custom layers have two issues. First, they directly import _custom_ops, which is not supported on devices such as TPU and Gaudi. Second, they assume that the custom ops are implemented in the same way for all devices. To address these issues, this PR adds the CustomOp interface, an indirection layer that implements the device-specific forward methods. This allows the custom kernels to be lazily imported only for the associated device.

class CustomOp(nn.Module):

    def forward(self, *args, **kwargs):
        if not hasattr(self, "_forward_method"):
            self._forward_method = self.dispatch_forward()
        return self._forward_method(*args, **kwargs)

    def forward_native(self, *args, **kwargs):
        """PyTorch-native implementation of the forward method."""
        raise NotImplementedError

    def forward_cuda(self, *args, **kwargs):
        raise NotImplementedError

    def forward_hip(self, *args, **kwargs):
        ...

    def forward_xpu(self, *args, **kwargs):
        ...

    def forward_cpu(self, *args, **kwargs):
        ...

    def forward_tpu(self, *args, **kwargs):
        ...

    def forward_gaudi(self, *args, **kwargs):
        ...

    def dispatch_forward(self):
        if is_hip():
            return self.forward_hip
        elif is_cpu():
            return self.forward_cpu
        else:
            return self.forward_cuda
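
As an illustration of how a layer might plug into this interface, here is a minimal sketch that runs without PyTorch. It is not code from the PR: CustomOpSketch is a plain-Python stand-in for the nn.Module-based CustomOp, current_backend() is a hypothetical replacement for the is_hip()/is_cpu() checks, SiluSketch is an invented example layer, and the standard-library math module stands in for a device-specific kernel module. The CPU fallback to forward_native is also an assumption.

```python
import importlib
import math


def current_backend() -> str:
    """Hypothetical stand-in for the is_hip()/is_cpu() checks."""
    return "cuda"


class CustomOpSketch:
    """Plain-Python stand-in for the nn.Module-based CustomOp."""

    def forward(self, *args, **kwargs):
        # Resolve the device-specific method once, on the first call.
        if not hasattr(self, "_forward_method"):
            self._forward_method = self.dispatch_forward()
        return self._forward_method(*args, **kwargs)

    def forward_native(self, *args, **kwargs):
        raise NotImplementedError

    def forward_cuda(self, *args, **kwargs):
        raise NotImplementedError

    def forward_cpu(self, *args, **kwargs):
        # Assumption: the CPU path reuses the reference implementation.
        return self.forward_native(*args, **kwargs)

    def dispatch_forward(self):
        if current_backend() == "cpu":
            return self.forward_cpu
        return self.forward_cuda


class SiluSketch(CustomOpSketch):
    """Invented example layer: SiLU over a list of floats."""

    def forward_native(self, x):
        return [v / (1.0 + math.exp(-v)) for v in x]

    def forward_cuda(self, x):
        # Lazy import: the kernel module is loaded only when this path runs.
        # "math" stands in for a device-specific extension module.
        kernels = importlib.import_module("math")
        return [v / (1.0 + kernels.exp(-v)) for v in x]


op = SiluSketch()
result = op.forward([0.0, 1.0])
```

Because the import sits inside forward_cuda, a process that dispatches to another backend never attempts to load the kernel module.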

According to the benchmarks, the lazy import does not affect the performance:

$ python benchmarks/benchmark_latency.py --model JackFram/llama-68m
# main
Avg latency: 0.3685314357979223 seconds
# This PR
Avg latency: 0.3665724984719418 seconds

vllm/model_executor/layers/activation.py

vllm/model_executor/custom_op.py

comaniac · 2024-06-04T18:20:15Z

vllm/model_executor/custom_op.py

+    def dispatch_forward(self):
+        if is_hip():
+            return self.forward_hip
+        elif is_cpu():
+            return self.forward_cpu
+        else:
+            return self.forward_cuda


I feel we need more flexibility here in the future. For example, we may build a wheel with both CPU and CUDA enabled, but we want to configure which one to use on the fly. On the other hand, this may not be necessary at this moment.

Good point. For now, vLLM is bound to a specific backend at build time. I added a note that we do not currently support dynamic dispatching.

youkaichao · 2024-06-04T18:25:20Z

PyTorch has quite a lot of dispatching utilities; can we reuse some? forward usually has a special meaning in PyTorch, and it gets special handling in torch.compile. Having these new forward_xxx methods might break future torch.compile integration.

WoosukKwon · 2024-06-04T19:09:27Z

@youkaichao Good point. I moved the dispatching logic to __init__ so that it is not included in the scope of torch.compile. After this change, I believe the PR itself does not break torch.compile.
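
The change described in this reply can be sketched as follows. This is a simplified, PyTorch-free illustration rather than the merged code: CustomOpSketch, ScaleSketch, and current_backend() are hypothetical names, and the real class derives from nn.Module. The point is only that the branch on the device runs once in the constructor, so forward itself contains no dispatch logic.

```python
def current_backend() -> str:
    """Hypothetical stand-in for the is_hip()/is_cpu() checks."""
    return "cpu"


class CustomOpSketch:
    def __init__(self):
        # Dispatch once at construction time, outside of forward().
        self._forward_method = self.dispatch_forward()

    def forward(self, *args, **kwargs):
        # No branching here: just call the method chosen in __init__.
        return self._forward_method(*args, **kwargs)

    def forward_cuda(self, *args, **kwargs):
        raise NotImplementedError

    def forward_hip(self, *args, **kwargs):
        raise NotImplementedError

    def forward_cpu(self, *args, **kwargs):
        raise NotImplementedError

    def dispatch_forward(self):
        backend = current_backend()
        if backend == "hip":
            return self.forward_hip
        elif backend == "cpu":
            return self.forward_cpu
        else:
            return self.forward_cuda


class ScaleSketch(CustomOpSketch):
    """Invented example layer: multiply each element by a constant."""

    def __init__(self, factor: float):
        super().__init__()  # runs the one-time dispatch
        self.factor = factor

    def forward_cpu(self, x):
        return [self.factor * v for v in x]


op = ScaleSketch(2.0)
scaled = op.forward([1.0, 2.0])
```

Subclasses need to call super().__init__() for the dispatch to happen, which nn.Module subclasses already do by convention.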

comaniac

LGTM!

bnellnm · 2024-06-04T20:35:05Z

Hi @WoosukKwon, would you mind taking a look at #5047 before you land this? I've been working on registering all the custom operations via TORCH_LIBRARY (which also has per-device dispatching). I'm worried these changes might be at odds with TORCH_LIBRARY/PyTorch dispatching.

WoosukKwon · 2024-06-04T21:31:55Z

@bnellnm Thanks for bringing it up. If I understand correctly, this PR is orthogonal to yours. Basically, I believe your PR does NOT include per-device dispatching, because vLLM always builds the custom library for at most one device. Also, in our situation, dispatching can't be implemented at the C++ level, because we'd like to use Python libraries to implement some custom ops.

tlrmchlsmth

lgtm :)

WoosukKwon added 9 commits June 4, 2024 16:29

Add CustomOp Interface

e510c0d

Move activation

d9d43a6

Move layernorm

19bff1c

Move RoPE

8bff05a

Minor

af0d31e

Fix

a631e7f

Fix

a1486ff

Fix

e135eae

Merge branch 'main' into dispatcher

31e4930

WoosukKwon marked this pull request as draft June 4, 2024 17:30

WoosukKwon added 4 commits June 4, 2024 17:40

Revert model changes

16bab8e

move back

41b9a2a

forward_native

7986c0f

revert

24e11d2

WoosukKwon marked this pull request as ready for review June 4, 2024 17:49

WoosukKwon requested a review from comaniac June 4, 2024 17:50

comaniac reviewed Jun 4, 2024

WoosukKwon added 2 commits June 4, 2024 18:58

Move dispatch to offline

cdc62a2

Add note

d1182e7

comaniac approved these changes Jun 4, 2024

pcmoritz approved these changes Jun 4, 2024

tlrmchlsmth approved these changes Jun 4, 2024

WoosukKwon merged commit 41ca62c into main Jun 5, 2024

WoosukKwon deleted the dispatcher branch June 5, 2024 16:18

chengzhi-lu pushed a commit to chengzhi-lu/vllm that referenced this pull request Jun 6, 2024

[Misc] Add CustomOp interface for device portability (vllm-project#5255)

d3c19ba

robertgshaw2-redhat pushed a commit to neuralmagic/nm-vllm that referenced this pull request Jun 11, 2024

[Misc] Add CustomOp interface for device portability (vllm-project#5255)

f5d9197

joerunde pushed a commit to joerunde/vllm that referenced this pull request Jun 17, 2024

[Misc] Add CustomOp interface for device portability (vllm-project#5255)

014e9bc

xjpang pushed a commit to xjpang/vllm that referenced this pull request Jun 27, 2024

[Misc] Add CustomOp interface for device portability (vllm-project#5255)

0c081a8

xjpang pushed a commit to xjpang/vllm that referenced this pull request Jul 8, 2024

[Misc] Add CustomOp interface for device portability (vllm-project#5255)

5b14fa1

xjpang pushed a commit to xjpang/vllm that referenced this pull request Jul 24, 2024

[Misc] Add CustomOp interface for device portability (vllm-project#5255)

4df85a9

whx-sjtu mentioned this pull request Aug 28, 2025

[RFC]: Design a new Layer-Pluggable abstraction to work together with CustomOp #23786

Closed

1 task

WoosukKwon merged 15 commits into main from dispatcher
