[Misc] Add CustomOp interface for device portability #5255
    def dispatch_forward(self):
        if is_hip():
            return self.forward_hip
        elif is_cpu():
            return self.forward_cpu
        else:
            return self.forward_cuda
I feel we need more flexibility here in the future. For example, we may build a wheel with both CPU and CUDA enabled, but we want to configure which one to use on the fly. On the other hand, this may not be necessary at this moment.
Good point. For now, vLLM is bound to a specific backend at build time. I added a note that we do not currently support dynamic dispatching.
PyTorch has quite a lot of dispatching utilities. Can we reuse some?
@youkaichao Good point. I moved the dispatching logic to
Hi @WoosukKwon, would you mind taking a look at #5047 before you land this? I've been working on registering all the custom operations via
@bnellnm Thanks for bringing it up. If I understand correctly, this PR is orthogonal to yours. Basically, I believe your PR does NOT include per-device dispatching, because vLLM always builds the custom library for at most one device. Also, in our situation, dispatching can't be implemented at the C++ level, because we'd like to use Python libraries to implement some custom ops.
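For readers following the thread, the Python-level, per-device dispatch discussed here can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: `is_hip()` and `is_cpu()` are hard-coded stand-ins for the build-time platform checks, `Scale` is a toy op invented for the example, and `hypothetical_cuda_kernels` is a made-up module name used only to show where a lazy import would sit.

```python
import importlib


def is_hip() -> bool:
    # Hypothetical stand-in for the build-time ROCm check.
    return False


def is_cpu() -> bool:
    # Hypothetical stand-in; hard-coded so the sketch runs anywhere.
    return True


class CustomOp:
    """Sketch of a per-device indirection layer (an assumption, not the PR's class)."""

    def __init__(self):
        # Resolve the device-specific method once, at construction time.
        self._forward_method = self.dispatch_forward()

    def forward(self, *args, **kwargs):
        return self._forward_method(*args, **kwargs)

    def forward_native(self, *args, **kwargs):
        # Pure-Python reference implementation, supplied by subclasses.
        raise NotImplementedError

    def forward_cuda(self, *args, **kwargs):
        raise NotImplementedError

    def forward_hip(self, *args, **kwargs):
        # Assumption for this sketch: HIP reuses the CUDA path by default.
        return self.forward_cuda(*args, **kwargs)

    def forward_cpu(self, *args, **kwargs):
        # Assumption for this sketch: CPU falls back to the native path.
        return self.forward_native(*args, **kwargs)

    def dispatch_forward(self):
        # Static dispatch: the backend is fixed for the whole process.
        if is_hip():
            return self.forward_hip
        elif is_cpu():
            return self.forward_cpu
        else:
            return self.forward_cuda


class Scale(CustomOp):
    """Toy op: multiply every element by a constant factor."""

    def __init__(self, factor):
        super().__init__()
        self.factor = factor

    def forward_native(self, xs):
        return [x * self.factor for x in xs]

    def forward_cuda(self, xs):
        # Lazy import: the kernel module is loaded only if this path runs.
        kernels = importlib.import_module("hypothetical_cuda_kernels")
        return kernels.scale(xs, self.factor)


op = Scale(2.0)
result = op.forward([1, 2])
```

Because the import sits inside `forward_cuda`, a process that dispatches to the CPU path never touches the CUDA kernel module, which is the property the thread relies on for devices that cannot load it.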
Currently, the custom layers have two issues. First, they directly import `_custom_ops`, which are not supported for devices such as TPU and Gaudi. Second, they assume that the custom ops are implemented in the same way for all devices. To address these issues, the PR adds the `CustomOp` interface, an indirection layer that implements the device-specific `forward` methods. This allows the custom kernels to be lazily imported only for the associated device.

According to the benchmarks, the lazy import does not affect performance: