[DISCUSS] Common DLPack Harness for Web #124

tqchen · 2023-03-31T15:24:44Z

We have circulated the ideas a bit and I think it might be worthwhile kicking off some discussions.

Background and Motivation

DLPack has enabled exchange among many frameworks in the Python ecosystem across a broad set of platforms. One emerging need that would be very nice to address is support for the Web platform.
Specifically, with the emergence of WebGPU, it would be really nice for frameworks that run in the browser to be able to exchange data at zero cost. The primary use case would be WebGPU, but it can also include WebAssembly memory. For example, ideally a WebGPU NDArray/Tensor backed by framework A could be used by framework B in a zero-copy fashion.
Borrowing from past experience with DLPack, we find that some initial discussion among a broad set of stakeholders helps to form a minimal but sufficient foundation that frameworks can share. So we would like to start an initial kickoff discussion and welcome everyone to chime in here before we settle on any concrete actions.
This thread aims to be an initial kickoff discussion to learn people's preferences and what kind of minimal common format makes sense for the web environment. Based on our past lessons, with sufficient input and prior knowledge, something in common will likely emerge as a minimal piece that frameworks can share and reuse. The DLPack header is one such example.

Some Initial Technical Considerations

We list some of the initial design considerations.

Reuse of the data structure

DLPack already comes with a WebGPU device type flag, and the overall layout can be used for array exchange. One potential starting point is to replicate most of the array structures in JavaScript, which allows effective exchange among frameworks. Following the lessons from DLPack's success, it is important that the exchanged tensor carries a deleter, which allows each framework to define its own memory pool and its own way of recycling memory.
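To make the deleter point concrete, here is a minimal sketch of what a JavaScript-side exchange tensor could look like. The names (`WebDLManagedTensor`, `PooledProducer`, `exportTensor`) are hypothetical and not part of DLPack; the fields simply mirror DLPack's `DLTensor`, and the buffer type is left generic so the sketch runs without a WebGPU device.

```typescript
// Hypothetical JavaScript mirror of DLPack's managed tensor; not an agreed API.
interface WebDLDevice {
  deviceType: number; // DLPack device type code (15 is kDLWebGPU in dlpack.h)
  deviceId: number;
}

interface WebDLDataType {
  code: number; // DLPack type code (2 is kDLFloat)
  bits: number;
  lanes: number;
}

interface WebDLManagedTensor<Buf> {
  data: Buf; // would be a GPUBuffer for WebGPU tensors
  device: WebDLDevice;
  dtype: WebDLDataType;
  shape: number[];
  strides: number[] | null; // null means compact row-major
  byteOffset: number;
  deleter: () => void; // called by the consumer once it is done with the tensor
}

// Toy producer that recycles buffers through its own pool.
class PooledProducer<Buf> {
  private free: Buf[] = [];
  private create: () => Buf;

  constructor(create: () => Buf) {
    this.create = create;
  }

  get pooledCount(): number {
    return this.free.length;
  }

  exportTensor(shape: number[]): WebDLManagedTensor<Buf> {
    const buffer = this.free.pop() ?? this.create();
    let released = false;
    return {
      data: buffer,
      device: { deviceType: 15, deviceId: 0 },
      dtype: { code: 2, bits: 32, lanes: 1 },
      shape,
      strides: null,
      byteOffset: 0,
      deleter: () => {
        if (released) return; // tolerate a double call in this sketch
        released = true;
        this.free.push(buffer); // hand the buffer back to the producer's pool
      },
    };
  }
}
```

The consumer never needs to know how the producer manages memory; it only calls `deleter()` when it no longer needs the tensor, and the producer decides whether to pool or destroy the buffer.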

Minimum WebGPU Harness Among Frameworks

Unlike CUDA, where the NVIDIA driver defines common global devices across the environment that we can simply refer to as "cuda:0" and "cuda:1", there is no standard global default WebGPU device. Each application has to use its own adapter to create a WebGPU device. While this gives applications flexibility, it prevents sharing among them: the WebGPU device from app0 can be different from the one from app1.

To enable sharing, we need to ensure that "webgpu:0" from framework A is the same device as "webgpu:0" from framework B, which calls for something similar to the common CUDA runtime layer.

To resolve this problem, we need a minimal harness across frameworks that creates a common WebGPU device. Of course, this means that frameworks would need to depend on this piece of code (e.g. possibly as a webdlpack package), but the intention is to keep it minimal, so that frameworks can, for example, still have their own memory pool and runtime mechanism internally if necessary.

Here is an initial strawman, just to demonstrate the idea:

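// Common context shared across frameworks.
// Open question: find a way for frameworks to obtain a global singleton in the environment.
class WebDLPackContext {
  private devices: GPUDevice[] = [];

  // Common setup logic to request a WebGPU device; cfg is assumed to be a GPUDeviceDescriptor.
  async setup(cfg?: GPUDeviceDescriptor): Promise<void> {
    const adapter = await navigator.gpu.requestAdapter();
    if (adapter === null) throw new Error("WebGPU is not available");
    this.devices.push(await adapter.requestDevice(cfg));
  }

  // Called by frameworks to get the default device by ID.
  getWebGPUDevice(device_id: number): GPUDevice {
    const device = this.devices[device_id];
    if (device === undefined) throw new Error(`webgpu:${device_id} has not been set up`);
    return device;
  }
}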
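One possible answer to the "global singleton" question is to store the context on `globalThis` under a key from the global symbol registry, so that separately bundled copies of the package still resolve to the same instance within one JavaScript global scope. The sketch below is only an illustration of that idea: the key name `"webdlpack.context"`, `SharedDeviceContext`, and `getSharedDeviceContext` are hypothetical, and the device type is generic so the code runs without WebGPU.

```typescript
// Hypothetical registry key; a real package would need to agree on the name.
const WEBDLPACK_CONTEXT_KEY = Symbol.for("webdlpack.context");

// Device is generic here; in a browser it would be GPUDevice.
class SharedDeviceContext<Device> {
  private devices = new Map<number, Device>();

  registerDevice(deviceId: number, device: Device): void {
    if (this.devices.has(deviceId)) {
      throw new Error(`webgpu:${deviceId} is already registered`);
    }
    this.devices.set(deviceId, device);
  }

  getDevice(deviceId: number): Device {
    const device = this.devices.get(deviceId);
    if (device === undefined) {
      throw new Error(`webgpu:${deviceId} is not registered`);
    }
    return device;
  }
}

// Returns the one context stored on globalThis, creating it on first use.
function getSharedDeviceContext<Device>(): SharedDeviceContext<Device> {
  const existing: unknown = Reflect.get(globalThis, WEBDLPACK_CONTEXT_KEY);
  if (existing !== undefined) {
    return existing as SharedDeviceContext<Device>;
  }
  const created = new SharedDeviceContext<Device>();
  Reflect.set(globalThis, WEBDLPACK_CONTEXT_KEY, created);
  return created;
}
```

With this shape, framework A and framework B would both call `getSharedDeviceContext()` and get the same object, so "webgpu:0" means the same device to both. Whether a shared global is acceptable, compared with explicit injection of the context, is part of what this thread should discuss.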

WebAssembly Compatibility

Some frameworks might need to compile through WASM, which means a common C ABI compatible memory layout would be useful. Luckily, DLPack already provides that. The main follow-up question is how to make sure frameworks agree on a common WebGPU harness. One of the main missing pieces is pointer translation support for WASM buffers.
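As an illustration of what "C ABI compatible layout" means on the JavaScript side, the sketch below reads a `DLTensor` out of WASM linear memory with a `DataView`. The byte offsets are an assumption: they are what I would expect for a wasm32 target (4-byte pointers, `uint64_t` aligned to 8 bytes) and should be checked against the actual toolchain. `readDLTensor` and `DLTensorView` are hypothetical names.

```typescript
// ASSUMED wasm32 byte offsets of the DLTensor fields declared in dlpack.h.
const DLTENSOR_WASM32 = {
  data: 0, // void* data
  deviceType: 4, // DLDevice.device_type (int32)
  deviceId: 8, // DLDevice.device_id (int32)
  ndim: 12, // int32_t ndim
  dtypeCode: 16, // DLDataType.code (uint8)
  dtypeBits: 17, // DLDataType.bits (uint8)
  dtypeLanes: 18, // DLDataType.lanes (uint16)
  shape: 20, // int64_t* shape
  strides: 24, // int64_t* strides, may be NULL
  byteOffset: 32, // uint64_t byte_offset, after 4 bytes of padding
};

interface DLTensorView {
  data: number; // raw pointer value; for WebGPU this is the GPU pointer to translate
  deviceType: number;
  deviceId: number;
  ndim: number;
  dtypeCode: number;
  dtypeBits: number;
  dtypeLanes: number;
  shape: number[];
  strides: number[] | null;
  byteOffset: number;
}

// Reads a little-endian 64-bit integer as a JS number; assumes |value| < 2^53.
function readInt64(view: DataView, offset: number): number {
  const lo = view.getUint32(offset, true);
  const hi = view.getInt32(offset + 4, true);
  return hi * 4294967296 + lo;
}

function readInt64Array(view: DataView, ptr: number, count: number): number[] {
  const out: number[] = [];
  for (let i = 0; i < count; i++) out.push(readInt64(view, ptr + 8 * i));
  return out;
}

function readDLTensor(memory: ArrayBuffer, ptr: number): DLTensorView {
  const view = new DataView(memory);
  const o = DLTENSOR_WASM32;
  const ndim = view.getInt32(ptr + o.ndim, true);
  const stridesPtr = view.getUint32(ptr + o.strides, true);
  return {
    data: view.getUint32(ptr + o.data, true),
    deviceType: view.getInt32(ptr + o.deviceType, true),
    deviceId: view.getInt32(ptr + o.deviceId, true),
    ndim,
    dtypeCode: view.getUint8(ptr + o.dtypeCode),
    dtypeBits: view.getUint8(ptr + o.dtypeBits),
    dtypeLanes: view.getUint16(ptr + o.dtypeLanes, true),
    shape: readInt64Array(view, view.getUint32(ptr + o.shape, true), ndim),
    strides: stridesPtr === 0 ? null : readInt64Array(view, stridesPtr, ndim),
    byteOffset: readInt64(view, ptr + o.byteOffset),
  };
}
```

Note that for a WebGPU tensor the `data` field read here is only a number inside WASM memory, not a buffer, which is why the translation step is needed.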

Specifically, when we have a GPU pointer in WASM, that pointer needs to be translated to a WebGPU buffer that is retained by the JavaScript runtime. If we want a common layer that also supports exchange among WASM DLTensors, then we need a common WASM buffer translation layer that handles buffer allocation and translation with the following functions:

GPUPointer, which is an alias of number
allocWebGPU() -> GPUPointer
gpuPointerToBuffer(ptr: GPUPointer) -> GPUBuffer
freeWebGPU(ptr: GPUPointer)

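// Common context shared across frameworks.
// Open question: find a way for frameworks to obtain a global singleton in the environment.
class WebDLPackContext {
  private devices: GPUDevice[] = [];
  // WebGPU buffers retained on the JavaScript side, keyed by WASM GPU pointer.
  private gpuBuffers: Record<number, GPUBuffer> = {};

  // Common setup logic to request a WebGPU device; cfg is assumed to be a GPUDeviceDescriptor.
  async setup(cfg?: GPUDeviceDescriptor): Promise<void> {
    const adapter = await navigator.gpu.requestAdapter();
    if (adapter === null) throw new Error("WebGPU is not available");
    this.devices.push(await adapter.requestDevice(cfg));
  }

  // Called by frameworks to get the default device by ID.
  getWebGPUDevice(device_id: number): GPUDevice {
    const device = this.devices[device_id];
    if (device === undefined) throw new Error(`webgpu:${device_id} has not been set up`);
    return device;
  }

  // Translation layer that translates a WASM GPU pointer to the GPUBuffer it refers to.
  getBufferFromGPUPtr(ptr: number): GPUBuffer {
    const buffer = this.gpuBuffers[ptr];
    if (buffer === undefined) throw new Error(`unknown GPU pointer ${ptr}`);
    return buffer;
  }
}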
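The three translation functions listed above could be backed by a simple handle table. The following is a minimal sketch under a few assumptions that are not in the proposal: `allocWebGPU` takes a byte size, pointers are opaque handles starting at 1 (with 0 kept as the null pointer), and buffer creation and destruction are injected as callbacks so the example runs without a WebGPU device. `GPUPointerTable` is a hypothetical name.

```typescript
type GPUPointer = number;

// Handle table mapping WASM-side GPU pointers to JavaScript-side buffers.
class GPUPointerTable<Buf> {
  private buffers = new Map<GPUPointer, Buf>();
  private nextPtr: GPUPointer = 1; // 0 is kept as the null pointer
  private createBuffer: (nbytes: number) => Buf;
  private destroyBuffer: (buffer: Buf) => void;

  constructor(
    createBuffer: (nbytes: number) => Buf,
    destroyBuffer: (buffer: Buf) => void,
  ) {
    this.createBuffer = createBuffer;
    this.destroyBuffer = destroyBuffer;
  }

  // Allocates a buffer and returns the handle that WASM code stores in DLTensor.data.
  // The nbytes parameter is an assumption of this sketch.
  allocWebGPU(nbytes: number): GPUPointer {
    const ptr = this.nextPtr++;
    this.buffers.set(ptr, this.createBuffer(nbytes));
    return ptr;
  }

  // Translates a handle back to the retained buffer.
  gpuPointerToBuffer(ptr: GPUPointer): Buf {
    const buffer = this.buffers.get(ptr);
    if (buffer === undefined) {
      throw new Error(`unknown GPU pointer ${ptr}`);
    }
    return buffer;
  }

  // Releases the buffer and invalidates the handle.
  freeWebGPU(ptr: GPUPointer): void {
    const buffer = this.gpuPointerToBuffer(ptr);
    this.buffers.delete(ptr);
    this.destroyBuffer(buffer);
  }
}
```

In a browser, `createBuffer` could wrap `device.createBuffer({ size, usage })` and `destroyBuffer` could call `buffer.destroy()`. Because the handles here are opaque, pointer arithmetic on them is not meaningful; offsets into a buffer would have to go through the tensor's `byte_offset` field instead.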

tqchen · 2023-03-31T15:30:32Z

Based on initial discussions with @bwasti, @gyagp, @mattsoulanille.

gyagp mentioned this issue Apr 12, 2023

DLPack for the Web webatintel/tvm-web#5

Open
