[js/web] WebGPU backend via JSEP #14579
The PR branch's commit history (all commits authored by Yulong Wang):

```
340c88b  2022-09-08  batch mode
b160840  2022-07-26  sum
306a19b  2022-07-25  squeeze + transpose
86d8d3a  2022-07-18  fix webgpu test launch
e104d17  2022-07-12  shape
a2197f0  2022-07-12  pool
59b10fb  2022-07-07  upgrade to latest webgpu spec
4ed1bfb  2022-06-28  naive conv
7c5e446  2022-06-08  check webgpu backend in execution loop
b0d7dfa  2022-06-08  dump shader source only in debug mode
7fca0ea  2022-06-08  add verbose log for buffer upload/download
179712b  2022-06-08  fix program key
67ea4cb  2022-06-08  concat: fix 1 input
21b5dfe  2022-06-07  matmul (no-broadcast)
a8def8e  2022-06-02  ...
e871138  2022-05-27  slice (scalar)
75c7941  2022-05-26  slice (...)
40b15e4  2022-05-26  slice
9d92513  2022-05-25  gemm (scalar)
c1185b4  2022-05-24  gemm...
99653f5  2022-05-24  format code
86c75bb  2022-05-24  gemm
79dd539  2022-04-08  concat
25c9d2a  2022-04-07  gather
6627349  2022-04-07  binary ops
fb81d7f  2022-04-06  binary - add
073695f  2022-04-06  optimize types
e9775fe  2022-04-05  working
cba119c  2022-04-05  upgrade @webgpu/[email protected]
ed17c57  2022-04-05  neg
e8e4d88  2022-04-04  other f32 unary operators
a1fbcfd  2022-04-01  leaky relu
dbe57fe  2022-04-01  exp, floor
3b883b9  2022-04-01  elu
aac2fc6  2022-03-24  always create storage buffer with 16 bytes alignment
ad6bd01  2022-03-24  fix unary funcs async signature
a782667  2022-03-23  fix upload
b6e7fba  2022-03-23  reshape
dfbf6f3  2022-03-24  clip and ceil
55af08e  2022-03-24  fix clip
41274ba  2022-03-24  try more unary ops
fe850d1  2022-03-14  first operator (correctness validated)
ba09337  2022-01-28  enable initialization of webgpu
3fb2712  2022-01-28  install webgpu typescript type declaration
ed35262  2022-01-28  [POC] __blank ( npm test -- -b=webgpu )
```
### Description

This PR resolves some of the non-critical code review comments on #14579:

- use `USE_JSEP` instead of `USE_JS` in the build definition, to make it less ambiguous
- remove unused util functions from util.ts
- fix transpose.h
- other misc fixes
### Description

This change introduces the following new components into ONNX Runtime Web:

- JavaScript Execution Provider (JSEP)
- Asynchronous inference execution powered by Emscripten's Asyncify
- WebGPU backend implemented in TypeScript
- Initial implementation of kernels:
  - elementwise operators (22)
  - binary operators (5)
  - tensor: Shape, Reshape, Transpose, Gemm
  - nn: Conv, {Global}MaxPool, {Global}AveragePool

The code still needs polishing; work is ongoing.

## Q&A

What is JSEP?

> JSEP, the JavaScript Execution Provider, is a new ONNX Runtime execution provider that specifically targets the web environment (browsers). JSEP allows JavaScript code to hook in at various points while ONNX Runtime runs inference on a model.

Why JSEP?

> JSEP is a hybrid-mode EP containing both C/C++ and TypeScript/JavaScript implementations. There are two strong reasons for introducing it:
> 1. The C/C++ part lets JSEP leverage ONNX Runtime's capabilities as much as possible, including graph transformers, optimizers, and the ability to fall back to the CPU EP, while TypeScript/JavaScript makes kernel implementations much easier to develop and debug in the browser.
> 2. The JavaScript API requires asynchronous execution (e.g. `buffer.mapAsync()`), which makes it impossible to run `OrtRun()` in a synchronous context (see the "async problem" section below). This is solved by using Emscripten's Asyncify.

What is WebGPU?

> WebGPU is the new GPU API available in browsers. It is one of only two APIs currently available for accessing the GPU from the browser (the other is WebGL). WebGPU is designed with more advanced and more powerful features than WebGL, and is potentially the solution that offers the best currently available GPU performance for model inference.

What is the async problem, and why do we have it?

> The "async problem" is that you cannot call an async function from a synchronous context.
> Consider the following C code:
> ```c
> // C-style declarations (API)
> typedef void (*ON_COMPLETE)(PVOID state, DATA *data);
> void read_data_from_file(FILEHANDLE file, ON_COMPLETE on_complete);
>
> // implementation
> DATA *my_impl_read_data_from_file_sync(FILEHANDLE file) {
>   // how to implement?
> }
> ```
> The answer is that it is impossible to implement this function on top of the async API alone. Usually we either find a sync version of the API, or launch a thread to call the async function and sync-wait on the main thread. Unfortunately, in a browser environment, neither is possible.
>
> WebGPU does not offer any synchronous API for data download (GPU to CPU); this is the one operation that MUST be async. Since `OrtRun()` eventually calls into DataTransfer to copy data from GPU to CPU, and `OrtRun()` is a synchronous function, this cannot be done the normal way.

What is Emscripten? How does the Asyncify feature resolve the problem?

> Emscripten is the C/C++ compiler for WebAssembly. It is what we use to compile ORT and generate the WebAssembly artifacts that run in browsers.
>
> Asyncify is a [compiler feature](https://emscripten.org/docs/porting/asyncify.html) that allows calling async functions from a synchronous context. In short, it generates code that unwinds and rewinds the call stack to emulate async execution. With this feature, we are able to call the async function inside the `OrtRun()` call.

## Design Overview

**Inter-op**

JSEP behaves much like any other EP.
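The same dead end can be seen directly in JavaScript: a synchronous wrapper around an async API can never observe the result, because the promise callback only runs after the current synchronous code finishes. A minimal sketch (the `readDataAsync` stand-in is hypothetical, not a real WebGPU call):

```javascript
// Hypothetical async API: the value is only delivered via the microtask queue.
async function readDataAsync() {
  return 42;
}

// Attempted sync wrapper: the .then callback cannot run until this
// function (and all other synchronous code) has returned, so `done`
// is still false when we return. No amount of busy-waiting helps.
function tryReadSync() {
  let result;
  let done = false;
  readDataAsync().then((d) => { result = d; done = true; });
  return { done, result };
}

const r = tryReadSync();
console.log(r.done); // false: the data is never available synchronously
```

This is exactly the situation `OrtRun()` is in without Asyncify: it is synchronous, but the GPU-to-CPU download it needs is only reachable through a promise.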
It exposes an interface for inter-op with JavaScript, defined in onnxruntime/wasm/js_internal_api.js:

```js
// init JSEP
Module["jsepInit"] = function (backend, alloc, free, copy, copyAsync, createKernel, releaseKernel, run) {
  Module.jsepBackend = backend;
  Module.jsepAlloc = alloc;
  Module.jsepFree = free;
  Module.jsepCopy = copy;
  Module.jsepCopyAsync = copyAsync;
  Module.jsepCreateKernel = createKernel;
  Module.jsepReleaseKernel = releaseKernel;
  Module.jsepRun = run;
};
```

This simple JavaScript snippet defines all the language-boundary functions JSEP requires to implement kernels and data transfers in JavaScript inside ONNX Runtime:

- `jsepBackend`: assigns the singleton backend object to the WebAssembly module
- `jsepAlloc` and `jsepFree`: implementations of the data transfer's Alloc() and Free()
- `jsepCopy`: synchronous copy (GPU to GPU, CPU to GPU)
- `jsepCopyAsync`: asynchronous copy (GPU to CPU)
- `jsepCreateKernel` and `jsepReleaseKernel`: maintain a corresponding JS object matching the lifecycle of the kernel in ORT
- `jsepRun`: OpKernel::Compute() calls into this

This abstraction keeps the connections and dependencies between C/C++ and TypeScript/JavaScript as small as possible.

**Resource Management**

The lifecycles of tensor data and kernels are managed by ORT (C/C++), but the implementations are left to JavaScript, which is responsible for implementing the callbacks correctly.

For WebGPU, GPU data is managed by JavaScript using a singleton map (tensor_data_id => GPUBuffer). The GPU pipeline is managed as a singleton. Shaders are managed using a singleton map (shader_key => gpu_program), where shader_key is generated from a cache_key (op-specific, including attributes) and the input shapes.

**About data transfer**

`js::DataTransfer::CopyTensor` is implemented to call either the synchronous or the asynchronous copy callback, depending on whether the destination is the GPU.
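The resource-management scheme above can be sketched with plain JavaScript stand-ins. This is illustrative only: `Uint8Array` stands in for `GPUBuffer`, and the `alloc`/`free`/`copy` names mirror the `jsepAlloc`/`jsepFree`/`jsepCopy` callbacks without being the real implementations:

```javascript
// Illustrative sketch: a JSEP-style singleton buffer map
// (tensor data id -> buffer), with Uint8Array standing in for GPUBuffer.
const buffers = new Map(); // dataId -> Uint8Array
let nextId = 1;

function alloc(size) {
  const id = nextId++;
  buffers.set(id, new Uint8Array(size));
  return id; // ORT would keep this id as the tensor data handle
}

function free(id) {
  buffers.delete(id); // drop the buffer when ORT releases the tensor
}

function copy(srcData, dstId) {
  buffers.get(dstId).set(srcData); // CPU -> "GPU" upload
}

const id = alloc(4);
copy(new Uint8Array([1, 2, 3, 4]), id);
const value = buffers.get(id)[2]; // 3
free(id);
console.log(value, buffers.has(id)); // 3 false
```

The key point is that C/C++ only ever holds opaque integer handles; the actual buffers live entirely on the JavaScript side.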
Emscripten's macro `EM_ASYNC_JS` is used to wrap the async function so it can be called from a synchronous context.

**Run kernel in JS**

The kernel class constructor calls `jsepCreateKernel()` once, with optional per-kernel serialization to pass attributes into JavaScript. `Compute()` is implemented so that metadata serialization is performed in a base class, and the JavaScript code accesses the data using the Emscripten-specific builtin macro `EM_ASM_*`.

**Disabled features**

The memory pattern is force-disabled, because WebGPU data is not represented by a general memory model (where a buffer can be represented by offset + size).

Concurrent-run support is disabled. WebGPU is stateful and also involves async function calls; supporting concurrent runs would significantly increase complexity without any real benefit.

**Prefer channels-last**

JSEP prefers channels-last and returns `DataLayout::NHWC` from `GetPreferredLayout()`. This lets the graph transformers preprocess the graph into a channels-last form so that more optimized WebGPU shaders can be used.

**Testing code**

It is impossible to test JSEP directly because JSEP itself does not contain any kernel implementation. However, its kernel registrations need to work together with the corresponding JavaScript code. There are unit tests that run ONNX models from the JavaScript API.

---------
Co-authored-by: Scott McKay <[email protected]>
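The shader cache described under Resource Management (shader_key => gpu_program, with the key derived from an op-specific cache key plus the input shapes) can be sketched as follows. `buildProgram` is a hypothetical stand-in for actual WGSL shader generation and pipeline creation:

```javascript
// Illustrative sketch of a shader-program cache keyed by
// cache_key (op-specific, including attributes) + input shapes.
const programCache = new Map(); // shaderKey -> program
let builds = 0;

function buildProgram(cacheKey, shapes) {
  builds++; // counts how often we pay the (expensive) compilation cost
  return { cacheKey, shapes }; // stand-in for a compiled GPU pipeline
}

function getProgram(cacheKey, inputShapes) {
  const shaderKey =
    `${cacheKey}|${inputShapes.map((s) => s.join(',')).join(';')}`;
  let program = programCache.get(shaderKey);
  if (!program) {
    program = buildProgram(cacheKey, inputShapes);
    programCache.set(shaderKey, program);
  }
  return program;
}

// Same op + same shapes hits the cache; a new shape triggers a rebuild.
getProgram('Conv|pads=1,1', [[1, 3, 224, 224]]);
getProgram('Conv|pads=1,1', [[1, 3, 224, 224]]);
getProgram('Conv|pads=1,1', [[1, 3, 112, 112]]);
console.log(builds); // 2
```

Including the input shapes in the key is what lets the generated shader hard-code shapes for speed while still being reused across inference calls with identical shapes.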
Can this be used to execute models with WebGPU on desktop?

Not now, but it can probably be done via the Dawn Node.js binding in the future.

To set expectations: I don't think there will be WebGPU support for native desktop apps anytime soon.

Hopefully something is moving on the Google Dawn side of the moon.
This review comment was left on the following code:

```js
    throw new Error('WebGpuBackend: WebGPU is not available.');
  }

  const adapter = await navigator.gpu.requestAdapter();
```
May I recommend passing `powerPreference` when requesting a GPU adapter, so that developers can request which type of GPU they're looking for. In huggingface/transformers.js#545, for instance, it would be preferable to target the "high-performance" GPU.
Suggested change:

```js
const adapter = await navigator.gpu.requestAdapter({
  powerPreference: 'high-performance'
});
```
That would be great! 🔥 It will also be useful if we can get the selected adapter, without having to re-request an adapter. cc @guschmue
Thanks for the suggestion. Will think about this.
#19857 has been created to address this. Please take a look.
Per my understanding, on some OSes, like Windows, if there are multiple GPUs and the integrated GPU is the one chosen by Chrome during startup, simply setting powerPreference to high-performance will not force WebGPU to use the discrete GPU.
That's currently true on Windows. See https://source.chromium.org/chromium/chromium/src/+/main:gpu/command_buffer/service/webgpu_decoder_impl.cc;drc=5bc3326c91b582cbec543bc9896201a1c56bdebd;l=1621
On macOS, this is not the case.
FYI for Windows, here's the bug: https://issues.chromium.org/issues/329211593
### Description

This change exposes a few properties in `ort.env.webgpu` to resolve the feature request mentioned in #14579 (comment):

- Add `powerPreference` and `forceFallbackAdapter` to `ort.env.webgpu`, allowing users to set these properties before the first inference session is created.
- Add a readonly property `adapter` to `ort.env.webgpu`, allowing users to get the adapter instance. Users can now access `ort.env.webgpu.device` and `ort.env.webgpu.adapter`.

@xenova @beaufortfrancois
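The intended usage order (set the writable properties before the first session is created; read `adapter` afterwards) can be sketched with a stand-in. Everything below is a simplified model for illustration, not the real `ort.env.webgpu` implementation; `createWebGpuEnv` and `requestAdapter` are hypothetical names:

```javascript
// Models an env object whose writable settings are consumed once,
// when the first inference session initializes the backend, after
// which the chosen adapter is exposed read-only.
function createWebGpuEnv(requestAdapter) {
  let adapter;
  return {
    powerPreference: undefined,
    forceFallbackAdapter: false,
    get adapter() { return adapter; }, // readonly after init
    // stand-in for what the first InferenceSession.create() would do
    init() {
      adapter = requestAdapter({
        powerPreference: this.powerPreference,
        forceFallbackAdapter: this.forceFallbackAdapter,
      });
    },
  };
}

// Stand-in adapter request that just records the options it was given.
const env = createWebGpuEnv((opts) => ({ requestedWith: opts }));
env.powerPreference = 'high-performance'; // must happen before init
env.init();
console.log(env.adapter.requestedWith.powerPreference); // 'high-performance'
```

Setting `powerPreference` after the first session is created would have no effect in this model, which is why the change documents the set-before-first-session requirement.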