Added DML and CUDA provider support in onnxruntime-node #16050
Conversation
@microsoft-github-policy-service agree
@snnn do the existing tasks in the Zip-*-Packaging-Pipeline generate DLLs for CUDA and DML?
@fs-eire it does not require CUDA libs. It will just throw a JS error ("onnxruntime_providers_cuda.dll(so) failed to load") if CUDA is not installed on the user's system when using the "cuda" execution provider. But it needs the bundled DirectML.dll because the one included with Windows is outdated. And it's quite small compared to the CUDA provider (11 MB vs about 400 MB for CUDA), so I guess having it work out of the box is good. And to use CUDA the user will need to install CUDA and cuDNN anyway.
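For context, roughly what the native side has to do when the JS side requests "cuda" or "dml" (a hedged sketch against the public ORT C++ API, not this PR's exact binding code):

```cpp
#include <string>
#include <onnxruntime_cxx_api.h>
#ifdef _WIN32
#include <dml_provider_factory.h>  // DML EP factory (Windows only)
#endif

// Hedged sketch: append the requested execution provider to the session
// options, roughly the way a binding could translate the JS-side
// `executionProviders` entry. Not this PR's exact code.
void AppendProvider(Ort::SessionOptions& options, const std::string& name) {
  if (name == "cuda") {
    // This is where onnxruntime_providers_cuda.dll/.so gets loaded, and
    // where a missing CUDA/cuDNN install surfaces as an error.
    OrtCUDAProviderOptions cuda_options{};
    options.AppendExecutionProvider_CUDA(cuda_options);
  }
#ifdef _WIN32
  else if (name == "dml") {
    // The DML EP lives inside onnxruntime.dll itself but pulls in
    // DirectML.dll; device 0 selects the default adapter.
    Ort::ThrowOnError(OrtSessionOptionsAppendExecutionProvider_DML(options, 0));
  }
#endif
}
```

The CUDA call is the point where the provider library actually gets pulled in, which is why a missing CUDA install surfaces as a load error at session creation rather than at install time.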
Yes
I think it should be good for the code part. My concern is that we should also update our CI so that the published npm package includes the new binaries. In the current onnxruntime-node package, we have these files under /bin/napi-v3/win32/x64/:
In my understanding, we need to add the following files as well:
And I'm not sure whether we need a different version of onnxruntime.dll/onnxruntime_providers_shared.dll (are they the same for CPU/GPU?)
DML EP is not a pluggable EP. It is in onnxruntime.dll. And it is not compatible with CUDA EP. |
I didn't know that the DML EP is not compatible with the CUDA EP. Does this mean there is no way to build with both DML and CUDA support? If so, do we need to prepare two different onnxruntime.dll files under different folders to support DML and CUDA?
After your conversation I've run some more builds/tests on Windows and WSL2 (and realized that on Windows it did not even build correctly because of include paths, sorry). So here are the key things:
Now the explanation:
There are options to improve the user experience:
Let me know if you have a workaround for point 3, or if you want me to remove static linking for some testing. Or any other feedback. Thanks. P.S. In the worst case it can work with only the CUDA provider until the issue with DirectML.dll loading is resolved, but having DML will simplify the user experience on Windows by an order of magnitude.
🤨 As a quick sanity check, I tried this minimal example with ORT 1.15.0 and DML 1.12.0, and it's definitely loading both. Though I have all my DLLs and the .exe in the same directory, and maybe the issue here is that the .exe and plugin DLLs exist in different directories? Maybe LoadLibrary appears to favor the system32 path only because it fails to find DirectML.dll in the .exe path, and the system DirectML.dll is later in the search path. This makes me wonder what most other current customers using DML are doing, because this is the first I've heard of this challenge (unless everybody else just puts all the DLLs into the .exe path too, because they have complete control over their distribution and binary paths) 🤔.
Yeah, I guess the problem is that the Node.js interpreter lives somewhere in Program Files while the node binding lib is loaded from "node_modules/onnxruntime-node/.." in the target JS project directory. Same issue with the OBS plugin I've linked. I've tried LoadLibrary and LoadLibraryEx with the exact path but it did not help :( UPD: the node binding loads all libraries itself as they are linked during the build, but I thought that if I loaded it manually before making any calls to ONNX Runtime, it would force it to use the already-loaded one.
When Node.js calls LoadLibraryEx, does it specify LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR?
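For reference, a minimal sketch of what that flag changes (assuming a fully qualified path, which the flag requires): the directory containing the loaded DLL is added to the search list for its dependencies, so a DirectML.dll sitting next to onnxruntime.dll would be found before the outdated system32 copy.

```cpp
#include <windows.h>

// Minimal sketch: LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR requires a fully
// qualified path and puts the DLL's own directory on the search list
// used to resolve its dependencies (e.g. a bundled DirectML.dll).
HMODULE LoadOrtWithLocalDeps(const wchar_t* absolute_path_to_onnxruntime) {
  return ::LoadLibraryExW(absolute_path_to_onnxruntime, nullptr,
                          LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR |
                              LOAD_LIBRARY_SEARCH_DEFAULT_DIRS);
}
```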
It does not use LoadLibraryEx to load; I just tried loading it manually to see if it helps. @fdwr I've found a workaround: link DirectML with the node binding AND call it with some random data before initializing ONNX Runtime.
That way it loads DirectML.dll from the node binding folder, and the runtime uses the already-loaded library even when linked dynamically. I had tried just linking before, but since no functions were used, it did not work. Thanks for your time :)
@dakenf: Does calling
Yes, you are right. It did not work before because I assumed GetModuleHandle(NULL) would return the current module path, but it was returning the path to the node.js interpreter, as it should by design. Instead I should have used something like GetModuleHandleEx with the GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS flag to get the binding module itself.
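Something along these lines (a sketch of the approach; the helper name and exact structure are illustrative, not the binding's actual code):

```cpp
#include <windows.h>
#include <string>

// Sketch of the approach: resolve the directory of the module containing
// this code (the .node binding itself, not node.exe), then preload
// DirectML.dll from there so later loads resolve to this copy.
static void PreloadBundledDirectML() {
  // FROM_ADDRESS looks up the module containing the given address;
  // GetModuleHandle(NULL) would have returned node.exe instead.
  static int anchor = 0;
  HMODULE self = nullptr;
  ::GetModuleHandleExW(GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS |
                           GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT,
                       reinterpret_cast<LPCWSTR>(&anchor), &self);

  wchar_t path[MAX_PATH];
  ::GetModuleFileNameW(self, path, MAX_PATH);

  std::wstring dir(path);
  dir.erase(dir.find_last_of(L"\\/") + 1);  // keep the trailing separator

  ::LoadLibraryW((dir + L"DirectML.dll").c_str());
}
```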
I guess sometimes it's better to stop and come back with a fresh look. |
That sounds like a more robust approach. Added some comments, but still deferring to Yulong and Guenther for actual approval, since I do not own/know this code.
Thanks Arthur. Minor comments - easier than my previous one. (still deferring to owners like Yulong and Guenther for actual sign-off)
@fdwr can you also help me resolve this warning? It does not affect anything but is just output to the console when the DML provider is used.
Should I change options for the ORT memory allocator when using the DML provider, or just ignore it?
What opset is your .onnx model? That warning means there is some operator the DML EP doesn't support, or a newer version of an existing operator. It's a perf concern, because some operators are falling back to the CPU (incurring GPU<->CPU synchronization), but the model will still run. So I wouldn't worry about it for your change, but it would be useful to run perf_test with the -v flag.
perf_test with the -v flag returned quite a lot of nodes (I haven't included the full output; it has lots of pages).
Fixed parsing of sessionOptions when the model is passed as a Buffer
I've also fixed an error that prevented sessionOptions from being parsed when the session was created with 4 arguments.
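For illustration, a hedged sketch (written with node-addon-api; the function name and structure are hypothetical, not the binding's actual code) of where such a bug sits: with the 4-argument form, the options object is the fourth argument rather than the second.

```cpp
#include <napi.h>

// Hypothetical sketch of the argument handling, not the binding's actual
// code. JS-side signatures: (path[, options]) or
// (buffer, byteOffset, byteLength[, options]).
void ParseSessionArgs(const Napi::CallbackInfo& info) {
  Napi::Object options;
  if (info[0].IsString()) {
    // (path[, options]): options is the second argument.
    if (info.Length() > 1 && info[1].IsObject()) {
      options = info[1].As<Napi::Object>();
    }
  } else if (info[0].IsArrayBuffer() || info[0].IsTypedArray()) {
    // (buffer, byteOffset, byteLength[, options]): the bug was that this
    // fourth argument was never consulted, so sessionOptions were ignored.
    if (info.Length() > 3 && info[3].IsObject()) {
      options = info[3].As<Napi::Object>();
    }
  }
  // ...apply `options` (executionProviders, etc.) to the ORT session.
  (void)options;
}
```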
/azp run Windows CPU CI Pipeline, Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed
/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux OpenVINO CI Pipeline, Linux QNN CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, Windows ARM64 QNN CI Pipeline
Azure Pipelines successfully started running 7 pipeline(s).
Azure Pipelines successfully started running 9 pipeline(s).
Thanks everyone. BTW @fdwr, DML gives quite meaningful error messages anyway, still better than CUDA. Actually joking, since it's from different test cases and most likely my test cases are not correct.
@dakenf: 😉 Not really, I agree, but the error messages are much more informative in the ORT debug build if you enable the Direct3D debug layer (Start / Run / dxcpl.exe). I'm not sure how it works within ORT Node, but for a C++ process you can see the diagnostic output in Visual Studio's output subwindow.
Well, I guess there's a reasonable explanation: when I got the "Windows Internals" book about 10 years ago with an MS Action Pack (or some other free promotion, I don't exactly remember), I got quite a lot of "ah, that's why" moments. In ORT Node you open a cmd and call
This was later reverted in #17441:
Description
The yaml file changes made in #16050 do not really work. Currently the pipeline is failing with error:
```
Error: Not found SourceFolder: C:\a\_work\5\b\RelWithDebInfo\RelWithDebInfo\nuget-artifacts\onnxruntime-win-x64\lib
```
So, I will revert the yaml changes first to bring the pipeline back. Some people are waiting for our nightly packages. Test run: https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=351104&view=results
Is the CUDA support added to onnxruntime-node? I am using onnxruntime-node v1.16.0 in Electron. It seems that the CUDA and DirectML providers don't work.
Same here
Since the npm package has not been updated yet, the new providers are not available in the currently published version.
Still waiting for release!!
Looks like none of the platforms have had a release yet. Any idea when this will happen?
I have finished testing DML on Windows; testing CUDA on Linux is still pending. Once I get a dev package published I will update it here.
Any updates? :) @fs-eire
Oh I missed this thread. So it should be working now at version
Description
I've added changes to support CUDA and DML (DML only on Windows; on other platforms it will throw an error)
Motivation and Context
It fixes feature request #14127, which is tracked here: #14529
I was working on a StableDiffusion implementation for Node.js, and it is very slow on CPU, so GPU support is essential.
Here is a working demo with a patched and precompiled version: https://github.com/dakenf/stable-diffusion-nodejs