"GPU - DirectML" setting is broken in the latest release #272

Closed
Sparronator9999 opened this issue Apr 23, 2023 · 52 comments
Labels
bug Something isn't working

@Sparronator9999
Contributor

Describe the bug

When opening OBS with the inference mode set to GPU, the output will disappear (if background blur is off), or be entirely blurred (if background blur is on). Setting the inference mode back to CPU fixes this, but setting it back to GPU does nothing (this can be seen using Task Manager).

To Reproduce

Steps to reproduce the behavior:

  1. Add a Background Blur filter with Inference Device set to GPU
  2. Restart OBS, and open Task Manager
  3. Notice that the output is now completely missing/transparent (if you set Background Blur Factor higher than 0, the output will be completely blurred)
  4. Set the Inference Device back to CPU
  5. Notice the output works correctly again
  6. Set the Inference Device back to GPU
  7. Notice, in Task Manager, the CPU usage doesn't drop.

Expected behavior

GPU inference works correctly

Screenshots

To demonstrate the issue, I switched between CPU and GPU inference modes every few seconds, starting in CPU mode from a fresh start of OBS in both cases.

You can see clearly in the left screenshot (Before) that the CPU usage drops significantly as the work is offloaded to a GPU. The right screenshot (After) shows no change in CPU resource usage.

This issue was tested with Background Blur off, because of another issue (see #271).

Before (e2d9a11): [screenshot: before]
After (Release v0.5.17): [screenshot: after]

(as i'm writing this i realised i should have switched to the CPU tab, oops)

Log and Crash Report

2023-04-24 08-38-53.txt

2023-04-24 08-39-11.txt

Desktop (please complete the following information):

  • OS: Windows 10 (22H2)
  • Browser: N/A
  • Plugin Version: Latest (v0.5.17)
  • OBS Version: Latest (v29.0.2)

Additional context
Commit 322a098 (using prebuilt OnnxRuntime) appears to be the first commit where this problem occurs, as e2d9a11 does not have this issue.

@Sparronator9999 Sparronator9999 added the bug Something isn't working label Apr 23, 2023
@umireon
Member

umireon commented Apr 24, 2023

\a\_work\1\s\onnxruntime\core\providers\dml\dml_provider_factory.cc(163)\onnxruntime.dll!00007FFB24C0C842: (caller: 00007FFB24C0C61E) Exception(1) tid(a28) 887A0004 The specified device interface or feature level is not supported on this system.
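For anyone reading the log line above: 887A0004 is an HRESULT, and it can be split into its fields the same way the HRESULT_SEVERITY, HRESULT_FACILITY and HRESULT_CODE macros in winerror.h do. The sketch below is only an illustration; HresultParts and decodeHresult are made-up names, not part of any Windows header.

```cpp
#include <cstdint>

// Fields of a Windows HRESULT, split with the same shifts and masks as the
// HRESULT_SEVERITY / HRESULT_FACILITY / HRESULT_CODE macros in winerror.h.
// HresultParts and decodeHresult are hypothetical names for this sketch.
struct HresultParts {
    bool failed;             // bit 31: 1 means failure
    std::uint32_t facility;  // (hr >> 16) & 0x1fff
    std::uint32_t code;      // hr & 0xffff
};

constexpr HresultParts decodeHresult(std::uint32_t hr)
{
    return HresultParts{(hr >> 31) != 0, (hr >> 16) & 0x1FFFu, hr & 0xFFFFu};
}

// 0x887A0004 is DXGI_ERROR_UNSUPPORTED: facility 0x87A (FACILITY_DXGI), code 4.
static_assert(decodeHresult(0x887A0004u).facility == 0x87Au, "DXGI facility");
static_assert(decodeHresult(0x887A0004u).code == 4u, "code 4");
```

So the failure comes from the DXGI/Direct3D side rather than from the model or the plugin's own code, which fits the "feature level is not supported" message.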

@umireon
Member

umireon commented Apr 24, 2023

It seems that the DirectML.dll included in this plugin is too old.

@umireon
Member

umireon commented Apr 24, 2023

@Sparronator9999 Can you try the #274 build?

@Sparronator9999
Contributor Author

Hello, I tried both #273 and #274, but neither appears to have fixed the problem.

@royshil
Collaborator

royshil commented Apr 24, 2023

@umireon we should not crash in any case... we need a try-catch to intercept the exception and fall back to CPU
potentially we should alert the user, but i've no idea how to do that from an OBS plugin
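
A minimal sketch of the try-catch fallback described above, assuming the GPU failure surfaces as a C++ exception derived from std::exception. Session, SessionFactory and createSessionWithFallback are hypothetical names for illustration only; they are not ONNX Runtime or OBS APIs.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for an inference session (not an ONNX Runtime type).
struct Session {
    std::string device;  // "GPU" or "CPU"
};

// Hypothetical factory: returns a session, or throws if the device is unusable.
using SessionFactory = std::function<Session()>;

// Try the GPU factory first. If it throws, hand the message back to the
// caller (for logging) and build the CPU session instead of crashing.
Session createSessionWithFallback(const SessionFactory &makeGpu,
                                  const SessionFactory &makeCpu,
                                  std::string *errorOut = nullptr)
{
    assert(makeGpu && makeCpu);  // both factories must be callable
    try {
        return makeGpu();
    } catch (const std::exception &e) {
        if (errorOut)
            *errorOut = e.what();
        return makeCpu();
    }
}
```

In a plugin, makeGpu would wrap whatever creates the DirectML-backed session and makeCpu the plain CPU path; the captured message can at least be written to the log even if there is no good way to show it to the user.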

@umireon
Member

umireon commented Apr 25, 2023

@Sparronator9999

Please remove the following files:

C:\Program Files\obs-studio\obs-plugins\64bit\onnxruntime.dll
C:\Program Files\obs-studio\obs-plugins\64bit\DirectML.dll
C:\Program Files\obs-studio\obs-plugins\64bit\obs-backgroundremoval.pdb
C:\Program Files\obs-studio\data\obs-plugins\obs-backgroundremoval

Can you try this artifact after these files are removed?

https://github.com/royshil/obs-backgroundremoval/actions/runs/4799986561#:~:text=macos%2Dx86_64%2D03699db39-,obs%2Dbackgroundremoval%2Dwindows%2Dx64%2D03699db39,-33.1%20MB

@umireon umireon added this to the v0.5.18 milestone Apr 25, 2023
@umireon umireon pinned this issue Apr 25, 2023
@umireon
Member

umireon commented Apr 25, 2023

Please be sure to place DirectML.dll and onnxruntime.dll into the obs-plugins directory.

@Sparronator9999
Contributor Author

@umireon, I just tested the linked artifact after removing the files you listed. It still doesn't work properly (the plugin will still blur everything if set to GPU on startup).

I copied the contents of the zip file (the data and obs-plugins folders) to the root of the OBS directory, so everything should be where it needs to be.

@Sparronator9999
Contributor Author

I have also just tested #279, which also currently doesn't appear to fix the problem.

@umireon
Member

umireon commented Apr 26, 2023

@Sparronator9999 Can you post the log here, please?

@Sparronator9999
Contributor Author

Sure thing!

2023-04-26 14-16-50.txt

I started OBS with GPU mode for the plugin, then switched to CPU and back to GPU.

@umireon
Member

umireon commented Apr 26, 2023

@Sparronator9999 Can you post the result of Dependencies here?

Please start DependenciesGui and open C:\Program Files\obs-studio\obs-plugins\64bit\obs-backgroundremoval.dll
And open C:\Program Files\obs-studio\obs-plugins\64bit\onnxruntime.dll

The Dependencies app is available at the following URL:
https://github.com/lucasg/Dependencies/releases/tag/v1.11.1

[Screenshot 2023-04-26 13 35 00]

[Screenshot 2023-04-26 13 35 48]

@Sparronator9999
Contributor Author

I think I've already figured out what the issue is...

obs-backgroundremoval.dll:
[screenshot: obs-backgroundremoval]

onnxruntime.dll (more than 1 page of deps):
[screenshot: onnxruntime (page 1)]
[screenshot: onnxruntime (page 2)]

@umireon
Member

umireon commented Apr 26, 2023

Hmm... I don't see any problems

@Sparronator9999
Contributor Author

I just re-installed OBS and the Background Removal plugin (from #279) and I'm still getting the same dependency error

@Sparronator9999
Contributor Author

In the first screenshot, it shows obs.dll was not found?

@umireon
Member

umireon commented Apr 26, 2023

It is okay for obs.dll to be missing.

@Sparronator9999
Contributor Author

OK... I guess?

Anyways, I'm still having the same problem as before.

@umireon
Member

umireon commented Apr 26, 2023

I suspect that the DirectML.dll is too old.

@umireon
Member

umireon commented Apr 26, 2023

@Sparronator9999 Can you paste the v0.5.16 log here? Thanks!

@Sparronator9999
Contributor Author

Here you go:

2023-04-26 15-10-05.txt

@umireon
Member

umireon commented Apr 26, 2023

Can you install the latest version and remove the following files:

C:\Program Files\obs-studio\obs-plugins\64bit\onnxruntime.dll
C:\Program Files\obs-studio\obs-plugins\64bit\DirectML.dll

and check if it works?

@Sparronator9999
Contributor Author

Hello again, now the plugin isn't working at all. No options show up in the filter options now.

Here's the log file:
2023-04-26 15-31-30.txt

@Sparronator9999
Contributor Author

I tested on v0.5.17 with the two DLLs from your previous comment removed.

@umireon
Member

umireon commented Apr 26, 2023

@Sparronator9999 Can you try #280 ?

@Sparronator9999
Contributor Author

The plugin still doesn't work at all.

Log:
2023-04-26 16-40-11.txt

@umireon
Member

umireon commented Apr 26, 2023

Can you post the DependenciesGui screenshot for C:\Program Files\obs-studio\obs-plugins\64bit\obs-backgroundremoval.dll here again?

@umireon
Member

umireon commented Apr 26, 2023

@Sparronator9999 Can you restart Windows and try again, please? I suspect some DLL caching is causing this problem.

@Sparronator9999
Contributor Author

One reboot later, and the plugin still doesn't work.

Here is the screenshot of Dependencies on obs-backgroundremoval.dll:

[screenshot: obs-backgroundremoval dll]

It seems as though onnxruntime.dll is no longer being found at all by the plugin.

@umireon
Member

umireon commented Apr 26, 2023

@Sparronator9999 Please try #281. I'm sure this solves the problem.

@umireon
Member

umireon commented Apr 26, 2023

@royshil I've summarized my understanding of this issue below.

The root cause of this issue

onnxruntime.dll tries to delay-load DirectML.dll, and Windows resolves it to the DirectML.dll under System32.
That DirectML.dll is much older than the one we bundle, which causes the compatibility issue (unsupported feature level).

How we can resolve this issue?

Either load DirectML.dll without delay loading, or tell the OS to load DLLs from directories other than System32.
The former requires us to link ONNX Runtime statically.
The latter requires us to modify settings that affect the whole OBS application.

Detail

Windows 11 ships a slightly old onnxruntime.dll and DirectML.dll, and those two can be combined safely.
If we stop bundling onnxruntime.dll and DirectML.dll, the system onnxruntime.dll and DirectML.dll will be used and our plugin works on Windows 11.
However, this approach cannot support Windows 10.
Windows 10 has quite an old DirectML.dll and no onnxruntime.dll at all.
So if we want to support Windows 10, we have to bundle onnxruntime.dll and DirectML.dll with our plugin.

Delay loading is another problem.
onnxruntime.dll is set to delay-load DirectML.dll.
I don't know why, but by default Windows only loads DLLs from System32 when delay loading.
As a result, the DirectML.dll that gets loaded is the one the OS ships, which causes the mismatch between onnxruntime.dll and DirectML.dll.
To delay-load our bundled DirectML.dll, we have to modify the application-wide settings.
To stop delay loading, we have to rebuild ONNX Runtime.
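
One way to check which copy of DirectML.dll actually ended up in the process is to ask Windows for the loaded module's path and compare it with the system directory. This is only a diagnostic sketch: isInsideDirectory and loadedDirectMLIsSystemCopy are hypothetical helper names, and the Windows-specific part only compiles on Windows.

```cpp
#include <cassert>
#include <cwctype>
#include <string>

#ifdef _WIN32
#include <windows.h>
#endif

// True when `path` lies inside directory `dir`. The comparison ignores case
// because Windows paths are case-insensitive; both use backslashes.
bool isInsideDirectory(const std::wstring &path, const std::wstring &dir)
{
    assert(!dir.empty());
    if (path.size() <= dir.size())
        return false;
    for (std::size_t i = 0; i < dir.size(); ++i) {
        if (std::towlower(path[i]) != std::towlower(dir[i]))
            return false;
    }
    // Require a separator so "C:\Windows\System32x\..." does not match.
    return dir.back() == L'\\' || path[dir.size()] == L'\\';
}

#ifdef _WIN32
// Returns 1 if the DirectML.dll loaded in this process is the system copy,
// 0 if it is some other copy, -1 if none is loaded or a call fails.
int loadedDirectMLIsSystemCopy()
{
    HMODULE dml = GetModuleHandleW(L"DirectML.dll");
    if (!dml)
        return -1;
    wchar_t path[MAX_PATH];
    wchar_t sys[MAX_PATH];
    DWORD n = GetModuleFileNameW(dml, path, MAX_PATH);
    UINT m = GetSystemDirectoryW(sys, MAX_PATH);
    if (n == 0 || n == MAX_PATH || m == 0 || m >= MAX_PATH)
        return -1;
    return isInsideDirectory(path, sys) ? 1 : 0;
}
#endif
```

Calling loadedDirectMLIsSystemCopy() after the first GPU session attempt and logging the result would show directly whether the System32 copy was picked up instead of the bundled one.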

My conclusion

We can work around the System32 delay-loading problem for now by modifying the application-wide settings, but we have to link ONNX Runtime statically if we want to support Windows 10. I feel that static linking is safer than modifying the application-wide settings.

@royshil
Collaborator

royshil commented Apr 26, 2023

the static link will require building ORT from scratch on Windows, which takes ~45 minutes on CI
it's a huge pain in the butt
unless we can find an official statically linked pre-built binary of ORT, with all the execution providers, as static .lib files

another thing i think we can and should do is build ORT from scratch as a static .lib binary and cache that build in the cloud, so CI downloads it instead of building it from scratch every time

i can host it on my private S3 if needed, it's no big deal

what do you think @umireon ?

@umireon
Member

umireon commented Apr 26, 2023

Could ccache, which we used before, be an option?

@royshil
Collaborator

royshil commented Apr 26, 2023

> Could ccache, which we used before, be an option?

yes it helps a bit - but it still does the "compilation", just the .o files are cached
so you're not saving on CMake time, post-build stuff

if you just build and cache the complete post-build folder, and then download it in CI, you skip all the build steps and jump right to building the plugin

@umireon
Member

umireon commented Apr 26, 2023

I suppose we can store the pre-built files on GitHub Releases or GitHub Packages for free because we are open source.

@royshil
Collaborator

royshil commented Apr 26, 2023

sounds great - how do we get started?

@Sparronator9999
Contributor Author

> @Sparronator9999 Please try #281. I'm sure this solves the problem.

@umireon It seems like ORT still isn't being detected by the plugin:

[screenshot: obs-backgroundremoval dll]

I deleted all old plugin files before installing from #281 (including old onnxruntime.dll and DirectML.dll).

Log file:
2023-04-27 07-38-01.txt

@umireon
Member

umireon commented Apr 27, 2023

I have to conclude that modifying the application-wide settings is not an option for us.

@umireon
Member

umireon commented Apr 27, 2023

@Sparronator9999 Can you try #283?

@Sparronator9999
Contributor Author

Sparronator9999 commented Apr 27, 2023

@umireon Good news, the plugin works again. Bad news, GPU - DirectML still doesn't work.

Log file:
2023-04-27 15-35-52.txt

Dependencies screenshot:

[screenshot: obs-backgroundremoval dll]

@umireon
Member

umireon commented Apr 27, 2023

I forgot to include DirectML.dll, sorry.

@umireon
Member

umireon commented Apr 27, 2023

@Sparronator9999 Can you try #283 again?

@Sparronator9999
Contributor Author

It works!

@umireon
Member

umireon commented Apr 27, 2023

@royshil I have started umireon/onnxruntime-static-win to store static ONNX Runtime

@royshil
Collaborator

royshil commented Apr 27, 2023

> @royshil I have started umireon/onnxruntime-static-win to store static ONNX Runtime

that's great - thanks! we should integrate it into the build scripts

@umireon
Member

umireon commented Apr 27, 2023

#286

@royshil
Collaborator

royshil commented Apr 27, 2023

@umireon while you're at it - do you want to prebuild OpenCV static as well?
we can cut the Windows build time to just 1-2 minutes...
there's no use in re-building opencv (& ORT) over and over

@umireon
Member

umireon commented Apr 27, 2023

I think building pre-built binaries ourselves should be the last resort.
The maintenance cost of prebuilt OpenCV binaries would be huge, yet it would shorten the build by only a few minutes.

@umireon
Member

umireon commented Apr 27, 2023

The maintenance cost of a pre-built ONNX Runtime is also huge, but it is worth doing because it saves us hours.

@umireon
Member

umireon commented Apr 28, 2023

This is fixed by #286

@dakenf

dakenf commented May 26, 2023

Hello everyone, I've found another solution to your problem. You can link your library against DirectML directly and call it with dummy data before using ONNX Runtime:

const IID MY_IID = { 0x12345678, 0x1234, 0x1234, { 0x12, 0x34, 0x56, 0x78, 0x9A, 0xBC, 0xDE, 0xF0 } };
DMLCreateDevice1(nullptr, DML_CREATE_DEVICE_FLAG_NONE, DML_FEATURE_LEVEL_4_0, MY_IID, nullptr);

That eliminates the need for static linking.

@dakenf

dakenf commented May 26, 2023

Well, actually there's a better one that avoids calling the library with incorrect parameters: https://github.com/dakenf/onnxruntime/blob/main/js/node/src/directml_load_helper.cc
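
A rough sketch of the general preloading idea, not a copy of the linked helper: load the bundled DirectML.dll by full path before ONNX Runtime needs it, on the assumption that Windows reuses an already loaded module when a later load asks for the same module name. siblingPath and preloadSiblingDll are hypothetical names; the Windows-specific part only compiles on Windows.

```cpp
#include <cassert>
#include <string>

#ifdef _WIN32
#include <windows.h>
#endif

// Replace the file name at the end of modulePath with dllName, giving the
// path of a DLL that sits next to the given module.
std::wstring siblingPath(const std::wstring &modulePath, const std::wstring &dllName)
{
    assert(!dllName.empty());
    const std::size_t slash = modulePath.find_last_of(L"\\/");
    if (slash == std::wstring::npos)
        return dllName;
    return modulePath.substr(0, slash + 1) + dllName;
}

#ifdef _WIN32
// Load dllName from the directory that contains the module this code was
// compiled into (e.g. the plugin DLL). Returns false if any step fails.
bool preloadSiblingDll(const wchar_t *dllName)
{
    static const int anchor = 0;  // any address inside this module
    HMODULE self = nullptr;
    if (!GetModuleHandleExW(GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS |
                                GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT,
                            reinterpret_cast<LPCWSTR>(&anchor), &self))
        return false;
    wchar_t path[MAX_PATH];
    DWORD n = GetModuleFileNameW(self, path, MAX_PATH);
    if (n == 0 || n == MAX_PATH)
        return false;
    return LoadLibraryW(siblingPath(path, dllName).c_str()) != nullptr;
}
#endif
```

Under that assumption, calling preloadSiblingDll(L"DirectML.dll") once, before the first session with the DirectML provider is created, would be enough for the delay-load to pick up the bundled copy.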
