OpenVR: Glitch in RadialDensityMask / Black screen without RadialDensityMask #53

Open
DavidYKay opened this issue Dec 3, 2019 · 16 comments


@DavidYKay

DavidYKay commented Dec 3, 2019

System Information

  • Ogre Version: 2.2 - 8579b35
  • Operating System / Platform: Gentoo Linux 64-bit
  • RenderSystem: GL3+
  • GPU: NVIDIA GeForce GTX 1080

Detailed description

Hi guys,

Thanks very much for your work on OGRE.

I've used it on a few toy applications in the past and am currently integrating it into a VR application.

I'm having trouble getting the Tutorial_OpenVR sample to work correctly.

Case 1: Latest Code - Crash

The latest code (both 8579b35 and 52a6f47) crashes with the following error:

An exception has occured: OGRE EXCEPTION(5:ItemIdentityException): Compute Job with name [Hash 0x704694c7] not found in HlmsCompute::findComputeJob at [OGRE_ROOT]/OgreMain/src/OgreHlmsCompute.cpp (line 557)

After defining OGRE_IDSTRING_ALWAYS_READABLE 1, the IdString in question turns out to be: VR/RadialDensityMaskReconstruct.

I've attempted to pass in a compute shader, using HlmsComputeJob::createComputeJob and guessing that /Samples/Media/Compute/VR/Foveated/RadialDensityMaskReconstruct_cs.glsl is the file in question, but I haven't gotten this to work yet.

Case 2: Old Code - Screen Door

In order to find a working example, I dug around in the git history and got these two commits to run, albeit with a significant rendering glitch.

ea3b3f1, 075d36b -> Both run, but with many visual artifacts.

pixelated

Case 3: Hybrid Code - Crash

Finally, I tried combining the latest OGRE code with the "working" Tutorial_OpenVR code,

OGRE commit: 52a6f47
OpenVR_Tutorial commit: ea3b3f1

But this produces the same result as the first case, crashing with ItemIdentityException in HlmsCompute::findComputeJob.

Ogre.log & Callstack

Please let me know what I can do to get to the bottom of this. Thank you!

@darksylinc
Member

Hi!

We're sorry you're experiencing these issues.

The error you're having can almost always be traced to Ogre not being built with rapidjson support, which is required to parse the JSON material files that define the Compute Shader jobs.

Thus even though Samples/Media/Compute/VR/VR.material.json is there, Ogre won't load it, and hence the "not found" error.

I thought the samples were handling this case to give a helpful message, but it looks like they aren't.

In your CMake configuration, Rapidjson_INCLUDE_DIR is most likely showing up as not found.

If you build Ogre following the instructions, rapidjson will be in ogredeps, which should be stored in %OgreRepo%/Dependencies//include

We recommend you use the build scripts, which perform everything for you and take care of these minor issues (the scripts only write to local folders next to where the script lives, so they don't require root).

Cheers

@DavidYKay
Author

Thanks much for your reply!

Your suggestion of using the build scripts highlighted that I wasn't putting ogredeps/build/ogredeps in the proper location. 😅

I had originally been following these directions, which say to clone ogredeps into Ogre/Dependencies.

After fixing the issue and recompiling (on latest master, 5e624bc), the executable runs! 🙂

However, I'm now seeing a similar graphical glitch to the one I was encountering back in ea3b3f1:

bug1

  1. Any suggestions on what I might do to overcome this glitch?

  2. Could you please link me to the source behind this doc page so I can submit a PR to fix the docs?

Many thanks,
David

@DavidYKay DavidYKay changed the title OpenVR Sample: Latest won't run, older versions have rendering glitch OpenVR Sample: Rendering glitch Dec 17, 2019
@darksylinc
Member

darksylinc commented Dec 17, 2019

  1. I have a hunch. Go to Samples/Media/2.0/scripts/Compositors/Tutorial_OpenVRWorkspace.compositor and remove every snippet that says:
store
{
	depth	dont_care
	stencil	dont_care
}

If it still doesn't work, delete everything that says "dont_care" in that .compositor script.

If that does the job, I'd ask you to pinpoint which of the "dont_care" entries is causing it.

If none of that helps, could you create a RenderDoc capture and upload it here?

  2. The doc file is here https://github.com/OGRECave/ogre-next/blob/master/Docs/src/SettingUpOgre/SettingUpOgreLinux.md

@darksylinc
Member

Btw, this bug seems to be caused by the HAM (hidden area mesh) and/or the Radial Density Mask optimizations.

You can disable them in Tutorial_OpenVRGameState::createScene01:

Force bIsHamVrOptEnabled to false to disable the HAM.
Remove these lines:

const float radiuses[3] = {0.25f, 0.7f, 0.85f};
sceneManager->setRadialDensityMask( true, radiuses );

to disable the Radial Density Mask

@DavidYKay
Author

Thank you for walking me through these scenarios!

For reference, here's a RenderDoc capture of the unadulterated code from master (5e624bc).

I've tried all of your instructions and found the following:

  1. Eliminating the store { ...} snippets did not resolve the issue - RenderDoc capture

  2. Unfortunately, neither did removing the other dont_care from the file.

    I tried both of the following:

    load { all dont_care } -> load { all } - RenderDoc Capture
    load { all dont_care } -> [deleted] (No RenderDoc Capture)

  3. Disabling the RadialDensityMask had a mixed effect: in viewing RenderDoc, Colour Pass 1 no longer introduces any artifacts. However, the final output on-screen is black - RenderDoc capture

  4. I began fixing the docs and submitted a PR for comments on what to do about the GUI instructions.

Please let me know what action I should take from here.

Thanks again for your help.

@darksylinc
Member

Ahhhhh!!!!!!!! There's a missing memory barrier, and I don't know how that happened. The compositor should've added one.

As a workaround, if you go to GL3PlusRenderSystem::_dispatch and set it to:

void GL3PlusRenderSystem::_dispatch( const HlmsComputePso &pso )
{
    glDispatchCompute( pso.mNumThreadGroups[0], pso.mNumThreadGroups[1], pso.mNumThreadGroups[2] );
    glMemoryBarrier( GL_SHADER_IMAGE_ACCESS_BARRIER_BIT );
}

it should fix the glitch.

But the real bugfix is to find out why the Compositor isn't creating the barrier as expected.
Thanks for bringing this to my attention.

@DavidYKay
Author

DavidYKay commented Dec 17, 2019

I'm glad that we were able to induce an epiphany. 🙂!

I added the memory barrier as instructed, but the issue still hasn't subsided. 🙁

I took two new captures (one with unmodified code, one with RadialDensityMask disabled) and looked over them. In both, I see a call to glDispatchCompute, which sits just after Colour Pass 1, and there's a call to glMemoryBarrier just after it, as expected.

As best I can tell, it seems that the issue is not the memory barrier, but that when RadialDensityMask is disabled, the "OpenVR Both Eyes" texture is completely black, so when it gets blitted to the screen, there's nothing to see!

I see the "OpenVR Both Eyes" texture in the example code and will start poking around, but some guidance would be greatly appreciated.

Thanks again,
David

@DavidYKay DavidYKay changed the title OpenVR Sample: Rendering glitch OpenVR: Rendering glitch in RadialDensityMask / Black screen without RadialDensityMask Dec 17, 2019
@DavidYKay DavidYKay changed the title OpenVR: Rendering glitch in RadialDensityMask / Black screen without RadialDensityMask OpenVR: Glitch in RadialDensityMask / Black screen without RadialDensityMask Dec 17, 2019
@DavidYKay
Author

I'm also open to going down the path of fixing the compute shader for RadialDensityMask, but would likely need some input on that as well. Thanks.

@darksylinc
Member

I'm in a hurry so I'll be brief:

Removing the RDM ends up with a black output because of a compositor pass that expects the RDM. I didn't realize that.

Try glMemoryBarrier before and after the glDispatchCompute call, and use GL_ALL_BARRIER_BITS (I may have misspelled it, recalling from memory). Note that setting all bits may cause the RenderDoc capture to take forever (due to GL_CLIENT_MAPPED_BUFFER_BARRIER_BIT; perhaps you want to mask that one out).

What doesn't make sense is that the corrupted call is the final glDraw, which performs a copy. It's a very simple shader. And if you select other draws and go back and forth, you'll see that the corruption changes slightly (pay attention to the top borders), implying a race condition, thus implying a missing barrier

@DavidYKay
Author

Good news: I was able to work around the issue by changing the configuration of the compositor. 🙂

Bad news: I haven't been able to fix the RDM using glMemoryBarrier. 🙁

I'm unblocked for now, but it'd be great to fix the RDM. Let me know if you'd like me to investigate, and, if so, any suggestions on how to proceed.

Thanks again,
David

@peetCreative

Hi, I'm stuck on the same problem. Thanks to both of you for your investigations.
@DavidYKay Could you share the workaround for the Compositor?
I'm having a look as well now.

@darksylinc
Member

Hi!

Could you try replacing GL3PlusRenderSystem::_dispatch with this ?

glMemoryBarrier( GL_ALL_BARRIER_BITS ^ GL_CLIENT_MAPPED_BUFFER_BARRIER_BIT );
glDispatchCompute( pso.mNumThreadGroups[0], pso.mNumThreadGroups[1], pso.mNumThreadGroups[2] );
glMemoryBarrier( GL_ALL_BARRIER_BITS ^ GL_CLIENT_MAPPED_BUFFER_BARRIER_BIT );

This is a nuclear workaround, but it will greatly help me find the cause of this problem. If that doesn't fix it, then it may be a problem with either the NVIDIA drivers or OpenVR.
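
For readers puzzling over the mask expression in the snippet above, here is a minimal standalone sketch of the arithmetic. The constant values are copied from the Khronos glcorearb.h header and redefined locally under different names (an assumption made only so the sketch compiles without an OpenGL loader). The helper name allBarriersExcept is hypothetical and is not part of Ogre or OpenGL.

```cpp
#include <cstdint>

// Local copies of the GL enum values (from glcorearb.h), so that this
// sketch builds without any OpenGL headers or context.
constexpr std::uint32_t kAllBarrierBits               = 0xFFFFFFFFu;  // GL_ALL_BARRIER_BITS
constexpr std::uint32_t kClientMappedBufferBarrierBit = 0x00004000u;  // GL_CLIENT_MAPPED_BUFFER_BARRIER_BIT
constexpr std::uint32_t kShaderImageAccessBarrierBit  = 0x00000020u;  // GL_SHADER_IMAGE_ACCESS_BARRIER_BIT

// Hypothetical helper: every barrier bit except the ones in 'excluded'.
// XOR against an all-ones value flips the excluded bits from 1 to 0,
// which gives the same result as (kAllBarrierBits & ~excluded).
constexpr std::uint32_t allBarriersExcept( std::uint32_t excluded )
{
    return kAllBarrierBits ^ excluded;
}

// The client-mapped-buffer bit is cleared...
static_assert( ( allBarriersExcept( kClientMappedBufferBarrierBit ) &
                 kClientMappedBufferBarrierBit ) == 0u,
               "excluded bit must be cleared" );
// ...while the image-access barrier from the first workaround is still set.
static_assert( ( allBarriersExcept( kClientMappedBufferBarrierBit ) &
                 kShaderImageAccessBarrierBit ) != 0u,
               "other barrier bits must remain set" );
```

So the "nuclear" mask is a superset of the earlier GL_SHADER_IMAGE_ACCESS_BARRIER_BIT workaround. It leaves out only the bit that was suspected of slowing down RenderDoc captures.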

@peetCreative

Thank you for your reply! The workaround didn't help. However, dispatch seems to be called correctly.
I changed RenderSystems/GL3Plus/src/OgreGL3PlusRenderSystem.cpp to:

    void GL3PlusRenderSystem::_dispatch( const HlmsComputePso &pso )
    {
        glMemoryBarrier( GL_ALL_BARRIER_BITS ^ GL_CLIENT_MAPPED_BUFFER_BARRIER_BIT );
        glDispatchCompute( pso.mNumThreadGroups[0], pso.mNumThreadGroups[1], pso.mNumThreadGroups[2] );
        glMemoryBarrier( GL_ALL_BARRIER_BITS ^ GL_CLIENT_MAPPED_BUFFER_BARRIER_BIT );
        // glDispatchCompute( pso.mNumThreadGroups[0], pso.mNumThreadGroups[1], pso.mNumThreadGroups[2] );
        // glMemoryBarrier( GL_SHADER_IMAGE_ACCESS_BARRIER_BIT );
    }

How can I find out whether there is a memory barrier or not?

@peetCreative

peetCreative commented Feb 19, 2020

From my side I found a fix.
In ogre-next/Samples/Media/2.0/scripts/Compositors/Tutorial_OpenVRWorkspace.compositor, I removed the whole target stereo_output{...} and

 	texture rtt target_width target_height target_format msaa 4

I changed

-       in 0 stereo_output
+       in 0 rtt

This results in this view:
screenshot_good

So something goes wrong in ogre-next/Samples/Media/Compute/VR/Foveated/RadialDensityMaskReconstruct_cs.glsl
I guess this shader is supposed to smooth the texture so that the rendered pixels are "distributed" into the black ones around them.

Another question: is there a way to disable RDM?
EDIT: Aah, now I can also disable RDM. Stupid question

darksylinc added a commit that referenced this issue Apr 2, 2020
RDM since to be causing glitches in NVIDIA GPUs on Linux
The value is hardcoded in
Affects #53
@darksylinc
Member

I've pushed a couple of bugfixes (including one for a race condition / missing memory barrier). However, given that the workarounds (forcing a memory barrier everywhere) did not work, I doubt this is enough to fix everything.

In #81 @peetCreative claims turning off MSAA solves the glitches. This is an interesting hint that I am now investigating.

@darksylinc
Member

Is this still an issue?

I just realized that a couple of months ago we fixed a bug where we invalidated the wrong colour buffer when MSAA is enabled, which could have caused this problem, especially on modern NVIDIA GPUs, which don't ignore framebuffer invalidation (older GPUs used to ignore it).
