
[Bug]: Crash on PC with AMD GPU (capturer = wlroots) #81

Closed
ar7eniyan opened this issue Jul 25, 2023 · 10 comments · Fixed by #82

Comments

@ar7eniyan (Contributor)

Steps for reproducing the issue

I have an AMD RX 6600 GPU, which is probably what causes the problem. When I run wluma with capturer set to wlroots in config.toml, it crashes shortly after starting.

What is the buggy behavior?

$ wluma
[2023-07-25T10:53:34Z INFO  wluma] Continue adjusting brightness and wluma will learn your preference over time.
ac_compute_device_uuid's output is based on invalid pci bus info.
thread 'predictor-BenQ GW2270' panicked at 'Unable to compute luma percent: ERROR_UNKNOWN', src/frame/capturer/wlroots.rs:128:26

What is the expected behavior?

It should run properly, like it does when I set capturer to none in the config.

Logs

$ sudo pacman -S vulkan-validation-layers
...
$ export VK_INSTANCE_LAYERS=VK_LAYER_KHRONOS_validation
$ RUST_LOG=debug wluma
[2023-07-25T11:05:11Z DEBUG wluma] Using Config {
        als: Time {
            thresholds: {
                11: "normal",
                20: "night",
                18: "dark",
                9: "dim",
                16: "normal",
                0: "night",
                7: "dark",
                13: "bright",
            },
        },
        output: [
            DdcUtil(
                DdcUtilOutput {
                    name: "BenQ GW2270",
                    capturer: Wlroots,
                    min_brightness: 1,
                },
            ),
        ],
    }
[2023-07-25T11:05:13Z DEBUG wluma::brightness::ddcutil] Discovered displays: ["BenQ GW2270 W9G06281SL0"]
[2023-07-25T11:05:13Z DEBUG wluma::brightness::ddcutil] Using display 'BenQ GW2270 W9G06281SL0' for config 'BenQ GW2270'
[2023-07-25T11:05:13Z INFO  wluma] Continue adjusting brightness and wluma will learn your preference over time.
ac_compute_device_uuid's output is based on invalid pci bus info.
[2023-07-25T11:05:13Z DEBUG wluma::frame::capturer::wlroots] Using output 'BNQ BenQ GW2270 W9G06281SL0 (HDMI-A-1)' for config 'BenQ GW2270'
VUID-vkBindImageMemory-size-01049(ERROR / SPEC): msgNum: 694023407 - Validation Error: [ VUID-vkBindImageMemory-size-01049 ] Object 0: handle = 0xcb3ee80000000007, type = VK_OBJECT_TYPE_IMAGE; Object 1: handle = 0xead9370000000008, type = VK_OBJECT_TYPE_DEVICE_MEMORY; | MessageID = 0x295df4ef | vkBindImageMemory(): memory size minus memoryOffset is 0x7e9000 but must be at least as large as VkMemoryRequirements::size value 0x870000, returned from a call to vkGetImageMemoryRequirements with image. The Vulkan spec states: The difference of the size of memory and memoryOffset must be greater than or equal to the size member of the VkMemoryRequirements structure returned from a call to vkGetImageMemoryRequirements with the same image (https://www.khronos.org/registry/vulkan/specs/1.3-extensions/html/vkspec.html#VUID-vkBindImageMemory-size-01049)
    Objects: 2
        [0] 0xcb3ee80000000007, type: 10, name: NULL
        [1] 0xead9370000000008, type: 8, name: NULL
thread 'predictor-BenQ GW2270' panicked at 'Unable to compute luma percent: ERROR_UNKNOWN', src/frame/capturer/wlroots.rs:128:26
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Version

wluma-git 4.2.0.r4.g54874c1-1 installed from AUR

Environment

- Arch Linux
- kernel version: 6.4.4-arch1-1
- sway 1:1.8.1-1
- vulkan-radeon 23.1.4-2
- amdvlk 2023.Q2.3-1 (set as a default vulkan implementation)
- mesa 23.1.4-2
@maximbaz (Owner)

Thanks for reporting, and for including all these details! Do you have multiple GPUs (e.g. integrated and discrete), or is this the only one?

@ar7eniyan (Contributor, Author)

No, the RX 6600 is the only one installed.

@maximbaz (Owner)

Very curious! This is the key point:

vkBindImageMemory(): memory size minus memoryOffset is 0x7e9000 but must be at least as large as VkMemoryRequirements::size value 0x870000, returned from a call to vkGetImageMemoryRequirements with image.

We call vkBindImageMemory() in two places in the code:

self.device.bind_image_memory(image, image_memory, 0)?;

wluma/src/frame/vulkan.rs

Lines 279 to 280 in 54874c1

self.device
    .bind_image_memory(frame_image, frame_image_memory, 0)?;

I suspect that the first call site is where the issue occurs (it would be nice if you could confirm), because the error also mentions VkMemoryRequirements, which is only queried here:

let image_memory_req = unsafe { self.device.get_image_memory_requirements(image) };

The way I understand the error: vkGetImageMemoryRequirements tells us the image needs at least 0x870000 bytes (8847360 in decimal), but the memory we pre-created to bind it to is only 0x7e9000 bytes (8294400, which corresponds to 4 (RGBA) * 1920 * 1080 - likely your resolution).

The 0x7e9000 value sounds correct for a 1920x1080 frame, so it seems like the memory requirements call somehow returns a bigger size than necessary?
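For the record, the arithmetic checks out (a quick sanity check in Rust):

// 0x7e9000 bytes is exactly one 1920x1080 frame at 4 bytes per pixel:
assert_eq!(0x7e9000, 4 * 1920 * 1080); // 8294400
// 0x870000 bytes is what vkGetImageMemoryRequirements demands:
assert_eq!(0x870000, 8847360);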

To be honest, I don't know if I'll be able to debug this without being able to reproduce it. Would you be interested in digging a little into this issue?

I'd start by figuring out where exactly the issue occurs (in init_image or in init_frame_image), and then add lots of debug logs, or simply println!() calls, to log everything possible (all the sizes), to try to find where the discrepancy begins.
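For example, something along these lines next to the existing calls in src/frame/vulkan.rs (a hypothetical sketch; variable names follow the current code):

let image_memory_req = unsafe { self.device.get_image_memory_requirements(image) };
// Print everything the driver reports, to compare against the expected 4 * width * height:
println!(
    "image memory req: size = {:#x}, alignment = {:#x}, memory_type_bits = {:#b}",
    image_memory_req.size, image_memory_req.alignment, image_memory_req.memory_type_bits
);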

If you have any questions or findings, just post them here, and I'm sure together we'll be able to find a solution 😉

@ar7eniyan (Contributor, Author)

ar7eniyan commented Jul 26, 2023

I've spent some time debugging this, and here's what I've found:

  1. The error occurs here:

    wluma/src/frame/vulkan.rs

    Lines 278 to 281 in 54874c1

    unsafe {
        self.device
            .bind_image_memory(frame_image, frame_image_memory, 0)?;
    }

  2. As far as I understand this tutorial, an Image requires some memory for its internals, so the correct memory size for an allocation is the one provided by vkGetImageMemoryRequirements, like this:

    let frame_image_memory_req =
        unsafe { self.device.get_image_memory_requirements(frame_image) };
    ...
    let frame_image_allocate_info = vk::MemoryAllocateInfo::builder()
        .push_next(&mut frame_import_memory_info)
        .allocation_size(frame_image_memory_req.size)
        .memory_type_index(0);

    This is also how the memory is allocated in init_image(). But it still crashes, oddly with no errors from the validation layers.

  3. That line from above, with memory_type_index set to 0, is potentially incorrect; the correct value is computed like this (the process is the same for both Image and Buffer):

    wluma/src/frame/vulkan.rs

    Lines 102 to 120 in 54874c1

    let buffer_memory_req = unsafe { device.get_buffer_memory_requirements(buffer) };
    let device_memory_properties =
        unsafe { instance.get_physical_device_memory_properties(physical_device) };
    let memory_type_index = find_memory_type_index(
        &buffer_memory_req,
        &device_memory_properties,
        vk::MemoryPropertyFlags::HOST_VISIBLE | vk::MemoryPropertyFlags::HOST_COHERENT,
    )
    .ok_or("Unable to find suitable memory type for the buffer")?;
    let allocate_info = vk::MemoryAllocateInfo {
        allocation_size: buffer_memory_req.size,
        memory_type_index,
        ..Default::default()
    };
    let buffer_memory = unsafe { device.allocate_memory(&allocate_info, None)? };
    Anyway, it's not the cause of the crash.

  4. The purpose of these lines is unclear to me:

    wluma/src/frame/vulkan.rs

    Lines 264 to 266 in 54874c1

    let mut frame_import_memory_info = vk::ImportMemoryFdInfoKHR::builder()
        .handle_type(vk::ExternalMemoryHandleTypeFlags::DMA_BUF_EXT)
        .fd(frame.fds[0]);
    However, commenting out .push_next(&mut frame_import_memory_info) seems to fix the crash. The memory is successfully bound to an image, and the program continues normal execution.

I haven't yet tested whether it computes the luma percent correctly with the above changes, but at least the crash no longer happens.

@ar7eniyan (Contributor, Author)

ar7eniyan commented Jul 26, 2023

I have just checked the values returned by luma_percent(), and they change randomly even while the image stays still. So the values don't seem to make any sense, sadly.

@maximbaz (Owner)

Great findings!

As far as I understand this tutorial, an Image requires some memory for its internals, so the correct memory size for an allocation is the one provided by vkGetImageMemoryRequirements, like this:

let frame_image_memory_req =
    unsafe { self.device.get_image_memory_requirements(frame_image) };
...
let frame_image_allocate_info = vk::MemoryAllocateInfo::builder()
    .push_next(&mut frame_import_memory_info)
    .allocation_size(frame_image_memory_req.size)
    .memory_type_index(0);

This sounds logical! The fact that the validation layer stops complaining is also a positive sign; even if the app still crashes, this sounds like the right thing to do. Once we fix the issue and it starts working for you, we can try again without this snippet to see if it crashes again, but I'm quite sure you are right.

  • Out of curiosity, could you log frame_image_memory_req.size and frame.sizes, just to see the values?
  • With no issues reported by the validation layers, does it still crash with Unable to compute luma percent: ERROR_UNKNOWN?
  • I remember that in order to debug exactly what's going on in the Vulkan calls, and where exactly things crash, there is something called the "API dump" layer, which literally prints every single API call that Vulkan executes, up until the crash. It will produce a huge log, but I think it would be very helpful for understanding exactly which operation crashes. Could you try to figure out how to get it, and share a pastebin of it?
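For reference, I believe that layer ships with LunarG's Vulkan tools (packaged on Arch as vulkan-extra-layers) under the name VK_LAYER_LUNARG_api_dump, so something like `VK_INSTANCE_LAYERS=VK_LAYER_LUNARG_api_dump wluma` should produce such a dump, assuming the layer is installed.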

The purpose of these lines is unclear to me

In the case of the "frame image", we already have the image of your screen contents inside the GPU, somewhere in memory, and the goal is to access its pixels directly, without a copy. For this purpose we use the DMA-BUF extension, which gives us a file descriptor referring to that memory, and that descriptor is frame.fds[0] (it's always an array with a single value while WLR_DRM_NO_MODIFIERS is set, due to #8).

Once we have the information about the frame memory in frame_import_memory_info, I believe the reason to pass it via .push_next(&mut frame_import_memory_info) is so that the allocate_memory call actually links to it. Without linking frame_import_memory_info, you are probably getting random garbage in the image object.


That line from above with memory_type_index being set to 0 is potentially incorrect; the correct value for it is set like here (the process is the same for both Image and Buffer)...

You are absolutely correct; the honest reason for the hacky zero is that I have no hardware to test whether this actually works, and on a system with only one GPU the index will always be zero anyway (to confirm, you can simply print the value of memory_type_index when using your snippet).

Happy to take it, but it's probably a less important issue for you, as I don't think it would make any difference to your setup (unless it doesn't actually equal zero for you?).
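For context, the usual shape of such a helper, following the pattern from the ash examples (wluma's actual find_memory_type_index may differ in details):

fn find_memory_type_index(
    memory_req: &vk::MemoryRequirements,
    memory_prop: &vk::PhysicalDeviceMemoryProperties,
    flags: vk::MemoryPropertyFlags,
) -> Option<u32> {
    // Pick the first memory type that is both allowed by the resource's
    // memory_type_bits mask and has all of the requested property flags.
    memory_prop.memory_types[..memory_prop.memory_type_count as usize]
        .iter()
        .enumerate()
        .find(|(index, memory_type)| {
            (1 << index) & memory_req.memory_type_bits != 0
                && memory_type.property_flags & flags == flags
        })
        .map(|(index, _)| index as u32)
}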

ar7eniyan added a commit to ar7eniyan/wluma that referenced this issue Jul 27, 2023
- Handle the case when Vulkan image needs dedicated memory
- Fixes maximbaz#81
@ar7eniyan (Contributor, Author)

I've finally resolved the issue. The violated part of the specification is VUID-vkBindImageMemory-image-01445:

If image requires a dedicated allocation (as reported by vkGetImageMemoryRequirements2 in VkMemoryDedicatedRequirements::requiresDedicatedAllocation for image), memory must have been created with VkMemoryDedicatedAllocateInfo::image equal to image

To handle this case correctly, we need to check for the dedicated allocation requirement and, if necessary, pass a VkMemoryDedicatedAllocateInfo extending MemoryAllocateInfo. I'll create a PR with the fix.
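A minimal sketch of the idea, using the same ash builders as the snippets above (memory_type_index and frame_import_memory_info come from the earlier snippets; the actual code in the PR may differ):

// Query the memory requirements together with the dedicated-allocation
// requirement via vkGetImageMemoryRequirements2.
let mut dedicated_req = vk::MemoryDedicatedRequirements::default();
let mut memory_req = vk::MemoryRequirements2::builder().push_next(&mut dedicated_req);
let req_info = vk::ImageMemoryRequirementsInfo2::builder().image(frame_image);
unsafe {
    self.device
        .get_image_memory_requirements2(&req_info, &mut memory_req);
}

let mut dedicated_info = vk::MemoryDedicatedAllocateInfo::builder().image(frame_image);
let mut allocate_info = vk::MemoryAllocateInfo::builder()
    .push_next(&mut frame_import_memory_info)
    .allocation_size(memory_req.memory_requirements.size)
    .memory_type_index(memory_type_index);

// Chain VkMemoryDedicatedAllocateInfo only when the driver requires it,
// as demanded by VUID-vkBindImageMemory-image-01445.
if dedicated_req.requires_dedicated_allocation == vk::TRUE {
    allocate_info = allocate_info.push_next(&mut dedicated_info);
}

let frame_image_memory = unsafe { self.device.allocate_memory(&allocate_info, None)? };
unsafe {
    self.device
        .bind_image_memory(frame_image, frame_image_memory, 0)?;
}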

@maximbaz (Owner)

This is extremely interesting, thanks for sharing and submitting a PR! Could you tell me how you nailed it down to VUID-vkBindImageMemory-image-01445?

@ar7eniyan (Contributor, Author)

Could you tell me how you nailed it down to VUID-vkBindImageMemory-image-01445?

While debugging, I was checking the spec's "valid usage" requirements for vkBindImageMemory() one by one, and found this:

Constraints on the values returned for image resources are:

And in the init_frame_image() code, the image is indeed created using VkExternalMemoryImageCreateInfo.

@maximbaz (Owner)

This is all very, very cool! I'm really glad that you managed to fix it for your system, and were willing to share your learnings and contribute the fix in a PR 🙏 I just tagged a new release, in case it helps 😉 Enjoy!
