Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prepare example to isolate shader miscompilation #199

Closed
wants to merge 3 commits into from

Conversation

raphlinus
Copy link
Contributor

When tile_alloc uses a workgroup shared variable to broadcase tile_offset, it gives wrong answers seemingly consistent with a nonexistent barrier on mac.

When tile_alloc uses a workgroup shared variable to broadcase `tile_offset`, it gives wrong answers seemingly consistent with a nonexistent barrier on mac.
@raphlinus
Copy link
Contributor Author

Uploading not to merge but to help with investigating a potential shader miscompilation. When run on mac (just cargo run in the piet-wgsl subdir), it gives this output:

0: 6
1: 6
2: 20
3: 11
4: 3741d830 0.0000115540315
8: 9
9: 9
10: 23
11: 14
12: 3741d94e 0.000011554292
16: c
17: c
18: 26
19: 17
20: 3741da6c 0.000011554552

With exact values varying from run to run.

The correct output is:

0: 6
1: 6
2: 20
3: 11
8: 9
9: 9
10: 23
11: 14
12: 11e
16: c
17: c
18: 26
19: 17
20: 23c

The values at offsets 4, 12, and 20 are instances of tile_offset + tile_subix being written by tile_alloc.wgsl. The behavior is consistent with a missing barrier right before the let tile_offset = sh_tile_offset; statement; it seems to be reading garbage values, while threads in the same subgroup do appear to be getting correct values (this particular example doesn't display any values read from the same subgroup, but it should be reasonably straightforward to modify it so).

I looked at the naga output and the threadgroup_barrier is present, which casts suspicion on the Metal compiler itself. I think it's worth trying to isolate this further, as it's a worrisome miscompilation.

This version of the repro just runs a single very simple shader with two bindings, and doesn't use the preprocessor.

It should print nothing (buffer is all zero), but prints garbage.
@raphlinus
Copy link
Contributor Author

I have indeed reduced the example a lot. Currently it's just running tile_alloc.wgsl, which is now a very simple shader that just does an atomic add (on one thread), followed by a barrier and writing the result of that add.

Make it so correct output is [1, 1, 1] rather than [0, 0, 0]
@raphlinus
Copy link
Contributor Author

Not intended to merge, so closing. Working around the miscompilation is still a relevant issue, though, and this could be done at many possible layers in the stack: in the shader (as is currently the case), disabling buffer robustness (or possibly changing the configuration to use clamping rather than conditionalization), or in naga to avoid patterns (such as passing threadgroup shared variables by reference to the main entry point) that trigger miscompilation. Probably an issue should be opened to track that work.

@raphlinus raphlinus closed this Nov 27, 2022
@raphlinus raphlinus deleted the miscompile branch November 27, 2022 16:47
raphlinus added a commit that referenced this pull request Mar 21, 2024
We have an ugly workaround for miscompilation of a workgroup uniform load (see #199). The miscompilation no longer repros, and it's not clear we still need it. Possibly something changed in naga, or possibly Apple drivers have improved.

It would be possible to go a little further and trim the allocation of the Paths array so it's not rounded up to workgroup size, but the practical benefit of that is marginal.

Sending this as a PR to see if there still may be problems.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant