
[backdrop_dyn] Handle upstream pipeline failure #553

Merged 1 commit into main from backdrop-dyn-robust on May 9, 2024

Conversation

armansito (Collaborator)

Following #537, it is possible for the flatten stage to fail and flag a failure. In some cases this can cause invalid/corrupt bounding-box data to propagate downstream, leading to a hang in the per-tile backdrop calculation loop.

Triggering this is highly subtle, so I don't have a test case among the vello scenes that reliably reproduces it. Regardless, it makes sense to check for upstream failures and terminate the work early.

I made backdrop_dyn check for any upstream failure, but I did not make it signal its own failure flag. I also didn't change the logic in the CPU shader, since the other stages I checked (flatten, coarse) do not implement error signaling in their CPU counterparts. Let me know if you'd like me to work on those.
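A minimal sketch of the kind of check this adds, assuming a bump.failed atomic bitflag written by the upstream stages (the struct, binding, and field names here are illustrative assumptions, not quoted from the patch):

// Hypothetical sketch: names and bindings are assumptions.
struct BumpAllocators {
    // Bitflags set by upstream stages when they fail.
    failed: atomic<u32>,
}

@group(0) @binding(0)
var<storage, read_write> bump: BumpAllocators;

var<workgroup> sh_failed: u32;

@compute @workgroup_size(256)
fn main(@builtin(local_invocation_id) local_id: vec3<u32>) {
    // One thread reads the failure flag; workgroupUniformLoad broadcasts
    // it (and acts as a barrier), so the early return stays uniform.
    if local_id.x == 0u {
        sh_failed = atomicLoad(&bump.failed);
    }
    if workgroupUniformLoad(&sh_failed) != 0u {
        return;
    }
    // ... per-tile backdrop calculation would follow here ...
}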

DJMcNab (Member) left a comment:

This change looks as expected. This is among my planned changes for robust dynamic memory, so it's good to see this change made in the minimal form.

I've not run it, but the CI smoke tests should catch this being wrong.

Has any thought been put into applying this more consistently throughout the pipeline?

@@ -34,6 +46,9 @@ fn main(
        sh_row_width[local_id.x] = path.bbox.z - path.bbox.x;
        row_count = path.bbox.w - path.bbox.y;
        sh_offset[local_id.x] = path.tiles;
    } else {
        // Explicitly zero the row width, just in case.
        sh_row_width[local_id.x] = 0u;
    }

DJMcNab (Member):
It isn't very clear to me what "just in case" means here.
I see that we're not doing a scan over sh_row_width, so in theory we should never read this value.

To be clear, I think this change is fine - especially once we start to use gfx-rs/wgpu#5508.
But is this actually fixing an issue, or just programming defensively?

Contributor:
My analysis matches Daniel's. In any case, this code won't be around very long; I want to replace it with a partition-wide prefix sum of the backdrop values, so being defensive seems preferable to over-optimizing.
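For reference, the shape of a partition-wide prefix sum in WGSL looks roughly like the following (an illustrative Hillis-Steele inclusive scan, not the planned replacement code; all names here are made up):

const WG_SIZE = 256u;

var<workgroup> sh_scan: array<u32, WG_SIZE>;

// Illustrative workgroup-wide inclusive prefix sum. Must be called in
// uniform control flow by all WG_SIZE threads.
fn prefix_sum(local_ix: u32, value: u32) -> u32 {
    sh_scan[local_ix] = value;
    for (var shift = 1u; shift < WG_SIZE; shift <<= 1u) {
        workgroupBarrier();
        var sum = sh_scan[local_ix];
        if local_ix >= shift {
            sum += sh_scan[local_ix - shift];
        }
        workgroupBarrier();
        sh_scan[local_ix] = sum;
    }
    return sh_scan[local_ix];
}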

armansito (Collaborator, Author):
This is indeed being defensive for the case where workgroup memory initialization might be turned off. The scan in the following lines is over sh_row_count, but I think it's technically possible for the loop at the very bottom to read this sh_row_width value if el_ix happens to match local_id.x?

At any rate, I think I'll leave this in and land this as is.
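A paraphrased sketch of the read pattern being discussed (the loop structure and the find_element_for_row helper are hypothetical simplifications, not the shader's actual code):

// Hypothetical simplification of the final per-tile loop. If this slot of
// sh_row_width was never written and workgroup memory zero-init is off,
// `width` would be read uninitialized without the explicit zeroing above.
for (var row = local_id.x; row < total_rows; row += WG_SIZE) {
    // find_element_for_row is a stand-in for the search over sh_row_count.
    let el_ix = find_element_for_row(row);
    let width = sh_row_width[el_ix];
    if width > 0u {
        // ... accumulate the backdrop across this row of tiles ...
    }
}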

@waywardmonkeys waywardmonkeys added this to the Vello 0.2 release milestone May 3, 2024
@armansito armansito enabled auto-merge May 9, 2024 22:45
@armansito armansito added this pull request to the merge queue May 9, 2024
Merged via the queue into main with commit f7ecbd7 May 9, 2024
15 checks passed
@armansito armansito deleted the backdrop-dyn-robust branch May 9, 2024 22:57