Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[v1.4.x] Fix launch bounds in spatial transformer #13898

Conversation

KellenSunderland
Copy link
Contributor

@KellenSunderland KellenSunderland commented Jan 16, 2019

Description

Without launch_bounds compiler is not required to use small enough number of registers to fit 1024 threads per block. Our internal CI with CUDA 10 build was failing on V100 because of this.

Original PR: #13188

Checklist

Essentials

Please feel free to remove inapplicable items for your PR:

  • To the my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Added launch_bounds guards around BilinearSampling[Forward,Backward]Kernel to ensure that the compiled operator works on each supported GPU.

Comments

@KellenSunderland KellenSunderland changed the title Fix launch bounds in spatial transformer [v1.4.x] Fix launch bounds in spatial transformer Jan 16, 2019
@stu1130
Copy link
Contributor

stu1130 commented Jan 16, 2019

@mxnet-label-bot add [pr-awaiting-review]
Thanks for the contribution @KellenSunderland

@marcoabreu marcoabreu added the pr-awaiting-review PR is waiting for code review label Jan 16, 2019
Copy link
Contributor

@larroy larroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, the limit of 1 thread was verified at runtime or what is the best way to get those limits in general? I remember we dealt with these in the past.

@larroy
Copy link
Contributor

larroy commented Jan 17, 2019

@mxnet-label-bot update [pr-awaiting-merge]

@marcoabreu marcoabreu added pr-awaiting-merge Review and CI is complete. Ready to Merge and removed pr-awaiting-review PR is waiting for code review labels Jan 17, 2019
* Fix launch bounds in spatial transformer

* Adding explanation in comment.
@KellenSunderland KellenSunderland force-pushed the pr_bilinear_backward_launch_bounds_1_4 branch from cb0d731 to 0cd3cde Compare January 17, 2019 16:22
@KellenSunderland
Copy link
Contributor Author

Yeah I also read this as limiting to a single thread, but it's actually limiting to 1024 threads and setting a minimum number of block launches to 1. See correspondence in the original PR: #13188

@KellenSunderland
Copy link
Contributor Author

Didn't get a committer review by deadline so I'm closing this one off. Fix is already in master and should be available if we have many user reports for this error.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
pr-awaiting-merge Review and CI is complete. Ready to Merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants