
Fix launch bounds in spatial transformer #13188

Merged

Conversation

ptrendx
Member

@ptrendx ptrendx commented Nov 8, 2018

Description

Without launch_bounds, the compiler is not required to use a small enough number of registers to fit 1024 threads per block. Our internal CI build with CUDA 10 was failing on V100 because of this.
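
For illustration, a minimal sketch of the failure mode (the kernel and sizes are hypothetical, not MXNet source; a trivial body like this would not actually exceed the register budget, but a register-heavy kernel such as the spatial transformer's can). Note that launch-configuration errors surface only when the error status is checked after the launch:

#include <cstdio>
#include <cuda_runtime.h>

// Stand-in for a register-heavy kernel.
__global__ void HeavyKernel(float* out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = static_cast<float>(i);
}

int main() {
  const int n = 1 << 20;
  const int threads = 1024;  // cuda::kMaxThreadsPerBlock (assumed 1024)
  float* out = nullptr;
  cudaMalloc(&out, n * sizeof(float));
  HeavyKernel<<<(n + threads - 1) / threads, threads>>>(out, n);
  // The <<<>>> syntax itself does not report launch errors; check here.
  cudaError_t err = cudaGetLastError();
  if (err != cudaSuccess) {
    // e.g. "too many resources requested for launch" when the compiled
    // kernel uses too many registers per thread
    std::printf("launch failed: %s\n", cudaGetErrorString(err));
  }
  cudaFree(out);
  return 0;
}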

Checklist

Essentials

Please feel free to remove inapplicable items for your PR:

  • To the best of my knowledge, examples are either not affected by this change or have been fixed to be compatible with it

Changes

  • Added launch_bounds guards around BilinearSampling[Forward,Backward]Kernel to ensure that the compiled operator works on each supported GPU.
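
A hedged sketch of the shape of the change (the template and parameter list below are illustrative, not the exact MXNet signature; it assumes mshadow's cuda::kMaxThreadsPerBlock, which equals 1024, is in scope):

// Illustrative declaration only -- the parameter list is hypothetical.
// __launch_bounds__(maxThreadsPerBlock, minBlocksPerMultiprocessor)
// caps per-thread register usage so that a full block of
// cuda::kMaxThreadsPerBlock threads is always launchable.
template <typename DType>
__global__ void
__launch_bounds__(cuda::kMaxThreadsPerBlock, 1)
BilinearSamplingForwardKernel(const int count, const DType* data,
                              const DType* grid, DType* out) {
  // ... kernel body unchanged; only the attribute above is new ...
}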

Comments

@harshp8l
Contributor

harshp8l commented Nov 9, 2018

@mxnet-label-bot add [pr-awaiting-review]

@marcoabreu marcoabreu added the pr-awaiting-review label Nov 9, 2018
@stu1130
Contributor

stu1130 commented Nov 20, 2018

@samskalicky @access2rohit @anirudh2290 could you please review it? Thanks!

@vandanavk
Contributor

@mxnet-label-bot add [Operator]

@apeforest for review

@ptrendx ptrendx force-pushed the pr_bilinear_backward_launch_bounds branch from f322d69 to c184b07 on December 10, 2018 18:22
@Roshrini
Member

@eric-haibin-lin @anirudh2290 @samskalicky can you please review this PR?

@samskalicky
Contributor

@ptrendx was there a corresponding issue filed for the V100 CI failure?

@ptrendx
Member Author

ptrendx commented Dec 21, 2018

It was our internal (NVIDIA) CI that failed, not MXNet's external CI, so no, there was no issue filed for it.

Contributor

@samskalicky samskalicky left a comment


@ptrendx Changes look straightforward, no issues, but I don't quite understand the fix.

Is 1 thread the correct value? Why choose that value (i.e., why not 2)?

Please add a comment explaining the reasoning for the value of 1.

Are there unit tests covering this change? How can we validate its correctness?

@ptrendx
Member Author

ptrendx commented Dec 21, 2018

It is 1 block, not 1 thread. Basically, what this change does is tell the compiler: I am going to run this kernel with cuda::kMaxThreadsPerBlock threads, so make sure that at least 1 block fits on an SM. It does not limit the kernel to running only 1 block; this is a minimum value.
Without this change, the compiler does not know how many threads the kernel will be launched with, so it is free to generate code that uses as many registers as it wants. However, the register file is not infinite: the more registers the kernel uses, the fewer threads can run at the same time, and trying to launch the kernel with more threads than fit fails with a "too many resources requested for launch" error. Since the kernel is launched with cuda::kMaxThreadsPerBlock threads, we need to make sure it can be launched on every supported architecture.
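
A sketch of one way to verify this on a given build (SomeKernel is a stand-in for the real kernel; cudaFuncGetAttributes is the standard runtime API for querying a compiled kernel). attr.maxThreadsPerBlock is the largest block size the kernel can actually be launched with, and with __launch_bounds__(1024, 1) it comes out as 1024:

#include <cstdio>
#include <cuda_runtime.h>

// Stand-in kernel; the attribute guarantees a 1024-thread launch fits.
__global__ void __launch_bounds__(1024, 1)
SomeKernel(float* out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = 0.f;
}

int main() {
  cudaFuncAttributes attr;
  cudaFuncGetAttributes(&attr, SomeKernel);
  std::printf("registers per thread: %d, max threads per block: %d\n",
              attr.numRegs, attr.maxThreadsPerBlock);
  return 0;
}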

Actually, I just found an issue about the spatial transformer giving exactly this error (it was "fixed" by changing the kernel, which luckily dropped register usage below the threshold again): #11568
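
For concreteness, the register arithmetic behind that threshold, using V100 numbers (65,536 32-bit registers per SM on Volta):

register file per SM (V100)       : 65,536 registers
block size (kMaxThreadsPerBlock)  : 1,024 threads
maximum registers per thread      : 65,536 / 1,024 = 64

If the compiler assigns 65 or more registers per thread, a single
1024-thread block no longer fits on an SM and the launch fails with
"too many resources requested for launch". __launch_bounds__(1024, 1)
tells the compiler to stay at or below 64.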

Member

@anirudh2290 anirudh2290 left a comment


@ptrendx makes sense. #11568 had two issues; the assert issue was fixed, but the "too many resources requested for launch" error was merely hidden.

@Roshrini
Member

Roshrini commented Jan 2, 2019

@samskalicky Can you check if your comment is addressed? Thanks

@samskalicky
Contributor

samskalicky commented Jan 2, 2019

@ptrendx Can you add a comment in the code to make it clear that these launch bounds are required to be set this way to avoid the "too many resources requested for launch" error?

Maybe something like:

/*
 * __launch_bounds__(cuda::kMaxThreadsPerBlock, 1)
 * guarantees that at least 1 block of kMaxThreadsPerBlock
 * threads can be resident on an SM, which caps the number
 * of registers the compiler may use per thread. Without it,
 * running the kernel with this many threads can fail to
 * launch ("too many resources requested for launch" error).
 */

@ptrendx
Member Author

ptrendx commented Jan 2, 2019

I added a comment with an explanation.

Contributor

@samskalicky samskalicky left a comment


Thanks @ptrendx for your help with this issue!

@ptrendx
Member Author

ptrendx commented Jan 4, 2019

Is there anything else needed for this PR?

@Roshrini
Member

Roshrini commented Jan 4, 2019

@anirudh2290 Can you merge this PR if it looks good?
@mxnet-label-bot Update [Operator, pr-awaiting-merge]

@marcoabreu marcoabreu added the pr-awaiting-merge label and removed the pr-awaiting-review label Jan 4, 2019
@KellenSunderland
Contributor

Looks similar to a few fixes we've made in the past when some kernels used too many registers to run on a TX1. LGTM.

@KellenSunderland KellenSunderland merged commit 0faa5b7 into apache:master Jan 14, 2019
KellenSunderland pushed a commit to KellenSunderland/incubator-mxnet that referenced this pull request Jan 17, 2019
* Fix launch bounds in spatial transformer

* Adding explanation in comment.
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019
* Fix launch bounds in spatial transformer

* Adding explanation in comment.