Optimise the gridDim.n * blockDim.m idiom #1468

AlexVlx · 2019-09-26T15:00:15Z

No description provided.

whchung

Although this PR doesn't solve the fundamental issue of hipGridDim_x/y/z but it does address a common pattern seen among all Eigen kernels on ROCm.

AlexVlx · 2019-09-26T16:19:00Z

Not quite sure what the fundamental problem is if I'm honest. If it's the integral division required to get the number of groups, we can probably address that one fairly easily too, although I suspect that this idiom is far more commonly encountered.

whchung · 2019-09-26T16:22:48Z

Yes, the fundamental issue is the integer division used in the implementation of __ockl_get_num_groups, and also the apparent omission of enable_sgpr_grid _workgroup_count_X/Y/Z

https://llvm.org/docs/AMDGPUUsage.html#initial-kernel-execution-state

Indeed with this PR merged it should already address quite a lot of inefficient codes in many HIP kernels as the idiom is used literally everywhere.

b-sumner · 2019-09-26T16:29:32Z

Is it really the fundamental problem? In a separate context, I thought you said that the performance impact is minimal.

AlexVlx · 2019-09-26T16:39:27Z

Yeah I don't know how fundamental that one is, but "fixing" it should be straightforward. Let us discuss separately.

whchung · 2019-09-26T16:40:22Z

@b-sumner the performance impact was small, for the kernels I was studying at that moment. So I wasn't pushing for a change. And this PR should nicely optimize away issues found in those kernels.

But there are other kernels from other workloads which directly access hipGridDim_x/y/z without following this idiom in performance-critical loops. Again it's hard to quantify the potential impact yet without truly dive down into the ISA for those kernels. So I'm not requesting having a change either, yet. We might revisit this issue again in a couple of weeks when we have better understanding on these kernels.

b-sumner · 2019-09-26T16:47:13Z

@whchung I agree it is a problem if the division is not lifted out of a performance critical loop. But the fundamental issue then is not the specifics of the computation, but rather that an invariant computation is not being identified and lifted out of that loop.

whchung · 2019-09-26T17:01:19Z

@b-sumner perhaps we are interpreting the term "fundamental issue" differently.

To me the "fundamental issue" is that reading hipGridDim_x/y/z involves integer division and makes derivative values reside in VGPR, not in SGPR. On the other hand, all other indexing values in HIP/ROCm-Device-Libs are nicely stored in SGPRs.

Should v_readlane be used in ROCm-Device-Libs implementation, to put hipGridDim_x/y/z values back into SGPR we may be able to at least keep most derivative values in SGPR, which may reduce some instructions and VGPR register usage.

As a side note, we've been starting to upstream our works on MLIR and the very first step is to enable integrating ROCm-Device-Libs into MLIR, so functions written in GPU dialect could be properly indexed on ROCm with ROCm-Device-Libs.
tensorflow/mlir#146

b-sumner · 2019-09-26T17:08:52Z

@whchung I think you're observing side effects of the base or "fundamental" problem which has not been identified. Using readfirstlane in the device libs would simply be a workaround that could affect other optimizations and I would oppose it until found necessary after the fundamental problem has been identified.

whchung · 2019-09-26T17:12:37Z

@b-sumner I agree we postpone this discussion until further understanding the nature of these new kernels which don't follow idiom fixed in this PR.

bensander

LGTM

Divide et impera is not always ideal.

9905edb

AlexVlx requested review from b-sumner, bensander and mangupta September 26, 2019 15:00

AlexVlx changed the title ~~Divide et impera is not always ideal.~~ Optimise the gridDim.n * blockDim.m idiom Sep 26, 2019

AlexVlx requested a review from whchung September 26, 2019 15:10

whchung reviewed Sep 26, 2019

View reviewed changes

bensander approved these changes Sep 26, 2019

View reviewed changes

mangupta approved these changes Sep 30, 2019

View reviewed changes

mangupta added the pr:needs_updates PR initially approved. But needs rework label Sep 30, 2019

mangupta merged commit ab8fe8a into master Sep 30, 2019

mangupta deleted the feature_better_grid_dim branch September 30, 2019 05:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Optimise the gridDim.n * blockDim.m idiom #1468

Optimise the gridDim.n * blockDim.m idiom #1468

Uh oh!

AlexVlx commented Sep 26, 2019

Uh oh!

whchung left a comment

Uh oh!

AlexVlx commented Sep 26, 2019

Uh oh!

whchung commented Sep 26, 2019 •

edited

Loading

Uh oh!

b-sumner commented Sep 26, 2019

Uh oh!

AlexVlx commented Sep 26, 2019

Uh oh!

whchung commented Sep 26, 2019

Uh oh!

b-sumner commented Sep 26, 2019

Uh oh!

whchung commented Sep 26, 2019 •

edited

Loading

Uh oh!

b-sumner commented Sep 26, 2019

Uh oh!

whchung commented Sep 26, 2019

Uh oh!

bensander left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

Uh oh!

Optimise the gridDim.n * blockDim.m idiom #1468

Optimise the gridDim.n * blockDim.m idiom #1468

Uh oh!

Conversation

AlexVlx commented Sep 26, 2019

Uh oh!

whchung left a comment

Choose a reason for hiding this comment

Uh oh!

AlexVlx commented Sep 26, 2019

Uh oh!

whchung commented Sep 26, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

b-sumner commented Sep 26, 2019

Uh oh!

AlexVlx commented Sep 26, 2019

Uh oh!

whchung commented Sep 26, 2019

Uh oh!

b-sumner commented Sep 26, 2019

Uh oh!

whchung commented Sep 26, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

b-sumner commented Sep 26, 2019

Uh oh!

whchung commented Sep 26, 2019

Uh oh!

bensander left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

whchung commented Sep 26, 2019 •

edited

Loading

whchung commented Sep 26, 2019 •

edited

Loading