[NFC][SYCL] Add pragma unroll in accessor.hpp #4347
This should improve performance on various targets when 3-dimensional calculations are performed within nested loops. Signed-off-by: Dmitry Sidorov <dmitry.sidorov@intel.com>
/summary:run
@s-kanaev please take a look
s-kanaev
left a comment
Other than the questions, the patch looks good.
Signed-off-by: Dmitry Sidorov <dmitry.sidorov@intel.com>
b7c7724 to 3cdd2a4
erichkeane
left a comment
I'm more in favor of this approach; I think it is at least less of a 'big hammer'.
Signed-off-by: Dmitry Sidorov <dmitry.sidorov@intel.com>
s-kanaev
left a comment
Should there be a test for it?
Hah, build for cuda is still failing with: that's a bit unexpected, I'll check tomorrow why it happens.
@s-kanaev I can hardly imagine a proper test for this patch. One option would be to add a test to check_device_code and check that unroll metadata appears on the appropriate branch instruction, but I guess clang itself has lots of tests for that. It also wouldn't cover host code compilation with various compilers. Another option would be to, hm, dump the assembly? But I can hardly imagine how we could do that (for example) for MSVC compilation in the LIT environment. Or is there another option that you have in mind?
Signed-off-by: Dmitry Sidorov <dmitry.sidorov@intel.com>
Signed-off-by: Dmitry Sidorov <dmitry.sidorov@intel.com>
Signed-off-by: Dmitry Sidorov <dmitry.sidorov@intel.com>
Signed-off-by: Dmitry Sidorov <dmitry.sidorov@intel.com>
Signed-off-by: Dmitry Sidorov <dmitry.sidorov@intel.com>
Signed-off-by: Dmitry Sidorov <dmitry.sidorov@intel.com>
/summary:run
@s-kanaev, please approve using the GitHub UI to unblock the merge.
Signed-off-by: Dmitry Sidorov <dmitry.sidorov@intel.com>
9559942 to e9d4e64
Signed-off-by: Dmitry Sidorov <dmitry.sidorov@intel.com>
@MrSidims, I think it's more likely that the failure on the Linux machine was introduced by @alexbatashev with 739487c rather than by this patch. @romanovvlad, FYI.
#else
#define __SYCL_UNROLL(x)
#endif // compiler switch
#endif // __SYCL_UNROLL
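For readers without the full diff: only the `#else` / `#endif` tail of the macro is quoted above. A minimal sketch of how such a compiler-switched unroll macro is commonly written follows. The `SKETCH_*` names, the compiler checks, and the `linear_index` helper are assumptions made for illustration, not the code from this patch.

```cpp
#include <cstddef>

// Hypothetical names: SKETCH_STRINGIFY and SKETCH_UNROLL are illustrative
// stand-ins, not the macros defined in the patch.
#define SKETCH_STRINGIFY(x) #x

#ifndef SKETCH_UNROLL
#if defined(__clang__)
// Clang understands "#pragma unroll N".
#define SKETCH_UNROLL(x) _Pragma(SKETCH_STRINGIFY(unroll x))
#elif defined(__GNUC__) && __GNUC__ >= 8
// GCC 8 and later understand "#pragma GCC unroll N".
#define SKETCH_UNROLL(x) _Pragma(SKETCH_STRINGIFY(GCC unroll x))
#else
// Any other compiler: expand to nothing and leave the loop as written.
#define SKETCH_UNROLL(x)
#endif // compiler switch
#endif // SKETCH_UNROLL

// Row-major linearization of a 3-dimensional index, the kind of short
// fixed-trip-count loop the PR description refers to.
inline std::size_t linear_index(const std::size_t (&id)[3],
                                const std::size_t (&range)[3]) {
  std::size_t result = 0;
  SKETCH_UNROLL(3)
  for (int i = 0; i < 3; ++i)
    result = result * range[i] + id[i];
  return result;
}
```

The empty fallback keeps the header compiling on compilers without a known unroll pragma, which is why the quoted tail defines the macro to nothing in the `#else` branch.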
Looking at this, I wonder why simply using a metaprogrammed loop would not be better than this pragmatic spaghetti plate.
It also works with one-iteration loops, unlike this pragma-based solution: #6560
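As a rough illustration of what a metaprogrammed loop can look like, here is a minimal C++14 sketch. The helpers `loop_impl`, `unrolled_loop`, and `linear_index` are hypothetical names invented for this example; the contents of #6560 are not shown in this thread, so this should not be read as that change.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>

// Hypothetical helper: calls f(integral_constant<size_t, I>{}) for each I in
// the index sequence, expanded at compile time.
template <typename F, std::size_t... Is>
void loop_impl(F &f, std::index_sequence<Is...>) {
  // Pack expansion inside a braced initializer is evaluated left to right.
  int expand[] = {
      0, (static_cast<void>(f(std::integral_constant<std::size_t, Is>{})),
          0)...};
  (void)expand;
}

// Hypothetical helper: runs f for indices 0 .. N-1 with no runtime loop.
template <std::size_t N, typename F>
void unrolled_loop(F f) {
  loop_impl(f, std::make_index_sequence<N>{});
}

// Row-major linearization for any dimension count, including Dims == 1.
template <std::size_t Dims>
std::size_t linear_index(const std::size_t (&id)[Dims],
                         const std::size_t (&range)[Dims]) {
  std::size_t result = 0;
  unrolled_loop<Dims>([&](auto i) {
    constexpr std::size_t I = decltype(i)::value;
    result = result * range[I] + id[I];
  });
  return result;
}
```

Because the expansion happens in the template machinery rather than through a pragma, the same code path handles a single iteration (`Dims == 1`) and needs no per-compiler switch.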