Skip to content

Add Macro to remove asserts in hip kernel#376

Merged
whchung merged 1 commit into
develop-upstreamfrom
develop-upstream-zyin-assert-linker-issue
Mar 26, 2019
Merged

Add Macro to remove asserts in hip kernel#376
whchung merged 1 commit into
develop-upstreamfrom
develop-upstream-zyin-assert-linker-issue

Conversation

@jerryyin
Copy link
Copy Markdown
Member

@jerryyin jerryyin commented Mar 25, 2019

HIP does not currently support assert() calls inside kernels. Instead of removing assert calls which are all over our code base, This pull request pick the gpu_device_functions.h that is the dependency to hip kernels to add the macro. This way, "assert()" macro is redefined to nop, and dependency kernels will pick up the re-definition in the build process. The macro is used to convert assert() calls to nop, so that we can create tf-rocm debug builds.

The reason why the issue does not exist in a release build is due to asserts being expanded to NOP in release build. This is done by having NDEBUG set.

After this update together with rocPRIM PR 51, a debug build of TensorFlow-Rocm can be achieved by:
$ bazel build --config=rocm --copt=-g --compilation_mode=dbg --strip=never //tensorflow/tools/pip_package_build_pip_package

@ROCm-Apps-Test
Copy link
Copy Markdown

Can one of the admins verify this patch?

2 similar comments
@ROCm-Apps-Test
Copy link
Copy Markdown

Can one of the admins verify this patch?

@ROCm-Apps-Test
Copy link
Copy Markdown

Can one of the admins verify this patch?

@jerryyin jerryyin requested review from deven-amd and whchung March 25, 2019 18:15
@deven-amd
Copy link
Copy Markdown

@parallelo, can you add @jerryyin to the cool-kids list so that his PRs can trigger CI automatically :)

thanks

@whchung
Copy link
Copy Markdown
Collaborator

whchung commented Mar 25, 2019

jenkins: ok to test

@deven-amd
Copy link
Copy Markdown

@jerryyin

interesting solution to the assert problem. I had run into the same issue when I was trying to do a debug build. I remember running trying out something similar (i.e. redefining assert to be a no-op, though not in the same file as you), but it never completely worked for me.

eventually I had taken the nuclear option, and added +define+NDEBUG as "always on" compile option (just like TENSORFLOW_USE_ROCM) to get around the assert issue. At that point I had run into the rocprim issue (which you have identified the fix for :) ). I never checked in the change to disable asserts since I could not get the debug build to work.

@parallelo
Copy link
Copy Markdown

FYI - @jerryyin has been added to the CI admin list

@jerryyin
Copy link
Copy Markdown
Member Author

@jerryyin

interesting solution to the assert problem. I had run into the same issue when I was trying to do a debug build. I remember running trying out something similar (i.e. redefining assert to be a no-op, though not in the same file as you), but it never completely worked for me.

eventually I had taken the nuclear option, and added +define+NDEBUG as "always on" compile option (just like TENSORFLOW_USE_ROCM) to get around the assert issue. At that point I had run into the rocprim issue (which you have identified the fix for :) ). I never checked in the change to disable asserts since I could not get the debug build to work.

Thanks! I got lucky in chasing down some of the HIP kernel call chains and down to this file. When I add the re-definition, all the linker issue just immediately go away at once. For me, I have to address the issue in building a debug build so that I can do operator level debugging. (The workaround to add symbols only to stream executor isn't enough to me)

Copy link
Copy Markdown
Collaborator

@whchung whchung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@whchung
Copy link
Copy Markdown
Collaborator

whchung commented Mar 26, 2019

@jerryyin Thanks for submitting this PR, as well as ROCm/rocPRIM#51 . Should the PR in rocPRIM gets merged, please submit another PR to revise workspace.bzl to help move the pointer for rocPRIM

@whchung whchung merged commit dbf59b0 into develop-upstream Mar 26, 2019
@jerryyin
Copy link
Copy Markdown
Member Author

@jerryyin Thanks for submitting this PR, as well as ROCmSoftwarePlatform/rocPRIM#51 . Should the PR in rocPRIM gets merged, please submit another PR to revise workspace.bzl to help move the pointer for rocPRIM

@whchung Will do!

@jerryyin jerryyin deleted the develop-upstream-zyin-assert-linker-issue branch March 27, 2019 15:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants