Resolve NaN issue when using TensorFlow (SWDEV-285232) #911
Add more valid-range check conditions for fp16 fwd.
Shall we test it on CI right now?
No, not yet.
Self-test done, ready for review.
carlushuang left a comment:
LGTM. This fixes all the fwd kernels by setting the proper range check in sgpr[2:3] of the buffer load instruction.
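The sketch below is a simplified Python model, not the kernel code, of how a buffer load's range check can stop out-of-bounds reads from introducing NaNs. It assumes a raw (stride-0) buffer whose descriptor carries a `num_records` field (dword 2 in the GCN buffer resource layout) giving the valid size in bytes, and that loads outside that range return zero. `BufferResource`, `buffer_load_f16`, and `sum_f16` are hypothetical names. It is also an assumption that stale bytes past the end of the tensor were the actual NaN source in SWDEV-285232.

```python
import math
import struct
from dataclasses import dataclass


@dataclass
class BufferResource:
    """Hypothetical stand-in for a buffer resource descriptor."""
    data: bytes       # memory reachable through the base address
    num_records: int  # valid range in bytes (the range-check limit)


def buffer_load_f16(rsrc: BufferResource, byte_offset: int) -> float:
    """Load one fp16 value; accesses outside [0, num_records) return 0.0.

    Simplification: a load is treated as out of range if any of its
    2 bytes lies past num_records.
    """
    if byte_offset < 0 or byte_offset + 2 > rsrc.num_records:
        return 0.0  # out of range: memory is not read, zero is returned
    (value,) = struct.unpack_from("<e", rsrc.data, byte_offset)
    return value


def sum_f16(rsrc: BufferResource, count: int) -> float:
    """Sum `count` fp16 elements, like a kernel loop that may overrun the tensor."""
    return sum(buffer_load_f16(rsrc, 2 * i) for i in range(count))


# Two valid fp16 values followed by bytes 0xFFFF, which decode as an fp16 NaN.
memory = struct.pack("<2e", 1.0, 2.0) + b"\xff\xff"

loose = BufferResource(memory, num_records=len(memory))  # range also covers the junk
tight = BufferResource(memory, num_records=4)            # exactly the tensor's 2 values

print(math.isnan(sum_f16(loose, 3)))  # True: the junk NaN leaks into the result
print(sum_f16(tight, 3))              # 3.0: the overrunning load returns zero
```

The hardware check is only as good as the limit programmed into the descriptor: with a tight `num_records`, an overrunning load degrades to a harmless zero instead of picking up whatever bytes follow the tensor.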
atamazov left a comment:
I do not see anything bad))
@shaojiewang Can you please add regression tests for this issue?
The test was done with TensorFlow as described in SWDEV-285232; the problem is not observed when testing with MIOpenDriver. Should we hand the end-to-end (e2e) test over to QA?
@atamazov CI passed. The PR touches a large number of files, but the actual changes are small and look benign. As @shaojiewang commented, "tested for about 1 day. No nan issue is observed." Ready to merge?
Yes. But we need to think about regression tests: I mean ones that run on our CI and ensure that the same bug does not reappear in the future.
[MIOpen] Address some leftover issues from the Batchnorm tuning API (#911) The API version of MIOPEN_FIND_ENFORCE was merged via a separate PR, but there were a few leftover review issues and documentation updates pending that are addressed by these changes.
Regarding https://ontrack-internal.amd.com/browse/SWDEV-285232: resolves the NaN issue.
With this branch, the NaN issue is gone.