Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[1.8] Include oneDNN gemm fix #19251

Merged
merged 3 commits into from
Oct 5, 2020
Merged

[1.8] Include oneDNN gemm fix #19251

merged 3 commits into from
Oct 5, 2020

Conversation

leezu
Copy link
Contributor

@leezu leezu commented Sep 29, 2020

@mxnet-bot
Copy link

Hey @leezu , Thanks for submitting the PR
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [centos-cpu, windows-gpu, windows-cpu, sanity, clang, unix-cpu, website, centos-gpu, edge, miscellaneous, unix-gpu]


Note:
Only following 3 categories can trigger CI :PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

@TaoLv
Copy link
Member

TaoLv commented Sep 30, 2020

Hi @leezu, are you trying to include a unreleased version of oneDNN to MXNet v1.8? Or you just want to verify the fix? We're asking oneDNN team for a v1.6.4 patch release.

@leezu
Copy link
Contributor Author

leezu commented Sep 30, 2020

Could you provide more details on what fixes are missing from oneapi-src/oneDNN@5ce95ef that will go into 1.6.4?

I think it's up to @samskalicky as release manager to decide if he just includes oneapi-src/oneDNN@5ce95ef or wait for 1.6.4 (or don't include the fix at all)

@samskalicky
Copy link
Contributor

samskalicky commented Sep 30, 2020

Other than waiting for 1.6.4 patch release, what are our options for avoiding the issue in 1.8? Will reverting any/all of #18822 #18867 #19161 help?

I dont see any issue with setting 3rdparty/mkldnn to a unreleased commit to fix this bug either.

Copy link
Contributor

@samskalicky samskalicky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a test now that is passing with this fix?

@leezu
Copy link
Contributor Author

leezu commented Sep 30, 2020

We don't have a test in 1.x branch. One can construct a test by first calling 'tests/python/unittest/test_contrib_intgemm.py::test_contrib_intgemm_multiply[api1-16-192-4]' followed by a different test calling a hybridized Dense layer and ensuring that the output doesn't contain NaN, as per #19185 (comment)

@samskalicky
Copy link
Contributor

We don't have a test in 1.x branch. One can construct a test by first calling 'tests/python/unittest/test_contrib_intgemm.py::test_contrib_intgemm_multiply[api1-16-192-4]' followed by a different test calling a hybridized Dense layer and ensuring that the output doesn't contain NaN, as per #19185 (comment)

Ok, np. did we verify the fix actually fixes the issue in 1.8.x manually?

@pioy
Copy link

pioy commented Sep 30, 2020

Let's see if #19260 results can show that the fix really fixes the issue.

Update: It seems, that the failing tests don't exist on 1.8, so the PR is irrelevant.

@samskalicky
Copy link
Contributor

Thanks @leezu do you plan to open another PR for this fix on v1.x as well?

@leezu
Copy link
Contributor Author

leezu commented Oct 2, 2020

@mxnet-bot run ci [unix-gpu]

@mxnet-bot
Copy link

Jenkins CI successfully triggered : [unix-gpu]

@leezu leezu added the pr-awaiting-review PR is waiting for code review label Oct 2, 2020
@bartekkuncer
Copy link
Contributor

bartekkuncer commented Oct 2, 2020

@leezu The number of tags has changed so for the test to pass you have to change it from 205 to 219 in MemFormat test (103 line in mkldnn_test.cc file).
If that is not a problem I will create PR for v1.x branch.
PR for v1.x branch:
#19276

@samskalicky
Copy link
Contributor

@bartekkuncer @pioy looks like the CI is passing now. Is this PR ready to merge? Can you review/approve it?

@bartekkuncer
Copy link
Contributor

@bartekkuncer @pioy looks like the CI is passing now. Is this PR ready to merge? Can you review/approve it?

@samskalicky Change looks good now. Ready to be merged.

@szha szha merged commit 25bebdf into apache:v1.8.x Oct 5, 2020
@leezu leezu deleted the 2020-09/onednn-gemm branch October 5, 2020 15:09
@leezu
Copy link
Contributor Author

leezu commented Oct 5, 2020

This change is included in v1.x as part of #19276

@samskalicky
Copy link
Contributor

This change is included in v1.x as part of #19276

Thanks @leezu for the tracking!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
pr-awaiting-review PR is waiting for code review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants