-
Notifications
You must be signed in to change notification settings - Fork 6.8k
[MXNET-74]Update mkldnn to the newest & Add clang build test with mkldnn. #9918
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add clang 3.9 as well
You're missing MKL runtime dependencies, see https://github.com/apache/incubator-mxnet/blob/master/tests/ci_build/Dockerfile.cpu_mklml#L14 for reference |
tests/ci_build/Dockerfile.cpu_clang
Outdated
@@ -19,3 +19,9 @@ RUN wget -O - http://apt.llvm.org/llvm-snapshot.gpg.key | apt-key add - && \ | |||
apt-get install -y clang-3.9 clang-5.0 && \ | |||
clang-3.9 --version && \ | |||
clang-5.0 --version | |||
|
|||
# Add MKLML library, compatiable with Ubuntu16.04 | |||
RUN wget --no-check-certificate -O /tmp/mklml.tgz https://github.com/01org/mkl-dnn/releases/download/v0.12/mklml_lnx_2018.0.1.20171227.tgz |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update all dockerfiles in that directory in order to use the correct mkl version
@@ -14,3 +14,9 @@ RUN /install/install_testdeps.sh | |||
RUN /install/install_julia.sh | |||
RUN /install/install_maven.sh | |||
RUN /install/install_library.sh | |||
|
|||
# Add MKLML library, compatiable with Ubuntu16.04 | |||
RUN wget --no-check-certificate -O /tmp/mklml.tgz https://github.com/01org/mkl-dnn/releases/download/v0.12/mklml_lnx_2018.0.1.20171227.tgz |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MKL-DNN repo is moved to github/intel/mkl-dnn. Please use the new repo :)
Please only UPDATE dockerfiles that already have mkl stuff in it. Don't add it unless necessary. |
case mkldnn_gOihw16o: | ||
case mkldnn_gOhwi8o: | ||
case mkldnn_gOhwi16o: | ||
case mkldnn_gOhIw16o4i: | ||
return mkldnn_goihw; | ||
default: | ||
LOG(FATAL) << "Unknown MKLDNN format for 4 dimensions: " << desc.data.format; | ||
LOG(FATAL) << "Unknown MKLDNN format for 5 dimensions: " << desc.data.format; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a test for this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We notice that there is a test case failed here and it was caused by a missing data format "mkldnn_Goihw8g". Then we found the error message is misleading.
Please find the error message here: http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-9918/4/pipeline/513
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aah great. But the error message is indeed pretty misleading. Would it be possible to make a better fitting test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean a negative test case which can run into the default statement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because all the 5D data formats that mkldnn supported currently are covered by the swich-case statement here. So I don't think we can create a mkldnn::memory::desc
with another 5D data format to touch the default statement.
One feasible method is to set a test case to check whether there is a new format added into mkldnn data format list. If the test case is failed, maybe we need check the code here and add the new format into the switch-case statement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds like a good idea! Especially considering that updating the MKLDNN submodule requires a PR, problems like this would be caught during an early stage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@marcoabreu Sorry for late response. It seems a little difficult to monitor the change of a enum
type in mkldnn package. I have asked mkldnn team for support and am waiting for their reply. Since the mkldnn version updated in this pr works well with this part of code, I would like this pr be merged firstly and it will address the compilation issue on OSX. I will add unit test for it later if I get support from mkldnn team. If needed, I can help create a JIRA ticket to track this issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you create the jira issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created. MXNET-98
@@ -38,3 +38,9 @@ RUN add-apt-repository -y ppa:marutter/rdev | |||
RUN apt-get update && apt-get -y upgrade | |||
RUN DEBIAN_FRONTEND=noninteractive apt-get -y -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confnew" install r-base r-base-dev | |||
RUN Rscript -e "install.packages('devtools', repo = 'https://cran.rstudio.com')" | |||
|
|||
# Add MKLML library, compatiable with Ubuntu16.04 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove
tests/ci_build/Dockerfile.cpu
Outdated
@@ -10,3 +10,9 @@ COPY install/ubuntu_install_r.sh /install/ | |||
RUN /install/ubuntu_install_r.sh | |||
COPY install/ubuntu_install_perl.sh /install/ | |||
RUN /install/ubuntu_install_perl.sh | |||
|
|||
# Add MKLML library, compatiable with Ubuntu16.04 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove
@@ -14,3 +14,9 @@ RUN /install/install_testdeps.sh | |||
RUN /install/install_julia.sh | |||
RUN /install/install_maven.sh | |||
RUN /install/install_library.sh | |||
|
|||
# Add MKLML library, compatiable with Ubuntu16.04 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove
Does this add two more full builds to CI? How does that affect CI performance? What's the reasoning for testing both 3.9 and 5? |
It does not harm CI performance as it is not on the critical path. The builds are happening fairly quickly and we're considering adding ccache. Clang 3.9 and 5.0 are tested for Mac support. @KellenSunderland might provide some background here |
@marcoabreu I come across the unit test error again(this time is R pkg building). Could you help me fix it? Thanks. |
It seems like there is a problem with CI due to the update of an R dependency that deprecated some features. We are working on it. |
tests/ci_build/Dockerfile.cpu_clang
Outdated
@@ -19,3 +19,9 @@ RUN wget -O - http://apt.llvm.org/llvm-snapshot.gpg.key | apt-key add - && \ | |||
apt-get install -y clang-3.9 clang-5.0 && \ | |||
clang-3.9 --version && \ | |||
clang-5.0 --version | |||
|
|||
# Add MKLML library, compatiable with Ubuntu16.04 | |||
RUN wget --no-check-certificate -O /tmp/mklml.tgz https://github.com/intel/mkl-dnn/releases/download/v0.12/mklml_lnx_2018.0.1.20171227.tgz |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: we could put this in the same RUN to save some space.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for comments! Do these dockerfiles run at the same time and share the same /tmp dir?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tried to put mklml script in a single run file, but get the following error:
Step 13/14 : RUN /install/ubuntu_install_mklml.sh
---> Running in 53baae7b5275
/bin/sh: 1: /install/ubuntu_install_mklml.sh: Permission denied
The command '/bin/sh -c /install/ubuntu_install_mklml.sh' returned a non-zero code: 126
ERROR: docker build failed.
Can you help me fix it? By the way, I think the run file should first judge whether the file already exists. Thank you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The filesystem is transparent in between the layers, but each execution of a docker container has their own view of the file system. There is no shared writeable filesystem as far as it concerns your usecase.
I'm a bit confused by that error atm. Could you debug this problem a bit further (e.g. running ls -l
inside the container at /install/ etc) and hit me up if you don't know how to continue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just built this version, which includes my nit, with no permission issues. https://gist.github.com/KellenSunderland/b97596a3322bdef17129b122d99721f3
Hope it helps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, it does work. But I'm a bit confused why it can save some space?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I listed the space and in fact it didn't save disk space ... which was suspicious until I saw what was going on.
I was assuming you were downloading a tgz in the first RUN command, and then deleting it in the second. As docker stores the changes to the file system as a unique layer for each command this would mean we're storing an unnecessary tgz file in a redundant layer. In fact we weren't deleting the tgz file in the second RUN.
I've created a new gist I recommend we use here https://gist.github.com/KellenSunderland/14778ec80b7bf1a3433811670932242a. In the gist there's an optimized version and an unoptimized version. Here's the space usage reports for each version:
$ docker history optimized
IMAGE CREATED CREATED BY SIZE COMMENT
4aeffe9e465d 6 minutes ago /bin/sh -c #(nop) ENV LD_LIBRARY_PATH=:/usr… 0B
2d48605c4d2b 6 minutes ago /bin/sh -c wget --no-check-certificate -O /t… 306MB
40a1e5139182 About an hour ago /bin/sh -c wget -O - http://apt.llvm.org/llv… 742MB
a49366f71d6e About an hour ago /bin/sh -c /install/ubuntu_install_perl.sh 71.8MB
483f846d172f About an hour ago /bin/sh -c #(nop) COPY file:1d182312c7e82760… 1.01kB
af21eeafe2d8 About an hour ago /bin/sh -c /install/ubuntu_install_r.sh 146MB
42a4f146b71e About an hour ago /bin/sh -c #(nop) COPY file:3ce9646ada9a9303… 1.14kB
9ad14f368022 About an hour ago /bin/sh -c /install/ubuntu_install_scala.sh 660MB
7609e2e90cf9 About an hour ago /bin/sh -c #(nop) COPY file:edf150d8fa20ff4f… 1.21kB
8436fb76fb8b About an hour ago /bin/sh -c /install/ubuntu_install_python.sh 843MB
b23977a31829 About an hour ago /bin/sh -c #(nop) COPY file:239352104979d07c… 1.25kB
efc666edaccf About an hour ago /bin/sh -c /install/ubuntu_install_core.sh 833MB
48529b485203 About an hour ago /bin/sh -c #(nop) COPY file:9f5c02599ecd6c52… 1.17kB
0458a4468cbc 5 weeks ago /bin/sh -c #(nop) CMD ["/bin/bash"] 0B
<missing> 5 weeks ago /bin/sh -c mkdir -p /run/systemd && echo 'do… 7B
<missing> 5 weeks ago /bin/sh -c sed -i 's/^#\s*\(deb.*universe\)$… 2.76kB
<missing> 5 weeks ago /bin/sh -c rm -rf /var/lib/apt/lists/* 0B
<missing> 5 weeks ago /bin/sh -c set -xe && echo '#!/bin/sh' > /… 745B
<missing> 5 weeks ago /bin/sh -c #(nop) ADD file:a3344b835ea6fdc56… 112MB
~/Development/incubator-mxnet/tests/ci_build (dockerfiles)
$ docker history unoptimized
IMAGE CREATED CREATED BY SIZE COMMENT
84090a06ad50 32 seconds ago /bin/sh -c #(nop) ENV LD_LIBRARY_PATH=:/usr… 0B
b7848b89a535 32 seconds ago /bin/sh -c tar -zxvf /tmp/mklml.tgz && c… 306MB
80dd38efa252 About an hour ago /bin/sh -c wget --no-check-certificate -O /t… 77.8MB
40a1e5139182 About an hour ago /bin/sh -c wget -O - http://apt.llvm.org/llv… 742MB
a49366f71d6e About an hour ago /bin/sh -c /install/ubuntu_install_perl.sh 71.8MB
483f846d172f About an hour ago /bin/sh -c #(nop) COPY file:1d182312c7e82760… 1.01kB
af21eeafe2d8 About an hour ago /bin/sh -c /install/ubuntu_install_r.sh 146MB
42a4f146b71e About an hour ago /bin/sh -c #(nop) COPY file:3ce9646ada9a9303… 1.14kB
9ad14f368022 About an hour ago /bin/sh -c /install/ubuntu_install_scala.sh 660MB
7609e2e90cf9 About an hour ago /bin/sh -c #(nop) COPY file:edf150d8fa20ff4f… 1.21kB
8436fb76fb8b About an hour ago /bin/sh -c /install/ubuntu_install_python.sh 843MB
b23977a31829 About an hour ago /bin/sh -c #(nop) COPY file:239352104979d07c… 1.25kB
efc666edaccf About an hour ago /bin/sh -c /install/ubuntu_install_core.sh 833MB
48529b485203 About an hour ago /bin/sh -c #(nop) COPY file:9f5c02599ecd6c52… 1.17kB
0458a4468cbc 5 weeks ago /bin/sh -c #(nop) CMD ["/bin/bash"] 0B
<missing> 5 weeks ago /bin/sh -c mkdir -p /run/systemd && echo 'do… 7B
<missing> 5 weeks ago /bin/sh -c sed -i 's/^#\s*\(deb.*universe\)$… 2.76kB
<missing> 5 weeks ago /bin/sh -c rm -rf /var/lib/apt/lists/* 0B
<missing> 5 weeks ago /bin/sh -c set -xe && echo '#!/bin/sh' > /… 745B
<missing> 5 weeks ago /bin/sh -c #(nop) ADD file:a3344b835ea6fdc56… 112MB
~/Development/incubator-mxnet/tests/ci_build (dockerfiles)
Pretty small savings, but in the optimized version there's one less layer and there's 77.8MB saved. Again this is getting pretty nitpicky. Both versions are totally functional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, thank you very much:)
Hi, the community has passed to vote about associating the code changes with JIRA (https://lists.apache.org/thread.html/ab22cf0e35f1bce2c3bf3bec2bc5b85a9583a3fe7fd56ba1bbade55f@%3Cdev.mxnet.apache.org%3E) We have updated the guidelines for contributors in https://cwiki.apache.org/confluence/display/MXNET/Development+Process, please ensure that you have created a JIRA at https://issues.apache.org/jira/projects/MXNET/issues/ to describe your work in this pull request and include the JIRA title in your PR as [MXNET-xxxx] your title where MXNET-xxxx is the JIRA id Thanks! |
@marcoabreu @KellenSunderland @cjolivier01 please help confirm the final changes. |
@marcoabreu @KellenSunderland @cjolivier01 please help confirm the latest changes, Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor tweak to Jenkinsfile requested. LGTM otherwise.
Jenkinsfile
Outdated
@@ -175,6 +175,24 @@ try { | |||
} | |||
} | |||
}, | |||
'CPU: Clang 3.9 MKLDNN': { | |||
node('mxnetlinux-cpu') { | |||
ws('workspace/build-cpu-clang39') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're trying to give each build task its own unique workspace. I'd recommend we change this to:
ws('workspace/build-cpu-mkldnn-clang39')
Jenkinsfile
Outdated
}, | ||
'CPU: Clang 5 MKLDNN': { | ||
node('mxnetlinux-cpu') { | ||
ws('workspace/build-cpu-clang50') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and
ws('workspace/build-cpu-mkldnn-clang50')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, I am going to fix it:)
@marcoabreu @cjolivier01 please help confirm the latest changes, Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small copy & paste mistakes, otherwise lgtm
Jenkinsfile
Outdated
ws('workspace/build-cpu-mkldnn-clang39') { | ||
init_git() | ||
sh "ci/build.py --build --platform ubuntu_cpu /work/runtime_functions.sh build_ubuntu_cpu_clang39_mkldnn" | ||
pack_lib('mkldnn_cpu', mx_mkldnn_lib) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplicate name
Jenkinsfile
Outdated
ws('workspace/build-cpu-mkldnn-clang50') { | ||
init_git() | ||
sh "ci/build.py --build --platform ubuntu_cpu /work/runtime_functions.sh build_ubuntu_cpu_clang50_mkldnn" | ||
pack_lib('mkldnn_cpu', mx_mkldnn_lib) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplicate name
case mkldnn_gOihw16o: | ||
case mkldnn_gOhwi8o: | ||
case mkldnn_gOhwi16o: | ||
case mkldnn_gOhIw16o4i: | ||
return mkldnn_goihw; | ||
default: | ||
LOG(FATAL) << "Unknown MKLDNN format for 4 dimensions: " << desc.data.format; | ||
LOG(FATAL) << "Unknown MKLDNN format for 5 dimensions: " << desc.data.format; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you create the jira issue?
…dnn. (apache#9918) * add clang test with mkldnn * update mkldnn * Update mkldnn to the newest & Add clang build test with mkldnn. apache#9918 * update all dockerfiles in order to use the correct mkl version * fix bugs in mkldnn_base * update mklml link and modify dockerfiles * delete blank line * no changes, just retrigger. * no changes, just retriggering * put mklml in a single run file * debug mklml run file * debug mklml script * add mkldnn clang test * give each build task its own unique workspace * change lib name
…dnn. (apache#9918) * add clang test with mkldnn * update mkldnn * Update mkldnn to the newest & Add clang build test with mkldnn. apache#9918 * update all dockerfiles in order to use the correct mkl version * fix bugs in mkldnn_base * update mklml link and modify dockerfiles * delete blank line * no changes, just retrigger. * no changes, just retriggering * put mklml in a single run file * debug mklml run file * debug mklml script * add mkldnn clang test * give each build task its own unique workspace * change lib name
…dnn. (apache#9918) * add clang test with mkldnn * update mkldnn * Update mkldnn to the newest & Add clang build test with mkldnn. apache#9918 * update all dockerfiles in order to use the correct mkl version * fix bugs in mkldnn_base * update mklml link and modify dockerfiles * delete blank line * no changes, just retrigger. * no changes, just retriggering * put mklml in a single run file * debug mklml run file * debug mklml script * add mkldnn clang test * give each build task its own unique workspace * change lib name
Description
Update mkldnn to the newest to fix bugs about macOS building in #9828
Add clang build test with mkldnn.
Checklist
Essentials
make lint
)Changes