Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[MXNET-74]Update mkldnn to the newest & Add clang build test with mkldnn. #9918

Merged
merged 17 commits into from
Mar 14, 2018

Conversation

xinyu-intel
Copy link
Contributor

@xinyu-intel xinyu-intel commented Feb 28, 2018

Description

Update mkldnn to the newest to fix bugs about macOS building in #9828
Add clang build test with mkldnn.

Checklist

Essentials

  • Passed code style checking (make lint)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain the what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  • To the my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Update mkldnn to the newest
  • Add clang build test with mkldnn.

Copy link
Contributor

@marcoabreu marcoabreu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add clang 3.9 as well

@marcoabreu
Copy link
Contributor

@@ -19,3 +19,9 @@ RUN wget -O - http://apt.llvm.org/llvm-snapshot.gpg.key | apt-key add - && \
apt-get install -y clang-3.9 clang-5.0 && \
clang-3.9 --version && \
clang-5.0 --version

# Add MKLML library, compatiable with Ubuntu16.04
RUN wget --no-check-certificate -O /tmp/mklml.tgz https://github.com/01org/mkl-dnn/releases/download/v0.12/mklml_lnx_2018.0.1.20171227.tgz
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update all dockerfiles in that directory in order to use the correct mkl version

@@ -14,3 +14,9 @@ RUN /install/install_testdeps.sh
RUN /install/install_julia.sh
RUN /install/install_maven.sh
RUN /install/install_library.sh

# Add MKLML library, compatiable with Ubuntu16.04
RUN wget --no-check-certificate -O /tmp/mklml.tgz https://github.com/01org/mkl-dnn/releases/download/v0.12/mklml_lnx_2018.0.1.20171227.tgz
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MKL-DNN repo is moved to github/intel/mkl-dnn. Please use the new repo :)

@marcoabreu
Copy link
Contributor

Please only UPDATE dockerfiles that already have mkl stuff in it. Don't add it unless necessary.

case mkldnn_gOihw16o:
case mkldnn_gOhwi8o:
case mkldnn_gOhwi16o:
case mkldnn_gOhIw16o4i:
return mkldnn_goihw;
default:
LOG(FATAL) << "Unknown MKLDNN format for 4 dimensions: " << desc.data.format;
LOG(FATAL) << "Unknown MKLDNN format for 5 dimensions: " << desc.data.format;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a test for this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We notice that there is a test case failed here and it was caused by a missing data format "mkldnn_Goihw8g". Then we found the error message is misleading.
Please find the error message here: http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-9918/4/pipeline/513

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aah great. But the error message is indeed pretty misleading. Would it be possible to make a better fitting test?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean a negative test case which can run into the default statement?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because all the 5D data formats that mkldnn supported currently are covered by the swich-case statement here. So I don't think we can create a mkldnn::memory::desc with another 5D data format to touch the default statement.
One feasible method is to set a test case to check whether there is a new format added into mkldnn data format list. If the test case is failed, maybe we need check the code here and add the new format into the switch-case statement.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds like a good idea! Especially considering that updating the MKLDNN submodule requires a PR, problems like this would be caught during an early stage.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marcoabreu Sorry for late response. It seems a little difficult to monitor the change of a enum type in mkldnn package. I have asked mkldnn team for support and am waiting for their reply. Since the mkldnn version updated in this pr works well with this part of code, I would like this pr be merged firstly and it will address the compilation issue on OSX. I will add unit test for it later if I get support from mkldnn team. If needed, I can help create a JIRA ticket to track this issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you create the jira issue?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created. MXNET-98

@@ -38,3 +38,9 @@ RUN add-apt-repository -y ppa:marutter/rdev
RUN apt-get update && apt-get -y upgrade
RUN DEBIAN_FRONTEND=noninteractive apt-get -y -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confnew" install r-base r-base-dev
RUN Rscript -e "install.packages('devtools', repo = 'https://cran.rstudio.com')"

# Add MKLML library, compatiable with Ubuntu16.04
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove

@@ -10,3 +10,9 @@ COPY install/ubuntu_install_r.sh /install/
RUN /install/ubuntu_install_r.sh
COPY install/ubuntu_install_perl.sh /install/
RUN /install/ubuntu_install_perl.sh

# Add MKLML library, compatiable with Ubuntu16.04
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove

@@ -14,3 +14,9 @@ RUN /install/install_testdeps.sh
RUN /install/install_julia.sh
RUN /install/install_maven.sh
RUN /install/install_library.sh

# Add MKLML library, compatiable with Ubuntu16.04
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove

@cjolivier01
Copy link
Member

Does this add two more full builds to CI? How does that affect CI performance? What's the reasoning for testing both 3.9 and 5?

@marcoabreu
Copy link
Contributor

It does not harm CI performance as it is not on the critical path. The builds are happening fairly quickly and we're considering adding ccache.

Clang 3.9 and 5.0 are tested for Mac support. @KellenSunderland might provide some background here

@xinyu-intel
Copy link
Contributor Author

@marcoabreu I come across the unit test error again(this time is R pkg building). Could you help me fix it? Thanks.

@marcoabreu
Copy link
Contributor

It seems like there is a problem with CI due to the update of an R dependency that deprecated some features. We are working on it.

@@ -19,3 +19,9 @@ RUN wget -O - http://apt.llvm.org/llvm-snapshot.gpg.key | apt-key add - && \
apt-get install -y clang-3.9 clang-5.0 && \
clang-3.9 --version && \
clang-5.0 --version

# Add MKLML library, compatiable with Ubuntu16.04
RUN wget --no-check-certificate -O /tmp/mklml.tgz https://github.com/intel/mkl-dnn/releases/download/v0.12/mklml_lnx_2018.0.1.20171227.tgz
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: we could put this in the same RUN to save some space.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for comments! Do these dockerfiles run at the same time and share the same /tmp dir?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried to put mklml script in a single run file, but get the following error:

Step 13/14 : RUN /install/ubuntu_install_mklml.sh
 ---> Running in 53baae7b5275
/bin/sh: 1: /install/ubuntu_install_mklml.sh: Permission denied
The command '/bin/sh -c /install/ubuntu_install_mklml.sh' returned a non-zero code: 126
ERROR: docker build failed.

Can you help me fix it? By the way, I think the run file should first judge whether the file already exists. Thank you.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The filesystem is transparent in between the layers, but each execution of a docker container has their own view of the file system. There is no shared writeable filesystem as far as it concerns your usecase.

I'm a bit confused by that error atm. Could you debug this problem a bit further (e.g. running ls -l inside the container at /install/ etc) and hit me up if you don't know how to continue?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just built this version, which includes my nit, with no permission issues. https://gist.github.com/KellenSunderland/b97596a3322bdef17129b122d99721f3

Hope it helps.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, it does work. But I'm a bit confused why it can save some space?

Copy link
Contributor

@KellenSunderland KellenSunderland Mar 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I listed the space and in fact it didn't save disk space ... which was suspicious until I saw what was going on.

I was assuming you were downloading a tgz in the first RUN command, and then deleting it in the second. As docker stores the changes to the file system as a unique layer for each command this would mean we're storing an unnecessary tgz file in a redundant layer. In fact we weren't deleting the tgz file in the second RUN.

I've created a new gist I recommend we use here https://gist.github.com/KellenSunderland/14778ec80b7bf1a3433811670932242a. In the gist there's an optimized version and an unoptimized version. Here's the space usage reports for each version:

$ docker history optimized
IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT
4aeffe9e465d        6 minutes ago       /bin/sh -c #(nop)  ENV LD_LIBRARY_PATH=:/usr…   0B                  
2d48605c4d2b        6 minutes ago       /bin/sh -c wget --no-check-certificate -O /t…   306MB               
40a1e5139182        About an hour ago   /bin/sh -c wget -O - http://apt.llvm.org/llv…   742MB               
a49366f71d6e        About an hour ago   /bin/sh -c /install/ubuntu_install_perl.sh      71.8MB              
483f846d172f        About an hour ago   /bin/sh -c #(nop) COPY file:1d182312c7e82760…   1.01kB              
af21eeafe2d8        About an hour ago   /bin/sh -c /install/ubuntu_install_r.sh         146MB               
42a4f146b71e        About an hour ago   /bin/sh -c #(nop) COPY file:3ce9646ada9a9303…   1.14kB              
9ad14f368022        About an hour ago   /bin/sh -c /install/ubuntu_install_scala.sh     660MB               
7609e2e90cf9        About an hour ago   /bin/sh -c #(nop) COPY file:edf150d8fa20ff4f…   1.21kB              
8436fb76fb8b        About an hour ago   /bin/sh -c /install/ubuntu_install_python.sh    843MB               
b23977a31829        About an hour ago   /bin/sh -c #(nop) COPY file:239352104979d07c…   1.25kB              
efc666edaccf        About an hour ago   /bin/sh -c /install/ubuntu_install_core.sh      833MB               
48529b485203        About an hour ago   /bin/sh -c #(nop) COPY file:9f5c02599ecd6c52…   1.17kB              
0458a4468cbc        5 weeks ago         /bin/sh -c #(nop)  CMD ["/bin/bash"]            0B                  
<missing>           5 weeks ago         /bin/sh -c mkdir -p /run/systemd && echo 'do…   7B                  
<missing>           5 weeks ago         /bin/sh -c sed -i 's/^#\s*\(deb.*universe\)$…   2.76kB              
<missing>           5 weeks ago         /bin/sh -c rm -rf /var/lib/apt/lists/*          0B                  
<missing>           5 weeks ago         /bin/sh -c set -xe   && echo '#!/bin/sh' > /…   745B                
<missing>           5 weeks ago         /bin/sh -c #(nop) ADD file:a3344b835ea6fdc56…   112MB               
~/Development/incubator-mxnet/tests/ci_build (dockerfiles)
$ docker history unoptimized
IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT
84090a06ad50        32 seconds ago      /bin/sh -c #(nop)  ENV LD_LIBRARY_PATH=:/usr…   0B                  
b7848b89a535        32 seconds ago      /bin/sh -c tar -zxvf /tmp/mklml.tgz &&     c…   306MB               
80dd38efa252        About an hour ago   /bin/sh -c wget --no-check-certificate -O /t…   77.8MB              
40a1e5139182        About an hour ago   /bin/sh -c wget -O - http://apt.llvm.org/llv…   742MB               
a49366f71d6e        About an hour ago   /bin/sh -c /install/ubuntu_install_perl.sh      71.8MB              
483f846d172f        About an hour ago   /bin/sh -c #(nop) COPY file:1d182312c7e82760…   1.01kB              
af21eeafe2d8        About an hour ago   /bin/sh -c /install/ubuntu_install_r.sh         146MB               
42a4f146b71e        About an hour ago   /bin/sh -c #(nop) COPY file:3ce9646ada9a9303…   1.14kB              
9ad14f368022        About an hour ago   /bin/sh -c /install/ubuntu_install_scala.sh     660MB               
7609e2e90cf9        About an hour ago   /bin/sh -c #(nop) COPY file:edf150d8fa20ff4f…   1.21kB              
8436fb76fb8b        About an hour ago   /bin/sh -c /install/ubuntu_install_python.sh    843MB               
b23977a31829        About an hour ago   /bin/sh -c #(nop) COPY file:239352104979d07c…   1.25kB              
efc666edaccf        About an hour ago   /bin/sh -c /install/ubuntu_install_core.sh      833MB               
48529b485203        About an hour ago   /bin/sh -c #(nop) COPY file:9f5c02599ecd6c52…   1.17kB              
0458a4468cbc        5 weeks ago         /bin/sh -c #(nop)  CMD ["/bin/bash"]            0B                  
<missing>           5 weeks ago         /bin/sh -c mkdir -p /run/systemd && echo 'do…   7B                  
<missing>           5 weeks ago         /bin/sh -c sed -i 's/^#\s*\(deb.*universe\)$…   2.76kB              
<missing>           5 weeks ago         /bin/sh -c rm -rf /var/lib/apt/lists/*          0B                  
<missing>           5 weeks ago         /bin/sh -c set -xe   && echo '#!/bin/sh' > /…   745B                
<missing>           5 weeks ago         /bin/sh -c #(nop) ADD file:a3344b835ea6fdc56…   112MB               
~/Development/incubator-mxnet/tests/ci_build (dockerfiles)

Pretty small savings, but in the optimized version there's one less layer and there's 77.8MB saved. Again this is getting pretty nitpicky. Both versions are totally functional.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, thank you very much:)

@CodingCat
Copy link
Contributor

Hi, the community has passed to vote about associating the code changes with JIRA (https://lists.apache.org/thread.html/ab22cf0e35f1bce2c3bf3bec2bc5b85a9583a3fe7fd56ba1bbade55f@%3Cdev.mxnet.apache.org%3E)

We have updated the guidelines for contributors in https://cwiki.apache.org/confluence/display/MXNET/Development+Process, please ensure that you have created a JIRA at https://issues.apache.org/jira/projects/MXNET/issues/ to describe your work in this pull request and include the JIRA title in your PR as [MXNET-xxxx] your title where MXNET-xxxx is the JIRA id

Thanks!

@xinyu-intel
Copy link
Contributor Author

@cjolivier01

@pengzhao-intel
Copy link
Contributor

@marcoabreu @KellenSunderland @cjolivier01 please help confirm the final changes.
And this issue will lead the build failure in OSX system so I think we need to merge it.

@xinyu-intel xinyu-intel changed the title Update mkldnn to the newest & Add clang build test with mkldnn. [MXNET-74]Update mkldnn to the newest & Add clang build test with mkldnn. Mar 12, 2018
@xinyu-intel
Copy link
Contributor Author

@marcoabreu @KellenSunderland @cjolivier01 please help confirm the latest changes, Thanks!

Copy link
Contributor

@KellenSunderland KellenSunderland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor tweak to Jenkinsfile requested. LGTM otherwise.

Jenkinsfile Outdated
@@ -175,6 +175,24 @@ try {
}
}
},
'CPU: Clang 3.9 MKLDNN': {
node('mxnetlinux-cpu') {
ws('workspace/build-cpu-clang39') {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're trying to give each build task its own unique workspace. I'd recommend we change this to:

ws('workspace/build-cpu-mkldnn-clang39')

Jenkinsfile Outdated
},
'CPU: Clang 5 MKLDNN': {
node('mxnetlinux-cpu') {
ws('workspace/build-cpu-clang50') {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and

ws('workspace/build-cpu-mkldnn-clang50')

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, I am going to fix it:)

@xinyu-intel
Copy link
Contributor Author

@marcoabreu @cjolivier01 please help confirm the latest changes, Thanks!

Copy link
Contributor

@marcoabreu marcoabreu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small copy & paste mistakes, otherwise lgtm

Jenkinsfile Outdated
ws('workspace/build-cpu-mkldnn-clang39') {
init_git()
sh "ci/build.py --build --platform ubuntu_cpu /work/runtime_functions.sh build_ubuntu_cpu_clang39_mkldnn"
pack_lib('mkldnn_cpu', mx_mkldnn_lib)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate name

Jenkinsfile Outdated
ws('workspace/build-cpu-mkldnn-clang50') {
init_git()
sh "ci/build.py --build --platform ubuntu_cpu /work/runtime_functions.sh build_ubuntu_cpu_clang50_mkldnn"
pack_lib('mkldnn_cpu', mx_mkldnn_lib)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate name

case mkldnn_gOihw16o:
case mkldnn_gOhwi8o:
case mkldnn_gOhwi16o:
case mkldnn_gOhIw16o4i:
return mkldnn_goihw;
default:
LOG(FATAL) << "Unknown MKLDNN format for 4 dimensions: " << desc.data.format;
LOG(FATAL) << "Unknown MKLDNN format for 5 dimensions: " << desc.data.format;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you create the jira issue?

@marcoabreu marcoabreu merged commit f202f10 into apache:master Mar 14, 2018
@pengzhao-intel pengzhao-intel mentioned this pull request Mar 15, 2018
5 tasks
@xinyu-intel xinyu-intel deleted the master branch March 25, 2018 02:19
jinhuang415 pushed a commit to jinhuang415/incubator-mxnet that referenced this pull request Mar 30, 2018
…dnn. (apache#9918)

* add clang test with mkldnn

* update mkldnn

* Update mkldnn to the newest & Add clang build test with mkldnn. apache#9918

* update all dockerfiles in order to use the correct mkl version

* fix bugs in mkldnn_base

* update mklml link and modify dockerfiles

* delete blank line

* no changes, just retrigger.

* no changes, just retriggering

* put mklml in a single run file

* debug mklml run file

* debug mklml script

* add mkldnn clang test

* give each build task its own unique workspace

* change lib name
rahul003 pushed a commit to rahul003/mxnet that referenced this pull request Jun 4, 2018
…dnn. (apache#9918)

* add clang test with mkldnn

* update mkldnn

* Update mkldnn to the newest & Add clang build test with mkldnn. apache#9918

* update all dockerfiles in order to use the correct mkl version

* fix bugs in mkldnn_base

* update mklml link and modify dockerfiles

* delete blank line

* no changes, just retrigger.

* no changes, just retriggering

* put mklml in a single run file

* debug mklml run file

* debug mklml script

* add mkldnn clang test

* give each build task its own unique workspace

* change lib name
zheng-da pushed a commit to zheng-da/incubator-mxnet that referenced this pull request Jun 28, 2018
…dnn. (apache#9918)

* add clang test with mkldnn

* update mkldnn

* Update mkldnn to the newest & Add clang build test with mkldnn. apache#9918

* update all dockerfiles in order to use the correct mkl version

* fix bugs in mkldnn_base

* update mklml link and modify dockerfiles

* delete blank line

* no changes, just retrigger.

* no changes, just retriggering

* put mklml in a single run file

* debug mklml run file

* debug mklml script

* add mkldnn clang test

* give each build task its own unique workspace

* change lib name
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants