Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Fix the incorrect MKLDNN/MKL logic in cmake #14877

Merged
merged 20 commits into from
May 16, 2019

Conversation

yinghu5
Copy link
Contributor

@yinghu5 yinghu5 commented May 5, 2019

This is a re-creation of #14829 . I've accidentally deleted my fork of incubator-mxnet.

Description

We actually want to unlock the MKLDNN (MKLML) and MKL dependency, which means use MKLDNN and USE_BLAS=mkl completely independent.

  1. when MKLDNN is on, it will download MKLML automatically, don't use full MKL package.
  2. when USE_BLAS=mkl, only use the full MKL (MKL_ROOT), don't detect MKLML (from MKLDNN)

@yinghu5 yinghu5 requested a review from szha as a code owner May 5, 2019 07:02
update MKL_ROOT
@anirudhacharya
Copy link
Member

@mxnet-label-bot add [pr-awaiting-review]

@marcoabreu marcoabreu added the pr-awaiting-review PR is waiting for code review label May 6, 2019
ws('workspace/build-cpu-nomkldnn-mkl') {
timeout(time: max_time, unit: 'MINUTES') {
utils.init_git_win()
powershell 'py -3 ci/build_windows.py -f WIN_CPU_NOMKLDNN_MKL'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The build flavor(-f) should be WIN_CPU_MKL instead of WIN_CPU_NOMKLDNN_MKL since it has been modified from the latter in ci/build_windows.py (line 93).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you for fix. change it now.

@@ -218,6 +246,8 @@ def main():
os.environ["OpenCV_DIR"] = "C:\\Program Files\\OpenCV-v3.4.1\\build"
if 'CUDA_PATH' not in os.environ:
os.environ["CUDA_PATH"] = "C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.2"
if 'MKL_ROOT' not in os.environ:
os.environ["MKL_ROOT"] = "C:\Program Files (x86)\IntelSWTools\compilers_and_libraries\windows\mkl"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about the effects of the escape characters here. But for the path string, I think using a raw string prefix flag (r'the/path/string'), in this case, is better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we test both format should be ok. anyway change it now. thanks

fix nomkldnn issue.
change MKL_ROOT
@yinghu5
Copy link
Contributor Author

yinghu5 commented May 8, 2019

@Chancebair @marcoabreu Could you help to install a full MKL library for window CI? we supposed the full MKL will be installed to C:\Program Files (x86)\IntelSWTools\compilers_and_libraries\windows\mkl on windows. You can get the executable here: http://registrationcenter-download.intel.com/akdlm/irc_nas/tec/15247/w_mkl_2019.3.203.exe thanks

@Chancebair
Copy link
Contributor

@yinghu5 I created a ticket with my teams oncall to look into this

@TaoLv
Copy link
Member

TaoLv commented May 9, 2019

Hi @Chancebair, do you have any update for the ticket? I don't want to rush you but I hope this PR can catch the code freeze of 1.5.0 release as I see there are quite a few issues about Windows build. It will also help improving the coverage of existing CI system. Thank you.

@Chancebair
Copy link
Contributor

MKL will be installed on windows sometime today thanks to @lebeg

@TaoLv
Copy link
Member

TaoLv commented May 10, 2019

Great! Thank you @Chancebair and @lebeg . @yinghu5 please try if the windows CI works now~

@yinghu5
Copy link
Contributor Author

yinghu5 commented May 10, 2019

@Chancebair and @lebeg , thank you a lot! is the MKL installed to C:\Program Files (x86)\IntelSWTools\compilers_and_libraries\windows\mkl ?
In CI we define the environment MKL_ROOT
=C:\Program Files (x86)\IntelSWTools\compilers_and_libraries\windows\mkl

the current CI test shows,
Could NOT find MKL (missing: MKL_INCLUDE_DIR MKL_RT_LIBRARY). seems MKL still not finded.
Please help check, thanks.

@yinghu5
Copy link
Contributor Author

yinghu5 commented May 13, 2019

I test again in local machine today. it works when >set MKL_ROOT=C:\Program Files (x86)\IntelSWTools\compilers_and_libraries\windows\mkl
mxnet_dnn_512
thanks

 powershell 'cd $Env:MKL_ROOT'
@lebeg
Copy link
Contributor

lebeg commented May 13, 2019

I've updated the scripts for AMI creation, but currently we have a blocker on @Chancebair side. Once he deploys the new AMI's MKL should be available at the exact path you've mentioned.

@perdasilva
Copy link
Contributor

I've updated the windows instances to the new AMI. It may take some time to cycle them with the autoscaler, but please give it a go.

@yinghu5
Copy link
Contributor Author

yinghu5 commented May 15, 2019

@perdasilva @marcoabreu May I have your comments on the CI status? Seems it doesn't work now.

@perdasilva
Copy link
Contributor

@yinghu5 I've checked the windows instances running on prod, and they are running with the new AMIs.
There's an unrelated issue with TensorRT, however. I've taken a look at your latest run, and it seems there is a problem in the PowerShell. Could you please address that and try again?

@perdasilva
Copy link
Contributor

I also see that there's been a successful run since the update. Could it be that the TensorRT issue was throwing you off? The current error is on the powershell side.

@lebeg
Copy link
Contributor

lebeg commented May 15, 2019

This line probably doesn't like backslashes.

ws('workspace/build-cpu-mkl') {
timeout(time: max_time, unit: 'MINUTES') {
utils.init_git_win()
powershell '$Env:MKL_ROOT="C:\Program Files (x86)\IntelSWTools\compilers_and_libraries\windows\mkl"'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you move this code into build_windows.py? We prefer to have these scripts self-contained and the Jenkinsfile just being an orchestration tool.

@yinghu5
Copy link
Contributor Author

yinghu5 commented May 16, 2019

@lebeg @marcoabreu @TaoLv @pengzhao-intel thank you very much for the support! after remove the redundancy test MKL_ROOT code as you suggested, the CI was triggered today. and all windows CPU parts, windows-cpu, which the PR want to fix passed successfully. Thank you!

@pengzhao-intel
Copy link
Contributor

It's great @yinghu5, please rebase the code again. The GPU failure is fixed now.

@yinghu5
Copy link
Contributor Author

yinghu5 commented May 16, 2019

All CI passed, except the unix-gpu tensorRT test error. thanks

@pengzhao-intel
Copy link
Contributor

@lebeg @marcoabreu @perdasilva the Windows CI is built, thanks all of your help.
Please help to take a final review :) I will merge the PR if no further comment in order to keep the timeline of r1.5.

Copy link
Contributor

@perdasilva perdasilva left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - maybe @lebeg could you have a look at the cmake changes?

@pengzhao-intel
Copy link
Contributor

It's a nice first PR for MXNet, thanks @yinghu5.
Merging now.

@pengzhao-intel pengzhao-intel merged commit d87bd2a into apache:master May 16, 2019
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019
* fix cmake for win and mkl

* Update build_windows.py

update MKL_ROOT

* Update Jenkins_steps.groovy

fix nomkldnn issue.

* Update build_windows.py

change MKL_ROOT

* msshadow_blas

* update mshasow

* Update DownloadMKLML.cmake

windows md5

* Update DownloadMKLML.cmake

windows md5

* Update Jenkins_steps.groovy

export MKLROOT

* Update Jenkins_steps.groovy

 powershell 'cd $Env:MKL_ROOT'

* Update Jenkins_steps.groovy

trigger MKL_ROOT test

* Update Jenkins_steps.groovy

test MKL_ROOT line 552

* Update Jenkins_steps.groovy

update 538 $Env:MKL_ROOT

* update mshadow

* revert gitmodules

* remove comments

* remove test code

* remove test powershell
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
pr-awaiting-review PR is waiting for code review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants