This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Using MKL causes C++ layer blow up while running lstm_bucketing example #5314

Closed
sergeykolychev opened this issue Mar 9, 2017 · 19 comments

@sergeykolychev
Contributor


WARNING: discarded 89 sentences longer than the largest bucket.
WARNING: discarded 4 sentences longer than the largest bucket.
[01:06:12] /home/ubuntu/mxnet/dmlc-core/include/dmlc/./logging.h:300: [01:06:12] src/operator/./mkl/mkl_concat-inl.h:196: Check failed: e == E_SUCCESS (-1 vs. 0)

Stack trace returned 8 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet-0.9.4-py2.7.egg/mxnet/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7fa00e11cc1c]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet-0.9.4-py2.7.egg/mxnet/libmxnet.so(_ZN5mxnet2op11MKLConcatOpIN7mshadow3cpuEfE7ForwardERKNS_9OpContextERKSt6vectorINS_5TBlobESaIS9_EERKS8_INS_9OpReqTypeESaISE_EESD_SD_+0xc10) [0x7fa00ecfd950]
[bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet-0.9.4-py2.7.egg/mxnet/libmxnet.so(+0xec2092) [0x7fa00ed9a092]
[bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet-0.9.4-py2.7.egg/mxnet/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x8c) [0x7fa00ed5531c]
[bt] (4) /usr/local/lib/python2.7/dist-packages/mxnet-0.9.4-py2.7.egg/mxnet/libmxnet.so(_ZNSt17_Function_handlerIFvvEZZN5mxnet6engine23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE_clEvEUlvE_E9_M_invokeERKSt9_Any_data+0x2e) [0x7fa00ed57bbe]
[bt] (5) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7fa006208c80]
[bt] (6) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7fa01cadf6ba]
[bt] (7) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fa01c81582d]

[01:06:12] /home/ubuntu/mxnet/dmlc-core/include/dmlc/./logging.h:300: [01:06:12] src/engine/./threaded_engine.h:336: [01:06:12] src/operator/./mkl/mkl_concat-inl.h:196: Check failed: e == E_SUCCESS (-1 vs. 0)

Stack trace returned 8 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet-0.9.4-py2.7.egg/mxnet/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7fa00e11cc1c]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet-0.9.4-py2.7.egg/mxnet/libmxnet.so(_ZN5mxnet2op11MKLConcatOpIN7mshadow3cpuEfE7ForwardERKNS_9OpContextERKSt6vectorINS_5TBlobESaIS9_EERKS8_INS_9OpReqTypeESaISE_EESD_SD_+0xc10) [0x7fa00ecfd950]
[bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet-0.9.4-py2.7.egg/mxnet/libmxnet.so(+0xec2092) [0x7fa00ed9a092]
[bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet-0.9.4-py2.7.egg/mxnet/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x8c) [0x7fa00ed5531c]
[bt] (4) /usr/local/lib/python2.7/dist-packages/mxnet-0.9.4-py2.7.egg/mxnet/libmxnet.so(_ZNSt17_Function_handlerIFvvEZZN5mxnet6engine23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE_clEvEUlvE_E9_M_invokeERKSt9_Any_data+0x2e) [0x7fa00ed57bbe]
[bt] (5) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7fa006208c80]
[bt] (6) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7fa01cadf6ba]
[bt] (7) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fa01c81582d]

Environment info

Operating System:
Linux Ubuntu 16.04
Compiler:
gcc 4.8
Package used (Python/R/Scala/Julia):
Python
MXNet version:
0.9.4
Or if installed from source:

MXNet commit hash (git rev-parse HEAD):
55bb4cd
If you are using python package, please provide

Python version and distribution:
python2.7

Error Message:

See the full error message and stack trace above.

Minimum reproducible example


~/mxnet/example/rnn$ python lstm_bucketing.py

Steps to reproduce


  1. Compile with USE_BLAS=mkl
  2. Run python lstm_bucketing.py from mxnet/example/rnn
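The two steps above, as a shell sketch. The checkout path ~/mxnet matches the paths in the trace; the USE_MKL2017=1 flag is an assumption, so check make/config.mk for your revision:

```shell
# Sketch of the reproduction; flags beyond USE_BLAS=mkl are assumptions.
cd ~/mxnet
make -j"$(nproc)" USE_BLAS=mkl USE_MKL2017=1
cd example/rnn
python lstm_bucketing.py
```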


@piiswrong
Contributor

@glingyan @zhenlinluo

@glingyan
Contributor

glingyan commented Mar 9, 2017

Found the problem:
the concat here has 20 inputs, but the current MKL API limit is only 8.
Will report this to the MKL team.
The workaround is:
$ git diff src/
diff --git a/src/operator/concat.cc b/src/operator/concat.cc
index fc54123..d13106e 100644
--- a/src/operator/concat.cc
+++ b/src/operator/concat.cc
@@ -18,7 +18,8 @@ template<>
 Operator* CreateOp(ConcatParam param, int dtype) {
   Operator *op = NULL;
 #if MXNET_USE_MKL2017 == 1
-  if (1 == param.dim) {
+  if (1 == param.dim &&
+      param.num_args < (dnnResourceMultipleDst - dnnResourceMultipleSrc)) {
     switch (dtype) {
     case mshadow::kFloat32:
       return new MKLConcatOp<cpu, float>(param);
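The limit in the guard comes from MKL's resource-slot arithmetic. A plain-Python sketch of the same check; the two enum values are copied from MKL's mkl_dnn_types.h to the best of my knowledge, so treat them as assumptions and verify against your header:

```python
# Mirror of the patched guard in src/operator/concat.cc.
# Enum values assumed from MKL's mkl_dnn_types.h:
# dnnResourceMultipleSrc = 16, dnnResourceMultipleDst = 24.
DNN_RESOURCE_MULTIPLE_SRC = 16
DNN_RESOURCE_MULTIPLE_DST = 24

def can_use_mkl_concat(dim, num_args):
    """True when the MKL concat path can handle the op;
    otherwise mxnet should fall back to the generic CPU concat."""
    max_inputs = DNN_RESOURCE_MULTIPLE_DST - DNN_RESOURCE_MULTIPLE_SRC  # 8 slots
    return dim == 1 and num_args < max_inputs

print(can_use_mkl_concat(1, 20))  # the lstm_bucketing case: 20 inputs -> False
print(can_use_mkl_concat(1, 4))   # small concat stays on the MKL path -> True
```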

@sergeykolychev
Contributor Author

@glingyan Thank you! I am training a char-level RNN on an Intel Xeon E5-2666 v3 (Haswell),
and mxnet compiled without MKL processes 3.46 samples/sec, while with MKL the speed is about 220 samples/sec! I feel like this work-around needs to be in master until MKL supports more concat inputs.
@piiswrong In my tests MKL gives better or equal results on Linux compared to the 'apple' BLAS on OSX (which is also extremely fast).
OpenBLAS and plain BLAS are slower by at least an order of magnitude.
Maybe it would make sense to recommend that users compile MXNet with MKL instead of OpenBLAS on this page: http://mxnet.io/get_started/ubuntu_setup.html ?

@glingyan
Contributor

glingyan commented Mar 9, 2017

@sergeykolychev Sure, there will be a big upstream patch in the next few days.

@piiswrong
Contributor

piiswrong commented Mar 9, 2017

Does MKL work for mac? If it does, then we should change the tutorial.

@glingyan Do we need the user to have a full MKL installation to use BLAS=mkl?

@sergeykolychev
Contributor Author

@piiswrong I tried to use MKL on mac and it did not compile; it has a different API compared to the mklml we use on Linux. I was also unable to find mklml builds for mac, though I saw reports on Google that some individuals were able to compile mklml from source on OSX. I have not pursued that route yet.
However, it seems that the stock BLAS from Apple is on par with MKL and can continue to be used on OSX.
The tutorial page I referred to is specifically for Ubuntu, so it can be changed independently of the mac-related tutorials.

@piiswrong
Contributor

I think BLAS defaults to apple in osx.mk. Or at least it used to.

@sergeykolychev
Contributor Author

@piiswrong Yes, it does default to 'apple' on OSX, which is correct behavior. However, we are probably doing Linux users a disservice by not defaulting to mklml. Even if widespread usage of MKL leads to some issues, that's a good thing, because they'll get fixed quickly given how responsive @glingyan is.

@sergeykolychev
Contributor Author

@piiswrong, @glingyan I want to apologize and correct myself: the 3.46 samples/sec I was getting with OpenBLAS was due to problems on my end, not OpenBLAS. MKL is still faster than OpenBLAS, but the difference is not drastic;
it's more like 120 vs 200.
What's more, I see my char-RNN network converging quickly and reliably on OpenBLAS, while with MKL it reaches a high plateau and does not converge. Performance with MKL also reliably drops by about 75% in the middle of the second epoch; I can replicate this consistently. It seems like there's a bug in the MKL implementation. I'll try to write a Python example over the weekend to prove it (my current code is in Perl, so it's not really to be trusted at this point).

@glingyan
Contributor

@sergeykolychev There will be a fix for convergence on some models tonight or tomorrow; please wait for my patch, upstream testing is ongoing.
If the patch fails, I will help you debug.

@sergeykolychev
Contributor Author

@glingyan thank you, will wait.

@glingyan
Contributor

@zhenlinluo for mkl on MAC issue

@sergeykolychev
Contributor Author

sergeykolychev commented Mar 10, 2017

@glingyan, the issues are not fixed; here is what I see. My code is a really basic char-LSTM RNN network and the data is tiny Shakespeare. It's written in Perl, but frankly I do not think that matters.
This is the output of my code with USE_BLAS=mkl, compiled from your master:

$ ./char_lstm.pl 
Epoch[0] Batch [50]	Speed: 218.15 samples/sec	Train-Perplexity=22.038119 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [100]	Speed: 217.41 samples/sec	Train-Perplexity=14.247312 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [150]	Speed: 216.87 samples/sec	Train-Perplexity=13.642289 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [200]	Speed: 217.11 samples/sec	Train-Perplexity=13.410031 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [250]	Speed: 217.15 samples/sec	Train-Perplexity=12.963284 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [300]	Speed: 217.47 samples/sec	Train-Perplexity=12.734377 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [350]	Speed: 217.86 samples/sec	Train-Perplexity=12.310390 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [400]	Speed: 219.83 samples/sec	Train-Perplexity=12.098077 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [450]	Speed: 219.63 samples/sec	Train-Perplexity=12.117380 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [500]	Speed: 211.98 samples/sec	Train-Perplexity=11.890713 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [550]	Speed: 198.78 samples/sec	Train-Perplexity=11.584888 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [600]	Speed: 189.23 samples/sec	Train-Perplexity=11.388555 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [650]	Speed: 187.45 samples/sec	Train-Perplexity=11.326587 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [700]	Speed: 189.07 samples/sec	Train-Perplexity=11.295736 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [750]	Speed: 200.36 samples/sec	Train-Perplexity=11.263378 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [800]	Speed: 215.10 samples/sec	Train-Perplexity=11.140880 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [850]	Speed: 220.77 samples/sec	Train-Perplexity=11.090139 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [900]	Speed: 220.96 samples/sec	Train-Perplexity=11.052934 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [950]	Speed: 220.78 samples/sec	Train-Perplexity=10.915363 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [1000]	Speed: 221.17 samples/sec	Train-Perplexity=10.952525 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [1050]	Speed: 221.20 samples/sec	Train-Perplexity=10.986085 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Train-Perplexity=10.845770 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Time cost=164.381 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [50]	Speed: 221.72 samples/sec	Train-Perplexity=11.260175 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [100]	Speed: 221.25 samples/sec	Train-Perplexity=10.971702 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [150]	Speed: 215.08 samples/sec	Train-Perplexity=10.753926 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [200]	Speed: 194.21 samples/sec	Train-Perplexity=10.765214 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [250]	Speed: 164.44 samples/sec	Train-Perplexity=10.594932 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [300]	Speed: 125.37 samples/sec	Train-Perplexity=10.773817 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [350]	Speed: 75.13 samples/sec	Train-Perplexity=10.838403 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [400]	Speed: 33.06 samples/sec	Train-Perplexity=10.564098 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [450]	Speed: 14.29 samples/sec	Train-Perplexity=10.620187 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [500]	Speed: 8.02 samples/sec	Train-Perplexity=10.499926 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [550]	Speed: 6.13 samples/sec	Train-Perplexity=10.723671 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.

And this is the same code with USE_BLAS=openblas

./char_lstm.pl 
Epoch[0] Batch [50]	Speed: 116.49 samples/sec	Train-Perplexity=25.415275 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [100]	Speed: 115.85 samples/sec	Train-Perplexity=11.252742 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [150]	Speed: 115.76 samples/sec	Train-Perplexity=9.379204 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [200]	Speed: 115.67 samples/sec	Train-Perplexity=8.617612 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [250]	Speed: 115.80 samples/sec	Train-Perplexity=7.716166 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [300]	Speed: 115.91 samples/sec	Train-Perplexity=7.254024 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [350]	Speed: 115.95 samples/sec	Train-Perplexity=6.983290 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [400]	Speed: 115.73 samples/sec	Train-Perplexity=6.731437 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [450]	Speed: 115.62 samples/sec	Train-Perplexity=6.464352 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [500]	Speed: 115.75 samples/sec	Train-Perplexity=6.426460 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [550]	Speed: 116.06 samples/sec	Train-Perplexity=6.127249 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [600]	Speed: 116.28 samples/sec	Train-Perplexity=6.209482 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [650]	Speed: 116.49 samples/sec	Train-Perplexity=5.889429 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [700]	Speed: 116.33 samples/sec	Train-Perplexity=6.064267 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [750]	Speed: 116.34 samples/sec	Train-Perplexity=5.679001 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [800]	Speed: 116.55 samples/sec	Train-Perplexity=5.825945 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [850]	Speed: 116.67 samples/sec	Train-Perplexity=5.764707 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [900]	Speed: 116.09 samples/sec	Train-Perplexity=5.535110 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [950]	Speed: 115.97 samples/sec	Train-Perplexity=5.466780 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [1000]	Speed: 115.94 samples/sec	Train-Perplexity=5.636686 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Batch [1050]	Speed: 116.10 samples/sec	Train-Perplexity=5.457590 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Train-Perplexity=5.296764 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[0] Time cost=300.178 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [50]	Speed: 115.82 samples/sec	Train-Perplexity=5.349508 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [100]	Speed: 115.90 samples/sec	Train-Perplexity=5.343474 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [150]	Speed: 116.04 samples/sec	Train-Perplexity=5.330013 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [200]	Speed: 115.77 samples/sec	Train-Perplexity=5.214058 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [250]	Speed: 115.81 samples/sec	Train-Perplexity=5.051733 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [300]	Speed: 116.05 samples/sec	Train-Perplexity=5.270308 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [350]	Speed: 115.78 samples/sec	Train-Perplexity=5.246875 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [400]	Speed: 115.42 samples/sec	Train-Perplexity=5.383535 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [450]	Speed: 115.43 samples/sec	Train-Perplexity=5.192108 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [500]	Speed: 115.56 samples/sec	Train-Perplexity=5.231138 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [550]	Speed: 115.44 samples/sec	Train-Perplexity=5.204995 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [600]	Speed: 115.41 samples/sec	Train-Perplexity=5.093277 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [650]	Speed: 115.50 samples/sec	Train-Perplexity=5.244379 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [700]	Speed: 115.42 samples/sec	Train-Perplexity=5.104197 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [750]	Speed: 115.28 samples/sec	Train-Perplexity=5.007315 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [800]	Speed: 114.98 samples/sec	Train-Perplexity=4.979710 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [850]	Speed: 115.50 samples/sec	Train-Perplexity=4.895686 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [900]	Speed: 115.41 samples/sec	Train-Perplexity=5.103383 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [950]	Speed: 115.41 samples/sec	Train-Perplexity=5.112778 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [1000]	Speed: 115.39 samples/sec	Train-Perplexity=4.990302 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.
Epoch[1] Batch [1050]	Speed: 115.69 samples/sec	Train-Perplexity=4.994740 at /home/ubuntu/perl5/lib/perl5/AI/MXNet/Logging.pm line 5.

As you can see, when MKL is used the run starts twice as fast as OpenBLAS, but then in the middle of the second epoch it slows to a crawl, and the perplexity metric gets stuck at ~11.
(BTW, I had a bug in the Perl layer that amounted to not linking states between LSTM sequence layers, and my network got stuck at the same ~11; I wonder if you have a similar bug in the MKL implementation.)
Looking at OpenBLAS (running the exact same code; the one difference is that mshadow uses OpenBLAS instead of MKL), performance starts at half the speed but never degrades, and the network converges reliably and quickly, with the perplexity metric going down to ~5 as it should (the Julia examples reach the same number).
And when I run the same code on OSX using USE_BLAS=apple, I get the exact same results as with OpenBLAS on Linux.
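For reference on the numbers above: MXNet's Train-Perplexity is the exponential of the mean per-token cross-entropy (natural log), so a model spreading probability uniformly over ~5 candidates reports perplexity ~5. A minimal sketch of the metric, not MXNet's implementation:

```python
import math

def perplexity(neg_log_likelihoods):
    """exp of the mean per-token negative log-likelihood (natural log),
    the same quantity the Train-Perplexity lines above report."""
    return math.exp(sum(neg_log_likelihoods) / len(neg_log_likelihoods))

# A model assigning probability 1/5 to every token has perplexity 5 --
# roughly where the OpenBLAS run converges, while the MKL run stalls near 11.
nlls = [math.log(5.0)] * 100
print(round(perplexity(nlls), 6))  # -> 5.0
```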

@glingyan
Contributor

@sergeykolychev Why was this closed? Is it no longer a problem for you?

@sergeykolychev
Contributor Author

sergeykolychev commented Mar 13, 2017

@glingyan Sorry, I thought you did not need it anymore; of course the issue still exists if you have not added new code since that preview. Though I have moved my calculations to a GPU box, so I'm not really concerned with MKL right now.

@glingyan
Contributor

@sergeykolychev I will help debug, but where should I set up the environment? Is example/rnn enough?

@sergeykolychev
Contributor Author

@glingyan Thanks!
Here is a minimal example that you can use to debug the problem.
Make this change to enable the Adam optimizer for this example:

diff --git a/example/rnn/lstm_bucketing.py b/example/rnn/lstm_bucketing.py
index 4bc934a..7ab3c95 100644
--- a/example/rnn/lstm_bucketing.py
+++ b/example/rnn/lstm_bucketing.py
@@ -100,7 +100,6 @@ if __name__ == '__main__':
         kvstore             = args.kv_store,
         optimizer           = args.optimizer,
         optimizer_params    = { 'learning_rate': args.lr,
-                                'momentum': args.mom,
                                 'wd': args.wd },
         initializer         = mx.init.Xavier(factor_type="in", magnitude=2.34),
         num_epoch           = args.num_epochs,
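For context on why the diff drops the momentum key: Adam takes no momentum parameter; it keeps its own first- and second-moment state governed by beta1/beta2, so passing SGD's momentum raises an unexpected-keyword error. A minimal plain-Python sketch of the standard Adam update, not MXNet's implementation:

```python
def adam_step(w, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar weight; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad           # first moment (mean of grads)
    v = beta2 * v + (1 - beta2) * grad * grad    # second moment (uncentered variance)
    m_hat = m / (1 - beta1 ** t)                 # bias correction
    v_hat = v / (1 - beta2 ** t)
    return w - lr * m_hat / (v_hat ** 0.5 + eps), m, v

# Minimize f(w) = w^2 (gradient 2w): the weight moves toward 0.
w, m, v = 1.0, 0.0, 0.0
for t in range(1, 101):
    w, m, v = adam_step(w, 2.0 * w, m, v, t)
print(w < 0.5)  # -> True
```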

From the output below you can see that in the middle of the second epoch the performance starts to degrade and the network does not converge;
with OpenBLAS the perplexity would be under 200 by the middle of the second epoch.

:~/mxnet/example/rnn$ python lstm_bucketing.py --optimizer adam
WARNING: discarded 89 sentences longer than the largest bucket.
WARNING: discarded 4 sentences longer than the largest bucket.
2017-03-13 07:42:59,877 Epoch[0] Batch [50]	Speed: 134.00 samples/sec	Train-Perplexity=2966.442802
2017-03-13 07:43:12,753 Epoch[0] Batch [100]	Speed: 124.27 samples/sec	Train-Perplexity=1133.826717
2017-03-13 07:43:27,474 Epoch[0] Batch [150]	Speed: 108.69 samples/sec	Train-Perplexity=1011.551690
2017-03-13 07:43:40,795 Epoch[0] Batch [200]	Speed: 120.12 samples/sec	Train-Perplexity=986.641991
2017-03-13 07:43:54,045 Epoch[0] Batch [250]	Speed: 120.76 samples/sec	Train-Perplexity=1043.929538
2017-03-13 07:44:07,208 Epoch[0] Batch [300]	Speed: 121.55 samples/sec	Train-Perplexity=1026.777324
2017-03-13 07:44:20,063 Epoch[0] Batch [350]	Speed: 124.47 samples/sec	Train-Perplexity=1003.001102
2017-03-13 07:44:32,424 Epoch[0] Batch [400]	Speed: 129.45 samples/sec	Train-Perplexity=1037.213538
2017-03-13 07:44:44,309 Epoch[0] Batch [450]	Speed: 134.62 samples/sec	Train-Perplexity=919.465923
2017-03-13 07:44:57,358 Epoch[0] Batch [500]	Speed: 122.63 samples/sec	Train-Perplexity=803.629447
2017-03-13 07:45:11,036 Epoch[0] Batch [550]	Speed: 116.98 samples/sec	Train-Perplexity=731.199299
2017-03-13 07:45:24,050 Epoch[0] Batch [600]	Speed: 122.94 samples/sec	Train-Perplexity=767.911830
2017-03-13 07:45:37,369 Epoch[0] Batch [650]	Speed: 120.13 samples/sec	Train-Perplexity=777.475126
2017-03-13 07:45:50,049 Epoch[0] Batch [700]	Speed: 126.19 samples/sec	Train-Perplexity=735.073373
2017-03-13 07:46:02,074 Epoch[0] Batch [750]	Speed: 133.07 samples/sec	Train-Perplexity=683.973815
2017-03-13 07:46:14,973 Epoch[0] Batch [800]	Speed: 124.04 samples/sec	Train-Perplexity=648.091700
2017-03-13 07:46:28,155 Epoch[0] Batch [850]	Speed: 121.39 samples/sec	Train-Perplexity=610.641153
2017-03-13 07:46:41,153 Epoch[0] Batch [900]	Speed: 123.10 samples/sec	Train-Perplexity=615.271286
2017-03-13 07:46:54,415 Epoch[0] Batch [950]	Speed: 120.64 samples/sec	Train-Perplexity=580.477461
2017-03-13 07:47:07,536 Epoch[0] Batch [1000]	Speed: 121.95 samples/sec	Train-Perplexity=595.476506
2017-03-13 07:47:19,853 Epoch[0] Batch [1050]	Speed: 129.90 samples/sec	Train-Perplexity=591.306123
2017-03-13 07:47:32,885 Epoch[0] Batch [1100]	Speed: 122.78 samples/sec	Train-Perplexity=604.687834
2017-03-13 07:47:46,234 Epoch[0] Batch [1150]	Speed: 119.86 samples/sec	Train-Perplexity=606.537159
2017-03-13 07:48:00,723 Epoch[0] Batch [1200]	Speed: 110.43 samples/sec	Train-Perplexity=596.381659
2017-03-13 07:48:18,747 Epoch[0] Batch [1250]	Speed: 88.77 samples/sec	Train-Perplexity=579.731480
2017-03-13 07:48:37,840 Epoch[0] Batch [1300]	Speed: 83.80 samples/sec	Train-Perplexity=568.241532
2017-03-13 07:48:40,894 Epoch[0] Train-Perplexity=561.096900
2017-03-13 07:48:40,895 Epoch[0] Time cost=353.188
2017-03-13 07:48:54,760 Epoch[0] Validation-Perplexity=516.515529
2017-03-13 07:49:16,767 Epoch[1] Batch [50]	Speed: 73.92 samples/sec	Train-Perplexity=496.200794
2017-03-13 07:49:38,937 Epoch[1] Batch [100]	Speed: 72.17 samples/sec	Train-Perplexity=486.276388
2017-03-13 07:50:06,259 Epoch[1] Batch [150]	Speed: 58.56 samples/sec	Train-Perplexity=474.950573
2017-03-13 07:50:48,366 Epoch[1] Batch [200]	Speed: 38.00 samples/sec	Train-Perplexity=478.531360
2017-03-13 07:51:57,539 Epoch[1] Batch [250]	Speed: 23.13 samples/sec	Train-Perplexity=484.835826
2017-03-13 07:53:58,770 Epoch[1] Batch [300]	Speed: 13.20 samples/sec	Train-Perplexity=522.462073

@yajiedesign
Contributor

This issue is closed due to lack of activity in the last 90 days. Feel free to reopen if this is still an active issue. Thanks!
