MXNet 1.5.0 is slower than 1.3.0 when inputs are variable #13928
Comments
@wkcn Thanks for raising this issue. The performance degradation is indeed concerning. @mxnet-label-bot Add [Gluon, Performance] @szha Any thoughts here?
@wkcn What are these numbers specifically? Training speed for Faster-RCNN? If so, what is the network?
What is the difference between 1.3.0 and 1.5.0 in memory allocation?
@piyushghai Thanks.
@adaaaaaa I don't know. I found the speeds are the same between the two versions when input shapes are fixed.
What are the typical input sizes?
@szha
@zhreshold @szha I test it on a machine with Tesla M40 (22945MiB) x 4. MXNet is installed by pip. I tested several versions of MXNet.
Some pre-built versions don't support CUDA 9.0, so I couldn't test them.
@wkcn I've tested it using V100 x4. Actually, I tested 1.3.1b20181001; it is slower (120+-20 images/sec on average) than any of the previous three builds. In summary, my experimental results are the reverse of @wkcn's results.
@zhreshold Thank you! It's flaky.
@zhreshold I think the performance drops because of the driver rather than MXNet.
Thanks for the update. Can we resolve this issue?
@zhreshold Solved. Thank you!
I'm currently seeing some of my inference workloads slow down a lot on MXNet versions above 1.3.1 with CUDA 9.2 (run in a Docker container), but I don't know how to check whether it is the same thing you ran into.
@mikeobr You can run this code: It seems that the performance of dilated convolutional layers drops in CUDA 9.
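The snippet @wkcn points @mikeobr to is not included on this page. A minimal sketch of the kind of check described, assuming a single dilated `Conv2D` on a GPU (the layer sizes and the helper names `images_per_sec`/`time_conv` are my own), could look like:

```python
import time

def images_per_sec(n_images, elapsed):
    """Throughput helper: images processed per second."""
    return n_images / elapsed

def time_conv(dilation, n_iters=30, batch=9, size=512):
    """Time one hybridized Conv2D with the given dilation on GPU 0.

    Requires a GPU build of mxnet. Calling this with dilation=1 and
    dilation=2 shows whether the dilated path is the slow one.
    """
    import mxnet as mx
    from mxnet.gluon import nn
    ctx = mx.gpu(0)
    net = nn.HybridSequential()
    net.add(nn.Conv2D(channels=32, kernel_size=3,
                      padding=dilation, dilation=dilation))
    net.initialize(ctx=ctx)
    net.hybridize(static_alloc=True)
    x = mx.nd.random.uniform(shape=(batch, 3, size, size), ctx=ctx)
    net(x).wait_to_read()  # warm-up: triggers shape inference and caching
    start = time.time()
    for _ in range(n_iters):
        net(x).wait_to_read()
    return images_per_sec(n_iters * batch, time.time() - start)

# Usage on a GPU machine, e.g.:
#   for d in (1, 2):
#       print("dilation=%d: %.1f images/sec" % (d, time_conv(d)))
```

Running the same script under two MXNet builds (e.g. 1.3.0 and 1.5.0) isolates the dilated-conv cost from the rest of the model.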
Hello, I have a problem. Environment: P40 / CUDA 8 / cuDNN 5.1.10 / NVIDIA driver 384.81
@IvyGongoogle Are there any dilated convolutional layers in your model?
I met the same problem. I have a project with dilated convolution (ResNet backbone). If I use mxnet1.3.1-cu80 (pip install), the speed is 0.18-0.19s per iteration. However, when I switch to mxnet1.4.0-cu80 (pip install), it drops to 0.19-0.20s per iteration. The slowdown is slight, but I am confused by this problem.
@wkcn There are no dilated convolutional layers in my model, which is an OCR recognition model with a simple CNN and RNN.
In my experience, you should use newer versions of CUDA and cuDNN to get better performance. In my opinion, CUDA 8.0 is obsolete.
Closing this, since dilated convolution is not optimized in old versions of cuDNN, as chinakook said.
Description
Hi! I have an experiment on object counting, which needs variable-sized inputs.
I write the code with Gluon, and hybridize the model with
static_alloc=True
I found an obvious performance difference between MXNet 1.5.0 and MXNet 1.3.0, and I checked it on two servers.
I think the method of memory allocation for Gluon may have changed after MXNet 1.3.0. Thanks!
Update:
When there are dilated convolutional layers in the model and the input size is variable, the performance drops.
I think it may be related to one of the two PRs: #11742 #12722
Environment info (Required)
OS: ubuntu 14.04
GPU: Tesla M40 x 4
Minimum reproducible example
I wrote a minimum reproducible example without a dataset.
Code
MXNet 1.5.0: 10 images / sec
MXNet 1.3.0: 40+ images / sec
Performance is the same when the input shape is fixed.
Input shape: (9, 3, 300~512, 300~512) in NCHW order
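The test code itself is not included on this page. A hedged reconstruction of the benchmark it describes, feeding a hybridized dilated-conv network batches matching the (9, 3, 300~512, 300~512) NCHW shapes above (the network depth, channel counts, and the names `random_hw`/`run_benchmark` are assumptions, not the author's actual script), could look like:

```python
import random
import time

def random_hw(low=300, high=512):
    """Random spatial size in [low, high], drawn fresh each iteration."""
    return random.randint(low, high), random.randint(low, high)

def run_benchmark(n_iters=50, batch=9, dilation=2):
    """Push variable-sized NCHW batches through a small dilated-conv net.

    Requires a GPU build of mxnet (the numbers in this issue come from a
    Tesla M40). Returns throughput in images/sec.
    """
    import mxnet as mx
    from mxnet.gluon import nn
    ctx = mx.gpu(0)
    net = nn.HybridSequential()
    net.add(nn.Conv2D(64, kernel_size=3, padding=dilation, dilation=dilation),
            nn.Activation('relu'),
            nn.Conv2D(64, kernel_size=3, padding=dilation, dilation=dilation))
    net.initialize(ctx=ctx)
    net.hybridize(static_alloc=True)
    start = time.time()
    for _ in range(n_iters):
        h, w = random_hw()  # new input shape every iteration
        x = mx.nd.random.uniform(shape=(batch, 3, h, w), ctx=ctx)
        net(x).wait_to_read()
    return n_iters * batch / (time.time() - start)

# Usage on a GPU machine, e.g.:
#   print("%.1f images/sec" % run_benchmark())
```

Setting `dilation=1` here (or fixing `random_hw` to return a constant size) should, per the findings in this issue, restore the 1.3.0-level throughput.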
Package used (Python/R/Scala/Julia):
Python 2.7.12, 3.7.1
MXNet is installed by pip:
Steps to reproduce
Download the test code.
Run the test code under different versions (1.3.0 and 1.5.0) of MXNet.
Performance
I tested several versions of MXNet.
Some pre-built versions don't support CUDA 9.0, so I couldn't test them.
The performance drop appeared between the 20181004 and 20181010 builds.
If the dilation of the dilated conv is changed to 1, performance is normal.
It seems the problem occurs in the dilated conv.