This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Significant slowdown in some DGL models #16603

Open

zheng-da opened this issue Oct 24, 2019 · 14 comments

Comments

@zheng-da
Contributor

I recently compared the performance of DGL KGE models on MXNet 1.5 and the current master branch, and noticed a significant slowdown. On MXNet 1.5 it takes 12 seconds to run 1000 batches; on master it now takes 20 seconds. After some profiling, it seems there is no slowdown in the operators themselves.

To reproduce the problem, install DGL 0.4 and get the DGL KGE package by cloning the DGL repo; the package is under apps/kg. Then run the following command:

DGLBACKEND=mxnet python3 train.py --model DistMult --dataset FB15k --batch_size 1024 \
    --neg_sample_size 256 --hidden_dim 2000 --gamma 500.0 --lr 0.1 --max_step 100000 \
    --batch_size_eval 16 --gpu 0 --valid --test -adv
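
A quick environment check before launching train.py can save a round trip on version questions later in this thread. The snippet below is a minimal sketch, not part of the original repro steps; it only assumes DGL and an MXNet wheel are already installed:

import os
os.environ.setdefault('DGLBACKEND', 'mxnet')   # DGL picks its backend at import time

import mxnet as mx
import dgl

print('mxnet:', mx.__version__)
print('dgl:', dgl.__version__)
print('visible GPUs:', mx.context.num_gpus())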
@zheng-da
Contributor Author

Here is the profiling result on MXNet 1.5:
[screenshot: profiler output on MXNet 1.5]

Here is the profiling result on MXNet 1.6:
[screenshot: profiler output on MXNet 1.6]
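
For anyone trying to regenerate operator-level profiles like the ones above, MXNet's built-in profiler can dump aggregated per-operator timings. A minimal sketch follows; the DGL KGE training loop itself is elided and the output filename is arbitrary:

import mxnet as mx

mx.profiler.set_config(profile_all=True, aggregate_stats=True,
                       filename='kge_profile.json')
mx.profiler.set_state('run')

# ... run a fixed number of training batches here ...

mx.nd.waitall()                      # wait for all asynchronous work to finish
mx.profiler.set_state('stop')
print(mx.profiler.dumps())           # aggregated per-operator statistics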

@anirudh2290
Member

Which commit on 1.6?

@ChaiBapchya
Contributor

If there is no slowdown in the operators, but the models suffer significantly, what's your guess as to the cause? Any direction we can start deep diving?

@access2rohit
Contributor

access2rohit commented Oct 28, 2019

@zheng-da can you share the output of
python -c "from mxnet.runtime import Features; print(Features())"
for both mxnet 1.5 and mxnet 1.6?
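
For what it's worth, the same runtime check can also be queried flag by flag, which makes diffing two installs easier. A small sketch, using feature names that appear in the Features() listing:

from mxnet.runtime import Features

feats = Features()
for name in ('CUDA', 'CUDNN', 'MKLDNN', 'BLAS_MKL', 'OPENMP'):
    print(name, feats.is_enabled(name))   # True if the wheel was built with this flag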

@ddavydenko
Contributor

@zachgk, please assign to @access2rohit?

@anirudh2290
Member

@access2rohit, @ChaiBapchya, are you guys looking at this?

@ChaiBapchya
Contributor

Waiting on @zheng-da for some details
@mxnet-label-bot add [Pending requester info]

@anirudh2290
Member

@ChaiBapchya it would be worth checking whether this PR caused the slowdown: #16526.
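
Since the operators themselves show no slowdown, one rough way to check whether a change like that added per-call framework overhead is to time a loop of very cheap imperative ops under both wheels. The sketch below is only an illustration (shapes and iteration counts are arbitrary), not something that was actually run in this thread:

import time
import mxnet as mx

ctx = mx.gpu(0) if mx.context.num_gpus() > 0 else mx.cpu()
x = mx.nd.ones((8, 8), ctx=ctx)      # tiny tensor so dispatch cost dominates

mx.nd.waitall()
start = time.time()
for _ in range(10000):
    x = x + 1                        # cheap elementwise op issued 10k times
mx.nd.waitall()
print(mx.__version__, '%.3f s for 10000 small ops' % (time.time() - start))

Running the same script under mxnet 1.5.1 and the 1.6 nightly should make a pure dispatch-overhead regression visible even when individual operator timings look unchanged.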

@ChaiBapchya
Contributor

Tried reproducing and got a segfault at the very end of the run.

Build flags

[✔ CUDA, ✔ CUDNN, ✔ NCCL, ✔ CUDA_RTC, ✖ TENSORRT, ✔ CPU_SSE, ✔ CPU_SSE2, ✔ CPU_SSE3, ✔ CPU_SSE4_1, ✔ CPU_SSE4_2, ✖ CPU_SSE4A, ✔ CPU_AVX, ✖ CPU_AVX2, ✖ OPENMP, ✖ SSE, ✔ F16C, ✖ JEMALLOC, ✖ BLAS_OPEN, ✖ BLAS_ATLAS, ✖ BLAS_MKL, ✖ BLAS_APPLE, ✔ LAPACK, ✖ MKLDNN, ✔ OPENCV, ✖ CAFFE, ✖ PROFILER, ✔ DIST_KVSTORE, ✖ CXX14, ✖ INT64_TENSOR_SIZE, ✔ SIGNAL_HANDLER, ✖ DEBUG]

DGL & MXNet versions

python3 -c "import mxnet;import dgl;print(mxnet.__version__);print(dgl.__version__)"
1.5.1
0.4

Log

/workspace/dgl_issue/dgl/apps/kg$ DGLBACKEND=mxnet python3 train.py --model DistMult --dataset FB15k --batch_size 1024 --neg_sample_size 256 --hidden_dim 2000 --gamma 500.0 --lr 0.1 --max_step 100000 --batch_size_eval 16 --gpu 0 --valid --test -adv
Logs are being recorded at: ckpts/DistMult_FB15k_1/train.log
File not found. Downloading from https://s3.us-east-2.amazonaws.com/dgl.ai/dataset/FB15k.zip
8it [00:00, 11.36it/s]
Download finished. Unzipping the file...
Unzip finished.
|Train|: 483142
|valid|: 50000
|test|: 59071
eval on 50000 edges
eval on 50000 edges
eval on 59071 edges
eval on 59071 edges
[Train](0/100000) average pos_loss: 0.6942696571350098
[Train](0/100000) average neg_loss: 0.7048686146736145
[Train](0/100000) average loss: 0.6995691061019897
[Train](0/100000) average regularization: 0.05789175629615784
0.23716044425964355
[Train](1000/100000) average pos_loss: 0.2992950127273798
[Train](1000/100000) average neg_loss: 0.571561429977417
[Train](1000/100000) average loss: 0.43542822140455245
[Train](1000/100000) average regularization: 0.12885531912744044
16.814379453659058

...
Trains for rest of the steps

[Train](99000/100000) average pos_loss: 0.18922672630846502
[Train](99000/100000) average neg_loss: 0.3200621349364519
[Train](99000/100000) average loss: 0.2546444304138422
[Train](99000/100000) average regularization: 0.10238696759194135
13.827665090560913
training takes 1850.2152254581451 seconds
Test average MRR at [99999/100000]: 0.7777071453668755
Test average MR at [99999/100000]: 44.82855377427164
Test average HITS@1 at [99999/100000]: 0.7043134533019587
Test average HITS@3 at [99999/100000]: 0.8327182543041425
Test average HITS@10 at [99999/100000]: 0.8948722723502226

Segmentation fault: 11

Stack trace:
  [bt] (0) /home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2e6b140) [0x7f933e364140]
  [bt] (1) /lib/x86_64-linux-gnu/libc.so.6(+0x354b0) [0x7f9414ae14b0]
  [bt] (2) /usr/local/cuda/lib64/libcudart.so.10.0(+0x1d9fe) [0x7f94111fd9fe]
  [bt] (3) /usr/local/cuda/lib64/libcudart.so.10.0(+0x2296b) [0x7f941120296b]
  [bt] (4) /usr/local/cuda/lib64/libcudart.so.10.0(cudaSetDevice+0x47) [0x7f941122a087]
  [bt] (5) /home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x25bf40a) [0x7f933dab840a]
  [bt] (6) /home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x25cc21e) [0x7f933dac521e]
  [bt] (7) /home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x25b4d8e) [0x7f933daadd8e]
  [bt] (8) /home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x25b5cd4) [0x7f933daaecd4]

@access2rohit
Contributor

@zheng-da can you share the build flags for the MXNet 1.6.x/master build that you used?
Also, was it mxnet 1.5.1 or 1.5.0 from PyPI? Can you confirm?
For the flags, you can run the following command directly from bash:
python -c "from mxnet.runtime import Features; print(Features())"

@access2rohit
Contributor

@anirudh2290 We need to know the exact build flags used for 1.6.x before we start our deep dive.
@ChaiBapchya let's wait for @zheng-da's reply.

@zheng-da
Contributor Author

I tested with the nightly build, so the build flags are whatever the nightly build usually ships with.

@access2rohit
Contributor

access2rohit commented Oct 31, 2019

@zheng-da which version was it, cu100 or cu101? Was it with or without MKL? It would be much easier if you could simply give us the output of:
python -c "from mxnet.runtime import Features; print(Features())"
Can you also clarify whether it was 1.5.0 or 1.5.1 that was used for the comparison?

@zheng-da
Contributor Author

zheng-da commented Nov 4, 2019

I just tried the experiment again and had no problem running it. The command to run the experiment:

python3 train.py --model DistMult --dataset FB15k --batch_size 1024 --neg_sample_size 256 --hidden_dim 2000 --gamma 500.0 --lr 0.1 --max_step 2000 --gpu 0

You can use the following commands to install MXNet; the slowdown is very easy to reproduce. You can install the MKLDNN version if you want; it makes no difference.

pip3 install mxnet-cu100
pip3 install --pre mxnet-cu100
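
For clarity, the first command installs the latest stable mxnet-cu100 wheel from PyPI (1.5.x at the time of this issue), while the --pre flag pulls the nightly pre-release build from the 1.6 line, so the two commands give the two builds being compared here.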
