Fix problematic backward of take & embedding #11795
Conversation
Benchmark script:

```python
import mxnet as mx
import numpy as np
import time

N = 50000
ctx = mx.gpu(0)
embedding = mx.gluon.nn.Embedding(N, 300)
embedding.initialize(ctx=ctx)

np.random.seed(1)
idx = mx.nd.array(np.random.randint(0, N, size=(1024, 160)), ctx=ctx)
# Max repeat count of any single index; repeated indices are what stress
# the backward (gradient accumulation) kernel.
print(np.max(np.bincount(idx.asnumpy().flatten().astype(np.int64))))

a = time.time()
for i in range(500000):
    with mx.autograd.record():
        emb_in = embedding(idx)
        loss = emb_in.sum()
    loss.backward()
print(time.time() - a)
```

Benchmark results: slowdown is negligible after switching to the new kernel. |
Thanks @haojin2! Regarding your test, did you run on Tesla V100 GPUs with Cuda 9.2? Also it seems that the AddTakeGradLargeBatch kernel and related code is unused now and should be removed? |
@leezu I'm encountering some unit test problems with this and will debug them a bit. The benchmarks were run on a single K80 on my dev machine; I'll also do a benchmark on a p3 instance tomorrow. Regarding the original kernels, I'll check whether they are still used elsewhere, since we may even need to bring them back if we observe a severe performance regression on V100. I'll keep them for now and mark this PR as WIP. Will let you know once I've got the latest results. Thanks for your quick reply! |
@leezu I found that my fix is only slightly faster than turning MXNET_FORCE_ADDTAKEGRAD on with the previously shown benchmark script, so I think we're good with the current implementation now? |
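For readers who want the workaround mentioned above: MXNET_FORCE_ADDTAKEGRAD is the environment variable named in this thread for forcing the AddTakeGrad kernel. A hedged sketch of how one might set it from Python (the exact point at which MXNet reads the flag may vary by version, so setting it before importing mxnet is the safe choice):

```python
import os

# Force the AddTakeGrad kernel for the backward of take/Embedding,
# avoiding the problematic LargeBatch kernel. Set this before the
# mxnet import so the flag is visible whenever the runtime reads it.
os.environ["MXNET_FORCE_ADDTAKEGRAD"] = "1"

# import mxnet as mx  # import only after setting the variable
```

Equivalently, `MXNET_FORCE_ADDTAKEGRAD=1 python train.py` from the shell.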
Ok, thanks for looking into this! In that case we should make |
@Roshrini I believe this should be addressed before freezing code for MXNet 1.3. The MXNet 1.3 release is likely to be used a lot on P3 instances with Cuda 9.2, meaning that the buggy backward pass of the Embedding operator will hit many users. |
@leezu So I can change my PR to make the default behavior be AddTakeGrad, does that sound good to you? |
Yes, I think either your improved take operator should be used or the |
@leezu I would prefer the original AddTakeGrad, as that requires minimal code change. |
Is this complete? I see some discussions on dev@ about this bug |
@eric-haibin-lin This PR will be modified to make AddTakeGrad as the new default. |
First of all, great work diving into this @haojin2. One concern I have is that the overhead of running the imperative python loop would dominate the performance, and make the two implementations seem equivalent in terms of execution time. Would it be possible to run the same test prefixed with nvprof and then paste the kernel execution time summary? |
@KellenSunderland One argument I would like to make here is that end-to-end performance is what matters most to actual users, since that is what they experience on their end. On the other hand, since the LargeBatch kernel exhibits flaky, problematic behavior, I don't think it should be kept (one other person on the dev list thread about this agrees with that point), not to mention that it is not making any fundamental performance difference here. |
Yeah, I agree with that logic (and the arguments on the list). I just think the numbers are suspiciously close, and I've seen many cases in the past where numbers this close actually meant that Python imperative calls and CUDA launch overhead were the bottleneck of the measurement. Edit: since I'm bringing it up, maybe the onus is on me to verify this; I'll do a quick test and let you know what the results are. |
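To make the overhead concern concrete with made-up numbers (purely illustrative, not measured on any hardware): if a fixed per-iteration overhead dominates, even a 2x kernel-level difference nearly vanishes in end-to-end timing.

```python
# Hypothetical per-iteration costs in microseconds (illustrative only).
overhead_us = 50.0   # Python dispatch + CUDA launch overhead, paid either way
kernel_a_us = 5.0    # faster kernel
kernel_b_us = 10.0   # slower kernel: 2x slower at the kernel level

end_to_end_a = overhead_us + kernel_a_us   # 55 us per iteration
end_to_end_b = overhead_us + kernel_b_us   # 60 us per iteration

kernel_ratio = kernel_b_us / kernel_a_us       # 2.0x at the kernel level
observed_ratio = end_to_end_b / end_to_end_a   # ~1.09x observed end to end
print(kernel_ratio, round(observed_ratio, 2))
```

This is why an nvprof kernel-time summary, which excludes the fixed overhead, is the more discriminating measurement.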
I'll also do some research on my side, thanks for bringing this issue up! |
@KellenSunderland So I think there should not be much difference between the two versions of the kernel? |
Yes, I'd agree based on these measurements.
|
@leezu The AddTakeGrad kernel can pass more than 3500 trials on the reproduction script, I think this PR should be a good fix without any fundamental performance loss. |
@KellenSunderland Thanks for the help on profiling! |
I find that the previous performance test was conducted using time.time(). That is not safe, due to the tremendous overhead of the imperative API in MXNet. We should rely on nvprof in the future. |
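A related pitfall with wall-clock timing: MXNet's engine executes operations asynchronously, so timing without a synchronization point (e.g. `mx.nd.waitall()`, or profiling with nvprof) can measure only the cost of enqueueing work. A framework-free sketch of the effect, using a background thread as a stand-in for the async engine (all names here are illustrative):

```python
import threading
import time

def launch_async(work):
    """Mimics an async engine: returns immediately, work runs in background."""
    t = threading.Thread(target=work)
    t.start()
    return t

start = time.perf_counter()
handle = launch_async(lambda: time.sleep(0.3))  # stand-in for a GPU kernel
enqueue_time = time.perf_counter() - start      # only measures dispatch

handle.join()                                   # stand-in for mx.nd.waitall()
synced_time = time.perf_counter() - start       # measures the actual work

print(enqueue_time < synced_time)
```

Timing after the synchronization point reflects the real cost; timing before it mostly reflects dispatch overhead.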
Could be a valid point, Xingjian. Are you seeing differences when measured with nvprof? Can you post a summary?
|
Would you help profile the change? I think the current change should not have passed code review.
|
I'll do that later if you do not have time. Currently there are lots of other urgent issues on my side, so I'm asking for help.
Thanks,
Xingjian
|
Description
Fix for #11314
Checklist
Essentials
Changes
Comments
@leezu I've used the script in #11314 and verified this can pass more than 3000 trials.