-
Notifications
You must be signed in to change notification settings - Fork 6.8k
Fix the bug of MXEnginePushAsyncND
and MXEnginePushSyncND
#15751
Conversation
EngineFnPropertyHandle prop_handle, int priority, | ||
const char* opr_name, bool wait) { | ||
API_BEGIN(); | ||
NDArray** const_nds = reinterpret_cast<NDArray**>(const_nds_handle); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reinterpret_cast
is necessary for the cast from void**
to NDArray**
.
Why did the code pass the previous tests? |
@sxjscience The previous test is also wrong. In the previous test, I passed the memory address of a NDArray into these two APIs. However, the correct method is to pass the memory address of an array of NDarray pointers. |
@sxjscience @anirudh2290 @eric-haibin-lin |
Should we backport this to the v1.5.x branch as well? |
@KellenSunderland Yes. |
@@ -258,47 +258,50 @@ TEST(Engine, PushFunc) { | |||
TEST(Engine, PushFuncND) { | |||
auto ctx = mxnet::Context{}; | |||
mxnet::NDArray nd(ctx); | |||
std::vector<mxnet::NDArray*> nds; | |||
nds.push_back(&nd); | |||
void** pnds = reinterpret_cast<void**>(nds.data()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might want some tests for cases with more than 1 NDArray present in the argument call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @KellenSunderland, I have added the testcase with more than 1 NDArray : )
LGTM. After considering the API a little I think I wouldn't' consider this a breaking change. Consumers of this package would have gotten errors when trying to pass lists of NDArrays to the function. It's clear that this was the intent of the API. Instead I'd consider this a patch/bugfix. |
Ignore this comment. |
LOG(INFO) << "===== Test #8: PushSyncND invalid number of mutable nds ====="; | ||
res = MXEnginePushSyncND(FooSyncFunc, nullptr, nullptr, &ctx, nullptr, 0, &nd, -1); | ||
EXPECT_EQ(res, -1); | ||
std::vector<mxnet::NDArray*> nds; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, I'm thinking if it's more appropriate to use
std::vector<mxnet::NDArray> nds;
Do we have to use std::vector<mxnet::NDArray*>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have to use std::vector<mxnet::NDArray*>
, because the type of the argument of the two APIs is an array pointer of NDArray*
. Besides, it avoids the potential copy of NDArray.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is an argument Context in the constructor of NDArray. I do not know how to use std::vector<mxnet::NDArray>
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we have to use std::vector<NDArray*>
if we keep the interface to be NDArrayHandle *
. However, I'm thinking whether we could directly use the std::vector<mxnet::NDArray> vec;
and use nds.emplace_back(mxnet::NDArray(ctx))
or nds.push_back(std::move(temp_arr))
to fill the vector. In that case, the existing API will not be changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It sounds good, but I am worry that the type of arguments (const_vars_handle
and mutable_vars_handle
) of the two APIs MXEnginePushAsync
and MXEnginePushSync
is EngineVarHandle
, namely void*
. It casts void*
to VarHandle*
, namely Var**
in https://github.com/apache/incubator-mxnet/blob/master/src/c_api/c_api.cc#L1475. Therefore, I don't know how to decide the type of const_nds_handle
and mutable_nds_handle
in MXEnginePushAsyncND
and MXEnginePushSyncND
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To keep the consistency, const_var_handle
and mutable_var_handle
is the pointer of an array of VarHandle
, and const_nds_handle
and mutable_var_handle
is the pointer of an array of NDArrayHandle
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I now understand the logic here. To make the API consistent, I think we should also change the interface of MXEnginePushAsync
and MXEnginePushSync
. We should be safe to replace EngineVarHandle
with VarHandle*
. Am I right here ? @apeforest @yuxihu
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. The question left is whether to change the interface of MXEnginePushAsync
and MXEnginePushSync
to make the API consistent.
ee98c12
to
14d4eb9
Compare
…#15751) * fix push sync nd api * align code * update test for syncnd * fix bug in tests/cpp/engine/threaded_engine_test * add more testcases for MXEnginePushSyncND and MXEnginePushAsyncND * fix test * fix * fix * lint * ci * retrigger CI
#15792) * fix push sync nd api * align code * update test for syncnd * fix bug in tests/cpp/engine/threaded_engine_test * add more testcases for MXEnginePushSyncND and MXEnginePushAsyncND * fix test * fix * fix * lint * ci * retrigger CI
…#15751) * fix push sync nd api * align code * update test for syncnd * fix bug in tests/cpp/engine/threaded_engine_test * add more testcases for MXEnginePushSyncND and MXEnginePushAsyncND * fix test * fix * fix * lint * ci * retrigger CI
Description
Sorry that I wrote a bug in the two APIs
MXEnginePushAsyncND
andMXEnginePushSyncND
, whose argument should be an array pointer of NDArrayHandle.Issue: #15774
master branch: #15751
v1.5.x branch: #15775
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Changes
Comments