[Large Tensor] Fix cumsum op #17677
Conversation
@mxnet-label-bot add [pr-awaiting-review]
LGTM
LGTM
force-pushed from fd32e6a to c065cdc
@sandeep-krishnamurthy thanks for your questions!
e8dee74
to
8b849f1
Compare
src/operator/numpy/np_cumsum-inl.h
Outdated
```diff
@@ -98,11 +98,11 @@ void CumsumForwardImpl(const OpContext& ctx,
   }

   Stream<xpu> *s = ctx.get_stream<xpu>();
-  MSHADOW_TYPE_SWITCH_WITH_BOOL(in.type_flag_, IType, {
+  MSHADOW_TYPE_SWITCH_WITH_BOOL(in.type_flag_, index_t, {
```
I think you will not need this switch macro, right?
Great point, reverted to IType.
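For context, the settled pattern keeps the type-switch macro so the kernel is still compiled for every runtime dtype, and confines index_t to the size/offset arguments. A minimal sketch of how the forward dispatch ends up, assuming the variable names visible in the diff above (not the verbatim file contents):

```cpp
// Sketch only: s, in, out, middle, and trailing come from CumsumForwardImpl.
Stream<xpu> *s = ctx.get_stream<xpu>();
MSHADOW_TYPE_SWITCH_WITH_BOOL(in.type_flag_, IType, {   // IType = runtime input dtype
  MSHADOW_TYPE_SWITCH(out.type_flag_, OType, {          // OType = runtime output dtype
    // The launch count and the middle/trailing extents are index_t,
    // so they survive sizes past 2^31 that a plain int would overflow.
    Kernel<cumsum_forward, xpu>::Launch(
        s, out.Size() / middle, out.dptr<OType>(), in.dptr<IType>(),
        middle, trailing);
  });
});
```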
src/operator/numpy/np_cumsum-inl.h
Outdated
```diff
@@ -157,10 +157,10 @@ void CumsumBackwardImpl(const OpContext& ctx,
     }
   }
   Stream<xpu> *s = ctx.get_stream<xpu>();
-  MSHADOW_TYPE_SWITCH_WITH_BOOL(igrad.type_flag_, IType, {
+  MSHADOW_TYPE_SWITCH_WITH_BOOL(igrad.type_flag_, index_t, {
```
Can you revert this change as well? Rest LGTM!
Great point, reverted to IType.
src/operator/numpy/np_cumsum-inl.h
Outdated
```diff
   MSHADOW_TYPE_SWITCH(ograd.type_flag_, OType, {
     Kernel<cumsum_backward, xpu>::Launch(
-        s, igrad.Size() / middle, igrad.dptr<IType>(),
+        s, igrad.Size() / middle, igrad.dptr<index_t>(),
```
Ditto
Reverted, thanks!
src/operator/numpy/np_cumsum-inl.h
Outdated
```diff
@@ -157,10 +157,10 @@ void CumsumBackwardImpl(const OpContext& ctx,
     }
   }
   Stream<xpu> *s = ctx.get_stream<xpu>();
-  MSHADOW_TYPE_SWITCH_WITH_BOOL(igrad.type_flag_, IType, {
+  MSHADOW_TYPE_SWITCH_WITH_BOOL(igrad.type_flag_, index_t, {
```
You don't need a macro if the type is predefined.
LGTM!
* Implemented fix and nightly test for cumsum
* Changed IType to index_t
* Also changed in backward
* Reverting to IType
* Added type assertion on first element to force evaluation of output NDArray
* Reverted to IType in relevant places
* Last reversion
* Changed type assertion to value check
Description
The cumsum op was previously breaking on large tensor (dimension > 2^32) data. With the following input:
the following error was thrown:
To root-cause this issue, I ran the previous command in a Python script under GDB and found that the underlying problem was the data type of several variables in the forward/backward structs in np_cumsum-inl.h. These variables used the int dtype when they should have been using index_t to properly handle long-int indices. I switched these variables to index_t in the struct header and, after rebuilding, the previous input command displayed the correct output.

To ensure completeness and to prevent future breaking changes, I also added a nightly test for the cumsum op with large tensor data in tests/nightly/test_large_array.py.
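To make the fix concrete, here is a minimal sketch of the forward kernel struct after the change, with the old int declarations noted in comments; this is an illustrative reconstruction under the names used in np_cumsum-inl.h, not the verbatim file:

```cpp
// Sketch: cumsum forward kernel with 64-bit-safe index arithmetic.
// index_t is mshadow's index type (64-bit when large-tensor support is on).
struct cumsum_forward {
  template <typename IType, typename OType>
  MSHADOW_XINLINE static void Map(index_t i,                // was: int i
                                  OType *out,
                                  const IType *in,
                                  const index_t middle,     // was: const int middle
                                  const index_t trailing) { // was: const int trailing
    // Each i selects one scan lane; on tensors with > 2^31 elements the
    // computed offset exceeds INT_MAX, which is where int used to overflow.
    index_t left = i / trailing, right = i % trailing;
    index_t offset = left * middle * trailing + right;
    out[offset] = OType(in[offset]);
    for (index_t j = 1; j < middle; ++j) {
      out[offset + j * trailing] =
          out[offset + (j - 1) * trailing] + OType(in[offset + j * trailing]);
    }
  }
};
```

The backward struct gets the same treatment, while the MSHADOW_TYPE_SWITCH_WITH_BOOL dispatch in the Impl functions keeps IType/OType, as settled in the review above.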
Comments
Tested on an r5dn.24xl instance (Ubuntu 16.04) with:
Results
Single operator test - cumsum op (CPU)
Full OpPerf test (CPU)
@apeforest @access2rohit @ChaiBapchya