This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[MXNET-1413] Adding Large Tensor support for sort operators #15170

Merged (1 commit, Jun 21, 2019)

Conversation

@access2rohit (Contributor) commented Jun 6, 2019

Description

ops supported: sort, topk, argmax
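For orientation, a minimal usage sketch of the three operators; it is illustrative and not part of the PR. The shape here is deliberately tiny, whereas the large tensor path targets arrays with more than 2**31 elements; the dtype argument on topk selects the index dtype exercised by the tests in this PR.

import numpy as np
import mxnet as mx

x = mx.nd.random.uniform(shape=(4, 6))

sorted_vals = mx.nd.sort(x, axis=1)                                        # values sorted per row
top3_idx = mx.nd.topk(x, axis=1, k=3, ret_typ="indices", dtype=np.int64)   # int64 indices
argmax_idx = mx.nd.argmax(x, axis=1)                                       # position of each row's max

print(sorted_vals.shape, top3_idx.dtype, argmax_idx.shape)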

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-1413], where MXNET-1413 refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

@access2rohit requested a review from szha as a code owner on June 6, 2019 23:42
@access2rohit changed the title from "Adding Large Tensor support for sort operators" to "[MXNET-1413] Adding Large Tensor support for sort operators" on Jun 6, 2019
@access2rohit (Contributor, Author) commented:

@mxnet-label-bot Add [pr-awaiting-review]

@marcoabreu marcoabreu added the pr-awaiting-review PR is waiting for code review label Jun 6, 2019
@access2rohit (Contributor, Author) commented:

@apeforest please review

mxnet_op::Kernel<range_fwd, xpu>::Launch(s, batch_size * element_num, 1, 0, 1,
kWriteTo, indices.dptr_);
mxnet_op::Kernel<range_fwd, xpu>::Launch(s, batch_size * element_num, 1, static_cast<index_t>(0),
static_cast<index_t>(1), kWriteTo, indices.dptr_);
Contributor:

Maybe just using data initializer:

Suggested change: use index_t{0}, index_t{1} in place of static_cast<index_t>(0), static_cast<index_t>(1).

@access2rohit changed the title from "[MXNET-1413] Adding Large Tensor support for sort operators" to "[WIP] [MXNET-1413] Adding Large Tensor support for sort operators" on Jun 7, 2019
@access2rohit force-pushed the lts_sort branch 3 times, most recently from c3af3fa to e62d8da, on June 7, 2019 20:24
@access2rohit force-pushed the lts_sort branch 8 times, most recently from a9e2fff to cedab68, on June 20, 2019 00:57
@@ -819,7 +821,11 @@ def get_large_matrix():

# test for ret_typ=indices
nd_ret_topk = mx.nd.topk(a_nd, axis=1, ret_typ="indices", k=3, is_ascend=True).asnumpy()
assert nd_ret_topk.dtype == np.float32 # Test the default dtype
# Test the default dtype
if is_large_tensor_enabled:
@access2rohit (Contributor, Author) commented Jun 20, 2019:

Changed the default index dtype from float32 to int32/int64, depending on whether large tensor support is enabled.
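A minimal sketch of the kind of check this describes (not the PR's actual test code): it assumes mx.runtime.Features().is_enabled("INT64_TENSOR_SIZE") reports whether the build has large tensor support, and the fallback dtype follows the comment above; the merged test may differ in both respects.

import numpy as np
import mxnet as mx

# Assumption: the INT64_TENSOR_SIZE runtime feature mirrors the build flag.
is_large_tensor_enabled = mx.runtime.Features().is_enabled("INT64_TENSOR_SIZE")

a = mx.nd.random.uniform(shape=(5, 10))
ind = mx.nd.topk(a, axis=1, k=3, ret_typ="indices")   # no explicit dtype argument

if is_large_tensor_enabled:
    assert ind.asnumpy().dtype == np.int64   # 64-bit indices on large-tensor builds
else:
    assert ind.asnumpy().dtype == np.int32   # fallback per the comment above; illustrative only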

@@ -860,7 +866,10 @@ def get_large_matrix():
nd_ret_topk_val = nd_ret_topk_val.asnumpy()
nd_ret_topk_ind = nd_ret_topk_ind.asnumpy()
assert nd_ret_topk_val.dtype == dtype
assert nd_ret_topk_ind.dtype == np.float32
if is_large_tensor_enabled:
@access2rohit (Contributor, Author):

same as above

@access2rohit (Contributor, Author) commented:

@apeforest please review

@access2rohit changed the title from "[WIP] [MXNET-1413] Adding Large Tensor support for sort operators" to "[MXNET-1413] Adding Large Tensor support for sort operators" on Jun 20, 2019
Tensor<xpu, 1, index_t> workspace =
ctx.requested[0].get_space_typed<xpu, 1, index_t>(Shape1(batch_size * k + batch_size), s);
Tensor<xpu, 1, index_t> sel_indices =
Tensor<xpu, 1, index_t>((workspace.dptr_), Shape1(batch_size * k), s);
Contributor:

nit: the parentheses around workspace.dptr_ seem redundant

CHECK(type_assign(&(*out_attrs)[1], mshadow::kInt32))
<< "Failed to set the type of ret_indices to int32.";
#endif
<< "Failed to set the type of ret_indices to int32.";
Contributor:

I think this message should be different when MSHADOW_USE_INT64_TENSOR_SIZE is on.

@access2rohit (Contributor, Author):

I think it's better if we just log "Failed to set the type of ret_indices".

k = nd.topk(b, k=10, axis=0, dtype=np.int64)
assert np.sum(k.asnumpy() == (LARGE_X - 1)) == SMALL_Y
b = create_2d_tensor(rows=SMALL_Y, columns=LARGE_X)
l = nd.topk(b, k=1, axis=-1, dtype=np.int64, ret_typ="value")
Contributor:

Please also test when ret_typ is "both" and "indices".

@access2rohit (Contributor, Author):

Good point. The default is "indices"; I can check for "both" as well.
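A minimal sketch of what those checks could look like (not from the PR). LARGE_X, SMALL_Y, and create_2d_tensor are stand-ins for the helpers used in the quoted tests, with sizes shrunk so the snippet runs quickly; the exact assertions in the merged test may differ.

import numpy as np
from mxnet import nd

LARGE_X, SMALL_Y = 10000, 50             # stand-in sizes; the real tests use far larger tensors

def create_2d_tensor(rows, columns):     # stand-in for the test helper: row i holds value i
    return nd.arange(rows).reshape((rows, 1)).broadcast_to((rows, columns))

b = create_2d_tensor(rows=LARGE_X, columns=SMALL_Y)

# ret_typ="indices": the top-1 index in every column should be the last row
ind = nd.topk(b, k=10, axis=0, dtype=np.int64, ret_typ="indices")
assert np.sum(ind.asnumpy() == (LARGE_X - 1)) == SMALL_Y

# ret_typ="both": values and indices come back as a pair
val, idx = nd.topk(b, k=10, axis=0, dtype=np.int64, ret_typ="both")
assert idx.asnumpy().dtype == np.int64
assert val.shape == idx.shape == (10, SMALL_Y)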

@apeforest (Contributor) left a comment:

Thanks for iterating on this change to keep the API backward compatible. LGTM overall, except for a few minor changes.

@larroy (Contributor) left a comment:

LGTM % the comment about uninitialized variables.

@@ -605,30 +633,32 @@ void TopKBackwardImpl(const OpContext &ctx,
using namespace mshadow::expr;
Stream<xpu> *s = ctx.run_ctx.get_stream<xpu>();
CHECK(param.ret_typ == topk_enum::kReturnValue || param.ret_typ == topk_enum::kReturnBoth);
int batch_size, element_num; // number of batches + the size of each batch
size_t batch_size;
Contributor:

Can we get into the good practice of not leaving variables uninitialized?
This is verboten in many safety-critical domains (automotive/MISRA, aerospace, ...). I know we parse the arguments later, but still.
Same thing for the one above.

@access2rohit (Contributor, Author):

Sounds good. I can do this for primitive types, which start out with garbage values. I will refrain from initializing class objects, since they are null by default unless explicitly assigned memory.
@larroy does that sound good?
@apeforest what are your thoughts on this?

@larroy (Contributor) commented Jun 20, 2019:

Yes, I was referring to primitive types. Which class objects do you refer to? When initializing a class, the fields are not null; it depends on what the ctor does. If there is no ctor and the type is a POD, the fields are garbage.

Contributor:

Yes, I agree with @larroy: leaving no uninitialized variables is good SE practice. A class object should have its own default constructor; if it doesn't, the design of the class needs improvement.

@access2rohit (Contributor, Author):

@larroy I was talking about objects of the Tensor class.

Contributor:

Only PODs need to be initialized, in any circumstance including a class ctor (otherwise they hold garbage), so answering your original question: yes. It seems we all agree here. Objects don't need explicit initialization, as the ctor takes care of it, as @apeforest pointed out.

@access2rohit force-pushed the lts_sort branch 3 times, most recently from d7c78fa to 496dd9f, on June 20, 2019 23:17
b = create_2d_tensor(rows=SMALL_Y, columns=LARGE_X)
l = nd.topk(b, k=1, axis=-1, dtype=np.int64, ret_typ="value")
assert l.sum() == np.sum(np.arange(0, SMALL_Y))
b = create_2d_tensor(rows=LARGE_X, columns=SMALL_Y)
Contributor:

Why do we create b multiple times? Can we reuse one to save computation?

@access2rohit (Contributor, Author):

I can reuse the first one, but the second still needs to be created.

Contributor:

Fine. No need to do it in this PR.
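For illustration, a minimal sketch (not from the PR) of the reuse being discussed: the first tensor is built once and shared by the checks that need that orientation, while the differently shaped one still has to be built once. LARGE_X, SMALL_Y, and create_2d_tensor are stand-ins for the test helpers, with shrunk sizes.

import numpy as np
from mxnet import nd

LARGE_X, SMALL_Y = 10000, 50             # stand-in sizes; the real tests use far larger tensors

def create_2d_tensor(rows, columns):     # stand-in for the test helper: row i holds value i
    return nd.arange(rows).reshape((rows, 1)).broadcast_to((rows, columns))

# Built once and reused by every check needing the (SMALL_Y, LARGE_X) orientation.
b_wide = create_2d_tensor(rows=SMALL_Y, columns=LARGE_X)
val = nd.topk(b_wide, k=1, axis=-1, dtype=np.int64, ret_typ="value")
assert val.sum() == np.sum(np.arange(0, SMALL_Y))

# The transposed shape still has to be created once for the axis=0 checks.
b_tall = create_2d_tensor(rows=LARGE_X, columns=SMALL_Y)
ind = nd.topk(b_tall, k=10, axis=0, dtype=np.int64)
assert np.sum(ind.asnumpy() == (LARGE_X - 1)) == SMALL_Y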

@access2rohit (Contributor, Author) commented:

@mxnet-label-bot Add [pr-awaiting-merge]

@marcoabreu marcoabreu added the pr-awaiting-merge Review and CI is complete. Ready to Merge label Jun 21, 2019
@apeforest (Contributor) commented:

@access2rohit Can you check the CI status? If it has gone stale, please retrigger it.

@apeforest (Contributor) left a comment:

LGTM

@lebeg (Contributor) commented Jun 26, 2019:

Unfortunately, this change has broken NightlyTestsForBinaries (#15374).

access2rohit added two commits to access2rohit/incubator-mxnet that referenced this pull request on Sep 25, 2019
Labels
pr-awaiting-merge: Review and CI is complete. Ready to Merge
pr-awaiting-review: PR is waiting for code review

5 participants