
Fix a memory misalignment in topk operator #15948

Merged

sxjscience merged 7 commits into apache:master from apeforest:bugfix/topk-oom on Aug 23, 2019

Conversation

apeforest
Contributor

Description

The current memory alignment in the topk operator is incorrect when index_t is int64_t. This PR fixes the potential issue and partially fixes #15703.

The PR also fixes an incorrect data type usage in mshadow/tensor.h.
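
For context, a minimal sketch of the class of bug being fixed (the function names and the float/int64_t pairing here are illustrative assumptions, not the actual topk code):

  #include <cstddef>
  #include <cstdint>

  // A raw char* workspace is carved into a float section followed by an
  // index section. If the byte offset of the index section is not a
  // multiple of alignof(int64_t), the cast yields a misaligned pointer.
  int64_t* carve_indices_unsafe(char* workspace, size_t n_values) {
    return reinterpret_cast<int64_t*>(workspace + n_values * sizeof(float));
  }

  // The fix: round the offset up to the alignment of the index type
  // before placing the index section.
  size_t align_up(size_t offset, size_t alignment) {
    return (offset + alignment - 1) / alignment * alignment;
  }

  int64_t* carve_indices_aligned(char* workspace, size_t n_values) {
    size_t offset = align_up(n_values * sizeof(float), alignof(int64_t));
    return reinterpret_cast<int64_t*>(workspace + offset);
  }

With index_t defined as int64_t (large tensor support), an index section placed right after an odd number of float elements would be only 4-byte aligned, which is undefined behavior on most targets and a hard fault on some.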

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, the expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change or have been fixed to be compatible with this change

Changes

Bug fix

Comments

  • If this change is a backward incompatible change, explain why it must be made.
  • Interesting edge cases to note here

@apeforest
Contributor Author

@access2rohit @ChaiBapchya Please also help review. Thanks

@@ -69,15 +69,15 @@ struct Shape {
   * \param idx dimension index
   * \return the corresponding dimension size
   */
-  MSHADOW_XINLINE index_t &operator[](index_t idx) {
+  MSHADOW_XINLINE index_t &operator[](int idx) {
Contributor

Can you describe this change in a bit more detail? Why is this required?

Contributor Author

This does not directly fix the bug. However, while checking the tensor struct, I noticed that this data type should be int because it indexes the dimension, which is declared as int.
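
A simplified sketch of that relationship, assuming a large-tensor build where index_t is int64_t (the real struct in mshadow/tensor.h has more members and MSHADOW_XINLINE qualifiers):

  #include <cstdint>

  typedef int64_t index_t;  // assuming a large-tensor build

  // The number of dimensions is a compile-time int, so the subscript
  // should be int as well; the per-dimension extents remain index_t.
  template<int dimension>
  struct Shape {
    static const int kDimension = dimension;  // dimension count is an int
    index_t shape_[kDimension];               // per-dimension sizes

    // idx ranges over [0, kDimension), so int is the natural type here.
    index_t &operator[](int idx) {
      return shape_[idx];
    }
  };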

Contributor

@ChaiBapchya left a comment

LGTM!

  // Temp space needed by the full sorts.
  size_t temp_size = std::max(
      mxnet::op::SortByKeyWorkspaceSize<index_t, DType, xpu>(src.Size()),
      mxnet::op::SortByKeyWorkspaceSize<DType, index_t, xpu>(src.Size()));
Member

I think we should also include mxnet::op::SortByKeyWorkspaceSize<index_t, index_t, xpu>(src.Size()).

Contributor Author

Done
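
A sketch of what the resolved computation plausibly looks like after this suggestion (std::max over an initializer list is one natural form; the exact code in the merged commit may differ):

  // The workspace must cover every key/value combination the full sorts
  // may use, including the index/index sort suggested above.
  size_t temp_size = std::max({
      mxnet::op::SortByKeyWorkspaceSize<index_t, DType, xpu>(src.Size()),
      mxnet::op::SortByKeyWorkspaceSize<DType, index_t, xpu>(src.Size()),
      mxnet::op::SortByKeyWorkspaceSize<index_t, index_t, xpu>(src.Size())});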

   return shape_[idx];
 }
 /*!
  * \brief get corresponding index
  * \param idx dimension index
  * \return the corresponding dimension size
  */
-  MSHADOW_XINLINE const index_t &operator[](index_t idx) const {
+  MSHADOW_XINLINE const index_t &operator[](int idx) const {
Member

There are other places in the same file that assume idx has index_t type. We need to fix all of them.

Contributor Author

Done

@sxjscience sxjscience merged commit 73a692e into apache:master Aug 23, 2019
@apeforest apeforest deleted the bugfix/topk-oom branch August 23, 2019 05:03
@TaoLv
Member

TaoLv commented Aug 23, 2019

Thank you for the fix, @apeforest. Will this be cherry-picked to the v1.5.x branch?

apeforest added a commit that referenced this pull request Aug 23, 2019
* fix alignment

* use correct type for shape index

* clean up unnecessary space in topk

* fix lint

* add additional temp space

* address reviewer comment

* fix incorrect index type
@apeforest
Contributor Author

@TaoLv I have cherry-picked it to v1.5.x (commit hash: 42746bc)

Somehow the v1.5.x branch was not PR-protected, so when I pushed it to origin it got merged automatically. Please let me know if this is okay. Thanks.

@TaoLv
Member

TaoLv commented Aug 24, 2019

@apeforest Could you please open a dummy PR against the v1.5.x branch to make sure the branch can properly pass the CI?

@apeforest
Contributor Author

See #15999

apeforest added a commit to apeforest/incubator-mxnet that referenced this pull request Aug 25, 2019
sxjscience pushed a commit that referenced this pull request Aug 27, 2019
* Revert "Fix a memory misalignment in topk operator (#15948)"

This reverts commit 42746bc.
shuokay added a commit to shuokay/mshadow that referenced this pull request Aug 29, 2019
Successfully merging this pull request may close these issues: topk regression in v1.5