This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[Large Tensor] Fix cumsum op #17677

Merged

apeforest merged 8 commits into apache:master from connorgoggins:fix_cumsum_large_tensor on Feb 29, 2020

Conversation

connorgoggins
Contributor

@connorgoggins connorgoggins commented Feb 24, 2020

Description

The cumsum op was previously breaking on large tensor data (tensors with more than 2^32 elements). With the following input:

run_performance_test(nd.cumsum, inputs=[{"a": nd.random_normal(shape=(2**32 + 1, 1))}], run_backward=True, warmup=1, runs=1)

the following error was thrown:

Segmentation fault (core dumped)

To find the root cause, I ran the previous command in a Python script under GDB and found that the underlying problem was the data type of several variables in the forward/backward structs in np_cumsum-inl.h. These variables used the int dtype when they should have used index_t to properly handle long int indices. After switching these variables to index_t in the struct header and rebuilding, the previous input command displayed the correct output:

INFO:root:Begin Benchmark - cumsum
INFO:root:Complete Benchmark - cumsum
[{'cumsum': [{'inputs': {'a': '<NDArray 4294967297x1 @cpu(0)>'}, 'max_storage_mem_alloc_cpu/0': 33285996.0, 'avg_time_forward_cumsum': 4366.7148, 'avg_time_backward_cumsum': 12744.9971}]}]
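The failure mode can be illustrated without MXNet. A minimal sketch, assuming only that mshadow's index_t maps to a signed 64-bit integer when large-tensor support is enabled: a shape of (2**32 + 1, 1) has more elements than a signed 32-bit int can represent, so any index or loop bound stored in an int wraps around and produces an out-of-bounds offset (the segfault above).

```python
import ctypes

# Element count of the failing input shape (2**32 + 1, 1).
n_elements = 2**32 + 1

# A signed 32-bit index (the old `int` fields in the cumsum structs)
# silently wraps; a 64-bit index (index_t, assumed int64) holds the
# full element count.
narrow = ctypes.c_int32(n_elements).value
wide = ctypes.c_int64(n_elements).value

print(narrow)  # 1 -- wrapped value, yielding a bogus offset in the kernel
print(wide)    # 4294967297
```

Any pointer arithmetic done with the wrapped 32-bit value addresses the wrong memory, which is consistent with the segmentation fault observed.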

To ensure completeness and to prevent future breaking changes, I also added a nightly test for the cumsum op with large tensor data in tests/nightly/test_large_array.py.
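A scaled-down sketch of the style of such a test (hypothetical helper name; the real test in tests/nightly/test_large_array.py uses mxnet.nd and a shape exceeding 2**32 elements, which requires tens of GB of memory, so NumPy and small stand-in sizes are used here):

```python
import numpy as np

# Stand-in sizes so the sketch runs anywhere; the real nightly test
# uses a first dimension larger than 2**32.
LARGE_X, SMALL_Y = 2**10 + 1, 1

def check_cumsum(shape):
    a = np.ones(shape)
    out = np.cumsum(a, axis=0)
    assert out.shape == shape
    # Value check on the last element forces full evaluation of the
    # output array (mirrors the value check added in the PR).
    assert out[-1, 0] == shape[0]

check_cumsum((LARGE_X, SMALL_Y))
print("cumsum large-tensor sketch passed")
```

Checking a concrete value rather than only the dtype matters for lazily evaluated NDArrays: it guarantees the whole forward pass actually ran.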

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • M src/operator/numpy/np_cumsum-inl.h
  • M tests/nightly/test_large_array.py

Comments

Tested on r5dn.24xl (Ubuntu 16.04) with:

  1. Individual op run
  2. Full OpPerf run

Results

Single operator test - cumsum op (CPU)

Full OpPerf test (CPU)

@apeforest @access2rohit @ChaiBapchya

@connorgoggins
Contributor Author

@mxnet-label-bot add [pr-awaiting-review]

@lanking520 lanking520 added the pr-awaiting-review PR is waiting for code review label Feb 24, 2020
Contributor

@haojin2 haojin2 left a comment

LGTM

Contributor

@ChaiBapchya ChaiBapchya left a comment

LGTM

@connorgoggins connorgoggins force-pushed the fix_cumsum_large_tensor branch 2 times, most recently from fd32e6a to c065cdc Compare February 25, 2020 18:48
@sandeep-krishnamurthy
Contributor

  1. Is there any performance regression associated with this index type change? (Performance before and after the change)
  2. Thanks for fixing this issue. Is there a way to know whether any other operators are broken with a similar issue? (Not intended to be fixed in this PR, but I wanted to see the blast radius and possibly create a tracking issue while we are on it here.)

@connorgoggins
Contributor Author

@sandeep-krishnamurthy thanks for your questions!

  1. Comparing the performance of the cumsum op before and after the changes in this PR, there is no regression in the time required for a forward pass.
  2. My current assignment has been to systematically fix all ops with similar issues (i.e. ops throwing errors on large tensor data). As of now, I have created PRs with fixes for every problematic op; once my remaining op-fix PRs (3 remaining, including this one) are merged, all similar issues will be resolved.

@connorgoggins connorgoggins force-pushed the fix_cumsum_large_tensor branch from e8dee74 to 8b849f1 Compare February 26, 2020 22:49
@@ -98,11 +98,11 @@ void CumsumForwardImpl(const OpContext& ctx,
}

Stream<xpu> *s = ctx.get_stream<xpu>();
-  MSHADOW_TYPE_SWITCH_WITH_BOOL(in.type_flag_, IType, {
+  MSHADOW_TYPE_SWITCH_WITH_BOOL(in.type_flag_, index_t, {
Contributor

I think you will not need this switch macro, right?

Contributor Author

Great point, reverted to IType.

@@ -157,10 +157,10 @@ void CumsumBackwardImpl(const OpContext& ctx,
}
}
Stream<xpu> *s = ctx.get_stream<xpu>();
-  MSHADOW_TYPE_SWITCH_WITH_BOOL(igrad.type_flag_, IType, {
+  MSHADOW_TYPE_SWITCH_WITH_BOOL(igrad.type_flag_, index_t, {
Contributor

Can you revert this change as well ? rest LGTM !

Contributor Author

Great point, reverted to IType.

MSHADOW_TYPE_SWITCH(ograd.type_flag_, OType, {
Kernel<cumsum_backward, xpu>::Launch(
-    s, igrad.Size() / middle, igrad.dptr<IType>(),
+    s, igrad.Size() / middle, igrad.dptr<index_t>(),
Contributor

Ditto

Contributor Author

Reverted, thanks!

@@ -157,10 +157,10 @@ void CumsumBackwardImpl(const OpContext& ctx,
}
}
Stream<xpu> *s = ctx.get_stream<xpu>();
-  MSHADOW_TYPE_SWITCH_WITH_BOOL(igrad.type_flag_, IType, {
+  MSHADOW_TYPE_SWITCH_WITH_BOOL(igrad.type_flag_, index_t, {
Contributor

You don't need a macro if the type is predefined.

Contributor

@access2rohit access2rohit left a comment

LGTM!

@apeforest apeforest merged commit 2527553 into apache:master Feb 29, 2020
MoisesHer pushed a commit to MoisesHer/incubator-mxnet that referenced this pull request Apr 10, 2020
* Implemented fix and nightly test for cumsum

* Changed IType to index_t

* Also changed in backward

* Reverting to IType

* Added type assertion on first element to force evaluation of output NDArray

* Reverted to IType in relevant places

* Last reversion

* Changed type assertion to value check
anirudh2290 pushed a commit to anirudh2290/mxnet that referenced this pull request May 29, 2020
Labels
pr-awaiting-review PR is waiting for code review
7 participants