This repository has been archived by the owner on Aug 11, 2020. It is now read-only.

Allow large array operation in MXNet #348

Merged
6 commits merged into dmlc:master on Jul 31, 2018

Conversation

apeforest
Contributor

This change is a prerequisite for fixing the integer overflow issue in MXNet that occurs when a large NDArray is created and converted to NumPy.

An earlier PR was reviewed, but it made overly large changes to the mshadow module that turned out to be unnecessary. This new PR has been verified against the MXNet issue referenced above.
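For context, the kind of workload that hits the overflow looks roughly like the sketch below (a minimal illustration, not the exact repro from the MXNet issue; the array size and dtype are chosen here only to push the element count past the range of a 32-bit index, and running it needs several GB of memory):

import mxnet as mx

# Hypothetical illustration: an NDArray with more than 2**32 elements.
# Converting it to NumPy exercises the index arithmetic that overflows
# when indices are narrower than 64 bits.
large = 2 ** 32 + 1
x = mx.nd.ones(shape=(large,), dtype='int8')
y = x.asnumpy()
print(y.shape)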

@apeforest
Contributor Author

@eric-haibin-lin @tqchen @piiswrong Please review this PR for large array support in MXNet. Thx.

@piiswrong
Member

If we are going to change this, we should change it to a signed int64_t rather than an unsigned size_t.

There could be pitfalls, though. @tqchen, any thoughts?

@tqchen
Member

tqchen commented Jul 16, 2018

int64_t is a better choice than size_t

@apeforest
Contributor Author

@tqchen Could you elaborate a little more on the advantage of using int64_t over size_t here? Curious to learn. Thanks!

@tqchen
Member

tqchen commented Jul 16, 2018

The main reason is that int64_t is used by default by DLPack and a few other frameworks. It also allows introducing -1 as a special placeholder when necessary.

size_t is usually not preferred for data-structure members because of a cross-platform issue (size_t can be 32-bit on 32-bit platforms). I have explained the reasons to favor int64_t over uint64_t.
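To make the -1 placeholder point concrete, here is a small Python-level analogy (not mshadow code): MXNet, like NumPy, already treats -1 in a shape as "infer this dimension", a sentinel that an unsigned index type cannot represent.

import mxnet as mx

# -1 in a shape means "infer this dimension from the others",
# a sentinel value that only a signed index type can hold.
x = mx.nd.arange(12)
y = x.reshape((-1, 4))  # the -1 is inferred as 3
print(y.shape)          # (3, 4)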

@apeforest
Contributor Author

@tqchen Thanks for your explanation. I have changed the code to use int64_t as the type for tensor dimensions. For completeness, instead of only changing the return type of the size() function, I updated the type definition of index_t from unsigned to int64_t.

I have written a simple example to check the runtime impact on the softmax and relu operators in MXNet and did not see a noticeable difference. Any other suggestions on performance testing are appreciated.

import mxnet as mx
from mxnet.test_utils import check_speed

NUM_ITER = 1000

def check_softmax_cpu():
    # Average forward runtime (in ms) of softmax on CPU over NUM_ITER trials.
    data_shape = (1, 10000)
    data = mx.sym.Variable(name='data', shape=data_shape)
    net = mx.sym.softmax(data=data)
    ctx = mx.cpu(0)
    avg_runtime = 0.0
    for _ in range(NUM_ITER):
        x = mx.nd.random.normal(0, 1.0, shape=data_shape, ctx=ctx)
        avg_runtime += check_speed(sym=net, location={'data': x}, ctx=ctx,
                                   N=10, grad_req='null', typ='forward') * 1000
    avg_runtime /= NUM_ITER
    print('runtime of softmax: ' + str(avg_runtime))

def check_relu_cpu():
    # Average forward runtime (in ms) of relu on CPU over NUM_ITER trials.
    data_shape = (1, 10000)
    data = mx.sym.Variable(name='data', shape=data_shape)
    net = mx.sym.relu(data=data)
    ctx = mx.cpu(0)
    avg_runtime = 0.0
    for _ in range(NUM_ITER):
        x = mx.nd.random.normal(0, 1.0, shape=data_shape, ctx=ctx)
        avg_runtime += check_speed(sym=net, location={'data': x}, ctx=ctx,
                                   N=10, grad_req='null', typ='forward') * 1000
    avg_runtime /= NUM_ITER
    print('runtime of relu: ' + str(avg_runtime))

check_softmax_cpu()
check_relu_cpu()

@apeforest
Contributor Author

@tqchen I would appreciate it if you could help review this at your convenience, or relay it to other reviewers for me. We have another PR (apache/mxnet#11742) that depends on this. Thanks!

@apeforest
Contributor Author

@piiswrong @tqchen I would appreciate it if you could help review this at your earliest convenience. There is an MXNet issue depending on this. Thanks!

tqchen merged commit d68d369 into dmlc:master on Jul 31, 2018
azai91 added a commit to azai91/mshadow that referenced this pull request Sep 28, 2018
eric-haibin-lin added a commit that referenced this pull request Sep 28, 2018
Revert "Allow large array operation in MXNet (#348)"
szha added a commit that referenced this pull request Oct 4, 2018
Re-Revert PR #348 to support large tensor size in MXNet