This repository has been archived by the owner on Aug 11, 2020. It is now read-only.

Allow large array operation in MXNet #348

Merged
6 commits merged into dmlc:master on Jul 31, 2018

Conversation

apeforest
Contributor

This change is a prerequisite for fixing the integer overflow issue in MXNet that occurs when a large NDArray is created and converted to NumPy.

An earlier PR was reviewed, but it made overly large changes to the mshadow module that turned out to be unnecessary. This new PR has been verified against the MXNet issue referenced above.
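For context, the kind of workload that hits the overflow looks roughly like the sketch below (a minimal illustration, not the exact repro from the MXNet issue; the array size and dtype are chosen here only to push the element count past the range of a 32-bit index, and running it needs several GB of memory):

import mxnet as mx

# Hypothetical illustration: an NDArray with more than 2**32 elements.
# Converting it to NumPy exercises the index arithmetic that overflows
# when indices are narrower than 64 bits.
large = 2 ** 32 + 1
x = mx.nd.ones(shape=(large,), dtype='int8')
y = x.asnumpy()
print(y.shape)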

@apeforest
Contributor Author

@eric-haibin-lin @tqchen @piiswrong Please review this PR for large array support in MXNet. Thx.

@piiswrong
Member

If we are going to change this, we should change it to a signed int64_t rather than an unsigned size_t.

There could be pitfalls, though. @tqchen, any thoughts?

@tqchen
Member

tqchen commented Jul 16, 2018

int64_t is a better choice than size_t

@apeforest
Contributor Author

@tqchen Could you elaborate a little more on the advantage of using int64_t over size_t here? Curious to learn. Thanks!

@tqchen
Member

tqchen commented Jul 16, 2018

The main reason is that int64_t is used by default by DLPack and a few other frameworks. It also allows introducing -1 as a special placeholder when necessary.

size_t is usually not preferred for data-structure members because of a cross-platform issue (size_t can be 32-bit on 32-bit platforms). I have explained the reasons to favor int64_t over uint64_t.
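To make the -1 placeholder point concrete, here is a small Python-level analogy (not mshadow code): MXNet, like NumPy, already treats -1 in a shape as "infer this dimension", a sentinel that an unsigned index type cannot represent.

import mxnet as mx

# -1 in a shape means "infer this dimension from the others",
# a sentinel value that only a signed index type can hold.
x = mx.nd.arange(12)
y = x.reshape((-1, 4))  # the -1 is inferred as 3
print(y.shape)          # (3, 4)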

@apeforest
Contributor Author

@tqchen Thanks for your explanation. I have changed the code to use int64_t as the type for tensor dimensions. For completeness, instead of only changing the return type of the size() function, I updated the type definition of index_t from unsigned to int64_t.

I have written a simple example to check the runtime impact on the softmax and relu operators in MXNet and did not see a noticeable difference. Any other suggestions on performance testing are appreciated.

import mxnet as mx
from mxnet.test_utils import check_speed

NUM_ITER = 1000

def check_softmax_cpu():
    # Average forward runtime (in ms) of softmax on CPU over NUM_ITER trials.
    data_shape = (1, 10000)
    data = mx.sym.Variable(name='data', shape=data_shape)
    net = mx.sym.softmax(data=data)
    ctx = mx.cpu(0)
    avg_runtime = 0.0
    for _ in range(NUM_ITER):
        x = mx.nd.random.normal(0, 1.0, shape=data_shape, ctx=ctx)
        avg_runtime += check_speed(sym=net, location={'data': x}, ctx=ctx,
                                   N=10, grad_req='null', typ='forward') * 1000
    avg_runtime /= NUM_ITER
    print('runtime of softmax: ' + str(avg_runtime))

def check_relu_cpu():
    # Average forward runtime (in ms) of relu on CPU over NUM_ITER trials.
    data_shape = (1, 10000)
    data = mx.sym.Variable(name='data', shape=data_shape)
    net = mx.sym.relu(data=data)
    ctx = mx.cpu(0)
    avg_runtime = 0.0
    for _ in range(NUM_ITER):
        x = mx.nd.random.normal(0, 1.0, shape=data_shape, ctx=ctx)
        avg_runtime += check_speed(sym=net, location={'data': x}, ctx=ctx,
                                   N=10, grad_req='null', typ='forward') * 1000
    avg_runtime /= NUM_ITER
    print('runtime of relu: ' + str(avg_runtime))

check_softmax_cpu()
check_relu_cpu()

@apeforest
Contributor Author

@tqchen I would appreciate it if you could help review this at your convenience, or relay it to other reviewers for me. We have another PR (apache/mxnet#11742) that depends on this. Thanks!

@apeforest
Contributor Author

@piiswrong @tqchen I would appreciate it if you could help review this at your earliest convenience. There is an MXNet issue depending on this. Thanks!

tqchen merged commit d68d369 into dmlc:master on Jul 31, 2018
azai91 added a commit to azai91/mshadow that referenced this pull request Sep 28, 2018
eric-haibin-lin added a commit that referenced this pull request Sep 28, 2018
Revert "Allow large array operation in MXNet (#348)"
szha added a commit that referenced this pull request Oct 4, 2018
Re-Revert PR #348 to support large tensor size in MXNet