Conversation
@haojin2 Can you look into the CI failures on this?
@mxnet-label-bot Add [Operator, pr-awaiting-review]
Force-pushed from 1e400f0 to 1ad0561
Force-pushed from 06f25aa to 1abd460
Force-pushed from 1abd460 to 1edf38f
Force-pushed from 1edf38f to 95394a7
@szha @eric-haibin-lin Finally the CI passed... Please give a review when you have time.
}
} else {
if (shape.ndim() == 2) {
SoftmaxGrad<OP1, OP2, Req, negate, DType>(
MXNET_INT_TYPE_SWITCH(inputs[2].type_flag_, IType, {
Do we really need to iterate over different int types for length? Can we just cast the type to int64_t?
It may not necessarily be int64_t, and what do you mean by "cast the type to int64_t"? Allocating a new buffer on the fly within the operator to hold the casted length input? I would consider that a big performance bottleneck.
It does not necessarily have to be int64_t. I think you can use the same data type as M, which is index_t.
I understand length is a tensor here. I guess my question is: do we really need to care about the dtype of the values in the length tensor and use an MXNET_INT_TYPE_SWITCH macro here? Can we simply cast them to index_t regardless of the dtype of the length tensor?
1. There's no iteration at all; TYPE_SWITCH is always a switch, not a loop. I would suggest that you read the code more carefully.
2. Whether or not you cast to some other type within the kernel, you will not get rid of the TYPE_SWITCH. One way of doing a constant cast to index_t is within the computation kernel, like:
template<typename IType>
MSHADOW_XINLINE static void Map(int i, IType *buffer) {
index_t val = static_cast<index_t>(buffer[i]);
}
And this would still require a TYPE_SWITCH when you're launching the kernel, since this kernel is still templated and the input type is not limited.
Or, if you really hate the TYPE_SWITCH for the compute kernel, you would have to cast your input buffer first:
Tensor<xpu, 1, index_t> index_t_buffer = ctx.requested[0].get_space_typed<xpu, 1, index_t>(<some shape>);
// Cast your buffer to index_t_buffer
// Launch the compute kernel with index_t_buffer, now you only need one TYPE_SWITCH for your input data type.
I would really dislike this approach because it brings no benefit at all. Alternatively, you could support only a length buffer of type int64, but then you would still need MSHADOW_IDX_TYPE_SWITCH to provide the error message when length has some other data type; or you could insert an additional check on the data type before the kernel launches, but that would drop support for other integer types.
3. The fact that length is a tensor means that length=M is not an option; maybe you mean length=nullptr? That would introduce additional if branches within the kernel and would especially hurt performance on GPU. The way this feature is implemented now puts that if branch on the host rather than within the kernel, which has minimal performance impact.
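To make the dispatch point concrete, here is a minimal standalone sketch (plain C++ with made-up names such as DType and SumLengths; this is not MXNet's actual MXNET_INT_TYPE_SWITCH definition) of what a TYPE_SWITCH-style macro boils down to: a single host-side switch that selects one template instantiation, while any cast to index_t happens inside the instantiated kernel.

#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

using index_t = int64_t;               // stand-in for mshadow's index_t

enum class DType { kInt32, kInt64 };   // hypothetical runtime dtype tags

// Templated "kernel": the element type must be fixed at compile time,
// even though every value is immediately cast to index_t.
template <typename IType>
void SumLengths(const void* data, std::size_t n, index_t* out) {
  const IType* buf = static_cast<const IType*>(data);
  index_t total = 0;
  for (std::size_t i = 0; i < n; ++i)
    total += static_cast<index_t>(buf[i]);   // the cast lives inside the kernel
  *out = total;
}

// Host-side dispatch: a single switch (not a loop) picks the instantiation.
// This is roughly the shape a TYPE_SWITCH-style macro expands to.
void SumLengthsDispatch(DType dtype, const void* data, std::size_t n, index_t* out) {
  switch (dtype) {
    case DType::kInt32: SumLengths<int32_t>(data, n, out); break;
    case DType::kInt64: SumLengths<int64_t>(data, n, out); break;
  }
}

int main() {
  std::vector<int32_t> lengths = {3, 5, 2};
  index_t total = 0;
  SumLengthsDispatch(DType::kInt32, lengths.data(), lengths.size(), &total);
  std::printf("total = %lld\n", static_cast<long long>(total));
  return 0;
}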
I think your communication was not clear enough in the first place. I've now added a cast to index_t in some existing code too; are you comfortable with the changes now?
Thanks for making the changes. Sorry if my earlier comments caused you confusion.
Looking at the current code, I see that SoftmaxWithLength() and Softmax() are very similar now (they differ by one if block: https://github.com/apache/incubator-mxnet/pull/15169/files#diff-be02d7c5660bf5cd623601a501fc7abeR134). Do you think we can combine these two functions into one now? :)
No, because you are only looking at the CPU version of the code.
Are you referring to these two kernel functions?
https://github.com/apache/incubator-mxnet/blob/50c1c7ca7f2e6a7864e7a0aedc775ae7dd8be091/src/operator/nn/softmax-inl.h#L283
https://github.com/apache/incubator-mxnet/blob/50c1c7ca7f2e6a7864e7a0aedc775ae7dd8be091/src/operator/nn/softmax-inl.h#L339
They also differ by only a few lines. Can you think of a better way to consolidate them?
I have a separate PR #15545 to optimize the softmax GPU implementation; I can look into merging those two kernels in that PR.
Can we consolidate the SoftmaxWithLength function with Softmax by using a default length? Having two copies of the function to serve one extra argument seems like an overkill design.
@apeforest Believe it or not, I tried what you're suggesting on my end well before this PR was raised, but found it could not easily be done without significantly hurting the existing softmax's performance. I think I'll stick to this version for now.
If you have tried this earlier, can you post the performance degradation results from the other approach? It may help us decide whether we want to trade off some performance for code simplicity here.
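For reference, here is a minimal standalone sketch of the kind of "default length" consolidation being discussed; the names (SoftmaxRows, lengths) are illustrative, and this is not the code in this PR nor the variant that was benchmarked. A null lengths pointer gives ordinary row-wise softmax, while a non-null pointer restricts each row to its valid prefix and zeroes the rest. The concern raised in the thread is whether the extra per-row branching slows down the plain-softmax path.

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// One routine instead of separate Softmax() / SoftmaxWithLength():
// `lengths` is optional; nullptr means "use the full row".
void SoftmaxRows(const float* in, float* out,
                 std::size_t rows, std::size_t cols,
                 const int64_t* lengths = nullptr) {
  for (std::size_t r = 0; r < rows; ++r) {
    const std::size_t len = lengths ? static_cast<std::size_t>(lengths[r]) : cols;
    const float* x = in + r * cols;
    float* y = out + r * cols;

    float row_max = -INFINITY;
    for (std::size_t c = 0; c < len; ++c) row_max = std::max(row_max, x[c]);

    float sum = 0.f;
    for (std::size_t c = 0; c < len; ++c) {
      y[c] = std::exp(x[c] - row_max);
      sum += y[c];
    }
    for (std::size_t c = 0; c < len; ++c) y[c] /= sum;
    for (std::size_t c = len; c < cols; ++c) y[c] = 0.f;   // masked-out tail
  }
}

int main() {
  std::vector<float> in  = {1.f, 2.f, 3.f, 4.f,
                            1.f, 1.f, 1.f, 1.f};
  std::vector<float> out(in.size());
  std::vector<int64_t> lengths = {3, 4};   // row 0 only uses its first 3 entries
  SoftmaxRows(in.data(), out.data(), 2, 4, lengths.data());
  for (float v : out) std::printf("%.4f ", v);
  std::printf("\n");
  return 0;
}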
Force-pushed from c5e5363 to 50c1c7c
Force-pushed from 50c1c7c to b940812
Force-pushed from b940812 to 50d8ee7
This PR is more like patching an existing function to implement a feature that is not generally needed. From a software engineering standpoint, this PR introduces unnecessary redundancy in the code, and the performance impact of the alternative approach has not been well measured. As we discussed offline, if this feature is urgently needed and the code has to be patched this way to maintain backward compatibility, it may be okay to ship as is. However, we should add a TODO for MXNet 2.0 to rewrite softmax from scratch, taking into consideration all the required scenarios and how to make it extensible. @szha for review and final approval.
Force-pushed from 50d8ee7 to 8d1fc65
Approved. Softmax with variable length is very common in a number of applications and should be worth trading some extra complexity for the performance.
Approving this PR to unblock the needed feature.
#15545 introduces a new softmax kernel and may need to be copied for this function as well.
Thanks all for the approval, merging this now. @ptrendx @szha @apeforest
@apeforest did you add the TODO for 2.0 somewhere? #15169 (comment)
Sorry for missing the review. Do we need to change MKL-DNN softmax to accommodate the change?
* softmax with length forward
* softmax with length backward
* new macro to reduce compile-time heap usage and limit length to integers only
* address comments
@@ -92,6 +101,13 @@ Example::
No documentation for example inputs with length?
Description
Softmax with an extra length input to specify the valid length of each line of the input along an axis.
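As a sketch of the intended semantics (the exact masking convention is defined by the operator code, so treat this as an approximation): for a line x of size M along the softmax axis with valid length L <= M, and m = \max_{0 \le k < L} x_k, the output is

y_j =
\begin{cases}
  \dfrac{\exp(x_j - m)}{\sum_{k=0}^{L-1} \exp(x_k - m)} & 0 \le j < L \\[1ex]
  0 & L \le j < M
\end{cases}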
Checklist
Essentials
Changes
Comments
Flakiness Check:
Benchmark results:
CPU: ~1.82x speedup
GPU: ~1.59x speedup
Benchmark script: