
Softmax with length #15169

Merged
merged 4 commits into apache:master from the softmax_with_length branch on Jul 19, 2019

Conversation

@haojin2 (Contributor) commented Jun 6, 2019

Description

Softmax with an extra `length` input that specifies the valid length of the input along a given axis.
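For illustration, here is a minimal NumPy sketch of the intended semantics (an assumption based on this description and the benchmark below, not the operator's exact code): softmax is computed over only the first `length[i]` entries along the axis, and positions past the length are assumed to produce 0.

import numpy as np

def softmax_with_length_ref(data, length):
    """Reference sketch for 2-D data: softmax over the first length[i]
    entries of row i (last axis); remaining positions stay 0."""
    out = np.zeros_like(data)
    for i, l in enumerate(length):
        x = data[i, :l]
        e = np.exp(x - x.max())  # subtract max for numerical stability
        out[i, :l] = e / e.sum()
    return out

# Row 0 uses only its first 2 entries; row 1 uses all 3.
print(softmax_with_length_ref(
    np.array([[1., 2., 3.], [4., 5., 6.]]),
    np.array([2, 3], dtype=np.int32)))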

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Softmax with length
  • Unit test

Comments

Flakiness Check:

MXNET_TEST_COUNT=10000 nosetests tests/python/unittest/test_operator.py:test_softmax_with_length
[INFO] Setting module np/mx/python random seeds, use MXNET_MODULE_SEED=507099719 to reproduce.
[23:18:55] src/operator/contrib/../tensor/./../../common/utils.h:450: MXNET_SAFE_ACCUMULATION=1 is recommended for softmax with float16 inputs. See https://mxnet.incubator.apache.org/versions/master/faq/env_var.html for more details.
.
----------------------------------------------------------------------
Ran 1 test in 165.490s

OK
MXNET_TEST_COUNT=10000 nosetests tests/python/gpu/test_operator_gpu.py:test_softmax_with_length
[INFO] Setting module np/mx/python random seeds, use MXNET_MODULE_SEED=634837637 to reproduce.
[23:24:32] src/operator/contrib/../tensor/./../../common/utils.h:450: MXNET_SAFE_ACCUMULATION=1 is recommended for softmax with float16 inputs. See https://mxnet.incubator.apache.org/versions/master/faq/env_var.html for more details.
[23:24:32] src/operator/contrib/../tensor/./../../common/utils.h:450: MXNET_SAFE_ACCUMULATION=1 is recommended for softmax with float16 inputs. See https://mxnet.incubator.apache.org/versions/master/faq/env_var.html for more details.
.
----------------------------------------------------------------------
Ran 1 test in 137.404s

OK

Benchmark results:
CPU: ~1.82x speedup
GPU: ~1.59x speedup

Benchmark script:

import mxnet as mx
import numpy as np
from mxnet.test_utils import check_speed, rand_ndarray

# Pick one context to benchmark: GPU or CPU.
# ctx = mx.gpu(0)
ctx = mx.cpu()

shape = (96, 1024, 1024)
len_shape = (96, 1024)

data = rand_ndarray(shape, ctx=ctx)

# Every row of the length input covers lengths 1..1024 along the softmax axis.
np_length = np.zeros(len_shape, dtype=np.int32)
all_length = np.arange(1, 1025, 1, dtype=np.int32)
for i in range(len_shape[0]):
    np_length[i, :] = all_length

length = mx.nd.array(np_length, ctx=ctx, dtype=np.int32)

mx_data = mx.sym.Variable("data")
mx_length = mx.sym.Variable("length")

# softmax with the extra length input
mx_sym = mx.sym.softmax(data=mx_data, length=mx_length, use_length=True, axis=1)
print(check_speed(mx_sym, location={"data": data, "length": length}, ctx=ctx, N=100, typ='whole'))

# plain softmax for comparison
mx_sym = mx.sym.softmax(data=mx_data, axis=1)
print(check_speed(mx_sym, location={"data": data}, ctx=ctx, N=100, typ='whole'))

@haojin2 self-assigned this Jun 6, 2019
@piyushghai (Contributor):

@haojin2 Can you look into the CI failures on this?

@mxnet-label-bot Add [Operator, pr-awaiting-review]

@marcoabreu added the Operator and pr-awaiting-review (PR is waiting for code review) labels Jun 7, 2019
@haojin2 force-pushed the softmax_with_length branch 3 times, most recently from 1e400f0 to 1ad0561 on June 11, 2019 05:41
@haojin2 force-pushed the softmax_with_length branch 2 times, most recently from 06f25aa to 1abd460 on June 21, 2019 03:00
@haojin2 (Contributor, Author) commented Jul 12, 2019:

@szha @eric-haibin-lin Finally the CI passed... Please give a review when you have time.

The review thread below is anchored on this diff hunk in the softmax gradient code (truncated context as rendered by GitHub):

    }
    } else {
      if (shape.ndim() == 2) {
        SoftmaxGrad<OP1, OP2, Req, negate, DType>(
    MXNET_INT_TYPE_SWITCH(inputs[2].type_flag_, IType, {
Contributor:

Do we really need to iterate over different int types for length? Can we just cast the type to int64_t?

Contributor (Author):

It may not necessarily be int64_t, and what do you mean by "cast the type to int64_t"? Allocating a new buffer on the fly within the operator to hold the cast length input? I would consider that a significant performance bottleneck.

Contributor:

It does not necessarily have to be int64_t. I think you can use the same data type as M, which I believe is index_t.

Contributor:

I understand that length is a tensor here. My question is: do we really need to care about the dtype of the values in the length tensor and use an MXNET_INT_TYPE_SWITCH macro here? Can we simply cast them to index_t regardless of the length tensor's dtype?

Contributor (Author):

  1. There's no iteration at all; a TYPE_SWITCH is always a switch, not a loop. I would suggest that you read the code more carefully.
  2. Whether or not you cast to some other type within the kernel, you will not get rid of the TYPE_SWITCH. One way of doing a constant cast to index_t is within the computation kernel, like:

template<typename IType>
MSHADOW_XINLINE static void Map(int i, IType *buffer) {
  // cast one element of the length buffer to index_t
  index_t val = static_cast<index_t>(buffer[i]);
}

This would still require a TYPE_SWITCH when launching the kernel, since the kernel is still templated and the input type is not limited.
Or, if you really want to avoid the TYPE_SWITCH for the compute kernel, you have to cast your input buffer first:

Tensor<xpu, 1, index_t> index_t_buffer = ctx.requested[0].get_space_typed<xpu, 1, index_t>(<some shape>);
// Cast your buffer to index_t_buffer
// Launch the compute kernel with index_t_buffer; now you only need one TYPE_SWITCH, for your input data type.

I would really hate this approach because it brings no benefit at all.
Or you could support only length buffers of type int64; then you would still need MSHADOW_IDX_TYPE_SWITCH to produce the error message when length has another data type. Alternatively, you could insert an additional check on the data type before the kernel launches, but that would drop support for the other integer types.
  3. The fact that length is a tensor means that length=M is not an option; maybe you mean length=nullptr? That would add extra if branches within the kernel and would especially hurt performance on GPU. As implemented, that if branch is on the host rather than inside the kernel, which has minimal performance impact.
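To make the host-side branching point concrete, here is a minimal Python sketch (hypothetical function names, not the PR's code): the length check is resolved once on the host, so neither kernel carries a per-element if in its hot loop.

import numpy as np

def softmax_kernel(data):
    # plain softmax along the last axis
    e = np.exp(data - data.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def softmax_with_length_kernel(data, length):
    # masked softmax: only the first length[i] entries of row i participate
    out = np.zeros_like(data)
    for i, l in enumerate(length):
        out[i, :l] = softmax_kernel(data[i, :l])
    return out

def softmax_dispatch(data, length=None):
    # branch once on the host, not per element inside the kernel
    if length is None:
        return softmax_kernel(data)
    return softmax_with_length_kernel(data, length)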

Contributor (Author):

I think your communication was not clear enough in the first place. I've now added a cast to index_t in some existing code too; are you comfortable with the changes now?

Contributor:

Thanks for making the changes, and sorry if my earlier comments caused confusion.

Looking at the current code, I see that SoftmaxWithLength() and Softmax() are very similar now (they differ by one if block: https://github.com/apache/incubator-mxnet/pull/15169/files#diff-be02d7c5660bf5cd623601a501fc7abeR134). Do you think we can combine these two functions into one now? :)

Contributor (Author) — @haojin2, Jul 15, 2019:

No, because you are only looking at the CPU version of the code.


Member:

I have a separate PR #15545 to optimize the softmax GPU implementation; I can look into merging those two kernels in that PR.

@apeforest (Contributor) left a comment:

Can we consolidate the SoftmaxWithLength function with Softmax using a default length? Having two copies of the function to serve one extra argument seems like an overkill design.

@haojin2 (Contributor, Author) commented Jul 12, 2019:

@apeforest Believe it or not, I tried what you're suggesting on my end well before this PR was raised, but found it could not easily be done without significantly affecting the existing softmax's performance. I think I'll stick with this version for now.

@apeforest (Contributor):

> @apeforest Believe it or not, I tried what you're suggesting on my end well before this PR was raised, but found it could not easily be done without significantly affecting the existing softmax's performance. I think I'll stick with this version for now.

If you have tried it earlier, can you post some performance degradation results from the other approach? It may help us decide whether to make a trade-off between performance and code simplicity here.

@haojin2 force-pushed the softmax_with_length branch 4 times, most recently from c5e5363 to 50c1c7c on July 15, 2019 22:20
@apeforest (Contributor) commented Jul 17, 2019:

This PR is more like patching an existing function to implement a feature that is not generally needed. From a software engineering perspective, this PR introduces unnecessary redundancy in the code, and the performance impact of the alternative approach has not been well measured.

As we discussed offline, if this feature is urgently needed and the code has to be patched in such a way to maintain backward compatibility, it might be okay to ship as is. However, we should add a TODO for MXNet 2.0 to rewrite softmax from scratch, taking into consideration all the required scenarios and how to make it extensible.

@szha for review and final approve.

@haojin2 added the pr-awaiting-merge (Review and CI is complete. Ready to merge) label and removed the pr-awaiting-review (PR is waiting for code review) label Jul 19, 2019
@szha (Member) left a comment:

Approved. Softmax with variable length is very common in a number of applications and should be worth the performance-versus-complexity tradeoff.

@apeforest (Contributor) left a comment:

Approving this PR to unblock the needed feature.
#15545 introduces a new softmax kernel, and its changes may need to be copied for this function as well.

@haojin2 (Contributor, Author) commented Jul 19, 2019:

Thanks all for the approval, merging this now. @ptrendx @szha @apeforest

@haojin2 merged commit 076b2f3 into apache:master Jul 19, 2019
@haojin2 deleted the softmax_with_length branch July 19, 2019 17:43
@marcoabreu (Contributor):

@apeforest did you add the TODO for 2.0 somewhere? #15169 (comment)

@TaoLv (Member) commented Jul 20, 2019:

Sorry for missing the review. Do we need to change MKL-DNN softmax to accommodate the change?

anirudhacharya pushed a commit to anirudhacharya/mxnet that referenced this pull request Aug 20, 2019
* softmax with length forward

* softmax with length backward

* new macro to reduce compile-time heap usage and limit length to integers only

* address comments
A review comment on the operator documentation (diff hunk @@ -92,6 +101,13 @@ Example::):

Member:

No documentation for example inputs with length?
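A doc example along these lines could address this. This is a hedged sketch: the exact printed formatting may differ, and the zero outputs past each length follow the assumption noted in the Description above.

import mxnet as mx

x = mx.nd.array([[1., 2., 3.],
                 [4., 5., 6.]])
length = mx.nd.array([2, 3], dtype='int32')

# softmax over the first length[i] entries of each row
y = mx.nd.softmax(data=x, length=length, use_length=True, axis=-1)
print(y)
# expected values, assuming positions past the length produce 0:
# [[0.2689  0.7311  0.    ]
#  [0.0900  0.2447  0.6652]]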
