
contrib ctc interface changes, cudnn7 CTC, and gluon CTC #7442

Merged
merged 3 commits into apache:master from gluon_ctc on Aug 24, 2017

Conversation

szha (Member) commented Aug 13, 2017

This change makes the current contrib CTC compatible with the cudnn7 CTC interface, and adds a CTC loss layer for Gluon.
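As a rough usage sketch (not taken from this PR; parameter names such as `padding_mask` follow the unit test further down and may have changed in later revisions), the new Gluon layer would be driven roughly like this:

```python
import mxnet as mx
from mxnet import gluon

# Hypothetical shapes: batch of 2, 20 time steps, alphabet of 4; 0 marks label padding.
pred = mx.nd.ones((2, 20, 4))                  # pre-softmax activations, 'NTC' layout
label = mx.nd.array([[1, 2, 3, 0, 0],
                     [3, 2, 0, 0, 0]])         # labels padded with 0

ctc_loss = gluon.loss.CTCLoss(padding_mask=0)  # padding_mask as in the test below
loss = ctc_loss(pred, label)                   # one loss value per sequence
print(loss.shape)                              # (2,)
```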

@szha szha force-pushed the gluon_ctc branch 18 times, most recently from 19298a9 to 245a789 on August 14, 2017 19:10
@szha szha changed the title from "contrib ctc interface changes for compatibility, and gluon CTC" to "contrib ctc interface changes, cudnn7 CTC, and gluon CTC" on Aug 14, 2017
szha (Member Author) commented Aug 14, 2017

@sbodenstein the second commit in this PR is to address #7445.

}

template <typename DType, typename xpu>
inline bool PackLabelByLength(mshadow::Tensor<xpu, 2, DType> labels,
Contributor

add comment
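For readers skimming the diff, here is a rough Python illustration of what this label-packing step does conceptually; the helper name and behaviour are assumptions based on the `PackLabelByLength` signature above, not the actual C++ code:

```python
import numpy as np

def pack_label_by_length(labels, lengths):
    """Flatten a padded (batch, max_num_labels) matrix into one packed label
    vector, keeping only the first lengths[b] entries of each row; this is
    roughly the flat layout the warp-ctc/cudnn CTC kernels consume."""
    packed, out_lengths = [], []
    for row, n in zip(labels, lengths):
        packed.extend(int(v) for v in row[:n])
        out_lengths.append(int(n))
    return packed, out_lengths

labels = np.array([[1, 2, 3, 0, 0],
                   [4, 5, 0, 0, 0]])
print(pack_label_by_length(labels, [3, 2]))   # ([1, 2, 3, 4, 5], [3, 2])
```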

@szha szha force-pushed the gluon_ctc branch 4 times, most recently from 6820fd7 to 79a5b1f on August 16, 2017 19:53
@@ -240,15 +461,22 @@ class CTCLossProp : public OperatorProperty {
int NumOutputs() const override { return 2; }

std::vector<std::string> ListArguments() const override {
return {"data", "label"};
if (param_.use_input_lengths && param_.use_label_lengths) {
return {"data", "label", "input_lengths", "label_lengths"};
Contributor

data_lengths

@@ -165,6 +165,36 @@ def test_l1_loss():
assert mod.score(data_iter, eval_metric=mx.metric.Loss())[0][1] < 0.1


def test_ctc_loss():
loss = gluon.loss.CTCLoss(padding_mask=0)
Contributor

test gpu

Member Author

I included test_loss in test_operator_gpu and everything passes.
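A minimal sketch of the kind of CPU/GPU check being discussed (the shapes, `padding_mask`, and the error-based GPU skip are illustrative assumptions, not the actual test):

```python
import mxnet as mx
from mxnet import gluon

def check_ctc_loss(ctx):
    ctc_loss = gluon.loss.CTCLoss(padding_mask=0)
    pred = mx.nd.ones((2, 20, 4), ctx=ctx)
    label = mx.nd.array([[1, 2, 3, 0, 0],
                         [3, 2, 0, 0, 0]], ctx=ctx)
    out = ctc_loss(pred, label)
    out.wait_to_read()        # force the computation to run on this context
    return out

check_ctc_loss(mx.cpu())
try:
    check_ctc_loss(mx.gpu(0))
except mx.base.MXNetError:
    pass                      # no GPU available in this environment
```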

// Since the input is the activation before softmax and cudnn CTC expects
// softmax probabilities, apply softmax to the inputs first.
Tensor<xpu, 3, real_t> prob(data.shape_);
mshadow::AllocSpace(&prob);
Contributor

Don't allocate this way. Use ctx.requested.

PackLabelByLength(labels, in_data[kLabelLength].get<xpu, 1, real_t>(s),
&packed_labels, &label_lengths);
} else {
#if defined(__CUDACC__) && MXNET_USE_CUDNN == 1 && CUDNN_MAJOR >= 7
Contributor

remove these ifs

CUDNN_CALL(cudnnDestroyCTCLossDescriptor(ctc_desc_));
CUDNN_CALL(cudnnDestroyTensorDescriptor(prob_desc_));
CUDNN_CALL(cudnnDestroyTensorDescriptor(grad_desc_));
#endif
Contributor

The #if/#endif shouldn't cross function boundaries.

Layout of the output sequence activation vector.
label_layout : str, default 'NT'
Layout of the labels.
use_input_lengths : bool, default False
Contributor

No need for the flag; just check whether the argument is None.
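A small sketch of that suggestion: derive the behaviour from whether the lengths arguments were passed, instead of carrying separate boolean flags (hypothetical signature and names):

```python
def ctc_forward(data, label, data_lengths=None, label_lengths=None):
    """Hypothetical forward signature: the variable-length code paths are
    selected by checking the optional arguments for None, so no separate
    use_input_lengths / use_label_lengths flags are needed."""
    use_data_lengths = data_lengths is not None
    use_label_lengths = label_lengths is not None
    return use_data_lengths, use_label_lengths
```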


Parameters
----------
output_layout : str, default 'NTC'
piiswrong (Contributor), Aug 18, 2017

output_layout -> layout

lengths of labels. Only required when `use_label_lengths` is false.
weight : float or None
Global scalar weight for loss.
input_lengths : NDArray or None,
Contributor

These are arguments to forward. List them separately in an input/output section.
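For illustration, a hedged sketch of passing the optional per-sample lengths at call time (the argument names and their order follow this docstring and are assumptions; they may have been renamed in later revisions):

```python
import mxnet as mx
from mxnet import gluon

ctc_loss = gluon.loss.CTCLoss(padding_mask=0)

pred = mx.nd.ones((2, 20, 4))               # (batch, max time steps, alphabet)
label = mx.nd.array([[1, 2, 3], [3, 2, 2]])
input_lengths = mx.nd.array([20, 16])       # valid time steps per sequence
label_lengths = mx.nd.array([3, 2])         # valid labels per sequence

loss = ctc_loss(pred, label, input_lengths, label_lengths)
```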

int batch = labels.size(0);
int max_num_labels = labels.size(1);
std::vector<int> cpu_labels(max_num_labels);
IndexTensorToVector(in_label_lengths, label_lengths);
Contributor

This function does a cudaMemcpy.
Can you do it only once?

std::vector<int> *packed_labels,
std::vector<int> *label_lengths) {
int batch = labels.size(0);
int max_num_labels = labels.size(1);
-  std::vector<index_t> cpu_labels(max_num_labels);
+  std::vector<int> cpu_labels(max_num_labels);
bool exceed_limit = false;

for (int b = 0; b < batch; ++b) {
IndexTensorToVector(labels[b], &cpu_labels);
Contributor

Try to do the copy only once.
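The point in both comments above, sketched in Python (a hypothetical helper; the real code is C++): do one device-to-host transfer for the whole label tensor and one for the lengths, then pack on the CPU, rather than issuing a copy per row:

```python
import numpy as np
import mxnet as mx

def pack_labels_with_one_copy(labels_nd, lengths_nd):
    """Copy the full label matrix and the length vector to host memory once
    each (two transfers total), then do the per-row packing in numpy."""
    labels = labels_nd.asnumpy().astype(np.int32)    # single device-to-host copy
    lengths = lengths_nd.asnumpy().astype(np.int32)  # single device-to-host copy
    packed = [int(v) for row, n in zip(labels, lengths) for v in row[:n]]
    return packed, lengths.tolist()
```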

piiswrong (Contributor)

Do you have a performance comparison between baidu and cudnn?

@szha szha force-pushed the gluon_ctc branch 2 times, most recently from 3cfbfeb to 32884ea on August 18, 2017 08:46
szha (Member Author) commented Aug 18, 2017

I compared the two implementations roughly on a K80 by timing only the parts where they diverge. They perform about the same overall, and the cudnn one seems slightly more efficient when the input size is large. The numbers are listed below; each workload is described by the input shape before preprocessing, the label shape, the input length, and the label length.

| input shape (before preprocessing) | label shape | input length | label length | baidu forward (ns) | cudnn forward (ns) |
| --- | --- | --- | --- | --- | --- |
| (32, 256, 28) | (32, 1) | 256 | 1 | 614598 | 545639 |
| (32, 512, 28) | (32, 1) | 512 | 1 | 1030004 | 1063012 |
| (32, 1024, 28) | (32, 1) | 1024 | 1 | 2115799 | 2207311 |
| (32, 2048, 28) | (32, 1) | 2048 | 1 | 4085191 | 4085597 |
| (32, 256, 28) | (32, 2) | 256 | 2 | 680476 | 685418 |
| (32, 512, 28) | (32, 2) | 512 | 2 | 1387201 | 1338115 |
| (32, 1024, 28) | (32, 2) | 1024 | 2 | 11913695 | 4217115 |
| (32, 2048, 28) | (32, 2) | 2048 | 2 | 7950573 | 5184110 |
| (32, 256, 28) | (32, 4) | 256 | 4 | 950801 | 952418 |
| (32, 512, 28) | (32, 4) | 512 | 4 | 1940807 | 1873934 |
| (32, 1024, 28) | (32, 4) | 1024 | 4 | 3801619 | 3755733 |
| (32, 2048, 28) | (32, 4) | 2048 | 4 | 7530379 | 7496358 |
| (32, 256, 28) | (32, 8) | 256 | 8 | 1474249 | 1472007 |
| (32, 512, 28) | (32, 8) | 512 | 8 | 2975453 | 2951099 |
| (32, 1024, 28) | (32, 8) | 1024 | 8 | 7873482 | 7956207 |
| (32, 2048, 28) | (32, 8) | 2048 | 8 | 14353415 | 11856699 |
| (32, 256, 28) | (32, 16) | 256 | 16 | 2529732 | 2527600 |
| (32, 512, 28) | (32, 16) | 512 | 16 | 5129444 | 5202983 |
| (32, 1024, 28) | (32, 16) | 1024 | 16 | 20691226 | 10203190 |
| (32, 2048, 28) | (32, 16) | 2048 | 16 | 20736001 | 20764510 |
| (32, 256, 28) | (32, 32) | 256 | 32 | 4349291 | 4372212 |
| (32, 512, 28) | (32, 32) | 512 | 32 | 9181241 | 9177710 |
| (32, 1024, 28) | (32, 32) | 1024 | 32 | 26316193 | 19705820 |
| (32, 2048, 28) | (32, 32) | 2048 | 32 | 37893973 | 37927728 |
| (32, 256, 28) | (32, 64) | 256 | 64 | 7350089 | 7341236 |
| (32, 512, 28) | (32, 64) | 512 | 64 | 10367995 | 10394812 |
| (32, 1024, 28) | (32, 64) | 1024 | 64 | 26141244 | 26080623 |
| (32, 2048, 28) | (32, 64) | 2048 | 64 | 46190315 | 43888343 |
| (32, 256, 28) | (32, 128) | 256 | 128 | 7244567 | 7197396 |
| (32, 512, 28) | (32, 128) | 512 | 128 | 17969224 | 17975323 |
| (32, 1024, 28) | (32, 128) | 1024 | 128 | 37417761 | 37404967 |
| (32, 2048, 28) | (32, 128) | 2048 | 128 | 134485276 | 134316760 |
| (32, 256, 28) | (32, 256) | 256 | 256 | 7399992 | 7459652 |
| (32, 512, 28) | (32, 256) | 512 | 256 | 20783582 | 20854272 |
| (32, 1024, 28) | (32, 256) | 1024 | 256 | 67250512 | 59459103 |
| (32, 2048, 28) | (32, 256) | 2048 | 256 | 147523944 | 145160144 |
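A rough sketch of how such a per-workload forward-timing run might be scripted (the choice between the baidu and cudnn kernels is assumed to come from the build and device, not from this script; shapes mirror the workload tuples above):

```python
import time
import mxnet as mx
from mxnet import gluon

def time_ctc_forward(batch, seq_len, alphabet, label_len, repeats=50, ctx=mx.gpu(0)):
    """Average forward time for one (input shape, label shape) workload."""
    ctc_loss = gluon.loss.CTCLoss(padding_mask=0)
    pred = mx.nd.ones((batch, seq_len, alphabet), ctx=ctx)
    label = mx.nd.ones((batch, label_len), ctx=ctx)
    mx.nd.waitall()                                  # keep setup out of the timing
    start = time.time()
    for _ in range(repeats):
        ctc_loss(pred, label).wait_to_read()
    return (time.time() - start) / repeats

print(time_ctc_forward(32, 256, 28, 1))
```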

@szha szha force-pushed the gluon_ctc branch 5 times, most recently from 8630456 to c70505b on August 22, 2017 18:02
@piiswrong piiswrong merged commit f489810 into apache:master Aug 24, 2017
@szha szha deleted the gluon_ctc branch August 27, 2017 22:31
mbaijal pushed a commit to mbaijal/incubator-mxnet that referenced this pull request Sep 6, 2017
* contrib ctc interface changes for compatibility

* cudnn ctc

* update per comments
@sbodenstein sbodenstein mentioned this pull request Sep 7, 2017
cjolivier01 pushed a commit to cjolivier01/mxnet that referenced this pull request Sep 11, 2017
* contrib ctc interface changes for compatibility

* cudnn ctc

* update per comments
crazy-cat pushed a commit to crazy-cat/incubator-mxnet that referenced this pull request Oct 26, 2017
* contrib ctc interface changes for compatibility

* cudnn ctc

* update per comments