solve problem in print "cudnn autotune" #7988

solin319 · 2017-09-22T02:12:09Z

Convolution algorithm was cached the results of Get as well as Find.
So we must use param.cudnn_tune to control weather to print "cudnn autotune" message.

@piiswrong
#7631

update

szha

@piiswrong this change looks good to me. Can we merge this?

piiswrong · 2017-10-11T02:47:41Z

Is param.cudnn_tune always non-empty when we run perf tests? I think if its empty it depends on the env var?

solin319 · 2017-10-11T03:31:00Z

When init CuDNNConvolutionOp, param.cudnn_tune will be set value (default 1).

if (!param_.cudnn_tune) {
      param_.cudnn_tune = dmlc::GetEnv("MXNET_CUDNN_AUTOTUNE_DEFAULT", 1);
    }

When we run perf tests param_.cudnn_tune always non-empty and it's value must larger than zero.

* Timing output for test_factorization_module when Verbose enabled * Trigger build * Trigger build * Trigger build * Misc fixes for sparse distributed training (#8345) * remove mshadow::range in init_op.h * add unit test * remove pass by ptr, add unit test for pull empty wieghts * fix range in key partition * remove wrong comment * remove change for partition * remove unused var * add int64 to arange. add checkpointing example * Fix the Readme (#8369) * Allow test to converge (#8351) * Allow test to converge * Trigger build * Trigger build * Trigger build * Update cudnn_algoreg-inl.h (#7988) * [Perl] emulate Python zip() for Perl (#8192) * [Perl] emulate Python zip() for Perl * [Perl] retool zip() uses away from the callback form * add profile option for frontend profiling to image script (#8171) * add profile option for frontend profiling to image script * Update image_classification.py * Update image_classification.py * Fix Typo (classification) (#8376) Fix a typo in the example readme.

* CPU optimization for ActivationOp Significant improvement on CPU (several magnitudes of order in some cases, especially on backward pass). Very slight improvement on GPU. OLD MSHADOW APPROACH -------------------- CPU === Timing: 50 iterations of 10 calls, shape = [1,1,28,28] Activation Operator CPU: Timing [Forward] 18.948 ms, avg: 0.037896 ms X 500 passes Activation Operator CPU: Timing [Backward] 1.658 ms, avg: 0.003316 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [1,3,28,28] Activation Operator CPU: Timing [Forward] 57.973 ms, avg: 0.115946 ms X 500 passes Activation Operator CPU: Timing [Backward] 4.748 ms, avg: 0.009496 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,1,18,32] Activation Operator CPU: Timing [Forward] 703.446 ms, avg: 1.40689 ms X 500 passes Activation Operator CPU: Timing [Backward] 56.255 ms, avg: 0.11251 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,3,18,32] Activation Operator CPU: Timing [Forward] 2107.77 ms, avg: 4.21554 ms X 500 passes Activation Operator CPU: Timing [Backward] 168.483 ms, avg: 0.336966 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [20,3,128,128] Activation Operator CPU: Timing [Forward] 24122.2 ms, avg: 48.2443 ms X 500 passes Activation Operator CPU: Timing [Backward] 1908.7 ms, avg: 3.8174 ms X 500 passes GPU === Timing: 50 iterations of 10 calls, shape = [1,1,28,28] Activation Operator GPU: Timing [Forward] 1.637 ms, avg: 0.003274 ms X 500 passes Activation Operator GPU: Timing [Backward] 1.665 ms, avg: 0.00333 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [1,3,28,28] Activation Operator GPU: Timing [Forward] 1.562 ms, avg: 0.003124 ms X 500 passes Activation Operator GPU: Timing [Backward] 1.661 ms, avg: 0.003322 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,1,18,32] Activation Operator GPU: Timing [Forward] 1.635 ms, avg: 0.00327 ms X 500 passes Activation Operator GPU: Timing [Backward] 1.702 ms, avg: 0.003404 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,3,18,32] Activation Operator GPU: Timing [Forward] 1.83 ms, avg: 0.00366 ms X 500 passes Activation Operator GPU: Timing [Backward] 2.041 ms, avg: 0.004082 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [20,3,128,128] Activation Operator GPU: Timing [Forward] 2.08 ms, avg: 0.00416 ms X 500 passes Activation Operator GPU: Timing [Backward] 2.688 ms, avg: 0.005376 ms X 500 passes NEW MXNET_OP APPROACH --------------------- CPU === Timing: 50 iterations of 10 calls, shape = [1,1,28,28] Activation Operator CPU: Timing [Forward] 80.748 ms, avg: 0.161496 ms X 500 passes Activation Operator CPU: Timing [Backward] 1.176 ms, avg: 0.002352 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [1,3,28,28] Activation Operator CPU: Timing [Forward] 7.881 ms, avg: 0.015762 ms X 500 passes Activation Operator CPU: Timing [Backward] 2.181 ms, avg: 0.004362 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,1,18,32] Activation Operator CPU: Timing [Forward] 111.48 ms, avg: 0.22296 ms X 500 passes Activation Operator CPU: Timing [Backward] 5.408 ms, avg: 0.010816 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,3,18,32] Activation Operator CPU: Timing [Forward] 333.439 ms, avg: 0.666878 ms X 500 passes Activation Operator CPU: Timing [Backward] 21.331 ms, avg: 0.042662 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [20,3,128,128] Activation Operator CPU: Timing [Forward] 3429.19 ms, avg: 6.85837 ms X 500 passes Activation Operator CPU: Timing [Backward] 286.324 ms, avg: 0.572648 ms X 500 passes GPU === Timing: 50 iterations of 10 calls, shape = [1,1,28,28] Activation Operator GPU: Timing [Forward] 1.618 ms, avg: 0.003236 ms X 500 passes Activation Operator GPU: Timing [Backward] 1.671 ms, avg: 0.003342 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [1,3,28,28] Activation Operator GPU: Timing [Forward] 1.629 ms, avg: 0.003258 ms X 500 passes Activation Operator GPU: Timing [Backward] 1.728 ms, avg: 0.003456 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,1,18,32] Activation Operator GPU: Timing [Forward] 1.753 ms, avg: 0.003506 ms X 500 passes Activation Operator GPU: Timing [Backward] 1.756 ms, avg: 0.003512 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,3,18,32] Activation Operator GPU: Timing [Forward] 1.704 ms, avg: 0.003408 ms X 500 passes Activation Operator GPU: Timing [Backward] 1.791 ms, avg: 0.003582 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [20,3,128,128] Activation Operator GPU: Timing [Forward] 2.032 ms, avg: 0.004064 ms X 500 passes Activation Operator GPU: Timing [Backward] 2.143 ms, avg: 0.004286 ms X 500 passes * lint * Trigger build * Trigger build * Negative begin and end support for csr slice (#8241) * negative index support for sparse slice * fix lint * getitem(int) for csr ndarray, support a[-1] * remove unneccessary argument * unittest and doc update * Preparing for 0.12.0.rc0: Final changes before RC (#8301) * Final changes before RC * Updates to NEWS.md * Updates * Enable smoothing in softmax operator (#8125) * v0.12 regression: Fix registration of children for Block (#8277) * Fix Block not registering children If the attribute was already set to something different than Block (e.g. None), it was not being registered. * fix if / elif for block children registration * trigger test * Add fix from #8152 * Add tests from #8152 * Revert "[CMAKE] Fix windows cmake build" (#8311) * Revert "Added my code signing key (#8293)" This reverts commit 22ab185. * Revert "[CMAKE] Fix windows cmake build (#8227)" This reverts commit 1c1c788. * fixed broken links. https was pointing to http for mxnet.io (#8300) * Update rnn.md (#8320) * fluent methods for missed ops (#8329) * update ps lite (#8327) * Fix unused type warning (#8316) * Trigger build * Trigger build * Misc fixes for sparse distributed training (#8345) * remove mshadow::range in init_op.h * add unit test * remove pass by ptr, add unit test for pull empty wieghts * fix range in key partition * remove wrong comment * remove change for partition * remove unused var * add int64 to arange. add checkpointing example * Fix the Readme (#8369) * Allow test to converge (#8351) * Allow test to converge * Trigger build * Trigger build * Trigger build * Update cudnn_algoreg-inl.h (#7988) * [Perl] emulate Python zip() for Perl (#8192) * [Perl] emulate Python zip() for Perl * [Perl] retool zip() uses away from the callback form * add profile option for frontend profiling to image script (#8171) * add profile option for frontend profiling to image script * Update image_classification.py * Update image_classification.py * Fix Typo (classification) (#8376) Fix a typo in the example readme.

…che#8363) * Timing output for test_factorization_module when Verbose enabled * Trigger build * Trigger build * Trigger build * Misc fixes for sparse distributed training (apache#8345) * remove mshadow::range in init_op.h * add unit test * remove pass by ptr, add unit test for pull empty wieghts * fix range in key partition * remove wrong comment * remove change for partition * remove unused var * add int64 to arange. add checkpointing example * Fix the Readme (apache#8369) * Allow test to converge (apache#8351) * Allow test to converge * Trigger build * Trigger build * Trigger build * Update cudnn_algoreg-inl.h (apache#7988) * [Perl] emulate Python zip() for Perl (apache#8192) * [Perl] emulate Python zip() for Perl * [Perl] retool zip() uses away from the callback form * add profile option for frontend profiling to image script (apache#8171) * add profile option for frontend profiling to image script * Update image_classification.py * Update image_classification.py * Fix Typo (classification) (apache#8376) Fix a typo in the example readme.

* CPU optimization for ActivationOp Significant improvement on CPU (several magnitudes of order in some cases, especially on backward pass). Very slight improvement on GPU. OLD MSHADOW APPROACH -------------------- CPU === Timing: 50 iterations of 10 calls, shape = [1,1,28,28] Activation Operator CPU: Timing [Forward] 18.948 ms, avg: 0.037896 ms X 500 passes Activation Operator CPU: Timing [Backward] 1.658 ms, avg: 0.003316 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [1,3,28,28] Activation Operator CPU: Timing [Forward] 57.973 ms, avg: 0.115946 ms X 500 passes Activation Operator CPU: Timing [Backward] 4.748 ms, avg: 0.009496 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,1,18,32] Activation Operator CPU: Timing [Forward] 703.446 ms, avg: 1.40689 ms X 500 passes Activation Operator CPU: Timing [Backward] 56.255 ms, avg: 0.11251 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,3,18,32] Activation Operator CPU: Timing [Forward] 2107.77 ms, avg: 4.21554 ms X 500 passes Activation Operator CPU: Timing [Backward] 168.483 ms, avg: 0.336966 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [20,3,128,128] Activation Operator CPU: Timing [Forward] 24122.2 ms, avg: 48.2443 ms X 500 passes Activation Operator CPU: Timing [Backward] 1908.7 ms, avg: 3.8174 ms X 500 passes GPU === Timing: 50 iterations of 10 calls, shape = [1,1,28,28] Activation Operator GPU: Timing [Forward] 1.637 ms, avg: 0.003274 ms X 500 passes Activation Operator GPU: Timing [Backward] 1.665 ms, avg: 0.00333 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [1,3,28,28] Activation Operator GPU: Timing [Forward] 1.562 ms, avg: 0.003124 ms X 500 passes Activation Operator GPU: Timing [Backward] 1.661 ms, avg: 0.003322 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,1,18,32] Activation Operator GPU: Timing [Forward] 1.635 ms, avg: 0.00327 ms X 500 passes Activation Operator GPU: Timing [Backward] 1.702 ms, avg: 0.003404 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,3,18,32] Activation Operator GPU: Timing [Forward] 1.83 ms, avg: 0.00366 ms X 500 passes Activation Operator GPU: Timing [Backward] 2.041 ms, avg: 0.004082 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [20,3,128,128] Activation Operator GPU: Timing [Forward] 2.08 ms, avg: 0.00416 ms X 500 passes Activation Operator GPU: Timing [Backward] 2.688 ms, avg: 0.005376 ms X 500 passes NEW MXNET_OP APPROACH --------------------- CPU === Timing: 50 iterations of 10 calls, shape = [1,1,28,28] Activation Operator CPU: Timing [Forward] 80.748 ms, avg: 0.161496 ms X 500 passes Activation Operator CPU: Timing [Backward] 1.176 ms, avg: 0.002352 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [1,3,28,28] Activation Operator CPU: Timing [Forward] 7.881 ms, avg: 0.015762 ms X 500 passes Activation Operator CPU: Timing [Backward] 2.181 ms, avg: 0.004362 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,1,18,32] Activation Operator CPU: Timing [Forward] 111.48 ms, avg: 0.22296 ms X 500 passes Activation Operator CPU: Timing [Backward] 5.408 ms, avg: 0.010816 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,3,18,32] Activation Operator CPU: Timing [Forward] 333.439 ms, avg: 0.666878 ms X 500 passes Activation Operator CPU: Timing [Backward] 21.331 ms, avg: 0.042662 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [20,3,128,128] Activation Operator CPU: Timing [Forward] 3429.19 ms, avg: 6.85837 ms X 500 passes Activation Operator CPU: Timing [Backward] 286.324 ms, avg: 0.572648 ms X 500 passes GPU === Timing: 50 iterations of 10 calls, shape = [1,1,28,28] Activation Operator GPU: Timing [Forward] 1.618 ms, avg: 0.003236 ms X 500 passes Activation Operator GPU: Timing [Backward] 1.671 ms, avg: 0.003342 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [1,3,28,28] Activation Operator GPU: Timing [Forward] 1.629 ms, avg: 0.003258 ms X 500 passes Activation Operator GPU: Timing [Backward] 1.728 ms, avg: 0.003456 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,1,18,32] Activation Operator GPU: Timing [Forward] 1.753 ms, avg: 0.003506 ms X 500 passes Activation Operator GPU: Timing [Backward] 1.756 ms, avg: 0.003512 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,3,18,32] Activation Operator GPU: Timing [Forward] 1.704 ms, avg: 0.003408 ms X 500 passes Activation Operator GPU: Timing [Backward] 1.791 ms, avg: 0.003582 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [20,3,128,128] Activation Operator GPU: Timing [Forward] 2.032 ms, avg: 0.004064 ms X 500 passes Activation Operator GPU: Timing [Backward] 2.143 ms, avg: 0.004286 ms X 500 passes * lint * Trigger build * Trigger build * Negative begin and end support for csr slice (apache#8241) * negative index support for sparse slice * fix lint * getitem(int) for csr ndarray, support a[-1] * remove unneccessary argument * unittest and doc update * Preparing for 0.12.0.rc0: Final changes before RC (apache#8301) * Final changes before RC * Updates to NEWS.md * Updates * Enable smoothing in softmax operator (apache#8125) * v0.12 regression: Fix registration of children for Block (apache#8277) * Fix Block not registering children If the attribute was already set to something different than Block (e.g. None), it was not being registered. * fix if / elif for block children registration * trigger test * Add fix from apache#8152 * Add tests from apache#8152 * Revert "[CMAKE] Fix windows cmake build" (apache#8311) * Revert "Added my code signing key (apache#8293)" This reverts commit 22ab185. * Revert "[CMAKE] Fix windows cmake build (apache#8227)" This reverts commit 1c1c788. * fixed broken links. https was pointing to http for mxnet.io (apache#8300) * Update rnn.md (apache#8320) * fluent methods for missed ops (apache#8329) * update ps lite (apache#8327) * Fix unused type warning (apache#8316) * Trigger build * Trigger build * Misc fixes for sparse distributed training (apache#8345) * remove mshadow::range in init_op.h * add unit test * remove pass by ptr, add unit test for pull empty wieghts * fix range in key partition * remove wrong comment * remove change for partition * remove unused var * add int64 to arange. add checkpointing example * Fix the Readme (apache#8369) * Allow test to converge (apache#8351) * Allow test to converge * Trigger build * Trigger build * Trigger build * Update cudnn_algoreg-inl.h (apache#7988) * [Perl] emulate Python zip() for Perl (apache#8192) * [Perl] emulate Python zip() for Perl * [Perl] retool zip() uses away from the callback form * add profile option for frontend profiling to image script (apache#8171) * add profile option for frontend profiling to image script * Update image_classification.py * Update image_classification.py * Fix Typo (classification) (apache#8376) Fix a typo in the example readme.

…che#8363) * Timing output for test_factorization_module when Verbose enabled * Trigger build * Trigger build * Trigger build * Misc fixes for sparse distributed training (apache#8345) * remove mshadow::range in init_op.h * add unit test * remove pass by ptr, add unit test for pull empty wieghts * fix range in key partition * remove wrong comment * remove change for partition * remove unused var * add int64 to arange. add checkpointing example * Fix the Readme (apache#8369) * Allow test to converge (apache#8351) * Allow test to converge * Trigger build * Trigger build * Trigger build * Update cudnn_algoreg-inl.h (apache#7988) * [Perl] emulate Python zip() for Perl (apache#8192) * [Perl] emulate Python zip() for Perl * [Perl] retool zip() uses away from the callback form * add profile option for frontend profiling to image script (apache#8171) * add profile option for frontend profiling to image script * Update image_classification.py * Update image_classification.py * Fix Typo (classification) (apache#8376) Fix a typo in the example readme.

* CPU optimization for ActivationOp Significant improvement on CPU (several magnitudes of order in some cases, especially on backward pass). Very slight improvement on GPU. OLD MSHADOW APPROACH -------------------- CPU === Timing: 50 iterations of 10 calls, shape = [1,1,28,28] Activation Operator CPU: Timing [Forward] 18.948 ms, avg: 0.037896 ms X 500 passes Activation Operator CPU: Timing [Backward] 1.658 ms, avg: 0.003316 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [1,3,28,28] Activation Operator CPU: Timing [Forward] 57.973 ms, avg: 0.115946 ms X 500 passes Activation Operator CPU: Timing [Backward] 4.748 ms, avg: 0.009496 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,1,18,32] Activation Operator CPU: Timing [Forward] 703.446 ms, avg: 1.40689 ms X 500 passes Activation Operator CPU: Timing [Backward] 56.255 ms, avg: 0.11251 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,3,18,32] Activation Operator CPU: Timing [Forward] 2107.77 ms, avg: 4.21554 ms X 500 passes Activation Operator CPU: Timing [Backward] 168.483 ms, avg: 0.336966 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [20,3,128,128] Activation Operator CPU: Timing [Forward] 24122.2 ms, avg: 48.2443 ms X 500 passes Activation Operator CPU: Timing [Backward] 1908.7 ms, avg: 3.8174 ms X 500 passes GPU === Timing: 50 iterations of 10 calls, shape = [1,1,28,28] Activation Operator GPU: Timing [Forward] 1.637 ms, avg: 0.003274 ms X 500 passes Activation Operator GPU: Timing [Backward] 1.665 ms, avg: 0.00333 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [1,3,28,28] Activation Operator GPU: Timing [Forward] 1.562 ms, avg: 0.003124 ms X 500 passes Activation Operator GPU: Timing [Backward] 1.661 ms, avg: 0.003322 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,1,18,32] Activation Operator GPU: Timing [Forward] 1.635 ms, avg: 0.00327 ms X 500 passes Activation Operator GPU: Timing [Backward] 1.702 ms, avg: 0.003404 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,3,18,32] Activation Operator GPU: Timing [Forward] 1.83 ms, avg: 0.00366 ms X 500 passes Activation Operator GPU: Timing [Backward] 2.041 ms, avg: 0.004082 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [20,3,128,128] Activation Operator GPU: Timing [Forward] 2.08 ms, avg: 0.00416 ms X 500 passes Activation Operator GPU: Timing [Backward] 2.688 ms, avg: 0.005376 ms X 500 passes NEW MXNET_OP APPROACH --------------------- CPU === Timing: 50 iterations of 10 calls, shape = [1,1,28,28] Activation Operator CPU: Timing [Forward] 80.748 ms, avg: 0.161496 ms X 500 passes Activation Operator CPU: Timing [Backward] 1.176 ms, avg: 0.002352 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [1,3,28,28] Activation Operator CPU: Timing [Forward] 7.881 ms, avg: 0.015762 ms X 500 passes Activation Operator CPU: Timing [Backward] 2.181 ms, avg: 0.004362 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,1,18,32] Activation Operator CPU: Timing [Forward] 111.48 ms, avg: 0.22296 ms X 500 passes Activation Operator CPU: Timing [Backward] 5.408 ms, avg: 0.010816 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,3,18,32] Activation Operator CPU: Timing [Forward] 333.439 ms, avg: 0.666878 ms X 500 passes Activation Operator CPU: Timing [Backward] 21.331 ms, avg: 0.042662 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [20,3,128,128] Activation Operator CPU: Timing [Forward] 3429.19 ms, avg: 6.85837 ms X 500 passes Activation Operator CPU: Timing [Backward] 286.324 ms, avg: 0.572648 ms X 500 passes GPU === Timing: 50 iterations of 10 calls, shape = [1,1,28,28] Activation Operator GPU: Timing [Forward] 1.618 ms, avg: 0.003236 ms X 500 passes Activation Operator GPU: Timing [Backward] 1.671 ms, avg: 0.003342 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [1,3,28,28] Activation Operator GPU: Timing [Forward] 1.629 ms, avg: 0.003258 ms X 500 passes Activation Operator GPU: Timing [Backward] 1.728 ms, avg: 0.003456 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,1,18,32] Activation Operator GPU: Timing [Forward] 1.753 ms, avg: 0.003506 ms X 500 passes Activation Operator GPU: Timing [Backward] 1.756 ms, avg: 0.003512 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,3,18,32] Activation Operator GPU: Timing [Forward] 1.704 ms, avg: 0.003408 ms X 500 passes Activation Operator GPU: Timing [Backward] 1.791 ms, avg: 0.003582 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [20,3,128,128] Activation Operator GPU: Timing [Forward] 2.032 ms, avg: 0.004064 ms X 500 passes Activation Operator GPU: Timing [Backward] 2.143 ms, avg: 0.004286 ms X 500 passes * lint * Trigger build * Trigger build * Negative begin and end support for csr slice (apache#8241) * negative index support for sparse slice * fix lint * getitem(int) for csr ndarray, support a[-1] * remove unneccessary argument * unittest and doc update * Preparing for 0.12.0.rc0: Final changes before RC (apache#8301) * Final changes before RC * Updates to NEWS.md * Updates * Enable smoothing in softmax operator (apache#8125) * v0.12 regression: Fix registration of children for Block (apache#8277) * Fix Block not registering children If the attribute was already set to something different than Block (e.g. None), it was not being registered. * fix if / elif for block children registration * trigger test * Add fix from apache#8152 * Add tests from apache#8152 * Revert "[CMAKE] Fix windows cmake build" (apache#8311) * Revert "Added my code signing key (apache#8293)" This reverts commit 22ab185. * Revert "[CMAKE] Fix windows cmake build (apache#8227)" This reverts commit 1c1c788. * fixed broken links. https was pointing to http for mxnet.io (apache#8300) * Update rnn.md (apache#8320) * fluent methods for missed ops (apache#8329) * update ps lite (apache#8327) * Fix unused type warning (apache#8316) * Trigger build * Trigger build * Misc fixes for sparse distributed training (apache#8345) * remove mshadow::range in init_op.h * add unit test * remove pass by ptr, add unit test for pull empty wieghts * fix range in key partition * remove wrong comment * remove change for partition * remove unused var * add int64 to arange. add checkpointing example * Fix the Readme (apache#8369) * Allow test to converge (apache#8351) * Allow test to converge * Trigger build * Trigger build * Trigger build * Update cudnn_algoreg-inl.h (apache#7988) * [Perl] emulate Python zip() for Perl (apache#8192) * [Perl] emulate Python zip() for Perl * [Perl] retool zip() uses away from the callback form * add profile option for frontend profiling to image script (apache#8171) * add profile option for frontend profiling to image script * Update image_classification.py * Update image_classification.py * Fix Typo (classification) (apache#8376) Fix a typo in the example readme.

…che#8363) * Timing output for test_factorization_module when Verbose enabled * Trigger build * Trigger build * Trigger build * Misc fixes for sparse distributed training (apache#8345) * remove mshadow::range in init_op.h * add unit test * remove pass by ptr, add unit test for pull empty wieghts * fix range in key partition * remove wrong comment * remove change for partition * remove unused var * add int64 to arange. add checkpointing example * Fix the Readme (apache#8369) * Allow test to converge (apache#8351) * Allow test to converge * Trigger build * Trigger build * Trigger build * Update cudnn_algoreg-inl.h (apache#7988) * [Perl] emulate Python zip() for Perl (apache#8192) * [Perl] emulate Python zip() for Perl * [Perl] retool zip() uses away from the callback form * add profile option for frontend profiling to image script (apache#8171) * add profile option for frontend profiling to image script * Update image_classification.py * Update image_classification.py * Fix Typo (classification) (apache#8376) Fix a typo in the example readme.

* CPU optimization for ActivationOp Significant improvement on CPU (several magnitudes of order in some cases, especially on backward pass). Very slight improvement on GPU. OLD MSHADOW APPROACH -------------------- CPU === Timing: 50 iterations of 10 calls, shape = [1,1,28,28] Activation Operator CPU: Timing [Forward] 18.948 ms, avg: 0.037896 ms X 500 passes Activation Operator CPU: Timing [Backward] 1.658 ms, avg: 0.003316 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [1,3,28,28] Activation Operator CPU: Timing [Forward] 57.973 ms, avg: 0.115946 ms X 500 passes Activation Operator CPU: Timing [Backward] 4.748 ms, avg: 0.009496 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,1,18,32] Activation Operator CPU: Timing [Forward] 703.446 ms, avg: 1.40689 ms X 500 passes Activation Operator CPU: Timing [Backward] 56.255 ms, avg: 0.11251 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,3,18,32] Activation Operator CPU: Timing [Forward] 2107.77 ms, avg: 4.21554 ms X 500 passes Activation Operator CPU: Timing [Backward] 168.483 ms, avg: 0.336966 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [20,3,128,128] Activation Operator CPU: Timing [Forward] 24122.2 ms, avg: 48.2443 ms X 500 passes Activation Operator CPU: Timing [Backward] 1908.7 ms, avg: 3.8174 ms X 500 passes GPU === Timing: 50 iterations of 10 calls, shape = [1,1,28,28] Activation Operator GPU: Timing [Forward] 1.637 ms, avg: 0.003274 ms X 500 passes Activation Operator GPU: Timing [Backward] 1.665 ms, avg: 0.00333 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [1,3,28,28] Activation Operator GPU: Timing [Forward] 1.562 ms, avg: 0.003124 ms X 500 passes Activation Operator GPU: Timing [Backward] 1.661 ms, avg: 0.003322 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,1,18,32] Activation Operator GPU: Timing [Forward] 1.635 ms, avg: 0.00327 ms X 500 passes Activation Operator GPU: Timing [Backward] 1.702 ms, avg: 0.003404 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,3,18,32] Activation Operator GPU: Timing [Forward] 1.83 ms, avg: 0.00366 ms X 500 passes Activation Operator GPU: Timing [Backward] 2.041 ms, avg: 0.004082 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [20,3,128,128] Activation Operator GPU: Timing [Forward] 2.08 ms, avg: 0.00416 ms X 500 passes Activation Operator GPU: Timing [Backward] 2.688 ms, avg: 0.005376 ms X 500 passes NEW MXNET_OP APPROACH --------------------- CPU === Timing: 50 iterations of 10 calls, shape = [1,1,28,28] Activation Operator CPU: Timing [Forward] 80.748 ms, avg: 0.161496 ms X 500 passes Activation Operator CPU: Timing [Backward] 1.176 ms, avg: 0.002352 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [1,3,28,28] Activation Operator CPU: Timing [Forward] 7.881 ms, avg: 0.015762 ms X 500 passes Activation Operator CPU: Timing [Backward] 2.181 ms, avg: 0.004362 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,1,18,32] Activation Operator CPU: Timing [Forward] 111.48 ms, avg: 0.22296 ms X 500 passes Activation Operator CPU: Timing [Backward] 5.408 ms, avg: 0.010816 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,3,18,32] Activation Operator CPU: Timing [Forward] 333.439 ms, avg: 0.666878 ms X 500 passes Activation Operator CPU: Timing [Backward] 21.331 ms, avg: 0.042662 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [20,3,128,128] Activation Operator CPU: Timing [Forward] 3429.19 ms, avg: 6.85837 ms X 500 passes Activation Operator CPU: Timing [Backward] 286.324 ms, avg: 0.572648 ms X 500 passes GPU === Timing: 50 iterations of 10 calls, shape = [1,1,28,28] Activation Operator GPU: Timing [Forward] 1.618 ms, avg: 0.003236 ms X 500 passes Activation Operator GPU: Timing [Backward] 1.671 ms, avg: 0.003342 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [1,3,28,28] Activation Operator GPU: Timing [Forward] 1.629 ms, avg: 0.003258 ms X 500 passes Activation Operator GPU: Timing [Backward] 1.728 ms, avg: 0.003456 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,1,18,32] Activation Operator GPU: Timing [Forward] 1.753 ms, avg: 0.003506 ms X 500 passes Activation Operator GPU: Timing [Backward] 1.756 ms, avg: 0.003512 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,3,18,32] Activation Operator GPU: Timing [Forward] 1.704 ms, avg: 0.003408 ms X 500 passes Activation Operator GPU: Timing [Backward] 1.791 ms, avg: 0.003582 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [20,3,128,128] Activation Operator GPU: Timing [Forward] 2.032 ms, avg: 0.004064 ms X 500 passes Activation Operator GPU: Timing [Backward] 2.143 ms, avg: 0.004286 ms X 500 passes * lint * Trigger build * Trigger build * Negative begin and end support for csr slice (apache#8241) * negative index support for sparse slice * fix lint * getitem(int) for csr ndarray, support a[-1] * remove unneccessary argument * unittest and doc update * Preparing for 0.12.0.rc0: Final changes before RC (apache#8301) * Final changes before RC * Updates to NEWS.md * Updates * Enable smoothing in softmax operator (apache#8125) * v0.12 regression: Fix registration of children for Block (apache#8277) * Fix Block not registering children If the attribute was already set to something different than Block (e.g. None), it was not being registered. * fix if / elif for block children registration * trigger test * Add fix from apache#8152 * Add tests from apache#8152 * Revert "[CMAKE] Fix windows cmake build" (apache#8311) * Revert "Added my code signing key (apache#8293)" This reverts commit 22ab185. * Revert "[CMAKE] Fix windows cmake build (apache#8227)" This reverts commit 1c1c788. * fixed broken links. https was pointing to http for mxnet.io (apache#8300) * Update rnn.md (apache#8320) * fluent methods for missed ops (apache#8329) * update ps lite (apache#8327) * Fix unused type warning (apache#8316) * Trigger build * Trigger build * Misc fixes for sparse distributed training (apache#8345) * remove mshadow::range in init_op.h * add unit test * remove pass by ptr, add unit test for pull empty wieghts * fix range in key partition * remove wrong comment * remove change for partition * remove unused var * add int64 to arange. add checkpointing example * Fix the Readme (apache#8369) * Allow test to converge (apache#8351) * Allow test to converge * Trigger build * Trigger build * Trigger build * Update cudnn_algoreg-inl.h (apache#7988) * [Perl] emulate Python zip() for Perl (apache#8192) * [Perl] emulate Python zip() for Perl * [Perl] retool zip() uses away from the callback form * add profile option for frontend profiling to image script (apache#8171) * add profile option for frontend profiling to image script * Update image_classification.py * Update image_classification.py * Fix Typo (classification) (apache#8376) Fix a typo in the example readme.

* Fill optimizations * Optimize IdentityCompute for CPU * lint * Fix unused type warning (#8316) * remove unused variable * CR comments * CR comments * Added _full operator * Trigger build * Trigger build * Add _full to symbolic * Merge conflict resolution fix * lint * Timing output for test_factorization_module when Verbose enabled (#8363) * Timing output for test_factorization_module when Verbose enabled * Trigger build * Trigger build * Trigger build * Misc fixes for sparse distributed training (#8345) * remove mshadow::range in init_op.h * add unit test * remove pass by ptr, add unit test for pull empty wieghts * fix range in key partition * remove wrong comment * remove change for partition * remove unused var * add int64 to arange. add checkpointing example * Fix the Readme (#8369) * Allow test to converge (#8351) * Allow test to converge * Trigger build * Trigger build * Trigger build * Update cudnn_algoreg-inl.h (#7988) * [Perl] emulate Python zip() for Perl (#8192) * [Perl] emulate Python zip() for Perl * [Perl] retool zip() uses away from the callback form * add profile option for frontend profiling to image script (#8171) * add profile option for frontend profiling to image script * Update image_classification.py * Update image_classification.py * Fix Typo (classification) (#8376) Fix a typo in the example readme. * Use omp_get_max_threads() when OMP_NUM_THREADS environment variable is set (#8379) * CPU optimization for ActivationOp (#8296) * CPU optimization for ActivationOp Significant improvement on CPU (several magnitudes of order in some cases, especially on backward pass). Very slight improvement on GPU. OLD MSHADOW APPROACH -------------------- CPU === Timing: 50 iterations of 10 calls, shape = [1,1,28,28] Activation Operator CPU: Timing [Forward] 18.948 ms, avg: 0.037896 ms X 500 passes Activation Operator CPU: Timing [Backward] 1.658 ms, avg: 0.003316 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [1,3,28,28] Activation Operator CPU: Timing [Forward] 57.973 ms, avg: 0.115946 ms X 500 passes Activation Operator CPU: Timing [Backward] 4.748 ms, avg: 0.009496 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,1,18,32] Activation Operator CPU: Timing [Forward] 703.446 ms, avg: 1.40689 ms X 500 passes Activation Operator CPU: Timing [Backward] 56.255 ms, avg: 0.11251 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,3,18,32] Activation Operator CPU: Timing [Forward] 2107.77 ms, avg: 4.21554 ms X 500 passes Activation Operator CPU: Timing [Backward] 168.483 ms, avg: 0.336966 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [20,3,128,128] Activation Operator CPU: Timing [Forward] 24122.2 ms, avg: 48.2443 ms X 500 passes Activation Operator CPU: Timing [Backward] 1908.7 ms, avg: 3.8174 ms X 500 passes GPU === Timing: 50 iterations of 10 calls, shape = [1,1,28,28] Activation Operator GPU: Timing [Forward] 1.637 ms, avg: 0.003274 ms X 500 passes Activation Operator GPU: Timing [Backward] 1.665 ms, avg: 0.00333 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [1,3,28,28] Activation Operator GPU: Timing [Forward] 1.562 ms, avg: 0.003124 ms X 500 passes Activation Operator GPU: Timing [Backward] 1.661 ms, avg: 0.003322 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,1,18,32] Activation Operator GPU: Timing [Forward] 1.635 ms, avg: 0.00327 ms X 500 passes Activation Operator GPU: Timing [Backward] 1.702 ms, avg: 0.003404 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,3,18,32] Activation Operator GPU: Timing [Forward] 1.83 ms, avg: 0.00366 ms X 500 passes Activation Operator GPU: Timing [Backward] 2.041 ms, avg: 0.004082 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [20,3,128,128] Activation Operator GPU: Timing [Forward] 2.08 ms, avg: 0.00416 ms X 500 passes Activation Operator GPU: Timing [Backward] 2.688 ms, avg: 0.005376 ms X 500 passes NEW MXNET_OP APPROACH --------------------- CPU === Timing: 50 iterations of 10 calls, shape = [1,1,28,28] Activation Operator CPU: Timing [Forward] 80.748 ms, avg: 0.161496 ms X 500 passes Activation Operator CPU: Timing [Backward] 1.176 ms, avg: 0.002352 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [1,3,28,28] Activation Operator CPU: Timing [Forward] 7.881 ms, avg: 0.015762 ms X 500 passes Activation Operator CPU: Timing [Backward] 2.181 ms, avg: 0.004362 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,1,18,32] Activation Operator CPU: Timing [Forward] 111.48 ms, avg: 0.22296 ms X 500 passes Activation Operator CPU: Timing [Backward] 5.408 ms, avg: 0.010816 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,3,18,32] Activation Operator CPU: Timing [Forward] 333.439 ms, avg: 0.666878 ms X 500 passes Activation Operator CPU: Timing [Backward] 21.331 ms, avg: 0.042662 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [20,3,128,128] Activation Operator CPU: Timing [Forward] 3429.19 ms, avg: 6.85837 ms X 500 passes Activation Operator CPU: Timing [Backward] 286.324 ms, avg: 0.572648 ms X 500 passes GPU === Timing: 50 iterations of 10 calls, shape = [1,1,28,28] Activation Operator GPU: Timing [Forward] 1.618 ms, avg: 0.003236 ms X 500 passes Activation Operator GPU: Timing [Backward] 1.671 ms, avg: 0.003342 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [1,3,28,28] Activation Operator GPU: Timing [Forward] 1.629 ms, avg: 0.003258 ms X 500 passes Activation Operator GPU: Timing [Backward] 1.728 ms, avg: 0.003456 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,1,18,32] Activation Operator GPU: Timing [Forward] 1.753 ms, avg: 0.003506 ms X 500 passes Activation Operator GPU: Timing [Backward] 1.756 ms, avg: 0.003512 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,3,18,32] Activation Operator GPU: Timing [Forward] 1.704 ms, avg: 0.003408 ms X 500 passes Activation Operator GPU: Timing [Backward] 1.791 ms, avg: 0.003582 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [20,3,128,128] Activation Operator GPU: Timing [Forward] 2.032 ms, avg: 0.004064 ms X 500 passes Activation Operator GPU: Timing [Backward] 2.143 ms, avg: 0.004286 ms X 500 passes * lint * Trigger build * Trigger build * Negative begin and end support for csr slice (#8241) * negative index support for sparse slice * fix lint * getitem(int) for csr ndarray, support a[-1] * remove unneccessary argument * unittest and doc update * Preparing for 0.12.0.rc0: Final changes before RC (#8301) * Final changes before RC * Updates to NEWS.md * Updates * Enable smoothing in softmax operator (#8125) * v0.12 regression: Fix registration of children for Block (#8277) * Fix Block not registering children If the attribute was already set to something different than Block (e.g. None), it was not being registered. * fix if / elif for block children registration * trigger test * Add fix from #8152 * Add tests from #8152 * Revert "[CMAKE] Fix windows cmake build" (#8311) * Revert "Added my code signing key (#8293)" This reverts commit 22ab185. * Revert "[CMAKE] Fix windows cmake build (#8227)" This reverts commit 1c1c788. * fixed broken links. https was pointing to http for mxnet.io (#8300) * Update rnn.md (#8320) * fluent methods for missed ops (#8329) * update ps lite (#8327) * Fix unused type warning (#8316) * Trigger build * Trigger build * Misc fixes for sparse distributed training (#8345) * remove mshadow::range in init_op.h * add unit test * remove pass by ptr, add unit test for pull empty wieghts * fix range in key partition * remove wrong comment * remove change for partition * remove unused var * add int64 to arange. add checkpointing example * Fix the Readme (#8369) * Allow test to converge (#8351) * Allow test to converge * Trigger build * Trigger build * Trigger build * Update cudnn_algoreg-inl.h (#7988) * [Perl] emulate Python zip() for Perl (#8192) * [Perl] emulate Python zip() for Perl * [Perl] retool zip() uses away from the callback form * add profile option for frontend profiling to image script (#8171) * add profile option for frontend profiling to image script * Update image_classification.py * Update image_classification.py * Fix Typo (classification) (#8376) Fix a typo in the example readme. * Fix GPU copy * Remove duplicate * Trigger build

* Memory set/copy speed assertions * Memory set/copy speed assertions * .. * .. * .. * .. * bounce some cache * lint * Timing output for test_factorization_module when Verbose enabled (#8363) * Timing output for test_factorization_module when Verbose enabled * Trigger build * Trigger build * Trigger build * Misc fixes for sparse distributed training (#8345) * remove mshadow::range in init_op.h * add unit test * remove pass by ptr, add unit test for pull empty wieghts * fix range in key partition * remove wrong comment * remove change for partition * remove unused var * add int64 to arange. add checkpointing example * Fix the Readme (#8369) * Allow test to converge (#8351) * Allow test to converge * Trigger build * Trigger build * Trigger build * Update cudnn_algoreg-inl.h (#7988) * [Perl] emulate Python zip() for Perl (#8192) * [Perl] emulate Python zip() for Perl * [Perl] retool zip() uses away from the callback form * add profile option for frontend profiling to image script (#8171) * add profile option for frontend profiling to image script * Update image_classification.py * Update image_classification.py * Fix Typo (classification) (#8376) Fix a typo in the example readme. * Use omp_get_max_threads() when OMP_NUM_THREADS environment variable is set (#8379) * CPU optimization for ActivationOp (#8296) * CPU optimization for ActivationOp Significant improvement on CPU (several magnitudes of order in some cases, especially on backward pass). Very slight improvement on GPU. OLD MSHADOW APPROACH -------------------- CPU === Timing: 50 iterations of 10 calls, shape = [1,1,28,28] Activation Operator CPU: Timing [Forward] 18.948 ms, avg: 0.037896 ms X 500 passes Activation Operator CPU: Timing [Backward] 1.658 ms, avg: 0.003316 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [1,3,28,28] Activation Operator CPU: Timing [Forward] 57.973 ms, avg: 0.115946 ms X 500 passes Activation Operator CPU: Timing [Backward] 4.748 ms, avg: 0.009496 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,1,18,32] Activation Operator CPU: Timing [Forward] 703.446 ms, avg: 1.40689 ms X 500 passes Activation Operator CPU: Timing [Backward] 56.255 ms, avg: 0.11251 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,3,18,32] Activation Operator CPU: Timing [Forward] 2107.77 ms, avg: 4.21554 ms X 500 passes Activation Operator CPU: Timing [Backward] 168.483 ms, avg: 0.336966 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [20,3,128,128] Activation Operator CPU: Timing [Forward] 24122.2 ms, avg: 48.2443 ms X 500 passes Activation Operator CPU: Timing [Backward] 1908.7 ms, avg: 3.8174 ms X 500 passes GPU === Timing: 50 iterations of 10 calls, shape = [1,1,28,28] Activation Operator GPU: Timing [Forward] 1.637 ms, avg: 0.003274 ms X 500 passes Activation Operator GPU: Timing [Backward] 1.665 ms, avg: 0.00333 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [1,3,28,28] Activation Operator GPU: Timing [Forward] 1.562 ms, avg: 0.003124 ms X 500 passes Activation Operator GPU: Timing [Backward] 1.661 ms, avg: 0.003322 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,1,18,32] Activation Operator GPU: Timing [Forward] 1.635 ms, avg: 0.00327 ms X 500 passes Activation Operator GPU: Timing [Backward] 1.702 ms, avg: 0.003404 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,3,18,32] Activation Operator GPU: Timing [Forward] 1.83 ms, avg: 0.00366 ms X 500 passes Activation Operator GPU: Timing [Backward] 2.041 ms, avg: 0.004082 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [20,3,128,128] Activation Operator GPU: Timing [Forward] 2.08 ms, avg: 0.00416 ms X 500 passes Activation Operator GPU: Timing [Backward] 2.688 ms, avg: 0.005376 ms X 500 passes NEW MXNET_OP APPROACH --------------------- CPU === Timing: 50 iterations of 10 calls, shape = [1,1,28,28] Activation Operator CPU: Timing [Forward] 80.748 ms, avg: 0.161496 ms X 500 passes Activation Operator CPU: Timing [Backward] 1.176 ms, avg: 0.002352 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [1,3,28,28] Activation Operator CPU: Timing [Forward] 7.881 ms, avg: 0.015762 ms X 500 passes Activation Operator CPU: Timing [Backward] 2.181 ms, avg: 0.004362 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,1,18,32] Activation Operator CPU: Timing [Forward] 111.48 ms, avg: 0.22296 ms X 500 passes Activation Operator CPU: Timing [Backward] 5.408 ms, avg: 0.010816 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,3,18,32] Activation Operator CPU: Timing [Forward] 333.439 ms, avg: 0.666878 ms X 500 passes Activation Operator CPU: Timing [Backward] 21.331 ms, avg: 0.042662 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [20,3,128,128] Activation Operator CPU: Timing [Forward] 3429.19 ms, avg: 6.85837 ms X 500 passes Activation Operator CPU: Timing [Backward] 286.324 ms, avg: 0.572648 ms X 500 passes GPU === Timing: 50 iterations of 10 calls, shape = [1,1,28,28] Activation Operator GPU: Timing [Forward] 1.618 ms, avg: 0.003236 ms X 500 passes Activation Operator GPU: Timing [Backward] 1.671 ms, avg: 0.003342 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [1,3,28,28] Activation Operator GPU: Timing [Forward] 1.629 ms, avg: 0.003258 ms X 500 passes Activation Operator GPU: Timing [Backward] 1.728 ms, avg: 0.003456 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,1,18,32] Activation Operator GPU: Timing [Forward] 1.753 ms, avg: 0.003506 ms X 500 passes Activation Operator GPU: Timing [Backward] 1.756 ms, avg: 0.003512 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,3,18,32] Activation Operator GPU: Timing [Forward] 1.704 ms, avg: 0.003408 ms X 500 passes Activation Operator GPU: Timing [Backward] 1.791 ms, avg: 0.003582 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [20,3,128,128] Activation Operator GPU: Timing [Forward] 2.032 ms, avg: 0.004064 ms X 500 passes Activation Operator GPU: Timing [Backward] 2.143 ms, avg: 0.004286 ms X 500 passes * lint * Trigger build * Trigger build * Negative begin and end support for csr slice (#8241) * negative index support for sparse slice * fix lint * getitem(int) for csr ndarray, support a[-1] * remove unneccessary argument * unittest and doc update * Preparing for 0.12.0.rc0: Final changes before RC (#8301) * Final changes before RC * Updates to NEWS.md * Updates * Enable smoothing in softmax operator (#8125) * v0.12 regression: Fix registration of children for Block (#8277) * Fix Block not registering children If the attribute was already set to something different than Block (e.g. None), it was not being registered. * fix if / elif for block children registration * trigger test * Add fix from #8152 * Add tests from #8152 * Revert "[CMAKE] Fix windows cmake build" (#8311) * Revert "Added my code signing key (#8293)" This reverts commit 22ab185. * Revert "[CMAKE] Fix windows cmake build (#8227)" This reverts commit 1c1c788. * fixed broken links. https was pointing to http for mxnet.io (#8300) * Update rnn.md (#8320) * fluent methods for missed ops (#8329) * update ps lite (#8327) * Fix unused type warning (#8316) * Trigger build * Trigger build * Misc fixes for sparse distributed training (#8345) * remove mshadow::range in init_op.h * add unit test * remove pass by ptr, add unit test for pull empty wieghts * fix range in key partition * remove wrong comment * remove change for partition * remove unused var * add int64 to arange. add checkpointing example * Fix the Readme (#8369) * Allow test to converge (#8351) * Allow test to converge * Trigger build * Trigger build * Trigger build * Update cudnn_algoreg-inl.h (#7988) * [Perl] emulate Python zip() for Perl (#8192) * [Perl] emulate Python zip() for Perl * [Perl] retool zip() uses away from the callback form * add profile option for frontend profiling to image script (#8171) * add profile option for frontend profiling to image script * Update image_classification.py * Update image_classification.py * Fix Typo (classification) (#8376) Fix a typo in the example readme. * do gtest test * add assert and do higher runs as performance test only (when performance test flag set) * Trigger build * lint * Trigger build * Sparse operator performance improvement (#8412) * sparse rsprsp perf improvements * Clean up * dtype default to source_array.dtype for sparse ndarrays (#8403) * derive default dtype/ctx from input for sparse ndarrays * add gpu tests * fix lint. add doc * remove default_ctx code * bug fix when passing dtype to array() * update doc * remove extra line * also check ctx * fix using default mean pixels (#8352) * fix gluon.data.RecordFileDataset (#8353) * upgrade MKL (#8378) * Lint fix (#8402) * Trigger build

* Fill optimizations * Optimize IdentityCompute for CPU * lint * Fix unused type warning (apache#8316) * remove unused variable * CR comments * CR comments * Added _full operator * Trigger build * Trigger build * Add _full to symbolic * Merge conflict resolution fix * lint * Timing output for test_factorization_module when Verbose enabled (apache#8363) * Timing output for test_factorization_module when Verbose enabled * Trigger build * Trigger build * Trigger build * Misc fixes for sparse distributed training (apache#8345) * remove mshadow::range in init_op.h * add unit test * remove pass by ptr, add unit test for pull empty wieghts * fix range in key partition * remove wrong comment * remove change for partition * remove unused var * add int64 to arange. add checkpointing example * Fix the Readme (apache#8369) * Allow test to converge (apache#8351) * Allow test to converge * Trigger build * Trigger build * Trigger build * Update cudnn_algoreg-inl.h (apache#7988) * [Perl] emulate Python zip() for Perl (apache#8192) * [Perl] emulate Python zip() for Perl * [Perl] retool zip() uses away from the callback form * add profile option for frontend profiling to image script (apache#8171) * add profile option for frontend profiling to image script * Update image_classification.py * Update image_classification.py * Fix Typo (classification) (apache#8376) Fix a typo in the example readme. * Use omp_get_max_threads() when OMP_NUM_THREADS environment variable is set (apache#8379) * CPU optimization for ActivationOp (apache#8296) * CPU optimization for ActivationOp Significant improvement on CPU (several magnitudes of order in some cases, especially on backward pass). Very slight improvement on GPU. OLD MSHADOW APPROACH -------------------- CPU === Timing: 50 iterations of 10 calls, shape = [1,1,28,28] Activation Operator CPU: Timing [Forward] 18.948 ms, avg: 0.037896 ms X 500 passes Activation Operator CPU: Timing [Backward] 1.658 ms, avg: 0.003316 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [1,3,28,28] Activation Operator CPU: Timing [Forward] 57.973 ms, avg: 0.115946 ms X 500 passes Activation Operator CPU: Timing [Backward] 4.748 ms, avg: 0.009496 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,1,18,32] Activation Operator CPU: Timing [Forward] 703.446 ms, avg: 1.40689 ms X 500 passes Activation Operator CPU: Timing [Backward] 56.255 ms, avg: 0.11251 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,3,18,32] Activation Operator CPU: Timing [Forward] 2107.77 ms, avg: 4.21554 ms X 500 passes Activation Operator CPU: Timing [Backward] 168.483 ms, avg: 0.336966 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [20,3,128,128] Activation Operator CPU: Timing [Forward] 24122.2 ms, avg: 48.2443 ms X 500 passes Activation Operator CPU: Timing [Backward] 1908.7 ms, avg: 3.8174 ms X 500 passes GPU === Timing: 50 iterations of 10 calls, shape = [1,1,28,28] Activation Operator GPU: Timing [Forward] 1.637 ms, avg: 0.003274 ms X 500 passes Activation Operator GPU: Timing [Backward] 1.665 ms, avg: 0.00333 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [1,3,28,28] Activation Operator GPU: Timing [Forward] 1.562 ms, avg: 0.003124 ms X 500 passes Activation Operator GPU: Timing [Backward] 1.661 ms, avg: 0.003322 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,1,18,32] Activation Operator GPU: Timing [Forward] 1.635 ms, avg: 0.00327 ms X 500 passes Activation Operator GPU: Timing [Backward] 1.702 ms, avg: 0.003404 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,3,18,32] Activation Operator GPU: Timing [Forward] 1.83 ms, avg: 0.00366 ms X 500 passes Activation Operator GPU: Timing [Backward] 2.041 ms, avg: 0.004082 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [20,3,128,128] Activation Operator GPU: Timing [Forward] 2.08 ms, avg: 0.00416 ms X 500 passes Activation Operator GPU: Timing [Backward] 2.688 ms, avg: 0.005376 ms X 500 passes NEW MXNET_OP APPROACH --------------------- CPU === Timing: 50 iterations of 10 calls, shape = [1,1,28,28] Activation Operator CPU: Timing [Forward] 80.748 ms, avg: 0.161496 ms X 500 passes Activation Operator CPU: Timing [Backward] 1.176 ms, avg: 0.002352 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [1,3,28,28] Activation Operator CPU: Timing [Forward] 7.881 ms, avg: 0.015762 ms X 500 passes Activation Operator CPU: Timing [Backward] 2.181 ms, avg: 0.004362 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,1,18,32] Activation Operator CPU: Timing [Forward] 111.48 ms, avg: 0.22296 ms X 500 passes Activation Operator CPU: Timing [Backward] 5.408 ms, avg: 0.010816 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,3,18,32] Activation Operator CPU: Timing [Forward] 333.439 ms, avg: 0.666878 ms X 500 passes Activation Operator CPU: Timing [Backward] 21.331 ms, avg: 0.042662 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [20,3,128,128] Activation Operator CPU: Timing [Forward] 3429.19 ms, avg: 6.85837 ms X 500 passes Activation Operator CPU: Timing [Backward] 286.324 ms, avg: 0.572648 ms X 500 passes GPU === Timing: 50 iterations of 10 calls, shape = [1,1,28,28] Activation Operator GPU: Timing [Forward] 1.618 ms, avg: 0.003236 ms X 500 passes Activation Operator GPU: Timing [Backward] 1.671 ms, avg: 0.003342 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [1,3,28,28] Activation Operator GPU: Timing [Forward] 1.629 ms, avg: 0.003258 ms X 500 passes Activation Operator GPU: Timing [Backward] 1.728 ms, avg: 0.003456 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,1,18,32] Activation Operator GPU: Timing [Forward] 1.753 ms, avg: 0.003506 ms X 500 passes Activation Operator GPU: Timing [Backward] 1.756 ms, avg: 0.003512 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,3,18,32] Activation Operator GPU: Timing [Forward] 1.704 ms, avg: 0.003408 ms X 500 passes Activation Operator GPU: Timing [Backward] 1.791 ms, avg: 0.003582 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [20,3,128,128] Activation Operator GPU: Timing [Forward] 2.032 ms, avg: 0.004064 ms X 500 passes Activation Operator GPU: Timing [Backward] 2.143 ms, avg: 0.004286 ms X 500 passes * lint * Trigger build * Trigger build * Negative begin and end support for csr slice (apache#8241) * negative index support for sparse slice * fix lint * getitem(int) for csr ndarray, support a[-1] * remove unneccessary argument * unittest and doc update * Preparing for 0.12.0.rc0: Final changes before RC (apache#8301) * Final changes before RC * Updates to NEWS.md * Updates * Enable smoothing in softmax operator (apache#8125) * v0.12 regression: Fix registration of children for Block (apache#8277) * Fix Block not registering children If the attribute was already set to something different than Block (e.g. None), it was not being registered. * fix if / elif for block children registration * trigger test * Add fix from apache#8152 * Add tests from apache#8152 * Revert "[CMAKE] Fix windows cmake build" (apache#8311) * Revert "Added my code signing key (apache#8293)" This reverts commit 22ab185. * Revert "[CMAKE] Fix windows cmake build (apache#8227)" This reverts commit 1c1c788. * fixed broken links. https was pointing to http for mxnet.io (apache#8300) * Update rnn.md (apache#8320) * fluent methods for missed ops (apache#8329) * update ps lite (apache#8327) * Fix unused type warning (apache#8316) * Trigger build * Trigger build * Misc fixes for sparse distributed training (apache#8345) * remove mshadow::range in init_op.h * add unit test * remove pass by ptr, add unit test for pull empty wieghts * fix range in key partition * remove wrong comment * remove change for partition * remove unused var * add int64 to arange. add checkpointing example * Fix the Readme (apache#8369) * Allow test to converge (apache#8351) * Allow test to converge * Trigger build * Trigger build * Trigger build * Update cudnn_algoreg-inl.h (apache#7988) * [Perl] emulate Python zip() for Perl (apache#8192) * [Perl] emulate Python zip() for Perl * [Perl] retool zip() uses away from the callback form * add profile option for frontend profiling to image script (apache#8171) * add profile option for frontend profiling to image script * Update image_classification.py * Update image_classification.py * Fix Typo (classification) (apache#8376) Fix a typo in the example readme. * Fix GPU copy * Remove duplicate * Trigger build

* Memory set/copy speed assertions * Memory set/copy speed assertions * .. * .. * .. * .. * bounce some cache * lint * Timing output for test_factorization_module when Verbose enabled (apache#8363) * Timing output for test_factorization_module when Verbose enabled * Trigger build * Trigger build * Trigger build * Misc fixes for sparse distributed training (apache#8345) * remove mshadow::range in init_op.h * add unit test * remove pass by ptr, add unit test for pull empty wieghts * fix range in key partition * remove wrong comment * remove change for partition * remove unused var * add int64 to arange. add checkpointing example * Fix the Readme (apache#8369) * Allow test to converge (apache#8351) * Allow test to converge * Trigger build * Trigger build * Trigger build * Update cudnn_algoreg-inl.h (apache#7988) * [Perl] emulate Python zip() for Perl (apache#8192) * [Perl] emulate Python zip() for Perl * [Perl] retool zip() uses away from the callback form * add profile option for frontend profiling to image script (apache#8171) * add profile option for frontend profiling to image script * Update image_classification.py * Update image_classification.py * Fix Typo (classification) (apache#8376) Fix a typo in the example readme. * Use omp_get_max_threads() when OMP_NUM_THREADS environment variable is set (apache#8379) * CPU optimization for ActivationOp (apache#8296) * CPU optimization for ActivationOp Significant improvement on CPU (several magnitudes of order in some cases, especially on backward pass). Very slight improvement on GPU. OLD MSHADOW APPROACH -------------------- CPU === Timing: 50 iterations of 10 calls, shape = [1,1,28,28] Activation Operator CPU: Timing [Forward] 18.948 ms, avg: 0.037896 ms X 500 passes Activation Operator CPU: Timing [Backward] 1.658 ms, avg: 0.003316 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [1,3,28,28] Activation Operator CPU: Timing [Forward] 57.973 ms, avg: 0.115946 ms X 500 passes Activation Operator CPU: Timing [Backward] 4.748 ms, avg: 0.009496 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,1,18,32] Activation Operator CPU: Timing [Forward] 703.446 ms, avg: 1.40689 ms X 500 passes Activation Operator CPU: Timing [Backward] 56.255 ms, avg: 0.11251 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,3,18,32] Activation Operator CPU: Timing [Forward] 2107.77 ms, avg: 4.21554 ms X 500 passes Activation Operator CPU: Timing [Backward] 168.483 ms, avg: 0.336966 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [20,3,128,128] Activation Operator CPU: Timing [Forward] 24122.2 ms, avg: 48.2443 ms X 500 passes Activation Operator CPU: Timing [Backward] 1908.7 ms, avg: 3.8174 ms X 500 passes GPU === Timing: 50 iterations of 10 calls, shape = [1,1,28,28] Activation Operator GPU: Timing [Forward] 1.637 ms, avg: 0.003274 ms X 500 passes Activation Operator GPU: Timing [Backward] 1.665 ms, avg: 0.00333 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [1,3,28,28] Activation Operator GPU: Timing [Forward] 1.562 ms, avg: 0.003124 ms X 500 passes Activation Operator GPU: Timing [Backward] 1.661 ms, avg: 0.003322 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,1,18,32] Activation Operator GPU: Timing [Forward] 1.635 ms, avg: 0.00327 ms X 500 passes Activation Operator GPU: Timing [Backward] 1.702 ms, avg: 0.003404 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,3,18,32] Activation Operator GPU: Timing [Forward] 1.83 ms, avg: 0.00366 ms X 500 passes Activation Operator GPU: Timing [Backward] 2.041 ms, avg: 0.004082 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [20,3,128,128] Activation Operator GPU: Timing [Forward] 2.08 ms, avg: 0.00416 ms X 500 passes Activation Operator GPU: Timing [Backward] 2.688 ms, avg: 0.005376 ms X 500 passes NEW MXNET_OP APPROACH --------------------- CPU === Timing: 50 iterations of 10 calls, shape = [1,1,28,28] Activation Operator CPU: Timing [Forward] 80.748 ms, avg: 0.161496 ms X 500 passes Activation Operator CPU: Timing [Backward] 1.176 ms, avg: 0.002352 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [1,3,28,28] Activation Operator CPU: Timing [Forward] 7.881 ms, avg: 0.015762 ms X 500 passes Activation Operator CPU: Timing [Backward] 2.181 ms, avg: 0.004362 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,1,18,32] Activation Operator CPU: Timing [Forward] 111.48 ms, avg: 0.22296 ms X 500 passes Activation Operator CPU: Timing [Backward] 5.408 ms, avg: 0.010816 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,3,18,32] Activation Operator CPU: Timing [Forward] 333.439 ms, avg: 0.666878 ms X 500 passes Activation Operator CPU: Timing [Backward] 21.331 ms, avg: 0.042662 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [20,3,128,128] Activation Operator CPU: Timing [Forward] 3429.19 ms, avg: 6.85837 ms X 500 passes Activation Operator CPU: Timing [Backward] 286.324 ms, avg: 0.572648 ms X 500 passes GPU === Timing: 50 iterations of 10 calls, shape = [1,1,28,28] Activation Operator GPU: Timing [Forward] 1.618 ms, avg: 0.003236 ms X 500 passes Activation Operator GPU: Timing [Backward] 1.671 ms, avg: 0.003342 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [1,3,28,28] Activation Operator GPU: Timing [Forward] 1.629 ms, avg: 0.003258 ms X 500 passes Activation Operator GPU: Timing [Backward] 1.728 ms, avg: 0.003456 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,1,18,32] Activation Operator GPU: Timing [Forward] 1.753 ms, avg: 0.003506 ms X 500 passes Activation Operator GPU: Timing [Backward] 1.756 ms, avg: 0.003512 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [50,3,18,32] Activation Operator GPU: Timing [Forward] 1.704 ms, avg: 0.003408 ms X 500 passes Activation Operator GPU: Timing [Backward] 1.791 ms, avg: 0.003582 ms X 500 passes Timing: 50 iterations of 10 calls, shape = [20,3,128,128] Activation Operator GPU: Timing [Forward] 2.032 ms, avg: 0.004064 ms X 500 passes Activation Operator GPU: Timing [Backward] 2.143 ms, avg: 0.004286 ms X 500 passes * lint * Trigger build * Trigger build * Negative begin and end support for csr slice (apache#8241) * negative index support for sparse slice * fix lint * getitem(int) for csr ndarray, support a[-1] * remove unneccessary argument * unittest and doc update * Preparing for 0.12.0.rc0: Final changes before RC (apache#8301) * Final changes before RC * Updates to NEWS.md * Updates * Enable smoothing in softmax operator (apache#8125) * v0.12 regression: Fix registration of children for Block (apache#8277) * Fix Block not registering children If the attribute was already set to something different than Block (e.g. None), it was not being registered. * fix if / elif for block children registration * trigger test * Add fix from apache#8152 * Add tests from apache#8152 * Revert "[CMAKE] Fix windows cmake build" (apache#8311) * Revert "Added my code signing key (apache#8293)" This reverts commit 22ab185. * Revert "[CMAKE] Fix windows cmake build (apache#8227)" This reverts commit 1c1c788. * fixed broken links. https was pointing to http for mxnet.io (apache#8300) * Update rnn.md (apache#8320) * fluent methods for missed ops (apache#8329) * update ps lite (apache#8327) * Fix unused type warning (apache#8316) * Trigger build * Trigger build * Misc fixes for sparse distributed training (apache#8345) * remove mshadow::range in init_op.h * add unit test * remove pass by ptr, add unit test for pull empty wieghts * fix range in key partition * remove wrong comment * remove change for partition * remove unused var * add int64 to arange. add checkpointing example * Fix the Readme (apache#8369) * Allow test to converge (apache#8351) * Allow test to converge * Trigger build * Trigger build * Trigger build * Update cudnn_algoreg-inl.h (apache#7988) * [Perl] emulate Python zip() for Perl (apache#8192) * [Perl] emulate Python zip() for Perl * [Perl] retool zip() uses away from the callback form * add profile option for frontend profiling to image script (apache#8171) * add profile option for frontend profiling to image script * Update image_classification.py * Update image_classification.py * Fix Typo (classification) (apache#8376) Fix a typo in the example readme. * do gtest test * add assert and do higher runs as performance test only (when performance test flag set) * Trigger build * lint * Trigger build * Sparse operator performance improvement (apache#8412) * sparse rsprsp perf improvements * Clean up * dtype default to source_array.dtype for sparse ndarrays (apache#8403) * derive default dtype/ctx from input for sparse ndarrays * add gpu tests * fix lint. add doc * remove default_ctx code * bug fix when passing dtype to array() * update doc * remove extra line * also check ctx * fix using default mean pixels (apache#8352) * fix gluon.data.RecordFileDataset (apache#8353) * upgrade MKL (apache#8378) * Lint fix (apache#8402) * Trigger build

solin319 added 2 commits September 22, 2017 09:08

Merge pull request #5 from apache/master

c9385c5

update

Update cudnn_algoreg-inl.h

db1f592

solin319 requested review from mli and piiswrong as code owners September 22, 2017 02:12

szha approved these changes Sep 27, 2017

View reviewed changes

szha mentioned this pull request Oct 5, 2017

How to disable MXNET_CUDNN_AUTOTUNE_DEFAULT and bucketing log message without turning off MXNET_CUDNN_AUTOTUNE_DEFAULT? #8132

Closed

jiayiliu mentioned this pull request Oct 13, 2017

windows MXNET_CUDNN_AUTOTUNE_DEFAULT = 0 no work. #7631

Open

piiswrong merged commit ad20d91 into apache:master Oct 21, 2017

cjolivier01 pushed a commit to cjolivier01/mxnet that referenced this pull request Oct 22, 2017

Update cudnn_algoreg-inl.h (apache#7988)

2bb9e94

cjolivier01 pushed a commit to cjolivier01/mxnet that referenced this pull request Oct 22, 2017

Update cudnn_algoreg-inl.h (apache#7988)

84ef5cf

solin319 deleted the solin319-patch-MXNET_CUDNN_AUTOTUNE_DEFAULT branch October 23, 2017 00:52

crazy-cat pushed a commit to crazy-cat/incubator-mxnet that referenced this pull request Oct 26, 2017

Update cudnn_algoreg-inl.h (apache#7988)

a3b4c5c

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

solve problem in print "cudnn autotune" #7988

solve problem in print "cudnn autotune" #7988

solin319 commented Sep 22, 2017 •

edited

Loading

szha left a comment

piiswrong commented Oct 11, 2017

solin319 commented Oct 11, 2017

solve problem in print "cudnn autotune" #7988

solve problem in print "cudnn autotune" #7988

Conversation

solin319 commented Sep 22, 2017 • edited Loading

szha left a comment

Choose a reason for hiding this comment

piiswrong commented Oct 11, 2017

solin319 commented Oct 11, 2017

solin319 commented Sep 22, 2017 •

edited

Loading