Commits on Oct 16, 2017
-
CPU optimization for ActivationOp
Significant improvement on CPU (roughly 6x to 10x on the larger shapes in the timings below). Very slight improvement on GPU.

OLD MSHADOW APPROACH
--------------------

CPU
===
Timing: 50 iterations of 10 calls, shape = [1,1,28,28]
Activation Operator CPU: Timing [Forward] 18.948 ms, avg: 0.037896 ms X 500 passes
Activation Operator CPU: Timing [Backward] 1.658 ms, avg: 0.003316 ms X 500 passes
Timing: 50 iterations of 10 calls, shape = [1,3,28,28]
Activation Operator CPU: Timing [Forward] 57.973 ms, avg: 0.115946 ms X 500 passes
Activation Operator CPU: Timing [Backward] 4.748 ms, avg: 0.009496 ms X 500 passes
Timing: 50 iterations of 10 calls, shape = [50,1,18,32]
Activation Operator CPU: Timing [Forward] 703.446 ms, avg: 1.40689 ms X 500 passes
Activation Operator CPU: Timing [Backward] 56.255 ms, avg: 0.11251 ms X 500 passes
Timing: 50 iterations of 10 calls, shape = [50,3,18,32]
Activation Operator CPU: Timing [Forward] 2107.77 ms, avg: 4.21554 ms X 500 passes
Activation Operator CPU: Timing [Backward] 168.483 ms, avg: 0.336966 ms X 500 passes
Timing: 50 iterations of 10 calls, shape = [20,3,128,128]
Activation Operator CPU: Timing [Forward] 24122.2 ms, avg: 48.2443 ms X 500 passes
Activation Operator CPU: Timing [Backward] 1908.7 ms, avg: 3.8174 ms X 500 passes

GPU
===
Timing: 50 iterations of 10 calls, shape = [1,1,28,28]
Activation Operator GPU: Timing [Forward] 1.637 ms, avg: 0.003274 ms X 500 passes
Activation Operator GPU: Timing [Backward] 1.665 ms, avg: 0.00333 ms X 500 passes
Timing: 50 iterations of 10 calls, shape = [1,3,28,28]
Activation Operator GPU: Timing [Forward] 1.562 ms, avg: 0.003124 ms X 500 passes
Activation Operator GPU: Timing [Backward] 1.661 ms, avg: 0.003322 ms X 500 passes
Timing: 50 iterations of 10 calls, shape = [50,1,18,32]
Activation Operator GPU: Timing [Forward] 1.635 ms, avg: 0.00327 ms X 500 passes
Activation Operator GPU: Timing [Backward] 1.702 ms, avg: 0.003404 ms X 500 passes
Timing: 50 iterations of 10 calls, shape = [50,3,18,32]
Activation Operator GPU: Timing [Forward] 1.83 ms, avg: 0.00366 ms X 500 passes
Activation Operator GPU: Timing [Backward] 2.041 ms, avg: 0.004082 ms X 500 passes
Timing: 50 iterations of 10 calls, shape = [20,3,128,128]
Activation Operator GPU: Timing [Forward] 2.08 ms, avg: 0.00416 ms X 500 passes
Activation Operator GPU: Timing [Backward] 2.688 ms, avg: 0.005376 ms X 500 passes

NEW MXNET_OP APPROACH
---------------------

CPU
===
Timing: 50 iterations of 10 calls, shape = [1,1,28,28]
Activation Operator CPU: Timing [Forward] 80.748 ms, avg: 0.161496 ms X 500 passes
Activation Operator CPU: Timing [Backward] 1.176 ms, avg: 0.002352 ms X 500 passes
Timing: 50 iterations of 10 calls, shape = [1,3,28,28]
Activation Operator CPU: Timing [Forward] 7.881 ms, avg: 0.015762 ms X 500 passes
Activation Operator CPU: Timing [Backward] 2.181 ms, avg: 0.004362 ms X 500 passes
Timing: 50 iterations of 10 calls, shape = [50,1,18,32]
Activation Operator CPU: Timing [Forward] 111.48 ms, avg: 0.22296 ms X 500 passes
Activation Operator CPU: Timing [Backward] 5.408 ms, avg: 0.010816 ms X 500 passes
Timing: 50 iterations of 10 calls, shape = [50,3,18,32]
Activation Operator CPU: Timing [Forward] 333.439 ms, avg: 0.666878 ms X 500 passes
Activation Operator CPU: Timing [Backward] 21.331 ms, avg: 0.042662 ms X 500 passes
Timing: 50 iterations of 10 calls, shape = [20,3,128,128]
Activation Operator CPU: Timing [Forward] 3429.19 ms, avg: 6.85837 ms X 500 passes
Activation Operator CPU: Timing [Backward] 286.324 ms, avg: 0.572648 ms X 500 passes

GPU
===
Timing: 50 iterations of 10 calls, shape = [1,1,28,28]
Activation Operator GPU: Timing [Forward] 1.618 ms, avg: 0.003236 ms X 500 passes
Activation Operator GPU: Timing [Backward] 1.671 ms, avg: 0.003342 ms X 500 passes
Timing: 50 iterations of 10 calls, shape = [1,3,28,28]
Activation Operator GPU: Timing [Forward] 1.629 ms, avg: 0.003258 ms X 500 passes
Activation Operator GPU: Timing [Backward] 1.728 ms, avg: 0.003456 ms X 500 passes
Timing: 50 iterations of 10 calls, shape = [50,1,18,32]
Activation Operator GPU: Timing [Forward] 1.753 ms, avg: 0.003506 ms X 500 passes
Activation Operator GPU: Timing [Backward] 1.756 ms, avg: 0.003512 ms X 500 passes
Timing: 50 iterations of 10 calls, shape = [50,3,18,32]
Activation Operator GPU: Timing [Forward] 1.704 ms, avg: 0.003408 ms X 500 passes
Activation Operator GPU: Timing [Backward] 1.791 ms, avg: 0.003582 ms X 500 passes
Timing: 50 iterations of 10 calls, shape = [20,3,128,128]
Activation Operator GPU: Timing [Forward] 2.032 ms, avg: 0.004064 ms X 500 passes
Activation Operator GPU: Timing [Backward] 2.143 ms, avg: 0.004286 ms X 500 passes
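To make the CPU numbers in this commit message easier to compare, the sketch below divides the old per-pass average by the new one for each shape. The averages are copied from the timings in the message; the dictionary layout and the `speedup` helper are illustrative names, not part of MXNet.

```python
# Average per-pass CPU times in milliseconds, copied from the commit message.
# Keys are (shape, pass); values are (old mshadow avg, new mxnet_op avg).
cpu_avg_ms = {
    ("[1,1,28,28]", "forward"): (0.037896, 0.161496),
    ("[1,1,28,28]", "backward"): (0.003316, 0.002352),
    ("[1,3,28,28]", "forward"): (0.115946, 0.015762),
    ("[1,3,28,28]", "backward"): (0.009496, 0.004362),
    ("[50,1,18,32]", "forward"): (1.40689, 0.22296),
    ("[50,1,18,32]", "backward"): (0.11251, 0.010816),
    ("[50,3,18,32]", "forward"): (4.21554, 0.666878),
    ("[50,3,18,32]", "backward"): (0.336966, 0.042662),
    ("[20,3,128,128]", "forward"): (48.2443, 6.85837),
    ("[20,3,128,128]", "backward"): (3.8174, 0.572648),
}


def speedup(old_ms, new_ms):
    """Return how many times faster the new timing is (values below 1 mean slower)."""
    return old_ms / new_ms


speedups = {key: speedup(*pair) for key, pair in cpu_avg_ms.items()}

for (shape, direction), ratio in speedups.items():
    print(f"{shape:>15} {direction:<8} {ratio:5.2f}x")
```

A ratio below 1 marks a case where the new measurement is slower than the old one, which is true of the first forward measurement in the list.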
Olivier committed Oct 16, 2017 (commit 3bf48b8)
-
Olivier committed Oct 16, 2017 (commit 40b13f1)
Commits on Oct 17, 2017
-
Commit 6d4a2bb
-
Commit df49eae
-
Commit 5c92218
Commits on Oct 18, 2017
-
Commit db2767d
-
Negative begin and end support for csr slice (apache#8241)
* negative index support for sparse slice
* fix lint
* getitem(int) for csr ndarray, support a[-1]
* remove unnecessary argument
* unittest and doc update
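As an illustration of what negative begin and end values and `a[-1]` usually mean, here is a minimal sketch that maps negative positions onto a row count using Python's list-slicing conventions. It is not the MXNet implementation; `normalize_slice` and `normalize_index` are hypothetical helper names, and treating out-of-range bounds by clamping is an assumption borrowed from Python lists.

```python
def normalize_slice(begin, end, length):
    """Map possibly negative (or missing) begin/end onto the range [0, length]."""

    def norm(position, default):
        if position is None:
            return default
        if position < 0:
            position += length  # -1 refers to the last row
        return max(0, min(position, length))  # clamp, as Python lists do

    start = norm(begin, 0)
    stop = norm(end, length)
    return start, max(start, stop)  # never return a reversed range


def normalize_index(index, length):
    """Map a single possibly negative index to a non-negative one, or raise."""
    if index < 0:
        index += length
    if not 0 <= index < length:
        raise IndexError("index out of range")
    return index


rows = list(range(10))
start, stop = normalize_slice(-3, -1, len(rows))
print(rows[start:stop] == rows[-3:-1])  # same rows as Python's own slicing
print(normalize_index(-1, len(rows)))   # position of the last row
```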
Commit bf58bee
-
Preparing for 0.12.0.rc0: Final changes before RC (apache#8301)
* Final changes before RC
* Updates to NEWS.md
* Updates
Commit 4ecb763
-
Commit 618c2cc
-
v0.12 regression: Fix registration of children for Block (apache#8277)
* Fix Block not registering children
  If the attribute was already set to something different than Block (e.g. None), it was not being registered.
* fix if / elif for block children registration
* trigger test
* Add fix from apache#8152
* Add tests from apache#8152
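The failure mode described in this message can be sketched with a toy class. This is a simplified stand-in, not Gluon's `Block`: the `_children` list and the `__setattr__` logic are assumptions made for illustration. The point is that the check looks at the new value, so a child is registered even when the attribute previously held `None`.

```python
class Block:
    """Toy container that tracks child Blocks assigned as attributes."""

    def __init__(self):
        # Bypass the hook below for the bookkeeping list itself.
        object.__setattr__(self, "_children", [])

    def __setattr__(self, name, value):
        previous = self.__dict__.get(name)
        if isinstance(previous, Block):
            # Drop the old child so a replaced attribute is not tracked twice.
            self._children[:] = [c for c in self._children if c is not previous]
        if isinstance(value, Block):
            # Register based on the new value, whatever the attribute held before.
            self._children.append(value)
        object.__setattr__(self, name, value)


class Net(Block):
    def __init__(self):
        super().__init__()
        self.body = None      # placeholder first, as in the bug report
        self.body = Block()   # must still be registered as a child


net = Net()
print(len(net._children))  # number of registered children
```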
Commit cc93069
-
Revert "[CMAKE] Fix windows cmake build" (apache#8311)
* Revert "Added my code signing key (apache#8293)"
  This reverts commit 22ab185.
* Revert "[CMAKE] Fix windows cmake build (apache#8227)"
  This reverts commit 1c1c788.
Commit 8730f7a
-
Commit 252227e
-
Commit 310bbeb
-
Commit 83e96a9
-
Commit dc4c3c8
-
Commit 28b76e3
Commits on Oct 20, 2017
-
Olivier committed Oct 20, 2017 (commit 55068f7)
Commits on Oct 21, 2017
-
Commit 4065639
Commits on Oct 22, 2017
-
Misc fixes for sparse distributed training (apache#8345)
* remove mshadow::range in init_op.h
* add unit test
* remove pass by ptr, add unit test for pull empty weights
* fix range in key partition
* remove wrong comment
* remove change for partition
* remove unused var
* add int64 to arange. add checkpointing example
Commit 2cf83cb
-
Commit f4c57aa
-
Allow test to converge (apache#8351)
* Allow test to converge
* Trigger build
* Trigger build
* Trigger build
Commit 68ea95f
-
Commit 2bb9e94
-
[Perl] emulate Python zip() for Perl (apache#8192)
* [Perl] emulate Python zip() for Perl
* [Perl] retool zip() uses away from the callback form
Commit 52adc56
-
add profile option for frontend profiling to image script (apache#8171)
* add profile option for frontend profiling to image script
* Update image_classification.py
* Update image_classification.py
Commit fa80a31
-
Fix Typo (classification) (apache#8376)
Fix a typo in the example readme.
Commit 9795461
-
Commit d60707c