This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

CPU optimization for ActivationOp #8296

Merged: 26 commits, Oct 23, 2017

Commits on Oct 16, 2017

  1. CPU optimization for ActivationOp

    Significant improvement on CPU (about 6x to 10x on the batched shapes in the timings below, on both forward and backward passes).
    Very slight improvement on GPU.
    
    OLD MSHADOW APPROACH
    --------------------
    
    CPU
    ===
    
    Timing: 50 iterations of 10 calls, shape = [1,1,28,28]
    Activation Operator CPU:  Timing [Forward] 18.948 ms, avg: 0.037896 ms X 500 passes
    Activation Operator CPU:  Timing [Backward] 1.658 ms, avg: 0.003316 ms X 500 passes
    
    Timing: 50 iterations of 10 calls, shape = [1,3,28,28]
    Activation Operator CPU:  Timing [Forward] 57.973 ms, avg: 0.115946 ms X 500 passes
    Activation Operator CPU:  Timing [Backward] 4.748 ms, avg: 0.009496 ms X 500 passes
    
    Timing: 50 iterations of 10 calls, shape = [50,1,18,32]
    Activation Operator CPU:  Timing [Forward] 703.446 ms, avg: 1.40689 ms X 500 passes
    Activation Operator CPU:  Timing [Backward] 56.255 ms, avg: 0.11251 ms X 500 passes
    
    Timing: 50 iterations of 10 calls, shape = [50,3,18,32]
    Activation Operator CPU:  Timing [Forward] 2107.77 ms, avg: 4.21554 ms X 500 passes
    Activation Operator CPU:  Timing [Backward] 168.483 ms, avg: 0.336966 ms X 500 passes
    
    Timing: 50 iterations of 10 calls, shape = [20,3,128,128]
    Activation Operator CPU:  Timing [Forward] 24122.2 ms, avg: 48.2443 ms X 500 passes
    Activation Operator CPU:  Timing [Backward] 1908.7 ms, avg: 3.8174 ms X 500 passes
    
    GPU
    ===
    
    Timing: 50 iterations of 10 calls, shape = [1,1,28,28]
    Activation Operator GPU:  Timing [Forward] 1.637 ms, avg: 0.003274 ms X 500 passes
    Activation Operator GPU:  Timing [Backward] 1.665 ms, avg: 0.00333 ms X 500 passes
    
    Timing: 50 iterations of 10 calls, shape = [1,3,28,28]
    Activation Operator GPU:  Timing [Forward] 1.562 ms, avg: 0.003124 ms X 500 passes
    Activation Operator GPU:  Timing [Backward] 1.661 ms, avg: 0.003322 ms X 500 passes
    
    Timing: 50 iterations of 10 calls, shape = [50,1,18,32]
    Activation Operator GPU:  Timing [Forward] 1.635 ms, avg: 0.00327 ms X 500 passes
    Activation Operator GPU:  Timing [Backward] 1.702 ms, avg: 0.003404 ms X 500 passes
    
    Timing: 50 iterations of 10 calls, shape = [50,3,18,32]
    Activation Operator GPU:  Timing [Forward] 1.83 ms, avg: 0.00366 ms X 500 passes
    Activation Operator GPU:  Timing [Backward] 2.041 ms, avg: 0.004082 ms X 500 passes
    
    Timing: 50 iterations of 10 calls, shape = [20,3,128,128]
    Activation Operator GPU:  Timing [Forward] 2.08 ms, avg: 0.00416 ms X 500 passes
    Activation Operator GPU:  Timing [Backward] 2.688 ms, avg: 0.005376 ms X 500 passes
    
    NEW MXNET_OP APPROACH
    ---------------------
    
    CPU
    ===
    
    Timing: 50 iterations of 10 calls, shape = [1,1,28,28]
    Activation Operator CPU:  Timing [Forward] 80.748 ms, avg: 0.161496 ms X 500 passes
    Activation Operator CPU:  Timing [Backward] 1.176 ms, avg: 0.002352 ms X 500 passes
    
    Timing: 50 iterations of 10 calls, shape = [1,3,28,28]
    Activation Operator CPU:  Timing [Forward] 7.881 ms, avg: 0.015762 ms X 500 passes
    Activation Operator CPU:  Timing [Backward] 2.181 ms, avg: 0.004362 ms X 500 passes
    
    Timing: 50 iterations of 10 calls, shape = [50,1,18,32]
    Activation Operator CPU:  Timing [Forward] 111.48 ms, avg: 0.22296 ms X 500 passes
    Activation Operator CPU:  Timing [Backward] 5.408 ms, avg: 0.010816 ms X 500 passes
    
    Timing: 50 iterations of 10 calls, shape = [50,3,18,32]
    Activation Operator CPU:  Timing [Forward] 333.439 ms, avg: 0.666878 ms X 500 passes
    Activation Operator CPU:  Timing [Backward] 21.331 ms, avg: 0.042662 ms X 500 passes
    
    Timing: 50 iterations of 10 calls, shape = [20,3,128,128]
    Activation Operator CPU:  Timing [Forward] 3429.19 ms, avg: 6.85837 ms X 500 passes
    Activation Operator CPU:  Timing [Backward] 286.324 ms, avg: 0.572648 ms X 500 passes
    
    GPU
    ===
    
    Timing: 50 iterations of 10 calls, shape = [1,1,28,28]
    Activation Operator GPU:  Timing [Forward] 1.618 ms, avg: 0.003236 ms X 500 passes
    Activation Operator GPU:  Timing [Backward] 1.671 ms, avg: 0.003342 ms X 500 passes
    
    Timing: 50 iterations of 10 calls, shape = [1,3,28,28]
    Activation Operator GPU:  Timing [Forward] 1.629 ms, avg: 0.003258 ms X 500 passes
    Activation Operator GPU:  Timing [Backward] 1.728 ms, avg: 0.003456 ms X 500 passes
    
    Timing: 50 iterations of 10 calls, shape = [50,1,18,32]
    Activation Operator GPU:  Timing [Forward] 1.753 ms, avg: 0.003506 ms X 500 passes
    Activation Operator GPU:  Timing [Backward] 1.756 ms, avg: 0.003512 ms X 500 passes
    
    Timing: 50 iterations of 10 calls, shape = [50,3,18,32]
    Activation Operator GPU:  Timing [Forward] 1.704 ms, avg: 0.003408 ms X 500 passes
    Activation Operator GPU:  Timing [Backward] 1.791 ms, avg: 0.003582 ms X 500 passes
    
    Timing: 50 iterations of 10 calls, shape = [20,3,128,128]
    Activation Operator GPU:  Timing [Forward] 2.032 ms, avg: 0.004064 ms X 500 passes
    Activation Operator GPU:  Timing [Backward] 2.143 ms, avg: 0.004286 ms X 500 passes
    Olivier committed Oct 16, 2017
    3bf48b8
  2. lint

    Olivier committed Oct 16, 2017
    40b13f1
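
The CPU timings quoted in the "CPU optimization for ActivationOp" commit message can be turned into per-pass averages and old/new speedup ratios with a few lines of arithmetic. The sketch below only re-derives numbers from the totals printed in that message; the names `OLD_CPU`, `NEW_CPU`, `avg_ms` and `speedups` are illustrative and are not part of MXNet or of the benchmark harness used in the PR.

```python
# Total times in ms over 500 passes, copied from the commit message.
# Keys are the benchmark shapes; values are (forward_ms, backward_ms).
OLD_CPU = {
    (1, 1, 28, 28): (18.948, 1.658),
    (1, 3, 28, 28): (57.973, 4.748),
    (50, 1, 18, 32): (703.446, 56.255),
    (50, 3, 18, 32): (2107.77, 168.483),
    (20, 3, 128, 128): (24122.2, 1908.7),
}
NEW_CPU = {
    (1, 1, 28, 28): (80.748, 1.176),
    (1, 3, 28, 28): (7.881, 2.181),
    (50, 1, 18, 32): (111.48, 5.408),
    (50, 3, 18, 32): (333.439, 21.331),
    (20, 3, 128, 128): (3429.19, 286.324),
}

# "50 iterations of 10 calls" gives the 500 passes used for the averages.
PASSES = 50 * 10


def avg_ms(total_ms, passes=PASSES):
    """Average time per pass in ms, as in the 'avg:' column."""
    return total_ms / passes


def speedups(old, new):
    """Map each shape to (forward, backward) old/new time ratios."""
    return {
        shape: (old[shape][0] / new[shape][0], old[shape][1] / new[shape][1])
        for shape in old
    }


if __name__ == "__main__":
    for shape, (fwd, bwd) in speedups(OLD_CPU, NEW_CPU).items():
        print(f"{list(shape)}: forward {fwd:.2f}x, backward {bwd:.2f}x")
```

A ratio above 1 means the new approach was faster. For shape [1,1,28,28] the forward ratio comes out below 1, because the new forward total (80.748 ms) is larger than the old one (18.948 ms); the commit message does not explain this outlier.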

Commits on Oct 17, 2017

  1. Trigger build

    cjolivier01 committed Oct 17, 2017
    6d4a2bb
  2. df49eae
  3. 5c92218

Commits on Oct 18, 2017

  1. Trigger build

    cjolivier01 committed Oct 18, 2017
    db2767d
  2. Negative begin and end support for csr slice (apache#8241)

    * negative index support for sparse slice
    
    * fix lint
    
    * getitem(int) for csr ndarray, support a[-1]
    
    * remove unnecessary argument
    
    * unittest and doc update
    ZiyueHuang authored and Olivier committed Oct 18, 2017
    bf58bee
  3. Preparing for 0.12.0.rc0: Final changes before RC (apache#8301)

    * Final changes before RC
    
    * Updates to NEWS.md
    
    * Updates
    mbaijal authored and Olivier committed Oct 18, 2017
    4ecb763
  4. Enable smoothing in softmax operator (apache#8125)

    KellenSunderland authored and Olivier committed Oct 18, 2017
    618c2cc
  5. v0.12 regression: Fix registration of children for Block (apache#8277)

    * Fix Block not registering children
    
    If the attribute was already set to something different than Block (e.g. None),
    it was not being registered.
    
    * fix if / elif for block children registration
    
    * trigger test
    
    * Add fix from apache#8152
    
    * Add tests from apache#8152
    leezu authored and Olivier committed Oct 18, 2017
    cc93069
  6. Revert "[CMAKE] Fix windows cmake build" (apache#8311)

    * Revert "Added my code signing key (apache#8293)"
    
    This reverts commit 22ab185.
    
    * Revert "[CMAKE] Fix windows cmake build (apache#8227)"
    
    This reverts commit 1c1c788.
    cjolivier01 authored and Olivier committed Oct 18, 2017
    8730f7a
  7. 252227e
  8. Update rnn.md (apache#8320)

    szha authored and Olivier committed Oct 18, 2017
    310bbeb
  9. fluent methods for missed ops (apache#8329)

    szha authored and Olivier committed Oct 18, 2017
    83e96a9
  10. update ps lite (apache#8327)

    piiswrong authored and Olivier committed Oct 18, 2017
    dc4c3c8
  11. Fix unused type warning (apache#8316)

    cjolivier01 authored and Olivier committed Oct 18, 2017
    28b76e3

Commits on Oct 20, 2017

  1. Trigger build

    Olivier committed Oct 20, 2017
    55068f7

Commits on Oct 21, 2017

  1. Trigger build

    cjolivier01 committed Oct 21, 2017
    4065639

Commits on Oct 22, 2017

  1. Misc fixes for sparse distributed training (apache#8345)

    * remove mshadow::range in init_op.h
    
    * add unit test
    
    * remove pass by ptr, add unit test for pull empty weights
    
    * fix range in key partition
    
    * remove wrong comment
    
    * remove change for partition
    
    * remove unused var
    
    * add int64 to arange. add checkpointing example
    eric-haibin-lin authored and cjolivier01 committed Oct 22, 2017
    2cf83cb
  2. Fix the Readme (apache#8369)

    mbaijal authored and cjolivier01 committed Oct 22, 2017
    f4c57aa
  3. Allow test to converge (apache#8351)

    * Allow test to converge
    
    * Trigger build
    
    * Trigger build
    
    * Trigger build
    cjolivier01 committed Oct 22, 2017
    68ea95f
  4. 2bb9e94
  5. [Perl] emulate Python zip() for Perl (apache#8192)

    * [Perl] emulate Python zip() for Perl
    
    * [Perl] retool zip() uses away from the callback form
    tlby authored and cjolivier01 committed Oct 22, 2017
    52adc56
  6. add profile option for frontend profiling to image script (apache#8171)

    * add profile option for frontend profiling to image script
    
    * Update image_classification.py
    
    * Update image_classification.py
    szha authored and cjolivier01 committed Oct 22, 2017
    fa80a31
  7. Fix Typo (classification) (apache#8376)

    Fix a typo in the example readme.
    0x6a62 authored and cjolivier01 committed Oct 22, 2017
    9795461
  8. d60707c