This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Fix performance regression in normalize operator #14055

Merged

sandeep-krishnamurthy

Description

  1. PR #13802 (Image normalize operator - GPU support, 3D/4D inputs) extended the normalize operator and reorganized it around the kernel launch/map functionality.
  2. However, PR #13802 introduced a performance regression.
  3. This PR fixes the performance issue and brings performance close to the original level.
  4. Previously, I parallelized the kernel launch over length (h*w), which added significant overhead relative to the simple operation performed in the kernel ((x - mean) / std).
  5. The main changes in this PR are: parallelizing the kernel launch over the number of channels rather than length, and using omp parallel in the kernel.
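
To illustrate the channel-wise split described in the list above, here is a minimal CPU-only sketch. It is not the MXNet kernel: `normalize_chw` and its parameters are hypothetical names, the input is assumed to be a single image in contiguous CHW layout, and the placement of the OpenMP pragma is an assumption, since the description does not show where it sits in the real kernel. Each parallel task owns one channel and runs a tight loop over its h*w elements, so per-task overhead is paid once per channel rather than once per element.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not MXNet's actual kernel): normalizes one CHW image,
// out[c][i] = (in[c][i] - mean[c]) / stddev[c].
// `length` is h * w, the number of elements in one channel.
std::vector<float> normalize_chw(const std::vector<float>& in,
                                 int channels,
                                 std::size_t length,
                                 const std::vector<float>& mean,
                                 const std::vector<float>& stddev) {
  assert(channels > 0);
  assert(in.size() == static_cast<std::size_t>(channels) * length);
  assert(mean.size() == static_cast<std::size_t>(channels));
  assert(stddev.size() == static_cast<std::size_t>(channels));

  std::vector<float> out(in.size());

  // One task per channel. Without OpenMP support the pragma is ignored
  // and the loop simply runs serially.
  #pragma omp parallel for
  for (int c = 0; c < channels; ++c) {
    // Read the per-channel constants once, outside the inner loop.
    const float m = mean[c];
    const float s = stddev[c];
    const std::size_t base = static_cast<std::size_t>(c) * length;
    for (std::size_t i = 0; i < length; ++i) {
      out[base + i] = (in[base + i] - m) / s;
    }
  }
  return out;
}
```

For a (3,300,300) image this would be called with `channels = 3` and `length = 300 * 300`, giving three tasks of 90,000 elements each rather than 90,000 separately scheduled element-level tasks.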

Before the regressing PR #13802

Total time for 50000 images of shape (3,300,300) to do normalization - 67.39s
Total time for 100000 images of shape (3,300,300) to do normalization - 134.72s

After the regressing PR #13802

Total time for 50000 images of shape (3,300,300) to do normalization - 104.09s
Total time for 100000 images of shape (3,300,300) to do normalization - 203.78s

With changes in this PR

Total time for 50000 images of shape (3,300,300) to do normalization - 68.54s
Total time for 100000 images of shape (3,300,300) to do normalization - 136.12s
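
The timings above can be put on a common scale with a little arithmetic. The sketch below uses two hypothetical helpers, `relative_slowdown` and `ms_per_image`, which are not part of the PR. Applied to the reported numbers, PR #13802 made normalization roughly 54% slower for 50000 images (104.09s vs 67.39s) and roughly 51% slower for 100000 images (203.78s vs 134.72s), while this PR lands within about 1 to 2% of the original timings.

```cpp
#include <cassert>

// Hypothetical helper: fractional slowdown of `measured_s` relative to
// `baseline_s` (0.10 means 10% slower than the baseline).
double relative_slowdown(double baseline_s, double measured_s) {
  assert(baseline_s > 0.0);
  return measured_s / baseline_s - 1.0;
}

// Hypothetical helper: average time per image in milliseconds.
double ms_per_image(double total_s, long images) {
  assert(images > 0);
  return total_s * 1000.0 / static_cast<double>(images);
}
```

For example, `ms_per_image(67.39, 50000)` works out to about 1.35 ms per image before the regression, and `ms_per_image(104.09, 50000)` to about 2.08 ms after it.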

NOTE: I have a revert PR, #14054, ready in case merging this PR is delayed.

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Code is well-documented:
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Parallelize kernel launch in normalize operator based on channel.

@nswamy @zhreshold @stu1130

@vandanavk

@mxnet-label-bot add [pr-awaiting-review, Operator]

@marcoabreu marcoabreu added the Operator and pr-awaiting-review (PR is waiting for code review) labels Feb 4, 2019
@stu1130 stu1130 left a comment

LGTM

@sandeep-krishnamurthy sandeep-krishnamurthy merged commit df4a4fd into apache:master Feb 6, 2019
stephenrawls pushed a commit to stephenrawls/incubator-mxnet that referenced this pull request Feb 16, 2019
* parallelize on channel forward pass

* parallelize on channel normalize backward pass

* Fix lint issues

* Trying to fix CI build failure on GPU

* Fix failing GPU test on CI Do not pass normalize param as is to GPU kernel

* Fix to_tensor tests

* Pass mean and std_dev as native types for kernel

* Fix CI failure. Do not pass mean, std as vector to kernel
vdantu pushed a commit to vdantu/incubator-mxnet that referenced this pull request Mar 31, 2019
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019