Fix performance regression in normalize operator #14055

sandeep-krishnamurthy · 2019-02-02T02:12:21Z

Description

In PR #13802 (Image normalize operator - GPU support, 3D/4D inputs), we added GPU and 3D/4D input support to the normalize operator and reorganized it around the kernel launch/map functionality.
However, that PR introduced a performance regression.
This PR fixes the regression and brings performance close to the original.
Earlier, I parallelized the kernel launch over length (h*w), which added significant overhead relative to the simple operation performed in the kernel, (x - mean) / std.
The main change in this PR is parallelizing the kernel launch over the number of channels rather than length, using omp parallel in the kernel.
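
To illustrate the change in work partitioning, here is a minimal pure-Python sketch, not the MXNet kernel itself. The function names, the flattened channel-major (CHW) layout, and the reading that the earlier launch used one work item per pixel position are all assumptions made for illustration. Both functions compute (x - mean) / std and differ only in which loop would be handed to the scheduler.

```python
from typing import List, Sequence


def normalize_over_length(data: Sequence[float], mean: Sequence[float],
                          std: Sequence[float], length: int) -> List[float]:
    """One work item per pixel position: `length` very small tasks."""
    channels = len(mean)
    out = [0.0] * (channels * length)
    for j in range(length):            # each j stands in for one scheduled work item
        for c in range(channels):
            idx = c * length + j
            out[idx] = (data[idx] - mean[c]) / std[c]
    return out


def normalize_over_channels(data: Sequence[float], mean: Sequence[float],
                            std: Sequence[float], length: int) -> List[float]:
    """One work item per channel: `channels` larger tasks."""
    channels = len(mean)
    out = [0.0] * (channels * length)
    for c in range(channels):          # each c stands in for one scheduled work item
        m, s = mean[c], std[c]         # per-channel constants read once
        base = c * length
        for j in range(length):        # contiguous inner loop over h*w
            out[base + j] = (data[base + j] - m) / s
    return out


# A tiny 2-channel "image" with h*w = 3, stored channel-major and flattened.
image = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]
mean = [1.0, 10.0]
std = [2.0, 10.0]
result = normalize_over_channels(image, mean, std, length=3)
```

For a (3,300,300) image the first partitioning yields 90000 work items of a few operations each, while the second yields 3 work items of 90000 elements each, so per-item scheduling overhead is paid far less often.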

Before the regressing PR #13802

Total time for 50000 images of shape (3,300,300) to do normalization - 67.39s
Total time for 100000 images of shape (3,300,300) to do normalization - 134.72s

After the regressing PR #13802

Total time for 50000 images of shape (3,300,300) to do normalization - 104.09s
Total time for 100000 images of shape (3,300,300) to do normalization - 203.78s

With changes in this PR

Total time for 50000 images of shape (3,300,300) to do normalization - 68.54s
Total time for 100000 images of shape (3,300,300) to do normalization - 136.12s
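
The three sets of timings can be compared with a few lines of arithmetic. The sketch below only uses the numbers reported in this description. The helper names are hypothetical and the benchmark script itself is not shown in this PR.

```python
# Timings in seconds, copied from the three measurements in this description.
timings = {
    50_000: {"before": 67.39, "regressed": 104.09, "fixed": 68.54},
    100_000: {"before": 134.72, "regressed": 203.78, "fixed": 136.12},
}


def slowdown_pct(baseline: float, measured: float) -> float:
    """Percentage by which `measured` is slower than `baseline`."""
    return (measured / baseline - 1.0) * 100.0


def per_image_ms(total_seconds: float, num_images: int) -> float:
    """Average time per image in milliseconds."""
    return total_seconds / num_images * 1000.0


summary = {}
for num_images, t in timings.items():
    summary[num_images] = {
        "regressed_pct": slowdown_pct(t["before"], t["regressed"]),
        "fixed_pct": slowdown_pct(t["before"], t["fixed"]),
        "fixed_ms_per_image": per_image_ms(t["fixed"], num_images),
    }
    s = summary[num_images]
    print(f"{num_images} images: regressed {s['regressed_pct']:+.1f}%, "
          f"fixed {s['fixed_pct']:+.1f}%, "
          f"{s['fixed_ms_per_image']:.2f} ms/image after the fix")
```

By this arithmetic the regression was roughly 51 to 54 percent slower than the baseline, and the fix lands within about 2 percent of it.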

NOTE: I have a revert PR #14054 ready in case merging this PR is delayed.

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

Changes are complete (i.e. I finished coding on this PR)
All changes have test coverage:
Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
Code is well-documented:
To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

Parallelize kernel launch in normalize operator based on channel.

@nswamy @zhreshold @stu1130

vandanavk · 2019-02-04T23:00:31Z

@mxnet-label-bot add [pr-awaiting-review, Operator]

stu1130

LGTM

* parallelize on channel forward pass
* parallelize on channel normalize backward pass
* Fix lint issues
* Trying to fix CI build failure on GPU
* Fix failing GPU test on CI Do not pass normalize param as is to GPU kernel
* Fix to_tensor tests
* Pass mean and std_dev as native types for kernel
* Fix CI failure. Do not pass mean, std as vector to kernel

sandeep-krishnamurthy mentioned this pull request Feb 2, 2019

Revert "Image normalize operator - GPU support, 3D/4D inputs (#13802)" #14054

Closed

sandeep-krishnamurthy force-pushed the fix_normalize_perf branch from 46d1fa7 to 822d806 on February 4, 2019 18:09

zhreshold approved these changes Feb 4, 2019


sandeep-krishnamurthy force-pushed the fix_normalize_perf branch from 822d806 to b590d41 on February 4, 2019 22:26

vandanavk approved these changes Feb 4, 2019


marcoabreu added the Operator and pr-awaiting-review labels Feb 4, 2019

stu1130 approved these changes Feb 4, 2019


sandeep-krishnamurthy added 3 commits February 4, 2019 15:12

parallelize on channel forward pass

3a04285

parallelize on channel normalize backward pass

9fe52f0

Fix lint issues

3b6164f

sandeep-krishnamurthy force-pushed the fix_normalize_perf branch from b590d41 to 3b6164f on February 4, 2019 23:13

sandeep-krishnamurthy added 5 commits February 4, 2019 17:13

Trying to fix CI build failure on GPU

67cde94

Fix failing GPU test on CI Do not pass normalize param as is to GPU kernel

cdc08de

Fix to_tensor tests

21df212

Pass mean and std_dev as native types for kernel

ba4c161

Fix CI failure. Do not pass mean, std as vector to kernel

c6c6829

sandeep-krishnamurthy merged commit df4a4fd into apache:master Feb 6, 2019
