Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Fix OpPerf in Master #17735

Merged
merged 3 commits into from
Mar 9, 2020
Merged

Fix OpPerf in Master #17735

merged 3 commits into from
Mar 9, 2020

Conversation

ChaiBapchya
Copy link
Contributor

@ChaiBapchya ChaiBapchya commented Mar 2, 2020

Description

Change 1 (Fix for rmsprop_update and rmsprop_alex_update)

After merging PR #17449 and #17400
refactor of optimizer was incomplete due to both PRs not knowing changes made by each other.

While #17449 added set of variables for large tensor, #17400 refactored 2 variables from gamma1 gamma2 to rho and momentum

Fixing that conflict here

Change 2 (Fix for BatchNorm)

Upon running entire opperf suite for CUDA=ON, CUDNN=ON, it was found BatchNorm fails here

<function BatchNorm at 0x7f46869e3bf8>
Traceback (most recent call last):
  File "incubator-mxnet/benchmark/opperf/opperf.py", line 213, in <module>
    sys.exit(main())
  File "incubator-mxnet/benchmark/opperf/opperf.py", line 193, in main
    benchmark_results = run_all_mxnet_operator_benchmarks(ctx=ctx, dtype=dtype, profiler=profiler, int64_tensor=int64_tensor, warmup=warmup, runs=runs)
  File "incubator-mxnet/benchmark/opperf/opperf.py", line 99, in run_all_mxnet_operator_benchmarks
    mxnet_operator_benchmark_results.append(run_nn_basic_operators_benchmarks(ctx=ctx, dtype=dtype, profiler=profiler, int64_tensor=int64_tensor, warmup=warmup, runs=runs))
  File "/home/ubuntu/incubator-mxnet/benchmark/opperf/nd_operations/nn_basic_operators.py", line 143, in run_nn_basic_operators_benchmarks
    mx_nn_basic_op_results = run_op_benchmarks(mx_nn_basic_ops, dtype, ctx, profiler, int64_tensor, warmup, runs)
  File "/home/ubuntu/incubator-mxnet/benchmark/opperf/utils/benchmark_utils.py", line 211, in run_op_benchmarks
    warmup=warmup, runs=runs)
  File "/home/ubuntu/incubator-mxnet/benchmark/opperf/utils/benchmark_utils.py", line 178, in run_performance_test
    benchmark_result = _run_nd_operator_performance_test(op, inputs, run_backward, warmup, runs, kwargs_list, profiler)
  File "/home/ubuntu/incubator-mxnet/benchmark/opperf/utils/benchmark_utils.py", line 115, in _run_nd_operator_performance_test
    _, _ = benchmark_helper_func(op, warmup, **kwargs_list[0])
  File "/home/ubuntu/incubator-mxnet/benchmark/opperf/utils/profiler_utils.py", line 200, in cpp_profile_it
    res = func(*args, **kwargs)
  File "/home/ubuntu/incubator-mxnet/benchmark/opperf/utils/ndarray_utils.py", line 60, in nd_forward_backward_and_profile
    nd.waitall()
  File "/home/ubuntu/incubator-mxnet/python/mxnet/ndarray/ndarray.py", line 206, in waitall
    check_call(_LIB.MXNDArrayWaitAll())
  File "/home/ubuntu/incubator-mxnet/python/mxnet/base.py", line 246, in check_call
    raise get_last_ffi_error()
mxnet.base.MXNetError: Traceback (most recent call last):
  File "/home/ubuntu/incubator-mxnet/src/operator/nn/./cudnn/cudnn_batch_norm-inl.h", line 62
MXNetError: Check failed: param.eps >= 1e-5 (1e-08 vs. 1e-05) : CuDNN requires eps to be no less than 1e-05

Change 3 Fix for lamb_update_*

wd parameter was incorrectly omitted from default_params
Added it back

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Code is well-documented:
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

@ChaiBapchya
Copy link
Contributor Author

Ran the entire opperf suite after making the changes

Command

python incubator-mxnet/benchmark/opperf/opperf.py --ctx gpu --output-format md --output-file mxnet_opperf_gpu_fix_opperf_lamb.md --warmup 1 --runs 1
python incubator-mxnet/benchmark/opperf/opperf.py --ctx cpu --output-format md --output-file mxnet_opperf_cpu_fix_opperf_lamb.md --warmup 1 --runs 1

CPU & GPU

https://gist.github.com/ChaiBapchya/517445961bcd57da6b7b5d23fa5d3dc0

@leezu
Copy link
Contributor

leezu commented Mar 2, 2020

Restarted windows CI

Copy link
Contributor

@connorgoggins connorgoggins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for your help rectifying these issues.

@ChaiBapchya
Copy link
Contributor Author

@mxnet-label-bot update [pr-awaiting-merge]

@lanking520 lanking520 added the pr-awaiting-merge Review and CI is complete. Ready to Merge label Mar 9, 2020
@leezu leezu merged commit 0aa2c78 into apache:master Mar 9, 2020
MoisesHer pushed a commit to MoisesHer/incubator-mxnet that referenced this pull request Apr 10, 2020
* cudnn expects eps to be >e105 and gamma1,gamma2 renamed in other PR

* fix lamb_update_* ops

* add var name for wd
anirudh2290 pushed a commit to anirudh2290/mxnet that referenced this pull request May 29, 2020
* cudnn expects eps to be >e105 and gamma1,gamma2 renamed in other PR

* fix lamb_update_* ops

* add var name for wd
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
pr-awaiting-merge Review and CI is complete. Ready to Merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants