apache · apeforest · Mar 10, 2020 · Jan 30, 2020
diff --git a/benchmark/opperf/README.md b/benchmark/opperf/README.md
@@ -50,7 +50,8 @@ Hence, in this utility, we will build the functionality to allow users and devel
 Provided you have MXNet installed (any version >= 1.5.1), all you need to use opperf utility is to add path to your cloned MXNet repository to the PYTHONPATH.
 
 Note: 
-To install MXNet, refer [Installing MXNet page](https://mxnet.apache.org/versions/master/install/index.html)
+1. Currently, the opperf utility requires a cloned MXNet repository. It is not yet supported with the PyPI binary. [Work in Progress]
+2. To install MXNet, refer to the [Installing MXNet page](https://mxnet.apache.org/versions/master/install/index.html)
 
 ```
 export PYTHONPATH=$PYTHONPATH:/path/to/incubator-mxnet/
@@ -72,6 +73,9 @@ python incubator-mxnet/benchmark/opperf/opperf.py --output-format json --output-
 
 3. **dtype** : By default, `float32`. You can override and set the global dtype for all operator benchmarks. Example: --dtype float64.
 
+4. **profiler** : `native` or `python`. By default, `native`. You can override and set the global profiler for all operator benchmarks. Example: --profiler 'python'.
+The native profiler uses MXNet's built-in C++ profiler. The Python profiler uses Python's `time` package. Generally, the native profiler is used by developers and the Python profiler is used by users.
+
 ## Usecase 2 - Run benchmarks for all the operators in a specific category
 
 For example, you want to run benchmarks for all NDArray Broadcast Binary Operators, Ex: broadcast_add, broadcast_mod, broadcast_pow etc., You just run the following python script.
@@ -117,6 +121,7 @@ add_res = run_performance_test(nd.add, run_backward=True, dtype='float32', ctx=m
                                inputs=[{"lhs": (1024, 1024),
                                         "rhs": (1024, 1024)}],
                                warmup=10, runs=25)
+print(add_res)
 ```
 
 Output for the above benchmark run, on a CPU machine, would look something like below:
@@ -143,6 +148,7 @@ add_res = run_performance_test([nd.add, nd.subtract], run_backward=True, dtype='
                                inputs=[{"lhs": (1024, 1024),
                                         "rhs": (1024, 1024)}],
                                warmup=10, runs=25)
+print(add_res)
 ```
 
 Output for the above benchmark run, on a CPU machine, would look something like below:
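Note on the `--profiler python` option added in the hunk above: the README says the Python profiler relies on Python's `time` package. The sketch below only illustrates that general idea (untimed warm-up calls followed by timed runs) and is not opperf's actual implementation. The helper name `time_callable` and the keys of the returned dictionary are hypothetical; `warmup` and `runs` mirror the arguments shown in the `run_performance_test` examples.

```python
import time


def time_callable(fn, *args, warmup=10, runs=25, **kwargs):
    """Hypothetical helper: wall-clock timing of `fn` over `runs` calls."""
    # Warm-up calls are executed but not timed.
    for _ in range(warmup):
        fn(*args, **kwargs)

    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(*args, **kwargs)
        # Convert elapsed seconds to milliseconds.
        samples_ms.append((time.perf_counter() - start) * 1000.0)

    return {
        "runs": runs,
        "avg_time_ms": sum(samples_ms) / len(samples_ms),
        "max_time_ms": max(samples_ms),
    }


if __name__ == "__main__":
    # Stand-in workload; this sketch does not import or call MXNet.
    print(time_callable(sum, range(100_000), warmup=2, runs=5))
```

When timing a real MXNet operator this way, keep in mind that MXNet executes operations asynchronously, so the timed callable would likely need to block until the result is ready (for example by calling `mx.nd.waitall()`); otherwise only the cost of enqueuing the operation is measured.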

diff --git a/benchmark/opperf/nd_operations/README.md b/benchmark/opperf/nd_operations/README.md
@@ -19,103 +19,10 @@
 
 **NOTE:** This list is AUTOGENERATED when you run opperf.py utility
 
-0. LogisticRegressionOutput
-1. broadcast_axes
-2. ravel_multi_index
-3. multi_sgd_mom_update
-4. smooth_l1
-5. scatter_nd
-6. reshape
-7. one_hot
-8. linalg_potri
-10. multi_sgd_update
-12. Convolution_v1
-13. repeat
-14. Custom
-15. softmax_cross_entropy
-16. SwapAxis
-17. norm
-18. Softmax
-20. fill_element_0index
-21. cast
-22. UpSampling
-23. BatchNorm_v1
-24. CTCLoss
-25. LRN
-26. cast_storage
-27. pick
-28. GridGenerator
-29. sample_multinomial
-30. Activation
-31. LinearRegressionOutput
-32. Pooling_v1
-34. Crop
-35. ElementWiseSum
-36. diag
-37. Reshape
-38. Pad
-39. linalg_gemm2
-40. crop
-43. RNN
-45. SoftmaxOutput
-46. linalg_extractdiag
-48. SequenceLast
-51. SequenceReverse
-53. SVMOutput
-54. linalg_trsm
-55. where
-56. SoftmaxActivation
-58. slice
-59. linalg_gelqf
-60. softmin
-61. linalg_gemm
-62. BilinearSampler
-64. choose_element_0index
-65. tile
-67. gather_nd
-69. SequenceMask
-70. reshape_like
-71. slice_axis
-72. stack
-74. khatri_rao
-75. multi_mp_sgd_update
-76. linalg_sumlogdiag
-77. broadcast_to
-78. IdentityAttachKLSparseReg
-80. SpatialTransformer
-81. Concat
-82. uniform
-83. InstanceNorm
-84. expand_dims
-85. multi_mp_sgd_mom_update
-86. reverse
-87. add_n
-88. clip
-89. ctc_loss
-90. shape_array
-91. unravel_index
-92. linalg_potrf
-93. Cast
-94. broadcast_like
-95. Embedding
-96. linalg_makediag
-98. linalg_syrk
-99. squeeze
-101. ROIPooling
-103. SliceChannel
-104. slice_like
-106. linalg_maketrian
-108. pad
-109. LayerNorm
-110. split
-111. MAERegressionOutput
-112. Correlation
-114. batch_take
-115. L2Normalization
-116. broadcast_axis
-117. linalg_trmm
-118. linalg_extracttrian
-119. normal
-120. take
-121. MakeLoss
-124. concat
+0. preloaded_multi_sgd_update
+1. multi_mp_sgd_mom_update
+2. IdentityAttachKLSparseReg
+3. unravel_index
+4. mp_lamb_update_phase1
+5. mp_lamb_update_phase2
+6. scatter_nd