This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

More user-friendly profiler result #14100

Closed
eric-haibin-lin opened this issue Feb 8, 2019 · 3 comments
Labels
Feature request Profiler MXNet profiling issues

Comments

@eric-haibin-lin
Member

This is an example output of mx.profiler.dumps():

operator
=================
Name                          Total Count        Time (ms)    Min Time (ms)    Max Time (ms)    Avg Time (ms)
----                          -----------        ---------    -------------    -------------    -------------
CopyCPU2CPU                            18         107.0130           3.9670           8.1190           5.9452
argmax                                  6           0.2470           0.0370           0.0460           0.0412
min                                     6           0.2580           0.0400           0.0460           0.0430
Concat                                  6           0.2770           0.0390           0.0590           0.0462
_ones                                   6           0.2050           0.0290           0.0440           0.0342
sqrt                                    6           0.9000           0.0380           0.3700           0.1500
_backward_Activation                    6           0.2470           0.0390           0.0440           0.0412
SyncCopyCPU2GPU                         6           0.1800           0.0270           0.0340           0.0300
_copyto_GPU2GPU                      3636         142.9960           0.0220           0.3250           0.0393
_backward_log_softmax                   6           0.2710           0.0290           0.0700           0.0452
slice                                   6           0.2380           0.0370           0.0410           0.0397
_backward_pick                          6           0.4430           0.0430           0.1190           0.0738
SyncCopyGPU2CPU                        18           0.8020           0.0290           0.0860           0.0446
_plus_scalar                           72          21.2870           0.0790           1.7160           0.2957
_div_scalar                            72          36.0430           0.0780           1.6120           0.5006
expand_dims                            84          14.1550           0.0280           1.6070           0.1685
softmax                                72          10.4490           0.0980           0.1770           0.1451
Embedding                              18           1.6480           0.0250           0.4190           0.0916
SetValueOp                              6           0.4930           0.0470           0.1340           0.0822
_mul_scalar                           150          42.0470           0.0220           1.2810           0.2803
_contrib_div_sqrt_dim                 144           6.8160           0.0320           0.0890           0.0473
FullyConnected                        444         262.0170           0.1020           2.2970           0.5901
_backward_SequenceMask                  6           0.9190           0.1350           0.1680           0.1532
broadcast_lesser                        6           0.2990           0.0400           0.0650           0.0498
_backward_Embedding                    18          10.6270           0.0560           1.0350           0.5904
ones_like                              72          27.5950           0.0270           1.7510           0.3833
where                                  72           5.6690           0.0340           1.0110           0.0787
DeleteVariable                      10326          96.4270           0.0000           2.4820           0.0093
dot                                  1206          54.2630           0.0280           0.1710           0.0450
batch_dot                             144          15.4360           0.0590           1.0670           0.1072
_backward_Dropout                     240          10.9490           0.0300           0.0920           0.0456
broadcast_axis                         78          37.3540           0.0370           3.1240           0.4789
_backward_mean                         12           2.9340           0.0390           0.6950           0.2445
LayerNorm                             150          26.5350           0.1380           0.7940           0.1769
mean                                   12           0.8380           0.0390           0.1220           0.0698
_backward_LayerNorm                   150          48.3110           0.2340           0.9060           0.3221
broadcast_mul                        1350          58.1180           0.0210           1.3150           0.0431
_backward_erf                          72           7.1030           0.0720           0.1460           0.0987
log_softmax                             6           0.2200           0.0350           0.0390           0.0367
broadcast_add                         156          10.7970           0.0380           0.7950           0.0692
_arange                                12          20.3000           0.0260           7.9170           1.6917
_backward_reshape                     732          36.3330           0.0280           0.1510           0.0496
erf                                    72           5.8460           0.0570           0.1590           0.0812
WaitForVar                             24           0.2220           0.0050           0.0150           0.0093
transpose                             576          87.2210           0.0430           1.2770           0.1514
Dropout                               240         155.5880           0.0840           2.3730           0.6483
SequenceMask                            6           2.0030           0.1580           0.6750           0.3338
CopyCPU2GPU                            24           2.4000           0.0300           0.4570           0.1000
_backward_where                        72           9.1340           0.0350           0.1860           0.1269
_copy                                  72           7.2890           0.0680           0.1410           0.1012
Activation                              6           0.2440           0.0370           0.0460           0.0407
_backward_FullyConnected              444         309.9390           0.1010           1.4010           0.6981
_backward_div_scalar                   72           5.3830           0.0560           0.0960           0.0748
_backward_slice                         6           0.2790           0.0430           0.0500           0.0465
_backward_broadcast_add               156           9.2620           0.0420           0.0930           0.0594
add_n                                 222          22.4660           0.0360           1.3650           0.1012
pick                                    6           0.2670           0.0340           0.0580           0.0445
_contrib_adamw_update                1206         101.7640           0.0460           1.7690           0.0844
_backward_batch_dot                   144          16.0560           0.0880           0.1540           0.1115
_backward_softmax                      72          10.7450           0.0690           0.2450           0.1492
_backward_broadcast_mul               144          19.9080           0.0340           0.6370           0.1382
_backward_mul_scalar                   78           6.6860           0.0240           0.1410           0.0857

As a user, I would like to sort the operators by a particular field (e.g. Avg Time) to find the most expensive one. It would be great to have such an enhancement.
cc @Vikas89
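
In the meantime, one possible workaround is to post-process the text that mx.profiler.dumps() returns. The following is only a minimal sketch, assuming the aggregate table keeps the column layout shown above (name, count, total, min, max, avg). The helper names `parse_profiler_table` and `sort_rows` are hypothetical and not part of MXNet, and the sample text is a few rows copied from the output above.

```python
import re

# One data row: operator name, integer count, then four float columns
# (total, min, max, avg). Header and separator lines do not match.
ROW = re.compile(
    r"^(\S+)\s+(\d+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)$"
)
TIME_FIELDS = ("total", "min", "max", "avg")


def parse_profiler_table(text):
    """Parse aggregate-table rows into dicts; lines that are not data rows are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match is None:
            continue
        name, count, *times = match.groups()
        row = {"name": name, "count": int(count)}
        row.update(zip(TIME_FIELDS, map(float, times)))
        rows.append(row)
    return rows


def sort_rows(rows, by="avg", ascending=False):
    """Return rows ordered by one field: count, total, min, max, or avg."""
    return sorted(rows, key=lambda r: r[by], reverse=not ascending)


# A few rows copied from the profiler output above.
sample = """\
Name                          Total Count        Time (ms)    Min Time (ms)    Max Time (ms)    Avg Time (ms)
----                          -----------        ---------    -------------    -------------    -------------
CopyCPU2CPU                            18         107.0130           3.9670           8.1190           5.9452
_arange                                12          20.3000           0.0260           7.9170           1.6917
FullyConnected                        444         262.0170           0.1020           2.2970           0.5901
"""

for row in sort_rows(parse_profiler_table(sample), by="avg"):
    print(f"{row['name']:<20} {row['avg']:.4f}")
```

Parsing the printed table is fragile, since any change to the output format would break the regular expression, so a sort option built into the profiler itself would still be preferable.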

@eric-haibin-lin eric-haibin-lin added Feature request Profiler MXNet profiling issues labels Feb 8, 2019
@mxnet-label-bot
Contributor

Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.
Here are my recommended labels: Feature

@Zha0q1
Contributor

Zha0q1 commented Jul 24, 2019

Sorting was added in PR #15132.

Now we can use
profiler.dumps(sort_by = 'avg', ascending = False)
where sort_by can take avg, min, max, or count, and ascending can be either True or False.

By the way, you can now also call profiler.dumps(format = 'json') to get a JSON string.

@Zha0q1
Contributor

Zha0q1 commented Jul 24, 2019

@sandeep-krishnamurthy we can probably close this issue now
