Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

MXNet operator profile aggregate counter issue #10520

Closed
jinhuang415 opened this issue Apr 12, 2018 · 3 comments · Fixed by #15240
Closed

MXNet operator profile aggregate counter issue #10520

jinhuang415 opened this issue Apr 12, 2018 · 3 comments · Fixed by #15240
Labels
Bug Profiler MXNet profiling issues

Comments

@jinhuang415
Copy link
Contributor

jinhuang415 commented Apr 12, 2018

It looks the operator counter of the aggregated output is the 2x of real value.

I used below gluon model (mainly copied from https://mxnet.incubator.apache.org/tutorials/gluon/gluon.html, contains 2 convolutions for each forward pass) and did profiling, the aggregated convolution counter is 4 instead of 2, the counter doubled for other OPs as well.

# import dependencies
from __future__ import print_function
import numpy as np
import mxnet as mx
import mxnet.ndarray as F
import mxnet.gluon as gluon
from mxnet.gluon import nn
from mxnet import autograd

class Net(gluon.Block):
    def __init__(self, **kwargs):
        super(Net, self).__init__(**kwargs)
        with self.name_scope():
            # layers created in name_scope will inherit name space
            # from parent layer.
            self.conv1 = nn.Conv2D(6, kernel_size=5)
            self.pool1 = nn.MaxPool2D(pool_size=(2,2))
            self.conv2 = nn.Conv2D(16, kernel_size=5)
            self.pool2 = nn.MaxPool2D(pool_size=(2,2))
            self.fc1 = nn.Dense(120)
            self.fc2 = nn.Dense(84)
            self.fc3 = nn.Dense(10)

    def forward(self, x):
        x = self.pool1(F.relu(self.conv1(x)))
        x = self.pool2(F.relu(self.conv2(x)))
        # 0 means copy over size from corresponding dimension.
        # -1 means infer size from the rest of dimensions.
        x = x.reshape((0, -1))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()
# Initialize on CPU. Replace with `mx.gpu(0)`, or `[mx.gpu(0), mx.gpu(1)]`,
# etc to use one or more GPUs.
net.collect_params().initialize(mx.init.Xavier(), ctx=mx.cpu())

mx.profiler.set_config(aggregate_stats=True)
mx.profiler.set_state('run')

data = mx.nd.random_normal(shape=(10, 1, 32, 32))  # dummy data
output = net(data)
output.wait_to_read()

print(mx.profiler.dumps())

The profile output is below:

[jinhuang@mlt-ace image-classification]$ python test_gluon.py

Profile Statistics.
        Note that counter items are counter values and not time units.
Device Storage
=================
Name                          Total Count        Time (ms)    Min Time (ms)    Max Time (ms)    Avg Time (ms)
----                          -----------        ---------    -------------    -------------    -------------
Memory: cpu/0                          65         535.0320           0.0240         726.1520         363.0640

MXNET_C_API
=================
Name                          Total Count        Time (ms)    Min Time (ms)    Max Time (ms)    Avg Time (ms)
----                          -----------        ---------    -------------    -------------    -------------
MXNDArrayReshape64                      1           0.0140           0.0140           0.0140           0.0140
MXNDArrayFree                          21           0.1100           0.0000           0.0290           0.0052
MXAutogradMarkVariables                10           0.1080           0.0080           0.0180           0.0108
MXNDArrayCreateEx                      10           0.0340           0.0020           0.0070           0.0034
MXNDArrayGetDType                      15           0.0000           0.0000           0.0000           0.0000
MXSymbolSetAttr                        47           0.0920           0.0010           0.0130           0.0020
MXNDArrayGetContext                    13           0.0070           0.0000           0.0010           0.0005
MXNDArrayWaitToRead                     1          12.0930          12.0930          12.0930          12.0930
MXSymbolInferShape                      7           0.5700           0.0490           0.2320           0.0814
MXNet C API Calls                     230           0.2300           0.0010           0.2300           0.1145
MXNet C API Concurrency               459           0.0010           0.0000           0.0010           0.0005
MXSymbolCreateAtomicSymbol               7           0.3780           0.0190           0.1540           0.0540
MXAutogradSetIsRecording               10           0.0140           0.0000           0.0070           0.0014
MXAutogradSetIsTraining                10           0.0040           0.0000           0.0010           0.0004
MXImperativeInvokeEx                   52           3.7600           0.0150           1.7250           0.0723
MXNDArrayGetShape                      25           0.0190           0.0000           0.0030           0.0008

operator
=================
Name                          Total Count        Time (ms)    Min Time (ms)    Max Time (ms)    Avg Time (ms)
----                          -----------        ---------    -------------    -------------    -------------
WaitForVar                              2           0.0080           0.0040           0.0040           0.0040
FullyConnected                          6           0.7770           0.0590           0.2380           0.1295
Pooling                                 4           0.5280           0.0890           0.1750           0.1320
_zeros                                 20           0.6220           0.0020           0.1410           0.0311
ResourceParallelRandomSetSeed               2          14.7190           7.3580           7.3610           7.3595
DeleteVariable                         40           0.1130           0.0010           0.0070           0.0028
relu                                    8           0.2220           0.0070           0.0770           0.0278
_random_normal                          2          10.8890           5.4430           5.4460           5.4445
_random_uniform                        10           0.7370           0.0100           0.2740           0.0737
_full                                  10           0.0350           0.0010           0.0080           0.0035
Convolution                             4          12.5590           0.5260           5.7530           3.1398
zeros_like                             20           0.3800           0.0030           0.1490           0.0190
CopyCPU2CPU                            20           0.4690           0.0030           0.1530           0.0235
Reorder                                 4           0.0110           0.0010           0.0050           0.0027
@jinhuang415
Copy link
Contributor Author

@pengzhao-intel

@Roshrini
Copy link
Member

@nswamy Can you please add labels-
Python, Gluon, Profiler

@pengzhao-intel
Copy link
Contributor

@cjolivier01 This functionality is really useful and we apply it in our daily works.
Could you help fix the issue?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Bug Profiler MXNet profiling issues
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants