This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

CachedOp performance regression #15067

Open
lanking520 opened this issue May 24, 2019 · 6 comments
Comments

@lanking520
Member

lanking520 commented May 24, 2019

Recently I have been benchmarking CachedOp performance and observed a regression in the results. Please see the table below:

| Instance   | Module API | CachedOp with static | CachedOp without static |
|------------|------------|----------------------|--------------------------|
| p2.8xlarge | 43ms       | 42ms                 | 51ms                     |
| p3.2xlarge | 11ms       | 19ms                 | 16ms                     |
| c5.4xlarge | 36ms       | 38ms                 | 42ms                     |

I would like to highlight the GPU performance comparison. On p2.8xlarge there is a performance gain with the flags set, but on p3.2xlarge there is a regression.

imported_net.hybridize(static_alloc = True, static_shape = True)

In theory, setting these two flags should give a performance boost since memory is reused. However, on the larger GPU instance it does not seem to perform well.
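
For reference, here is a minimal sketch (not from the original issue; the resnet18 stand-in, the input shape, and the timing helper are illustrative assumptions) of how the two CachedOp configurations in the table can be compared on the same block:

import time
import mxnet as mx
from mxnet import gluon, nd

def time_forward(net, data, runs=100):
  # Warm up once, then time steady-state forward passes
  net(data).wait_to_read()
  start = time.time()
  for _ in range(runs):
    net(data).wait_to_read()
  return (time.time() - start) * 1000 / runs

ctx = mx.cpu()  # switch to mx.gpu(0) to reproduce the GPU numbers
data = nd.random.uniform(shape=(1, 3, 224, 224), ctx=ctx)

# Hypothetical small model standing in for the imported ResNet-152
net = gluon.model_zoo.vision.resnet18_v1(pretrained=False)
net.initialize(ctx=ctx)

# CachedOp without static allocation
net.hybridize()
print('without static:', time_forward(net, data), 'ms')

# CachedOp with static allocation and static shape (re-hybridizing clears the old cached op)
net.hybridize(static_alloc=True, static_shape=True)
print('with static:', time_forward(net, data), 'ms')

The Module API column in the table would correspond to running the same symbol through mx.mod.Module instead; that path is omitted in this sketch.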

I used the nightly builds:

pip3 install mxnet-cu92mkl --pre
pip3 install mxnet-mkl --pre

Benchmark Script

import mxnet as mx
from mxnet import ndarray as nd
import numpy as np
import json, time, os
from mxnet import gluon

path='http://data.mxnet.io/models/imagenet/'
[mx.test_utils.download(path+'resnet/152-layers/resnet-152-0000.params'),
mx.test_utils.download(path+'resnet/152-layers/resnet-152-symbol.json'),
mx.test_utils.download(path+'synset.txt')]


def compute_stats(perf_results, results):
  results["average"] = np.average(perf_results)
  results['tp50'] = np.percentile(perf_results, 50)
  results['tp90'] = np.percentile(perf_results, 90)
  results['tp99'] = np.percentile(perf_results, 99)

# Select the benchmark device from the BENCHMARK_CTX environment variable ('GPU' or 'CPU')
ctx_str = os.environ.get('BENCHMARK_CTX', 'CPU')

if ctx_str == 'GPU':
  ctx = mx.gpu(0)
else:
  ctx = mx.cpu()

benchmark = {}

prefix = 'resnet-152'

# Model load time
t1 = time.time()
# Load the parameters onto the benchmark device so the inputs and weights share a context
imported_net = gluon.nn.SymbolBlock.imports(prefix + '-symbol.json', ['data', 'softmax_label'],
                                            prefix + '-0000.params', ctx=ctx)
t2 = time.time()
elapsed = (t2 - t1) * 1000

imported_net.hybridize(static_alloc = True, static_shape = True)

benchmark['ModelLoadTime'] = elapsed

fname = mx.test_utils.download('https://github.com/dmlc/web-data/blob/master/mxnet/doc/tutorials/python/predict_image/cat.jpg?raw=true')
img = mx.image.imread(fname)


# convert into format (batch, RGB, width, height)
img = mx.image.imresize(img, 300, 300) # resize
img = img.transpose((2, 0, 1)) # Channel first
img = img.expand_dims(axis=0) # batchify
img = img.astype('float32')

# Move the inputs to the benchmark device so they match the network parameters
sf_label = nd.ones((1,), ctx=ctx)
img = img.as_in_context(ctx)

# First Inference
t1 = time.time()
op = imported_net(img, sf_label)
op.wait_to_read()
t2 = time.time()
elapsed = (t2 - t1) * 1000

benchmark['FirstInferCall'] = elapsed

times = 100
time_cost = []

for idx in range(0, times):
  t1 = time.time()
  op = imported_net(img, sf_label)
  op.wait_to_read()
  t2 = time.time()
  elapsed = (t2 - t1) * 1000
  time_cost.append(elapsed)
  print("time cost: ", elapsed, "ms")

# Extra cost of the first inference call relative to a steady-state call
benchmark['FirstInferOverhead'] = benchmark['FirstInferCall'] - time_cost[0]
compute_stats(time_cost, benchmark)

output = json.dumps(benchmark)

f = open('Inf.json', 'w')
f.write(output)
f.close()
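
For completeness, here is a small sketch (not part of the original script) for reading back the Inf.json the script writes; it assumes the file is in the working directory and that the script was launched with BENCHMARK_CTX set to GPU or CPU:

import json

# Load the benchmark results written by the script above
with open('Inf.json') as f:
  results = json.load(f)

# All values are wall-clock times in milliseconds
for key, value in results.items():
  print("{}: {:.2f} ms".format(key, value))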
@mxnet-label-bot
Contributor

Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.
Here are my recommended labels: Performance

@pengzhao-intel
Contributor

We have a recent PR that improves CachedOp, #14931, and I am not sure whether it causes this issue.
Do you mind giving it a try?

@ZhennanQin
Contributor

So the same benchmark with hybridize(static_alloc=True, static_shape=True) shows a different performance trend on different machines?

@lanking520
Member Author

So the same benchmark with hybridize(static_alloc=True, static_shape=True) shows a different performance trend on different machines?

Yeah, I suspect the problem is related to the GPU.

@lanking520
Member Author

We have a recent PR that improves CachedOp, #14931, and I am not sure whether it causes this issue.
Do you mind giving it a try?

Will do a test run on it

@sxjscience
Member

Has the issue been solved?
