
performance degradation in model inference from 1.3.1 to 1.4.0 #14569

Open
apeforest opened this issue Mar 29, 2019 · 1 comment

Comments

@apeforest
Contributor

There seems to be a regression in resnet-18 model inference time (when running on GPU) after this PR. This was caught in MMS nightly runs; the changes in this PR appear to be causing the issue.

Setup

We use MMS Docker images to run load tests. A local container can be started with the following command:

nvidia-docker run --name mms_benchmark_gpu -p 8080:8080 -p 8081:8081  -itd awsdeeplearningteam/mxnet-model-server:nightly-mxnet-gpu
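
Before running any tests, it helps to confirm the container is actually serving; a minimal sketch in Python, assuming MMS's standard /ping health endpoint is exposed on the inference port:

# check_server.py -- sanity check that the MMS container is up
# (assumes the standard MMS /ping health endpoint on port 8080)
import requests

resp = requests.get("http://127.0.0.1:8080/ping", timeout=5)
resp.raise_for_status()
print(resp.json())  # expected: {"status": "Healthy"}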

MXNet was built with OpenCV 3.2 and CUDA 9.2.

Load testing was done using Locust. To install Locust:

pip install locust

Download the test image:

curl -O https://s3.amazonaws.com/model-server/inputs/kitten.jpg

The Locust script for load testing:

# test_resnet_18.py
from locust import HttpLocust, TaskSet, task

import os

# Read the test image once at module load time
with open(os.path.join(os.getcwd(), 'kitten.jpg'), 'rb') as f:
    data = f.read()

class PredictionTasks(TaskSet):
    @task
    def inference(self):
        self.client.post("/predictions/resnet-18", data=data, headers={'Content-Type': 'image/jpeg'})

class Prediction(HttpLocust):
    task_set = PredictionTasks
    min_wait = 100
    max_wait = 100
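
With min_wait and max_wait both set to 100, Locust inserts a fixed 100 ms wait between requests from the single simulated user, so at these latencies throughput should settle around 7-8 req/s; this keeps the runs below directly comparable.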

Running the load test

Registering and loading the model

# Register and load the resnet-18 model archive
curl -X POST "http://127.0.0.1:8081/models?url=https://s3.amazonaws.com/model-server/model_archive_1.0/resnet-18.mar"
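
To confirm registration succeeded, the management API can list the registered models; a minimal sketch in Python, assuming the standard MMS GET /models endpoint on the management port:

# list_models.py -- confirm resnet-18 is registered
import requests

resp = requests.get("http://127.0.0.1:8081/models", timeout=5)
resp.raise_for_status()
print(resp.json())  # expect an entry for "resnet-18"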

Start a single worker and run the latency test:

$ curl -X PUT 'http://127.0.0.1:8081/models/resnet-18?min_worker=1&synchronous=true'
$ locust -f test_resnet_18.py Prediction --host=http://127.0.0.1:8080 --no-web -c 1 -r 1 -t 20s --only-summary
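
As a cross-check on the Locust numbers, per-request latency can also be measured directly; a minimal sketch using the requests library, assuming the model is registered and kitten.jpg is in the working directory:

# latency_check.py -- rough per-request latency without Locust (sanity check only)
import statistics
import time

import requests

URL = "http://127.0.0.1:8080/predictions/resnet-18"

with open("kitten.jpg", "rb") as f:
    payload = f.read()

latencies = []
for _ in range(100):
    start = time.perf_counter()
    resp = requests.post(URL, data=payload, headers={"Content-Type": "image/jpeg"})
    resp.raise_for_status()
    latencies.append((time.perf_counter() - start) * 1000)  # ms

latencies.sort()
print("median: %.1f ms" % statistics.median(latencies))
print("p90:    %.1f ms" % latencies[int(0.9 * len(latencies))])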

To change the MXNet version/build in the Docker image:

NOTE: By default, the most recent pip release is pulled.

# Exec into the running container as root
nvidia-docker exec -u root -it mms_benchmark_gpu bash
$ pip uninstall mxnet-cu92mkl
$ pip install <new-build>.whl
# Press Ctrl+p then Ctrl+q to detach without stopping the container

# Destroy the existing worker and create a new one; this loads the newly installed mxnet
$ curl -X PUT 'http://127.0.0.1:8081/models/resnet-18?min_worker=0&synchronous=true'
$ curl -X PUT 'http://127.0.0.1:8081/models/resnet-18?min_worker=1&synchronous=true'
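
Before re-running the load test, it is worth verifying the worker picked up the new build; a minimal check, run with python inside the container (num_gpus is available in recent MXNet releases):

# verify_mxnet.py -- confirm the newly installed build is the one in use
import mxnet as mx

print(mx.__version__)         # should report the newly installed version
print(mx.context.num_gpus())  # confirm the GPU is still visible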

Results

On mxnet-cu92==1.3.0post0

# locust result
 Name                                                          # reqs      # fails     Avg     Min     Max  |  Median   req/s
--------------------------------------------------------------------------------------------------------------------------------------------
 POST /predictions/resnet-18                                      152     0(0.00%)      31      30      39  |      31    7.60
--------------------------------------------------------------------------------------------------------------------------------------------
 Total                                                            152     0(0.00%)                                       7.60

Percentage of the requests completed within given times
 Name                                                           # reqs    50%    66%    75%    80%    90%    95%    98%    99%   100%
--------------------------------------------------------------------------------------------------------------------------------------------
 POST /predictions/resnet-18                                       152     31     31     31     31     32     33     33     34     280
--------------------------------------------------------------------------------------------------------------------------------------------
 Total                                                             152     31     31     31     31     32     33     33     34     280

On mxnet-cu92 with commit f9f7416

 Name                                                          # reqs      # fails     Avg     Min     Max  |  Median   req/s
--------------------------------------------------------------------------------------------------------------------------------------------
 POST /predictions/resnet-18                                      141     0(0.00%)      41      37     337  |      38    7.20
--------------------------------------------------------------------------------------------------------------------------------------------
 Total                                                            141     0(0.00%)                                       7.20

Percentage of the requests completed within given times
 Name                                                           # reqs    50%    66%    75%    80%    90%    95%    98%    99%   100%
--------------------------------------------------------------------------------------------------------------------------------------------
 POST /predictions/resnet-18                                       141     38     39     39     40     40     42     49     49    340
--------------------------------------------------------------------------------------------------------------------------------------------
 Total                                                             141     38     39     39     40     40     42     49     49    340

This regression thus carries over to 1.3.1.

Based on the above results, there is a ~30% increase in latency/inference time for resnet-18 (average latency 31 ms → 41 ms).
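
For reference, the same arithmetic on both statistics from the tables above:

# latency increase computed from the two runs above
avg_base, avg_new = 31, 41  # average latency, ms
med_base, med_new = 31, 38  # median latency, ms
print("avg increase:    %.0f%%" % (100.0 * (avg_new - avg_base) / avg_base))  # ~32%
print("median increase: %.0f%%" % (100.0 * (med_new - med_base) / med_base))  # ~23%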

@mxnet-label-bot
Contributor

Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.
Here are my recommended labels: Performance
