
performance degradation in model inference from 1.3.1 to 1.4.0 #14569

Open
apeforest opened this issue Mar 29, 2019 · 1 comment

Comments

@apeforest
Contributor

There seems to be a regression in resnet-18 model inference time (when running on GPU) after this PR. This was caught in MMS nightly runs; the changes in this PR appear to be causing the issue.

Setup

We use MMS Docker images to run load tests. A local container can be started with the following command:

nvidia-docker run --name mms_benchmark_gpu -p 8080:8080 -p 8081:8081  -itd awsdeeplearningteam/mxnet-model-server:nightly-mxnet-gpu
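
Before running any tests, it helps to confirm the container is actually serving; a minimal sketch in Python, assuming MMS's standard /ping health endpoint is exposed on the inference port:

# check_server.py -- sanity check that the MMS container is up
# (assumes the standard MMS /ping health endpoint on port 8080)
import requests

resp = requests.get("http://127.0.0.1:8080/ping", timeout=5)
resp.raise_for_status()
print(resp.json())  # expected: {"status": "Healthy"}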

MXNet was built with OpenCV 3.2 and CUDA 9.2.

Load testing was done using Locust. To install Locust:

pip install locust

Download the test image:

curl -O https://s3.amazonaws.com/model-server/inputs/kitten.jpg

The Locust script for load testing:

# test_resnet_18.py
from locust import HttpLocust, TaskSet, task

import os

# Read the test image once at module load time
with open(os.path.join(os.getcwd(), 'kitten.jpg'), 'rb') as f:
    data = f.read()

class PredictionTasks(TaskSet):
    @task
    def inference(self):
        self.client.post("/predictions/resnet-18", data=data, headers={'Content-Type': 'image/jpeg'})

class Prediction(HttpLocust):
    task_set = PredictionTasks
    min_wait = 100
    max_wait = 100
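
With min_wait and max_wait both set to 100, Locust inserts a fixed 100 ms wait between requests from the single simulated user, so at these latencies throughput should settle around 7-8 req/s; this keeps the runs below directly comparable.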

Running the load test

Registering and loading the model

# Register and load the resnet-18 model archive
curl -X POST "http://127.0.0.1:8081/models?url=https://s3.amazonaws.com/model-server/model_archive_1.0/resnet-18.mar"
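
To confirm registration succeeded, the management API can list the registered models; a minimal sketch in Python, assuming the standard MMS GET /models endpoint on the management port:

# list_models.py -- confirm resnet-18 is registered
import requests

resp = requests.get("http://127.0.0.1:8081/models", timeout=5)
resp.raise_for_status()
print(resp.json())  # expect an entry for "resnet-18"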

Start a single worker and run the latency test:

$ curl -X PUT 'http://127.0.0.1:8081/models/resnet-18?min_worker=1&synchronous=true'
$ locust -f test_resnet_18.py Prediction --host=http://127.0.0.1:8080 --no-web -c 1 -r 1 -t 20s --only-summary
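
As a cross-check on the Locust numbers, per-request latency can also be measured directly; a minimal sketch using the requests library, assuming the model is registered and kitten.jpg is in the working directory:

# latency_check.py -- rough per-request latency without Locust (sanity check only)
import statistics
import time

import requests

URL = "http://127.0.0.1:8080/predictions/resnet-18"

with open("kitten.jpg", "rb") as f:
    payload = f.read()

latencies = []
for _ in range(100):
    start = time.perf_counter()
    resp = requests.post(URL, data=payload, headers={"Content-Type": "image/jpeg"})
    resp.raise_for_status()
    latencies.append((time.perf_counter() - start) * 1000)  # ms

latencies.sort()
print("median: %.1f ms" % statistics.median(latencies))
print("p90:    %.1f ms" % latencies[int(0.9 * len(latencies))])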

To change the MXNet version/build in the Docker image:

NOTE: By default, the most recent pip release is pulled.

# Exec into the running container as root
nvidia-docker exec -u root -it mms_benchmark_gpu bash
$ pip uninstall mxnet-cu92mkl
$ pip install <new-build>.whl
# Press Ctrl+p then Ctrl+q to detach without stopping the container

# Destroy the existing worker and create a new one; this loads the newly installed mxnet
$ curl -X PUT 'http://127.0.0.1:8081/models/resnet-18?min_worker=0&synchronous=true'
$ curl -X PUT 'http://127.0.0.1:8081/models/resnet-18?min_worker=1&synchronous=true'
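
Before re-running the load test, it is worth verifying the worker picked up the new build; a minimal check, run with python inside the container (num_gpus is available in recent MXNet releases):

# verify_mxnet.py -- confirm the newly installed build is the one in use
import mxnet as mx

print(mx.__version__)         # should report the newly installed version
print(mx.context.num_gpus())  # confirm the GPU is still visible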

Results

On mxnet-cu92==1.3.0post0

# locust result
 Name                                                          # reqs      # fails     Avg     Min     Max  |  Median   req/s
--------------------------------------------------------------------------------------------------------------------------------------------
 POST /predictions/resnet-18                                      152     0(0.00%)      31      30      39  |      31    7.60
--------------------------------------------------------------------------------------------------------------------------------------------
 Total                                                            152     0(0.00%)                                       7.60

Percentage of the requests completed within given times
 Name                                                           # reqs    50%    66%    75%    80%    90%    95%    98%    99%   100%
--------------------------------------------------------------------------------------------------------------------------------------------
 POST /predictions/resnet-18                                       152     31     31     31     31     32     33     33     34     280
--------------------------------------------------------------------------------------------------------------------------------------------
 Total                                                             152     31     31     31     31     32     33     33     34     280

On mxnet-cu92 with commit f9f7416

 Name                                                          # reqs      # fails     Avg     Min     Max  |  Median   req/s
--------------------------------------------------------------------------------------------------------------------------------------------
 POST /predictions/resnet-18                                      141     0(0.00%)      41      37     337  |      38    7.20
--------------------------------------------------------------------------------------------------------------------------------------------
 Total                                                            141     0(0.00%)                                       7.20

Percentage of the requests completed within given times
 Name                                                           # reqs    50%    66%    75%    80%    90%    95%    98%    99%   100%
--------------------------------------------------------------------------------------------------------------------------------------------
 POST /predictions/resnet-18                                       141     38     39     39     40     40     42     49     49    340
--------------------------------------------------------------------------------------------------------------------------------------------
 Total                                                             141     38     39     39     40     40     42     49     49    340

This regression thus carries over to 1.3.1.

Based on the above results, there is a ~30% increase in latency/inference time for resnet-18 (average latency 31 ms → 41 ms).
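
For reference, the same arithmetic on both statistics from the tables above:

# latency increase computed from the two runs above
avg_base, avg_new = 31, 41  # average latency, ms
med_base, med_new = 31, 38  # median latency, ms
print("avg increase:    %.0f%%" % (100.0 * (avg_new - avg_base) / avg_base))  # ~32%
print("median increase: %.0f%%" % (100.0 * (med_new - med_base) / med_base))  # ~23%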

@mxnet-label-bot
Contributor

Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.
Here are my recommended labels: Performance
