This repository has been archived by the owner on Nov 17, 2023. It is now read-only.
There seems to be a regression in resnet-18 model inference time (when running on GPU) after this PR; this was caught in MMS nightly runs, and the changes in this PR appear to be the cause.
Setup
We use MMS docker images to run load tests. A local container can be started with the following command:
nvidia-docker run --name mms_benchmark_gpu -p 8080:8080 -p 8081:8081 -itd awsdeeplearningteam/mxnet-model-server:nightly-mxnet-gpu
MXNet was built with OpenCV 3.2 and CUDA 9.2.
Load testing was done using Locust, installable via pip.
# Register and load resnet-18 model archive
curl -X POST 127.0.0.1:8081/models?url=https://s3.amazonaws.com/model-server/model_archive_1.0/resnet-18.mar
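The curl call above can also be expressed from Python. The sketch below only builds the registration request (it does not send it, so no running MMS container is needed); the helper name and the assumption that the management API listens on port 8081 follow the setup above and are illustrative, not part of any MMS client library.

```python
import urllib.parse
import urllib.request

# Hypothetical helper mirroring the curl command above: a POST to the MMS
# management API (/models) with the model-archive URL as a query parameter.
def build_register_request(host: str, mar_url: str) -> urllib.request.Request:
    query = urllib.parse.urlencode({"url": mar_url})
    return urllib.request.Request(
        f"http://{host}:8081/models?{query}", method="POST"
    )

req = build_register_request(
    "127.0.0.1",
    "https://s3.amazonaws.com/model-server/model_archive_1.0/resnet-18.mar",
)
print(req.method, req.full_url)
```

Sending it with `urllib.request.urlopen(req)` against a running container would perform the same registration as the curl command.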
Start a single worker and run latency test
$ curl -X PUT 'http://127.0.0.1:8081/models/resnet-18?min_worker=1&synchronous=true'
$ locust -f test_resnet_18.py Prediction --host=http://127.0.0.1:8080 --no-web -c 1 -r 1 -t 20s --only-summary
To change the MXNet version/build in the docker image:
NOTE: By default, the most recent pip version is pulled.
# Go into docker image
nvidia-docker exec -u root -it mms_benchmark_gpu bash
$ pip uninstall mxnet-cu92mkl
$ pip install <new-build>.whl
Press Ctrl+p then Ctrl+q to detach from the docker container without stopping it.
# Destroy the existing worker, then create a new one; this loads the newly installed mxnet
$ curl -X PUT 'http://127.0.0.1:8081/models/resnet-18?min_worker=0&synchronous=true'
$ curl -X PUT 'http://127.0.0.1:8081/models/resnet-18?min_worker=1&synchronous=true'
Results
on mxnet-cu92==1.3.0post0
# locust result
 Name                           # reqs   # fails    Avg    Min    Max  |  Median   req/s
----------------------------------------------------------------------------------------
 POST /predictions/resnet-18       152  0(0.00%)     31     30     39  |      31    7.60
----------------------------------------------------------------------------------------
 Total                             152  0(0.00%)                                    7.60

Percentage of the requests completed within given times
 Name                           # reqs   50%   66%   75%   80%   90%   95%   98%   99%  100%
----------------------------------------------------------------------------------------
 POST /predictions/resnet-18       152    31    31    31    31    32    33    33    34   280
----------------------------------------------------------------------------------------
 Total                             152    31    31    31    31    32    33    33    34   280
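The percentile rows in the summary above can be reproduced from raw request latencies with a nearest-rank computation. A minimal sketch, using hypothetical sample latencies rather than the actual 152 requests from the run:

```python
import statistics

# Hypothetical per-request latencies in milliseconds (not the real data
# from the run above, just an illustration of how the table is derived).
latencies_ms = sorted([31, 31, 30, 32, 31, 33, 31, 34, 31, 39])

def percentile(sorted_samples, pct):
    # Nearest-rank percentile: the smallest sample such that pct percent
    # of the data falls at or below it.
    k = max(0, int(round(pct / 100.0 * len(sorted_samples))) - 1)
    return sorted_samples[k]

print("median:", statistics.median(latencies_ms))  # median: 31.0
print("p95:", percentile(latencies_ms, 95))        # p95: 39
```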
Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.
Here are my recommended labels: Performance
The regression reproduces on mxnet-cu92 built from commit f9f7416 and thus carries over to 1.3.1.
Based on the above results, there is a ~30% increase in latency/inference time for resnet-18.
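The 30% figure is a simple relative increase. In the sketch below, the baseline median (31 ms) comes from the table above, while the regressed value is a hypothetical stand-in, since the second result table is not reproduced in this excerpt:

```python
# Relative latency increase. baseline_ms is the median from the
# mxnet-cu92==1.3.0post0 run above; regressed_ms is a hypothetical
# stand-in for the post-commit median.
baseline_ms = 31
regressed_ms = 40  # hypothetical

increase_pct = (regressed_ms - baseline_ms) / baseline_ms * 100
print(f"latency increase: {increase_pct:.1f}%")  # latency increase: 29.0%
```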