This repository has been archived by the owner on Jan 15, 2024. It is now read-only.

Deploy BERT model - Script #1237

Merged
merged 39 commits into dmlc:v0.x on Sep 10, 2020

Conversation

MoisesHer
Contributor

Description

Includes a script to deploy BERT for QA / classification / regression / embedding tasks.
It offers the possibility of using the available GPU BERT optimizations in MXNet.
It reports latency and throughput, and can check accuracy.
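The latency/throughput measurement the script reports can be sketched as follows. This is a minimal illustration with a dummy `predict` callable and hypothetical names, not the actual deploy.py code:

```python
import time

def benchmark(predict, batches, batch_size):
    """Time `predict` over all batches; return (avg latency in ms,
    throughput in samples/s). `predict` is a hypothetical stand-in
    for the model's forward pass."""
    start = time.perf_counter()
    for batch in batches:
        predict(batch)
    elapsed = time.perf_counter() - start
    latency_ms = elapsed / len(batches) * 1000.0
    throughput = len(batches) * batch_size / elapsed
    return latency_ms, throughput
```

In practice a few warm-up iterations would be run first, and for an asynchronous engine like MXNet the outputs must be synchronized (e.g. `mx.nd.waitall()`) before stopping the timer.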

Checklist

Essentials

  • Changes are complete (i.e. I finished coding on this PR)
  • Code is well-documented

Changes

  • Added a Python script to deploy BERT for different tasks
  • Also includes a custom graph pass to fuse BiasAdd-GELU ops and to prearrange the Multi-head Attention weights and biases.

Comments

cc @dmlc/gluon-nlp-team

@MoisesHer MoisesHer requested a review from a team as a code owner June 2, 2020 21:30
@MoisesHer MoisesHer changed the title Deploy bert script Deploy BERT model - Script Jun 2, 2020
@mli
Member

mli commented Jun 2, 2020

Job PR-1237/1 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1237/1/index.html

@mli
Member

mli commented Jun 2, 2020

Job PR-1237/2 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1237/2/index.html

@mli
Member

mli commented Jun 3, 2020

Job PR-1237/4 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1237/4/index.html

@mli
Member

mli commented Jun 3, 2020

Job PR-1237/3 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1237/3/index.html

@mli
Member

mli commented Jun 3, 2020

Job PR-1237/5 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1237/5/index.html

@codecov

codecov bot commented Jun 3, 2020

Codecov Report

Merging #1237 into master will increase coverage by 0.03%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##           master    #1237      +/-   ##
==========================================
+ Coverage   87.42%   87.45%   +0.03%     
==========================================
  Files          81       81              
  Lines        7346     7365      +19     
==========================================
+ Hits         6422     6441      +19     
  Misses        924      924              
Impacted Files                         Coverage Δ
src/gluonnlp/model/bert.py             94.65% <0.00%> (+0.03%) ⬆️
src/gluonnlp/model/transformer.py      91.71% <0.00%> (+0.05%) ⬆️
src/gluonnlp/model/language_model.py   98.64% <0.00%> (+0.15%) ⬆️

scripts/bert/finetune_classifier.py
scripts/bert/finetune_squad.py
Member

Would you mind also updating index.rst with the usage for deploy.py?

Contributor Author

Just updated, but please let me know if you think we can improve it.
I am not sure how the custom graph pass should be shared (as source code, a compiled library, or somewhere else).

Contributor

How is the compilation of the custom graph pass currently triggered? Do you think it should be distributed as part of gluonnlp (or, as now, "only" as part of the scripts)? If so, ideally pip should trigger compilation of the graph pass when installing gluonnlp. You could set this up via setup.py. Maybe there are some missing pieces in the mxnet pip package preventing this from working.

If you'd like to keep it in the scripts, you can also add a setup.py in the scripts folder.

It would be good to try this out.

Contributor Author

Thanks for the ideas. In the current version it is not triggered, so yes, we need to provide a mechanism for that.
I think a setup.py would make more sense, since this is specific to BERT-GPU and optional, and we probably do not want to affect the main installation process.
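A sketch of the optional build step that such a setup.py (or a small helper called from it) could perform. The source and output file names here are hypothetical, and real flags would depend on the platform and CUDA setup:

```python
def make_compile_cmd(source, mxnet_include, out_lib='bertpass_lib.so'):
    """Assemble the g++ invocation that compiles the custom graph pass
    into a shared library loadable by MXNet. Names are illustrative only."""
    return ['g++', '-shared', '-fPIC', '-std=c++11',
            str(source), '-o', str(out_lib),
            '-I', str(mxnet_include)]

# e.g. subprocess.check_call(make_compile_cmd('bertpass_gpu.cc', include_dir))
```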

@mli
Member

mli commented Jun 6, 2020

Job PR-1237/6 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1237/6/index.html

@mli
Member

mli commented Jun 6, 2020

Job PR-1237/7 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1237/7/index.html

Member

@eric-haibin-lin left a comment

I assume the graph pass requires mxnet nightly build? Would it make sense to mention the minimum mxnet version required for this script in the doc?

@mli
Member

mli commented Jun 11, 2020

Job PR-1237/8 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1237/8/index.html

@mli
Member

mli commented Jun 11, 2020

Job PR-1237/9 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1237/9/index.html

@MoisesHer
Contributor Author

I assume the graph pass requires mxnet nightly build? Would it make sense to mention the minimum mxnet version required for this script in the doc?

Yes, I have added a comment in index.rst for the TrueFP16 and custom-pass optimizations: "These GPU optimizations require MXNet version 1.7 or higher".

Member

@eric-haibin-lin left a comment

Is there any wheel in https://dist.mxnet.io/python that contains the feature required by this script? If so, we can update https://github.com/dmlc/gluon-nlp/blob/v0.9.x/env/gpu/py3-master.yml#L36 and add an integration test for the script in https://github.com/dmlc/gluon-nlp/blob/master/scripts/tests/test_scripts.py ? Otherwise this PR looks good

Contributor

@leezu left a comment

@MoisesHer have you tried below?

out_lib_file = 'bertpass_lib.so'
log.info(' ... compiling BERT custom graph pass into %s', out_lib_file)
mxnet_path = pathlib.Path(mxnet.__file__).parent.absolute()
mxnet_include_path = pathlib.Path.joinpath(mxnet_path, 'include/mxnet')
Contributor

Suggested change
mxnet_include_path = pathlib.Path.joinpath(mxnet_path, 'include/mxnet')
mxnet_include_path = pathlib.Path.joinpath(mxnet_path, 'include')

#include <algorithm>
#include <unordered_set>
#include <functional>
#include "lib_api.h"
Contributor

Suggested change
#include "lib_api.h"
#include "mxnet/lib_api.h"
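Both suggestions amount to pointing the compiler at the package's top-level `include` directory and then including the header as `mxnet/lib_api.h`. A small helper sketching the idea; the module file path is taken as a parameter so this example does not require mxnet to be installed:

```python
import pathlib

def mxnet_include_dir(mxnet_module_file):
    """Given the path of the installed package's __init__.py
    (i.e. mxnet.__file__), return the sibling 'include' directory,
    so that sources can use #include "mxnet/lib_api.h" relative to it."""
    return pathlib.Path(mxnet_module_file).parent / 'include'
```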

Contributor Author

Thanks. I have tried it, and I also tried to print the file. It seems it is not there:
TypeError: invalid file: PosixPath('/var/lib/jenkins/workspace/gluon-nlp-gpu-py3-master/conda/gpu/py3-master/lib/python3.5/site-packages/mxnet/include/mxnet/lib_api.h')
but in the wheel it was included

Contributor

@MoisesHer I verified that installing https://repo.mxnet.io/dist/python/cu100/mxnet_cu100-1.7.0b20200809-py2.py3-none-manylinux2014_x86_64.whl locally does place the file at /home/ubuntu/.pyenv/versions/3.8.3/lib/python3.8/site-packages/mxnet/include/mxnet/lib_api.h.

You could print the output of find /var/lib/jenkins -name lib_api.h in a debug statement to find the file location.

Contributor Author

I was able to print the header file /var/lib/jenkins/workspace/gluon-nlp-gpu-py3/conda/gpu/py3/lib/python3.5/site-packages/mxnet/include/mxnet/lib_api.h within the CI (the previous error was related to the Python version, so the file actually existed).
It contains this version:
https://github.com/apache/incubator-mxnet/blob/v1.6.x/include/mxnet/lib_api.h
which does not correspond to MXNet 1.7x version:
https://github.com/apache/incubator-mxnet/blob/v1.7.x/include/mxnet/lib_api.h

Contributor Author

@leezu any clue?

Contributor

The CI "stable version tests" verify that the branch works with MXNet 1.6:

https://github.com/dmlc/gluon-nlp/blob/v0.x/env/gpu/py3.yml#L36

If you don't support 1.6, you need to skip the test for the "stable version tests". However, given that the 1.7 vote has finally passed, we can switch the "stable version tests" over to 1.7 in the near future.
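Gating the script test on the installed MXNet version can be done with `pytest.mark.skipif`. Below is a sketch with a small hand-rolled version parser (a hypothetical helper name; the decorator usage is commented out since it needs mxnet and pytest importable at collection time):

```python
def version_tuple(version):
    """Parse a version string such as '1.7.0' or '1.7.0b20200809'
    into a comparable tuple like (1, 7, 0), ignoring any suffix."""
    parts = []
    for part in version.split('.')[:3]:
        digits = ''
        for ch in part:
            if ch.isdigit():
                digits += ch
            else:
                break  # stop at the first non-digit, e.g. 'b20200809'
        parts.append(int(digits or '0'))
    return tuple(parts)

# In the test module:
# import mxnet, pytest
# @pytest.mark.skipif(version_tuple(mxnet.__version__) < (1, 7, 0),
#                     reason='custom graph pass requires MXNet >= 1.7')
# def test_deploy_bert():
#     ...
```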

@szha szha changed the base branch from master to v0.x August 13, 2020 02:15
@leezu
Contributor

leezu commented Aug 28, 2020

See #1325 for doc fix

@mli
Member

mli commented Sep 2, 2020

Job PR-1237/34 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1237/34/index.html

@MoisesHer
Contributor Author

I am not sure about the remaining issue: is it a timeout?
If it is, can I avoid it?
Thanks

@szha
Member

szha commented Sep 2, 2020

@MoisesHer yes I think the current test takes too long. Could you try to reduce the time it takes by potentially reducing the workload?

@MoisesHer
Contributor Author

@MoisesHer yes I think the current test takes too long. Could you try to reduce the time it takes by potentially reducing the workload?

Thanks, another question: is there a way for me to trigger CI checks (without new commit)?

@chenw23
Member

chenw23 commented Sep 3, 2020

@MoisesHer yes I think the current test takes too long. Could you try to reduce the time it takes by potentially reducing the workload?

Thanks, another question: is there a way for me to trigger CI checks (without new commit)?

Sure, you can just click into the Details link of the check to be directed to the Jenkins page, and then click the Log in button in the upper-right corner. Then click the Rerun button (it looks like an arrowed circle) in the upper-right corner.

@mli
Member

mli commented Sep 3, 2020

Job PR-1237/37 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1237/37/index.html

@MoisesHer
Contributor Author

I am confused, not sure why this is failing now: MXNetError: Check failed: compileResult == NVRTC_SUCCESS (6 vs. 0) : NVRTC Compilation failed. Please set environment variable MXNET_USE_FUSION to 0.

@MoisesHer
Contributor Author

I am confused, not sure why this is failing now: MXNetError: Check failed: compileResult == NVRTC_SUCCESS (6 vs. 0) : NVRTC Compilation failed. Please set environment variable MXNET_USE_FUSION to 0.

Are all those expand_dims expected?
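As the error message suggests, the NVRTC-based pointwise fusion can be turned off via the MXNET_USE_FUSION environment variable. It has to be set before MXNet starts, e.g. from the shell (`MXNET_USE_FUSION=0 python deploy.py ...`) or at the top of the script (a sketch):

```python
import os

# Must run before `import mxnet`: fusion is configured when the engine starts.
os.environ['MXNET_USE_FUSION'] = '0'

# import mxnet as mx  # subsequent MXNet code then runs with fusion disabled
```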

@szha szha merged commit 7f29267 into dmlc:v0.x Sep 10, 2020
@szha
Member

szha commented Sep 10, 2020

@MoisesHer looks like a compatibility issue. we will address this in a separate PR. thanks for pushing this through!

MoisesHer added a commit to MoisesHer/gluon-nlp that referenced this pull request Sep 10, 2020
* Add example script to deploy BERT

* Add options to better measure performance

* Allow specification of path for exported model

* Add option to use custom graph pass

* Add optimization for MHA in custom graph pass

* Correct bug with input shapes in optimize_for

* correct typo

* fix lint

* fix lint

* Add documentation

* Add documentation for using deploy script

* Correct typo/add spaces in documentation

* Add setup.py to compile pass, update documentation

* Fix bug in path to include dir & fix pylint

* Add unitest for deploy bert script

* change CUDA version in wheel

* test latest wheel

* change path to custom pass library

* fixing trigger custom pass compilation

* fix lint

* fix lint

* Update mxnet pip version

* Only GPU versions changed

* fix lint

* change wheel to include mkl headers

* lint docstring

* remove debug print

* change include paths

* lint

* debugging lib_api.h

* debugging lib_api.h

* debugging

* Disable test for now

* skip test if mxnet_version < 1.7.0

* use pytest.mark.skipif to skip test

* test only BERT-base (fp16/fp32, SST/QA, embeddings) to avoid timeout

Co-authored-by: Leonard Lausen <[email protected]>
szha pushed a commit that referenced this pull request Sep 10, 2020