This repository has been archived by the owner on Jan 15, 2024. It is now read-only.

Deploy BERT model - Script #1237

Merged
merged 39 commits into dmlc:v0.x on Sep 10, 2020

Conversation

MoisesHer
Contributor

Description

Includes a script to deploy BERT for QA / classification / regression / embedding tasks.
It offers the possibility of using the available GPU BERT optimizations in MXNet.
It reports latency and throughput, and can check accuracy.
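The latency/throughput measurement the script reports can be sketched as follows. This is a minimal illustration with a dummy `predict` callable and hypothetical names, not the actual deploy.py code:

```python
import time

def benchmark(predict, batches, batch_size):
    """Time `predict` over all batches; return (avg latency in ms,
    throughput in samples/s). `predict` is a hypothetical stand-in
    for the model's forward pass."""
    start = time.perf_counter()
    for batch in batches:
        predict(batch)
    elapsed = time.perf_counter() - start
    latency_ms = elapsed / len(batches) * 1000.0
    throughput = len(batches) * batch_size / elapsed
    return latency_ms, throughput
```

In practice a few warm-up iterations would be run first, and for an asynchronous engine like MXNet the outputs must be synchronized (e.g. `mx.nd.waitall()`) before stopping the timer.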

Checklist

Essentials

  • Changes are complete (i.e. I finished coding on this PR)
  • Code is well-documented

Changes

  • Added a Python script to deploy BERT for different tasks
  • Also includes a custom graph pass to fuse BiasAdd-GELU ops and to prearrange the Multi-head Attention weights and biases.

Comments

cc @dmlc/gluon-nlp-team

@MoisesHer MoisesHer requested a review from a team as a code owner June 2, 2020 21:30
@MoisesHer MoisesHer changed the title Deploy bert script Deploy BERT model - Script Jun 2, 2020
@mli
Member

mli commented Jun 2, 2020

Job PR-1237/1 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1237/1/index.html

@mli
Member

mli commented Jun 2, 2020

Job PR-1237/2 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1237/2/index.html

@mli
Member

mli commented Jun 3, 2020

Job PR-1237/4 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1237/4/index.html

@mli
Member

mli commented Jun 3, 2020

Job PR-1237/3 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1237/3/index.html

@mli
Member

mli commented Jun 3, 2020

Job PR-1237/5 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1237/5/index.html

@codecov

codecov bot commented Jun 3, 2020

Codecov Report

Merging #1237 into master will increase coverage by 0.03%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##           master    #1237      +/-   ##
==========================================
+ Coverage   87.42%   87.45%   +0.03%     
==========================================
  Files          81       81              
  Lines        7346     7365      +19     
==========================================
+ Hits         6422     6441      +19     
  Misses        924      924              
Impacted Files                         Coverage Δ
src/gluonnlp/model/bert.py             94.65% <0.00%> (+0.03%) ⬆️
src/gluonnlp/model/transformer.py      91.71% <0.00%> (+0.05%) ⬆️
src/gluonnlp/model/language_model.py   98.64% <0.00%> (+0.15%) ⬆️

scripts/bert/finetune_classifier.py
scripts/bert/finetune_squad.py
Member

Would you mind also updating index.rst with the usage for deploy.py?

Contributor Author

Just updated, but please let me know if you think we can improve it.
I am not sure how the custom graph pass should be shared (as source code, a compiled library, or somewhere else).

Contributor

How is the compilation of the custom graph pass currently triggered? Do you think it should be distributed as part of gluonnlp (or, as now, "only" as part of the scripts)? If so, ideally pip should trigger compilation of the graph pass when installing gluonnlp. You could set this up via setup.py. Maybe there are some missing pieces in the mxnet pip package preventing this from working.

If you'd like to keep it in the scripts, you can also add a setup.py in the scripts folder.

It would be good to try this out.

Contributor Author

Thanks for the ideas. In the current version it is not triggered, so yes, we need to provide a mechanism for that.
I think a setup.py would make more sense, since this is specific to BERT-GPU and optional, and we probably do not want to affect the main installation process.
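A sketch of the optional build step that such a setup.py (or a small helper called from it) could perform. The source and output file names here are hypothetical, and real flags would depend on the platform and CUDA setup:

```python
def make_compile_cmd(source, mxnet_include, out_lib='bertpass_lib.so'):
    """Assemble the g++ invocation that compiles the custom graph pass
    into a shared library loadable by MXNet. Names are illustrative only."""
    return ['g++', '-shared', '-fPIC', '-std=c++11',
            str(source), '-o', str(out_lib),
            '-I', str(mxnet_include)]

# e.g. subprocess.check_call(make_compile_cmd('bertpass_gpu.cc', include_dir))
```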

@mli
Member

mli commented Jun 6, 2020

Job PR-1237/6 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1237/6/index.html

@mli
Member

mli commented Jun 6, 2020

Job PR-1237/7 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1237/7/index.html

Member

@eric-haibin-lin left a comment

I assume the graph pass requires mxnet nightly build? Would it make sense to mention the minimum mxnet version required for this script in the doc?

@mli
Member

mli commented Jun 11, 2020

Job PR-1237/8 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1237/8/index.html

@mli
Member

mli commented Jun 11, 2020

Job PR-1237/9 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1237/9/index.html

@MoisesHer
Contributor Author

I assume the graph pass requires mxnet nightly build? Would it make sense to mention the minimum mxnet version required for this script in the doc?

Yes, I have added a comment in index.rst for the TrueFP16 and custom-pass optimizations: "These GPU optimizations require MXNet version 1.7 or higher".

Member

@eric-haibin-lin left a comment

Is there any wheel in https://dist.mxnet.io/python that contains the feature required by this script? If so, we can update https://github.com/dmlc/gluon-nlp/blob/v0.9.x/env/gpu/py3-master.yml#L36 and add an integration test for the script in https://github.com/dmlc/gluon-nlp/blob/master/scripts/tests/test_scripts.py ? Otherwise this PR looks good

Contributor

@leezu left a comment

@MoisesHer have you tried below?

out_lib_file = 'bertpass_lib.so'
log.info(' ... compiling BERT custom graph pass into %s', out_lib_file)
mxnet_path = pathlib.Path(mxnet.__file__).parent.absolute()
mxnet_include_path = pathlib.Path.joinpath(mxnet_path, 'include/mxnet')
Contributor

Suggested change
mxnet_include_path = pathlib.Path.joinpath(mxnet_path, 'include/mxnet')
mxnet_include_path = pathlib.Path.joinpath(mxnet_path, 'include')

#include <algorithm>
#include <unordered_set>
#include <functional>
#include "lib_api.h"
Contributor

Suggested change
#include "lib_api.h"
#include "mxnet/lib_api.h"
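Both suggestions amount to pointing the compiler at the package's top-level `include` directory and then including the header as `mxnet/lib_api.h`. A small helper sketching the idea; the module file path is taken as a parameter so this example does not require mxnet to be installed:

```python
import pathlib

def mxnet_include_dir(mxnet_module_file):
    """Given the path of the installed package's __init__.py
    (i.e. mxnet.__file__), return the sibling 'include' directory,
    so that sources can use #include "mxnet/lib_api.h" relative to it."""
    return pathlib.Path(mxnet_module_file).parent / 'include'
```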

Contributor Author

Thanks. I have tried it, and I also tried to print the file. It seems it is not there:
TypeError: invalid file: PosixPath('/var/lib/jenkins/workspace/gluon-nlp-gpu-py3-master/conda/gpu/py3-master/lib/python3.5/site-packages/mxnet/include/mxnet/lib_api.h')
but in the wheel it was included

Contributor

@MoisesHer I verified that installing https://repo.mxnet.io/dist/python/cu100/mxnet_cu100-1.7.0b20200809-py2.py3-none-manylinux2014_x86_64.whl locally does place the file at /home/ubuntu/.pyenv/versions/3.8.3/lib/python3.8/site-packages/mxnet/include/mxnet/lib_api.h.

You could print the output of find /var/lib/jenkins -name lib_api.h in a debug statement to find the file location.

Contributor Author

I was able to print the header file /var/lib/jenkins/workspace/gluon-nlp-gpu-py3/conda/gpu/py3/lib/python3.5/site-packages/mxnet/include/mxnet/lib_api.h within the CI (the previous error was related to the Python version, so the file actually existed).
It contains this version:
https://github.com/apache/incubator-mxnet/blob/v1.6.x/include/mxnet/lib_api.h
which does not correspond to MXNet 1.7x version:
https://github.com/apache/incubator-mxnet/blob/v1.7.x/include/mxnet/lib_api.h

Contributor Author

@leezu any clue?

Contributor

The CI "stable version tests" verify that the branch works with MXNet 1.6:

https://github.com/dmlc/gluon-nlp/blob/v0.x/env/gpu/py3.yml#L36

If you don't support 1.6, you need to skip the test for the "stable version tests". However, given that the 1.7 vote has finally passed, we can switch the "stable version tests" over to 1.7 in the near future.
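Gating the script test on the installed MXNet version can be done with `pytest.mark.skipif`. Below is a sketch with a small hand-rolled version parser (a hypothetical helper name; the decorator usage is commented out since it needs mxnet and pytest importable at collection time):

```python
def version_tuple(version):
    """Parse a version string such as '1.7.0' or '1.7.0b20200809'
    into a comparable tuple like (1, 7, 0), ignoring any suffix."""
    parts = []
    for part in version.split('.')[:3]:
        digits = ''
        for ch in part:
            if ch.isdigit():
                digits += ch
            else:
                break  # stop at the first non-digit, e.g. 'b20200809'
        parts.append(int(digits or '0'))
    return tuple(parts)

# In the test module:
# import mxnet, pytest
# @pytest.mark.skipif(version_tuple(mxnet.__version__) < (1, 7, 0),
#                     reason='custom graph pass requires MXNet >= 1.7')
# def test_deploy_bert():
#     ...
```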

@szha szha changed the base branch from master to v0.x August 13, 2020 02:15
@leezu
Contributor

leezu commented Aug 28, 2020

See #1325 for doc fix

@mli
Member

mli commented Sep 2, 2020

Job PR-1237/34 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1237/34/index.html

@MoisesHer
Contributor Author

I am not sure about the remaining issue: is it a timeout?
If it is, can I avoid it?
Thanks

@szha
Member

szha commented Sep 2, 2020

@MoisesHer yes I think the current test takes too long. Could you try to reduce the time it takes by potentially reducing the workload?

@MoisesHer
Contributor Author

@MoisesHer yes I think the current test takes too long. Could you try to reduce the time it takes by potentially reducing the workload?

Thanks, another question: is there a way for me to trigger CI checks (without new commit)?

@chenw23
Member

chenw23 commented Sep 3, 2020

@MoisesHer yes I think the current test takes too long. Could you try to reduce the time it takes by potentially reducing the workload?

Thanks, another question: is there a way for me to trigger CI checks (without new commit)?

Sure, you can just click into the Details link of the check to be directed to the Jenkins page, and then click the Log in button in the upper-right corner. Then click the Rerun button (it looks like an arrowed circle) in the upper-right corner.

@mli
Member

mli commented Sep 3, 2020

Job PR-1237/37 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1237/37/index.html

@MoisesHer
Contributor Author

I am confused, not sure why this is failing now: MXNetError: Check failed: compileResult == NVRTC_SUCCESS (6 vs. 0) : NVRTC Compilation failed. Please set environment variable MXNET_USE_FUSION to 0.

@MoisesHer
Contributor Author

I am confused, not sure why this is failing now: MXNetError: Check failed: compileResult == NVRTC_SUCCESS (6 vs. 0) : NVRTC Compilation failed. Please set environment variable MXNET_USE_FUSION to 0.

Are all those expand_dims expected?
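As the error message suggests, the NVRTC-based pointwise fusion can be turned off via the MXNET_USE_FUSION environment variable. It has to be set before MXNet starts, e.g. from the shell (`MXNET_USE_FUSION=0 python deploy.py ...`) or at the top of the script (a sketch):

```python
import os

# Must run before `import mxnet`: fusion is configured when the engine starts.
os.environ['MXNET_USE_FUSION'] = '0'

# import mxnet as mx  # subsequent MXNet code then runs with fusion disabled
```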

@szha szha merged commit 7f29267 into dmlc:v0.x Sep 10, 2020
@szha
Member

szha commented Sep 10, 2020

@MoisesHer looks like a compatibility issue. we will address this in a separate PR. thanks for pushing this through!

MoisesHer added a commit to MoisesHer/gluon-nlp that referenced this pull request Sep 10, 2020
* Add example script to deploy BERT

* Add options to better measure performance

* Allow specification of path for exported model

* Add option to use custom graph pass

* Add optimization for MHA in custom graph pass

* Correct bug with input shapes in optimize_for

* correct typo

* fix lint

* fix lint

* Add documentation

* Add documentation for using deploy script

* Correct typo/add spaces in documentation

* Add setup.py to compile pass, update documentation

* Fix bug in path to include dir & fix pylint

* Add unitest for deploy bert script

* change CUDA version in wheel

* test latest wheel

* change path to custom pass library

* fixing trigger custom pass compilation

* fix lint

* fix lint

* Update mxnet pip version

* Only GPU versions changed

* fix lint

* change wheel to include mkl headers

* lint docstring

* remove debug print

* change include paths

* lint

* debugging lib_api.h

* debugging lib_api.h

* debugging

* Disable test for now

* skip test if mxnet_version < 1.7.0

* use pytest.mark.skipif to skip test

* test only BERT-base (fp16/fp32, SST/QA, embeddings) to avoid timeout

Co-authored-by: Leonard Lausen <[email protected]>
szha pushed a commit that referenced this pull request Sep 10, 2020