This repository has been archived by the owner on Jan 15, 2024. It is now read-only.

[API] Make BERT a hybrid block #877

Merged
merged 35 commits into dmlc:master from hybrid-block on Aug 19, 2019

Conversation

eric-haibin-lin
Member

@eric-haibin-lin eric-haibin-lin commented Aug 15, 2019

Description

This PR uses the infer_range attribute of MXNet's arange operator (thanks to @TaoLv and @leezu) to fully hybridize the BERT model.
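To illustrate what the infer_range attribute enables: a hybridized (symbolic) graph cannot bake in a Python integer for the sequence length, so position ids must be derived from the input's own shape. The numpy sketch below (function name and shapes are hypothetical, purely for illustration) shows the intended semantics; in the symbolic graph, arange with infer_range lets MXNet infer the stop value instead.

```python
import numpy as np

# Hypothetical stand-in for what a hybridized BERT needs: position ids whose
# length follows the input sequence length rather than a hard-coded constant.
# In MXNet's symbolic mode, arange with infer_range defers this shape to
# shape inference; this numpy version only mimics the resulting behavior.
def position_ids_like(inputs):
    """Return [0, 1, ..., seq_len-1] for inputs of shape (batch, seq_len)."""
    return np.arange(inputs.shape[1])

batch = np.zeros((2, 5))
print(position_ids_like(batch))  # [0 1 2 3 4]
```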

Checklist

Essentials

  • PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

The change is backward compatible. Added tests for model forward with and without optional inputs.
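A minimal sketch of the kind of coverage described above, with an entirely hypothetical toy forward function (not the real BERT API): call the model both with and without its optional inputs and check that omitting them matches passing the defaults.

```python
# Toy model forward with an optional input; omitting token_types should be
# equivalent to passing all zeros (mirroring the backward-compatibility claim).
def forward(tokens, token_types=None):
    if token_types is None:
        token_types = [0] * len(tokens)
    return [t + tt for t, tt in zip(tokens, token_types)]

assert forward([1, 2, 3]) == forward([1, 2, 3], [0, 0, 0])  # same result
```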

@codecov

codecov bot commented Aug 15, 2019

Codecov Report

❗ No coverage uploaded for pull request head (hybrid-block@7989539).
The diff coverage is n/a.

@codecov

codecov bot commented Aug 15, 2019

Codecov Report

Merging #877 into master will decrease coverage by 0.38%.
The diff coverage is 97.01%.


@@           Coverage Diff            @@
##           master   #877      +/-   ##
========================================
- Coverage   90.38%    90%   -0.39%     
========================================
  Files          66     66              
  Lines        6367   6433      +66     
========================================
+ Hits         5755   5790      +35     
- Misses        612    643      +31
Impacted Files Coverage Δ
src/gluonnlp/model/bert.py 99.45% <100%> (+0.06%) ⬆️
src/gluonnlp/model/transformer.py 89.69% <92.3%> (-0.44%) ⬇️
src/gluonnlp/model/seq2seq_encoder_decoder.py 45.31% <0%> (-29.69%) ⬇️
src/gluonnlp/model/block.py 51.92% <0%> (-1.93%) ⬇️
src/gluonnlp/model/attention_cell.py 93.28% <0%> (-1.35%) ⬇️
src/gluonnlp/vocab/subwords.py 81.29% <0%> (+0.16%) ⬆️

@mli
Member

mli commented Aug 16, 2019

Job PR-877/13 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-877/13/index.html

Contributor

@leezu leezu left a comment


Why introduce legacy models? Can we just delete them?

@eric-haibin-lin
Member Author

@leezu the contrib.arange_like operator is not available in MXNet 1.5, which GluonNLP depends on

@leezu
Contributor

leezu commented Aug 18, 2019

Right, but there is the infer_range-based workaround you are using in this PR, which works on MXNet 1.5. Where does the workaround fall short?
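For context on the workaround under discussion (the commit log below mentions combining zeros_like with infer_range): when arange_like is unavailable, the length of a range can still be tied to another tensor by taking zeros_like of that tensor and combining it elementwise with an inferred-length range. The numpy sketch below (function name is hypothetical) only emulates the resulting values; in MXNet the shape transfer happens through symbolic shape inference rather than `.shape`.

```python
import numpy as np

# Emulation of an "arange_like" built from zeros_like plus a range:
# the zeros tensor carries the length information, and adding the range to it
# plays the role of the shape-inferring broadcast in the symbolic graph.
def arange_like_workaround(steps):
    anchor = np.zeros_like(steps)               # same length as steps, all zeros
    return anchor + np.arange(anchor.shape[0])  # positions 0 .. len(steps)-1

print(arange_like_workaround(np.array([7.0, 7.0, 7.0])))  # [0. 1. 2.]
```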

@mli
Member

mli commented Aug 18, 2019

Job PR-877/14 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-877/14/index.html

@mli
Member

mli commented Aug 18, 2019

Job PR-877/15 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-877/15/index.html

@mli
Member

mli commented Aug 18, 2019

Job PR-877/16 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-877/16/index.html

@mli
Member

mli commented Aug 18, 2019

Job PR-877/17 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-877/17/index.html

@mli
Member

mli commented Aug 18, 2019

Job PR-877/18 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-877/18/index.html

@mli
Member

mli commented Aug 18, 2019

Job PR-877/19 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-877/19/index.html

@eric-haibin-lin
Member Author

@leezu good point. I removed the legacy BERT model definition. Please review again.

Contributor

@leezu leezu left a comment


Thanks!

@leezu leezu merged commit a9eb2fd into dmlc:master Aug 19, 2019
eric-haibin-lin added a commit that referenced this pull request Aug 20, 2019
* change BaseTransformerEncoder forward

* fix

* merge _forward to hybrid_forward

* if arange_like is not available

* fix

* make transformer encoder a true hybridblock

* fix lint

* fix bug

* use zeroslike and infer_range to avoid backprop problem

* update test

* Revert "update test"

This reverts commit d024a0c.

* more printing

* fix test case

* make bert hybrid

* add legacy model

* fix lint

* fix lint

* revert change for default parameter

* fix arange dtype

* fix lint

* revert monkey patch in NMT interface

* fix dtype

* fix bug in arange

* remove legacy model

* also update bert scripts

* fix lint

* fix typo

* remove hybridbert test
@eric-haibin-lin eric-haibin-lin deleted the hybrid-block branch February 2, 2020 06:22