This repository has been archived by the owner on Jan 15, 2024. It is now read-only.

[API] use softmax with length, and interleaved matmul for BERT #1091

Merged
merged 10 commits into from
Jan 24, 2020

Conversation

eric-haibin-lin
Member

@eric-haibin-lin eric-haibin-lin commented Jan 5, 2020

Description

This PR changes the input layout of BERTEncoder from NTC to TNC so that we can adopt the fast fused self-attention op introduced by @Caenorst and @TaoLv.
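For readers unfamiliar with the term in the title, "softmax with length" means a softmax that ignores key positions beyond each sample's valid length, so no separate attention mask has to be built. Below is a minimal NumPy sketch of that idea, not the MXNet operator itself; the function name `softmax_with_length` and the shapes are assumptions for illustration, and lengths are assumed to be at least 1.

```python
import numpy as np

def softmax_with_length(scores, lengths):
    """Softmax over the last axis, ignoring key positions >= the given length.

    scores:  float array of shape (batch, query_len, key_len)
    lengths: int array of shape (batch,), number of valid keys per sample (>= 1)
    """
    key_len = scores.shape[-1]
    # mask[b, 0, k] is True where key position k is valid for sample b
    mask = np.arange(key_len)[None, None, :] < lengths[:, None, None]
    # Padded positions are set to -inf so they contribute exp(-inf) = 0
    masked = np.where(mask, scores, -np.inf)
    # Subtract the row maximum for numerical stability
    masked = masked - masked.max(axis=-1, keepdims=True)
    exp = np.exp(masked)
    return exp / exp.sum(axis=-1, keepdims=True)

# Two samples with 4 key positions; the first has only 2 valid keys
scores = np.zeros((2, 1, 4))
probs = softmax_with_length(scores, np.array([2, 4]))
```

With uniform scores, the first sample spreads its probability over its two valid keys only, and the padded positions receive exactly zero.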

For BERTModel API, the input layout remains unchanged. If users obtain the BERT model via the get_model API, they don't need to make any code change to run the optimized version (other than upgrading the gluon-nlp version).
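One way to keep a batch-major (NTC) interface on top of a time-major (TNC) encoder is to transpose at the encoder boundary. The sketch below uses NumPy as a stand-in to show the two layouts; the helper names `ntc_to_tnc` and `tnc_to_ntc` and the sizes are hypothetical, and this is not the code in bert.py.

```python
import numpy as np

batch_size, seq_len, hidden = 2, 5, 8  # illustrative sizes only

# NTC: (batch N, time T, channel C)
x_ntc = np.arange(batch_size * seq_len * hidden, dtype=np.float32)
x_ntc = x_ntc.reshape(batch_size, seq_len, hidden)

def ntc_to_tnc(x):
    """Swap the batch and time axes: (N, T, C) -> (T, N, C)."""
    return np.transpose(x, (1, 0, 2))

def tnc_to_ntc(x):
    """Swap the time and batch axes back: (T, N, C) -> (N, T, C)."""
    return np.transpose(x, (1, 0, 2))

x_tnc = ntc_to_tnc(x_ntc)       # what a TNC encoder would consume
x_back = tnc_to_ntc(x_tnc)      # what an NTC caller would get back
```

The token at position 3 of sample 1 is `x_ntc[1, 3]` in NTC and `x_tnc[3, 1]` in TNC; only the axis order changes, not the values.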

The tests won't pass until CI picks up an MXNet nightly build from Jan 4th or later; in earlier builds the CPU implementation of the op is missing.

On p3.16xlarge, BERT base, seq_len=512, batch_size=256

  • latency/batch before this commit: 611ms
  • latency/batch after this commit: 386ms
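As a quick reading of these two numbers, the arithmetic below derives the speedup, the relative latency reduction, and the implied throughput. It assumes the reported latency covers one full batch of 256 sequences, which the description does not state explicitly.

```python
before_ms = 611.0   # latency per batch before this commit
after_ms = 386.0    # latency per batch after this commit
batch_size = 256    # assumed to be the batch covered by each latency figure

speedup = before_ms / after_ms
latency_reduction = (before_ms - after_ms) / before_ms
throughput_before = batch_size / (before_ms / 1000.0)  # sequences per second
throughput_after = batch_size / (after_ms / 1000.0)    # sequences per second

print(f"speedup: {speedup:.2f}x")
print(f"latency reduction: {latency_reduction:.1%}")
print(f"throughput: {throughput_before:.0f} -> {throughput_after:.0f} sequences/s")
```

Under that assumption the change is roughly a 1.58x speedup, or about 37% less time per batch.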

Checklist

Essentials

  • PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented

@eric-haibin-lin eric-haibin-lin requested a review from a team as a code owner January 5, 2020 04:51
@codecov

codecov bot commented Jan 5, 2020

Codecov Report

❗ No coverage uploaded for pull request base (master@f7a2ea1).
The diff coverage is 81.86%.

@@            Coverage Diff            @@
##             master    #1091   +/-   ##
=========================================
  Coverage          ?   88.77%           
=========================================
  Files             ?       67           
  Lines             ?     6351           
  Branches          ?        0           
=========================================
  Hits              ?     5638           
  Misses            ?      713           
  Partials          ?        0
Impacted Files                                   Coverage Δ
src/gluonnlp/utils/parameter.py                  87.09% <100%> (ø)
src/gluonnlp/optimizer/__init__.py               100% <100%> (ø)
src/gluonnlp/model/bert.py                       93.85% <100%> (ø)
src/gluonnlp/metric/__init__.py                  100% <100%> (ø)
src/gluonnlp/metric/masked_accuracy.py           100% <100%> (ø)
src/gluonnlp/utils/version.py                    100% <100%> (ø)
src/gluonnlp/utils/files.py                      45.9% <18.18%> (ø)
src/gluonnlp/model/train/language_model.py       88.51% <25%> (ø)
src/gluonnlp/optimizer/bert_adam.py              87.32% <81.57%> (ø)
src/gluonnlp/metric/length_normalized_loss.py    89.28% <89.28%> (ø)
... and 1 more

@eric-haibin-lin eric-haibin-lin added the release focus Progress focus for release label Jan 5, 2020
@eric-haibin-lin
Member Author

eric-haibin-lin commented Jan 6, 2020

@leezu I added a temporary Gluon parameter workaround for apache/mxnet#17220 by overriding _collect_params_with_prefix of the DotProductSelfAttentionCell.
Will that be fixed with the Gluon 2.0 design?

@leezu
Contributor

leezu commented Jan 6, 2020

@eric-haibin-lin eric-haibin-lin changed the title [API][WIP] use softmax with length, and interleaved matmul for BERT [API] use softmax with length, and interleaved matmul for BERT Jan 7, 2020
@eric-haibin-lin
Member Author

@muhyun this helps GPT-2, too

@sxjscience
Member

@eric-haibin-lin Would we achieve a similar speedup if we fused the kernel while keeping the NTC layout?

@leezu
Contributor

leezu commented Jan 15, 2020

Merged master because of #1096. It should be possible to merge now.

@mli
Member

mli commented Jan 15, 2020

Job PR-1091/5 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1091/5/index.html

@leezu
Contributor

leezu commented Jan 15, 2020

@eric-haibin-lin

[2020-01-15T17:33:28.776Z] mxnet.base.MXNetError: Error in operator bertencoder0_transformer0_dotproductselfattentioncell0_interleaved_matmul_selfatt_valatt0: [17:21:45] src/operator/numpy/linalg/./../../tensor/../elemwise_op_common.h:135: Check failed: assign(&dattr, vec.at(i)): Incompatible attr in node bertencoder0_transformer0_dotproductselfattentioncell0_interleaved_matmul_selfatt_valatt0 at 1-th input: expected float16, got float32
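The message says the operator expected float16 at its input with index 1 but received float32. The NumPy sketch below illustrates that kind of mismatch and an explicit cast to one common dtype; the helper `cast_inputs_to_common_dtype` and the array shapes are hypothetical, and this is not the fix applied in this PR.

```python
import numpy as np

def cast_inputs_to_common_dtype(inputs, dtype):
    """Cast every input to `dtype` and report which indices needed a cast."""
    target = np.dtype(dtype)
    mismatched = [i for i, x in enumerate(inputs) if x.dtype != target]
    casted = [x.astype(target, copy=False) for x in inputs]
    return casted, mismatched

# Hypothetical inputs: one already in half precision, one left in full precision
first = np.ones((2, 3, 3), dtype=np.float16)
second = np.ones((3, 2, 4), dtype=np.float32)

casted, mismatched = cast_inputs_to_common_dtype([first, second], np.float16)
print("inputs that needed a cast:", mismatched)
```

Here only the input at index 1 is reported, matching the "1-th input" wording in the log.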

@mli
Member

mli commented Jan 16, 2020

Job PR-1091/7 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1091/7/index.html

@mli
Member

mli commented Jan 22, 2020

Job PR-1091/9 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1091/9/index.html

@eric-haibin-lin
Member Author

@leezu could you help build the latest MXNet 1.6.0.rc1 for our CI pipeline? Thanks!

@leezu
Contributor

leezu commented Jan 23, 2020

@eric-haibin-lin done

@mli
Member

mli commented Jan 23, 2020

Job PR-1091/10 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1091/10/index.html

@szhengac
Member

Have you tried training the model?

@eric-haibin-lin
Member Author

Yes. The run I have going includes both the new dataset loader and this change. So far the loss looks normal.

@mli
Member

mli commented Jan 24, 2020

Job PR-1091/11 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1091/11/index.html

@szhengac szhengac merged commit e88d55e into dmlc:master Jan 24, 2020
leezu added a commit to leezu/gluon-nlp that referenced this pull request Jan 27, 2020
leezu added a commit to leezu/gluon-nlp that referenced this pull request Jan 27, 2020
leezu added a commit that referenced this pull request Jan 27, 2020
eric-haibin-lin added a commit to eric-haibin-lin/gluon-nlp that referenced this pull request Feb 1, 2020
…1091)

* use softmax with length, and interleaved matmul

* push backward compatibility fix

* fix failing unittests for output_all_encodings, and valid-len=None

* fix lint

* Update bert.py

* amp patch

* Update MXNet 1.6 pre-release version tested on CI

* Update bert.py

Co-authored-by: Leonard Lausen <[email protected]>
eric-haibin-lin added a commit to eric-haibin-lin/gluon-nlp that referenced this pull request Feb 2, 2020
* [API] use softmax with length, and interleaved matmul for BERT (dmlc#1091)

* use softmax with length, and interleaved matmul

* push backward compatibility fix

* fix failing unittests for output_all_encodings, and valid-len=None

* fix lint

* Update bert.py

* amp patch

* Update MXNet 1.6 pre-release version tested on CI

* Update bert.py

Co-authored-by: Leonard Lausen <[email protected]>

* Add fused attn and softmax

* remove amp patch

* add test

* test for checkponts

* Update files.py

* py3.5 compatibility

Co-authored-by: Leonard Lausen <[email protected]>