[API] use softmax with length, and interleaved matmul for BERT #1091
Conversation
Codecov Report
@@            Coverage Diff             @@
##             master    #1091    +/-  ##
==========================================
  Coverage          ?   88.77%
==========================================
  Files             ?       67
  Lines             ?     6351
  Branches          ?        0
==========================================
  Hits              ?     5638
  Misses            ?      713
  Partials          ?        0
==========================================
@leezu I had a temporary gluon parameter workaround due to apache/mxnet#17220, by overriding …
@muhyun this helps GPT-2, too.
@eric-haibin-lin Will we achieve similar speed-up if we fuse the kernel + using …
Merged master due to #1096. Should be possible to merge now.
Job PR-1091/5 is complete.
Job PR-1091/7 is complete.
Job PR-1091/9 is complete.
@leezu could you help build the latest mxnet 1.6.0.rc1 for our CI pipeline? Thanks!
@eric-haibin-lin done
Job PR-1091/10 is complete.
Have you tried training the model?
Yes. The one I am running contains the new dataset loader and this change. So far the loss looks normal.
Job PR-1091/11 is complete.
…1091)
* use softmax with length, and interleaved matmul
* push backward compatibility fix
* fix failing unittests for output_all_encodings, and valid-len=None
* fix lint
* Update bert.py
* amp patch
* Update MXNet 1.6 pre-release version tested on CI
* Update bert.py
Co-authored-by: Leonard Lausen <[email protected]>
* [API] use softmax with length, and interleaved matmul for BERT (dmlc#1091)
* Add fused attn and softmax
* remove amp patch
* add test
* test for checkpoints
* Update files.py
* py3.5 compatibility
Co-authored-by: Leonard Lausen <[email protected]>
Description
This PR changes the input layout of BERTEncoder from NTC to TNC, so that we can adopt the fast fused self-attention op introduced by @Caenorst and @TaoLv.
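For intuition only, here is a minimal NumPy sketch of time-major (TNC) self-attention with an interleaved QKV projection, where a single matmul emits queries, keys, and values together. This is not the actual MXNet fused kernel; every name and shape below is a hypothetical illustration of the idea.

```python
import numpy as np

# Illustrative sizes (hypothetical, not from the PR benchmarks).
seq_len, batch, channels = 4, 2, 8
rng = np.random.default_rng(0)

x_ntc = rng.random((batch, seq_len, channels)).astype(np.float32)
x_tnc = x_ntc.transpose(1, 0, 2)            # NTC -> TNC: (T, N, C)

# One "interleaved" projection produces Q, K and V in a single matmul,
# instead of three separate projections.
w_qkv = rng.random((channels, 3 * channels)).astype(np.float32)
qkv = x_tnc @ w_qkv                         # (T, N, 3*C)
q, k, v = np.split(qkv, 3, axis=-1)         # each (T, N, C)

# Scaled dot-product attention scores: (N, T_query, T_key).
scores = np.einsum('tnc,snc->nts', q, k) / np.sqrt(channels)
e = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights = e / e.sum(axis=-1, keepdims=True)

# Weighted sum of values, back in TNC layout.
out = np.einsum('nts,snc->tnc', weights, v)  # (T, N, C)
```

With the time-major layout, Q/K/V for the whole batch stay contiguous per time step, which is what makes a single fused projection practical.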
For BERTModel API, the input layout remains unchanged. If users obtain the BERT model via the get_model API, they don't need to make any code change to run the optimized version (other than upgrading the gluon-nlp version).
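For intuition on the "softmax with length" part of the title, here is a hedged NumPy sketch: positions beyond each sequence's valid length get zero attention weight, so padding never contributes. The function name and shapes are illustrative, not the GluonNLP/MXNet API, and it assumes every valid length is at least 1.

```python
import numpy as np

def softmax_with_length(scores, valid_len):
    """Softmax over the last axis, restricted to the first `valid_len`
    entries of each row; padded positions receive weight 0.
    Assumes valid_len >= 1 for every row (hypothetical helper)."""
    t = scores.shape[-1]
    mask = np.arange(t) < np.asarray(valid_len)[..., None]
    # Masked positions are sent to -inf so exp() maps them to 0.
    shifted = np.where(mask, scores, -np.inf)
    shifted = shifted - shifted.max(axis=-1, keepdims=True)
    e = np.exp(shifted) * mask
    return e / e.sum(axis=-1, keepdims=True)

# Row 0 attends to 3 of 5 positions, row 1 to all 5.
probs = softmax_with_length(np.zeros((2, 5)), np.array([3, 5]))
```

Fusing this masking into the softmax avoids materializing a separate additive mask tensor for every attention head.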
The tests won't pass yet: they require an MXNet nightly build from Jan 4th or later, since earlier builds are missing the CPU op.
On p3.16xlarge, BERT base, seq_len=512, batch_size=256
Checklist
Essentials
Changes
Comments