[FEATURE] [WIP] Use softmax with length in attention cells #910

ptrendx · 2019-08-29T20:15:23Z

Description

MXNet added support for a length parameter to softmax (apache/mxnet#15169), and this PR attempts to use it in the attention cells. The work was done by me and @blchu.

@eric-haibin-lin Could you help make sure this works for all models? We have only tested BERT and the Transformer decoder.
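A minimal pure-Python sketch of what "softmax with length" means for a single attention row, next to the explicit-mask softmax it can stand in for. It does not call MXNet. It only assumes that positions at or beyond the given length receive zero weight, which is how a length-based mask over end-padded keys behaves. The helpers `masked_softmax`, `softmax_with_length`, `length_to_mask` and `causal_lengths` are hypothetical names for illustration, not gluonnlp or MXNet functions.

```python
import math
from typing import List, Sequence


def masked_softmax(scores: Sequence[float], mask: Sequence[int]) -> List[float]:
    """Softmax over one attention row; positions with mask == 0 get weight 0.

    Assumes at least one position is unmasked.
    """
    valid = [s for s, m in zip(scores, mask) if m]
    peak = max(valid)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_with_length(scores: Sequence[float], length: int) -> List[float]:
    """Softmax over the first `length` positions of one row; the rest get weight 0.

    Assumes 1 <= length <= len(scores).
    """
    valid = scores[:length]
    peak = max(valid)
    exps = [math.exp(s - peak) for s in valid]
    total = sum(exps)
    return [e / total for e in exps] + [0.0] * (len(scores) - length)


def length_to_mask(length: int, num_keys: int) -> List[int]:
    """The explicit mask a length implies: 1 for key index j < length, else 0."""
    return [1 if j < length else 0 for j in range(num_keys)]


def causal_lengths(num_steps: int) -> List[int]:
    """Hypothetical helper: in decoder self-attention, query step i may attend to
    keys 0..i, so each row of a causal mask corresponds to a length of i + 1."""
    return [i + 1 for i in range(num_steps)]


if __name__ == "__main__":
    row = [2.0, 1.0, 0.5, -1.0]
    a = masked_softmax(row, length_to_mask(2, len(row)))
    b = softmax_with_length(row, 2)
    print(all(math.isclose(x, y) for x, y in zip(a, b)))  # True
```

The equivalence only holds for prefix masks, i.e. masks made of ones followed by zeros. Padding at the end of a sequence has that shape, and so does each row of a causal decoder mask; an arbitrary mask cannot be replaced by a single length per row.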

Checklist

Essentials

PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
Changes are complete (i.e. I finished coding on this PR)
All changes have test coverage
Code is well-documented

codecov · 2019-08-29T20:15:38Z

Codecov Report

❗ No coverage uploaded for pull request head (pr_softmax_with_length@c7ae5b4).
The diff coverage is n/a.

codecov · 2019-08-29T20:15:38Z

Codecov Report

Merging #910 into master will decrease coverage by 61.26%.
The diff coverage is 7.69%.

@@             Coverage Diff             @@
##           master     #910       +/-   ##
===========================================
- Coverage   90.48%   29.21%   -61.27%     
===========================================
  Files          66       66               
  Lines        6400     6380       -20     
===========================================
- Hits         5791     1864     -3927     
- Misses        609     4516     +3907
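The coverage figures above can be re-derived from the Hits and Lines rows (coverage = hits / lines). The header's "decrease by 61.26%" and the table's "-61.27%" look inconsistent, but both are consistent with each value being cut (not rounded) to two decimals: the table subtracts the two displayed coverages, while the header cuts the exact difference. This is an inference from the numbers in this report, not documented Codecov behavior; `coverage_pct` and `trunc2` are illustrative helpers.

```python
import math


def coverage_pct(hits: int, lines: int) -> float:
    """Line coverage in percent: hit (covered) lines over tracked lines."""
    return 100.0 * hits / lines


def trunc2(x: float) -> float:
    """Cut a percentage to two decimals without rounding."""
    return math.floor(x * 100) / 100


base = coverage_pct(5791, 6400)  # master: exactly 90.484375
head = coverage_pct(1864, 6380)  # PR head: about 29.2163

# Hits + Misses add up to Lines in each column.
assert 5791 + 609 == 6400 and 1864 + 4516 == 6380

print(trunc2(base), trunc2(head))             # 90.48 29.21 (table values)
print(round(trunc2(base) - trunc2(head), 2))  # 61.27 (table delta)
print(trunc2(base - head))                    # 61.26 (header figure)
```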

Impacted Files	Coverage Δ
src/gluonnlp/model/transformer.py	`14.5% <0%> (-76.71%)`	⬇️
src/gluonnlp/model/attention_cell.py	`21.64% <25%> (-72.99%)`	⬇️
src/gluonnlp/model/bilm_encoder.py	`15.25% <0%> (-84.75%)`	⬇️
src/gluonnlp/model/train/language_model.py	`16.47% <0%> (-80.69%)`	⬇️
src/gluonnlp/data/batchify/embedding.py	`17.96% <0%> (-79.69%)`	⬇️
src/gluonnlp/model/sequence_sampler.py	`12.11% <0%> (-79.59%)`	⬇️
src/gluonnlp/data/sampler.py	`18.59% <0%> (-77.89%)`	⬇️
src/gluonnlp/data/dataset.py	`22.22% <0%> (-76.99%)`	⬇️
src/gluonnlp/model/lstmpcellwithclip.py	`23.07% <0%> (-76.93%)`	⬇️
src/gluonnlp/model/language_model.py	`23.25% <0%> (-76.75%)`	⬇️
... and 48 more

eric-haibin-lin · 2020-01-24T20:04:09Z

Moved to #1091 for BERT.

ptrendx and others added 2 commits August 29, 2019 13:07

Use softmax with length (764d8db)

Fix transformer decode for softmax length mask (c7ae5b4)

ptrendx requested a review from szha as a code owner August 29, 2019 20:15

ptrendx added 3 commits August 29, 2019 13:36

Fix lint (00a97d2)

Fix more lint (8675077)

Fix TransformerXL (d473c98)

ptrendx requested a review from a team as a code owner September 27, 2019 17:22

eric-haibin-lin self-assigned this Sep 30, 2019

leezu mentioned this pull request Nov 14, 2019

[Refactor]Add a switch for attention to return an unnormalized weight matrix. Move _get_attention_cell function position #1007

Open

eric-haibin-lin closed this Jan 24, 2020
