[Feature] Add Machine translation estimator in api #1156

liuzh47 · 2020-02-13T13:16:47Z

Description

Implements estimators for the GNMT and Transformer machine translation models.

Checklist

Essentials

PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
Changes are complete (i.e. I finished coding on this PR)
All changes have test coverage
Code is well-documented

Changes

Feature1, tests, (and when applicable, API doc)
Feature2, tests, (and when applicable, API doc)

Comments

If this change is a backward incompatible change, why must this change be made.
Interesting edge cases to note here

cc @dmlc/gluon-nlp-team

codecov · 2020-02-13T13:16:49Z

Codecov Report

Merging #1156 into master will increase coverage by 3.18%.
The diff coverage is 26.22%.

@@            Coverage Diff             @@
##           master    #1156      +/-   ##
==========================================
+ Coverage   70.58%   73.77%   +3.18%     
==========================================
  Files          72       76       +4     
  Lines        6970     7317     +347     
==========================================
+ Hits         4920     5398     +478     
+ Misses       2050     1919     -131

Impacted Files	Coverage Δ
src/gluonnlp/estimator/__init__.py	`100% <100%> (ø)`
...nlp/estimator/machine_translation_event_handler.py	`22.22% <22.22%> (ø)`
...p/estimator/machine_translation_batch_processor.py	`28.57% <28.57%> (ø)`
...luonnlp/estimator/machine_translation_estimator.py	`54.54% <54.54%> (ø)`
src/gluonnlp/data/translation.py	`26.35% <0%> (-73.65%)`	⬇️
src/gluonnlp/model/train/cache.py	`25.58% <0%> (-72.1%)`	⬇️
src/gluonnlp/model/transformer.py	`31.73% <0%> (-54.81%)`	⬇️
src/gluonnlp/data/batchify/language_model.py	`43.92% <0%> (-52.34%)`	⬇️
src/gluonnlp/model/translation.py	`20.31% <0%> (-51.57%)`	⬇️
src/gluonnlp/embedding/evaluation.py	`40.33% <0%> (-51.27%)`	⬇️
... and 45 more

mli · 2020-02-13T13:57:33Z

Job PR-1156/1 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1156/1/index.html

liuzh47 · 2020-02-13T14:49:34Z

src/gluonnlp/estimator/machine_translation_event_handler.py

+        gnorm = gluon.utils.clip_global_norm(grads, self.clip)
+        estimator.trainer.step(1)
+
+class TransformerGradientAccumulationHandler(GradientUpdateHandler,


liuzh47 · 2020-02-13

@eric-haibin-lin

eric-haibin-lin · 2020-02-17

Why does the gradient accumulation handler API require batch_size?

liuzh47 · 2020-02-17

It needs batch_size to rescale. See https://github.com/dmlc/gluon-nlp/blob/master/scripts/machine_translation/train_transformer.py#L320
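
To illustrate the rescaling point in general terms, here is a minimal sketch in plain Python. It is not the handler's code and does not reproduce the linked script; the toy squared-error loss and the names `accumulated_step` and `full_batch_step` are hypothetical. It only shows why a step taken on gradients summed over several micro-batches needs a normalizing constant such as batch_size to match a step on one large batch.

```python
def accumulated_step(weight, micro_batches, lr, batch_size):
    """One SGD step from gradients summed over several micro-batches.

    Toy per-sample loss (an assumption for illustration):
    0.5 * (weight * x - y) ** 2
    """
    grad_sum = 0.0
    for batch in micro_batches:
        for x, y in batch:
            # Gradients are summed, not averaged, while accumulating.
            grad_sum += (weight * x - y) * x
    # Without this division the step would grow with the number of
    # accumulated samples; batch_size restores the mean gradient.
    return weight - lr * grad_sum / batch_size


def full_batch_step(weight, samples, lr):
    """Reference: one SGD step on the mean gradient of a single large batch."""
    grad_mean = sum((weight * x - y) * x for x, y in samples) / len(samples)
    return weight - lr * grad_mean


micro_batches = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (4.0, 8.0)]]
samples = [s for batch in micro_batches for s in batch]

w_acc = accumulated_step(0.0, micro_batches, lr=0.1, batch_size=len(samples))
w_full = full_batch_step(0.0, samples, lr=0.1)
```

With the rescaling in place, the accumulated update and the single large-batch update agree, which is the property an accumulation handler would want to preserve.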

src/gluonnlp/estimator/machine_translation_batch_processor.py

mli · 2020-02-14T15:55:41Z

Job PR-1156/2 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1156/2/index.html

mli · 2020-02-14T16:03:07Z

Job PR-1156/3 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1156/3/index.html

mli · 2020-02-14T16:13:37Z

Job PR-1156/4 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1156/4/index.html

mli · 2020-02-14T17:51:45Z

Job PR-1156/5 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1156/5/index.html

mli · 2020-02-14T17:56:16Z

Job PR-1156/6 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1156/6/index.html

mli · 2020-02-17T05:43:30Z

Job PR-1156/7 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1156/7/index.html

mli · 2020-02-17T06:55:29Z

Job PR-1156/8 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1156/8/index.html

chenw23 · 2020-06-19T06:54:35Z

@eric-haibin-lin Do you think CI should be fixed for this pull request so that it can then be merged?

liuzh47 added 18 commits February 13, 2020 12:25

Add machine translation estimator

dce05e1

Add some files to machine translation estimator

4788985

modify machine translation estimator

db30a5f

bug fix for transformer translator

f3f00a6

fix bugs and add gnmt batch processor

ab94b38

add gnmt event handler and script

83ef62d

bug fix

f40e4dc

fix various errors

6dabf23

bug fix

f1e26d1

bug fix

81530e1

fix gnmt estimator bugs

0165b80

fix test data bugs

b4a695c

fix typo

458b6f7

fix gnmt estimator bugs

406acbe

change variable names for the latext mxnet build

02f7819

remove temporary length normalized loss

740c712

update index.rst

feef52e

fix import in gnmt estimator

4e13742

liuzh47 requested a review from a team as a code owner February 13, 2020 13:16

liuzh47 commented Feb 13, 2020

szhengac reviewed Feb 13, 2020

src/gluonnlp/estimator/machine_translation_batch_processor.py

liuzh47 added 3 commits February 14, 2020 15:21

fix pylint errors and update docstrings

bfa8425

fix typo

b6faf29

fix docstring errors

e2dd1bd

fix typo

cc9665c

fix init file lint error

d7b51fa

liuzh47 added 2 commits February 17, 2020 05:03

resolve import lint errors

448934b

refine imports

7faf836

disable pylint errors

eab2ef7

szha changed the base branch from master to v0.x August 13, 2020 02:16
