Generalized translator and inference #429

Merged
merged 49 commits into master from generalize-translator on Jun 27, 2018

Conversation

@msperber (Contributor) commented Jun 18, 2018

The current design of the translator, decoder, and inference classes is quite specific to RNN-based attentional encoder-decoder models, which makes it difficult to reuse much of the code when implementing alternatives. One place this problem becomes evident is the current divergence between transformer- and RNN-based models.

This PR does some preliminary refactoring, but is mainly meant to initiate discussions about how to properly design the interfaces.

Description of changes so far:

  • improved interfaces: add an Inference base class, add some missing content to the Decoder base class, and add type annotations in some places (a minimal sketch follows this list)
  • rename SimpleInference to SequenceInference
  • add a sequence labeler and classifier as two simple examples of models that need a different inference strategy and might be instructive for designing better interfaces.
  • some other cleanup in related code
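
A minimal sketch of the renamed hierarchy, to make the interface change concrete. Only the class names Inference and SequenceInference (formerly SimpleInference) come from this PR; the method name, signature, and type annotations below are hypothetical illustrations:

```python
class Inference(object):
    """Base class for inference strategies (introduced in this PR)."""

    def perform_inference(self, generator: 'GeneratorModel',
                          src_file: str, trg_file: str) -> None:
        # hypothetical entry point: read src_file, run the generator,
        # and write the resulting outputs to trg_file
        raise NotImplementedError()


class SequenceInference(Inference):
    """Formerly SimpleInference: produces full output sequences."""

    def perform_inference(self, generator: 'GeneratorModel',
                          src_file: str, trg_file: str) -> None:
        ...  # sequence-specific generation loop goes here
```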

Things to do:

  • better design of inference classes: the current PR implements ClassifierInference and SequenceInference, which both derive from the Inference base class. A few problems I have noticed so far: (1) SequenceInference assumes the use of a search strategy, which does not make sense for non-autoregressive models (e.g. the sequence labeler), and (2) having a separate ClassifierInference is probably reasonable, but some things like forced decoding apply there as well and should be reflected in the base class (a sketch of this tension follows this list).
  • MLELoss: this could probably be generalized to handle non-attentional and potentially non-autoregressive models; otherwise it should be renamed to communicate its intended use.
  • better design of the MLP class: it currently checks its own place in the component hierarchy in ways that don't generalize well
  • clearer interface for translators
  • new sequence labeler and classifier models to verify appropriateness of new interfaces
  • separated softmax and projection (#440)
  • fix example configs
  • add entries to API doc
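
To make the first to-do item concrete, here is a hypothetical sketch of the tension: forced decoding is behavior shared by classifiers and sequence models, so it arguably belongs in the base class, while the search strategy that SequenceInference assumes is meaningless for non-autoregressive models. Everything except the three class names is illustrative:

```python
class Inference(object):
    def __init__(self, mode: str = "infer"):
        # forced decoding applies to classifiers as well as sequence
        # models, so the mode flag arguably belongs in the base class
        self.mode = mode  # e.g. "infer" or "forced"


class ClassifierInference(Inference):
    def generate(self, model, src):
        return model.classify(src)  # one output per input, no search


class SequenceInference(Inference):
    def __init__(self, search_strategy=None, mode: str = "infer"):
        super().__init__(mode=mode)
        # problem (1): a search strategy is assumed here, which makes
        # no sense for non-autoregressive models such as the labeler
        self.search_strategy = search_strategy
```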

msperber added 3 commits June 18, 2018 10:26

…; some refactoring to outputs and SequenceInference
@neubig (Contributor) commented Jun 18, 2018

Thanks! I'm happy to have the code separated better. A few thoughts:

  • SequenceInference might be better named AutoRegressiveInference?
  • It might be a good idea to either try implementing a sequence labeler (just BiLSTM; we could do something like CRF later) or a self-attentional decoder, to make sure the interface works for these (a rough labeler sketch follows this list). This could happen either during this PR or after.
  • Agree about MLELoss.
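
For illustration, a minimal BiLSTM labeler along the lines suggested above; this sketch uses PyTorch for brevity rather than xnmt's own toolkit, and all names and dimensions are hypothetical:

```python
import torch.nn as nn

class BiLSTMLabeler(nn.Module):
    """Per-position tagger: outputs are independent given the encoder
    states, so no search strategy is needed; a CRF could be added later."""

    def __init__(self, vocab_size: int, num_tags: int,
                 emb_dim: int = 128, hidden_dim: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.bilstm = nn.LSTM(emb_dim, hidden_dim // 2,
                              bidirectional=True, batch_first=True)
        self.proj = nn.Linear(hidden_dim, num_tags)

    def forward(self, tokens):                  # tokens: (batch, seq_len)
        states, _ = self.bilstm(self.embed(tokens))
        return self.proj(states)                # (batch, seq_len, num_tags)
```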

@msperber (Contributor, Author)

This should be ready for review. All in all, I fortunately didn't have to make any major changes; most changes are renamings, more consistent method signatures, and a bit of rearranged code.

  • Translator / top-level models: Translator is renamed to AutoRegressiveTranslator because it basically adds features on top of GeneratorModel to allow auto-regressive training and inference. Non-autoregressive models should be derived from GeneratorModel directly. I added a new calc_loss_one_step() that is the training-time equivalent of generate_one_step() (formerly output_one_step()); see the interface sketch after this list. From the docstring: “The core methods are calc_loss / calc_loss_one_step and generate / generate_one_step. The former are used during training, the latter for inference. During training, a loss calculator is used to calculate sequence loss by repeatedly calling the loss for one step. Similarly during inference, a search strategy is used to generate an output sequence by repeatedly calling generate_one_step.”
  • loss calculators: MLELoss included some code that made assumptions about the translator being an attentional encoder-decoder model. These parts of the code have been moved inside the translator's calc_loss_one_step, so that MLELoss is now appropriate for training any auto-regressive model. I renamed it to AutoRegressiveMLELoss as a consequence.
  • inference: we now have two basic inference classes, AutoRegressiveInference and IndependentOutputInference, where the latter is used for any form of inference that does not need a search strategy (let me know if you have a better name).
  • the new SequenceClassifier and SequenceLabeler classes both work with IndependentOutputInference and are derived from GeneratorModel, as both are non-autoregressive.
  • MLP is now divided into MLP / OutputMLP / AttentionalOutputMLP, which avoids having to check the YAML path and allows the MLP to be used more naturally in the classifier and labeler models
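
As an interface sketch of the training/inference symmetry described above: the class and method names are taken from this PR, but the bodies and exact signatures are simplified guesses, not the actual implementation:

```python
class GeneratorModel(object):
    def generate(self, src, *args, **kwargs):
        raise NotImplementedError()


class AutoRegressiveTranslator(GeneratorModel):
    def calc_loss(self, src, trg, loss_calculator):
        # training: the loss calculator drives the step-by-step loop
        return loss_calculator.calc_loss(self, src, trg)

    def calc_loss_one_step(self, dec_state, ref_word):
        # one step of training-time loss; the attentional enc-dec
        # specifics formerly inside MLELoss now live here
        ...

    def generate(self, src, search_strategy):
        # inference: the search strategy drives the step-by-step loop
        return search_strategy.generate_output(self, src)

    def generate_one_step(self, dec_state, prev_word):
        # one step of inference (formerly output_one_step)
        ...
```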

One final question would be whether DefaultTranslator should be renamed to something more descriptive.

@msperber changed the title from “[WIP] Generalized translator and inference” to “Generalized translator and inference” on Jun 19, 2018
@msperber requested a review from neubig on June 20, 2018
@neubig (Contributor) commented Jun 25, 2018

@msperber Are you currently looking at my comment about separating the MLP and softmax classes? If not, I can take a look.

@msperber (Contributor, Author)

@neubig No, I'm not working on this currently, so feel free to go ahead.

msperber and others added 7 commits June 26, 2018 13:27
* Started separating out softmax

* Started fixing tests

* Fixed more tests

* Fixed remainder of running tests

* Fixed the rest of tests

* Added AuxNonLinear

* Updated examples (many were already broken?)

* Fixed recipes

* Removed MLP class

* Added some doc

* fix problem when calling a super constructor that is wrapped in serialized_init

* Added some doc

* fix / clean up sequence labeler

* fix using scorer

* document how to run test configs directly

@msperber (Contributor, Author)

Alright, this should be ready.

@neubig (Contributor) commented Jun 27, 2018

I made a few changes and LGTM! But quite bizarrely, it seems that tests are now failing. The only changes I made (deleting an unused YAML file and updating some documentation) should have no possibility of causing this, so maybe it's due to a difference in the Travis environment?

@neubig (Contributor) commented Jun 27, 2018

P.S. also, tests are passing on my machine.

@msperber (Contributor, Author)

It seems that this is due to a pyyaml upgrade that introduced some major changes related to making loading safe, which I don't fully understand and which don't seem to be well documented. I'm downgrading to the previous version for now.
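
For context, the kind of change involved looks roughly like the following; this is a generic illustration of pyyaml's move toward explicit (safe) loaders, not xnmt's actual config-loading code:

```python
import yaml

# Older code often relied on the implicit default loader, which could
# construct arbitrary Python objects from YAML tags:
#     data = yaml.load(stream)
# Newer pyyaml releases restrict or deprecate that default, breaking
# such callers. The version-robust fix is to pick a loader explicitly:
with open("config.yaml") as f:
    data = yaml.load(f, Loader=yaml.SafeLoader)  # plain YAML types only
```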

@msperber merged commit e3c7656 into master on Jun 27, 2018
@philip30 (Contributor) commented Jul 7, 2018

Hey @msperber, can you also refactor my code (LexiconDecoder)? I don't have time to read through the whole refactoring, and my code does not work anymore. I don't think you can simply remove it from the master branch just because you are not using it.

If you have any questions about it, you can always ask me.

Thank you.

@neubig (Contributor) commented Jul 7, 2018

@philip30 I did this, actually. The reason is that it shouldn't be a decoder, but rather a softmax; I think it should be reimplemented from scratch. Perhaps we should make an issue, though.

@philip30 (Contributor) commented Jul 7, 2018

@neubig Alright, I'll make an issue!

@neubig deleted the generalize-translator branch on July 30, 2018