Generalized translator and inference #429

Merged
merged 49 commits into master from generalize-translator on Jun 27, 2018

Conversation

@msperber (Contributor) commented Jun 18, 2018

The current design of the translator, decoder, and inference classes is quite specific to RNN-based attentional encoder-decoder models, which makes it difficult to reuse much of the code when implementing alternatives. One place this problem becomes evident is the current divergence between transformer- and RNN-based models.

This PR does some preliminary refactoring, but is mainly meant to initiate discussions about how to properly design the interfaces.

Description of changes so far:

  • improved interfaces: add an Inference base class, add some missing content to the Decoder base class, and add type annotations in some places (a minimal sketch follows this list)
  • rename SimpleInference to SequenceInference
  • add a sequence labeler and classifier as two simple examples of models that need a different inference strategy and might be instructive for designing better interfaces.
  • some other cleanup in related code
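
A minimal sketch of the renamed hierarchy, to make the interface change concrete. Only the class names Inference and SequenceInference (formerly SimpleInference) come from this PR; the method name, signature, and type annotations below are hypothetical illustrations:

```python
class Inference(object):
    """Base class for inference strategies (introduced in this PR)."""

    def perform_inference(self, generator: 'GeneratorModel',
                          src_file: str, trg_file: str) -> None:
        # hypothetical entry point: read src_file, run the generator,
        # and write the resulting outputs to trg_file
        raise NotImplementedError()


class SequenceInference(Inference):
    """Formerly SimpleInference: produces full output sequences."""

    def perform_inference(self, generator: 'GeneratorModel',
                          src_file: str, trg_file: str) -> None:
        ...  # sequence-specific generation loop goes here
```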

Things to do:

  • better design of inference classes: the current PR implements ClassifierInference and SequenceInference, which both derive from the Inference base class. A few problems I have noticed so far: (1) SequenceInference assumes the use of a search strategy, which does not make sense for non-autoregressive models (e.g. the sequence labeler), and (2) having a separate ClassifierInference is probably reasonable, but some things like forced decoding apply there as well and should be reflected in the base class (a sketch of this tension follows this list).
  • MLELoss: this could probably be generalized to handle non-attentional and potentially non-autoregressive models; otherwise it should be renamed to communicate its intended use.
  • better design of the MLP class: it currently checks its own place in the component hierarchy in ways that don't generalize well
  • clearer interface for translators
  • new sequence labeler and classifier models to verify appropriateness of new interfaces
  • separated softmax and projection (#440)
  • fix example configs
  • add entries to API doc
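
To make the first to-do item concrete, here is a hypothetical sketch of the tension: forced decoding is behavior shared by classifiers and sequence models, so it arguably belongs in the base class, while the search strategy that SequenceInference assumes is meaningless for non-autoregressive models. Everything except the three class names is illustrative:

```python
class Inference(object):
    def __init__(self, mode: str = "infer"):
        # forced decoding applies to classifiers as well as sequence
        # models, so the mode flag arguably belongs in the base class
        self.mode = mode  # e.g. "infer" or "forced"


class ClassifierInference(Inference):
    def generate(self, model, src):
        return model.classify(src)  # one output per input, no search


class SequenceInference(Inference):
    def __init__(self, search_strategy=None, mode: str = "infer"):
        super().__init__(mode=mode)
        # problem (1): a search strategy is assumed here, which makes
        # no sense for non-autoregressive models such as the labeler
        self.search_strategy = search_strategy
```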

msperber added 3 commits June 18, 2018 10:26

…; some refactoring to outputs and SequenceInference
@neubig (Contributor) commented Jun 18, 2018

Thanks! I'm happy to have the code separated better. A few thoughts:

  • SequenceInference might be better named AutoRegressiveInference?
  • It might be a good idea to either try implementing a sequence labeler (just BiLSTM; we could do something like CRF later) or a self-attentional decoder, to make sure the interface works for these (a rough labeler sketch follows this list). This could happen either during this PR or after.
  • Agree about MLELoss.
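
For illustration, a minimal BiLSTM labeler along the lines suggested above; this sketch uses PyTorch for brevity rather than xnmt's own toolkit, and all names and dimensions are hypothetical:

```python
import torch.nn as nn

class BiLSTMLabeler(nn.Module):
    """Per-position tagger: outputs are independent given the encoder
    states, so no search strategy is needed; a CRF could be added later."""

    def __init__(self, vocab_size: int, num_tags: int,
                 emb_dim: int = 128, hidden_dim: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.bilstm = nn.LSTM(emb_dim, hidden_dim // 2,
                              bidirectional=True, batch_first=True)
        self.proj = nn.Linear(hidden_dim, num_tags)

    def forward(self, tokens):                  # tokens: (batch, seq_len)
        states, _ = self.bilstm(self.embed(tokens))
        return self.proj(states)                # (batch, seq_len, num_tags)
```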

@msperber (Contributor, Author)

This should be ready for review. All in all, I fortunately didn't have to make any major changes; most changes are renamings, more consistent method signatures, and a bit of rearranged code.

  • Translator / top-level models: Translator is renamed to AutoRegressiveTranslator because it basically adds features on top of GeneratorModel to allow auto-regressive training and inference. Non-autoregressive models should be derived from GeneratorModel directly. I added a new calc_loss_one_step() that is the training-time equivalent of generate_one_step() (formerly output_one_step()); see the interface sketch after this list. From the docstring: “The core methods are calc_loss / calc_loss_one_step and generate / generate_one_step. The former are used during training, the latter for inference. During training, a loss calculator is used to calculate sequence loss by repeatedly calling the loss for one step. Similarly during inference, a search strategy is used to generate an output sequence by repeatedly calling generate_one_step.”
  • loss calculators: MLELoss included some code that made assumptions about the translator being an attentional encoder-decoder model. These parts of the code have been moved inside the translator's calc_loss_one_step, so that MLELoss is now appropriate for training any auto-regressive model. I renamed it to AutoRegressiveMLELoss as a consequence.
  • inference: we now have two basic inference classes, AutoRegressiveInference and IndependentOutputInference, where the latter is used for any form of inference that does not need a search strategy (let me know if you have a better name).
  • the new SequenceClassifier and SequenceLabeler classes both work with IndependentOutputInference and are derived from GeneratorModel, as both are non-autoregressive.
  • MLP is now divided into MLP / OutputMLP / AttentionalOutputMLP, which avoids having to check the YAML path and allows the MLP to be used more naturally in the classifier and labeler models
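
As an interface sketch of the training/inference symmetry described above: the class and method names are taken from this PR, but the bodies and exact signatures are simplified guesses, not the actual implementation:

```python
class GeneratorModel(object):
    def generate(self, src, *args, **kwargs):
        raise NotImplementedError()


class AutoRegressiveTranslator(GeneratorModel):
    def calc_loss(self, src, trg, loss_calculator):
        # training: the loss calculator drives the step-by-step loop
        return loss_calculator.calc_loss(self, src, trg)

    def calc_loss_one_step(self, dec_state, ref_word):
        # one step of training-time loss; the attentional enc-dec
        # specifics formerly inside MLELoss now live here
        ...

    def generate(self, src, search_strategy):
        # inference: the search strategy drives the step-by-step loop
        return search_strategy.generate_output(self, src)

    def generate_one_step(self, dec_state, prev_word):
        # one step of inference (formerly output_one_step)
        ...
```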

One final question would be whether DefaultTranslator should be renamed to something more descriptive.

@msperber changed the title from “[WIP] Generalized translator and inference” to “Generalized translator and inference” on Jun 19, 2018
@msperber requested a review from neubig on June 20, 2018
@neubig (Contributor) commented Jun 25, 2018

@msperber Are you currently looking at my comment about separating the MLP and softmax classes? If not, I can take a look.

@msperber (Contributor, Author)

@neubig No, I'm not working on this currently, so feel free to go ahead.

msperber and others added 7 commits June 26, 2018 13:27
* Started separating out softmax

* Started fixing tests

* Fixed more tests

* Fixed remainder of running tests

* Fixed the rest of tests

* Added AuxNonLinear

* Updated examples (many were already broken?)

* Fixed recipes

* Removed MLP class

* Added some doc

* fix problem when calling a super constructor that is wrapped in serialized_init

* Added some doc

* fix / clean up sequence labeler

* fix using scorer

* document how to run test configs directly

@msperber (Contributor, Author)

Alright, this should be ready.

@neubig (Contributor) commented Jun 27, 2018

I made a few changes and LGTM! But quite bizarrely, it seems that tests are now failing. The only changes I made (deleting an unused YAML file and updating some documentation) should have no possibility of causing this, so maybe it's due to a difference in the Travis environment?

@neubig (Contributor) commented Jun 27, 2018

P.S. also, tests are passing on my machine.

@msperber (Contributor, Author)

It seems that this is due to a pyyaml upgrade that introduced some major changes related to making loading safe, which I don't fully understand and which don't seem to be well documented. I'm downgrading to the previous version for now.
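
For context, the kind of change involved looks roughly like the following; this is a generic illustration of pyyaml's move toward explicit (safe) loaders, not xnmt's actual config-loading code:

```python
import yaml

# Older code often relied on the implicit default loader, which could
# construct arbitrary Python objects from YAML tags:
#     data = yaml.load(stream)
# Newer pyyaml releases restrict or deprecate that default, breaking
# such callers. The version-robust fix is to pick a loader explicitly:
with open("config.yaml") as f:
    data = yaml.load(f, Loader=yaml.SafeLoader)  # plain YAML types only
```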

@msperber merged commit e3c7656 into master on Jun 27, 2018
@philip30 (Contributor) commented Jul 7, 2018

Hey @msperber, can you also refactor my code (LexiconDecoder)? I don't have time to read through the whole refactoring, and my code does not work anymore. I don't think you can simply remove it from the master branch just because you are not using it.

If you have any questions about it, you can always ask me.

Thank you.

@neubig (Contributor) commented Jul 7, 2018

@philip30 I did this, actually. The reason is that it shouldn't be a decoder, but rather a softmax; I think it should be reimplemented from scratch. Perhaps we should make an issue, though.

@philip30 (Contributor) commented Jul 7, 2018

@neubig Alright, I'll make an issue!

@neubig deleted the generalize-translator branch on July 30, 2018