[PretrainedConfig] Fix save pretrained config for edge case #7943
Conversation
Any reason not to look at just the config class? At a first glance, I'd say we want to compare the defaults to the class we instantiated, not to the superclass.
Back then this was my initial idea as well - but then the configs could be more or less empty if all parameters are the same (see the sketch below). This has a couple of disadvantages.
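As a hypothetical illustration of that point (not the actual implementation): if the diff were computed only against the concrete class's own defaults, any config created without overrides would serialize to an essentially empty dict.

```python
from transformers import BertConfig

config = BertConfig()  # every value equals BertConfig's own defaults

# Diff taken only against the class's own defaults: nothing survives.
class_defaults = BertConfig().to_dict()
diff = {k: v for k, v in config.to_dict().items() if v != class_defaults.get(k)}
print(diff)  # {} -- the saved file would say nothing about the model's architecture
```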
configuration_fsmt.py, FSMTConfig.__init__ (signature change under review):

```diff
-        langs,
-        src_vocab_size,
-        tgt_vocab_size,
+        langs=["en", "de"],
```
Mainly for consistency with other configs in the library, it should be possible to instantiate every config (which is not a "composition" config) without providing any parameters: `config = self.config_cls()`.

I added these init params from: https://huggingface.co/facebook/wmt19-en-de (cc @stas00)
I'd say, in such a case let's use langs=["xx", "yy"] so it's clear it's nonsense. Using meaningful data here is misleading at best.
But it's the actual configuration_fsmt.py file, no? All other configuration files use the actual params of one of the main models as defaults, e.g. BertConfig() uses the params of bert-base-cased. I don't really see why this is misleading... @sgugger @LysandreJik @sshleifer what is your opinion here?
```python
def check_config_can_be_init_without_params(self):
    if self.config_class.is_composition:
        return
    config = self.config_class()
```
This makes sure that every config can be instantiated without providing any parameters.
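To make the interaction with the new `is_composition` class attribute concrete, here is a minimal, hypothetical sketch (the `PairConfig` class and the standalone check function are made up for illustration; only the `is_composition` attribute and the early-return behavior come from this PR, and the snippet assumes a transformers version that includes that attribute):

```python
from transformers import PretrainedConfig

class PairConfig(PretrainedConfig):
    # A "composition" config is built from other configs, so it cannot be
    # instantiated without arguments; the flag lets the common test skip it.
    is_composition = True

    def __init__(self, encoder_config: dict, decoder_config: dict, **kwargs):
        super().__init__(**kwargs)
        self.encoder = PretrainedConfig(**encoder_config)
        self.decoder = PretrainedConfig(**decoder_config)

def check_config_can_be_init_without_params(config_class):
    # Same idea as the test above, written as a free function for illustration.
    if config_class.is_composition:
        return  # composition configs are skipped
    config_class()  # must not raise

check_config_can_be_init_without_params(PairConfig)        # skipped
check_config_can_be_init_without_params(PretrainedConfig)  # instantiates fine
```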
UPDATE: I had to add a class attribute to the config to make this feature work (see description above) - @julien-c @sgugger @thomwolf @LysandreJik - could you check if this is fine for you guys?
sgugger
left a comment
Understood, this LGTM then! Thanks!
LGTM
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Diff context for the defaults under discussion (configuration_fsmt.py):

```diff
-        tgt_vocab_size,
+        langs=["en", "de"],
+        src_vocab_size=42024,
+        tgt_vocab_size=42024,
```
The suggested vocab size defaults make no sense without the corresponding vocab. We might just as well set them to 1000 if defaults are now required. I remember some common tests fail if the vocab size is less than 1000, so that is probably a good default.
ok great! Thanks for the tip :-)
Sorry, actually I don't really see why 42024 makes no sense - could you explain a bit?
Is this a set of defaults for the sake of defaults, or are we configuring a very specific model as a default that is actually a working model?
If you're using some real model as a default, then yes, you want the actual numbers. If these are just random numbers, then why would you want to set it to 42024?
I think @patrickvonplaten has the right intention, following what is established across the repo. In every configuration file (e.g., BERT, ALBERT, CTRL), we have defaults for configurations so that initializing one without arguments yields a sensible architecture based on existing pre-trained weights. If that is not the case for a model, then it slipped through review and should be updated.
This means that doing BertModel(BertConfig()) yields a model similar to the original BERT model. This makes it easier to work with, as we don't have a myriad of (usually different) arguments to supply to each configuration when doing tests.
Also, this is the convention we have adopted until now and I see no strong argument against it that would lead to changing this convention. I also see no argument on why FSMT would need to differ in that regard.
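For concreteness, this is the pattern being described (a quick sketch; the exact default values depend on the library version):

```python
from transformers import BertConfig, BertModel

# No arguments: the config falls back to the library defaults,
# which mirror a base-sized BERT architecture.
config = BertConfig()
print(config.num_hidden_layers, config.hidden_size)  # 12 768

# Randomly initialized weights, but architecturally a standard base BERT.
model = BertModel(config)
```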
Because the configuration process for FSMT was complex, using defaults originally masked problems, so by not having defaults we immediately detected when the right config wasn't being used. I'm concerned that by adding defaults that look about right, all kinds of unexpected problems may arise.
I would say that if some problems were masked by using defaults, then some additional tests should be added to ensure these problems do not resurface, either for this model or for others.
> this is the convention we have adopted until now and I see no strong argument against it that would lead to changing this convention
If you're saying that this convention practically works for this project, then it is.
> some additional tests should be added
Yes, I have just done that here #7860
@LysandreJik The convention doesn't make as much sense for situations where you have many checkpoints trained from scratch for different datasets, like FSMT/Marian. They all have different vocab sizes and none is meaningfully the "base" model in a bert-base-cased way.
Similarly, there are two completely different blenderbot checkpoints and I arbitrarily chose the smaller to be the config defaults.
As I mentioned twice earlier, this is the way of the project, and so it is.
So I will just address @patrickvonplaten's comment, since I think I know why we are missing each other here:
> I guess for me it's quite natural that if you do `config = BertConfig()` you would expect to get the most commonly used / standard BERT config, being bert-base-cased, which is useful information IMO.
I totally agree! It works for Bert and other similar models. It doesn't work for translation models, IMHO.
What would a user do with a default German-to-English translation model whose 40K vocab doesn't actually exist? The model the user ends up with will not be functional - yes, they have to configure it themselves if they don't use a pretrained model. There is no magical way around it.
The key here is that certain models do not have sensible defaults, and when that is the case, giving a default that looks very much like a correct one just for consistency's sake is questionable engineering-wise.
It would work for automatic tests, as in ThomWolf's recent PR, as long as you don't try to do any qualitative checks, but it won't do anything useful for end users.
LysandreJik
left a comment
LGTM!
What does this PR do?
There is an edge case for which the "diff" save method of `PretrainedConfig` fails. We decided a while ago in PR #3797 that we wanted more readable configs and thus tweaked the `save_pretrained()` method so that only parameters that differ from the default `PretrainedConfig` class are serialized. There was an edge case we did not consider:
If a parameter like `add_cross_attention` defaults to `True` in `ProphetNetConfig` but is by default `False` in `PretrainedConfig`, a problem can arise when a user wants to save `add_cross_attention=False` in his `ProphetNetConfig`. Because `add_cross_attention=False` corresponds to the `PretrainedConfig` default case, this parameter will not be serialized, and when reloading the config it falls back to the `ProphetNetConfig` default, which is `True` - an error.
This PR fixes this behavior by simply making sure that a parameter is only left out of the saved file if it is equal to both the `PretrainedConfig` and the `ProphetNetConfig` default.
This feature requires configs to be instantiable without providing any parameters. This is currently not possible for `EncoderDecoderModelConfig` and `RagConfig`, because those configs are composed of multiple sub-configs which have to be provided. => A new class attribute `is_composition` is added to correctly handle these classes.
Two tests are added.
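A rough sketch of the serialization logic after the fix (illustrative only - the names follow the discussion above, but the actual implementation in the PR may differ in its details):

```python
from transformers import PretrainedConfig

def to_diff_dict(config):
    """Serialize only the parameters that differ from the defaults."""
    config_dict = config.to_dict()
    # Defaults of the base class ...
    base_defaults = PretrainedConfig().to_dict()
    # ... and of the concrete class, unless it is a composition config
    # that cannot be instantiated without arguments.
    class_defaults = {} if config.is_composition else type(config)().to_dict()

    diff = {}
    for key, value in config_dict.items():
        # Keep a value if it differs from the base default OR from the class
        # default: this is what prevents e.g. add_cross_attention=False from
        # being silently dropped for ProphetNetConfig.
        if (
            key not in base_defaults
            or value != base_defaults[key]
            or (key in class_defaults and value != class_defaults[key])
        ):
            diff[key] = value
    return diff
```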
Also cc @stas00 for the FSMT config.
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?