[PretrainedConfig] Fix save pretrained config for edge case #7943
Changes from all commits
3d73fbf
374d27d
c6f4481
930d167
3bf0c25
56dd4f2
8437104
a3cd0f5
configuration_fsmt.py

```diff
@@ -126,9 +126,9 @@ class FSMTConfig(PretrainedConfig):
     # update the defaults from config file
     def __init__(
         self,
-        langs,
-        src_vocab_size,
-        tgt_vocab_size,
+        langs=["en", "de"],
+        src_vocab_size=42024,
+        tgt_vocab_size=42024,
```
Contributor
The suggested vocab size defaults make no sense w/o the corresponding vocab. Just as well set them to 1000 if defaults are now required. I remember some common tests fail if vocab size is less than 1000, so that is probably a good default.
Contributor
Author
ok great! Thanks for the tip :-)
Contributor
Author
Sorry, actually I don't really see why 42024 makes no sense - could you explain a bit?
Contributor
Is this a set of defaults for the sake of defaults, or are we configuring a very specific model as a default that is actually a working model?
Contributor
If you're using some real model as a default, then yes, you want the actual numbers. If these are just random numbers, then why would you want to set it to 42024?
Member
I think @patrickvonplaten has the right intentions, following what is established across the repo. In every configuration file (e.g., BERT, ALBERT, CTRL), we have defaults for configurations so that initializing one without arguments yields a sensible architecture based on existing pre-trained weights. If that is not the case for a model, then it slipped the review and should be updated. This means that initializing a config with no arguments should give you a sensible base architecture. Also, this is the convention we have adopted until now, and I see no strong argument against it that would lead to changing this convention. I also see no argument on why FSMT would need to differ in that regard.
I would say that if some problems were masked by using some defaults, then some additional tests should be added to ensure that these problems are not an issue, either for this model or for others.
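For illustration, a minimal sketch of the convention described in the comment above; the default values shown in the output comments are an assumption about `BertConfig` at the time and are not part of this PR:

```python
from transformers import BertConfig

# Instantiating a config with no arguments is expected to yield the
# library's base architecture for that model family.
config = BertConfig()
print(config.hidden_size)          # 768 (assumed default)
print(config.num_hidden_layers)    # 12 (assumed default)
print(config.num_attention_heads)  # 12 (assumed default)
```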
Contributor
If you're saying that this convention practically works for this project, then so be it.
Yes, I have just done that here: #7860
Contributor
@LysandreJik The convention doesn't make as much sense for situations where you have many checkpoints trained from scratch for different datasets, like FSMT/Marian. They all have different vocab sizes, and none is meaningfully the "base" model.
Contributor
Similarly, there are two completely different blenderbot checkpoints and I arbitrarily chose the smaller to be the config defaults.
Contributor
As I mentioned twice earlier, this is the way of the project, and so it is. So I will just address @patrickvonplaten's comment, since I think I know why we are missing each other here:
I totally agree! It works for Bert and other similar models. It doesn't work for translation models, IMHO. What would a user do with the default German-to-English translation model with a 40K vocab size but no corresponding vocab? The model the user ends up with will not be functional - yes, they have to configure it if they don't use a pretrained model. There is no magical way around it. The key here is that certain models do not have sensible defaults, and when this is the case, giving a default that looks very much like a correct default just for consistency's sake is questionable engineering-wise. It'd work for automatic tests, as in ThomWolf's recent PR, as long as you don't try to do any qualitative tests, but it won't do anything useful for end users.
```diff
         activation_function="relu",
         d_model=1024,
         max_length=200,
```
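With the change above, FSMTConfig should become instantiable with no arguments. A rough sketch of what that would look like, assuming the constructor arguments are stored as attributes of the same name (the printed values come from the new defaults in this diff):

```python
from transformers import FSMTConfig

# With the defaults added in this PR, no arguments should be required.
config = FSMTConfig()
print(config.langs)            # ["en", "de"], per the new default
print(config.src_vocab_size)   # 42024, per the new default
print(config.tgt_vocab_size)   # 42024, per the new default
```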
```diff
@@ -66,9 +66,16 @@ def create_and_test_config_with_num_labels(self):
         self.parent.assertEqual(len(config.id2label), 3)
         self.parent.assertEqual(len(config.label2id), 3)
 
+    def check_config_can_be_init_without_params(self):
+        if self.config_class.is_composition:
+            return
+        config = self.config_class()
```
Contributor
Author
This makes sure that every config can be instantiated without providing any parameters.
```diff
+        self.parent.assertIsNotNone(config)
+
     def run_common_tests(self):
         self.create_and_test_config_common_properties()
         self.create_and_test_config_to_json_string()
         self.create_and_test_config_to_json_file()
         self.create_and_test_config_from_and_save_pretrained()
         self.create_and_test_config_with_num_labels()
+        self.check_config_can_be_init_without_params()
```
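For context, a sketch of how a model's test suite might drive these common checks; the ConfigTester class name, its constructor signature, and the import path are assumptions about the test layout at the time, and real model tests may pass additional keyword arguments:

```python
import unittest

from transformers import FSMTConfig

# Assumed import path for the common config tester shown in the diff above.
from .test_configuration_common import ConfigTester


class FSMTConfigCommonTest(unittest.TestCase):
    def setUp(self):
        self.config_tester = ConfigTester(self, config_class=FSMTConfig)

    def test_config(self):
        # run_common_tests() now also exercises check_config_can_be_init_without_params(),
        # which only passes if FSMTConfig() works without any arguments.
        self.config_tester.run_common_tests()
```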
Contributor
Author
Mainly for consistency with the other configs in the library, it should be possible to instantiate every config (which is not a "composition" config) without providing any parameters.
I added these init params from https://huggingface.co/facebook/wmt19-en-de (cc @stas00).
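A quick way to cross-check where these defaults come from is to load the referenced checkpoint's config from the Hub and compare. This is a sketch that requires network access; the expected values are taken from this PR's diff:

```python
from transformers import FSMTConfig

# Load the config the new defaults were copied from and compare.
hub_config = FSMTConfig.from_pretrained("facebook/wmt19-en-de")
print(hub_config.langs)           # expected: ["en", "de"]
print(hub_config.src_vocab_size)  # expected: 42024
print(hub_config.tgt_vocab_size)  # expected: 42024
```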
Contributor
I'd say, in such a case let's use `langs=["xx", "yy"]` so it's clear it's nonsense. Using meaningful data here is misleading at best.
Contributor
Author
But it's the actual `configuration_fsmt.py` file, no? All the other configuration files use the actual params of one of the main models as defaults. E.g. `BertConfig()` uses the params of `bert-base-cased` as defaults. I don't really see why this is misleading... @sgugger @LysandreJik @sshleifer what is your opinion here?