1 change: 1 addition & 0 deletions src/transformers/configuration_encoder_decoder.py
@@ -70,6 +70,7 @@ class EncoderDecoderConfig(PretrainedConfig):
         >>> model = EncoderDecoderModel.from_pretrained('my-model', config=encoder_decoder_config)
     """
     model_type = "encoder_decoder"
+    is_composition = True
 
     def __init__(self, **kwargs):
         super().__init__(**kwargs)
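As a brief, hedged illustration of what the new is_composition flag signals here: this config is not meant to be built without arguments but assembled from two sub-configs. A minimal sketch using the library's EncoderDecoderConfig.from_encoder_decoder_configs helper (the choice of BERT sub-configs is arbitrary):

    from transformers import BertConfig, EncoderDecoderConfig

    # A composition config is assembled from two PretrainedConfig instances
    # instead of being instantiated with no arguments.
    encoder_config = BertConfig()
    decoder_config = BertConfig(is_decoder=True, add_cross_attention=True)
    config = EncoderDecoderConfig.from_encoder_decoder_configs(encoder_config, decoder_config)

The common config test added below skips exactly these classes via the is_composition flag.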
6 changes: 3 additions & 3 deletions src/transformers/configuration_fsmt.py
@@ -126,9 +126,9 @@ class FSMTConfig(PretrainedConfig):
     # update the defaults from config file
     def __init__(
         self,
-        langs,
-        src_vocab_size,
-        tgt_vocab_size,
+        langs=["en", "de"],
Contributor Author
@patrickvonplaten patrickvonplaten Oct 21, 2020

Mainly for consistency with other configs in the library, it should be possible to instantiate every config (which is not a "composition" config) without providing any parameters:

config = self.config_cls()

I added these init params from: https://huggingface.co/facebook/wmt19-en-de (cc @stas00)

Contributor
@stas00 stas00 Oct 21, 2020

I'd say, in such a case let's use langs=["xx", "yy"] so it's clear it's nonsense. Using meaningful data here is misleading at best.

Contributor Author
@patrickvonplaten patrickvonplaten Oct 21, 2020

But it's the actual configuration_fsmt.py file, no? All other configuration files use the actual params of one of the main models as defaults. E.g. BertConfig() uses the params of bert-base-cased as defaults. I don't really see why this is misleading... @sgugger @LysandreJik @sshleifer what is your opinion here?

+        src_vocab_size=42024,
+        tgt_vocab_size=42024,
Contributor
@stas00 stas00 Oct 21, 2020

The suggested vocab size defaults make no sense w/o the corresponding vocab. You might just as well set them to 1000 if defaults are now required. I remember some common tests fail if the vocab size is less than 1000, so that is probably a good default.

Contributor Author

ok great! Thanks for the tip :-)

Contributor Author

Sorry, actually I don't really see why 42024 makes no sense - could you explain a bit?

Contributor
@stas00 stas00 Oct 21, 2020

Is this a set of defaults for the sake of defaults, or are we configuring a very specific model as a default that is actually a working model?

Contributor

If you're using some real model as a default, then yes, you want the actual numbers. If these are just random numbers, then why would you want to set it to 42024?

Member

I think @patrickvonplaten has the right intentions in following what is established across the repo. In every configuration file (e.g., BERT, ALBERT, CTRL), we have defaults for configurations so that initializing one without arguments yields a sensible architecture based on existing pre-trained weights. If that is not the case for a model, then it slipped through review and should be updated.

This means that doing BertModel(BertConfig()) yields a model similar to the original BERT model. This makes it easier to work with, as we don't have a myriad of (usually different) arguments to supply to each configuration when doing tests.

Also, this is the convention we have adopted until now and I see no strong argument against it that would lead to changing this convention. I also see no argument on why FSMT would need to differ in that regard.

@stas00:

> Because the configuration process for fsmt was complex, using some defaults originally masked problems, so by not having defaults we caught problems with not getting the right config immediately. I'm concerned that with defaults that look about right, all kinds of unexpected problems may arise.

I would say that if some problems were masked by using some defaults, then some additional tests should be added to ensure that these problems are not an issue, either for this model or for others.
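A minimal sketch of the convention described above, assuming the library's standard BertConfig defaults (hidden_size=768, 12 layers, 12 heads); which exact checkpoint the defaults mirror is beside the point here:

    from transformers import BertConfig, BertModel

    # No arguments needed: the defaults describe the canonical BERT-base shape,
    # so a randomly initialized model of that architecture can be built directly.
    config = BertConfig()
    model = BertModel(config)
    print(config.hidden_size, config.num_hidden_layers)  # 768 12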

Contributor
@stas00 stas00 Oct 22, 2020

> this is the convention we have adopted until now and I see no strong argument against it that would lead to changing this convention

If you're saying that this convention practically works for this project, then so be it.

> some additional tests should be added

Yes, I have just done that here #7860

Contributor
@sshleifer sshleifer Oct 22, 2020

@LysandreJik The convention doesn't make as much sense for situations where you have many checkpoints trained from scratch for different datasets, like FSMT/Marian. They all have different vocab sizes and none is meaningfully the "base" model in a bert-base-cased way.

Contributor

Similarly, there are two completely different blenderbot checkpoints and I arbitrarily chose the smaller to be the config defaults.

Contributor
@stas00 stas00 Oct 22, 2020

As I mentioned twice earlier, this is the way of the project, and so it is.

So I will just address @patrickvonplaten's comment, since I think I know why we are missing each other here:

> I guess for me it's quite natural that if you do:
>
> config = BertConfig()
>
> you would expect to get the most commonly used / standard BERT config being bert-base-cased, which is useful information IMO.

I totally agree! It works for Bert and other similar models. It doesn't work for translation models, IMHO.

What would a user do with the default German-to-English translation model with a 40K vocab size but no actual vocab? The model the user ends up with will not be functional. Yes, they have to configure it if they don't use a pretrained model; there is no magical way around it.

The key here is that certain models do not have sensible defaults, and when that is the case, providing a default that looks very much like a correct one just for consistency's sake is questionable engineering-wise.

It'd work for automatic tests, as in ThomWolf's recent PR, as long as you don't try to do any qualitative tests, but it won't do anything useful for end users.

         activation_function="relu",
         d_model=1024,
         max_length=200,
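The net effect of this hunk, sketched under the assumption that the defaults shown above (taken from facebook/wmt19-en-de) are what gets merged: FSMTConfig can now be instantiated without any arguments, which is what the common test added below relies on.

    from transformers import FSMTConfig

    # Builds a config with the en-de defaults; as discussed above, the resulting
    # architecture is only meaningful together with the matching vocab files.
    config = FSMTConfig()
    print(config.langs, config.src_vocab_size, config.tgt_vocab_size)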
1 change: 1 addition & 0 deletions src/transformers/configuration_rag.py
@@ -77,6 +77,7 @@
 @add_start_docstrings(RAG_CONFIG_DOC)
 class RagConfig(PretrainedConfig):
     model_type = "rag"
+    is_composition = True
 
     def __init__(
         self,
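As with EncoderDecoderConfig, a hedged sketch of how a composition config like this one is meant to be built (using the existing RagConfig.from_question_encoder_generator_configs helper; the DPR/BART sub-configs are just illustrative):

    from transformers import BartConfig, DPRConfig, RagConfig

    # Composed from two sub-configs rather than instantiated without arguments,
    # which is why is_composition is set to True.
    rag_config = RagConfig.from_question_encoder_generator_configs(DPRConfig(), BartConfig())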
14 changes: 13 additions & 1 deletion src/transformers/configuration_utils.py
@@ -41,6 +41,10 @@ class PretrainedConfig(object):
     Class attributes (overridden by derived classes)
         - **model_type** (:obj:`str`): An identifier for the model type, serialized into the JSON file, and used to
           recreate the correct object in :class:`~transformers.AutoConfig`.
+        - **is_composition** (:obj:`bool`): Whether the config class is composed of multiple
+          sub-configs. In this case the config has to be initialized from two or more configs of
+          type :class:`~transformers.PretrainedConfig` like: :class:`~transformers.EncoderDecoderConfig` or
+          :class:`~RagConfig`.
 
     Args:
         name_or_path (:obj:`str`, `optional`, defaults to :obj:`""`):
@@ -145,6 +149,7 @@ class PretrainedConfig(object):
             use BFloat16 scalars (only used by some TensorFlow models).
     """
     model_type: str = ""
+    is_composition: bool = False
 
     def __init__(self, **kwargs):
         # Attributes with defaults
@@ -476,11 +481,18 @@ def to_diff_dict(self) -> Dict[str, Any]:
         # get the default config dict
         default_config_dict = PretrainedConfig().to_dict()
 
+        # get class specific config dict
+        class_config_dict = self.__class__().to_dict() if not self.is_composition else {}
+
         serializable_config_dict = {}
 
         # only serialize values that differ from the default config
         for key, value in config_dict.items():
-            if key not in default_config_dict or value != default_config_dict[key]:
+            if (
+                key not in default_config_dict
+                or value != default_config_dict[key]
+                or (key in class_config_dict and value != class_config_dict[key])
+            ):
                 serializable_config_dict[key] = value
 
         return serializable_config_dict
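To make the behavioral change concrete, a small sketch (assuming, as the ProphetNet test below implies, that ProphetNetConfig defaults add_cross_attention to True while the base PretrainedConfig defaults it to False):

    from transformers import ProphetNetConfig

    config = ProphetNetConfig()
    config.add_cross_attention = False  # equals the base default, differs from the class default

    # Before this change the value was compared only against PretrainedConfig() defaults,
    # so False == False meant the key was dropped and the class default (True) came back
    # after save/load. With the class-specific comparison the key is now serialized.
    assert config.to_diff_dict().get("add_cross_attention") is False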
7 changes: 7 additions & 0 deletions tests/test_configuration_common.py
@@ -66,9 +66,16 @@ def create_and_test_config_with_num_labels(self):
         self.parent.assertEqual(len(config.id2label), 3)
         self.parent.assertEqual(len(config.label2id), 3)
 
+    def check_config_can_be_init_without_params(self):
+        if self.config_class.is_composition:
+            return
+        config = self.config_class()
Contributor Author

This makes sure that every config can be instantiated without providing any parameters.

+        self.parent.assertIsNotNone(config)
+
     def run_common_tests(self):
         self.create_and_test_config_common_properties()
         self.create_and_test_config_to_json_string()
         self.create_and_test_config_to_json_file()
         self.create_and_test_config_from_and_save_pretrained()
         self.create_and_test_config_with_num_labels()
+        self.check_config_can_be_init_without_params()
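For context, a sketch of how a model test file typically drives these common tests (mirroring the existing ConfigTester pattern in the test suite; the BertConfig choice and the hidden_size kwarg are illustrative):

    import unittest

    from transformers import BertConfig

    from .test_configuration_common import ConfigTester


    class BertConfigTest(unittest.TestCase):
        def test_config(self):
            # run_common_tests() now also exercises check_config_can_be_init_without_params,
            # which is skipped for composition configs via the is_composition flag.
            config_tester = ConfigTester(self, config_class=BertConfig, hidden_size=37)
            config_tester.run_common_tests()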
9 changes: 9 additions & 0 deletions tests/test_modeling_prophetnet.py
@@ -901,6 +901,15 @@ def test_attn_mask_model(self):
         config_and_inputs = self.model_tester.prepare_config_and_inputs()
         self.model_tester.check_model_with_attn_mask(*config_and_inputs)
 
+    def test_config_save(self):
+        config = self.model_tester.prepare_config_and_inputs()[0]
+        config.add_cross_attention = False
+        with tempfile.TemporaryDirectory() as tmp_dirname:
+            config.save_pretrained(tmp_dirname)
+            config = ProphetNetConfig.from_pretrained(tmp_dirname)
+
+        self.assertFalse(config.add_cross_attention)
+
     @unittest.skipIf(torch_device == "cpu", "Cant do half precision")
     def test_fp16_forward(self):
         config_and_inputs = self.model_tester.prepare_config_and_inputs()