Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions src/transformers/models/auto/feature_extraction_auto.py
Original file line number Diff line number Diff line change
Expand Up @@ -82,6 +82,7 @@
("swinv2", "ViTFeatureExtractor"),
("table-transformer", "DetrFeatureExtractor"),
("timesformer", "VideoMAEFeatureExtractor"),
("tvlt", "TvltFeatureExtractor"),
("unispeech", "Wav2Vec2FeatureExtractor"),
("unispeech-sat", "Wav2Vec2FeatureExtractor"),
("van", "ConvNextFeatureExtractor"),
Expand Down
1 change: 1 addition & 0 deletions src/transformers/models/auto/image_processing_auto.py
Original file line number Diff line number Diff line change
Expand Up @@ -87,6 +87,7 @@
("swinv2", "ViTImageProcessor"),
("table-transformer", "DetrImageProcessor"),
("timesformer", "VideoMAEImageProcessor"),
("tvlt", "TvltImageProcessor"),
("upernet", "SegformerImageProcessor"),
("van", "ConvNextImageProcessor"),
("videomae", "VideoMAEImageProcessor"),
Expand Down
1 change: 1 addition & 0 deletions src/transformers/models/auto/processing_auto.py
Original file line number Diff line number Diff line change
Expand Up @@ -65,6 +65,7 @@
("speech_to_text_2", "Speech2Text2Processor"),
("speecht5", "SpeechT5Processor"),
("trocr", "TrOCRProcessor"),
("tvlt", "TvltProcessor"),
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to add the missing entries in mappings.

("unispeech", "Wav2Vec2Processor"),
("unispeech-sat", "Wav2Vec2Processor"),
("vilt", "ViltProcessor"),
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -31,8 +31,7 @@ class GPTSanJapaneseConfig(PretrainedConfig):
This is the configuration class to store the configuration of a [`GPTSanJapaneseModel`]. It is used to instantiate
a GPTSANJapanese model according to the specified arguments, defining the model architecture. Instantiating a
configuration with the defaults will yield a similar configuration to that of the GPTSANJapanese
[tanreinama/GPTSAN-2.8B-spout_is_uniform](https://huggingface.co/tanreinama/GPTSAN-2.8B-spout_is_uniform)
architecture.
[Tanrei/GPTSAN-japanese](https://huggingface.co/Tanrei/GPTSAN-japanese) architecture.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previous one was wrong name/link.


Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
documentation from [`PretrainedConfig`] for more information.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,8 @@ class TimesformerConfig(PretrainedConfig):
This is the configuration class to store the configuration of a [`TimesformerModel`]. It is used to instantiate a
TimeSformer model according to the specified arguments, defining the model architecture. Instantiating a
configuration with the defaults will yield a similar configuration to that of the TimeSformer
[facebook/timesformer](https://huggingface.co/facebook/timesformer-base-finetuned-k600) architecture.
[facebook/timesformer-base-finetuned-k600](https://huggingface.co/facebook/timesformer-base-finetuned-k600)
architecture.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previous one was wrong name/link.


Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
documentation from [`PretrainedConfig`] for more information.
Expand Down
2 changes: 1 addition & 1 deletion src/transformers/models/tvlt/configuration_tvlt.py
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,7 @@ class TvltConfig(PretrainedConfig):
This is the configuration class to store the configuration of a [`TvltModel`]. It is used to instantiate a TVLT
model according to the specified arguments, defining the model architecture. Instantiating a configuration with the
defaults will yield a similar configuration to that of the TVLT
[TVLT/tvlt-base](https://huggingface.co/ZinengTang/tvlt-base) architecture.
[ZinengTang/tvlt-base](https://huggingface.co/ZinengTang/tvlt-base) architecture.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previous one was wrong name/link.


Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
documentation from [`PretrainedConfig`] for more information.
Expand Down
4 changes: 2 additions & 2 deletions src/transformers/models/xmod/configuration_xmod.py
Original file line number Diff line number Diff line change
Expand Up @@ -41,8 +41,8 @@ class XmodConfig(PretrainedConfig):
r"""
This is the configuration class to store the configuration of a [`XmodModel`]. It is used to instantiate an X-MOD
model according to the specified arguments, defining the model architecture. Instantiating a configuration with the
defaults will yield a similar configuration to that of the [xmod-base](https://huggingface.co/facebook/xmod-base)
architecture.
defaults will yield a similar configuration to that of the
[facebook/xmod-base](https://huggingface.co/facebook/xmod-base) architecture.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previous one was wrong name/link.


Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
documentation from [`PretrainedConfig`] for more information.
Expand Down
17 changes: 13 additions & 4 deletions tests/models/oneformer/test_modeling_oneformer.py
Original file line number Diff line number Diff line change
Expand Up @@ -56,6 +56,7 @@ def __init__(
parent,
batch_size=2,
is_training=True,
vocab_size=99,
use_auxiliary_loss=False,
num_queries=10,
num_channels=3,
Expand All @@ -69,6 +70,7 @@ def __init__(
self.parent = parent
self.batch_size = batch_size
self.is_training = is_training
self.vocab_size = vocab_size
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to accept vocab_size: this model uses CLIPTokenizer, and the checkpoint on the Hub have tokenizer files. In this case, the creation script will call model tester in order to get a smaller vocab size to use.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add vocab_size is also better: we can avoid larger vocab_size like 49408 in the testing.

self.use_auxiliary_loss = use_auxiliary_loss
self.num_queries = num_queries
self.num_channels = num_channels
Expand All @@ -84,12 +86,16 @@ def prepare_config_and_inputs(self):
torch_device
)

task_inputs = torch.randint(high=49408, size=(self.batch_size, self.sequence_length)).to(torch_device).long()
task_inputs = (
torch.randint(high=self.vocab_size, size=(self.batch_size, self.sequence_length)).to(torch_device).long()
)

pixel_mask = torch.ones([self.batch_size, self.min_size, self.max_size], device=torch_device)

text_inputs = (
torch.randint(high=49408, size=(self.batch_size, self.num_queries - self.n_ctx, self.sequence_length))
torch.randint(
high=self.vocab_size, size=(self.batch_size, self.num_queries - self.n_ctx, self.sequence_length)
)
.to(torch_device)
.long()
)
Expand All @@ -104,6 +110,7 @@ def prepare_config_and_inputs(self):

def get_config(self):
config = OneFormerConfig(
text_encoder_vocab_size=self.vocab_size,
hidden_size=self.hidden_dim,
)

Expand Down Expand Up @@ -303,8 +310,10 @@ def test_model_with_labels(self):
size = (self.model_tester.min_size,) * 2
inputs = {
"pixel_values": torch.randn((2, 3, *size), device=torch_device),
"task_inputs": torch.randint(high=49408, size=(2, 77), device=torch_device).long(),
"text_inputs": torch.randint(high=49408, size=(2, 134, 77), device=torch_device).long(),
"task_inputs": torch.randint(high=self.model_tester.vocab_size, size=(2, 77), device=torch_device).long(),
"text_inputs": torch.randint(
high=self.model_tester.vocab_size, size=(2, 134, 77), device=torch_device
).long(),
"mask_labels": torch.randn((2, 150, *size), device=torch_device),
"class_labels": torch.zeros(2, 150, device=torch_device).long(),
}
Expand Down
3 changes: 3 additions & 0 deletions tests/models/speecht5/test_modeling_speecht5.py
Original file line number Diff line number Diff line change
Expand Up @@ -103,6 +103,7 @@ def __init__(
batch_size=13,
seq_length=7,
is_training=False,
vocab_size=81,
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same above (for OneFormerModelTester), and always better to have it.

hidden_size=24,
num_hidden_layers=4,
num_attention_heads=2,
Expand All @@ -112,6 +113,7 @@ def __init__(
self.batch_size = batch_size
self.seq_length = seq_length
self.is_training = is_training
self.vocab_size = vocab_size
self.hidden_size = hidden_size
self.num_hidden_layers = num_hidden_layers
self.num_attention_heads = num_attention_heads
Expand Down Expand Up @@ -140,6 +142,7 @@ def prepare_config_and_inputs_for_common(self):

def get_config(self):
return SpeechT5Config(
vocab_size=self.vocab_size,
hidden_size=self.hidden_size,
encoder_layers=self.num_hidden_layers,
decoder_layers=self.num_hidden_layers,
Expand Down
10 changes: 6 additions & 4 deletions utils/check_config_docstrings.py
Original file line number Diff line number Diff line change
Expand Up @@ -51,10 +51,12 @@ def get_checkpoint_from_config_class(config_class):
config_source = inspect.getsource(config_class)
checkpoints = _re_checkpoint.findall(config_source)

for checkpoint in checkpoints:
# Each `checkpoint` is a tuple of a checkpoint name and a checkpoint link.
# For example, `('bert-base-uncased', 'https://huggingface.co/bert-base-uncased')`
ckpt_name, ckpt_link = checkpoint
# Each `checkpoint` is a tuple of a checkpoint name and a checkpoint link.
# For example, `('bert-base-uncased', 'https://huggingface.co/bert-base-uncased')`
for ckpt_name, ckpt_link in checkpoints:
# allow the link to end with `/`
if ckpt_link.endswith("/"):
ckpt_link = ckpt_link[:-1]
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To allow link with ending / work, like

https://huggingface.co/BridgeTower/bridgetower-base/

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, previously, checkpoint is used for looping, which is not good: a few lines below this place, we have

        if ckpt_link == ckpt_link_from_name:
            checkpoint = ckpt_name
            break

We only want to return checkpoint under this verified condition, otherwise, it should return None.


# verify the checkpoint name corresponds to the checkpoint link
ckpt_link_from_name = f"https://huggingface.co/{ckpt_name}"
Expand Down
5 changes: 5 additions & 0 deletions utils/create_dummy_models.py
Original file line number Diff line number Diff line change
Expand Up @@ -782,6 +782,11 @@ def get_config_overrides(config_class, processors):
# CLIP-like models have `text_model_tester` and `vision_model_tester`, and we need to pass `vocab_size` to
# `text_model_tester` via `text_kwargs`. The same trick is also necessary for `Flava`.
if config_class.__name__ in [
"AlignConfig",
"AltCLIPConfig",
"ChineseCLIPConfig",
"CLIPSegConfig",
"ClapConfig",
"CLIPConfig",
"GroupViTConfig",
"OwlViTConfig",
Expand Down