From 3ff7dc98d9cfd6a2f2c125ba470e858fd84d1782 Mon Sep 17 00:00:00 2001 From: Steven Date: Thu, 31 Mar 2022 15:29:00 -0700 Subject: [PATCH 01/34] =?UTF-8?q?=20=F0=9F=93=9D=20add=20image/vision=20cl?= =?UTF-8?q?assification=20and=20asr?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- docs/source/task_summary.mdx | 155 +++++++++++++++++++++++++++++++++++ 1 file changed, 155 insertions(+) diff --git a/docs/source/task_summary.mdx b/docs/source/task_summary.mdx index 95c2d9c201a5..068b376eeb25 100644 --- a/docs/source/task_summary.mdx +++ b/docs/source/task_summary.mdx @@ -967,3 +967,158 @@ Here is an example of doing translation using a model and a tokenizer. The proce We get the same translation as with the pipeline example. + +## Audio classification + +Audio classification assigns a class to an audio signal. The Keyword Spotting dataset from the [SUPERB](https://huggingface.co/datasets/superb) benchmark is an example dataset that can be used for audio classification fine-tuning. This dataset contains ten classes of keywords for classification. If you'd like to fine-tune a model for audio classification, take a look at the [run_audio_classification.py](https://github.com/huggingface/transformers/blob/main/examples/pytorch/audio-classification/run_audio_classification.py) script or the how-to guide [here](/tasks/audio_classification). + +The following examples demonstrate how to use a [`pipeline`] and a model and tokenizer for audio classification inference: + +```py +>>> from transformers import pipeline + +>>> audio_classifier = pipeline( +... task="audio-classification", model="ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition" +... ) +>>> audio_classifier("jfk_moon_speech.wav") +[{'label': 'calm', 'score': 0.13856211304664612}, + {'label': 'disgust', 'score': 0.13148026168346405}, + {'label': 'happy', 'score': 0.12635163962841034}, + {'label': 'angry', 'score': 0.12439591437578201}, + {'label': 'fearful', 'score': 0.12404385954141617}] +``` + +The general process for using a model and tokenizer for audio classification is: + +1. Instantiate a tokenizer and a model from the checkpoint name. +2. Process the audio signal to be classified with a feature extractor. +3. Pass the input through the model and take the `argmax` to retrieve the most likely class. +4. Convert the class id to a class name with `id2label` to return an interpretable result. + + + +```py +>>> from transformers import AutoFeatureExtractor, AutoModelForAudioClassification +>>> from datasets import load_dataset +>>> import torch + +>>> dataset = load_dataset("hf-internal-testing/librispeech_asr_demo", "clean", split="validation") +>>> dataset = dataset.sort("id") +>>> sampling_rate = dataset.features["audio"].sampling_rate + +>>> feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("superb/wav2vec2-base-superb-ks") +>>> model = Wav2Vec2ForSequenceClassification.from_pretrained("superb/wav2vec2-base-superb-ks") + +>>> inputs = feature_extractor(dataset[0]["audio"]["array"], sampling_rate=sampling_rate, return_tensors="pt") + +>>> with torch.no_grad(): +... logits = model(**inputs).logits + +>>> predicted_class_ids = torch.argmax(logits, dim=-1).item() +>>> predicted_label = model.config.id2label[predicted_class_ids] +>>> predicted_label +``` + + + +## Automatic speech recognition + +Automatic speech recognition transcribes an audio signal to text. 
The [Common Voice](https://huggingface.co/datasets/common_voice) dataset is an example dataset that can be used for automatic speech recognition fine-tuning. It contains an audio file of a speaker and the corresponding sentence. If you'd like to fine-tune a model for automatic speech recognition, take a look at the [run_speech_recognition_ctc.py](https://github.com/huggingface/transformers/blob/main/examples/pytorch/speech-recognition/run_speech_recognition_ctc.py) or [run_speech_recognition_seq2seq.py](https://github.com/huggingface/transformers/blob/main/examples/pytorch/speech-recognition/run_speech_recognition_seq2seq.py )scripts or the how-to guide [here](/tasks/asr). + +The following examples demonstrate how to use a [`pipeline`] and a model and tokenizer for automatic speech recognition inference: + +```py +>>> from transformers import pipeline + +>>> speech_recognizer = pipeline( + task="automatic-speech-recognition", model="facebook/wav2vec2-base-960h" +) +>>> speech_recognizer("jfk_moon_speech.wav") +{'text': "PRESENTETE MISTER VICE PRESIDENT GOVERNOR CONGRESSMEN THOMAS SAN O TE WILAN CONGRESSMAN MILLA MISTER WEBB MSTBELL SCIENIS DISTINGUISHED GUESS AT LADIES AND GENTLEMAN I APPRECIATE TO YOUR PRESIDENT HAVING MADE ME AN HONORARY VISITING PROFESSOR AND I WILL ASSURE YOU THAT MY FIRST LECTURE WILL BE A VERY BRIEF I AM DELIGHTED TO BE HERE AND I'M PARTICULARLY DELIGHTED TO BE HERE ON THIS OCCASION WE MEED AT A COLLEGE NOTED FOR KNOWLEGE IN A CITY NOTED FOR PROGRESS IN A STATE NOTED FOR STRAINTH AN WE STAND IN NEED OF ALL THREE"} +``` + +The general process for using a model and tokenizer for automatic speech recognition is: + +1. Instantiate a tokenizer and a model from the checkpoint name. +2. Process the audio signal and text with a processor. +3. Pass the input through the model and take the `argmax` to retrieve the predicted text. +4. Decode the text with a tokenizer to obtain the transcription. + + + +```py +>>> from transformers import AutoProcessor, AutoModelForCTC +>>> from datasets import load_dataset +>>> import torch + +>>> dataset = load_dataset("hf-internal-testing/librispeech_asr_demo", "clean", split="validation") +>>> dataset = dataset.sort("id") +>>> sampling_rate = dataset.features["audio"].sampling_rate + +>>> processor = AutoProcessor.from_pretrained("facebook/wav2vec2-base-960h") +>>> model = AutoModelForCTC.from_pretrained("facebook/wav2vec2-base-960h") + +>>> inputs = processor(dataset[0]["audio"]["array"], sampling_rate=sampling_rate, return_tensors="pt") +>>> with torch.no_grad(): +... logits = model(**inputs).logits +>>> predicted_ids = torch.argmax(logits, dim=-1) + +>>> transcription = processor.batch_decode(predicted_ids) +>>> transcription[0] +``` + + + +## Image classification + +Like text and audio classification, image classification assigns a class to an image. The [CIFAR-100](https://huggingface.co/datasets/cifar100) dataset is an example dataset that can be used for image classification fine-tuning. It contains an image and the corresponding class. If you'd like to fine-tune a model for image classification, take a look at the [run_image_classification.py](https://github.com/huggingface/transformers/blob/main/examples/pytorch/image-classification/run_image_classification.py) script or the how-to guide [here](/tasks/image_classification). 
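(Before the inference examples that follow, a minimal sketch of peeking at the fine-tuning data; the `img` and `fine_label` column names are assumptions about how the `cifar100` dataset is laid out on the Hub:)

```py
>>> from datasets import load_dataset

>>> # load a small slice of CIFAR-100 and inspect one image/label pair
>>> dataset = load_dataset("cifar100", split="train[:10]")
>>> dataset[0]["img"]  # a PIL image (assumed column name)
>>> dataset.features["fine_label"].int2str(dataset[0]["fine_label"])  # readable class name
```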
+ +The following examples demonstrate how to use a [`pipeline`] and a model and tokenizer for image classification inference: + +```py +>>> from transformers import pipeline + +>>> vision_classifier = pipeline(task="image-classification") +>>> vision_classifier( +... images="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg" +... ) +[{'label': 'lynx, catamount', 'score': 0.4403027892112732}, + {'label': 'cougar, puma, catamount, mountain lion, painter, panther, Felis concolor', + 'score': 0.03433405980467796}, + {'label': 'snow leopard, ounce, Panthera uncia', + 'score': 0.032148055732250214}, + {'label': 'Egyptian cat', 'score': 0.02353910356760025}, + {'label': 'tiger cat', 'score': 0.023034192621707916}] +``` + +The general process for using a model and tokenizer for image classification is: + +1. Instantiate a tokenizer and a model from the checkpoint name. +2. Process the image to be classified with a feature extractor. +3. Pass the input through the model and take the `argmax` to retrieve the predicted class. +4. Convert the class id to a class name with `id2label` to return an interpretable result. + + + +```py +>>> from transformers import AutoFeatureExtractor, AutoModelForImageClassification +>>> import torch +>>> from datasets import load_dataset + +>>> dataset = load_dataset("huggingface/cats-image") +>>> image = dataset["test"]["image"][0] + +>>> feature_extractor = AutoFeatureExtractor.from_pretrained("google/vit-base-patch16-224") +>>> model = AutoModelForImageClassification.from_pretrained("google/vit-base-patch16-224") + +>>> inputs = feature_extractor(image, return_tensors="pt") + +>>> with torch.no_grad(): +... logits = model(**inputs).logits + +>>> predicted_label = logits.argmax(-1).item() +>>> print(model.config.id2label[predicted_label]) +Egyptian cat +``` + + \ No newline at end of file From 4e8453a503ad65f3a7df1aad84e861811b19380a Mon Sep 17 00:00:00 2001 From: Steven Date: Thu, 31 Mar 2022 16:01:24 -0700 Subject: [PATCH 02/34] =?UTF-8?q?=20=F0=9F=96=8D=20minor=20formatting=20fi?= =?UTF-8?q?xes?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- docs/source/task_summary.mdx | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/docs/source/task_summary.mdx b/docs/source/task_summary.mdx index 068b376eeb25..fd30add50729 100644 --- a/docs/source/task_summary.mdx +++ b/docs/source/task_summary.mdx @@ -970,7 +970,7 @@ We get the same translation as with the pipeline example. ## Audio classification -Audio classification assigns a class to an audio signal. The Keyword Spotting dataset from the [SUPERB](https://huggingface.co/datasets/superb) benchmark is an example dataset that can be used for audio classification fine-tuning. This dataset contains ten classes of keywords for classification. If you'd like to fine-tune a model for audio classification, take a look at the [run_audio_classification.py](https://github.com/huggingface/transformers/blob/main/examples/pytorch/audio-classification/run_audio_classification.py) script or the how-to guide [here](/tasks/audio_classification). +Audio classification assigns a class to an audio signal. The Keyword Spotting dataset from the [SUPERB](https://huggingface.co/datasets/superb) benchmark is an example dataset that can be used for audio classification fine-tuning. This dataset contains ten classes of keywords for classification. 
If you'd like to fine-tune a model for audio classification, take a look at the [run_audio_classification.py](https://github.com/huggingface/transformers/blob/main/examples/pytorch/audio-classification/run_audio_classification.py) script or the how-to guide [here](./tasks/audio_classification). The following examples demonstrate how to use a [`pipeline`] and a model and tokenizer for audio classification inference: @@ -1006,8 +1006,8 @@ The general process for using a model and tokenizer for audio classification is: >>> dataset = dataset.sort("id") >>> sampling_rate = dataset.features["audio"].sampling_rate ->>> feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("superb/wav2vec2-base-superb-ks") ->>> model = Wav2Vec2ForSequenceClassification.from_pretrained("superb/wav2vec2-base-superb-ks") +>>> feature_extractor = AutoFeatureExtractor.from_pretrained("superb/wav2vec2-base-superb-ks") +>>> model = AutoModelForAudioClassification.from_pretrained("superb/wav2vec2-base-superb-ks") >>> inputs = feature_extractor(dataset[0]["audio"]["array"], sampling_rate=sampling_rate, return_tensors="pt") @@ -1023,7 +1023,7 @@ The general process for using a model and tokenizer for audio classification is: ## Automatic speech recognition -Automatic speech recognition transcribes an audio signal to text. The [Common Voice](https://huggingface.co/datasets/common_voice) dataset is an example dataset that can be used for automatic speech recognition fine-tuning. It contains an audio file of a speaker and the corresponding sentence. If you'd like to fine-tune a model for automatic speech recognition, take a look at the [run_speech_recognition_ctc.py](https://github.com/huggingface/transformers/blob/main/examples/pytorch/speech-recognition/run_speech_recognition_ctc.py) or [run_speech_recognition_seq2seq.py](https://github.com/huggingface/transformers/blob/main/examples/pytorch/speech-recognition/run_speech_recognition_seq2seq.py )scripts or the how-to guide [here](/tasks/asr). +Automatic speech recognition transcribes an audio signal to text. The [Common Voice](https://huggingface.co/datasets/common_voice) dataset is an example dataset that can be used for automatic speech recognition fine-tuning. It contains an audio file of a speaker and the corresponding sentence. If you'd like to fine-tune a model for automatic speech recognition, take a look at the [run_speech_recognition_ctc.py](https://github.com/huggingface/transformers/blob/main/examples/pytorch/speech-recognition/run_speech_recognition_ctc.py) or [run_speech_recognition_seq2seq.py](https://github.com/huggingface/transformers/blob/main/examples/pytorch/speech-recognition/run_speech_recognition_seq2seq.py) scripts or the how-to guide [here](./tasks/asr). The following examples demonstrate how to use a [`pipeline`] and a model and tokenizer for automatic speech recognition inference: @@ -1031,8 +1031,8 @@ The following examples demonstrate how to use a [`pipeline`] and a model and tok >>> from transformers import pipeline >>> speech_recognizer = pipeline( - task="automatic-speech-recognition", model="facebook/wav2vec2-base-960h" -) +... task="automatic-speech-recognition", model="facebook/wav2vec2-base-960h" +... 
) >>> speech_recognizer("jfk_moon_speech.wav") {'text': "PRESENTETE MISTER VICE PRESIDENT GOVERNOR CONGRESSMEN THOMAS SAN O TE WILAN CONGRESSMAN MILLA MISTER WEBB MSTBELL SCIENIS DISTINGUISHED GUESS AT LADIES AND GENTLEMAN I APPRECIATE TO YOUR PRESIDENT HAVING MADE ME AN HONORARY VISITING PROFESSOR AND I WILL ASSURE YOU THAT MY FIRST LECTURE WILL BE A VERY BRIEF I AM DELIGHTED TO BE HERE AND I'M PARTICULARLY DELIGHTED TO BE HERE ON THIS OCCASION WE MEED AT A COLLEGE NOTED FOR KNOWLEGE IN A CITY NOTED FOR PROGRESS IN A STATE NOTED FOR STRAINTH AN WE STAND IN NEED OF ALL THREE"} ``` @@ -1071,7 +1071,7 @@ The general process for using a model and tokenizer for automatic speech recogni ## Image classification -Like text and audio classification, image classification assigns a class to an image. The [CIFAR-100](https://huggingface.co/datasets/cifar100) dataset is an example dataset that can be used for image classification fine-tuning. It contains an image and the corresponding class. If you'd like to fine-tune a model for image classification, take a look at the [run_image_classification.py](https://github.com/huggingface/transformers/blob/main/examples/pytorch/image-classification/run_image_classification.py) script or the how-to guide [here](/tasks/image_classification). +Like text and audio classification, image classification assigns a class to an image. The [CIFAR-100](https://huggingface.co/datasets/cifar100) dataset is an example dataset that can be used for image classification fine-tuning. It contains an image and the corresponding class. If you'd like to fine-tune a model for image classification, take a look at the [run_image_classification.py](https://github.com/huggingface/transformers/blob/main/examples/pytorch/image-classification/run_image_classification.py) script or the how-to guide [here](./tasks/image_classification). 
The following examples demonstrate how to use a [`pipeline`] and a model and tokenizer for image classification inference: From 4f80d31dd2ad3b2956729e49ec58c20f33e1c1a1 Mon Sep 17 00:00:00 2001 From: Cathy <815244047@qq.com> Date: Fri, 1 Apr 2022 15:17:31 +0800 Subject: [PATCH 03/34] Fixed a typo in legacy seq2seq_trainer.py (#16531) --- examples/legacy/seq2seq/seq2seq_trainer.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/examples/legacy/seq2seq/seq2seq_trainer.py b/examples/legacy/seq2seq/seq2seq_trainer.py index 1c2d7924a444..eeff082499c4 100644 --- a/examples/legacy/seq2seq/seq2seq_trainer.py +++ b/examples/legacy/seq2seq/seq2seq_trainer.py @@ -115,7 +115,7 @@ def create_optimizer_and_scheduler(self, num_training_steps: int): "eps": self.args.adam_epsilon, } optimizer_kwargs["lr"] = self.args.learning_rate - if self.sharded_dpp: + if self.sharded_ddp: self.optimizer = OSS( params=optimizer_grouped_parameters, optim=optimizer_cls, From 1f426af06a88364b5934403cc603e01dc1f06a86 Mon Sep 17 00:00:00 2001 From: Jim Rohrer Date: Fri, 1 Apr 2022 03:52:42 -0500 Subject: [PATCH 04/34] Add ONNX export for BeiT (#16498) * Add beit onnx conversion support * Updated docs * Added cross reference to ViT ONNX config --- docs/source/serialization.mdx | 1 + src/transformers/models/beit/__init__.py | 4 ++-- .../models/beit/configuration_beit.py | 23 +++++++++++++++++++ src/transformers/onnx/features.py | 2 ++ tests/onnx/test_onnx_v2.py | 6 ++--- 5 files changed, 31 insertions(+), 5 deletions(-) diff --git a/docs/source/serialization.mdx b/docs/source/serialization.mdx index fc969aac4fdd..65fb5fa5cc54 100644 --- a/docs/source/serialization.mdx +++ b/docs/source/serialization.mdx @@ -47,6 +47,7 @@ Ready-made configurations include the following architectures: - ALBERT - BART +- BEiT - BERT - Blenderbot - BlenderbotSmall diff --git a/src/transformers/models/beit/__init__.py b/src/transformers/models/beit/__init__.py index 319fb2880a1d..27c31775d34e 100644 --- a/src/transformers/models/beit/__init__.py +++ b/src/transformers/models/beit/__init__.py @@ -22,7 +22,7 @@ _import_structure = { - "configuration_beit": ["BEIT_PRETRAINED_CONFIG_ARCHIVE_MAP", "BeitConfig"], + "configuration_beit": ["BEIT_PRETRAINED_CONFIG_ARCHIVE_MAP", "BeitConfig", "BeitOnnxConfig"], } if is_vision_available(): @@ -48,7 +48,7 @@ ] if TYPE_CHECKING: - from .configuration_beit import BEIT_PRETRAINED_CONFIG_ARCHIVE_MAP, BeitConfig + from .configuration_beit import BEIT_PRETRAINED_CONFIG_ARCHIVE_MAP, BeitConfig, BeitOnnxConfig if is_vision_available(): from .feature_extraction_beit import BeitFeatureExtractor diff --git a/src/transformers/models/beit/configuration_beit.py b/src/transformers/models/beit/configuration_beit.py index 9a1dfa8c20fc..7c47aba0c2ab 100644 --- a/src/transformers/models/beit/configuration_beit.py +++ b/src/transformers/models/beit/configuration_beit.py @@ -13,8 +13,13 @@ # See the License for the specific language governing permissions and # limitations under the License. 
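# Usage sketch (an assumption, mirroring the ViT export flow documented in
# docs/source/serialization.mdx): once the `BeitOnnxConfig` added below is
# registered with `FeaturesManager`, a BEiT checkpoint can be exported with the
# ready-made ONNX CLI, e.g.
#   python -m transformers.onnx --model=microsoft/beit-base-patch16-224 onnx/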
""" BEiT model configuration""" +from collections import OrderedDict +from typing import Mapping + +from packaging import version from ...configuration_utils import PretrainedConfig +from ...onnx import OnnxConfig from ...utils import logging @@ -176,3 +181,21 @@ def __init__( self.auxiliary_num_convs = auxiliary_num_convs self.auxiliary_concat_input = auxiliary_concat_input self.semantic_loss_ignore_index = semantic_loss_ignore_index + + +# Copied from transformers.models.vit.configuration_vit.ViTOnnxConfig +class BeitOnnxConfig(OnnxConfig): + + torch_onnx_minimum_version = version.parse("1.11") + + @property + def inputs(self) -> Mapping[str, Mapping[int, str]]: + return OrderedDict( + [ + ("pixel_values", {0: "batch", 1: "sequence"}), + ] + ) + + @property + def atol_for_validation(self) -> float: + return 1e-4 diff --git a/src/transformers/onnx/features.py b/src/transformers/onnx/features.py index 926137c59482..cf5e55c521de 100644 --- a/src/transformers/onnx/features.py +++ b/src/transformers/onnx/features.py @@ -4,6 +4,7 @@ from .. import PretrainedConfig, PreTrainedModel, TFPreTrainedModel, is_tf_available, is_torch_available from ..models.albert import AlbertOnnxConfig from ..models.bart import BartOnnxConfig +from ..models.beit import BeitOnnxConfig from ..models.bert import BertOnnxConfig from ..models.blenderbot import BlenderbotOnnxConfig from ..models.blenderbot_small import BlenderbotSmallOnnxConfig @@ -270,6 +271,7 @@ class FeaturesManager: onnx_config_cls=ElectraOnnxConfig, ), "vit": supported_features_mapping("default", "image-classification", onnx_config_cls=ViTOnnxConfig), + "beit": supported_features_mapping("default", "image-classification", onnx_config_cls=BeitOnnxConfig), "blenderbot": supported_features_mapping( "default", "default-with-past", diff --git a/tests/onnx/test_onnx_v2.py b/tests/onnx/test_onnx_v2.py index f530515aed79..ba8d51158ff9 100644 --- a/tests/onnx/test_onnx_v2.py +++ b/tests/onnx/test_onnx_v2.py @@ -15,14 +15,13 @@ export, validate_model_outputs, ) +from transformers.onnx.utils import compute_effective_axis_dimension, compute_serialized_parameters_size +from transformers.testing_utils import require_onnx, require_tf, require_torch, require_vision, slow if is_torch_available() or is_tf_available(): from transformers.onnx.features import FeaturesManager -from transformers.onnx.utils import compute_effective_axis_dimension, compute_serialized_parameters_size -from transformers.testing_utils import require_onnx, require_tf, require_torch, require_vision, slow - @require_onnx class OnnxUtilsTestCaseV2(TestCase): @@ -181,6 +180,7 @@ def test_values_override(self): ("xlm-roberta", "xlm-roberta-base"), ("layoutlm", "microsoft/layoutlm-base-uncased"), ("vit", "google/vit-base-patch16-224"), + ("beit", "microsoft/beit-base-patch16-224"), } PYTORCH_EXPORT_WITH_PAST_MODELS = { From ef37dc48640516c70f106b94794dc5db3d4d34df Mon Sep 17 00:00:00 2001 From: Ferdinand Schlatt Date: Fri, 1 Apr 2022 14:50:47 +0200 Subject: [PATCH 05/34] call on_train_end when trial is pruned (#16536) --- src/transformers/trainer.py | 1 + 1 file changed, 1 insertion(+) diff --git a/src/transformers/trainer.py b/src/transformers/trainer.py index 157e65d18352..948697e35127 100755 --- a/src/transformers/trainer.py +++ b/src/transformers/trainer.py @@ -991,6 +991,7 @@ def _report_to_hp_search( trial.report(self.objective, epoch) if trial.should_prune(): + self.callback_handler.on_train_end(self.args, self.state, self.control) raise optuna.TrialPruned() elif self.hp_search_backend == 
HPSearchBackend.RAY: from ray import tune From 91167b13c9055c0be7eaca27a20e9673a5ce6979 Mon Sep 17 00:00:00 2001 From: Dahlbomii <101373053+Dahlbomii@users.noreply.github.com> Date: Fri, 1 Apr 2022 06:27:41 -0700 Subject: [PATCH 06/34] Type hints added (#16529) --- .../models/openai/modeling_tf_openai.py | 98 ++++++++++--------- 1 file changed, 50 insertions(+), 48 deletions(-) diff --git a/src/transformers/models/openai/modeling_tf_openai.py b/src/transformers/models/openai/modeling_tf_openai.py index 490b3fac47e5..80d7a9abd192 100644 --- a/src/transformers/models/openai/modeling_tf_openai.py +++ b/src/transformers/models/openai/modeling_tf_openai.py @@ -16,8 +16,9 @@ """ TF 2.0 OpenAI GPT model.""" from dataclasses import dataclass -from typing import Optional, Tuple +from typing import Optional, Tuple, Union +import numpy as np import tensorflow as tf from ...activations_tf import get_tf_activation @@ -25,6 +26,7 @@ from ...modeling_tf_utils import ( TFCausalLanguageModelingLoss, TFConv1D, + TFModelInputType, TFPreTrainedModel, TFSequenceClassificationLoss, TFSequenceSummary, @@ -510,18 +512,18 @@ def __init__(self, config, *inputs, **kwargs): ) def call( self, - input_ids=None, - attention_mask=None, - token_type_ids=None, - position_ids=None, - head_mask=None, - inputs_embeds=None, - output_attentions=None, - output_hidden_states=None, - return_dict=None, - training=False, + input_ids: Optional[TFModelInputType] = None, + attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, + position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, + head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + output_attentions: Optional[bool] = None, + output_hidden_states: Optional[bool] = None, + return_dict: Optional[bool] = None, + training: Optional[bool] = False, **kwargs, - ): + ) -> Union[Tuple, TFBaseModelOutput]: outputs = self.transformer( input_ids=input_ids, @@ -573,19 +575,19 @@ def set_output_embeddings(self, value): ) def call( self, - input_ids=None, - attention_mask=None, - token_type_ids=None, - position_ids=None, - head_mask=None, - inputs_embeds=None, - output_attentions=None, - output_hidden_states=None, - return_dict=None, - labels=None, - training=False, + input_ids: Optional[TFModelInputType] = None, + attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, + position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, + head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + output_attentions: Optional[bool] = None, + output_hidden_states: Optional[bool] = None, + return_dict: Optional[bool] = None, + labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + training: Optional[bool] = False, **kwargs, - ): + ) -> Union[Tuple, TFCausalLMOutput]: r""" labels (`tf.Tensor` of shape `(batch_size, sequence_length)`, *optional*): Labels for computing the cross entropy classification loss. 
Indices should be in `[0, ..., @@ -656,19 +658,19 @@ def __init__(self, config, *inputs, **kwargs): @replace_return_docstrings(output_type=TFOpenAIGPTDoubleHeadsModelOutput, config_class=_CONFIG_FOR_DOC) def call( self, - input_ids=None, - attention_mask=None, - token_type_ids=None, - position_ids=None, - head_mask=None, - inputs_embeds=None, - mc_token_ids=None, - output_attentions=None, - output_hidden_states=None, - return_dict=None, - training=False, + input_ids: Optional[TFModelInputType] = None, + attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, + position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, + head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + mc_token_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, + output_attentions: Optional[bool] = None, + output_hidden_states: Optional[bool] = None, + return_dict: Optional[bool] = None, + training: Optional[bool] = False, **kwargs, - ): + ) -> Union[Tuple, TFOpenAIGPTDoubleHeadsModelOutput]: r""" mc_token_ids (`tf.Tensor` or `Numpy array` of shape `(batch_size, num_choices)`, *optional*, default to index of the last token of the input): Index of the classification token in each input sequence. Selected in the range `[0, input_ids.size(-1) - @@ -800,19 +802,19 @@ def __init__(self, config, *inputs, **kwargs): ) def call( self, - input_ids=None, - attention_mask=None, - token_type_ids=None, - position_ids=None, - head_mask=None, - inputs_embeds=None, - output_attentions=None, - output_hidden_states=None, - return_dict=None, - labels=None, - training=False, + input_ids: Optional[TFModelInputType] = None, + attention_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + token_type_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, + position_ids: Optional[Union[np.ndarray, tf.Tensor]] = None, + head_mask: Optional[Union[np.ndarray, tf.Tensor]] = None, + inputs_embeds: Optional[Union[np.ndarray, tf.Tensor]] = None, + output_attentions: Optional[bool] = None, + output_hidden_states: Optional[bool] = None, + return_dict: Optional[bool] = None, + labels: Optional[Union[np.ndarray, tf.Tensor]] = None, + training: Optional[bool] = False, **kwargs, - ): + ) -> Union[Tuple, TFSequenceClassifierOutput]: r""" labels (`tf.Tensor` of shape `(batch_size, sequence_length)`, *optional*): Labels for computing the cross entropy classification loss. 
Indices should be in `[0, ..., From a9425ec12314745028dafd44ac62befac454c9c2 Mon Sep 17 00:00:00 2001 From: Gunjan Chhablani Date: Fri, 1 Apr 2022 19:20:22 +0530 Subject: [PATCH 07/34] Fix Bart type hints (#16297) * Add type hints to PLBart PyTorch * Remove pending merge conflicts * Fix PLBart Type Hints * Add changes from review --- .../models/plbart/modeling_plbart.py | 80 +++++++++---------- 1 file changed, 40 insertions(+), 40 deletions(-) diff --git a/src/transformers/models/plbart/modeling_plbart.py b/src/transformers/models/plbart/modeling_plbart.py index b1a2088913fd..37230541e9db 100755 --- a/src/transformers/models/plbart/modeling_plbart.py +++ b/src/transformers/models/plbart/modeling_plbart.py @@ -16,7 +16,7 @@ import copy import math import random -from typing import List, Optional, Tuple, Union +from typing import Any, Dict, List, Optional, Tuple, Union import torch import torch.utils.checkpoint @@ -1142,21 +1142,21 @@ def get_decoder(self): ) def forward( self, - input_ids=None, - attention_mask=None, - decoder_input_ids=None, - decoder_attention_mask=None, - head_mask=None, - decoder_head_mask=None, - cross_attn_head_mask=None, - encoder_outputs=None, - past_key_values=None, - inputs_embeds=None, + input_ids: Optional[torch.LongTensor] = None, + attention_mask: Optional[torch.LongTensor] = None, + decoder_input_ids: Optional[torch.LongTensor] = None, + decoder_attention_mask: Optional[torch.Tensor] = None, + head_mask: Optional[torch.Tensor] = None, + decoder_head_mask: Optional[torch.LongTensor] = None, + cross_attn_head_mask: Optional[torch.Tensor] = None, + encoder_outputs: Optional[List[torch.FloatTensor]] = None, + past_key_values: Optional[List[torch.FloatTensor]] = None, + inputs_embeds: Optional[torch.FloatTensor] = None, decoder_inputs_embeds=None, - use_cache=None, - output_attentions=None, - output_hidden_states=None, - return_dict=None, + use_cache: Optional[bool] = None, + output_attentions: Optional[bool] = None, + output_hidden_states: Optional[bool] = None, + return_dict: Optional[bool] = None, ): output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions output_hidden_states = ( @@ -1271,23 +1271,23 @@ def set_output_embeddings(self, new_embeddings): @add_end_docstrings(PLBART_GENERATION_EXAMPLE) def forward( self, - input_ids=None, - attention_mask=None, - decoder_input_ids=None, - decoder_attention_mask=None, - head_mask=None, - decoder_head_mask=None, - cross_attn_head_mask=None, - encoder_outputs=None, - past_key_values=None, - inputs_embeds=None, + input_ids: Optional[torch.LongTensor] = None, + attention_mask: Optional[torch.LongTensor] = None, + decoder_input_ids: Optional[torch.LongTensor] = None, + decoder_attention_mask: Optional[torch.Tensor] = None, + head_mask: Optional[torch.Tensor] = None, + decoder_head_mask: Optional[torch.LongTensor] = None, + cross_attn_head_mask: Optional[torch.Tensor] = None, + encoder_outputs: Optional[List[torch.FloatTensor]] = None, + past_key_values: Optional[List[torch.FloatTensor]] = None, + inputs_embeds: Optional[torch.FloatTensor] = None, decoder_inputs_embeds=None, - labels=None, - use_cache=None, - output_attentions=None, - output_hidden_states=None, - return_dict=None, - ): + labels: Optional[torch.Tensor] = None, + use_cache: Optional[bool] = None, + output_attentions: Optional[bool] = None, + output_hidden_states: Optional[bool] = None, + return_dict: Optional[bool] = None, + ) -> Union[Tuple[torch.Tensor], Seq2SeqLMOutput]: r""" labels (`torch.LongTensor` of 
shape `(batch_size, sequence_length)`, *optional*): Labels for computing the masked language modeling loss. Indices should either be in `[0, ..., @@ -1345,16 +1345,16 @@ def forward( def prepare_inputs_for_generation( self, - decoder_input_ids, - past=None, - attention_mask=None, - head_mask=None, - decoder_head_mask=None, - cross_attn_head_mask=None, - use_cache=None, - encoder_outputs=None, + decoder_input_ids: torch.LongTensor, + past: Optional[List[torch.FloatTensor]] = None, + attention_mask: Optional[torch.LongTensor] = None, + head_mask: Optional[torch.Tensor] = None, + decoder_head_mask: Optional[torch.Tensor] = None, + cross_attn_head_mask: Optional[torch.Tensor] = None, + use_cache: Optional[bool] = None, + encoder_outputs: Optional[List[torch.FloatTensor]] = None, **kwargs # TODO: Check if this is needed. It is unused? - ): + ) -> Dict[str, Any]: # cut decoder_input_ids if past is used if past is not None: decoder_input_ids = decoder_input_ids[:, -1:] From a1dfe0064eab59b95fc0becd94a195a79341751a Mon Sep 17 00:00:00 2001 From: Gunjan Chhablani Date: Fri, 1 Apr 2022 19:32:58 +0530 Subject: [PATCH 08/34] Add VisualBert type hints (#16544) --- .../visual_bert/modeling_visual_bert.py | 184 +++++++++--------- 1 file changed, 92 insertions(+), 92 deletions(-) diff --git a/src/transformers/models/visual_bert/modeling_visual_bert.py b/src/transformers/models/visual_bert/modeling_visual_bert.py index 0e5acf32b3c4..69495785fe81 100755 --- a/src/transformers/models/visual_bert/modeling_visual_bert.py +++ b/src/transformers/models/visual_bert/modeling_visual_bert.py @@ -17,7 +17,7 @@ import math from dataclasses import dataclass -from typing import Optional, Tuple +from typing import Optional, Tuple, Union import torch import torch.utils.checkpoint @@ -720,20 +720,20 @@ class PreTrainedModel @replace_return_docstrings(output_type=BaseModelOutputWithPooling, config_class=_CONFIG_FOR_DOC) def forward( self, - input_ids=None, - attention_mask=None, - token_type_ids=None, - position_ids=None, - head_mask=None, - inputs_embeds=None, - visual_embeds=None, - visual_attention_mask=None, - visual_token_type_ids=None, - image_text_alignment=None, - output_attentions=None, - output_hidden_states=None, - return_dict=None, - ): + input_ids: Optional[torch.LongTensor] = None, + attention_mask: Optional[torch.LongTensor] = None, + token_type_ids: Optional[torch.LongTensor] = None, + position_ids: Optional[torch.LongTensor] = None, + head_mask: Optional[torch.LongTensor] = None, + inputs_embeds: Optional[torch.FloatTensor] = None, + visual_embeds: Optional[torch.FloatTensor] = None, + visual_attention_mask: Optional[torch.LongTensor] = None, + visual_token_type_ids: Optional[torch.LongTensor] = None, + image_text_alignment: Optional[torch.LongTensor] = None, + output_attentions: Optional[bool] = None, + output_hidden_states: Optional[bool] = None, + return_dict: Optional[bool] = None, + ) -> Union[Tuple[torch.Tensor], BaseModelOutputWithPooling]: r""" Returns: @@ -893,22 +893,22 @@ def set_output_embeddings(self, new_embeddings): @replace_return_docstrings(output_type=VisualBertForPreTrainingOutput, config_class=_CONFIG_FOR_DOC) def forward( self, - input_ids=None, - attention_mask=None, - token_type_ids=None, - position_ids=None, - head_mask=None, - inputs_embeds=None, - visual_embeds=None, - visual_attention_mask=None, - visual_token_type_ids=None, - image_text_alignment=None, - output_attentions=None, - output_hidden_states=None, - return_dict=None, - labels=None, - sentence_image_labels=None, - ): + 
input_ids: Optional[torch.LongTensor] = None, + attention_mask: Optional[torch.LongTensor] = None, + token_type_ids: Optional[torch.LongTensor] = None, + position_ids: Optional[torch.LongTensor] = None, + head_mask: Optional[torch.LongTensor] = None, + inputs_embeds: Optional[torch.FloatTensor] = None, + visual_embeds: Optional[torch.FloatTensor] = None, + visual_attention_mask: Optional[torch.LongTensor] = None, + visual_token_type_ids: Optional[torch.LongTensor] = None, + image_text_alignment: Optional[torch.LongTensor] = None, + output_attentions: Optional[bool] = None, + output_hidden_states: Optional[bool] = None, + return_dict: Optional[bool] = None, + labels: Optional[torch.LongTensor] = None, + sentence_image_labels: Optional[torch.LongTensor] = None, + ) -> Union[Tuple[torch.Tensor], VisualBertForPreTrainingOutput]: r""" labels (`torch.LongTensor` of shape `(batch_size, total_sequence_length)`, *optional*): Labels for computing the masked language modeling loss. Indices should be in `[-100, 0, ..., @@ -1039,21 +1039,21 @@ def __init__(self, config): @replace_return_docstrings(output_type=MultipleChoiceModelOutput, config_class=_CONFIG_FOR_DOC) def forward( self, - input_ids=None, - attention_mask=None, - token_type_ids=None, - position_ids=None, - head_mask=None, - inputs_embeds=None, - visual_embeds=None, - visual_attention_mask=None, - visual_token_type_ids=None, - image_text_alignment=None, - output_attentions=None, - output_hidden_states=None, - return_dict=None, - labels=None, - ): + input_ids: Optional[torch.LongTensor] = None, + attention_mask: Optional[torch.LongTensor] = None, + token_type_ids: Optional[torch.LongTensor] = None, + position_ids: Optional[torch.LongTensor] = None, + head_mask: Optional[torch.LongTensor] = None, + inputs_embeds: Optional[torch.FloatTensor] = None, + visual_embeds: Optional[torch.FloatTensor] = None, + visual_attention_mask: Optional[torch.LongTensor] = None, + visual_token_type_ids: Optional[torch.LongTensor] = None, + image_text_alignment: Optional[torch.LongTensor] = None, + output_attentions: Optional[bool] = None, + output_hidden_states: Optional[bool] = None, + return_dict: Optional[bool] = None, + labels: Optional[torch.LongTensor] = None, + ) -> Union[Tuple[torch.Tensor], MultipleChoiceModelOutput]: r""" labels (`torch.LongTensor` of shape `(batch_size,)`, *optional*): Labels for computing the multiple choice classification loss. 
Indices should be in `[0, ..., @@ -1191,21 +1191,21 @@ def __init__(self, config): @replace_return_docstrings(output_type=SequenceClassifierOutput, config_class=_CONFIG_FOR_DOC) def forward( self, - input_ids=None, - attention_mask=None, - token_type_ids=None, - position_ids=None, - head_mask=None, - inputs_embeds=None, - visual_embeds=None, - visual_attention_mask=None, - visual_token_type_ids=None, - image_text_alignment=None, - output_attentions=None, - output_hidden_states=None, - return_dict=None, - labels=None, - ): + input_ids: Optional[torch.LongTensor] = None, + attention_mask: Optional[torch.LongTensor] = None, + token_type_ids: Optional[torch.LongTensor] = None, + position_ids: Optional[torch.LongTensor] = None, + head_mask: Optional[torch.LongTensor] = None, + inputs_embeds: Optional[torch.FloatTensor] = None, + visual_embeds: Optional[torch.FloatTensor] = None, + visual_attention_mask: Optional[torch.LongTensor] = None, + visual_token_type_ids: Optional[torch.LongTensor] = None, + image_text_alignment: Optional[torch.LongTensor] = None, + output_attentions: Optional[bool] = None, + output_hidden_states: Optional[bool] = None, + return_dict: Optional[bool] = None, + labels: Optional[torch.LongTensor] = None, + ) -> Union[Tuple[torch.Tensor], SequenceClassifierOutput]: r""" labels (`torch.LongTensor` of shape `(batch_size, total_sequence_length)`, *optional*): Labels for computing the sequence classification/regression loss. Indices should be in `[0, ..., @@ -1317,21 +1317,21 @@ def __init__(self, config): @replace_return_docstrings(output_type=SequenceClassifierOutput, config_class=_CONFIG_FOR_DOC) def forward( self, - input_ids=None, - attention_mask=None, - token_type_ids=None, - position_ids=None, - head_mask=None, - inputs_embeds=None, - visual_embeds=None, - visual_attention_mask=None, - visual_token_type_ids=None, - image_text_alignment=None, - output_attentions=None, - output_hidden_states=None, - return_dict=None, - labels=None, - ): + input_ids: Optional[torch.LongTensor] = None, + attention_mask: Optional[torch.LongTensor] = None, + token_type_ids: Optional[torch.LongTensor] = None, + position_ids: Optional[torch.LongTensor] = None, + head_mask: Optional[torch.LongTensor] = None, + inputs_embeds: Optional[torch.FloatTensor] = None, + visual_embeds: Optional[torch.FloatTensor] = None, + visual_attention_mask: Optional[torch.LongTensor] = None, + visual_token_type_ids: Optional[torch.LongTensor] = None, + image_text_alignment: Optional[torch.LongTensor] = None, + output_attentions: Optional[bool] = None, + output_hidden_states: Optional[bool] = None, + return_dict: Optional[bool] = None, + labels: Optional[torch.LongTensor] = None, + ) -> Union[Tuple[torch.Tensor], SequenceClassifierOutput]: r""" labels (`torch.LongTensor` of shape `(batch_size,)`, *optional*): Labels for computing the sequence classification/regression loss. 
Indices should be in `[0, ..., @@ -1477,22 +1477,22 @@ def __init__(self, config): @replace_return_docstrings(output_type=SequenceClassifierOutput, config_class=_CONFIG_FOR_DOC) def forward( self, - input_ids=None, - attention_mask=None, - token_type_ids=None, - position_ids=None, - head_mask=None, - inputs_embeds=None, - visual_embeds=None, - visual_attention_mask=None, - visual_token_type_ids=None, - image_text_alignment=None, - output_attentions=None, - output_hidden_states=None, - return_dict=None, - region_to_phrase_position=None, - labels=None, - ): + input_ids: Optional[torch.LongTensor] = None, + attention_mask: Optional[torch.LongTensor] = None, + token_type_ids: Optional[torch.LongTensor] = None, + position_ids: Optional[torch.LongTensor] = None, + head_mask: Optional[torch.LongTensor] = None, + inputs_embeds: Optional[torch.FloatTensor] = None, + visual_embeds: Optional[torch.FloatTensor] = None, + visual_attention_mask: Optional[torch.LongTensor] = None, + visual_token_type_ids: Optional[torch.LongTensor] = None, + image_text_alignment: Optional[torch.LongTensor] = None, + output_attentions: Optional[bool] = None, + output_hidden_states: Optional[bool] = None, + return_dict: Optional[bool] = None, + region_to_phrase_position: Optional[torch.LongTensor] = None, + labels: Optional[torch.LongTensor] = None, + ) -> Union[Tuple[torch.Tensor], SequenceClassifierOutput]: r""" region_to_phrase_position (`torch.LongTensor` of shape `(batch_size, total_sequence_length)`, *optional*): The positions depicting the position of the image embedding corresponding to the textual tokens. From 8976f05a8f9524a4278b34143f010d1578c43255 Mon Sep 17 00:00:00 2001 From: Rishav Chandra Varma Date: Fri, 1 Apr 2022 19:51:26 +0530 Subject: [PATCH 09/34] Adding missing type hints for mBART model (PyTorch) (#16429) * added type hints for mbart tensorflow tf implementation * Adding missing type hints for mBART model Tensorflow Implementation model added with missing type hints * Missing Type hints - correction For TF model * Code fixup using make quality tests * Hint types - typo error * make fix-copies and make fixup * type hints * updated files * type hints update * making dependent modesls coherent Co-authored-by: matt --- .../modeling_bigbird_pegasus.py | 2 +- .../models/blenderbot/modeling_blenderbot.py | 4 +- .../models/m2m_100/modeling_m2m_100.py | 4 +- .../models/mbart/modeling_mbart.py | 112 +++++++++--------- .../models/pegasus/modeling_pegasus.py | 4 +- src/transformers/models/xglm/modeling_xglm.py | 2 +- 6 files changed, 64 insertions(+), 64 deletions(-) diff --git a/src/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py b/src/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py index 540f77944b7b..1fb8de8e1452 100755 --- a/src/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py +++ b/src/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py @@ -1478,7 +1478,7 @@ def forward( past_key_value: Optional[Tuple[torch.Tensor]] = None, output_attentions: Optional[bool] = False, use_cache: Optional[bool] = True, - ): + ) -> torch.Tensor: """ Args: hidden_states (`torch.FloatTensor`): input to the layer of shape *(seq_len, batch, embed_dim)* diff --git a/src/transformers/models/blenderbot/modeling_blenderbot.py b/src/transformers/models/blenderbot/modeling_blenderbot.py index 928e22e860e7..d1f84d2c3917 100755 --- a/src/transformers/models/blenderbot/modeling_blenderbot.py +++ b/src/transformers/models/blenderbot/modeling_blenderbot.py @@ -294,7 +294,7 @@ def 
forward( attention_mask: torch.Tensor, layer_head_mask: torch.Tensor, output_attentions: bool = False, - ): + ) -> torch.Tensor: """ Args: hidden_states (`torch.FloatTensor`): input to the layer of shape *(seq_len, batch, embed_dim)* @@ -378,7 +378,7 @@ def forward( past_key_value: Optional[Tuple[torch.Tensor]] = None, output_attentions: Optional[bool] = False, use_cache: Optional[bool] = True, - ): + ) -> torch.Tensor: """ Args: hidden_states (`torch.FloatTensor`): input to the layer of shape *(seq_len, batch, embed_dim)* diff --git a/src/transformers/models/m2m_100/modeling_m2m_100.py b/src/transformers/models/m2m_100/modeling_m2m_100.py index 3bb749564a01..d816218824e1 100755 --- a/src/transformers/models/m2m_100/modeling_m2m_100.py +++ b/src/transformers/models/m2m_100/modeling_m2m_100.py @@ -363,7 +363,7 @@ def forward( attention_mask: torch.Tensor, layer_head_mask: torch.Tensor, output_attentions: bool = False, - ): + ) -> torch.Tensor: """ Args: hidden_states (`torch.FloatTensor`): input to the layer of shape *(seq_len, batch, embed_dim)* @@ -447,7 +447,7 @@ def forward( past_key_value: Optional[Tuple[torch.Tensor]] = None, output_attentions: Optional[bool] = False, use_cache: Optional[bool] = True, - ): + ) -> torch.Tensor: """ Args: hidden_states (`torch.FloatTensor`): input to the layer of shape *(seq_len, batch, embed_dim)* diff --git a/src/transformers/models/mbart/modeling_mbart.py b/src/transformers/models/mbart/modeling_mbart.py index 446a02f648cd..6ed7c24ab176 100755 --- a/src/transformers/models/mbart/modeling_mbart.py +++ b/src/transformers/models/mbart/modeling_mbart.py @@ -307,7 +307,7 @@ def forward( attention_mask: torch.Tensor, layer_head_mask: torch.Tensor, output_attentions: bool = False, - ): + ) -> torch.Tensor: """ Args: hidden_states (`torch.FloatTensor`): input to the layer of shape *(seq_len, batch, embed_dim)* @@ -390,7 +390,7 @@ def forward( past_key_value: Optional[Tuple[torch.Tensor]] = None, output_attentions: Optional[bool] = False, use_cache: Optional[bool] = True, - ): + ) -> torch.Tensor: """ Args: hidden_states (`torch.FloatTensor`): input to the layer of shape *(seq_len, batch, embed_dim)* @@ -722,14 +722,14 @@ def _backward_compatibility_gradient_checkpointing(self): def forward( self, - input_ids=None, - attention_mask=None, - head_mask=None, - inputs_embeds=None, - output_attentions=None, - output_hidden_states=None, - return_dict=None, - ): + input_ids: torch.LongTensor = None, + attention_mask: Optional[torch.Tensor] = None, + head_mask: Optional[torch.Tensor] = None, + inputs_embeds: Optional[torch.FloatTensor] = None, + output_attentions: Optional[bool] = None, + output_hidden_states: Optional[bool] = None, + return_dict: Optional[bool] = None, + ) -> Union[Tuple, BaseModelOutput]: r""" Args: input_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`): @@ -913,19 +913,19 @@ def _prepare_decoder_attention_mask(self, attention_mask, input_shape, inputs_em def forward( self, - input_ids=None, - attention_mask=None, - encoder_hidden_states=None, - encoder_attention_mask=None, - head_mask=None, - cross_attn_head_mask=None, - past_key_values=None, - inputs_embeds=None, - use_cache=None, - output_attentions=None, - output_hidden_states=None, - return_dict=None, - ): + input_ids: torch.LongTensor = None, + attention_mask: Optional[torch.Tensor] = None, + encoder_hidden_states: Optional[torch.FloatTensor] = None, + encoder_attention_mask: Optional[torch.LongTensor] = None, + head_mask: Optional[torch.Tensor] = None, + 
cross_attn_head_mask: Optional[torch.Tensor] = None, + past_key_values: Optional[Tuple[Tuple[torch.FloatTensor]]] = None, + inputs_embeds: Optional[torch.FloatTensor] = None, + use_cache: Optional[bool] = None, + output_attentions: Optional[bool] = None, + output_hidden_states: Optional[bool] = None, + return_dict: Optional[bool] = None, + ) -> Union[Tuple, BaseModelOutputWithPastAndCrossAttentions]: r""" Args: input_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`): @@ -1168,22 +1168,22 @@ def get_decoder(self): ) def forward( self, - input_ids=None, - attention_mask=None, - decoder_input_ids=None, - decoder_attention_mask=None, - head_mask=None, - decoder_head_mask=None, - cross_attn_head_mask=None, - encoder_outputs=None, - past_key_values=None, - inputs_embeds=None, - decoder_inputs_embeds=None, - use_cache=None, - output_attentions=None, - output_hidden_states=None, - return_dict=None, - ): + input_ids: torch.LongTensor = None, + attention_mask: Optional[torch.Tensor] = None, + decoder_input_ids: Optional[torch.LongTensor] = None, + decoder_attention_mask: Optional[torch.LongTensor] = None, + head_mask: Optional[torch.Tensor] = None, + decoder_head_mask: Optional[torch.Tensor] = None, + cross_attn_head_mask: Optional[torch.Tensor] = None, + encoder_outputs: Optional[Tuple[Tuple[torch.FloatTensor]]] = None, + past_key_values: Optional[Tuple[Tuple[torch.FloatTensor]]] = None, + inputs_embeds: Optional[torch.FloatTensor] = None, + decoder_inputs_embeds: Optional[torch.FloatTensor] = None, + use_cache: Optional[bool] = None, + output_attentions: Optional[bool] = None, + output_hidden_states: Optional[bool] = None, + return_dict: Optional[bool] = None, + ) -> Union[Seq2SeqModelOutput, Tuple[torch.FloatTensor]]: output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions output_hidden_states = ( output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states @@ -1297,23 +1297,23 @@ def set_output_embeddings(self, new_embeddings): @add_end_docstrings(MBART_GENERATION_EXAMPLE) def forward( self, - input_ids=None, - attention_mask=None, - decoder_input_ids=None, - decoder_attention_mask=None, - head_mask=None, - decoder_head_mask=None, - cross_attn_head_mask=None, - encoder_outputs=None, - past_key_values=None, - inputs_embeds=None, - decoder_inputs_embeds=None, - labels=None, - use_cache=None, - output_attentions=None, - output_hidden_states=None, - return_dict=None, - ): + input_ids: torch.LongTensor = None, + attention_mask: Optional[torch.Tensor] = None, + decoder_input_ids: Optional[torch.LongTensor] = None, + decoder_attention_mask: Optional[torch.LongTensor] = None, + head_mask: Optional[torch.Tensor] = None, + decoder_head_mask: Optional[torch.Tensor] = None, + cross_attn_head_mask: Optional[torch.Tensor] = None, + encoder_outputs: Optional[Tuple[Tuple[torch.FloatTensor]]] = None, + past_key_values: Optional[Tuple[Tuple[torch.FloatTensor]]] = None, + inputs_embeds: Optional[torch.FloatTensor] = None, + decoder_inputs_embeds: Optional[torch.FloatTensor] = None, + labels: Optional[torch.LongTensor] = None, + use_cache: Optional[bool] = None, + output_attentions: Optional[bool] = None, + output_hidden_states: Optional[bool] = None, + return_dict: Optional[bool] = None, + ) -> Union[Seq2SeqLMOutput, Tuple[torch.FloatTensor]]: r""" labels (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*): Labels for computing the masked language modeling loss. 
Indices should either be in `[0, ...,
diff --git a/src/transformers/models/pegasus/modeling_pegasus.py b/src/transformers/models/pegasus/modeling_pegasus.py
index f1d7a6ce56ef..06cf9f130a73 100755
--- a/src/transformers/models/pegasus/modeling_pegasus.py
+++ b/src/transformers/models/pegasus/modeling_pegasus.py
@@ -309,7 +309,7 @@ def forward(
         attention_mask: torch.Tensor,
         layer_head_mask: torch.Tensor,
         output_attentions: bool = False,
-    ):
+    ) -> torch.Tensor:
         """
         Args:
             hidden_states (`torch.FloatTensor`): input to the layer of shape *(seq_len, batch, embed_dim)*
@@ -393,7 +393,7 @@ def forward(
         past_key_value: Optional[Tuple[torch.Tensor]] = None,
         output_attentions: Optional[bool] = False,
         use_cache: Optional[bool] = True,
-    ):
+    ) -> torch.Tensor:
         """
         Args:
             hidden_states (`torch.FloatTensor`): input to the layer of shape *(seq_len, batch, embed_dim)*
diff --git a/src/transformers/models/xglm/modeling_xglm.py b/src/transformers/models/xglm/modeling_xglm.py
index af277fcd7880..8d45e2b200b7 100755
--- a/src/transformers/models/xglm/modeling_xglm.py
+++ b/src/transformers/models/xglm/modeling_xglm.py
@@ -423,7 +423,7 @@ def forward(
         past_key_value: Optional[Tuple[torch.Tensor]] = None,
         output_attentions: Optional[bool] = False,
         use_cache: Optional[bool] = True,
-    ):
+    ) -> torch.Tensor:
         """
         Args:
             hidden_states (`torch.FloatTensor`): input to the layer of shape *(seq_len, batch, embed_dim)*

From 50ca1d1055691cd9363a511a7cc8089a2e618c61 Mon Sep 17 00:00:00 2001
From: Gunjan Chhablani
Date: Fri, 1 Apr 2022 20:09:28 +0530
Subject: [PATCH 10/34] Remove MBart subclass of XLMRoberta in tokenzier docs
 (#16546)

* Remove MBart subclass of XLMRoberta in tokenzier

* Fix style

* Copy docs from MBart50 tokenizer
---
 src/transformers/models/mbart/tokenization_mbart_fast.py | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/src/transformers/models/mbart/tokenization_mbart_fast.py b/src/transformers/models/mbart/tokenization_mbart_fast.py
index 1de8d62f3608..a172d37913a4 100644
--- a/src/transformers/models/mbart/tokenization_mbart_fast.py
+++ b/src/transformers/models/mbart/tokenization_mbart_fast.py
@@ -62,9 +62,8 @@ class MBartTokenizerFast(PreTrainedTokenizerFast):
     Construct a "fast" MBART tokenizer (backed by HuggingFace's *tokenizers* library). Based on
     [BPE](https://huggingface.co/docs/tokenizers/python/latest/components.html?highlight=BPE#models).

-    [`MBartTokenizerFast`] is a subclass of [`XLMRobertaTokenizerFast`]. Refer to superclass
-    [`XLMRobertaTokenizerFast`] for usage examples and documentation concerning the initialization parameters and other
-    methods.
+    This tokenizer inherits from [`PreTrainedTokenizerFast`] which contains most of the main methods. Users should
+    refer to this superclass for more information regarding those methods.

     The tokenization method is `<tokens> <eos> <language code>` for source language documents, and `<language code> <tokens> <eos>` for target language documents.

From 4ba9b4d663ce9dae4220aa3b69741dd6bb881c9b Mon Sep 17 00:00:00 2001
From: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Date: Fri, 1 Apr 2022 16:53:07 +0200
Subject: [PATCH 11/34] Use random_attention_mask for TF tests (#16517)

* use random_attention_mask for TF tests

* Fix for TFCLIP test (for now).
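For context, a sketch of what the helper guarantees (the exact implementation lives in
tests/test_modeling_tf_common.py; treat this as an approximation): unlike
`ids_tensor(..., vocab_size=2)`, which can produce an all-zero row, `random_attention_mask`
forces at least one attended position per example.

```py
import tensorflow as tf


def random_attention_mask(shape):
    # random 0/1 mask over `shape = [batch_size, seq_length]`
    mask = tf.random.uniform(shape, minval=0, maxval=2, dtype=tf.int32)
    # force the last position to 1 so every row attends to at least one token
    return tf.concat([mask[:, :-1], tf.ones_like(mask[:, -1:])], axis=-1)
```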
Co-authored-by: ydshieh --- ...est_modeling_tf_{{cookiecutter.lowercase_modelname}}.py | 4 ++-- tests/albert/test_modeling_tf_albert.py | 4 ++-- tests/bert/test_modeling_tf_bert.py | 4 ++-- tests/clip/test_modeling_tf_clip.py | 6 ++++++ tests/convbert/test_modeling_tf_convbert.py | 4 ++-- tests/ctrl/test_modeling_tf_ctrl.py | 4 ++-- tests/deberta/test_modeling_tf_deberta.py | 4 ++-- tests/deberta_v2/test_modeling_tf_deberta_v2.py | 4 ++-- tests/distilbert/test_modeling_tf_distilbert.py | 4 ++-- tests/dpr/test_modeling_tf_dpr.py | 7 +++---- tests/electra/test_modeling_tf_electra.py | 4 ++-- tests/flaubert/test_modeling_tf_flaubert.py | 4 ++-- tests/funnel/test_modeling_tf_funnel.py | 4 ++-- tests/gpt2/test_modeling_tf_gpt2.py | 4 ++-- tests/gptj/test_modeling_tf_gptj.py | 4 ++-- tests/layoutlm/test_modeling_tf_layoutlm.py | 4 ++-- tests/longformer/test_modeling_tf_longformer.py | 4 ++-- tests/lxmert/test_modeling_tf_lxmert.py | 4 ++-- tests/mobilebert/test_modeling_tf_mobilebert.py | 4 ++-- tests/mpnet/test_modeling_tf_mpnet.py | 4 ++-- tests/openai/test_modeling_tf_openai.py | 4 ++-- tests/rembert/test_modeling_tf_rembert.py | 4 ++-- tests/roberta/test_modeling_tf_roberta.py | 4 ++-- tests/roformer/test_modeling_tf_roformer.py | 4 ++-- tests/t5/test_modeling_tf_t5.py | 4 ++-- tests/tapas/test_modeling_tf_tapas.py | 4 ++-- tests/test_modeling_tf_common.py | 2 +- tests/xlm/test_modeling_tf_xlm.py | 4 ++-- tests/xlnet/test_modeling_tf_xlnet.py | 4 ++-- 29 files changed, 62 insertions(+), 57 deletions(-) diff --git a/templates/adding_a_new_model/cookiecutter-template-{{cookiecutter.modelname}}/test_modeling_tf_{{cookiecutter.lowercase_modelname}}.py b/templates/adding_a_new_model/cookiecutter-template-{{cookiecutter.modelname}}/test_modeling_tf_{{cookiecutter.lowercase_modelname}}.py index 16b31500dd6c..57fd95dd3ff6 100644 --- a/templates/adding_a_new_model/cookiecutter-template-{{cookiecutter.modelname}}/test_modeling_tf_{{cookiecutter.lowercase_modelname}}.py +++ b/templates/adding_a_new_model/cookiecutter-template-{{cookiecutter.modelname}}/test_modeling_tf_{{cookiecutter.lowercase_modelname}}.py @@ -21,7 +21,7 @@ from transformers.testing_utils import require_tf, slow from ..test_configuration_common import ConfigTester -from ..test_modeling_tf_common import TFModelTesterMixin, floats_tensor, ids_tensor +from ..test_modeling_tf_common import TFModelTesterMixin, floats_tensor, ids_tensor, random_attention_mask if is_tf_available(): @@ -92,7 +92,7 @@ def prepare_config_and_inputs(self): input_mask = None if self.use_input_mask: - input_mask = ids_tensor([self.batch_size, self.seq_length], vocab_size=2) + input_mask = random_attention_mask([self.batch_size, self.seq_length]) token_type_ids = None if self.use_token_type_ids: diff --git a/tests/albert/test_modeling_tf_albert.py b/tests/albert/test_modeling_tf_albert.py index 59815561c056..7eacc1f32a47 100644 --- a/tests/albert/test_modeling_tf_albert.py +++ b/tests/albert/test_modeling_tf_albert.py @@ -21,7 +21,7 @@ from transformers.testing_utils import require_tf, slow from ..test_configuration_common import ConfigTester -from ..test_modeling_tf_common import TFModelTesterMixin, ids_tensor +from ..test_modeling_tf_common import TFModelTesterMixin, ids_tensor, random_attention_mask if is_tf_available(): @@ -96,7 +96,7 @@ def prepare_config_and_inputs(self): input_mask = None if self.use_input_mask: - input_mask = ids_tensor([self.batch_size, self.seq_length], vocab_size=2) + input_mask = random_attention_mask([self.batch_size, 
self.seq_length]) token_type_ids = None if self.use_token_type_ids: diff --git a/tests/bert/test_modeling_tf_bert.py b/tests/bert/test_modeling_tf_bert.py index 611268337ffd..8c709e093801 100644 --- a/tests/bert/test_modeling_tf_bert.py +++ b/tests/bert/test_modeling_tf_bert.py @@ -21,7 +21,7 @@ from transformers.testing_utils import require_tf, slow from ..test_configuration_common import ConfigTester -from ..test_modeling_tf_common import TFModelTesterMixin, floats_tensor, ids_tensor +from ..test_modeling_tf_common import TFModelTesterMixin, floats_tensor, ids_tensor, random_attention_mask from ..utils.test_modeling_tf_core import TFCoreModelTesterMixin @@ -96,7 +96,7 @@ def prepare_config_and_inputs(self): input_mask = None if self.use_input_mask: - input_mask = ids_tensor([self.batch_size, self.seq_length], vocab_size=2) + input_mask = random_attention_mask([self.batch_size, self.seq_length]) token_type_ids = None if self.use_token_type_ids: diff --git a/tests/clip/test_modeling_tf_clip.py b/tests/clip/test_modeling_tf_clip.py index 02e289cd5b2a..d3c3cb9f5033 100644 --- a/tests/clip/test_modeling_tf_clip.py +++ b/tests/clip/test_modeling_tf_clip.py @@ -301,6 +301,12 @@ def prepare_config_and_inputs(self): input_mask = None if self.use_input_mask: input_mask = random_attention_mask([self.batch_size, self.seq_length]) + # make sure the first token has attention mask `1` to ensure that, after combining the causal mask, there + # is still at least one token being attended to for each batch. + # TODO: Change `random_attention_mask` in PT/TF/Flax common test file, after a discussion with the team. + input_mask = tf.concat( + [tf.ones_like(input_mask[:, :1], dtype=input_mask.dtype), input_mask[:, 1:]], axis=-1 + ) config = self.get_config() diff --git a/tests/convbert/test_modeling_tf_convbert.py b/tests/convbert/test_modeling_tf_convbert.py index ff4cbb1aa974..e2d68876263a 100644 --- a/tests/convbert/test_modeling_tf_convbert.py +++ b/tests/convbert/test_modeling_tf_convbert.py @@ -20,7 +20,7 @@ from transformers.testing_utils import require_tf, slow from ..test_configuration_common import ConfigTester -from ..test_modeling_tf_common import TFModelTesterMixin, ids_tensor +from ..test_modeling_tf_common import TFModelTesterMixin, ids_tensor, random_attention_mask if is_tf_available(): @@ -94,7 +94,7 @@ def prepare_config_and_inputs(self): input_mask = None if self.use_input_mask: - input_mask = ids_tensor([self.batch_size, self.seq_length], vocab_size=2) + input_mask = random_attention_mask([self.batch_size, self.seq_length]) token_type_ids = None if self.use_token_type_ids: diff --git a/tests/ctrl/test_modeling_tf_ctrl.py b/tests/ctrl/test_modeling_tf_ctrl.py index 65b984b51c9a..d17a97a3ad83 100644 --- a/tests/ctrl/test_modeling_tf_ctrl.py +++ b/tests/ctrl/test_modeling_tf_ctrl.py @@ -20,7 +20,7 @@ from transformers.testing_utils import require_tf, slow from ..test_configuration_common import ConfigTester -from ..test_modeling_tf_common import TFModelTesterMixin, ids_tensor +from ..test_modeling_tf_common import TFModelTesterMixin, ids_tensor, random_attention_mask if is_tf_available(): @@ -69,7 +69,7 @@ def prepare_config_and_inputs(self): input_mask = None if self.use_input_mask: - input_mask = ids_tensor([self.batch_size, self.seq_length], vocab_size=2) + input_mask = random_attention_mask([self.batch_size, self.seq_length]) token_type_ids = None if self.use_token_type_ids: diff --git a/tests/deberta/test_modeling_tf_deberta.py b/tests/deberta/test_modeling_tf_deberta.py index 
581f6f02f470..7e2a3c3110ee 100644 --- a/tests/deberta/test_modeling_tf_deberta.py +++ b/tests/deberta/test_modeling_tf_deberta.py @@ -20,7 +20,7 @@ from transformers.testing_utils import require_tf, slow from ..test_configuration_common import ConfigTester -from ..test_modeling_tf_common import TFModelTesterMixin, ids_tensor +from ..test_modeling_tf_common import TFModelTesterMixin, ids_tensor, random_attention_mask if is_tf_available(): @@ -92,7 +92,7 @@ def prepare_config_and_inputs(self): input_mask = None if self.use_input_mask: - input_mask = ids_tensor([self.batch_size, self.seq_length], vocab_size=2) + input_mask = random_attention_mask([self.batch_size, self.seq_length]) token_type_ids = None if self.use_token_type_ids: diff --git a/tests/deberta_v2/test_modeling_tf_deberta_v2.py b/tests/deberta_v2/test_modeling_tf_deberta_v2.py index 391afee59784..4fd967c2fa6e 100644 --- a/tests/deberta_v2/test_modeling_tf_deberta_v2.py +++ b/tests/deberta_v2/test_modeling_tf_deberta_v2.py @@ -20,7 +20,7 @@ from transformers.testing_utils import require_tf, slow from ..test_configuration_common import ConfigTester -from ..test_modeling_tf_common import TFModelTesterMixin, ids_tensor +from ..test_modeling_tf_common import TFModelTesterMixin, ids_tensor, random_attention_mask if is_tf_available(): @@ -95,7 +95,7 @@ def prepare_config_and_inputs(self): input_mask = None if self.use_input_mask: - input_mask = ids_tensor([self.batch_size, self.seq_length], vocab_size=2) + input_mask = random_attention_mask([self.batch_size, self.seq_length]) token_type_ids = None if self.use_token_type_ids: diff --git a/tests/distilbert/test_modeling_tf_distilbert.py b/tests/distilbert/test_modeling_tf_distilbert.py index 7a146e9c3bf8..5266723f1f86 100644 --- a/tests/distilbert/test_modeling_tf_distilbert.py +++ b/tests/distilbert/test_modeling_tf_distilbert.py @@ -20,7 +20,7 @@ from transformers.testing_utils import require_tf, slow from ..test_configuration_common import ConfigTester -from ..test_modeling_tf_common import TFModelTesterMixin, ids_tensor +from ..test_modeling_tf_common import TFModelTesterMixin, ids_tensor, random_attention_mask if is_tf_available(): @@ -70,7 +70,7 @@ def prepare_config_and_inputs(self): input_mask = None if self.use_input_mask: - input_mask = ids_tensor([self.batch_size, self.seq_length], vocab_size=2) + input_mask = random_attention_mask([self.batch_size, self.seq_length]) sequence_labels = None token_labels = None diff --git a/tests/dpr/test_modeling_tf_dpr.py b/tests/dpr/test_modeling_tf_dpr.py index 7a48a2254e10..ffce36efc3a6 100644 --- a/tests/dpr/test_modeling_tf_dpr.py +++ b/tests/dpr/test_modeling_tf_dpr.py @@ -19,7 +19,7 @@ from transformers.testing_utils import require_tf, slow from ..test_configuration_common import ConfigTester -from ..test_modeling_tf_common import TFModelTesterMixin, ids_tensor +from ..test_modeling_tf_common import TFModelTesterMixin, ids_tensor, random_attention_mask if is_tf_available(): @@ -94,9 +94,8 @@ def prepare_config_and_inputs(self): input_mask = None if self.use_input_mask: - input_mask = ids_tensor( - [self.batch_size, self.seq_length], vocab_size=2 - ) # follow test_modeling_tf_ctrl.py + # follow test_modeling_tf_ctrl.py + input_mask = random_attention_mask([self.batch_size, self.seq_length]) token_type_ids = None if self.use_token_type_ids: diff --git a/tests/electra/test_modeling_tf_electra.py b/tests/electra/test_modeling_tf_electra.py index 4593ecff6100..ff2acd37e69f 100644 --- a/tests/electra/test_modeling_tf_electra.py +++ 
b/tests/electra/test_modeling_tf_electra.py @@ -20,7 +20,7 @@ from transformers.testing_utils import require_tf, slow from ..test_configuration_common import ConfigTester -from ..test_modeling_tf_common import TFModelTesterMixin, floats_tensor, ids_tensor +from ..test_modeling_tf_common import TFModelTesterMixin, floats_tensor, ids_tensor, random_attention_mask if is_tf_available(): @@ -71,7 +71,7 @@ def prepare_config_and_inputs(self): input_mask = None if self.use_input_mask: - input_mask = ids_tensor([self.batch_size, self.seq_length], vocab_size=2) + input_mask = random_attention_mask([self.batch_size, self.seq_length]) token_type_ids = None if self.use_token_type_ids: diff --git a/tests/flaubert/test_modeling_tf_flaubert.py b/tests/flaubert/test_modeling_tf_flaubert.py index 62503bac2861..86bcd6ea6484 100644 --- a/tests/flaubert/test_modeling_tf_flaubert.py +++ b/tests/flaubert/test_modeling_tf_flaubert.py @@ -19,7 +19,7 @@ from transformers.testing_utils import require_sentencepiece, require_tf, require_tokenizers, slow from ..test_configuration_common import ConfigTester -from ..test_modeling_tf_common import TFModelTesterMixin, ids_tensor +from ..test_modeling_tf_common import TFModelTesterMixin, ids_tensor, random_attention_mask if is_tf_available(): @@ -75,7 +75,7 @@ def __init__( def prepare_config_and_inputs(self): input_ids = ids_tensor([self.batch_size, self.seq_length], self.vocab_size) - input_mask = ids_tensor([self.batch_size, self.seq_length], 2, dtype=tf.float32) + input_mask = random_attention_mask([self.batch_size, self.seq_length], dtype=tf.float32) input_lengths = None if self.use_input_lengths: diff --git a/tests/funnel/test_modeling_tf_funnel.py b/tests/funnel/test_modeling_tf_funnel.py index 6105f9ab8035..c3ae3788d61e 100644 --- a/tests/funnel/test_modeling_tf_funnel.py +++ b/tests/funnel/test_modeling_tf_funnel.py @@ -20,7 +20,7 @@ from transformers.testing_utils import require_tf from ..test_configuration_common import ConfigTester -from ..test_modeling_tf_common import TFModelTesterMixin, ids_tensor +from ..test_modeling_tf_common import TFModelTesterMixin, ids_tensor, random_attention_mask if is_tf_available(): @@ -111,7 +111,7 @@ def prepare_config_and_inputs(self): input_mask = None if self.use_input_mask: - input_mask = ids_tensor([self.batch_size, self.seq_length], vocab_size=2) + input_mask = random_attention_mask([self.batch_size, self.seq_length]) token_type_ids = None if self.use_token_type_ids: diff --git a/tests/gpt2/test_modeling_tf_gpt2.py b/tests/gpt2/test_modeling_tf_gpt2.py index f94387509e6a..d6470c0d1526 100644 --- a/tests/gpt2/test_modeling_tf_gpt2.py +++ b/tests/gpt2/test_modeling_tf_gpt2.py @@ -19,7 +19,7 @@ from transformers.testing_utils import require_tf, slow from ..test_configuration_common import ConfigTester -from ..test_modeling_tf_common import TFModelTesterMixin, floats_tensor, ids_tensor +from ..test_modeling_tf_common import TFModelTesterMixin, floats_tensor, ids_tensor, random_attention_mask from ..utils.test_modeling_tf_core import TFCoreModelTesterMixin @@ -74,7 +74,7 @@ def prepare_config_and_inputs(self): input_mask = None if self.use_input_mask: - input_mask = ids_tensor([self.batch_size, self.seq_length], vocab_size=2) + input_mask = random_attention_mask([self.batch_size, self.seq_length]) token_type_ids = None if self.use_token_type_ids: diff --git a/tests/gptj/test_modeling_tf_gptj.py b/tests/gptj/test_modeling_tf_gptj.py index 32ce3f8564b0..63feffb8c62e 100644 --- a/tests/gptj/test_modeling_tf_gptj.py +++ 
b/tests/gptj/test_modeling_tf_gptj.py @@ -20,7 +20,7 @@ from transformers.testing_utils import require_tf, slow, tooslow from ..test_configuration_common import ConfigTester -from ..test_modeling_tf_common import TFModelTesterMixin, ids_tensor +from ..test_modeling_tf_common import TFModelTesterMixin, ids_tensor, random_attention_mask from ..utils.test_modeling_tf_core import TFCoreModelTesterMixin @@ -70,7 +70,7 @@ def prepare_config_and_inputs(self): input_mask = None if self.use_input_mask: - input_mask = ids_tensor([self.batch_size, self.seq_length], vocab_size=2) + input_mask = random_attention_mask([self.batch_size, self.seq_length]) token_type_ids = None if self.use_token_type_ids: diff --git a/tests/layoutlm/test_modeling_tf_layoutlm.py b/tests/layoutlm/test_modeling_tf_layoutlm.py index f60d0c6f91d5..90e2b4fcf169 100644 --- a/tests/layoutlm/test_modeling_tf_layoutlm.py +++ b/tests/layoutlm/test_modeling_tf_layoutlm.py @@ -21,7 +21,7 @@ from transformers.testing_utils import require_tf, slow from ..test_configuration_common import ConfigTester -from ..test_modeling_tf_common import TFModelTesterMixin, ids_tensor +from ..test_modeling_tf_common import TFModelTesterMixin, ids_tensor, random_attention_mask if is_tf_available(): @@ -107,7 +107,7 @@ def prepare_config_and_inputs(self): input_mask = None if self.use_input_mask: - input_mask = ids_tensor([self.batch_size, self.seq_length], vocab_size=2) + input_mask = random_attention_mask([self.batch_size, self.seq_length]) token_type_ids = None if self.use_token_type_ids: diff --git a/tests/longformer/test_modeling_tf_longformer.py b/tests/longformer/test_modeling_tf_longformer.py index 37c1ce534953..6bfa708912dd 100644 --- a/tests/longformer/test_modeling_tf_longformer.py +++ b/tests/longformer/test_modeling_tf_longformer.py @@ -20,7 +20,7 @@ from transformers.testing_utils import require_sentencepiece, require_tf, require_tokenizers, slow from ..test_configuration_common import ConfigTester -from ..test_modeling_tf_common import TFModelTesterMixin, ids_tensor +from ..test_modeling_tf_common import TFModelTesterMixin, ids_tensor, random_attention_mask if is_tf_available(): @@ -79,7 +79,7 @@ def prepare_config_and_inputs(self): input_mask = None if self.use_input_mask: - input_mask = ids_tensor([self.batch_size, self.seq_length], vocab_size=2) + input_mask = random_attention_mask([self.batch_size, self.seq_length]) token_type_ids = None if self.use_token_type_ids: diff --git a/tests/lxmert/test_modeling_tf_lxmert.py b/tests/lxmert/test_modeling_tf_lxmert.py index 8d91d249d90b..63ec44a1ad90 100644 --- a/tests/lxmert/test_modeling_tf_lxmert.py +++ b/tests/lxmert/test_modeling_tf_lxmert.py @@ -23,7 +23,7 @@ from transformers.testing_utils import require_tf, slow from ..test_configuration_common import ConfigTester -from ..test_modeling_tf_common import TFModelTesterMixin, ids_tensor +from ..test_modeling_tf_common import TFModelTesterMixin, ids_tensor, random_attention_mask if is_tf_available(): @@ -124,7 +124,7 @@ def prepare_config_and_inputs(self): input_mask = None if self.use_lang_mask: - input_mask = ids_tensor([self.batch_size, self.seq_length], vocab_size=2) + input_mask = random_attention_mask([self.batch_size, self.seq_length]) token_type_ids = None if self.use_token_type_ids: token_type_ids = ids_tensor([self.batch_size, self.seq_length], self.type_vocab_size) diff --git a/tests/mobilebert/test_modeling_tf_mobilebert.py b/tests/mobilebert/test_modeling_tf_mobilebert.py index 4cbfcefee874..c0ddf043562f 100644 --- 
a/tests/mobilebert/test_modeling_tf_mobilebert.py +++ b/tests/mobilebert/test_modeling_tf_mobilebert.py @@ -20,7 +20,7 @@ from transformers.testing_utils import require_tf, slow from ..test_configuration_common import ConfigTester -from ..test_modeling_tf_common import TFModelTesterMixin, ids_tensor +from ..test_modeling_tf_common import TFModelTesterMixin, ids_tensor, random_attention_mask if is_tf_available(): @@ -114,7 +114,7 @@ def prepare_config_and_inputs(self): input_mask = None if self.use_input_mask: - input_mask = ids_tensor([self.batch_size, self.seq_length], vocab_size=2) + input_mask = random_attention_mask([self.batch_size, self.seq_length]) token_type_ids = None if self.use_token_type_ids: diff --git a/tests/mpnet/test_modeling_tf_mpnet.py b/tests/mpnet/test_modeling_tf_mpnet.py index 23448610cc21..f9f9e2d51201 100644 --- a/tests/mpnet/test_modeling_tf_mpnet.py +++ b/tests/mpnet/test_modeling_tf_mpnet.py @@ -20,7 +20,7 @@ from transformers.testing_utils import require_tf, slow from ..test_configuration_common import ConfigTester -from ..test_modeling_tf_common import TFModelTesterMixin, ids_tensor +from ..test_modeling_tf_common import TFModelTesterMixin, ids_tensor, random_attention_mask if is_tf_available(): @@ -90,7 +90,7 @@ def prepare_config_and_inputs(self): input_mask = None if self.use_input_mask: - input_mask = ids_tensor([self.batch_size, self.seq_length], vocab_size=2) + input_mask = random_attention_mask([self.batch_size, self.seq_length]) sequence_labels = None token_labels = None diff --git a/tests/openai/test_modeling_tf_openai.py b/tests/openai/test_modeling_tf_openai.py index 227689df59aa..f74a85ee60d6 100644 --- a/tests/openai/test_modeling_tf_openai.py +++ b/tests/openai/test_modeling_tf_openai.py @@ -20,7 +20,7 @@ from transformers.testing_utils import require_tf, slow from ..test_configuration_common import ConfigTester -from ..test_modeling_tf_common import TFModelTesterMixin, ids_tensor +from ..test_modeling_tf_common import TFModelTesterMixin, ids_tensor, random_attention_mask if is_tf_available(): @@ -70,7 +70,7 @@ def prepare_config_and_inputs(self): input_mask = None if self.use_input_mask: - input_mask = ids_tensor([self.batch_size, self.seq_length], vocab_size=2) + input_mask = random_attention_mask([self.batch_size, self.seq_length]) token_type_ids = None if self.use_token_type_ids: diff --git a/tests/rembert/test_modeling_tf_rembert.py b/tests/rembert/test_modeling_tf_rembert.py index f8f17f30a9dd..d5d52062e8c9 100644 --- a/tests/rembert/test_modeling_tf_rembert.py +++ b/tests/rembert/test_modeling_tf_rembert.py @@ -20,7 +20,7 @@ from transformers.testing_utils import require_tf, slow from ..test_configuration_common import ConfigTester -from ..test_modeling_tf_common import TFModelTesterMixin, floats_tensor, ids_tensor +from ..test_modeling_tf_common import TFModelTesterMixin, floats_tensor, ids_tensor, random_attention_mask if is_tf_available(): @@ -95,7 +95,7 @@ def prepare_config_and_inputs(self): input_mask = None if self.use_input_mask: - input_mask = ids_tensor([self.batch_size, self.seq_length], vocab_size=2) + input_mask = random_attention_mask([self.batch_size, self.seq_length]) token_type_ids = None if self.use_token_type_ids: diff --git a/tests/roberta/test_modeling_tf_roberta.py b/tests/roberta/test_modeling_tf_roberta.py index fa947d64f081..9771673d8748 100644 --- a/tests/roberta/test_modeling_tf_roberta.py +++ b/tests/roberta/test_modeling_tf_roberta.py @@ -20,7 +20,7 @@ from transformers.testing_utils import 
require_sentencepiece, require_tf, require_tokenizers, slow from ..test_configuration_common import ConfigTester -from ..test_modeling_tf_common import TFModelTesterMixin, floats_tensor, ids_tensor +from ..test_modeling_tf_common import TFModelTesterMixin, floats_tensor, ids_tensor, random_attention_mask if is_tf_available(): @@ -72,7 +72,7 @@ def prepare_config_and_inputs(self): input_mask = None if self.use_input_mask: - input_mask = ids_tensor([self.batch_size, self.seq_length], vocab_size=2) + input_mask = random_attention_mask([self.batch_size, self.seq_length]) token_type_ids = None if self.use_token_type_ids: diff --git a/tests/roformer/test_modeling_tf_roformer.py b/tests/roformer/test_modeling_tf_roformer.py index 1f26f7e2adc6..9a23ca3b83d2 100644 --- a/tests/roformer/test_modeling_tf_roformer.py +++ b/tests/roformer/test_modeling_tf_roformer.py @@ -20,7 +20,7 @@ from transformers.testing_utils import require_tf, slow from ..test_configuration_common import ConfigTester -from ..test_modeling_tf_common import TFModelTesterMixin, ids_tensor +from ..test_modeling_tf_common import TFModelTesterMixin, ids_tensor, random_attention_mask if is_tf_available(): @@ -95,7 +95,7 @@ def prepare_config_and_inputs(self): input_mask = None if self.use_input_mask: - input_mask = ids_tensor([self.batch_size, self.seq_length], vocab_size=2) + input_mask = random_attention_mask([self.batch_size, self.seq_length]) token_type_ids = None if self.use_token_type_ids: diff --git a/tests/t5/test_modeling_tf_t5.py b/tests/t5/test_modeling_tf_t5.py index a2ea255faca5..c6585f83b18e 100644 --- a/tests/t5/test_modeling_tf_t5.py +++ b/tests/t5/test_modeling_tf_t5.py @@ -20,7 +20,7 @@ from transformers.utils import cached_property from ..test_configuration_common import ConfigTester -from ..test_modeling_tf_common import TFModelTesterMixin, ids_tensor +from ..test_modeling_tf_common import TFModelTesterMixin, ids_tensor, random_attention_mask if is_tf_available(): @@ -58,7 +58,7 @@ def prepare_config_and_inputs(self): input_mask = None if self.use_input_mask: - input_mask = ids_tensor([self.batch_size, self.seq_length], vocab_size=2) + input_mask = random_attention_mask([self.batch_size, self.seq_length]) token_labels = None if self.use_labels: diff --git a/tests/tapas/test_modeling_tf_tapas.py b/tests/tapas/test_modeling_tf_tapas.py index 936273a6ca30..9e3cb63f70b5 100644 --- a/tests/tapas/test_modeling_tf_tapas.py +++ b/tests/tapas/test_modeling_tf_tapas.py @@ -38,7 +38,7 @@ from transformers.utils import cached_property from ..test_configuration_common import ConfigTester -from ..test_modeling_tf_common import TFModelTesterMixin, ids_tensor +from ..test_modeling_tf_common import TFModelTesterMixin, ids_tensor, random_attention_mask if is_tf_available(): @@ -158,7 +158,7 @@ def prepare_config_and_inputs(self): input_mask = None if self.use_input_mask: - input_mask = ids_tensor([self.batch_size, self.seq_length], vocab_size=2) + input_mask = random_attention_mask([self.batch_size, self.seq_length]) token_type_ids = [] for type_vocab_size in self.type_vocab_sizes: diff --git a/tests/test_modeling_tf_common.py b/tests/test_modeling_tf_common.py index 3d2f7976cf6c..9473a50f53aa 100644 --- a/tests/test_modeling_tf_common.py +++ b/tests/test_modeling_tf_common.py @@ -1440,7 +1440,7 @@ def ids_tensor(shape, vocab_size, rng=None, name=None, dtype=None): def random_attention_mask(shape, rng=None, name=None, dtype=None): attn_mask = ids_tensor(shape, vocab_size=2, rng=None, name=None, dtype=dtype) # make sure that at least 
one token is attended to for each batch - attn_mask = tf.concat([tf.constant(value=1, shape=(shape[0], 1), dtype=dtype), attn_mask[:, 1:]], axis=1) + attn_mask = tf.concat([attn_mask[:, :-1], tf.ones_like(attn_mask[:, -1:], dtype=dtype)], axis=-1) return attn_mask diff --git a/tests/xlm/test_modeling_tf_xlm.py b/tests/xlm/test_modeling_tf_xlm.py index 5fc4d2413f9e..412a8430ad6d 100644 --- a/tests/xlm/test_modeling_tf_xlm.py +++ b/tests/xlm/test_modeling_tf_xlm.py @@ -20,7 +20,7 @@ from transformers.testing_utils import require_tf, slow from ..test_configuration_common import ConfigTester -from ..test_modeling_tf_common import TFModelTesterMixin, ids_tensor +from ..test_modeling_tf_common import TFModelTesterMixin, ids_tensor, random_attention_mask if is_tf_available(): @@ -75,7 +75,7 @@ def __init__( def prepare_config_and_inputs(self): input_ids = ids_tensor([self.batch_size, self.seq_length], self.vocab_size) - input_mask = ids_tensor([self.batch_size, self.seq_length], 2, dtype=tf.float32) + input_mask = random_attention_mask([self.batch_size, self.seq_length], dtype=tf.float32) input_lengths = None if self.use_input_lengths: diff --git a/tests/xlnet/test_modeling_tf_xlnet.py b/tests/xlnet/test_modeling_tf_xlnet.py index 4b92581a0efc..8cf4ca2099bd 100644 --- a/tests/xlnet/test_modeling_tf_xlnet.py +++ b/tests/xlnet/test_modeling_tf_xlnet.py @@ -22,7 +22,7 @@ from transformers.testing_utils import require_tf, slow from ..test_configuration_common import ConfigTester -from ..test_modeling_tf_common import TFModelTesterMixin, ids_tensor +from ..test_modeling_tf_common import TFModelTesterMixin, ids_tensor, random_attention_mask if is_tf_available(): @@ -75,7 +75,7 @@ def prepare_config_and_inputs(self): input_ids_1 = ids_tensor([self.batch_size, self.seq_length], self.vocab_size) input_ids_2 = ids_tensor([self.batch_size, self.seq_length], self.vocab_size) segment_ids = ids_tensor([self.batch_size, self.seq_length], self.type_vocab_size) - input_mask = ids_tensor([self.batch_size, self.seq_length], 2, dtype=tf.float32) + input_mask = random_attention_mask([self.batch_size, self.seq_length], dtype=tf.float32) input_ids_q = ids_tensor([self.batch_size, self.seq_length + 1], self.vocab_size) perm_mask = tf.zeros((self.batch_size, self.seq_length + 1, self.seq_length), dtype=tf.float32) From ecd9f353c92ac62395db9b3570c3f191b3242e40 Mon Sep 17 00:00:00 2001 From: NielsRogge <48327001+NielsRogge@users.noreply.github.com> Date: Fri, 1 Apr 2022 17:19:36 +0200 Subject: [PATCH 12/34] Improve code example (#16450) Co-authored-by: Niels Rogge --- src/transformers/models/glpn/modeling_glpn.py | 28 +++++++++++++++---- 1 file changed, 23 insertions(+), 5 deletions(-) diff --git a/src/transformers/models/glpn/modeling_glpn.py b/src/transformers/models/glpn/modeling_glpn.py index c8d6bac79b36..86e53c787572 100755 --- a/src/transformers/models/glpn/modeling_glpn.py +++ b/src/transformers/models/glpn/modeling_glpn.py @@ -708,18 +708,36 @@ def forward( ```python >>> from transformers import GLPNFeatureExtractor, GLPNForDepthEstimation + >>> import torch + >>> import numpy as np >>> from PIL import Image >>> import requests - >>> feature_extractor = GLPNFeatureExtractor.from_pretrained("vinvino02/glpn-kitti") - >>> model = GLPNForDepthEstimation.from_pretrained("vinvino02/glpn-kitti") - >>> url = "http://images.cocodataset.org/val2017/000000039769.jpg" >>> image = Image.open(requests.get(url, stream=True).raw) + >>> feature_extractor = GLPNFeatureExtractor.from_pretrained("vinvino02/glpn-kitti") + >>> model = 
GLPNForDepthEstimation.from_pretrained("vinvino02/glpn-kitti") + + >>> # prepare image for the model >>> inputs = feature_extractor(images=image, return_tensors="pt") - >>> outputs = model(**inputs) - >>> predicted_depth = outputs.predicted_depth # shape (batch_size, height, width) + + >>> with torch.no_grad(): + ... outputs = model(**inputs) + ... predicted_depth = outputs.predicted_depth + + >>> # interpolate to original size + >>> prediction = torch.nn.functional.interpolate( + ... predicted_depth.unsqueeze(1), + ... size=image.size[::-1], + ... mode="bicubic", + ... align_corners=False, + ... ) + + >>> # visualize the prediction + >>> output = prediction.squeeze().cpu().numpy() + >>> formatted = (output * 255 / np.max(output)).astype("uint8") + >>> depth = Image.fromarray(formatted) ```""" return_dict = return_dict if return_dict is not None else self.config.use_return_dict output_hidden_states = ( From f05d235c226d0b183377c23913f2565b9322af43 Mon Sep 17 00:00:00 2001 From: Lysandre Debut Date: Fri, 1 Apr 2022 17:53:18 +0200 Subject: [PATCH 13/34] Pin tokenizers version <0.13 (#16539) * Pin tokenizers version <0.13 * Style --- setup.py | 2 +- src/transformers/dependency_versions_table.py | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/setup.py b/setup.py index c9455eaa901d..56ba7d4c4784 100644 --- a/setup.py +++ b/setup.py @@ -151,7 +151,7 @@ "tf2onnx", "timeout-decorator", "timm", - "tokenizers>=0.11.1,!=0.11.3", + "tokenizers>=0.11.1,!=0.11.3,<0.13", "torch>=1.0", "torchaudio", "pyctcdecode>=0.3.0", diff --git a/src/transformers/dependency_versions_table.py b/src/transformers/dependency_versions_table.py index 2ba72f5b9593..334103c20a56 100644 --- a/src/transformers/dependency_versions_table.py +++ b/src/transformers/dependency_versions_table.py @@ -61,7 +61,7 @@ "tf2onnx": "tf2onnx", "timeout-decorator": "timeout-decorator", "timm": "timm", - "tokenizers": "tokenizers>=0.11.1,!=0.11.3", + "tokenizers": "tokenizers>=0.11.1,!=0.11.3,<0.13", "torch": "torch>=1.0", "torchaudio": "torchaudio", "pyctcdecode": "pyctcdecode>=0.3.0", From 085f0f7dea866862e99915981a807e4fc9bd1bc0 Mon Sep 17 00:00:00 2001 From: Yih-Dar <2521628+ydshieh@users.noreply.github.com> Date: Fri, 1 Apr 2022 17:54:01 +0200 Subject: [PATCH 14/34] Add code samples for TF speech models (#16494) Co-authored-by: ydshieh --- src/transformers/utils/doc.py | 63 +++++++++++++++++++++++++++++++++++ 1 file changed, 63 insertions(+) diff --git a/src/transformers/utils/doc.py b/src/transformers/utils/doc.py index f395f8d4fb80..eaf59ba50215 100644 --- a/src/transformers/utils/doc.py +++ b/src/transformers/utils/doc.py @@ -794,6 +794,67 @@ def _prepare_output_docstrings(output_type, config_class, min_indent=None): ``` """ +TF_SPEECH_BASE_MODEL_SAMPLE = r""" + Example: + + ```python + >>> from transformers import {processor_class}, {model_class} + >>> from datasets import load_dataset + + >>> dataset = load_dataset("hf-internal-testing/librispeech_asr_demo", "clean", split="validation") + >>> dataset = dataset.sort("id") + >>> sampling_rate = dataset.features["audio"].sampling_rate + + >>> processor = {processor_class}.from_pretrained("{checkpoint}") + >>> model = {model_class}.from_pretrained("{checkpoint}") + + >>> # audio file is decoded on the fly + >>> inputs = processor(dataset[0]["audio"]["array"], sampling_rate=sampling_rate, return_tensors="tf") + >>> outputs = model(**inputs) + + >>> last_hidden_states = outputs.last_hidden_state + >>> list(last_hidden_states.shape) + {expected_output} + ``` +""" + 
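These sample strings are plain format templates: `{processor_class}`, `{model_class}`, `{checkpoint}`, `{expected_output}` and `{expected_loss}` are substituted per model when the docstring is assembled (in this module that job falls to the `add_code_sample_docstrings` decorator). A rough sketch of the substitution step, using assumed class and checkpoint names purely for illustration:

```python
# Toy version of the template fill; the real plumbing lives in the decorator.
DEMO_SAMPLE = """
>>> processor = {processor_class}.from_pretrained("{checkpoint}")
>>> model = {model_class}.from_pretrained("{checkpoint}")
"""

filled = DEMO_SAMPLE.format(
    processor_class="Wav2Vec2Processor",       # assumed names for the sketch
    model_class="TFWav2Vec2ForCTC",
    checkpoint="facebook/wav2vec2-base-960h",  # example checkpoint, not prescriptive
)
print(filled)
```

Keeping `{expected_output}` and `{expected_loss}` in the template lets each model pin its own doctest values instead of hardcoding them in a shared sample.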
+TF_SPEECH_CTC_SAMPLE = r""" + Example: + + ```python + >>> from transformers import {processor_class}, {model_class} + >>> from datasets import load_dataset + >>> import tensorflow as tf + + >>> dataset = load_dataset("hf-internal-testing/librispeech_asr_demo", "clean", split="validation") + >>> dataset = dataset.sort("id") + >>> sampling_rate = dataset.features["audio"].sampling_rate + + >>> processor = {processor_class}.from_pretrained("{checkpoint}") + >>> model = {model_class}.from_pretrained("{checkpoint}") + + >>> # audio file is decoded on the fly + >>> inputs = processor(dataset[0]["audio"]["array"], sampling_rate=sampling_rate, return_tensors="tf") + >>> logits = model(**inputs).logits + >>> predicted_ids = tf.math.argmax(logits, axis=-1) + + >>> # transcribe speech + >>> transcription = processor.batch_decode(predicted_ids) + >>> transcription[0] + {expected_output} + ``` + + ```python + >>> with processor.as_target_processor(): + ... inputs["labels"] = processor(dataset[0]["text"], return_tensors="tf").input_ids + + >>> # compute loss + >>> loss = model(**inputs).loss + >>> round(float(loss), 2) + {expected_loss} + ``` +""" + TF_VISION_BASE_MODEL_SAMPLE = r""" Example: @@ -848,6 +909,8 @@ def _prepare_output_docstrings(output_type, config_class, min_indent=None): "MaskedLM": TF_MASKED_LM_SAMPLE, "LMHead": TF_CAUSAL_LM_SAMPLE, "BaseModel": TF_BASE_MODEL_SAMPLE, + "SpeechBaseModel": TF_SPEECH_BASE_MODEL_SAMPLE, + "CTC": TF_SPEECH_CTC_SAMPLE, "VisionBaseModel": TF_VISION_BASE_MODEL_SAMPLE, "ImageClassification": TF_VISION_SEQ_CLASS_SAMPLE, } From cbc776aa3d77b66c573bf4330b7afed50017e308 Mon Sep 17 00:00:00 2001 From: Patrick von Platen Date: Mon, 4 Apr 2022 13:53:54 +0200 Subject: [PATCH 15/34] [FlaxSpeechEncoderDecoder] Fix dtype bug (#16581) * [FlaxSpeechEncoderDecoder] Fix dtype bug * more fixes --- .../modeling_flax_speech_encoder_decoder.py | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/src/transformers/models/speech_encoder_decoder/modeling_flax_speech_encoder_decoder.py b/src/transformers/models/speech_encoder_decoder/modeling_flax_speech_encoder_decoder.py index aff3953b8407..6e36703cf9d7 100644 --- a/src/transformers/models/speech_encoder_decoder/modeling_flax_speech_encoder_decoder.py +++ b/src/transformers/models/speech_encoder_decoder/modeling_flax_speech_encoder_decoder.py @@ -310,7 +310,7 @@ def __call__( decoder_hidden_states=decoder_outputs.hidden_states, decoder_attentions=decoder_outputs.attentions, cross_attentions=decoder_outputs.cross_attentions, - encoder_last_hidden_state=encoder_outputs.last_hidden_state, + encoder_last_hidden_state=encoder_hidden_states, encoder_hidden_states=encoder_outputs.hidden_states, encoder_attentions=encoder_outputs.attentions, ) @@ -363,8 +363,8 @@ def init_weights(self, rng: jax.random.PRNGKey, input_shape: Tuple) -> FrozenDic encoder_input_shape, decoder_input_shape = input_shape # init input DeviceArrays - inputs = jnp.zeros(encoder_input_shape, dtype="i4") - attention_mask = jnp.ones_like(inputs) + inputs = jnp.zeros(encoder_input_shape, dtype="f4") + attention_mask = jnp.ones_like(inputs, dtype="i4") decoder_input_ids = jnp.zeros(decoder_input_shape, dtype="i4") decoder_attention_mask = jnp.ones_like(decoder_input_ids) @@ -472,7 +472,7 @@ def encode( return_dict = return_dict if return_dict is not None else self.config.return_dict if attention_mask is None: - attention_mask = jnp.ones_like(inputs) + attention_mask = jnp.ones_like(inputs, dtype="i4") # Handle any PRNG if needed rngs = {} @@ 
-485,7 +485,7 @@ def _encoder_forward(module, inputs, attention_mask, **kwargs): outputs = self.module.apply( {"params": params or self.params}, - inputs=jnp.array(inputs, dtype="i4"), + inputs=jnp.array(inputs, dtype="f4"), attention_mask=jnp.array(attention_mask, dtype="i4"), output_attentions=output_attentions, output_hidden_states=output_hidden_states, @@ -680,7 +680,7 @@ def __call__( # prepare encoder inputs if attention_mask is None: - attention_mask = jnp.ones_like(inputs) + attention_mask = jnp.ones_like(inputs, dtype="i4") # prepare decoder inputs if decoder_input_ids is None: @@ -700,7 +700,7 @@ def __call__( return self.module.apply( {"params": params or self.params}, - inputs=jnp.array(inputs, dtype="i4"), + inputs=jnp.array(inputs, dtype="f4"), attention_mask=jnp.array(attention_mask, dtype="i4"), decoder_input_ids=jnp.array(decoder_input_ids, dtype="i4"), decoder_attention_mask=jnp.array(decoder_attention_mask, dtype="i4"), From b615e7c754cc201188d55a5e1abb74f49ea0f5d7 Mon Sep 17 00:00:00 2001 From: Nicolas Patry Date: Mon, 4 Apr 2022 14:26:23 +0200 Subject: [PATCH 16/34] Making the impossible to connect error actually report the right URL. (#16446) --- src/transformers/configuration_utils.py | 3 ++- src/transformers/feature_extraction_utils.py | 3 ++- src/transformers/modeling_flax_utils.py | 3 ++- src/transformers/modeling_tf_utils.py | 3 ++- src/transformers/modeling_utils.py | 5 +++-- 5 files changed, 11 insertions(+), 6 deletions(-) diff --git a/src/transformers/configuration_utils.py b/src/transformers/configuration_utils.py index f572cd9fd5a8..f7318bf8ab84 100755 --- a/src/transformers/configuration_utils.py +++ b/src/transformers/configuration_utils.py @@ -31,6 +31,7 @@ from .dynamic_module_utils import custom_object_save from .utils import ( CONFIG_NAME, + HUGGINGFACE_CO_RESOLVE_ENDPOINT, EntryNotFoundError, PushToHubMixin, RepositoryNotFoundError, @@ -626,7 +627,7 @@ def _get_config_dict( ) except ValueError: raise EnvironmentError( - "We couldn't connect to 'https://huggingface.co/' to load this model, couldn't find it in the cached " + f"We couldn't connect to '{HUGGINGFACE_CO_RESOLVE_ENDPOINT}' to load this model, couldn't find it in the cached " f"files and it looks like {pretrained_model_name_or_path} is not the path to a directory containing a " "{configuration_file} file.\nCheckout your internet connection or see how to run the library in " "offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'." 
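The pattern repeated across the files below is the same one applied above to `configuration_utils.py`: the error message interpolates `HUGGINGFACE_CO_RESOLVE_ENDPOINT` instead of a hardcoded `https://huggingface.co/`, so anyone pointed at a mirror or private hub sees the endpoint they actually failed to reach. A sketch of the idea; the exact definition of the constant (environment override via `HF_ENDPOINT`) is an assumption here, not part of this diff:

```python
import os

# Assumed definition for the sketch: transformers lets the Hub endpoint be
# redirected, e.g. for private mirrors or the staging hub.
HUGGINGFACE_CO_RESOLVE_ENDPOINT = os.environ.get("HF_ENDPOINT", "https://huggingface.co")


def raise_connection_error(model_name: str):
    # Report the endpoint we actually tried instead of a hardcoded URL.
    raise EnvironmentError(
        f"We couldn't connect to '{HUGGINGFACE_CO_RESOLVE_ENDPOINT}' to load this model, "
        f"couldn't find it in the cached files and it looks like {model_name} is not a "
        "local directory containing the expected files."
    )
```

With `HF_ENDPOINT=https://my-mirror.example.com`, the message now names the mirror, which is the URL the user actually needs to debug.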
diff --git a/src/transformers/feature_extraction_utils.py b/src/transformers/feature_extraction_utils.py index 953ef41ba7db..bb719b98f6e7 100644 --- a/src/transformers/feature_extraction_utils.py +++ b/src/transformers/feature_extraction_utils.py @@ -29,6 +29,7 @@ from .dynamic_module_utils import custom_object_save from .utils import ( FEATURE_EXTRACTOR_NAME, + HUGGINGFACE_CO_RESOLVE_ENDPOINT, EntryNotFoundError, PushToHubMixin, RepositoryNotFoundError, @@ -433,7 +434,7 @@ def get_feature_extractor_dict( ) except ValueError: raise EnvironmentError( - "We couldn't connect to 'https://huggingface.co/' to load this model, couldn't find it in the cached " + f"We couldn't connect to '{HUGGINGFACE_CO_RESOLVE_ENDPOINT}' to load this model, couldn't find it in the cached " f"files and it looks like {pretrained_model_name_or_path} is not the path to a directory containing a " f"{FEATURE_EXTRACTOR_NAME} file.\nCheckout your internet connection or see how to run the library in " "offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'." diff --git a/src/transformers/modeling_flax_utils.py b/src/transformers/modeling_flax_utils.py index dd9a7dc29fd7..3ff9ae387582 100644 --- a/src/transformers/modeling_flax_utils.py +++ b/src/transformers/modeling_flax_utils.py @@ -34,6 +34,7 @@ from .modeling_flax_pytorch_utils import load_pytorch_checkpoint_in_flax_state_dict from .utils import ( FLAX_WEIGHTS_NAME, + HUGGINGFACE_CO_RESOLVE_ENDPOINT, WEIGHTS_NAME, EntryNotFoundError, PushToHubMixin, @@ -530,7 +531,7 @@ def from_pretrained( ) except ValueError: raise EnvironmentError( - "We couldn't connect to 'https://huggingface.co/' to load this model, couldn't find it in the cached " + f"We couldn't connect to '{HUGGINGFACE_CO_RESOLVE_ENDPOINT}' to load this model, couldn't find it in the cached " f"files and it looks like {pretrained_model_name_or_path} is not the path to a directory " f"containing a file named {FLAX_WEIGHTS_NAME} or {WEIGHTS_NAME}.\n" "Checkout your internet connection or see how to run the library in offline mode at " diff --git a/src/transformers/modeling_tf_utils.py b/src/transformers/modeling_tf_utils.py index d46226a5a1d1..a28a09425087 100644 --- a/src/transformers/modeling_tf_utils.py +++ b/src/transformers/modeling_tf_utils.py @@ -43,6 +43,7 @@ from .tokenization_utils_base import BatchEncoding from .utils import ( DUMMY_INPUTS, + HUGGINGFACE_CO_RESOLVE_ENDPOINT, TF2_WEIGHTS_NAME, WEIGHTS_NAME, EntryNotFoundError, @@ -1685,7 +1686,7 @@ def from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs): ) except ValueError: raise EnvironmentError( - "We couldn't connect to 'https://huggingface.co/' to load this model, couldn't find it in the cached " + f"We couldn't connect to '{HUGGINGFACE_CO_RESOLVE_ENDPOINT}' to load this model, couldn't find it in the cached " f"files and it looks like {pretrained_model_name_or_path} is not the path to a directory " f"containing a file named {TF2_WEIGHTS_NAME} or {WEIGHTS_NAME}.\n" "Checkout your internet connection or see how to run the library in offline mode at " diff --git a/src/transformers/modeling_utils.py b/src/transformers/modeling_utils.py index 21b8f2269110..33401c3c093f 100644 --- a/src/transformers/modeling_utils.py +++ b/src/transformers/modeling_utils.py @@ -40,6 +40,7 @@ from .utils import ( DUMMY_INPUTS, FLAX_WEIGHTS_NAME, + HUGGINGFACE_CO_RESOLVE_ENDPOINT, TF2_WEIGHTS_NAME, TF_WEIGHTS_NAME, WEIGHTS_INDEX_NAME, @@ -331,7 +332,7 @@ def get_checkpoint_shard_files( ) except HTTPError: raise 
EnvironmentError( - f"We couldn't connect to 'https://huggingface.co/' to load {shard_filename}. You should try again " + f"We couldn't connect to '{HUGGINGFACE_CO_RESOLVE_ENDPOINT}' to load {shard_filename}. You should try again " "after checking your internet connection." ) @@ -1749,7 +1750,7 @@ def from_pretrained(cls, pretrained_model_name_or_path: Optional[Union[str, os.P ) except ValueError: raise EnvironmentError( - "We couldn't connect to 'https://huggingface.co/' to load this model, couldn't find it in the cached " + f"We couldn't connect to '{HUGGINGFACE_CO_RESOLVE_ENDPOINT}' to load this model, couldn't find it in the cached " f"files and it looks like {pretrained_model_name_or_path} is not the path to a directory " f"containing a file named {WEIGHTS_NAME}, {TF2_WEIGHTS_NAME}, {TF_WEIGHTS_NAME} or " f"{FLAX_WEIGHTS_NAME}.\n" From 0e1dc49aa3d9829ead6d71ff04d12168929f9e68 Mon Sep 17 00:00:00 2001 From: Daniel Stancl <46073029+stancld@users.noreply.github.com> Date: Mon, 4 Apr 2022 14:54:25 +0200 Subject: [PATCH 17/34] Fix flax import in __init__.py: modeling_xglm -> modeling_flax_xglm (#16556) --- src/transformers/models/xglm/__init__.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/transformers/models/xglm/__init__.py b/src/transformers/models/xglm/__init__.py index ddc79c678769..d5934dea6666 100644 --- a/src/transformers/models/xglm/__init__.py +++ b/src/transformers/models/xglm/__init__.py @@ -67,7 +67,7 @@ from .modeling_xglm import XGLM_PRETRAINED_MODEL_ARCHIVE_LIST, XGLMForCausalLM, XGLMModel, XGLMPreTrainedModel if is_flax_available(): - from .modeling_xglm import FlaxXGLMForCausalLM, FlaxXGLMModel, FlaxXGLMPreTrainedModel + from .modeling_flax_xglm import FlaxXGLMForCausalLM, FlaxXGLMModel, FlaxXGLMPreTrainedModel else: From 90ea20434c62b72d0e56749ff3db8b109b805caf Mon Sep 17 00:00:00 2001 From: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Date: Mon, 4 Apr 2022 10:06:57 -0400 Subject: [PATCH 18/34] Add utility to find model labels (#16526) * Add utility to find model labels * Use it in the Trainer * Update src/transformers/utils/generic.py Co-authored-by: Matt * Quality Co-authored-by: Matt --- src/transformers/trainer.py | 8 ++---- src/transformers/utils/__init__.py | 1 + src/transformers/utils/generic.py | 21 ++++++++++++++++ tests/utils/test_file_utils.py | 39 +++++++++++++++++++++++++++--- 4 files changed, 59 insertions(+), 10 deletions(-) diff --git a/src/transformers/trainer.py b/src/transformers/trainer.py index 948697e35127..921b9d27ac08 100755 --- a/src/transformers/trainer.py +++ b/src/transformers/trainer.py @@ -67,7 +67,6 @@ from .dependency_versions_check import dep_version_check from .modelcard import TrainingSummary from .modeling_utils import PreTrainedModel, unwrap_model -from .models.auto.modeling_auto import MODEL_FOR_QUESTION_ANSWERING_MAPPING_NAMES from .optimization import Adafactor, get_scheduler from .tokenization_utils_base import PreTrainedTokenizerBase from .trainer_callback import ( @@ -124,6 +123,7 @@ from .utils import ( CONFIG_NAME, WEIGHTS_NAME, + find_labels, get_full_repo_name, is_apex_available, is_datasets_available, @@ -495,11 +495,7 @@ def __init__( self.current_flos = 0 self.hp_search_backend = None self.use_tune_checkpoints = False - default_label_names = ( - ["start_positions", "end_positions"] - if type(self.model).__name__ in MODEL_FOR_QUESTION_ANSWERING_MAPPING_NAMES.values() - else ["labels"] - ) + default_label_names = find_labels(self.model.__class__) self.label_names = default_label_names 
if self.args.label_names is None else self.args.label_names self.control = self.callback_handler.on_init_end(self.args, self.state, self.control) diff --git a/src/transformers/utils/__init__.py b/src/transformers/utils/__init__.py index af326b53e86c..45364fb8fd33 100644 --- a/src/transformers/utils/__init__.py +++ b/src/transformers/utils/__init__.py @@ -37,6 +37,7 @@ PaddingStrategy, TensorType, cached_property, + find_labels, is_tensor, to_numpy, to_py_obj, diff --git a/src/transformers/utils/generic.py b/src/transformers/utils/generic.py index e455cdc6adb0..bea5b3dd4775 100644 --- a/src/transformers/utils/generic.py +++ b/src/transformers/utils/generic.py @@ -15,6 +15,7 @@ Generic utilities """ +import inspect from collections import OrderedDict, UserDict from contextlib import ExitStack from dataclasses import fields @@ -289,3 +290,23 @@ def __enter__(self): def __exit__(self, *args, **kwargs): self.stack.__exit__(*args, **kwargs) + + +def find_labels(model_class): + """ + Find the labels used by a given model. + + Args: + model_class (`type`): The class of the model. + """ + model_name = model_class.__name__ + if model_name.startswith("TF"): + signature = inspect.signature(model_class.call) + elif model_name.startswith("Flax"): + signature = inspect.signature(model_class.__call__) + else: + signature = inspect.signature(model_class.forward) + if "QuestionAnswering" in model_name: + return [p for p in signature.parameters if "label" in p or p in ("start_positions", "end_positions")] + else: + return [p for p in signature.parameters if "label" in p] diff --git a/tests/utils/test_file_utils.py b/tests/utils/test_file_utils.py index decc7fd17c01..75c4f19caa1d 100644 --- a/tests/utils/test_file_utils.py +++ b/tests/utils/test_file_utils.py @@ -35,10 +35,14 @@ RepositoryNotFoundError, RevisionNotFoundError, filename_to_url, + find_labels, get_file_from_repo, get_from_cache, has_file, hf_bucket_url, + is_flax_available, + is_tf_available, + is_torch_available, ) @@ -158,24 +162,51 @@ def test_get_file_from_repo_local(self): self.assertIsNone(get_file_from_repo(tmp_dir, "b.txt")) -class ContextManagerTests(unittest.TestCase): +class GenericUtilTests(unittest.TestCase): @unittest.mock.patch("sys.stdout", new_callable=io.StringIO) - def test_no_context(self, mock_stdout): + def test_context_managers_no_context(self, mock_stdout): with ContextManagers([]): print("Transformers are awesome!") # The print statement adds a new line at the end of the output self.assertEqual(mock_stdout.getvalue(), "Transformers are awesome!\n") @unittest.mock.patch("sys.stdout", new_callable=io.StringIO) - def test_one_context(self, mock_stdout): + def test_context_managers_one_context(self, mock_stdout): with ContextManagers([context_en()]): print("Transformers are awesome!") # The output should be wrapped with an English welcome and goodbye self.assertEqual(mock_stdout.getvalue(), "Welcome!\nTransformers are awesome!\nBye!\n") @unittest.mock.patch("sys.stdout", new_callable=io.StringIO) - def test_two_context(self, mock_stdout): + def test_context_managers_two_context(self, mock_stdout): with ContextManagers([context_fr(), context_en()]): print("Transformers are awesome!") # The output should be wrapped with an English and French welcome and goodbye self.assertEqual(mock_stdout.getvalue(), "Bonjour!\nWelcome!\nTransformers are awesome!\nBye!\nAu revoir!\n") + + def test_find_labels(self): + if is_torch_available(): + from transformers import BertForPreTraining, BertForQuestionAnswering, 
BertForSequenceClassification + + self.assertEqual(find_labels(BertForSequenceClassification), ["labels"]) + self.assertEqual(find_labels(BertForPreTraining), ["labels", "next_sentence_label"]) + self.assertEqual(find_labels(BertForQuestionAnswering), ["start_positions", "end_positions"]) + + if is_tf_available(): + from transformers import TFBertForPreTraining, TFBertForQuestionAnswering, TFBertForSequenceClassification + + self.assertEqual(find_labels(TFBertForSequenceClassification), ["labels"]) + self.assertEqual(find_labels(TFBertForPreTraining), ["labels", "next_sentence_label"]) + self.assertEqual(find_labels(TFBertForQuestionAnswering), ["start_positions", "end_positions"]) + + if is_flax_available(): + # Flax models don't have labels + from transformers import ( + FlaxBertForPreTraining, + FlaxBertForQuestionAnswering, + FlaxBertForSequenceClassification, + ) + + self.assertEqual(find_labels(FlaxBertForSequenceClassification), []) + self.assertEqual(find_labels(FlaxBertForPreTraining), []) + self.assertEqual(find_labels(FlaxBertForQuestionAnswering), []) From 02d49ea50275b226b50fb8f34d7bd9268c53e870 Mon Sep 17 00:00:00 2001 From: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Date: Mon, 4 Apr 2022 10:25:46 -0400 Subject: [PATCH 19/34] Enable doc in Spanish (#16518) * Reorganize doc for multilingual support * Fix style * Style * Toc trees * Adapt templates --- .github/workflows/build_documentation.yml | 1 + .github/workflows/build_pr_documentation.yml | 1 + docs/source/contributing.md | 1 - docs/source/en/_config.py | 14 +++++++ docs/source/{ => en}/_toctree.yml | 0 docs/source/{ => en}/accelerate.mdx | 0 docs/source/{ => en}/add_new_model.mdx | 0 docs/source/{ => en}/add_new_pipeline.mdx | 0 docs/source/{ => en}/autoclass_tutorial.mdx | 0 docs/source/{ => en}/benchmarks.mdx | 0 docs/source/{ => en}/bertology.mdx | 0 docs/source/{ => en}/community.mdx | 0 docs/source/en/contributing.md | 1 + .../{ => en}/converting_tensorflow_models.mdx | 0 docs/source/{ => en}/create_a_model.mdx | 0 docs/source/{ => en}/custom_models.mdx | 0 docs/source/{ => en}/debugging.mdx | 0 docs/source/{ => en}/fast_tokenizers.mdx | 0 docs/source/{ => en}/glossary.mdx | 0 docs/source/{ => en}/index.mdx | 0 docs/source/{ => en}/installation.mdx | 0 docs/source/{ => en}/internal/file_utils.mdx | 0 .../{ => en}/internal/generation_utils.mdx | 0 .../{ => en}/internal/modeling_utils.mdx | 0 .../{ => en}/internal/pipelines_utils.mdx | 0 .../{ => en}/internal/tokenization_utils.mdx | 0 .../{ => en}/internal/trainer_utils.mdx | 0 .../source/{ => en}/main_classes/callback.mdx | 0 .../{ => en}/main_classes/configuration.mdx | 0 .../{ => en}/main_classes/data_collator.mdx | 0 .../{ => en}/main_classes/deepspeed.mdx | 0 .../main_classes/feature_extractor.mdx | 0 .../{ => en}/main_classes/keras_callbacks.mdx | 0 docs/source/{ => en}/main_classes/logging.mdx | 0 docs/source/{ => en}/main_classes/model.mdx | 0 docs/source/{ => en}/main_classes/onnx.mdx | 0 .../main_classes/optimizer_schedules.mdx | 0 docs/source/{ => en}/main_classes/output.mdx | 0 .../{ => en}/main_classes/pipelines.mdx | 0 .../{ => en}/main_classes/processors.mdx | 0 .../{ => en}/main_classes/text_generation.mdx | 0 .../{ => en}/main_classes/tokenizer.mdx | 0 docs/source/{ => en}/main_classes/trainer.mdx | 0 docs/source/{ => en}/migration.mdx | 0 docs/source/{ => en}/model_doc/albert.mdx | 0 docs/source/{ => en}/model_doc/auto.mdx | 0 docs/source/{ => en}/model_doc/bart.mdx | 0 docs/source/{ => en}/model_doc/barthez.mdx | 0 docs/source/{ => 
en}/model_doc/bartpho.mdx | 0 docs/source/{ => en}/model_doc/beit.mdx | 0 .../{ => en}/model_doc/bert-generation.mdx | 0 .../{ => en}/model_doc/bert-japanese.mdx | 0 docs/source/{ => en}/model_doc/bert.mdx | 0 docs/source/{ => en}/model_doc/bertweet.mdx | 0 docs/source/{ => en}/model_doc/big_bird.mdx | 0 .../{ => en}/model_doc/bigbird_pegasus.mdx | 0 .../{ => en}/model_doc/blenderbot-small.mdx | 0 docs/source/{ => en}/model_doc/blenderbot.mdx | 0 docs/source/{ => en}/model_doc/bort.mdx | 0 docs/source/{ => en}/model_doc/byt5.mdx | 0 docs/source/{ => en}/model_doc/camembert.mdx | 0 docs/source/{ => en}/model_doc/canine.mdx | 0 docs/source/{ => en}/model_doc/clip.mdx | 0 docs/source/{ => en}/model_doc/convbert.mdx | 0 docs/source/{ => en}/model_doc/convnext.mdx | 0 docs/source/{ => en}/model_doc/cpm.mdx | 0 docs/source/{ => en}/model_doc/ctrl.mdx | 0 docs/source/{ => en}/model_doc/data2vec.mdx | 0 docs/source/{ => en}/model_doc/deberta-v2.mdx | 0 docs/source/{ => en}/model_doc/deberta.mdx | 0 .../model_doc/decision_transformer.mdx | 0 docs/source/{ => en}/model_doc/deit.mdx | 0 docs/source/{ => en}/model_doc/detr.mdx | 0 docs/source/{ => en}/model_doc/dialogpt.mdx | 0 docs/source/{ => en}/model_doc/distilbert.mdx | 0 docs/source/{ => en}/model_doc/dit.mdx | 0 docs/source/{ => en}/model_doc/dpr.mdx | 0 docs/source/{ => en}/model_doc/dpt.mdx | 0 docs/source/{ => en}/model_doc/electra.mdx | 0 .../{ => en}/model_doc/encoder-decoder.mdx | 0 docs/source/{ => en}/model_doc/flaubert.mdx | 0 docs/source/{ => en}/model_doc/fnet.mdx | 0 docs/source/{ => en}/model_doc/fsmt.mdx | 0 docs/source/{ => en}/model_doc/funnel.mdx | 0 docs/source/{ => en}/model_doc/glpn.mdx | 0 docs/source/{ => en}/model_doc/gpt2.mdx | 0 docs/source/{ => en}/model_doc/gpt_neo.mdx | 0 docs/source/{ => en}/model_doc/gptj.mdx | 0 docs/source/{ => en}/model_doc/herbert.mdx | 0 docs/source/{ => en}/model_doc/hubert.mdx | 0 docs/source/{ => en}/model_doc/ibert.mdx | 0 docs/source/{ => en}/model_doc/imagegpt.mdx | 0 docs/source/{ => en}/model_doc/layoutlm.mdx | 0 docs/source/{ => en}/model_doc/layoutlmv2.mdx | 0 docs/source/{ => en}/model_doc/layoutxlm.mdx | 0 docs/source/{ => en}/model_doc/led.mdx | 0 docs/source/{ => en}/model_doc/longformer.mdx | 0 docs/source/{ => en}/model_doc/luke.mdx | 0 docs/source/{ => en}/model_doc/lxmert.mdx | 0 docs/source/{ => en}/model_doc/m2m_100.mdx | 0 docs/source/{ => en}/model_doc/marian.mdx | 0 docs/source/{ => en}/model_doc/maskformer.mdx | 0 docs/source/{ => en}/model_doc/mbart.mdx | 0 .../{ => en}/model_doc/megatron-bert.mdx | 0 .../{ => en}/model_doc/megatron_gpt2.mdx | 0 docs/source/{ => en}/model_doc/mluke.mdx | 0 docs/source/{ => en}/model_doc/mobilebert.mdx | 0 docs/source/{ => en}/model_doc/mpnet.mdx | 0 docs/source/{ => en}/model_doc/mt5.mdx | 0 .../{ => en}/model_doc/nystromformer.mdx | 0 docs/source/{ => en}/model_doc/openai-gpt.mdx | 0 docs/source/{ => en}/model_doc/pegasus.mdx | 0 docs/source/{ => en}/model_doc/perceiver.mdx | 0 docs/source/{ => en}/model_doc/phobert.mdx | 0 docs/source/{ => en}/model_doc/plbart.mdx | 0 docs/source/{ => en}/model_doc/poolformer.mdx | 0 docs/source/{ => en}/model_doc/prophetnet.mdx | 0 docs/source/{ => en}/model_doc/qdqbert.mdx | 0 docs/source/{ => en}/model_doc/rag.mdx | 0 docs/source/{ => en}/model_doc/realm.mdx | 0 docs/source/{ => en}/model_doc/reformer.mdx | 0 docs/source/{ => en}/model_doc/rembert.mdx | 0 docs/source/{ => en}/model_doc/resnet.mdx | 0 docs/source/{ => en}/model_doc/retribert.mdx | 0 docs/source/{ => en}/model_doc/roberta.mdx | 0 
docs/source/{ => en}/model_doc/roformer.mdx | 0 docs/source/{ => en}/model_doc/segformer.mdx | 0 docs/source/{ => en}/model_doc/sew-d.mdx | 0 docs/source/{ => en}/model_doc/sew.mdx | 0 .../model_doc/speech-encoder-decoder.mdx | 0 .../{ => en}/model_doc/speech_to_text.mdx | 0 .../{ => en}/model_doc/speech_to_text_2.mdx | 0 docs/source/{ => en}/model_doc/splinter.mdx | 0 .../source/{ => en}/model_doc/squeezebert.mdx | 0 docs/source/{ => en}/model_doc/swin.mdx | 0 docs/source/{ => en}/model_doc/t5.mdx | 0 docs/source/{ => en}/model_doc/t5v1.1.mdx | 0 docs/source/{ => en}/model_doc/tapas.mdx | 0 docs/source/{ => en}/model_doc/transfo-xl.mdx | 0 docs/source/{ => en}/model_doc/trocr.mdx | 0 .../{ => en}/model_doc/unispeech-sat.mdx | 0 docs/source/{ => en}/model_doc/unispeech.mdx | 0 docs/source/{ => en}/model_doc/van.mdx | 0 docs/source/{ => en}/model_doc/vilt.mdx | 0 .../model_doc/vision-encoder-decoder.mdx | 0 .../model_doc/vision-text-dual-encoder.mdx | 0 .../source/{ => en}/model_doc/visual_bert.mdx | 0 docs/source/{ => en}/model_doc/vit.mdx | 0 docs/source/{ => en}/model_doc/vit_mae.mdx | 0 docs/source/{ => en}/model_doc/wav2vec2.mdx | 0 .../{ => en}/model_doc/wav2vec2_phoneme.mdx | 0 docs/source/{ => en}/model_doc/wavlm.mdx | 0 docs/source/{ => en}/model_doc/xglm.mdx | 0 .../{ => en}/model_doc/xlm-prophetnet.mdx | 0 .../{ => en}/model_doc/xlm-roberta-xl.mdx | 0 .../source/{ => en}/model_doc/xlm-roberta.mdx | 0 docs/source/{ => en}/model_doc/xlm.mdx | 0 docs/source/{ => en}/model_doc/xlnet.mdx | 0 docs/source/{ => en}/model_doc/xls_r.mdx | 0 .../{ => en}/model_doc/xlsr_wav2vec2.mdx | 0 docs/source/{ => en}/model_doc/yoso.mdx | 0 docs/source/{ => en}/model_sharing.mdx | 0 docs/source/{ => en}/model_summary.mdx | 0 docs/source/{ => en}/multilingual.mdx | 0 docs/source/en/notebooks.md | 1 + docs/source/{ => en}/pad_truncation.mdx | 0 docs/source/{ => en}/parallelism.mdx | 0 docs/source/{ => en}/performance.mdx | 0 docs/source/{ => en}/perplexity.mdx | 0 docs/source/{ => en}/philosophy.mdx | 0 docs/source/{ => en}/pipeline_tutorial.mdx | 0 docs/source/{ => en}/pr_checks.mdx | 0 docs/source/{ => en}/preprocessing.mdx | 0 docs/source/{ => en}/quicktour.mdx | 0 docs/source/{ => en}/run_scripts.mdx | 0 docs/source/{ => en}/sagemaker.mdx | 0 docs/source/{ => en}/serialization.mdx | 0 docs/source/{ => en}/task_summary.mdx | 0 docs/source/{ => en}/tasks/asr.mdx | 0 .../{ => en}/tasks/audio_classification.mdx | 0 .../{ => en}/tasks/image_classification.mdx | 0 .../{ => en}/tasks/language_modeling.mdx | 0 .../source/{ => en}/tasks/multiple_choice.mdx | 0 .../{ => en}/tasks/question_answering.mdx | 0 .../tasks/sequence_classification.mdx | 0 docs/source/{ => en}/tasks/summarization.mdx | 0 .../{ => en}/tasks/token_classification.mdx | 0 docs/source/{ => en}/tasks/translation.mdx | 0 docs/source/{ => en}/testing.mdx | 0 docs/source/{ => en}/tokenizer_summary.mdx | 0 docs/source/{ => en}/training.mdx | 0 docs/source/{ => en}/troubleshooting.mdx | 0 docs/source/es/_config.py | 14 +++++++ docs/source/es/_toctree.yml | 17 +++++++++ docs/{source_es => source/es}/accelerate.mdx | 0 .../{source_es => source/es}/installation.mdx | 38 ++++++++----------- .../{source_es => source/es}/multilingual.mdx | 0 .../es}/pipeline_tutorial.mdx | 0 docs/{source_es => source/es}/quicktour.mdx | 0 docs/{source_es => source/es}/training.mdx | 0 docs/source/notebooks.md | 1 - src/transformers/commands/add_new_model.py | 2 +- .../commands/add_new_model_like.py | 4 +- utils/check_copies.py | 2 +- utils/check_repo.py | 2 +- 
utils/check_table.py | 2 +- 206 files changed, 71 insertions(+), 30 deletions(-) delete mode 120000 docs/source/contributing.md create mode 100644 docs/source/en/_config.py rename docs/source/{ => en}/_toctree.yml (100%) rename docs/source/{ => en}/accelerate.mdx (100%) rename docs/source/{ => en}/add_new_model.mdx (100%) rename docs/source/{ => en}/add_new_pipeline.mdx (100%) rename docs/source/{ => en}/autoclass_tutorial.mdx (100%) rename docs/source/{ => en}/benchmarks.mdx (100%) rename docs/source/{ => en}/bertology.mdx (100%) rename docs/source/{ => en}/community.mdx (100%) create mode 120000 docs/source/en/contributing.md rename docs/source/{ => en}/converting_tensorflow_models.mdx (100%) rename docs/source/{ => en}/create_a_model.mdx (100%) rename docs/source/{ => en}/custom_models.mdx (100%) rename docs/source/{ => en}/debugging.mdx (100%) rename docs/source/{ => en}/fast_tokenizers.mdx (100%) rename docs/source/{ => en}/glossary.mdx (100%) rename docs/source/{ => en}/index.mdx (100%) rename docs/source/{ => en}/installation.mdx (100%) rename docs/source/{ => en}/internal/file_utils.mdx (100%) rename docs/source/{ => en}/internal/generation_utils.mdx (100%) rename docs/source/{ => en}/internal/modeling_utils.mdx (100%) rename docs/source/{ => en}/internal/pipelines_utils.mdx (100%) rename docs/source/{ => en}/internal/tokenization_utils.mdx (100%) rename docs/source/{ => en}/internal/trainer_utils.mdx (100%) rename docs/source/{ => en}/main_classes/callback.mdx (100%) rename docs/source/{ => en}/main_classes/configuration.mdx (100%) rename docs/source/{ => en}/main_classes/data_collator.mdx (100%) rename docs/source/{ => en}/main_classes/deepspeed.mdx (100%) rename docs/source/{ => en}/main_classes/feature_extractor.mdx (100%) rename docs/source/{ => en}/main_classes/keras_callbacks.mdx (100%) rename docs/source/{ => en}/main_classes/logging.mdx (100%) rename docs/source/{ => en}/main_classes/model.mdx (100%) rename docs/source/{ => en}/main_classes/onnx.mdx (100%) rename docs/source/{ => en}/main_classes/optimizer_schedules.mdx (100%) rename docs/source/{ => en}/main_classes/output.mdx (100%) rename docs/source/{ => en}/main_classes/pipelines.mdx (100%) rename docs/source/{ => en}/main_classes/processors.mdx (100%) rename docs/source/{ => en}/main_classes/text_generation.mdx (100%) rename docs/source/{ => en}/main_classes/tokenizer.mdx (100%) rename docs/source/{ => en}/main_classes/trainer.mdx (100%) rename docs/source/{ => en}/migration.mdx (100%) rename docs/source/{ => en}/model_doc/albert.mdx (100%) rename docs/source/{ => en}/model_doc/auto.mdx (100%) rename docs/source/{ => en}/model_doc/bart.mdx (100%) rename docs/source/{ => en}/model_doc/barthez.mdx (100%) rename docs/source/{ => en}/model_doc/bartpho.mdx (100%) rename docs/source/{ => en}/model_doc/beit.mdx (100%) rename docs/source/{ => en}/model_doc/bert-generation.mdx (100%) rename docs/source/{ => en}/model_doc/bert-japanese.mdx (100%) rename docs/source/{ => en}/model_doc/bert.mdx (100%) rename docs/source/{ => en}/model_doc/bertweet.mdx (100%) rename docs/source/{ => en}/model_doc/big_bird.mdx (100%) rename docs/source/{ => en}/model_doc/bigbird_pegasus.mdx (100%) rename docs/source/{ => en}/model_doc/blenderbot-small.mdx (100%) rename docs/source/{ => en}/model_doc/blenderbot.mdx (100%) rename docs/source/{ => en}/model_doc/bort.mdx (100%) rename docs/source/{ => en}/model_doc/byt5.mdx (100%) rename docs/source/{ => en}/model_doc/camembert.mdx (100%) rename docs/source/{ => en}/model_doc/canine.mdx (100%) rename 
docs/source/{ => en}/model_doc/clip.mdx (100%) rename docs/source/{ => en}/model_doc/convbert.mdx (100%) rename docs/source/{ => en}/model_doc/convnext.mdx (100%) rename docs/source/{ => en}/model_doc/cpm.mdx (100%) rename docs/source/{ => en}/model_doc/ctrl.mdx (100%) rename docs/source/{ => en}/model_doc/data2vec.mdx (100%) rename docs/source/{ => en}/model_doc/deberta-v2.mdx (100%) rename docs/source/{ => en}/model_doc/deberta.mdx (100%) rename docs/source/{ => en}/model_doc/decision_transformer.mdx (100%) rename docs/source/{ => en}/model_doc/deit.mdx (100%) rename docs/source/{ => en}/model_doc/detr.mdx (100%) rename docs/source/{ => en}/model_doc/dialogpt.mdx (100%) rename docs/source/{ => en}/model_doc/distilbert.mdx (100%) rename docs/source/{ => en}/model_doc/dit.mdx (100%) rename docs/source/{ => en}/model_doc/dpr.mdx (100%) rename docs/source/{ => en}/model_doc/dpt.mdx (100%) rename docs/source/{ => en}/model_doc/electra.mdx (100%) rename docs/source/{ => en}/model_doc/encoder-decoder.mdx (100%) rename docs/source/{ => en}/model_doc/flaubert.mdx (100%) rename docs/source/{ => en}/model_doc/fnet.mdx (100%) rename docs/source/{ => en}/model_doc/fsmt.mdx (100%) rename docs/source/{ => en}/model_doc/funnel.mdx (100%) rename docs/source/{ => en}/model_doc/glpn.mdx (100%) rename docs/source/{ => en}/model_doc/gpt2.mdx (100%) rename docs/source/{ => en}/model_doc/gpt_neo.mdx (100%) rename docs/source/{ => en}/model_doc/gptj.mdx (100%) rename docs/source/{ => en}/model_doc/herbert.mdx (100%) rename docs/source/{ => en}/model_doc/hubert.mdx (100%) rename docs/source/{ => en}/model_doc/ibert.mdx (100%) rename docs/source/{ => en}/model_doc/imagegpt.mdx (100%) rename docs/source/{ => en}/model_doc/layoutlm.mdx (100%) rename docs/source/{ => en}/model_doc/layoutlmv2.mdx (100%) rename docs/source/{ => en}/model_doc/layoutxlm.mdx (100%) rename docs/source/{ => en}/model_doc/led.mdx (100%) rename docs/source/{ => en}/model_doc/longformer.mdx (100%) rename docs/source/{ => en}/model_doc/luke.mdx (100%) rename docs/source/{ => en}/model_doc/lxmert.mdx (100%) rename docs/source/{ => en}/model_doc/m2m_100.mdx (100%) rename docs/source/{ => en}/model_doc/marian.mdx (100%) rename docs/source/{ => en}/model_doc/maskformer.mdx (100%) rename docs/source/{ => en}/model_doc/mbart.mdx (100%) rename docs/source/{ => en}/model_doc/megatron-bert.mdx (100%) rename docs/source/{ => en}/model_doc/megatron_gpt2.mdx (100%) rename docs/source/{ => en}/model_doc/mluke.mdx (100%) rename docs/source/{ => en}/model_doc/mobilebert.mdx (100%) rename docs/source/{ => en}/model_doc/mpnet.mdx (100%) rename docs/source/{ => en}/model_doc/mt5.mdx (100%) rename docs/source/{ => en}/model_doc/nystromformer.mdx (100%) rename docs/source/{ => en}/model_doc/openai-gpt.mdx (100%) rename docs/source/{ => en}/model_doc/pegasus.mdx (100%) rename docs/source/{ => en}/model_doc/perceiver.mdx (100%) rename docs/source/{ => en}/model_doc/phobert.mdx (100%) rename docs/source/{ => en}/model_doc/plbart.mdx (100%) rename docs/source/{ => en}/model_doc/poolformer.mdx (100%) rename docs/source/{ => en}/model_doc/prophetnet.mdx (100%) rename docs/source/{ => en}/model_doc/qdqbert.mdx (100%) rename docs/source/{ => en}/model_doc/rag.mdx (100%) rename docs/source/{ => en}/model_doc/realm.mdx (100%) rename docs/source/{ => en}/model_doc/reformer.mdx (100%) rename docs/source/{ => en}/model_doc/rembert.mdx (100%) rename docs/source/{ => en}/model_doc/resnet.mdx (100%) rename docs/source/{ => en}/model_doc/retribert.mdx (100%) rename docs/source/{ 
=> en}/model_doc/roberta.mdx (100%) rename docs/source/{ => en}/model_doc/roformer.mdx (100%) rename docs/source/{ => en}/model_doc/segformer.mdx (100%) rename docs/source/{ => en}/model_doc/sew-d.mdx (100%) rename docs/source/{ => en}/model_doc/sew.mdx (100%) rename docs/source/{ => en}/model_doc/speech-encoder-decoder.mdx (100%) rename docs/source/{ => en}/model_doc/speech_to_text.mdx (100%) rename docs/source/{ => en}/model_doc/speech_to_text_2.mdx (100%) rename docs/source/{ => en}/model_doc/splinter.mdx (100%) rename docs/source/{ => en}/model_doc/squeezebert.mdx (100%) rename docs/source/{ => en}/model_doc/swin.mdx (100%) rename docs/source/{ => en}/model_doc/t5.mdx (100%) rename docs/source/{ => en}/model_doc/t5v1.1.mdx (100%) rename docs/source/{ => en}/model_doc/tapas.mdx (100%) rename docs/source/{ => en}/model_doc/transfo-xl.mdx (100%) rename docs/source/{ => en}/model_doc/trocr.mdx (100%) rename docs/source/{ => en}/model_doc/unispeech-sat.mdx (100%) rename docs/source/{ => en}/model_doc/unispeech.mdx (100%) rename docs/source/{ => en}/model_doc/van.mdx (100%) rename docs/source/{ => en}/model_doc/vilt.mdx (100%) rename docs/source/{ => en}/model_doc/vision-encoder-decoder.mdx (100%) rename docs/source/{ => en}/model_doc/vision-text-dual-encoder.mdx (100%) rename docs/source/{ => en}/model_doc/visual_bert.mdx (100%) rename docs/source/{ => en}/model_doc/vit.mdx (100%) rename docs/source/{ => en}/model_doc/vit_mae.mdx (100%) rename docs/source/{ => en}/model_doc/wav2vec2.mdx (100%) rename docs/source/{ => en}/model_doc/wav2vec2_phoneme.mdx (100%) rename docs/source/{ => en}/model_doc/wavlm.mdx (100%) rename docs/source/{ => en}/model_doc/xglm.mdx (100%) rename docs/source/{ => en}/model_doc/xlm-prophetnet.mdx (100%) rename docs/source/{ => en}/model_doc/xlm-roberta-xl.mdx (100%) rename docs/source/{ => en}/model_doc/xlm-roberta.mdx (100%) rename docs/source/{ => en}/model_doc/xlm.mdx (100%) rename docs/source/{ => en}/model_doc/xlnet.mdx (100%) rename docs/source/{ => en}/model_doc/xls_r.mdx (100%) rename docs/source/{ => en}/model_doc/xlsr_wav2vec2.mdx (100%) rename docs/source/{ => en}/model_doc/yoso.mdx (100%) rename docs/source/{ => en}/model_sharing.mdx (100%) rename docs/source/{ => en}/model_summary.mdx (100%) rename docs/source/{ => en}/multilingual.mdx (100%) create mode 120000 docs/source/en/notebooks.md rename docs/source/{ => en}/pad_truncation.mdx (100%) rename docs/source/{ => en}/parallelism.mdx (100%) rename docs/source/{ => en}/performance.mdx (100%) rename docs/source/{ => en}/perplexity.mdx (100%) rename docs/source/{ => en}/philosophy.mdx (100%) rename docs/source/{ => en}/pipeline_tutorial.mdx (100%) rename docs/source/{ => en}/pr_checks.mdx (100%) rename docs/source/{ => en}/preprocessing.mdx (100%) rename docs/source/{ => en}/quicktour.mdx (100%) rename docs/source/{ => en}/run_scripts.mdx (100%) rename docs/source/{ => en}/sagemaker.mdx (100%) rename docs/source/{ => en}/serialization.mdx (100%) rename docs/source/{ => en}/task_summary.mdx (100%) rename docs/source/{ => en}/tasks/asr.mdx (100%) rename docs/source/{ => en}/tasks/audio_classification.mdx (100%) rename docs/source/{ => en}/tasks/image_classification.mdx (100%) rename docs/source/{ => en}/tasks/language_modeling.mdx (100%) rename docs/source/{ => en}/tasks/multiple_choice.mdx (100%) rename docs/source/{ => en}/tasks/question_answering.mdx (100%) rename docs/source/{ => en}/tasks/sequence_classification.mdx (100%) rename docs/source/{ => en}/tasks/summarization.mdx (100%) rename docs/source/{ 
=> en}/tasks/token_classification.mdx (100%) rename docs/source/{ => en}/tasks/translation.mdx (100%) rename docs/source/{ => en}/testing.mdx (100%) rename docs/source/{ => en}/tokenizer_summary.mdx (100%) rename docs/source/{ => en}/training.mdx (100%) rename docs/source/{ => en}/troubleshooting.mdx (100%) create mode 100644 docs/source/es/_config.py create mode 100644 docs/source/es/_toctree.yml rename docs/{source_es => source/es}/accelerate.mdx (100%) rename docs/{source_es => source/es}/installation.mdx (92%) rename docs/{source_es => source/es}/multilingual.mdx (100%) rename docs/{source_es => source/es}/pipeline_tutorial.mdx (100%) rename docs/{source_es => source/es}/quicktour.mdx (100%) rename docs/{source_es => source/es}/training.mdx (100%) delete mode 120000 docs/source/notebooks.md diff --git a/.github/workflows/build_documentation.yml b/.github/workflows/build_documentation.yml index 4d02ef020cf5..f69edb4e897f 100644 --- a/.github/workflows/build_documentation.yml +++ b/.github/workflows/build_documentation.yml @@ -15,5 +15,6 @@ jobs: commit_sha: ${{ github.sha }} package: transformers notebook_folder: transformers_doc + languages: en es secrets: token: ${{ secrets.HUGGINGFACE_PUSH }} diff --git a/.github/workflows/build_pr_documentation.yml b/.github/workflows/build_pr_documentation.yml index 2225b9cb7083..95bce32bbac0 100644 --- a/.github/workflows/build_pr_documentation.yml +++ b/.github/workflows/build_pr_documentation.yml @@ -14,3 +14,4 @@ jobs: commit_sha: ${{ github.event.pull_request.head.sha }} pr_number: ${{ github.event.number }} package: transformers + languages: en es diff --git a/docs/source/contributing.md b/docs/source/contributing.md deleted file mode 120000 index f939e75f21a8..000000000000 --- a/docs/source/contributing.md +++ /dev/null @@ -1 +0,0 @@ -../../CONTRIBUTING.md \ No newline at end of file diff --git a/docs/source/en/_config.py b/docs/source/en/_config.py new file mode 100644 index 000000000000..cd76263e9a5c --- /dev/null +++ b/docs/source/en/_config.py @@ -0,0 +1,14 @@ +# docstyle-ignore +INSTALL_CONTENT = """ +# Transformers installation +! pip install transformers datasets +# To install from source instead of the last release, comment the command above and uncomment the following one. +# ! 
pip install git+https://github.com/huggingface/transformers.git +""" + +notebook_first_cells = [{"type": "code", "content": INSTALL_CONTENT}] +black_avoid_patterns = { + "{processor_class}": "FakeProcessorClass", + "{model_class}": "FakeModelClass", + "{object_class}": "FakeObjectClass", +} diff --git a/docs/source/_toctree.yml b/docs/source/en/_toctree.yml similarity index 100% rename from docs/source/_toctree.yml rename to docs/source/en/_toctree.yml diff --git a/docs/source/accelerate.mdx b/docs/source/en/accelerate.mdx similarity index 100% rename from docs/source/accelerate.mdx rename to docs/source/en/accelerate.mdx diff --git a/docs/source/add_new_model.mdx b/docs/source/en/add_new_model.mdx similarity index 100% rename from docs/source/add_new_model.mdx rename to docs/source/en/add_new_model.mdx diff --git a/docs/source/add_new_pipeline.mdx b/docs/source/en/add_new_pipeline.mdx similarity index 100% rename from docs/source/add_new_pipeline.mdx rename to docs/source/en/add_new_pipeline.mdx diff --git a/docs/source/autoclass_tutorial.mdx b/docs/source/en/autoclass_tutorial.mdx similarity index 100% rename from docs/source/autoclass_tutorial.mdx rename to docs/source/en/autoclass_tutorial.mdx diff --git a/docs/source/benchmarks.mdx b/docs/source/en/benchmarks.mdx similarity index 100% rename from docs/source/benchmarks.mdx rename to docs/source/en/benchmarks.mdx diff --git a/docs/source/bertology.mdx b/docs/source/en/bertology.mdx similarity index 100% rename from docs/source/bertology.mdx rename to docs/source/en/bertology.mdx diff --git a/docs/source/community.mdx b/docs/source/en/community.mdx similarity index 100% rename from docs/source/community.mdx rename to docs/source/en/community.mdx diff --git a/docs/source/en/contributing.md b/docs/source/en/contributing.md new file mode 120000 index 000000000000..c97564d93a7f --- /dev/null +++ b/docs/source/en/contributing.md @@ -0,0 +1 @@ +../../../CONTRIBUTING.md \ No newline at end of file diff --git a/docs/source/converting_tensorflow_models.mdx b/docs/source/en/converting_tensorflow_models.mdx similarity index 100% rename from docs/source/converting_tensorflow_models.mdx rename to docs/source/en/converting_tensorflow_models.mdx diff --git a/docs/source/create_a_model.mdx b/docs/source/en/create_a_model.mdx similarity index 100% rename from docs/source/create_a_model.mdx rename to docs/source/en/create_a_model.mdx diff --git a/docs/source/custom_models.mdx b/docs/source/en/custom_models.mdx similarity index 100% rename from docs/source/custom_models.mdx rename to docs/source/en/custom_models.mdx diff --git a/docs/source/debugging.mdx b/docs/source/en/debugging.mdx similarity index 100% rename from docs/source/debugging.mdx rename to docs/source/en/debugging.mdx diff --git a/docs/source/fast_tokenizers.mdx b/docs/source/en/fast_tokenizers.mdx similarity index 100% rename from docs/source/fast_tokenizers.mdx rename to docs/source/en/fast_tokenizers.mdx diff --git a/docs/source/glossary.mdx b/docs/source/en/glossary.mdx similarity index 100% rename from docs/source/glossary.mdx rename to docs/source/en/glossary.mdx diff --git a/docs/source/index.mdx b/docs/source/en/index.mdx similarity index 100% rename from docs/source/index.mdx rename to docs/source/en/index.mdx diff --git a/docs/source/installation.mdx b/docs/source/en/installation.mdx similarity index 100% rename from docs/source/installation.mdx rename to docs/source/en/installation.mdx diff --git a/docs/source/internal/file_utils.mdx b/docs/source/en/internal/file_utils.mdx 
similarity index 100% rename from docs/source/internal/file_utils.mdx rename to docs/source/en/internal/file_utils.mdx diff --git a/docs/source/internal/generation_utils.mdx b/docs/source/en/internal/generation_utils.mdx similarity index 100% rename from docs/source/internal/generation_utils.mdx rename to docs/source/en/internal/generation_utils.mdx diff --git a/docs/source/internal/modeling_utils.mdx b/docs/source/en/internal/modeling_utils.mdx similarity index 100% rename from docs/source/internal/modeling_utils.mdx rename to docs/source/en/internal/modeling_utils.mdx diff --git a/docs/source/internal/pipelines_utils.mdx b/docs/source/en/internal/pipelines_utils.mdx similarity index 100% rename from docs/source/internal/pipelines_utils.mdx rename to docs/source/en/internal/pipelines_utils.mdx diff --git a/docs/source/internal/tokenization_utils.mdx b/docs/source/en/internal/tokenization_utils.mdx similarity index 100% rename from docs/source/internal/tokenization_utils.mdx rename to docs/source/en/internal/tokenization_utils.mdx diff --git a/docs/source/internal/trainer_utils.mdx b/docs/source/en/internal/trainer_utils.mdx similarity index 100% rename from docs/source/internal/trainer_utils.mdx rename to docs/source/en/internal/trainer_utils.mdx diff --git a/docs/source/main_classes/callback.mdx b/docs/source/en/main_classes/callback.mdx similarity index 100% rename from docs/source/main_classes/callback.mdx rename to docs/source/en/main_classes/callback.mdx diff --git a/docs/source/main_classes/configuration.mdx b/docs/source/en/main_classes/configuration.mdx similarity index 100% rename from docs/source/main_classes/configuration.mdx rename to docs/source/en/main_classes/configuration.mdx diff --git a/docs/source/main_classes/data_collator.mdx b/docs/source/en/main_classes/data_collator.mdx similarity index 100% rename from docs/source/main_classes/data_collator.mdx rename to docs/source/en/main_classes/data_collator.mdx diff --git a/docs/source/main_classes/deepspeed.mdx b/docs/source/en/main_classes/deepspeed.mdx similarity index 100% rename from docs/source/main_classes/deepspeed.mdx rename to docs/source/en/main_classes/deepspeed.mdx diff --git a/docs/source/main_classes/feature_extractor.mdx b/docs/source/en/main_classes/feature_extractor.mdx similarity index 100% rename from docs/source/main_classes/feature_extractor.mdx rename to docs/source/en/main_classes/feature_extractor.mdx diff --git a/docs/source/main_classes/keras_callbacks.mdx b/docs/source/en/main_classes/keras_callbacks.mdx similarity index 100% rename from docs/source/main_classes/keras_callbacks.mdx rename to docs/source/en/main_classes/keras_callbacks.mdx diff --git a/docs/source/main_classes/logging.mdx b/docs/source/en/main_classes/logging.mdx similarity index 100% rename from docs/source/main_classes/logging.mdx rename to docs/source/en/main_classes/logging.mdx diff --git a/docs/source/main_classes/model.mdx b/docs/source/en/main_classes/model.mdx similarity index 100% rename from docs/source/main_classes/model.mdx rename to docs/source/en/main_classes/model.mdx diff --git a/docs/source/main_classes/onnx.mdx b/docs/source/en/main_classes/onnx.mdx similarity index 100% rename from docs/source/main_classes/onnx.mdx rename to docs/source/en/main_classes/onnx.mdx diff --git a/docs/source/main_classes/optimizer_schedules.mdx b/docs/source/en/main_classes/optimizer_schedules.mdx similarity index 100% rename from docs/source/main_classes/optimizer_schedules.mdx rename to 
docs/source/en/main_classes/optimizer_schedules.mdx diff --git a/docs/source/main_classes/output.mdx b/docs/source/en/main_classes/output.mdx similarity index 100% rename from docs/source/main_classes/output.mdx rename to docs/source/en/main_classes/output.mdx diff --git a/docs/source/main_classes/pipelines.mdx b/docs/source/en/main_classes/pipelines.mdx similarity index 100% rename from docs/source/main_classes/pipelines.mdx rename to docs/source/en/main_classes/pipelines.mdx diff --git a/docs/source/main_classes/processors.mdx b/docs/source/en/main_classes/processors.mdx similarity index 100% rename from docs/source/main_classes/processors.mdx rename to docs/source/en/main_classes/processors.mdx diff --git a/docs/source/main_classes/text_generation.mdx b/docs/source/en/main_classes/text_generation.mdx similarity index 100% rename from docs/source/main_classes/text_generation.mdx rename to docs/source/en/main_classes/text_generation.mdx diff --git a/docs/source/main_classes/tokenizer.mdx b/docs/source/en/main_classes/tokenizer.mdx similarity index 100% rename from docs/source/main_classes/tokenizer.mdx rename to docs/source/en/main_classes/tokenizer.mdx diff --git a/docs/source/main_classes/trainer.mdx b/docs/source/en/main_classes/trainer.mdx similarity index 100% rename from docs/source/main_classes/trainer.mdx rename to docs/source/en/main_classes/trainer.mdx diff --git a/docs/source/migration.mdx b/docs/source/en/migration.mdx similarity index 100% rename from docs/source/migration.mdx rename to docs/source/en/migration.mdx diff --git a/docs/source/model_doc/albert.mdx b/docs/source/en/model_doc/albert.mdx similarity index 100% rename from docs/source/model_doc/albert.mdx rename to docs/source/en/model_doc/albert.mdx diff --git a/docs/source/model_doc/auto.mdx b/docs/source/en/model_doc/auto.mdx similarity index 100% rename from docs/source/model_doc/auto.mdx rename to docs/source/en/model_doc/auto.mdx diff --git a/docs/source/model_doc/bart.mdx b/docs/source/en/model_doc/bart.mdx similarity index 100% rename from docs/source/model_doc/bart.mdx rename to docs/source/en/model_doc/bart.mdx diff --git a/docs/source/model_doc/barthez.mdx b/docs/source/en/model_doc/barthez.mdx similarity index 100% rename from docs/source/model_doc/barthez.mdx rename to docs/source/en/model_doc/barthez.mdx diff --git a/docs/source/model_doc/bartpho.mdx b/docs/source/en/model_doc/bartpho.mdx similarity index 100% rename from docs/source/model_doc/bartpho.mdx rename to docs/source/en/model_doc/bartpho.mdx diff --git a/docs/source/model_doc/beit.mdx b/docs/source/en/model_doc/beit.mdx similarity index 100% rename from docs/source/model_doc/beit.mdx rename to docs/source/en/model_doc/beit.mdx diff --git a/docs/source/model_doc/bert-generation.mdx b/docs/source/en/model_doc/bert-generation.mdx similarity index 100% rename from docs/source/model_doc/bert-generation.mdx rename to docs/source/en/model_doc/bert-generation.mdx diff --git a/docs/source/model_doc/bert-japanese.mdx b/docs/source/en/model_doc/bert-japanese.mdx similarity index 100% rename from docs/source/model_doc/bert-japanese.mdx rename to docs/source/en/model_doc/bert-japanese.mdx diff --git a/docs/source/model_doc/bert.mdx b/docs/source/en/model_doc/bert.mdx similarity index 100% rename from docs/source/model_doc/bert.mdx rename to docs/source/en/model_doc/bert.mdx diff --git a/docs/source/model_doc/bertweet.mdx b/docs/source/en/model_doc/bertweet.mdx similarity index 100% rename from docs/source/model_doc/bertweet.mdx rename to 
docs/source/en/model_doc/bertweet.mdx diff --git a/docs/source/model_doc/big_bird.mdx b/docs/source/en/model_doc/big_bird.mdx similarity index 100% rename from docs/source/model_doc/big_bird.mdx rename to docs/source/en/model_doc/big_bird.mdx diff --git a/docs/source/model_doc/bigbird_pegasus.mdx b/docs/source/en/model_doc/bigbird_pegasus.mdx similarity index 100% rename from docs/source/model_doc/bigbird_pegasus.mdx rename to docs/source/en/model_doc/bigbird_pegasus.mdx diff --git a/docs/source/model_doc/blenderbot-small.mdx b/docs/source/en/model_doc/blenderbot-small.mdx similarity index 100% rename from docs/source/model_doc/blenderbot-small.mdx rename to docs/source/en/model_doc/blenderbot-small.mdx diff --git a/docs/source/model_doc/blenderbot.mdx b/docs/source/en/model_doc/blenderbot.mdx similarity index 100% rename from docs/source/model_doc/blenderbot.mdx rename to docs/source/en/model_doc/blenderbot.mdx diff --git a/docs/source/model_doc/bort.mdx b/docs/source/en/model_doc/bort.mdx similarity index 100% rename from docs/source/model_doc/bort.mdx rename to docs/source/en/model_doc/bort.mdx diff --git a/docs/source/model_doc/byt5.mdx b/docs/source/en/model_doc/byt5.mdx similarity index 100% rename from docs/source/model_doc/byt5.mdx rename to docs/source/en/model_doc/byt5.mdx diff --git a/docs/source/model_doc/camembert.mdx b/docs/source/en/model_doc/camembert.mdx similarity index 100% rename from docs/source/model_doc/camembert.mdx rename to docs/source/en/model_doc/camembert.mdx diff --git a/docs/source/model_doc/canine.mdx b/docs/source/en/model_doc/canine.mdx similarity index 100% rename from docs/source/model_doc/canine.mdx rename to docs/source/en/model_doc/canine.mdx diff --git a/docs/source/model_doc/clip.mdx b/docs/source/en/model_doc/clip.mdx similarity index 100% rename from docs/source/model_doc/clip.mdx rename to docs/source/en/model_doc/clip.mdx diff --git a/docs/source/model_doc/convbert.mdx b/docs/source/en/model_doc/convbert.mdx similarity index 100% rename from docs/source/model_doc/convbert.mdx rename to docs/source/en/model_doc/convbert.mdx diff --git a/docs/source/model_doc/convnext.mdx b/docs/source/en/model_doc/convnext.mdx similarity index 100% rename from docs/source/model_doc/convnext.mdx rename to docs/source/en/model_doc/convnext.mdx diff --git a/docs/source/model_doc/cpm.mdx b/docs/source/en/model_doc/cpm.mdx similarity index 100% rename from docs/source/model_doc/cpm.mdx rename to docs/source/en/model_doc/cpm.mdx diff --git a/docs/source/model_doc/ctrl.mdx b/docs/source/en/model_doc/ctrl.mdx similarity index 100% rename from docs/source/model_doc/ctrl.mdx rename to docs/source/en/model_doc/ctrl.mdx diff --git a/docs/source/model_doc/data2vec.mdx b/docs/source/en/model_doc/data2vec.mdx similarity index 100% rename from docs/source/model_doc/data2vec.mdx rename to docs/source/en/model_doc/data2vec.mdx diff --git a/docs/source/model_doc/deberta-v2.mdx b/docs/source/en/model_doc/deberta-v2.mdx similarity index 100% rename from docs/source/model_doc/deberta-v2.mdx rename to docs/source/en/model_doc/deberta-v2.mdx diff --git a/docs/source/model_doc/deberta.mdx b/docs/source/en/model_doc/deberta.mdx similarity index 100% rename from docs/source/model_doc/deberta.mdx rename to docs/source/en/model_doc/deberta.mdx diff --git a/docs/source/model_doc/decision_transformer.mdx b/docs/source/en/model_doc/decision_transformer.mdx similarity index 100% rename from docs/source/model_doc/decision_transformer.mdx rename to docs/source/en/model_doc/decision_transformer.mdx 
diff --git a/docs/source/model_doc/deit.mdx b/docs/source/en/model_doc/deit.mdx similarity index 100% rename from docs/source/model_doc/deit.mdx rename to docs/source/en/model_doc/deit.mdx diff --git a/docs/source/model_doc/detr.mdx b/docs/source/en/model_doc/detr.mdx similarity index 100% rename from docs/source/model_doc/detr.mdx rename to docs/source/en/model_doc/detr.mdx diff --git a/docs/source/model_doc/dialogpt.mdx b/docs/source/en/model_doc/dialogpt.mdx similarity index 100% rename from docs/source/model_doc/dialogpt.mdx rename to docs/source/en/model_doc/dialogpt.mdx diff --git a/docs/source/model_doc/distilbert.mdx b/docs/source/en/model_doc/distilbert.mdx similarity index 100% rename from docs/source/model_doc/distilbert.mdx rename to docs/source/en/model_doc/distilbert.mdx diff --git a/docs/source/model_doc/dit.mdx b/docs/source/en/model_doc/dit.mdx similarity index 100% rename from docs/source/model_doc/dit.mdx rename to docs/source/en/model_doc/dit.mdx diff --git a/docs/source/model_doc/dpr.mdx b/docs/source/en/model_doc/dpr.mdx similarity index 100% rename from docs/source/model_doc/dpr.mdx rename to docs/source/en/model_doc/dpr.mdx diff --git a/docs/source/model_doc/dpt.mdx b/docs/source/en/model_doc/dpt.mdx similarity index 100% rename from docs/source/model_doc/dpt.mdx rename to docs/source/en/model_doc/dpt.mdx diff --git a/docs/source/model_doc/electra.mdx b/docs/source/en/model_doc/electra.mdx similarity index 100% rename from docs/source/model_doc/electra.mdx rename to docs/source/en/model_doc/electra.mdx diff --git a/docs/source/model_doc/encoder-decoder.mdx b/docs/source/en/model_doc/encoder-decoder.mdx similarity index 100% rename from docs/source/model_doc/encoder-decoder.mdx rename to docs/source/en/model_doc/encoder-decoder.mdx diff --git a/docs/source/model_doc/flaubert.mdx b/docs/source/en/model_doc/flaubert.mdx similarity index 100% rename from docs/source/model_doc/flaubert.mdx rename to docs/source/en/model_doc/flaubert.mdx diff --git a/docs/source/model_doc/fnet.mdx b/docs/source/en/model_doc/fnet.mdx similarity index 100% rename from docs/source/model_doc/fnet.mdx rename to docs/source/en/model_doc/fnet.mdx diff --git a/docs/source/model_doc/fsmt.mdx b/docs/source/en/model_doc/fsmt.mdx similarity index 100% rename from docs/source/model_doc/fsmt.mdx rename to docs/source/en/model_doc/fsmt.mdx diff --git a/docs/source/model_doc/funnel.mdx b/docs/source/en/model_doc/funnel.mdx similarity index 100% rename from docs/source/model_doc/funnel.mdx rename to docs/source/en/model_doc/funnel.mdx diff --git a/docs/source/model_doc/glpn.mdx b/docs/source/en/model_doc/glpn.mdx similarity index 100% rename from docs/source/model_doc/glpn.mdx rename to docs/source/en/model_doc/glpn.mdx diff --git a/docs/source/model_doc/gpt2.mdx b/docs/source/en/model_doc/gpt2.mdx similarity index 100% rename from docs/source/model_doc/gpt2.mdx rename to docs/source/en/model_doc/gpt2.mdx diff --git a/docs/source/model_doc/gpt_neo.mdx b/docs/source/en/model_doc/gpt_neo.mdx similarity index 100% rename from docs/source/model_doc/gpt_neo.mdx rename to docs/source/en/model_doc/gpt_neo.mdx diff --git a/docs/source/model_doc/gptj.mdx b/docs/source/en/model_doc/gptj.mdx similarity index 100% rename from docs/source/model_doc/gptj.mdx rename to docs/source/en/model_doc/gptj.mdx diff --git a/docs/source/model_doc/herbert.mdx b/docs/source/en/model_doc/herbert.mdx similarity index 100% rename from docs/source/model_doc/herbert.mdx rename to docs/source/en/model_doc/herbert.mdx diff --git 
a/docs/source/model_doc/hubert.mdx b/docs/source/en/model_doc/hubert.mdx similarity index 100% rename from docs/source/model_doc/hubert.mdx rename to docs/source/en/model_doc/hubert.mdx diff --git a/docs/source/model_doc/ibert.mdx b/docs/source/en/model_doc/ibert.mdx similarity index 100% rename from docs/source/model_doc/ibert.mdx rename to docs/source/en/model_doc/ibert.mdx diff --git a/docs/source/model_doc/imagegpt.mdx b/docs/source/en/model_doc/imagegpt.mdx similarity index 100% rename from docs/source/model_doc/imagegpt.mdx rename to docs/source/en/model_doc/imagegpt.mdx diff --git a/docs/source/model_doc/layoutlm.mdx b/docs/source/en/model_doc/layoutlm.mdx similarity index 100% rename from docs/source/model_doc/layoutlm.mdx rename to docs/source/en/model_doc/layoutlm.mdx diff --git a/docs/source/model_doc/layoutlmv2.mdx b/docs/source/en/model_doc/layoutlmv2.mdx similarity index 100% rename from docs/source/model_doc/layoutlmv2.mdx rename to docs/source/en/model_doc/layoutlmv2.mdx diff --git a/docs/source/model_doc/layoutxlm.mdx b/docs/source/en/model_doc/layoutxlm.mdx similarity index 100% rename from docs/source/model_doc/layoutxlm.mdx rename to docs/source/en/model_doc/layoutxlm.mdx diff --git a/docs/source/model_doc/led.mdx b/docs/source/en/model_doc/led.mdx similarity index 100% rename from docs/source/model_doc/led.mdx rename to docs/source/en/model_doc/led.mdx diff --git a/docs/source/model_doc/longformer.mdx b/docs/source/en/model_doc/longformer.mdx similarity index 100% rename from docs/source/model_doc/longformer.mdx rename to docs/source/en/model_doc/longformer.mdx diff --git a/docs/source/model_doc/luke.mdx b/docs/source/en/model_doc/luke.mdx similarity index 100% rename from docs/source/model_doc/luke.mdx rename to docs/source/en/model_doc/luke.mdx diff --git a/docs/source/model_doc/lxmert.mdx b/docs/source/en/model_doc/lxmert.mdx similarity index 100% rename from docs/source/model_doc/lxmert.mdx rename to docs/source/en/model_doc/lxmert.mdx diff --git a/docs/source/model_doc/m2m_100.mdx b/docs/source/en/model_doc/m2m_100.mdx similarity index 100% rename from docs/source/model_doc/m2m_100.mdx rename to docs/source/en/model_doc/m2m_100.mdx diff --git a/docs/source/model_doc/marian.mdx b/docs/source/en/model_doc/marian.mdx similarity index 100% rename from docs/source/model_doc/marian.mdx rename to docs/source/en/model_doc/marian.mdx diff --git a/docs/source/model_doc/maskformer.mdx b/docs/source/en/model_doc/maskformer.mdx similarity index 100% rename from docs/source/model_doc/maskformer.mdx rename to docs/source/en/model_doc/maskformer.mdx diff --git a/docs/source/model_doc/mbart.mdx b/docs/source/en/model_doc/mbart.mdx similarity index 100% rename from docs/source/model_doc/mbart.mdx rename to docs/source/en/model_doc/mbart.mdx diff --git a/docs/source/model_doc/megatron-bert.mdx b/docs/source/en/model_doc/megatron-bert.mdx similarity index 100% rename from docs/source/model_doc/megatron-bert.mdx rename to docs/source/en/model_doc/megatron-bert.mdx diff --git a/docs/source/model_doc/megatron_gpt2.mdx b/docs/source/en/model_doc/megatron_gpt2.mdx similarity index 100% rename from docs/source/model_doc/megatron_gpt2.mdx rename to docs/source/en/model_doc/megatron_gpt2.mdx diff --git a/docs/source/model_doc/mluke.mdx b/docs/source/en/model_doc/mluke.mdx similarity index 100% rename from docs/source/model_doc/mluke.mdx rename to docs/source/en/model_doc/mluke.mdx diff --git a/docs/source/model_doc/mobilebert.mdx b/docs/source/en/model_doc/mobilebert.mdx similarity index 
100% rename from docs/source/model_doc/mobilebert.mdx rename to docs/source/en/model_doc/mobilebert.mdx diff --git a/docs/source/model_doc/mpnet.mdx b/docs/source/en/model_doc/mpnet.mdx similarity index 100% rename from docs/source/model_doc/mpnet.mdx rename to docs/source/en/model_doc/mpnet.mdx diff --git a/docs/source/model_doc/mt5.mdx b/docs/source/en/model_doc/mt5.mdx similarity index 100% rename from docs/source/model_doc/mt5.mdx rename to docs/source/en/model_doc/mt5.mdx diff --git a/docs/source/model_doc/nystromformer.mdx b/docs/source/en/model_doc/nystromformer.mdx similarity index 100% rename from docs/source/model_doc/nystromformer.mdx rename to docs/source/en/model_doc/nystromformer.mdx diff --git a/docs/source/model_doc/openai-gpt.mdx b/docs/source/en/model_doc/openai-gpt.mdx similarity index 100% rename from docs/source/model_doc/openai-gpt.mdx rename to docs/source/en/model_doc/openai-gpt.mdx diff --git a/docs/source/model_doc/pegasus.mdx b/docs/source/en/model_doc/pegasus.mdx similarity index 100% rename from docs/source/model_doc/pegasus.mdx rename to docs/source/en/model_doc/pegasus.mdx diff --git a/docs/source/model_doc/perceiver.mdx b/docs/source/en/model_doc/perceiver.mdx similarity index 100% rename from docs/source/model_doc/perceiver.mdx rename to docs/source/en/model_doc/perceiver.mdx diff --git a/docs/source/model_doc/phobert.mdx b/docs/source/en/model_doc/phobert.mdx similarity index 100% rename from docs/source/model_doc/phobert.mdx rename to docs/source/en/model_doc/phobert.mdx diff --git a/docs/source/model_doc/plbart.mdx b/docs/source/en/model_doc/plbart.mdx similarity index 100% rename from docs/source/model_doc/plbart.mdx rename to docs/source/en/model_doc/plbart.mdx diff --git a/docs/source/model_doc/poolformer.mdx b/docs/source/en/model_doc/poolformer.mdx similarity index 100% rename from docs/source/model_doc/poolformer.mdx rename to docs/source/en/model_doc/poolformer.mdx diff --git a/docs/source/model_doc/prophetnet.mdx b/docs/source/en/model_doc/prophetnet.mdx similarity index 100% rename from docs/source/model_doc/prophetnet.mdx rename to docs/source/en/model_doc/prophetnet.mdx diff --git a/docs/source/model_doc/qdqbert.mdx b/docs/source/en/model_doc/qdqbert.mdx similarity index 100% rename from docs/source/model_doc/qdqbert.mdx rename to docs/source/en/model_doc/qdqbert.mdx diff --git a/docs/source/model_doc/rag.mdx b/docs/source/en/model_doc/rag.mdx similarity index 100% rename from docs/source/model_doc/rag.mdx rename to docs/source/en/model_doc/rag.mdx diff --git a/docs/source/model_doc/realm.mdx b/docs/source/en/model_doc/realm.mdx similarity index 100% rename from docs/source/model_doc/realm.mdx rename to docs/source/en/model_doc/realm.mdx diff --git a/docs/source/model_doc/reformer.mdx b/docs/source/en/model_doc/reformer.mdx similarity index 100% rename from docs/source/model_doc/reformer.mdx rename to docs/source/en/model_doc/reformer.mdx diff --git a/docs/source/model_doc/rembert.mdx b/docs/source/en/model_doc/rembert.mdx similarity index 100% rename from docs/source/model_doc/rembert.mdx rename to docs/source/en/model_doc/rembert.mdx diff --git a/docs/source/model_doc/resnet.mdx b/docs/source/en/model_doc/resnet.mdx similarity index 100% rename from docs/source/model_doc/resnet.mdx rename to docs/source/en/model_doc/resnet.mdx diff --git a/docs/source/model_doc/retribert.mdx b/docs/source/en/model_doc/retribert.mdx similarity index 100% rename from docs/source/model_doc/retribert.mdx rename to docs/source/en/model_doc/retribert.mdx diff 
--git a/docs/source/model_doc/roberta.mdx b/docs/source/en/model_doc/roberta.mdx similarity index 100% rename from docs/source/model_doc/roberta.mdx rename to docs/source/en/model_doc/roberta.mdx diff --git a/docs/source/model_doc/roformer.mdx b/docs/source/en/model_doc/roformer.mdx similarity index 100% rename from docs/source/model_doc/roformer.mdx rename to docs/source/en/model_doc/roformer.mdx diff --git a/docs/source/model_doc/segformer.mdx b/docs/source/en/model_doc/segformer.mdx similarity index 100% rename from docs/source/model_doc/segformer.mdx rename to docs/source/en/model_doc/segformer.mdx diff --git a/docs/source/model_doc/sew-d.mdx b/docs/source/en/model_doc/sew-d.mdx similarity index 100% rename from docs/source/model_doc/sew-d.mdx rename to docs/source/en/model_doc/sew-d.mdx diff --git a/docs/source/model_doc/sew.mdx b/docs/source/en/model_doc/sew.mdx similarity index 100% rename from docs/source/model_doc/sew.mdx rename to docs/source/en/model_doc/sew.mdx diff --git a/docs/source/model_doc/speech-encoder-decoder.mdx b/docs/source/en/model_doc/speech-encoder-decoder.mdx similarity index 100% rename from docs/source/model_doc/speech-encoder-decoder.mdx rename to docs/source/en/model_doc/speech-encoder-decoder.mdx diff --git a/docs/source/model_doc/speech_to_text.mdx b/docs/source/en/model_doc/speech_to_text.mdx similarity index 100% rename from docs/source/model_doc/speech_to_text.mdx rename to docs/source/en/model_doc/speech_to_text.mdx diff --git a/docs/source/model_doc/speech_to_text_2.mdx b/docs/source/en/model_doc/speech_to_text_2.mdx similarity index 100% rename from docs/source/model_doc/speech_to_text_2.mdx rename to docs/source/en/model_doc/speech_to_text_2.mdx diff --git a/docs/source/model_doc/splinter.mdx b/docs/source/en/model_doc/splinter.mdx similarity index 100% rename from docs/source/model_doc/splinter.mdx rename to docs/source/en/model_doc/splinter.mdx diff --git a/docs/source/model_doc/squeezebert.mdx b/docs/source/en/model_doc/squeezebert.mdx similarity index 100% rename from docs/source/model_doc/squeezebert.mdx rename to docs/source/en/model_doc/squeezebert.mdx diff --git a/docs/source/model_doc/swin.mdx b/docs/source/en/model_doc/swin.mdx similarity index 100% rename from docs/source/model_doc/swin.mdx rename to docs/source/en/model_doc/swin.mdx diff --git a/docs/source/model_doc/t5.mdx b/docs/source/en/model_doc/t5.mdx similarity index 100% rename from docs/source/model_doc/t5.mdx rename to docs/source/en/model_doc/t5.mdx diff --git a/docs/source/model_doc/t5v1.1.mdx b/docs/source/en/model_doc/t5v1.1.mdx similarity index 100% rename from docs/source/model_doc/t5v1.1.mdx rename to docs/source/en/model_doc/t5v1.1.mdx diff --git a/docs/source/model_doc/tapas.mdx b/docs/source/en/model_doc/tapas.mdx similarity index 100% rename from docs/source/model_doc/tapas.mdx rename to docs/source/en/model_doc/tapas.mdx diff --git a/docs/source/model_doc/transfo-xl.mdx b/docs/source/en/model_doc/transfo-xl.mdx similarity index 100% rename from docs/source/model_doc/transfo-xl.mdx rename to docs/source/en/model_doc/transfo-xl.mdx diff --git a/docs/source/model_doc/trocr.mdx b/docs/source/en/model_doc/trocr.mdx similarity index 100% rename from docs/source/model_doc/trocr.mdx rename to docs/source/en/model_doc/trocr.mdx diff --git a/docs/source/model_doc/unispeech-sat.mdx b/docs/source/en/model_doc/unispeech-sat.mdx similarity index 100% rename from docs/source/model_doc/unispeech-sat.mdx rename to docs/source/en/model_doc/unispeech-sat.mdx diff --git 
a/docs/source/model_doc/unispeech.mdx b/docs/source/en/model_doc/unispeech.mdx similarity index 100% rename from docs/source/model_doc/unispeech.mdx rename to docs/source/en/model_doc/unispeech.mdx diff --git a/docs/source/model_doc/van.mdx b/docs/source/en/model_doc/van.mdx similarity index 100% rename from docs/source/model_doc/van.mdx rename to docs/source/en/model_doc/van.mdx diff --git a/docs/source/model_doc/vilt.mdx b/docs/source/en/model_doc/vilt.mdx similarity index 100% rename from docs/source/model_doc/vilt.mdx rename to docs/source/en/model_doc/vilt.mdx diff --git a/docs/source/model_doc/vision-encoder-decoder.mdx b/docs/source/en/model_doc/vision-encoder-decoder.mdx similarity index 100% rename from docs/source/model_doc/vision-encoder-decoder.mdx rename to docs/source/en/model_doc/vision-encoder-decoder.mdx diff --git a/docs/source/model_doc/vision-text-dual-encoder.mdx b/docs/source/en/model_doc/vision-text-dual-encoder.mdx similarity index 100% rename from docs/source/model_doc/vision-text-dual-encoder.mdx rename to docs/source/en/model_doc/vision-text-dual-encoder.mdx diff --git a/docs/source/model_doc/visual_bert.mdx b/docs/source/en/model_doc/visual_bert.mdx similarity index 100% rename from docs/source/model_doc/visual_bert.mdx rename to docs/source/en/model_doc/visual_bert.mdx diff --git a/docs/source/model_doc/vit.mdx b/docs/source/en/model_doc/vit.mdx similarity index 100% rename from docs/source/model_doc/vit.mdx rename to docs/source/en/model_doc/vit.mdx diff --git a/docs/source/model_doc/vit_mae.mdx b/docs/source/en/model_doc/vit_mae.mdx similarity index 100% rename from docs/source/model_doc/vit_mae.mdx rename to docs/source/en/model_doc/vit_mae.mdx diff --git a/docs/source/model_doc/wav2vec2.mdx b/docs/source/en/model_doc/wav2vec2.mdx similarity index 100% rename from docs/source/model_doc/wav2vec2.mdx rename to docs/source/en/model_doc/wav2vec2.mdx diff --git a/docs/source/model_doc/wav2vec2_phoneme.mdx b/docs/source/en/model_doc/wav2vec2_phoneme.mdx similarity index 100% rename from docs/source/model_doc/wav2vec2_phoneme.mdx rename to docs/source/en/model_doc/wav2vec2_phoneme.mdx diff --git a/docs/source/model_doc/wavlm.mdx b/docs/source/en/model_doc/wavlm.mdx similarity index 100% rename from docs/source/model_doc/wavlm.mdx rename to docs/source/en/model_doc/wavlm.mdx diff --git a/docs/source/model_doc/xglm.mdx b/docs/source/en/model_doc/xglm.mdx similarity index 100% rename from docs/source/model_doc/xglm.mdx rename to docs/source/en/model_doc/xglm.mdx diff --git a/docs/source/model_doc/xlm-prophetnet.mdx b/docs/source/en/model_doc/xlm-prophetnet.mdx similarity index 100% rename from docs/source/model_doc/xlm-prophetnet.mdx rename to docs/source/en/model_doc/xlm-prophetnet.mdx diff --git a/docs/source/model_doc/xlm-roberta-xl.mdx b/docs/source/en/model_doc/xlm-roberta-xl.mdx similarity index 100% rename from docs/source/model_doc/xlm-roberta-xl.mdx rename to docs/source/en/model_doc/xlm-roberta-xl.mdx diff --git a/docs/source/model_doc/xlm-roberta.mdx b/docs/source/en/model_doc/xlm-roberta.mdx similarity index 100% rename from docs/source/model_doc/xlm-roberta.mdx rename to docs/source/en/model_doc/xlm-roberta.mdx diff --git a/docs/source/model_doc/xlm.mdx b/docs/source/en/model_doc/xlm.mdx similarity index 100% rename from docs/source/model_doc/xlm.mdx rename to docs/source/en/model_doc/xlm.mdx diff --git a/docs/source/model_doc/xlnet.mdx b/docs/source/en/model_doc/xlnet.mdx similarity index 100% rename from docs/source/model_doc/xlnet.mdx rename to 
docs/source/en/model_doc/xlnet.mdx diff --git a/docs/source/model_doc/xls_r.mdx b/docs/source/en/model_doc/xls_r.mdx similarity index 100% rename from docs/source/model_doc/xls_r.mdx rename to docs/source/en/model_doc/xls_r.mdx diff --git a/docs/source/model_doc/xlsr_wav2vec2.mdx b/docs/source/en/model_doc/xlsr_wav2vec2.mdx similarity index 100% rename from docs/source/model_doc/xlsr_wav2vec2.mdx rename to docs/source/en/model_doc/xlsr_wav2vec2.mdx diff --git a/docs/source/model_doc/yoso.mdx b/docs/source/en/model_doc/yoso.mdx similarity index 100% rename from docs/source/model_doc/yoso.mdx rename to docs/source/en/model_doc/yoso.mdx diff --git a/docs/source/model_sharing.mdx b/docs/source/en/model_sharing.mdx similarity index 100% rename from docs/source/model_sharing.mdx rename to docs/source/en/model_sharing.mdx diff --git a/docs/source/model_summary.mdx b/docs/source/en/model_summary.mdx similarity index 100% rename from docs/source/model_summary.mdx rename to docs/source/en/model_summary.mdx diff --git a/docs/source/multilingual.mdx b/docs/source/en/multilingual.mdx similarity index 100% rename from docs/source/multilingual.mdx rename to docs/source/en/multilingual.mdx diff --git a/docs/source/en/notebooks.md b/docs/source/en/notebooks.md new file mode 120000 index 000000000000..10fb7a7b979a --- /dev/null +++ b/docs/source/en/notebooks.md @@ -0,0 +1 @@ +../../../notebooks/README.md \ No newline at end of file diff --git a/docs/source/pad_truncation.mdx b/docs/source/en/pad_truncation.mdx similarity index 100% rename from docs/source/pad_truncation.mdx rename to docs/source/en/pad_truncation.mdx diff --git a/docs/source/parallelism.mdx b/docs/source/en/parallelism.mdx similarity index 100% rename from docs/source/parallelism.mdx rename to docs/source/en/parallelism.mdx diff --git a/docs/source/performance.mdx b/docs/source/en/performance.mdx similarity index 100% rename from docs/source/performance.mdx rename to docs/source/en/performance.mdx diff --git a/docs/source/perplexity.mdx b/docs/source/en/perplexity.mdx similarity index 100% rename from docs/source/perplexity.mdx rename to docs/source/en/perplexity.mdx diff --git a/docs/source/philosophy.mdx b/docs/source/en/philosophy.mdx similarity index 100% rename from docs/source/philosophy.mdx rename to docs/source/en/philosophy.mdx diff --git a/docs/source/pipeline_tutorial.mdx b/docs/source/en/pipeline_tutorial.mdx similarity index 100% rename from docs/source/pipeline_tutorial.mdx rename to docs/source/en/pipeline_tutorial.mdx diff --git a/docs/source/pr_checks.mdx b/docs/source/en/pr_checks.mdx similarity index 100% rename from docs/source/pr_checks.mdx rename to docs/source/en/pr_checks.mdx diff --git a/docs/source/preprocessing.mdx b/docs/source/en/preprocessing.mdx similarity index 100% rename from docs/source/preprocessing.mdx rename to docs/source/en/preprocessing.mdx diff --git a/docs/source/quicktour.mdx b/docs/source/en/quicktour.mdx similarity index 100% rename from docs/source/quicktour.mdx rename to docs/source/en/quicktour.mdx diff --git a/docs/source/run_scripts.mdx b/docs/source/en/run_scripts.mdx similarity index 100% rename from docs/source/run_scripts.mdx rename to docs/source/en/run_scripts.mdx diff --git a/docs/source/sagemaker.mdx b/docs/source/en/sagemaker.mdx similarity index 100% rename from docs/source/sagemaker.mdx rename to docs/source/en/sagemaker.mdx diff --git a/docs/source/serialization.mdx b/docs/source/en/serialization.mdx similarity index 100% rename from docs/source/serialization.mdx rename to 
docs/source/en/serialization.mdx diff --git a/docs/source/task_summary.mdx b/docs/source/en/task_summary.mdx similarity index 100% rename from docs/source/task_summary.mdx rename to docs/source/en/task_summary.mdx diff --git a/docs/source/tasks/asr.mdx b/docs/source/en/tasks/asr.mdx similarity index 100% rename from docs/source/tasks/asr.mdx rename to docs/source/en/tasks/asr.mdx diff --git a/docs/source/tasks/audio_classification.mdx b/docs/source/en/tasks/audio_classification.mdx similarity index 100% rename from docs/source/tasks/audio_classification.mdx rename to docs/source/en/tasks/audio_classification.mdx diff --git a/docs/source/tasks/image_classification.mdx b/docs/source/en/tasks/image_classification.mdx similarity index 100% rename from docs/source/tasks/image_classification.mdx rename to docs/source/en/tasks/image_classification.mdx diff --git a/docs/source/tasks/language_modeling.mdx b/docs/source/en/tasks/language_modeling.mdx similarity index 100% rename from docs/source/tasks/language_modeling.mdx rename to docs/source/en/tasks/language_modeling.mdx diff --git a/docs/source/tasks/multiple_choice.mdx b/docs/source/en/tasks/multiple_choice.mdx similarity index 100% rename from docs/source/tasks/multiple_choice.mdx rename to docs/source/en/tasks/multiple_choice.mdx diff --git a/docs/source/tasks/question_answering.mdx b/docs/source/en/tasks/question_answering.mdx similarity index 100% rename from docs/source/tasks/question_answering.mdx rename to docs/source/en/tasks/question_answering.mdx diff --git a/docs/source/tasks/sequence_classification.mdx b/docs/source/en/tasks/sequence_classification.mdx similarity index 100% rename from docs/source/tasks/sequence_classification.mdx rename to docs/source/en/tasks/sequence_classification.mdx diff --git a/docs/source/tasks/summarization.mdx b/docs/source/en/tasks/summarization.mdx similarity index 100% rename from docs/source/tasks/summarization.mdx rename to docs/source/en/tasks/summarization.mdx diff --git a/docs/source/tasks/token_classification.mdx b/docs/source/en/tasks/token_classification.mdx similarity index 100% rename from docs/source/tasks/token_classification.mdx rename to docs/source/en/tasks/token_classification.mdx diff --git a/docs/source/tasks/translation.mdx b/docs/source/en/tasks/translation.mdx similarity index 100% rename from docs/source/tasks/translation.mdx rename to docs/source/en/tasks/translation.mdx diff --git a/docs/source/testing.mdx b/docs/source/en/testing.mdx similarity index 100% rename from docs/source/testing.mdx rename to docs/source/en/testing.mdx diff --git a/docs/source/tokenizer_summary.mdx b/docs/source/en/tokenizer_summary.mdx similarity index 100% rename from docs/source/tokenizer_summary.mdx rename to docs/source/en/tokenizer_summary.mdx diff --git a/docs/source/training.mdx b/docs/source/en/training.mdx similarity index 100% rename from docs/source/training.mdx rename to docs/source/en/training.mdx diff --git a/docs/source/troubleshooting.mdx b/docs/source/en/troubleshooting.mdx similarity index 100% rename from docs/source/troubleshooting.mdx rename to docs/source/en/troubleshooting.mdx diff --git a/docs/source/es/_config.py b/docs/source/es/_config.py new file mode 100644 index 000000000000..cd76263e9a5c --- /dev/null +++ b/docs/source/es/_config.py @@ -0,0 +1,14 @@ +# docstyle-ignore +INSTALL_CONTENT = """ +# Transformers installation +! pip install transformers datasets +# To install from source instead of the last release, comment the command above and uncomment the following one. +# ! 
pip install git+https://github.com/huggingface/transformers.git +""" + +notebook_first_cells = [{"type": "code", "content": INSTALL_CONTENT}] +black_avoid_patterns = { + "{processor_class}": "FakeProcessorClass", + "{model_class}": "FakeModelClass", + "{object_class}": "FakeObjectClass", +} diff --git a/docs/source/es/_toctree.yml b/docs/source/es/_toctree.yml new file mode 100644 index 000000000000..525683955e71 --- /dev/null +++ b/docs/source/es/_toctree.yml @@ -0,0 +1,17 @@ +- sections: + - local: quicktour + title: Quick tour + - local: installation + title: Instalación + title: Get started +- sections: + - local: pipeline_tutorial + title: Pipelines para inferencia + - local: training + title: Fine-tuning a un modelo pre-entrenado + - local: accelerate + title: Entrenamiento distribuido con 🤗 Accelerate + title: Tutorials +- sections: + - local: multilingual + title: Modelos multilingües para inferencia \ No newline at end of file diff --git a/docs/source_es/accelerate.mdx b/docs/source/es/accelerate.mdx similarity index 100% rename from docs/source_es/accelerate.mdx rename to docs/source/es/accelerate.mdx diff --git a/docs/source_es/installation.mdx b/docs/source/es/installation.mdx similarity index 92% rename from docs/source_es/installation.mdx rename to docs/source/es/installation.mdx index 1e0b587e283b..cc7601c117cd 100644 --- a/docs/source_es/installation.mdx +++ b/docs/source/es/installation.mdx @@ -185,43 +185,43 @@ Otra opción para usar 🤗 Transformers offline es descargando previamente los * Utiliza el flujo de [`PreTrainedModel.from_pretrained`] y [`PreTrainedModel.save_pretrained`]: 1. Descarga previamente los archivos con [`PreTrainedModel.from_pretrained`]: - ```py - >>> from transformers import AutoTokenizer, AutoModelForSeq2SeqLM + ```py + >>> from transformers import AutoTokenizer, AutoModelForSeq2SeqLM - >>> tokenizer = AutoTokenizer.from_pretrained("bigscience/T0_3B") - >>> model = AutoModelForSeq2SeqLM.from_pretrained("bigscience/T0_3B") - ``` + >>> tokenizer = AutoTokenizer.from_pretrained("bigscience/T0_3B") + >>> model = AutoModelForSeq2SeqLM.from_pretrained("bigscience/T0_3B") + ``` 2. Guarda los archivos en un directorio específico con [`PreTrainedModel.save_pretrained`]: ```py - >>> tokenizer.save_pretrained("./your/path/bigscience_t0") - >>> model.save_pretrained("./your/path/bigscience_t0") - ``` + >>> tokenizer.save_pretrained("./your/path/bigscience_t0") + >>> model.save_pretrained("./your/path/bigscience_t0") + ``` 3. Cuando te encuentres offline, recarga los archivos con [`PreTrainedModel.from_pretrained`] desde el directorio especificado: ```py - >>> tokenizer = AutoTokenizer.from_pretrained("./your/path/bigscience_t0") - >>> model = AutoModel.from_pretrained("./your/path/bigscience_t0") - ``` + >>> tokenizer = AutoTokenizer.from_pretrained("./your/path/bigscience_t0") + >>> model = AutoModel.from_pretrained("./your/path/bigscience_t0") + ``` * Descarga de manera programática los archivos con la biblioteca [huggingface_hub](https://github.com/huggingface/huggingface_hub/tree/main/src/huggingface_hub): 1. Instala la biblioteca [huggingface_hub](https://github.com/huggingface/huggingface_hub/tree/main/src/huggingface_hub) en tu entorno virtual: ```bash - python -m pip install huggingface_hub - ``` + python -m pip install huggingface_hub + ``` 2. Utiliza la función [`hf_hub_download`](https://huggingface.co/docs/hub/adding-a-library#download-files-from-the-hub) para descargar un archivo a un path específico. 
Por ejemplo, el siguiente comando descarga el archivo `config.json` del modelo [T0](https://huggingface.co/bigscience/T0_3B) al path deseado: ```py - >>> from huggingface_hub import hf_hub_download + >>> from huggingface_hub import hf_hub_download - >>> hf_hub_download(repo_id="bigscience/T0_3B", filename="config.json", cache_dir="./your/path/bigscience_t0") - ``` + >>> hf_hub_download(repo_id="bigscience/T0_3B", filename="config.json", cache_dir="./your/path/bigscience_t0") + ``` Una vez que el archivo se descargue y se almacene en caché localmente, especifica tu ruta local para cargarlo y usarlo: @@ -236,9 +236,3 @@ Una vez que el archivo se descargue y se almacene en caché localmente, especifi Para más detalles sobre cómo descargar archivos almacenados en el Hub consulta la sección [How to download files from the Hub](https://huggingface.co/docs/hub/how-to-downstream). - - - - - - diff --git a/docs/source_es/multilingual.mdx b/docs/source/es/multilingual.mdx similarity index 100% rename from docs/source_es/multilingual.mdx rename to docs/source/es/multilingual.mdx diff --git a/docs/source_es/pipeline_tutorial.mdx b/docs/source/es/pipeline_tutorial.mdx similarity index 100% rename from docs/source_es/pipeline_tutorial.mdx rename to docs/source/es/pipeline_tutorial.mdx diff --git a/docs/source_es/quicktour.mdx b/docs/source/es/quicktour.mdx similarity index 100% rename from docs/source_es/quicktour.mdx rename to docs/source/es/quicktour.mdx diff --git a/docs/source_es/training.mdx b/docs/source/es/training.mdx similarity index 100% rename from docs/source_es/training.mdx rename to docs/source/es/training.mdx diff --git a/docs/source/notebooks.md b/docs/source/notebooks.md deleted file mode 120000 index 1ffa21de255f..000000000000 --- a/docs/source/notebooks.md +++ /dev/null @@ -1 +0,0 @@ -../../notebooks/README.md \ No newline at end of file diff --git a/src/transformers/commands/add_new_model.py b/src/transformers/commands/add_new_model.py index a5854863d2dc..276032eefe63 100644 --- a/src/transformers/commands/add_new_model.py +++ b/src/transformers/commands/add_new_model.py @@ -178,7 +178,7 @@ def remove_copy_lines(path): shutil.move( f"{directory}/{lowercase_model_name}.mdx", - f"{path_to_transformer_root}/docs/source/model_doc/{lowercase_model_name}.mdx", + f"{path_to_transformer_root}/docs/source/en/model_doc/{lowercase_model_name}.mdx", ) shutil.move( diff --git a/src/transformers/commands/add_new_model_like.py b/src/transformers/commands/add_new_model_like.py index 31a5d714ab68..8ef5adf445b8 100644 --- a/src/transformers/commands/add_new_model_like.py +++ b/src/transformers/commands/add_new_model_like.py @@ -541,7 +541,7 @@ def get_model_files(model_type: str, frameworks: Optional[List[str]] = None) -> model_files = list(model_module.glob("*.py")) model_files = filter_framework_files(model_files, frameworks=frameworks) - doc_file = REPO_PATH / "docs" / "source" / "model_doc" / f"{model_type}.mdx" + doc_file = REPO_PATH / "docs" / "source" / "en" / "model_doc" / f"{model_type}.mdx" # Basic pattern for test files test_files = [ @@ -1256,7 +1256,7 @@ def disable_fx_test(filename: Path) -> bool: add_model_to_auto_classes(old_model_patterns, new_model_patterns, model_classes) # 5. Add doc file - doc_file = REPO_PATH / "docs" / "source" / "model_doc" / f"{old_model_patterns.model_type}.mdx" + doc_file = REPO_PATH / "docs" / "source" / "en" / "model_doc" / f"{old_model_patterns.model_type}.mdx" duplicate_doc_file(doc_file, old_model_patterns, new_model_patterns, frameworks=frameworks) # 6. 
Warn the user for duplicate patterns diff --git a/utils/check_copies.py b/utils/check_copies.py index e823b866d2a7..5363fd1ff338 100644 --- a/utils/check_copies.py +++ b/utils/check_copies.py @@ -25,7 +25,7 @@ # All paths are set with the intent you should run this script from the root of the repo with the command # python utils/check_copies.py TRANSFORMERS_PATH = "src/transformers" -PATH_TO_DOCS = "docs/source" +PATH_TO_DOCS = "docs/source/en" REPO_PATH = "." # Mapping for files that are full copies of others (keys are copies, values the file to keep them up to data with) diff --git a/utils/check_repo.py b/utils/check_repo.py index 5f81c3bcfca6..99af09355274 100644 --- a/utils/check_repo.py +++ b/utils/check_repo.py @@ -31,7 +31,7 @@ # python utils/check_repo.py PATH_TO_TRANSFORMERS = "src/transformers" PATH_TO_TESTS = "tests" -PATH_TO_DOC = "docs/source" +PATH_TO_DOC = "docs/source/en" # Update this list with models that are supposed to be private. PRIVATE_MODELS = [ diff --git a/utils/check_table.py b/utils/check_table.py index 9d948fbb6d9f..d59f3e7b1e5a 100644 --- a/utils/check_table.py +++ b/utils/check_table.py @@ -23,7 +23,7 @@ # All paths are set with the intent you should run this script from the root of the repo with the command # python utils/check_table.py TRANSFORMERS_PATH = "src/transformers" -PATH_TO_DOCS = "docs/source" +PATH_TO_DOCS = "docs/source/en" REPO_PATH = "." From 96494dc2badde11a59a9e238f9586fd74c4169ba Mon Sep 17 00:00:00 2001 From: Karim Foda <35491698+KMFODA@users.noreply.github.com> Date: Mon, 4 Apr 2022 15:27:45 +0100 Subject: [PATCH 20/34] Add use_auth to load_datasets for private datasets to PT and TF examples (#16521) * fix formatting and remove use_auth * Add use_auth_token to Flax examples --- .../run_image_captioning_flax.py | 25 +++++++- .../flax/language-modeling/run_clm_flax.py | 58 +++++++++++++++--- .../flax/language-modeling/run_mlm_flax.py | 58 +++++++++++++++--- .../flax/language-modeling/run_t5_mlm_flax.py | 59 ++++++++++++++++--- examples/flax/question-answering/run_qa.py | 13 +++- .../summarization/run_summarization_flax.py | 53 ++++++++++++++--- .../flax/text-classification/run_flax_glue.py | 27 +++++++-- .../flax/token-classification/run_flax_ner.py | 12 +++- .../flax/vision/run_image_classification.py | 20 ++++++- .../run_audio_classification.py | 10 +++- .../contrastive-image-text/run_clip.py | 8 ++- .../run_image_classification.py | 1 + examples/pytorch/image-pretraining/run_mae.py | 1 + examples/pytorch/image-pretraining/run_mim.py | 1 + examples/pytorch/language-modeling/run_clm.py | 17 +++++- examples/pytorch/language-modeling/run_mlm.py | 16 ++++- examples/pytorch/language-modeling/run_plm.py | 9 ++- examples/pytorch/multiple-choice/run_swag.py | 14 ++++- examples/pytorch/question-answering/run_qa.py | 13 +++- .../question-answering/run_qa_beam_search.py | 13 +++- .../run_wav2vec2_pretraining_no_trainer.py | 5 +- .../run_speech_recognition_seq2seq.py | 10 +++- .../summarization/run_summarization.py | 12 +++- .../pytorch/text-classification/run_glue.py | 26 ++++++-- .../pytorch/text-classification/run_xnli.py | 30 ++++++++-- .../pytorch/token-classification/run_ner.py | 5 +- .../pytorch/translation/run_translation.py | 12 +++- .../tensorflow/language-modeling/run_clm.py | 15 ++++- .../tensorflow/language-modeling/run_mlm.py | 14 ++++- .../tensorflow/multiple-choice/run_swag.py | 14 ++++- .../tensorflow/question-answering/run_qa.py | 15 ++++- .../summarization/run_summarization.py | 12 +++- .../text-classification/run_glue.py | 7 ++- 
.../run_text_classification.py | 7 ++- .../token-classification/run_ner.py | 12 +++- .../tensorflow/translation/run_translation.py | 12 +++- 36 files changed, 544 insertions(+), 92 deletions(-) diff --git a/examples/flax/image-captioning/run_image_captioning_flax.py b/examples/flax/image-captioning/run_image_captioning_flax.py index b4b9afe0d305..b1c9012777ac 100644 --- a/examples/flax/image-captioning/run_image_captioning_flax.py +++ b/examples/flax/image-captioning/run_image_captioning_flax.py @@ -178,6 +178,13 @@ class ModelArguments: "help": "Floating-point format in which the model weights should be initialized and trained. Choose one of `[float32, float16, bfloat16]`." }, ) + use_auth_token: bool = field( + default=False, + metadata={ + "help": "Will use the token generated when running `transformers-cli login` (necessary to use this script " + "with private models)." + }, + ) @dataclass @@ -418,6 +425,7 @@ def main(): cache_dir=model_args.cache_dir, keep_in_memory=False, data_dir=data_args.data_dir, + use_auth_token=True if model_args.use_auth_token else None, ) else: data_files = {} @@ -430,7 +438,12 @@ def main(): if data_args.test_file is not None: data_files["test"] = data_args.test_file extension = data_args.test_file.split(".")[-1] - dataset = load_dataset(extension, data_files=data_files, cache_dir=model_args.cache_dir) + dataset = load_dataset( + extension, + data_files=data_files, + cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, + ) # See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at # https://huggingface.co/docs/datasets/loading_datasets.html. @@ -439,12 +452,18 @@ def main(): model_args.model_name_or_path, seed=training_args.seed, dtype=getattr(jnp, model_args.dtype), + use_auth_token=True if model_args.use_auth_token else None, ) feature_extractor = AutoFeatureExtractor.from_pretrained( - model_args.model_name_or_path, cache_dir=model_args.cache_dir + model_args.model_name_or_path, + cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, ) tokenizer = AutoTokenizer.from_pretrained( - model_args.model_name_or_path, cache_dir=model_args.cache_dir, use_fast=model_args.use_fast_tokenizer + model_args.model_name_or_path, + cache_dir=model_args.cache_dir, + use_fast=model_args.use_fast_tokenizer, + use_auth_token=True if model_args.use_auth_token else None, ) tokenizer.pad_token = tokenizer.convert_ids_to_tokens(model.config.pad_token_id) diff --git a/examples/flax/language-modeling/run_clm_flax.py b/examples/flax/language-modeling/run_clm_flax.py index 82a9757d5c26..afb6d75b3857 100755 --- a/examples/flax/language-modeling/run_clm_flax.py +++ b/examples/flax/language-modeling/run_clm_flax.py @@ -165,6 +165,13 @@ class ModelArguments: "help": "Floating-point format in which the model weights should be initialized and trained. Choose one of `[float32, float16, bfloat16]`." }, ) + use_auth_token: bool = field( + default=False, + metadata={ + "help": "Will use the token generated when running `transformers-cli login` (necessary to use this script " + "with private models)." + }, + ) @dataclass @@ -363,7 +370,11 @@ def main(): if data_args.dataset_name is not None: # Downloading and loading a dataset from the hub. 
dataset = load_dataset( - data_args.dataset_name, data_args.dataset_config_name, cache_dir=model_args.cache_dir, keep_in_memory=False + data_args.dataset_name, + data_args.dataset_config_name, + cache_dir=model_args.cache_dir, + keep_in_memory=False, + use_auth_token=True if model_args.use_auth_token else None, ) if "validation" not in dataset.keys(): @@ -372,12 +383,14 @@ def main(): data_args.dataset_config_name, split=f"train[:{data_args.validation_split_percentage}%]", cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, ) dataset["train"] = load_dataset( data_args.dataset_name, data_args.dataset_config_name, split=f"train[{data_args.validation_split_percentage}%:]", cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, ) else: data_files = {} @@ -390,7 +403,13 @@ def main(): if extension == "txt": extension = "text" dataset_args["keep_linebreaks"] = data_args.keep_linebreaks - dataset = load_dataset(extension, data_files=data_files, cache_dir=model_args.cache_dir, **dataset_args) + dataset = load_dataset( + extension, + data_files=data_files, + cache_dir=model_args.cache_dir, + **dataset_args, + use_auth_token=True if model_args.use_auth_token else None, + ) if "validation" not in dataset.keys(): dataset["validation"] = load_dataset( @@ -399,6 +418,7 @@ def main(): split=f"train[:{data_args.validation_split_percentage}%]", cache_dir=model_args.cache_dir, **dataset_args, + use_auth_token=True if model_args.use_auth_token else None, ) dataset["train"] = load_dataset( extension, @@ -406,6 +426,7 @@ def main(): split=f"train[{data_args.validation_split_percentage}%:]", cache_dir=model_args.cache_dir, **dataset_args, + use_auth_token=True if model_args.use_auth_token else None, ) # See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at # https://huggingface.co/docs/datasets/loading_datasets.html. @@ -416,20 +437,34 @@ def main(): # The .from_pretrained methods guarantee that only one local process can concurrently # download model & vocab. 
if model_args.config_name: - config = AutoConfig.from_pretrained(model_args.config_name, cache_dir=model_args.cache_dir) + config = AutoConfig.from_pretrained( + model_args.config_name, + cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, + ) elif model_args.model_name_or_path: - config = AutoConfig.from_pretrained(model_args.model_name_or_path, cache_dir=model_args.cache_dir) + config = AutoConfig.from_pretrained( + model_args.model_name_or_path, + cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, + ) else: config = CONFIG_MAPPING[model_args.model_type]() logger.warning("You are instantiating a new config instance from scratch.") if model_args.tokenizer_name: tokenizer = AutoTokenizer.from_pretrained( - model_args.tokenizer_name, cache_dir=model_args.cache_dir, use_fast=model_args.use_fast_tokenizer + model_args.tokenizer_name, + cache_dir=model_args.cache_dir, + use_fast=model_args.use_fast_tokenizer, + use_auth_token=True if model_args.use_auth_token else None, ) elif model_args.model_name_or_path: tokenizer = AutoTokenizer.from_pretrained( - model_args.model_name_or_path, cache_dir=model_args.cache_dir, use_fast=model_args.use_fast_tokenizer + model_args.model_name_or_path, + cache_dir=model_args.cache_dir, + use_fast=model_args.use_fast_tokenizer, + use_auth_token=True if model_args.use_auth_token else None, ) else: raise ValueError( @@ -439,11 +474,18 @@ def main(): if model_args.model_name_or_path: model = FlaxAutoModelForCausalLM.from_pretrained( - model_args.model_name_or_path, config=config, seed=training_args.seed, dtype=getattr(jnp, model_args.dtype) + model_args.model_name_or_path, + config=config, + seed=training_args.seed, + dtype=getattr(jnp, model_args.dtype), + use_auth_token=True if model_args.use_auth_token else None, ) else: model = FlaxAutoModelForCausalLM.from_config( - config, seed=training_args.seed, dtype=getattr(jnp, model_args.dtype) + config, + seed=training_args.seed, + dtype=getattr(jnp, model_args.dtype), + use_auth_token=True if model_args.use_auth_token else None, ) # Preprocessing the datasets. diff --git a/examples/flax/language-modeling/run_mlm_flax.py b/examples/flax/language-modeling/run_mlm_flax.py index daa247ecaae0..6ea0f6e1564f 100755 --- a/examples/flax/language-modeling/run_mlm_flax.py +++ b/examples/flax/language-modeling/run_mlm_flax.py @@ -163,6 +163,13 @@ class ModelArguments: "help": "Floating-point format in which the model weights should be initialized and trained. Choose one of `[float32, float16, bfloat16]`." }, ) + use_auth_token: bool = field( + default=False, + metadata={ + "help": "Will use the token generated when running `transformers-cli login` (necessary to use this script " + "with private models)." + }, + ) @dataclass @@ -396,7 +403,12 @@ def main(): # download the dataset. if data_args.dataset_name is not None: # Downloading and loading a dataset from the hub. 
- datasets = load_dataset(data_args.dataset_name, data_args.dataset_config_name, cache_dir=model_args.cache_dir) + datasets = load_dataset( + data_args.dataset_name, + data_args.dataset_config_name, + cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, + ) if "validation" not in datasets.keys(): datasets["validation"] = load_dataset( @@ -404,12 +416,14 @@ def main(): data_args.dataset_config_name, split=f"train[:{data_args.validation_split_percentage}%]", cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, ) datasets["train"] = load_dataset( data_args.dataset_name, data_args.dataset_config_name, split=f"train[{data_args.validation_split_percentage}%:]", cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, ) else: data_files = {} @@ -420,7 +434,12 @@ def main(): extension = data_args.train_file.split(".")[-1] if extension == "txt": extension = "text" - datasets = load_dataset(extension, data_files=data_files, cache_dir=model_args.cache_dir) + datasets = load_dataset( + extension, + data_files=data_files, + cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, + ) if "validation" not in datasets.keys(): datasets["validation"] = load_dataset( @@ -428,12 +447,14 @@ def main(): data_files=data_files, split=f"train[:{data_args.validation_split_percentage}%]", cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, ) datasets["train"] = load_dataset( extension, data_files=data_files, split=f"train[{data_args.validation_split_percentage}%:]", cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, ) # See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at # https://huggingface.co/docs/datasets/loading_datasets.html. @@ -444,20 +465,34 @@ def main(): # The .from_pretrained methods guarantee that only one local process can concurrently # download model & vocab. 
if model_args.config_name: - config = AutoConfig.from_pretrained(model_args.config_name, cache_dir=model_args.cache_dir) + config = AutoConfig.from_pretrained( + model_args.config_name, + cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, + ) elif model_args.model_name_or_path: - config = AutoConfig.from_pretrained(model_args.model_name_or_path, cache_dir=model_args.cache_dir) + config = AutoConfig.from_pretrained( + model_args.model_name_or_path, + cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, + ) else: config = CONFIG_MAPPING[model_args.model_type]() logger.warning("You are instantiating a new config instance from scratch.") if model_args.tokenizer_name: tokenizer = AutoTokenizer.from_pretrained( - model_args.tokenizer_name, cache_dir=model_args.cache_dir, use_fast=model_args.use_fast_tokenizer + model_args.tokenizer_name, + cache_dir=model_args.cache_dir, + use_fast=model_args.use_fast_tokenizer, + use_auth_token=True if model_args.use_auth_token else None, ) elif model_args.model_name_or_path: tokenizer = AutoTokenizer.from_pretrained( - model_args.model_name_or_path, cache_dir=model_args.cache_dir, use_fast=model_args.use_fast_tokenizer + model_args.model_name_or_path, + cache_dir=model_args.cache_dir, + use_fast=model_args.use_fast_tokenizer, + use_auth_token=True if model_args.use_auth_token else None, ) else: raise ValueError( @@ -572,11 +607,18 @@ def group_texts(examples): if model_args.model_name_or_path: model = FlaxAutoModelForMaskedLM.from_pretrained( - model_args.model_name_or_path, config=config, seed=training_args.seed, dtype=getattr(jnp, model_args.dtype) + model_args.model_name_or_path, + config=config, + seed=training_args.seed, + dtype=getattr(jnp, model_args.dtype), + use_auth_token=True if model_args.use_auth_token else None, ) else: model = FlaxAutoModelForMaskedLM.from_config( - config, seed=training_args.seed, dtype=getattr(jnp, model_args.dtype) + config, + seed=training_args.seed, + dtype=getattr(jnp, model_args.dtype), + use_auth_token=True if model_args.use_auth_token else None, ) # Store some constant diff --git a/examples/flax/language-modeling/run_t5_mlm_flax.py b/examples/flax/language-modeling/run_t5_mlm_flax.py index 622f11f5de2a..5b1067cd993e 100755 --- a/examples/flax/language-modeling/run_t5_mlm_flax.py +++ b/examples/flax/language-modeling/run_t5_mlm_flax.py @@ -162,6 +162,13 @@ class ModelArguments: "help": "Floating-point format in which the model weights should be initialized and trained. Choose one of `[float32, float16, bfloat16]`." }, ) + use_auth_token: bool = field( + default=False, + metadata={ + "help": "Will use the token generated when running `transformers-cli login` (necessary to use this script " + "with private models)." + }, + ) @dataclass @@ -525,7 +532,12 @@ def main(): # 'text' is found. You can easily tweak this behavior (see below). if data_args.dataset_name is not None: # Downloading and loading a dataset from the hub. 
- datasets = load_dataset(data_args.dataset_name, data_args.dataset_config_name, cache_dir=model_args.cache_dir) + datasets = load_dataset( + data_args.dataset_name, + data_args.dataset_config_name, + cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, + ) if "validation" not in datasets.keys(): datasets["validation"] = load_dataset( @@ -533,12 +545,14 @@ def main(): data_args.dataset_config_name, split=f"train[:{data_args.validation_split_percentage}%]", cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, ) datasets["train"] = load_dataset( data_args.dataset_name, data_args.dataset_config_name, split=f"train[{data_args.validation_split_percentage}%:]", cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, ) else: data_files = {} @@ -549,7 +563,12 @@ def main(): extension = data_args.train_file.split(".")[-1] if extension == "txt": extension = "text" - datasets = load_dataset(extension, data_files=data_files, cache_dir=model_args.cache_dir) + datasets = load_dataset( + extension, + data_files=data_files, + cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, + ) if "validation" not in datasets.keys(): datasets["validation"] = load_dataset( @@ -557,12 +576,14 @@ def main(): data_files=data_files, split=f"train[:{data_args.validation_split_percentage}%]", cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, ) datasets["train"] = load_dataset( extension, data_files=data_files, split=f"train[{data_args.validation_split_percentage}%:]", cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, ) # See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at # https://huggingface.co/docs/datasets/loading_datasets.html. 
@@ -571,11 +592,17 @@ def main(): if model_args.tokenizer_name: tokenizer = AutoTokenizer.from_pretrained( - model_args.tokenizer_name, cache_dir=model_args.cache_dir, use_fast=model_args.use_fast_tokenizer + model_args.tokenizer_name, + cache_dir=model_args.cache_dir, + use_fast=model_args.use_fast_tokenizer, + use_auth_token=True if model_args.use_auth_token else None, ) elif model_args.model_name_or_path: tokenizer = AutoTokenizer.from_pretrained( - model_args.model_name_or_path, cache_dir=model_args.cache_dir, use_fast=model_args.use_fast_tokenizer + model_args.model_name_or_path, + cache_dir=model_args.cache_dir, + use_fast=model_args.use_fast_tokenizer, + use_auth_token=True if model_args.use_auth_token else None, ) else: raise ValueError( @@ -585,10 +612,17 @@ def main(): if model_args.config_name: config = T5Config.from_pretrained( - model_args.config_name, cache_dir=model_args.cache_dir, vocab_size=len(tokenizer) + model_args.config_name, + cache_dir=model_args.cache_dir, + vocab_size=len(tokenizer), + use_auth_token=True if model_args.use_auth_token else None, ) elif model_args.model_name_or_path: - config = T5Config.from_pretrained(model_args.model_name_or_path, cache_dir=model_args.cache_dir) + config = T5Config.from_pretrained( + model_args.model_name_or_path, + cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, + ) else: config = CONFIG_MAPPING[model_args.model_type]() logger.warning("You are instantiating a new config instance from scratch.") @@ -678,11 +712,20 @@ def group_texts(examples): if model_args.model_name_or_path: model = FlaxT5ForConditionalGeneration.from_pretrained( - model_args.model_name_or_path, config=config, seed=training_args.seed, dtype=getattr(jnp, model_args.dtype) + model_args.model_name_or_path, + config=config, + seed=training_args.seed, + dtype=getattr(jnp, model_args.dtype), + use_auth_token=True if model_args.use_auth_token else None, ) else: config.vocab_size = len(tokenizer) - model = FlaxT5ForConditionalGeneration(config, seed=training_args.seed, dtype=getattr(jnp, model_args.dtype)) + model = FlaxT5ForConditionalGeneration( + config, + seed=training_args.seed, + dtype=getattr(jnp, model_args.dtype), + use_auth_token=True if model_args.use_auth_token else None, + ) # Data collator # This one will take care of randomly masking the tokens. diff --git a/examples/flax/question-answering/run_qa.py b/examples/flax/question-answering/run_qa.py index a15cca6607cc..6ab150a762b0 100644 --- a/examples/flax/question-answering/run_qa.py +++ b/examples/flax/question-answering/run_qa.py @@ -448,7 +448,10 @@ def main(): if data_args.dataset_name is not None: # Downloading and loading a dataset from the hub. raw_datasets = load_dataset( - data_args.dataset_name, data_args.dataset_config_name, cache_dir=model_args.cache_dir + data_args.dataset_name, + data_args.dataset_config_name, + cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, ) else: # Loading the dataset from local csv or json file. 
@@ -463,7 +466,13 @@ def main(): if data_args.test_file is not None: data_files["test"] = data_args.test_file extension = data_args.test_file.split(".")[-1] - raw_datasets = load_dataset(extension, data_files=data_files, field="data", cache_dir=model_args.cache_dir) + raw_datasets = load_dataset( + extension, + data_files=data_files, + field="data", + cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, + ) # See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at # https://huggingface.co/docs/datasets/loading_datasets.html. # endregion diff --git a/examples/flax/summarization/run_summarization_flax.py b/examples/flax/summarization/run_summarization_flax.py index effe3b58839f..3ebff73b98ff 100644 --- a/examples/flax/summarization/run_summarization_flax.py +++ b/examples/flax/summarization/run_summarization_flax.py @@ -176,6 +176,13 @@ class ModelArguments: "help": "Floating-point format in which the model weights should be initialized and trained. Choose one of `[float32, float16, bfloat16]`." }, ) + use_auth_token: bool = field( + default=False, + metadata={ + "help": "Will use the token generated when running `transformers-cli login` (necessary to use this script " + "with private models)." + }, + ) @dataclass @@ -421,7 +428,11 @@ def main(): if data_args.dataset_name is not None: # Downloading and loading a dataset from the hub. dataset = load_dataset( - data_args.dataset_name, data_args.dataset_config_name, cache_dir=model_args.cache_dir, keep_in_memory=False + data_args.dataset_name, + data_args.dataset_config_name, + cache_dir=model_args.cache_dir, + keep_in_memory=False, + use_auth_token=True if model_args.use_auth_token else None, ) else: data_files = {} @@ -434,27 +445,46 @@ def main(): if data_args.test_file is not None: data_files["test"] = data_args.test_file extension = data_args.test_file.split(".")[-1] - dataset = load_dataset(extension, data_files=data_files, cache_dir=model_args.cache_dir) + dataset = load_dataset( + extension, + data_files=data_files, + cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, + ) # See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at # https://huggingface.co/docs/datasets/loading_datasets.html. 
# Load pretrained model and tokenizer if model_args.config_name: - config = AutoConfig.from_pretrained(model_args.config_name, cache_dir=model_args.cache_dir) + config = AutoConfig.from_pretrained( + model_args.config_name, + cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, + ) elif model_args.model_name_or_path: - config = AutoConfig.from_pretrained(model_args.model_name_or_path, cache_dir=model_args.cache_dir) + config = AutoConfig.from_pretrained( + model_args.model_name_or_path, + cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, + ) else: config = CONFIG_MAPPING[model_args.model_type]() logger.warning("You are instantiating a new config instance from scratch.") if model_args.tokenizer_name: tokenizer = AutoTokenizer.from_pretrained( - model_args.tokenizer_name, cache_dir=model_args.cache_dir, use_fast=model_args.use_fast_tokenizer + model_args.tokenizer_name, + cache_dir=model_args.cache_dir, + use_fast=model_args.use_fast_tokenizer, + use_auth_token=True if model_args.use_auth_token else None, ) elif model_args.model_name_or_path: tokenizer = AutoTokenizer.from_pretrained( - model_args.model_name_or_path, cache_dir=model_args.cache_dir, use_fast=model_args.use_fast_tokenizer + model_args.model_name_or_path, + cache_dir=model_args.cache_dir, + use_fast=model_args.use_fast_tokenizer, + use_auth_token=True if model_args.use_auth_token else None, ) else: raise ValueError( @@ -464,11 +494,18 @@ def main(): if model_args.model_name_or_path: model = FlaxAutoModelForSeq2SeqLM.from_pretrained( - model_args.model_name_or_path, config=config, seed=training_args.seed, dtype=getattr(jnp, model_args.dtype) + model_args.model_name_or_path, + config=config, + seed=training_args.seed, + dtype=getattr(jnp, model_args.dtype), + use_auth_token=True if model_args.use_auth_token else None, ) else: model = FlaxAutoModelForSeq2SeqLM.from_config( - config, seed=training_args.seed, dtype=getattr(jnp, model_args.dtype) + config, + seed=training_args.seed, + dtype=getattr(jnp, model_args.dtype), + use_auth_token=True if model_args.use_auth_token else None, ) if model.config.decoder_start_token_id is None: diff --git a/examples/flax/text-classification/run_flax_glue.py b/examples/flax/text-classification/run_flax_glue.py index d56d23d2734e..06f9caba8943 100755 --- a/examples/flax/text-classification/run_flax_glue.py +++ b/examples/flax/text-classification/run_flax_glue.py @@ -337,7 +337,11 @@ def main(): # download the dataset. if data_args.task_name is not None: # Downloading and loading a dataset from the hub. - raw_datasets = load_dataset("glue", data_args.task_name) + raw_datasets = load_dataset( + "glue", + data_args.task_name, + use_auth_token=True if model_args.use_auth_token else None, + ) else: # Loading the dataset from local csv or json file. data_files = {} @@ -346,7 +350,11 @@ def main(): if data_args.validation_file is not None: data_files["validation"] = data_args.validation_file extension = (data_args.train_file if data_args.train_file is not None else data_args.valid_file).split(".")[-1] - raw_datasets = load_dataset(extension, data_files=data_files) + raw_datasets = load_dataset( + extension, + data_files=data_files, + use_auth_token=True if model_args.use_auth_token else None, + ) # See more about loading any type of standard or custom dataset at # https://huggingface.co/docs/datasets/loading_datasets.html. 
@@ -372,12 +380,21 @@ def main(): # Load pretrained model and tokenizer config = AutoConfig.from_pretrained( - model_args.model_name_or_path, num_labels=num_labels, finetuning_task=data_args.task_name + model_args.model_name_or_path, + num_labels=num_labels, + finetuning_task=data_args.task_name, + use_auth_token=True if model_args.use_auth_token else None, ) tokenizer = AutoTokenizer.from_pretrained( - model_args.model_name_or_path, use_fast=not model_args.use_slow_tokenizer + model_args.model_name_or_path, + use_fast=not model_args.use_slow_tokenizer, + use_auth_token=True if model_args.use_auth_token else None, + ) + model = FlaxAutoModelForSequenceClassification.from_pretrained( + model_args.model_name_or_path, + config=config, + use_auth_token=True if model_args.use_auth_token else None, ) - model = FlaxAutoModelForSequenceClassification.from_pretrained(model_args.model_name_or_path, config=config) # Preprocessing the datasets if data_args.task_name is not None: diff --git a/examples/flax/token-classification/run_flax_ner.py b/examples/flax/token-classification/run_flax_ner.py index abf1b8d0c117..32f0104b8929 100644 --- a/examples/flax/token-classification/run_flax_ner.py +++ b/examples/flax/token-classification/run_flax_ner.py @@ -391,7 +391,10 @@ def main(): if data_args.dataset_name is not None: # Downloading and loading a dataset from the hub. raw_datasets = load_dataset( - data_args.dataset_name, data_args.dataset_config_name, cache_dir=model_args.cache_dir + data_args.dataset_name, + data_args.dataset_config_name, + cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, ) else: # Loading the dataset from local csv or json file. @@ -401,7 +404,12 @@ def main(): if data_args.validation_file is not None: data_files["validation"] = data_args.validation_file extension = (data_args.train_file if data_args.train_file is not None else data_args.valid_file).split(".")[-1] - raw_datasets = load_dataset(extension, data_files=data_files, cache_dir=model_args.cache_dir) + raw_datasets = load_dataset( + extension, + data_files=data_files, + cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, + ) # See more about loading any type of standard or custom dataset at # https://huggingface.co/docs/datasets/loading_datasets.html. diff --git a/examples/flax/vision/run_image_classification.py b/examples/flax/vision/run_image_classification.py index 7459d24c6346..0dc7b2f95742 100644 --- a/examples/flax/vision/run_image_classification.py +++ b/examples/flax/vision/run_image_classification.py @@ -154,6 +154,13 @@ class ModelArguments: "help": "Floating-point format in which the model weights should be initialized and trained. Choose one of `[float32, float16, bfloat16]`." }, ) + use_auth_token: bool = field( + default=False, + metadata={ + "help": "Will use the token generated when running `transformers-cli login` (necessary to use this script " + "with private models)." 
+ }, + ) @dataclass @@ -315,6 +322,7 @@ def main(): num_labels=len(train_dataset.classes), image_size=data_args.image_size, cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, ) elif model_args.model_name_or_path: config = AutoConfig.from_pretrained( @@ -322,6 +330,7 @@ def main(): num_labels=len(train_dataset.classes), image_size=data_args.image_size, cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, ) else: config = CONFIG_MAPPING[model_args.model_type]() @@ -329,11 +338,18 @@ def main(): if model_args.model_name_or_path: model = FlaxAutoModelForImageClassification.from_pretrained( - model_args.model_name_or_path, config=config, seed=training_args.seed, dtype=getattr(jnp, model_args.dtype) + model_args.model_name_or_path, + config=config, + seed=training_args.seed, + dtype=getattr(jnp, model_args.dtype), + use_auth_token=True if model_args.use_auth_token else None, ) else: model = FlaxAutoModelForImageClassification.from_config( - config, seed=training_args.seed, dtype=getattr(jnp, model_args.dtype) + config, + seed=training_args.seed, + dtype=getattr(jnp, model_args.dtype), + use_auth_token=True if model_args.use_auth_token else None, ) # Store some constant diff --git a/examples/pytorch/audio-classification/run_audio_classification.py b/examples/pytorch/audio-classification/run_audio_classification.py index 14c0a026fda4..c0eb755b6a5a 100644 --- a/examples/pytorch/audio-classification/run_audio_classification.py +++ b/examples/pytorch/audio-classification/run_audio_classification.py @@ -227,10 +227,16 @@ def main(): # Initialize our dataset and prepare it for the audio classification task. raw_datasets = DatasetDict() raw_datasets["train"] = load_dataset( - data_args.dataset_name, data_args.dataset_config_name, split=data_args.train_split_name + data_args.dataset_name, + data_args.dataset_config_name, + split=data_args.train_split_name, + use_auth_token=True if model_args.use_auth_token else None, ) raw_datasets["eval"] = load_dataset( - data_args.dataset_name, data_args.dataset_config_name, split=data_args.eval_split_name + data_args.dataset_name, + data_args.dataset_config_name, + split=data_args.eval_split_name, + use_auth_token=True if model_args.use_auth_token else None, ) if data_args.audio_column_name not in raw_datasets["train"].column_names: diff --git a/examples/pytorch/contrastive-image-text/run_clip.py b/examples/pytorch/contrastive-image-text/run_clip.py index 79fd123064a1..02f20936873b 100644 --- a/examples/pytorch/contrastive-image-text/run_clip.py +++ b/examples/pytorch/contrastive-image-text/run_clip.py @@ -276,6 +276,7 @@ def main(): cache_dir=model_args.cache_dir, keep_in_memory=False, data_dir=data_args.data_dir, + use_auth_token=True if model_args.use_auth_token else None, ) else: data_files = {} @@ -288,7 +289,12 @@ def main(): if data_args.test_file is not None: data_files["test"] = data_args.test_file extension = data_args.test_file.split(".")[-1] - dataset = load_dataset(extension, data_files=data_files, cache_dir=model_args.cache_dir) + dataset = load_dataset( + extension, + data_files=data_files, + cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, + ) # See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at # https://huggingface.co/docs/datasets/loading_datasets.html. 
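The hunks above all convert the boolean flag with `True if model_args.use_auth_token else None` rather than passing it straight through: `None` when the flag is unset falls back to the libraries' default Hub behavior instead of pinning an explicit boolean. A minimal sketch of the pattern, assuming a `ModelArguments` reduced to the one field this patch adds and a public dataset standing in for a private one:

```py
from dataclasses import dataclass, field

from datasets import load_dataset


@dataclass
class ModelArguments:
    # The same field this patch adds to each example script.
    use_auth_token: bool = field(
        default=False,
        metadata={"help": "Will use the token generated when running `transformers-cli login`."},
    )


model_args = ModelArguments(use_auth_token=False)

# None (not False) when the flag is unset, so the default behavior is kept.
raw_datasets = load_dataset(
    "glue",
    "mrpc",
    use_auth_token=True if model_args.use_auth_token else None,
)
print(raw_datasets)
```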
diff --git a/examples/pytorch/image-classification/run_image_classification.py b/examples/pytorch/image-classification/run_image_classification.py index b7de0f5f7b6e..fef52c4bf5e5 100644 --- a/examples/pytorch/image-classification/run_image_classification.py +++ b/examples/pytorch/image-classification/run_image_classification.py @@ -207,6 +207,7 @@ def main(): data_files=data_args.data_files, cache_dir=model_args.cache_dir, task="image-classification", + use_auth_token=True if model_args.use_auth_token else None, ) # If we don't have a validation split, split off a percentage of train as validation. diff --git a/examples/pytorch/image-pretraining/run_mae.py b/examples/pytorch/image-pretraining/run_mae.py index 3b634d691832..e2182ec783da 100644 --- a/examples/pytorch/image-pretraining/run_mae.py +++ b/examples/pytorch/image-pretraining/run_mae.py @@ -207,6 +207,7 @@ def main(): data_args.dataset_config_name, data_files=data_args.data_files, cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, ) # If we don't have a validation split, split off a percentage of train as validation. diff --git a/examples/pytorch/image-pretraining/run_mim.py b/examples/pytorch/image-pretraining/run_mim.py index 0377a505e02d..323c38489589 100644 --- a/examples/pytorch/image-pretraining/run_mim.py +++ b/examples/pytorch/image-pretraining/run_mim.py @@ -266,6 +266,7 @@ def main(): data_args.dataset_config_name, data_files=data_args.data_files, cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, ) # If we don't have a validation split, split off a percentage of train as validation. diff --git a/examples/pytorch/language-modeling/run_clm.py b/examples/pytorch/language-modeling/run_clm.py index a1cdcf9ee4a9..3d2af72ccaf6 100755 --- a/examples/pytorch/language-modeling/run_clm.py +++ b/examples/pytorch/language-modeling/run_clm.py @@ -254,7 +254,10 @@ def main(): if data_args.dataset_name is not None: # Downloading and loading a dataset from the hub. raw_datasets = load_dataset( - data_args.dataset_name, data_args.dataset_config_name, cache_dir=model_args.cache_dir + data_args.dataset_name, + data_args.dataset_config_name, + cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, ) if "validation" not in raw_datasets.keys(): raw_datasets["validation"] = load_dataset( @@ -262,12 +265,14 @@ def main(): data_args.dataset_config_name, split=f"train[:{data_args.validation_split_percentage}%]", cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, ) raw_datasets["train"] = load_dataset( data_args.dataset_name, data_args.dataset_config_name, split=f"train[{data_args.validation_split_percentage}%:]", cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, ) else: data_files = {} @@ -284,7 +289,13 @@ def main(): if extension == "txt": extension = "text" dataset_args["keep_linebreaks"] = data_args.keep_linebreaks - raw_datasets = load_dataset(extension, data_files=data_files, cache_dir=model_args.cache_dir, **dataset_args) + raw_datasets = load_dataset( + extension, + data_files=data_files, + cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, + **dataset_args, + ) # If no validation data is there, validation_split_percentage will be used to divide the dataset. 
if "validation" not in raw_datasets.keys(): raw_datasets["validation"] = load_dataset( @@ -292,6 +303,7 @@ def main(): data_files=data_files, split=f"train[:{data_args.validation_split_percentage}%]", cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, **dataset_args, ) raw_datasets["train"] = load_dataset( @@ -299,6 +311,7 @@ def main(): data_files=data_files, split=f"train[{data_args.validation_split_percentage}%:]", cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, **dataset_args, ) diff --git a/examples/pytorch/language-modeling/run_mlm.py b/examples/pytorch/language-modeling/run_mlm.py index 6ea3c2c934d3..f829e86781f1 100755 --- a/examples/pytorch/language-modeling/run_mlm.py +++ b/examples/pytorch/language-modeling/run_mlm.py @@ -263,7 +263,10 @@ def main(): if data_args.dataset_name is not None: # Downloading and loading a dataset from the hub. raw_datasets = load_dataset( - data_args.dataset_name, data_args.dataset_config_name, cache_dir=model_args.cache_dir + data_args.dataset_name, + data_args.dataset_config_name, + cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, ) if "validation" not in raw_datasets.keys(): raw_datasets["validation"] = load_dataset( @@ -271,12 +274,14 @@ def main(): data_args.dataset_config_name, split=f"train[:{data_args.validation_split_percentage}%]", cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, ) raw_datasets["train"] = load_dataset( data_args.dataset_name, data_args.dataset_config_name, split=f"train[{data_args.validation_split_percentage}%:]", cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, ) else: data_files = {} @@ -288,7 +293,12 @@ def main(): extension = data_args.validation_file.split(".")[-1] if extension == "txt": extension = "text" - raw_datasets = load_dataset(extension, data_files=data_files, cache_dir=model_args.cache_dir) + raw_datasets = load_dataset( + extension, + data_files=data_files, + cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, + ) # If no validation data is there, validation_split_percentage will be used to divide the dataset. if "validation" not in raw_datasets.keys(): @@ -297,12 +307,14 @@ def main(): data_files=data_files, split=f"train[:{data_args.validation_split_percentage}%]", cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, ) raw_datasets["train"] = load_dataset( extension, data_files=data_files, split=f"train[{data_args.validation_split_percentage}%:]", cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, ) # See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at diff --git a/examples/pytorch/language-modeling/run_plm.py b/examples/pytorch/language-modeling/run_plm.py index d1c09896d8e7..cc4ad602329c 100755 --- a/examples/pytorch/language-modeling/run_plm.py +++ b/examples/pytorch/language-modeling/run_plm.py @@ -256,7 +256,10 @@ def main(): if data_args.dataset_name is not None: # Downloading and loading a dataset from the hub. 
raw_datasets = load_dataset( - data_args.dataset_name, data_args.dataset_config_name, cache_dir=model_args.cache_dir + data_args.dataset_name, + data_args.dataset_config_name, + cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, ) if "validation" not in raw_datasets.keys(): raw_datasets["validation"] = load_dataset( @@ -264,12 +267,14 @@ def main(): data_args.dataset_config_name, split=f"train[:{data_args.validation_split_percentage}%]", cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, ) raw_datasets["train"] = load_dataset( data_args.dataset_name, data_args.dataset_config_name, split=f"train[{data_args.validation_split_percentage}%:]", cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, ) else: data_files = {} @@ -288,12 +293,14 @@ def main(): data_files=data_files, split=f"train[:{data_args.validation_split_percentage}%]", cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, ) raw_datasets["train"] = load_dataset( extension, data_files=data_files, split=f"train[{data_args.validation_split_percentage}%:]", cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, ) # See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at diff --git a/examples/pytorch/multiple-choice/run_swag.py b/examples/pytorch/multiple-choice/run_swag.py index 01c9e8bcf7d2..4578e4570aa0 100755 --- a/examples/pytorch/multiple-choice/run_swag.py +++ b/examples/pytorch/multiple-choice/run_swag.py @@ -269,10 +269,20 @@ def main(): if data_args.validation_file is not None: data_files["validation"] = data_args.validation_file extension = data_args.train_file.split(".")[-1] - raw_datasets = load_dataset(extension, data_files=data_files, cache_dir=model_args.cache_dir) + raw_datasets = load_dataset( + extension, + data_files=data_files, + cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, + ) else: # Downloading and loading the swag dataset from the hub. - raw_datasets = load_dataset("swag", "regular", cache_dir=model_args.cache_dir) + raw_datasets = load_dataset( + "swag", + "regular", + cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, + ) # See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at # https://huggingface.co/docs/datasets/loading_datasets.html. diff --git a/examples/pytorch/question-answering/run_qa.py b/examples/pytorch/question-answering/run_qa.py index 67aaf1d84ff0..90d199b14d6d 100755 --- a/examples/pytorch/question-answering/run_qa.py +++ b/examples/pytorch/question-answering/run_qa.py @@ -262,7 +262,10 @@ def main(): if data_args.dataset_name is not None: # Downloading and loading a dataset from the hub. 
raw_datasets = load_dataset( - data_args.dataset_name, data_args.dataset_config_name, cache_dir=model_args.cache_dir + data_args.dataset_name, + data_args.dataset_config_name, + cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, ) else: data_files = {} @@ -276,7 +279,13 @@ def main(): if data_args.test_file is not None: data_files["test"] = data_args.test_file extension = data_args.test_file.split(".")[-1] - raw_datasets = load_dataset(extension, data_files=data_files, field="data", cache_dir=model_args.cache_dir) + raw_datasets = load_dataset( + extension, + data_files=data_files, + field="data", + cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, + ) # See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at # https://huggingface.co/docs/datasets/loading_datasets.html. diff --git a/examples/pytorch/question-answering/run_qa_beam_search.py b/examples/pytorch/question-answering/run_qa_beam_search.py index 4c79be08b91b..96aa07a8086b 100755 --- a/examples/pytorch/question-answering/run_qa_beam_search.py +++ b/examples/pytorch/question-answering/run_qa_beam_search.py @@ -260,7 +260,10 @@ def main(): if data_args.dataset_name is not None: # Downloading and loading a dataset from the hub. raw_datasets = load_dataset( - data_args.dataset_name, data_args.dataset_config_name, cache_dir=model_args.cache_dir + data_args.dataset_name, + data_args.dataset_config_name, + cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, ) else: data_files = {} @@ -273,7 +276,13 @@ def main(): if data_args.test_file is not None: data_files["test"] = data_args.test_file extension = data_args.test_file.split(".")[-1] - raw_datasets = load_dataset(extension, data_files=data_files, field="data", cache_dir=model_args.cache_dir) + raw_datasets = load_dataset( + extension, + data_files=data_files, + field="data", + cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, + ) # See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at # https://huggingface.co/docs/datasets/loading_datasets.html. 
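These example scripts parse their dataclasses with `HfArgumentParser`, so the `use_auth_token` field added above surfaces as a command-line flag with no further wiring. A rough sketch of that plumbing (this relies on standard `HfArgumentParser` handling of a `bool` field with `default=False`; it is not part of the diff):

```py
from dataclasses import dataclass, field

from transformers import HfArgumentParser


@dataclass
class ModelArguments:
    use_auth_token: bool = field(default=False, metadata={"help": "Use the stored login token."})


parser = HfArgumentParser(ModelArguments)
# A bool field defaulting to False parses as a switch: passing the flag means True.
(model_args,) = parser.parse_args_into_dataclasses(args=["--use_auth_token"])
assert model_args.use_auth_token is True
```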
diff --git a/examples/pytorch/speech-pretraining/run_wav2vec2_pretraining_no_trainer.py b/examples/pytorch/speech-pretraining/run_wav2vec2_pretraining_no_trainer.py index 51ac5191181e..88021a428503 100755 --- a/examples/pytorch/speech-pretraining/run_wav2vec2_pretraining_no_trainer.py +++ b/examples/pytorch/speech-pretraining/run_wav2vec2_pretraining_no_trainer.py @@ -403,7 +403,10 @@ def main(): for dataset_config_name, train_split_name in zip(args.dataset_config_names, args.dataset_split_names): # load dataset dataset_split = load_dataset( - args.dataset_name, dataset_config_name, split=train_split_name, cache_dir=args.cache_dir + args.dataset_name, + dataset_config_name, + split=train_split_name, + cache_dir=args.cache_dir, ) datasets_splits.append(dataset_split) diff --git a/examples/pytorch/speech-recognition/run_speech_recognition_seq2seq.py b/examples/pytorch/speech-recognition/run_speech_recognition_seq2seq.py index 695a5b24fd18..46d4785fa8f8 100755 --- a/examples/pytorch/speech-recognition/run_speech_recognition_seq2seq.py +++ b/examples/pytorch/speech-recognition/run_speech_recognition_seq2seq.py @@ -278,12 +278,18 @@ def main(): if training_args.do_train: raw_datasets["train"] = load_dataset( - data_args.dataset_name, data_args.dataset_config_name, split=data_args.train_split_name + data_args.dataset_name, + data_args.dataset_config_name, + split=data_args.train_split_name, + use_auth_token=True if model_args.use_auth_token else None, ) if training_args.do_eval: raw_datasets["eval"] = load_dataset( - data_args.dataset_name, data_args.dataset_config_name, split=data_args.eval_split_name + data_args.dataset_name, + data_args.dataset_config_name, + split=data_args.eval_split_name, + use_auth_token=True if model_args.use_auth_token else None, ) if data_args.audio_column_name not in next(iter(raw_datasets.values())).column_names: diff --git a/examples/pytorch/summarization/run_summarization.py b/examples/pytorch/summarization/run_summarization.py index 66aeb981bdf4..7b39cb8e48f9 100755 --- a/examples/pytorch/summarization/run_summarization.py +++ b/examples/pytorch/summarization/run_summarization.py @@ -341,7 +341,10 @@ def main(): if data_args.dataset_name is not None: # Downloading and loading a dataset from the hub. raw_datasets = load_dataset( - data_args.dataset_name, data_args.dataset_config_name, cache_dir=model_args.cache_dir + data_args.dataset_name, + data_args.dataset_config_name, + cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, ) else: data_files = {} @@ -354,7 +357,12 @@ def main(): if data_args.test_file is not None: data_files["test"] = data_args.test_file extension = data_args.test_file.split(".")[-1] - raw_datasets = load_dataset(extension, data_files=data_files, cache_dir=model_args.cache_dir) + raw_datasets = load_dataset( + extension, + data_files=data_files, + cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, + ) # See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at # https://huggingface.co/docs/datasets/loading_datasets.html. diff --git a/examples/pytorch/text-classification/run_glue.py b/examples/pytorch/text-classification/run_glue.py index 88be878faea2..a0730f609820 100755 --- a/examples/pytorch/text-classification/run_glue.py +++ b/examples/pytorch/text-classification/run_glue.py @@ -252,11 +252,19 @@ def main(): # download the dataset. 
if data_args.task_name is not None: # Downloading and loading a dataset from the hub. - raw_datasets = load_dataset("glue", data_args.task_name, cache_dir=model_args.cache_dir) + raw_datasets = load_dataset( + "glue", + data_args.task_name, + cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, + ) elif data_args.dataset_name is not None: # Downloading and loading a dataset from the hub. raw_datasets = load_dataset( - data_args.dataset_name, data_args.dataset_config_name, cache_dir=model_args.cache_dir + data_args.dataset_name, + data_args.dataset_config_name, + cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, ) else: # Loading a dataset from your local files. @@ -281,10 +289,20 @@ def main(): if data_args.train_file.endswith(".csv"): # Loading a dataset from local csv files - raw_datasets = load_dataset("csv", data_files=data_files, cache_dir=model_args.cache_dir) + raw_datasets = load_dataset( + "csv", + data_files=data_files, + cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, + ) else: # Loading a dataset from local json files - raw_datasets = load_dataset("json", data_files=data_files, cache_dir=model_args.cache_dir) + raw_datasets = load_dataset( + "json", + data_files=data_files, + cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, + ) # See more about loading any type of standard or custom dataset at # https://huggingface.co/docs/datasets/loading_datasets.html. diff --git a/examples/pytorch/text-classification/run_xnli.py b/examples/pytorch/text-classification/run_xnli.py index f54b1ec2aa60..4a17a5d702ba 100755 --- a/examples/pytorch/text-classification/run_xnli.py +++ b/examples/pytorch/text-classification/run_xnli.py @@ -213,19 +213,41 @@ def main(): # Downloading and loading xnli dataset from the hub. 
if training_args.do_train: if model_args.train_language is None: - train_dataset = load_dataset("xnli", model_args.language, split="train", cache_dir=model_args.cache_dir) + train_dataset = load_dataset( + "xnli", + model_args.language, + split="train", + cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, + ) else: train_dataset = load_dataset( - "xnli", model_args.train_language, split="train", cache_dir=model_args.cache_dir + "xnli", + model_args.train_language, + split="train", + cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, ) label_list = train_dataset.features["label"].names if training_args.do_eval: - eval_dataset = load_dataset("xnli", model_args.language, split="validation", cache_dir=model_args.cache_dir) + eval_dataset = load_dataset( + "xnli", + model_args.language, + split="validation", + cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, + ) label_list = eval_dataset.features["label"].names if training_args.do_predict: - predict_dataset = load_dataset("xnli", model_args.language, split="test", cache_dir=model_args.cache_dir) + predict_dataset = load_dataset( + "xnli", + model_args.language, + split="test", + cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, + ) label_list = predict_dataset.features["label"].names # Labels diff --git a/examples/pytorch/token-classification/run_ner.py b/examples/pytorch/token-classification/run_ner.py index 9ff64b37978c..5545b35862b3 100755 --- a/examples/pytorch/token-classification/run_ner.py +++ b/examples/pytorch/token-classification/run_ner.py @@ -249,7 +249,10 @@ def main(): if data_args.dataset_name is not None: # Downloading and loading a dataset from the hub. raw_datasets = load_dataset( - data_args.dataset_name, data_args.dataset_config_name, cache_dir=model_args.cache_dir + data_args.dataset_name, + data_args.dataset_config_name, + cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, ) else: data_files = {} diff --git a/examples/pytorch/translation/run_translation.py b/examples/pytorch/translation/run_translation.py index b458a3f0cd65..f7e98276dc7b 100755 --- a/examples/pytorch/translation/run_translation.py +++ b/examples/pytorch/translation/run_translation.py @@ -306,7 +306,10 @@ def main(): if data_args.dataset_name is not None: # Downloading and loading a dataset from the hub. raw_datasets = load_dataset( - data_args.dataset_name, data_args.dataset_config_name, cache_dir=model_args.cache_dir + data_args.dataset_name, + data_args.dataset_config_name, + cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, ) else: data_files = {} @@ -319,7 +322,12 @@ def main(): if data_args.test_file is not None: data_files["test"] = data_args.test_file extension = data_args.test_file.split(".")[-1] - raw_datasets = load_dataset(extension, data_files=data_files, cache_dir=model_args.cache_dir) + raw_datasets = load_dataset( + extension, + data_files=data_files, + cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, + ) # See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at # https://huggingface.co/docs/datasets/loading_datasets.html. 
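As the Flax hunks earlier in the patch show, the same kwarg is also threaded into `from_pretrained` for configs, tokenizers, feature extractors, and models, so an entirely private repository works end to end. A hedged sketch of that usage, where `your-org/private-model` is a placeholder checkpoint name, not a real repo:

```py
from transformers import AutoConfig, AutoModelForSequenceClassification, AutoTokenizer

use_auth_token = True  # would normally come from model_args.use_auth_token

config = AutoConfig.from_pretrained("your-org/private-model", use_auth_token=use_auth_token)
tokenizer = AutoTokenizer.from_pretrained("your-org/private-model", use_auth_token=use_auth_token)
model = AutoModelForSequenceClassification.from_pretrained(
    "your-org/private-model", config=config, use_auth_token=use_auth_token
)
```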
diff --git a/examples/tensorflow/language-modeling/run_clm.py b/examples/tensorflow/language-modeling/run_clm.py index 4cbc00b3cdc9..84e71efe50d1 100755 --- a/examples/tensorflow/language-modeling/run_clm.py +++ b/examples/tensorflow/language-modeling/run_clm.py @@ -280,17 +280,23 @@ def main(): # download the dataset. if data_args.dataset_name is not None: # Downloading and loading a dataset from the hub. - raw_datasets = load_dataset(data_args.dataset_name, data_args.dataset_config_name) + raw_datasets = load_dataset( + data_args.dataset_name, + data_args.dataset_config_name, + use_auth_token=True if model_args.use_auth_token else None, + ) if "validation" not in raw_datasets.keys(): raw_datasets["validation"] = load_dataset( data_args.dataset_name, data_args.dataset_config_name, split=f"train[:{data_args.validation_split_percentage}%]", + use_auth_token=True if model_args.use_auth_token else None, ) raw_datasets["train"] = load_dataset( data_args.dataset_name, data_args.dataset_config_name, split=f"train[{data_args.validation_split_percentage}%:]", + use_auth_token=True if model_args.use_auth_token else None, ) else: data_files = {} @@ -303,7 +309,12 @@ def main(): if extension == "txt": extension = "text" dataset_args["keep_linebreaks"] = data_args.keep_linebreaks - raw_datasets = load_dataset(extension, data_files=data_files, **dataset_args) + raw_datasets = load_dataset( + extension, + data_files=data_files, + use_auth_token=True if model_args.use_auth_token else None, + **dataset_args, + ) # See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at # https://huggingface.co/docs/datasets/loading_datasets.html. # endregion diff --git a/examples/tensorflow/language-modeling/run_mlm.py b/examples/tensorflow/language-modeling/run_mlm.py index 44c5d230318b..8b32070b2dd1 100755 --- a/examples/tensorflow/language-modeling/run_mlm.py +++ b/examples/tensorflow/language-modeling/run_mlm.py @@ -292,17 +292,23 @@ def main(): # download the dataset. if data_args.dataset_name is not None: # Downloading and loading a dataset from the hub. - raw_datasets = load_dataset(data_args.dataset_name, data_args.dataset_config_name) + raw_datasets = load_dataset( + data_args.dataset_name, + data_args.dataset_config_name, + use_auth_token=True if model_args.use_auth_token else None, + ) if "validation" not in raw_datasets.keys(): raw_datasets["validation"] = load_dataset( data_args.dataset_name, data_args.dataset_config_name, split=f"train[:{data_args.validation_split_percentage}%]", + use_auth_token=True if model_args.use_auth_token else None, ) raw_datasets["train"] = load_dataset( data_args.dataset_name, data_args.dataset_config_name, split=f"train[{data_args.validation_split_percentage}%:]", + use_auth_token=True if model_args.use_auth_token else None, ) else: data_files = {} @@ -313,7 +319,11 @@ def main(): extension = data_args.train_file.split(".")[-1] if extension == "txt": extension = "text" - raw_datasets = load_dataset(extension, data_files=data_files) + raw_datasets = load_dataset( + extension, + data_files=data_files, + use_auth_token=True if model_args.use_auth_token else None, + ) # See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at # https://huggingface.co/docs/datasets/loading_datasets.html. 
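The `run_clm.py` and `run_mlm.py` hunks above also show the recurring fallback for datasets without a validation split: slice syntax carves one out of `train`. A small self-contained sketch of those split expressions (the dataset name and percentage are illustrative):

```py
from datasets import load_dataset

validation_split_percentage = 5

# First 5% of train becomes validation, the remaining 95% stays as train.
validation = load_dataset("imdb", split=f"train[:{validation_split_percentage}%]")
train = load_dataset("imdb", split=f"train[{validation_split_percentage}%:]")
print(len(validation), len(train))
```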
diff --git a/examples/tensorflow/multiple-choice/run_swag.py b/examples/tensorflow/multiple-choice/run_swag.py index e14815cf81f3..2c78ab39fa60 100644 --- a/examples/tensorflow/multiple-choice/run_swag.py +++ b/examples/tensorflow/multiple-choice/run_swag.py @@ -290,10 +290,20 @@ def main(): if data_args.validation_file is not None: data_files["validation"] = data_args.validation_file extension = data_args.train_file.split(".")[-1] - raw_datasets = load_dataset(extension, data_files=data_files, cache_dir=model_args.cache_dir) + raw_datasets = load_dataset( + extension, + data_files=data_files, + cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, + ) else: # Downloading and loading the swag dataset from the hub. - raw_datasets = load_dataset("swag", "regular", cache_dir=model_args.cache_dir) + raw_datasets = load_dataset( + "swag", + "regular", + cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, + ) # See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at # https://huggingface.co/docs/datasets/loading_datasets.html. diff --git a/examples/tensorflow/question-answering/run_qa.py b/examples/tensorflow/question-answering/run_qa.py index 50e8c7f50d96..891219d3a1a2 100755 --- a/examples/tensorflow/question-answering/run_qa.py +++ b/examples/tensorflow/question-answering/run_qa.py @@ -278,7 +278,12 @@ def main(): # download the dataset. if data_args.dataset_name is not None: # Downloading and loading a dataset from the hub. - datasets = load_dataset(data_args.dataset_name, data_args.dataset_config_name, cache_dir=model_args.cache_dir) + datasets = load_dataset( + data_args.dataset_name, + data_args.dataset_config_name, + cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, + ) else: data_files = {} if data_args.train_file is not None: @@ -291,7 +296,13 @@ def main(): if data_args.test_file is not None: data_files["test"] = data_args.test_file extension = data_args.test_file.split(".")[-1] - datasets = load_dataset(extension, data_files=data_files, field="data", cache_dir=model_args.cache_dir) + datasets = load_dataset( + extension, + data_files=data_files, + field="data", + cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, + ) # See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at # https://huggingface.co/docs/datasets/loading_datasets.html. # endregion diff --git a/examples/tensorflow/summarization/run_summarization.py b/examples/tensorflow/summarization/run_summarization.py index e40c763530c0..09aa8f90de3d 100644 --- a/examples/tensorflow/summarization/run_summarization.py +++ b/examples/tensorflow/summarization/run_summarization.py @@ -391,7 +391,10 @@ def main(): if data_args.dataset_name is not None: # Downloading and loading a dataset from the hub. 
raw_datasets = load_dataset( - data_args.dataset_name, data_args.dataset_config_name, cache_dir=model_args.cache_dir + data_args.dataset_name, + data_args.dataset_config_name, + cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, ) else: data_files = {} @@ -404,7 +407,12 @@ def main(): if data_args.test_file is not None: data_files["test"] = data_args.test_file extension = data_args.test_file.split(".")[-1] - raw_datasets = load_dataset(extension, data_files=data_files, cache_dir=model_args.cache_dir) + raw_datasets = load_dataset( + extension, + data_files=data_files, + cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, + ) # See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at # https://huggingface.co/docs/datasets/loading_datasets.html. # endregion diff --git a/examples/tensorflow/text-classification/run_glue.py b/examples/tensorflow/text-classification/run_glue.py index 03d7df675b78..fa8cb98a5a6e 100644 --- a/examples/tensorflow/text-classification/run_glue.py +++ b/examples/tensorflow/text-classification/run_glue.py @@ -236,7 +236,12 @@ def main(): # Downloading and loading a dataset from the hub. In distributed training, the load_dataset function guarantee # that only one local process can concurrently download the dataset. - datasets = load_dataset("glue", data_args.task_name, cache_dir=model_args.cache_dir) + datasets = load_dataset( + "glue", + data_args.task_name, + cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, + ) # See more about loading any type of standard or custom dataset at # https://huggingface.co/docs/datasets/loading_datasets.html. diff --git a/examples/tensorflow/text-classification/run_text_classification.py b/examples/tensorflow/text-classification/run_text_classification.py index 114caacdbf54..3f3d64b6236d 100644 --- a/examples/tensorflow/text-classification/run_text_classification.py +++ b/examples/tensorflow/text-classification/run_text_classification.py @@ -236,7 +236,12 @@ def main(): if data_args.input_file_extension == "csv": # Loading a dataset from local csv files - datasets = load_dataset("csv", data_files=data_files, cache_dir=model_args.cache_dir) + datasets = load_dataset( + "csv", + data_files=data_files, + cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, + ) else: # Loading a dataset from local json files datasets = load_dataset("json", data_files=data_files, cache_dir=model_args.cache_dir) diff --git a/examples/tensorflow/token-classification/run_ner.py b/examples/tensorflow/token-classification/run_ner.py index acb72855666d..e580ed94b061 100644 --- a/examples/tensorflow/token-classification/run_ner.py +++ b/examples/tensorflow/token-classification/run_ner.py @@ -266,7 +266,11 @@ def main(): # download the dataset. if data_args.dataset_name is not None: # Downloading and loading a dataset from the hub. 
- raw_datasets = load_dataset(data_args.dataset_name, data_args.dataset_config_name) + raw_datasets = load_dataset( + data_args.dataset_name, + data_args.dataset_config_name, + use_auth_token=True if model_args.use_auth_token else None, + ) else: data_files = {} if data_args.train_file is not None: @@ -274,7 +278,11 @@ def main(): if data_args.validation_file is not None: data_files["validation"] = data_args.validation_file extension = data_args.train_file.split(".")[-1] - raw_datasets = load_dataset(extension, data_files=data_files) + raw_datasets = load_dataset( + extension, + data_files=data_files, + use_auth_token=True if model_args.use_auth_token else None, + ) # See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at # https://huggingface.co/docs/datasets/loading_datasets.html. diff --git a/examples/tensorflow/translation/run_translation.py b/examples/tensorflow/translation/run_translation.py index fce150b712ad..c6921bbf3c51 100644 --- a/examples/tensorflow/translation/run_translation.py +++ b/examples/tensorflow/translation/run_translation.py @@ -347,7 +347,10 @@ def main(): if data_args.dataset_name is not None: # Downloading and loading a dataset from the hub. raw_datasets = load_dataset( - data_args.dataset_name, data_args.dataset_config_name, cache_dir=model_args.cache_dir + data_args.dataset_name, + data_args.dataset_config_name, + cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, ) else: data_files = {} @@ -357,7 +360,12 @@ def main(): if data_args.validation_file is not None: data_files["validation"] = data_args.validation_file extension = data_args.validation_file.split(".")[-1] - raw_datasets = load_dataset(extension, data_files=data_files, cache_dir=model_args.cache_dir) + raw_datasets = load_dataset( + extension, + data_files=data_files, + cache_dir=model_args.cache_dir, + use_auth_token=True if model_args.use_auth_token else None, + ) # See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at # https://huggingface.co/docs/datasets/loading_datasets.html. 
# endregion From cb50ff9caeb14f45abd9ac8ccf75355f175b588e Mon Sep 17 00:00:00 2001 From: SaulLu <55560583+SaulLu@users.noreply.github.com> Date: Mon, 4 Apr 2022 16:57:24 +0200 Subject: [PATCH 21/34] add a test checking the format of `convert_tokens_to_string`'s output (#16540) * add new tests * add comment to overridden tests --- tests/byt5/test_tokenization_byt5.py | 11 +++++++++++ tests/perceiver/test_tokenization_perceiver.py | 11 +++++++++++ tests/test_tokenization_common.py | 9 +++++++++ tests/wav2vec2/test_tokenization_wav2vec2.py | 11 +++++++++++ .../test_tokenization_wav2vec2_phoneme.py | 11 +++++++++++ 5 files changed, 53 insertions(+) diff --git a/tests/byt5/test_tokenization_byt5.py b/tests/byt5/test_tokenization_byt5.py index afdcae0ee389..eb210530f0f3 100644 --- a/tests/byt5/test_tokenization_byt5.py +++ b/tests/byt5/test_tokenization_byt5.py @@ -321,3 +321,14 @@ def test_pretokenized_inputs(self): # tests all ids in vocab => vocab doesn't exist so unnecessary to test def test_conversion_reversible(self): pass + + def test_convert_tokens_to_string_format(self): + # The default common tokenizer tests use invalid tokens for ByT5 that can only accept one-character strings + # and special added tokens as tokens + tokenizers = self.get_tokenizers(fast=True, do_lower_case=True) + for tokenizer in tokenizers: + with self.subTest(f"{tokenizer.__class__.__name__}"): + tokens = ["t", "h", "i", "s", " ", "i", "s", " ", "a", " ", "t", "e", "x", "t", "</s>"] + string = tokenizer.convert_tokens_to_string(tokens) + + self.assertIsInstance(string, str) diff --git a/tests/perceiver/test_tokenization_perceiver.py b/tests/perceiver/test_tokenization_perceiver.py index 214e6aff38e9..0b6b7d4c75a8 100644 --- a/tests/perceiver/test_tokenization_perceiver.py +++ b/tests/perceiver/test_tokenization_perceiver.py @@ -286,3 +286,14 @@ def test_pretokenized_inputs(self): # tests all ids in vocab => vocab doesn't exist so unnecessary to test def test_conversion_reversible(self): pass + + def test_convert_tokens_to_string_format(self): + # The default common tokenizer tests use invalid tokens for Perceiver that can only accept one-character + # strings and special added tokens as tokens + tokenizers = self.get_tokenizers(fast=True, do_lower_case=True) + for tokenizer in tokenizers: + with self.subTest(f"{tokenizer.__class__.__name__}"): + tokens = ["[CLS]", "t", "h", "i", "s", " ", "i", "s", " ", "a", " ", "t", "e", "s", "t", "[SEP]"] + string = tokenizer.convert_tokens_to_string(tokens) + + self.assertIsInstance(string, str) diff --git a/tests/test_tokenization_common.py b/tests/test_tokenization_common.py index f260fa71fff1..2d26d76b9a08 100644 --- a/tests/test_tokenization_common.py +++ b/tests/test_tokenization_common.py @@ -3713,6 +3713,15 @@ def test_saving_tokenizer_trainer(self): trainer.save_model(os.path.join(tmp_dir, "checkpoint")) self.assertIn("tokenizer.json", os.listdir(os.path.join(tmp_dir, "checkpoint"))) + def test_convert_tokens_to_string_format(self): + tokenizers = self.get_tokenizers(fast=True, do_lower_case=True) + for tokenizer in tokenizers: + with self.subTest(f"{tokenizer.__class__.__name__}"): + tokens = ["this", "is", "a", "test"] + string = tokenizer.convert_tokens_to_string(tokens) + + self.assertIsInstance(string, str) + def test_save_slow_from_fast_and_reload_fast(self): if not self.test_slow_tokenizer or not self.test_rust_tokenizer: # we need both slow and fast versions
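The common test above pins down the expected return type, while the two Wav2Vec2 overrides below exist because those CTC tokenizers return a dict rather than a plain string. A minimal sketch of the contract being asserted, using an assumed public checkpoint:

```py
from transformers import AutoTokenizer

# "bert-base-uncased" is an assumed checkpoint for illustration; any standard
# text tokenizer is expected to satisfy the same contract.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

tokens = tokenizer.tokenize("this is a test")
string = tokenizer.convert_tokens_to_string(tokens)
assert isinstance(string, str)  # the format the new common test checks
```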
diff --git a/tests/wav2vec2/test_tokenization_wav2vec2.py b/tests/wav2vec2/test_tokenization_wav2vec2.py index 98c6f126bbfb..775b3916e7a6 100644 --- a/tests/wav2vec2/test_tokenization_wav2vec2.py +++ b/tests/wav2vec2/test_tokenization_wav2vec2.py @@ -753,3 +753,14 @@ def test_tf_encode_plus_sent_to_model(self): @unittest.skip("The tokenizer shouldn't be used to encode input IDs (except for labels), only to decode.") def test_torch_encode_plus_sent_to_model(self): pass + + def test_convert_tokens_to_string_format(self): + # The default common tokenizer tests assume that the output of `convert_tokens_to_string` is a string which + # is not the case for Wav2vec2. + tokenizers = self.get_tokenizers(fast=True, do_lower_case=True) + for tokenizer in tokenizers: + with self.subTest(f"{tokenizer.__class__.__name__}"): + tokens = ["T", "H", "I", "S", "|", "I", "S", "|", "A", "|", "T", "E", "X", "T"] + output = tokenizer.convert_tokens_to_string(tokens) + + self.assertIsInstance(output["text"], str) diff --git a/tests/wav2vec2_phoneme/test_tokenization_wav2vec2_phoneme.py b/tests/wav2vec2_phoneme/test_tokenization_wav2vec2_phoneme.py index 73f47010b777..24582cefbbd9 100644 --- a/tests/wav2vec2_phoneme/test_tokenization_wav2vec2_phoneme.py +++ b/tests/wav2vec2_phoneme/test_tokenization_wav2vec2_phoneme.py @@ -398,3 +398,14 @@ def test_tf_encode_plus_sent_to_model(self): @unittest.skip("The tokenizer shouldn't be used to encode input IDs (except for labels), only to decode.") def test_torch_encode_plus_sent_to_model(self): pass + + def test_convert_tokens_to_string_format(self): + # The default common tokenizer tests assume that the output of `convert_tokens_to_string` is a string which + # is not the case for Wav2Vec2PhonemeCTCTokenizer. + tokenizers = self.get_tokenizers(fast=True, do_lower_case=True) + for tokenizer in tokenizers: + with self.subTest(f"{tokenizer.__class__.__name__}"): + tokens = ["ð", "ɪ", "s", "ɪ", "z", "ɐ", "t", "ɛ", "k", "s", "t"] + output = tokenizer.convert_tokens_to_string(tokens) + + self.assertIsInstance(output["text"], str) From 7d6488100fe2ffdcd7726b5026382d14404c42c9 Mon Sep 17 00:00:00 2001 From: Joao Gante Date: Mon, 4 Apr 2022 16:37:33 +0100 Subject: [PATCH 22/34] TF: Finalize `unpack_inputs`-related changes (#16499) * Add unpack_inputs to remaining models * removed kwargs to `call()` in TF models * fix TF T5 tests --- src/transformers/modeling_tf_utils.py | 31 ++++++++------ .../models/albert/modeling_tf_albert.py | 8 ---- .../models/bart/modeling_tf_bart.py | 3 -- .../models/bert/modeling_tf_bert.py | 9 ----- .../blenderbot/modeling_tf_blenderbot.py | 3 -- .../modeling_tf_blenderbot_small.py | 3 -- .../models/clip/modeling_tf_clip.py | 12 ------ .../models/convbert/modeling_tf_convbert.py | 7 ---- .../models/convnext/modeling_tf_convnext.py | 3 -- .../models/ctrl/modeling_tf_ctrl.py | 4 -- .../models/deberta/modeling_tf_deberta.py | 6 --- .../deberta_v2/modeling_tf_deberta_v2.py | 6 --- .../distilbert/modeling_tf_distilbert.py | 7 ---- .../models/dpr/modeling_tf_dpr.py | 7 ---- .../models/electra/modeling_tf_electra.py | 8 ---- .../modeling_tf_encoder_decoder.py | 22 +++------- .../models/flaubert/modeling_tf_flaubert.py | 3 -- .../models/funnel/modeling_tf_funnel.py | 9 ----- .../models/gpt2/modeling_tf_gpt2.py | 5 --- .../models/gptj/modeling_tf_gptj.py | 5 --- .../models/layoutlm/modeling_tf_layoutlm.py | 5 --- .../models/led/modeling_tf_led.py | 5 +-- .../longformer/modeling_tf_longformer.py | 7 ---- .../models/lxmert/modeling_tf_lxmert.py | 3 -- .../models/marian/modeling_tf_marian.py | 3 -- .../models/mbart/modeling_tf_mbart.py | 5 +--
.../mobilebert/modeling_tf_mobilebert.py | 9 ----- .../models/mpnet/modeling_tf_mpnet.py | 6 --- .../models/openai/modeling_tf_openai.py | 5 --- .../models/pegasus/modeling_tf_pegasus.py | 3 -- .../models/rembert/modeling_tf_rembert.py | 8 ---- .../models/roberta/modeling_tf_roberta.py | 8 ---- .../models/roformer/modeling_tf_roformer.py | 8 ---- .../modeling_tf_speech_to_text.py | 2 - src/transformers/models/t5/modeling_tf_t5.py | 21 ++++++++-- .../models/tapas/modeling_tf_tapas.py | 5 --- .../transfo_xl/modeling_tf_transfo_xl.py | 4 -- .../modeling_tf_vision_encoder_decoder.py | 20 +++------- .../models/vit/modeling_tf_vit.py | 3 -- .../models/vit_mae/modeling_tf_vit_mae.py | 3 -- .../models/xlm/modeling_tf_xlm.py | 7 ---- .../models/xlnet/modeling_tf_xlnet.py | 7 ---- ...tf_{{cookiecutter.lowercase_modelname}}.py | 11 ----- tests/convbert/test_modeling_tf_convbert.py | 1 - tests/t5/test_modeling_tf_t5.py | 5 +++ tests/test_modeling_tf_common.py | 40 +++++++++++-------- 46 files changed, 78 insertions(+), 287 deletions(-) diff --git a/src/transformers/modeling_tf_utils.py b/src/transformers/modeling_tf_utils.py index a28a09425087..ee5b32886b07 100644 --- a/src/transformers/modeling_tf_utils.py +++ b/src/transformers/modeling_tf_utils.py @@ -312,10 +312,12 @@ def booleans_processing(config, **kwargs): final_booleans = {} if tf.executing_eagerly(): - # Pure conv models (such as ConvNext) do not have `output_attentions` - final_booleans["output_attentions"] = kwargs.get("output_attentions", None) - if final_booleans["output_attentions"] is None: - final_booleans["output_attentions"] = config.output_attentions + # Pure conv models (such as ConvNext) do not have `output_attentions`. If the signature has + # `output_attentions`, it will be present here in `kwargs`, even if unset (in that case, as `None`) + if "output_attentions" in kwargs: + final_booleans["output_attentions"] = ( + kwargs["output_attentions"] if kwargs["output_attentions"] is not None else config.output_attentions + ) final_booleans["output_hidden_states"] = ( kwargs["output_hidden_states"] if kwargs["output_hidden_states"] is not None @@ -330,7 +332,10 @@ def booleans_processing(config, **kwargs): kwargs["use_cache"] if kwargs["use_cache"] is not None else getattr(config, "use_cache", None) ) else: - final_booleans["output_attentions"] = config.output_attentions + # Pure conv models (such as ConvNext) do not have `output_attentions`. If the signature has + # `output_attentions`, it will be present here in `kwargs`, even if unset (in that case, as `None`) + if "output_attentions" in kwargs: + final_booleans["output_attentions"] = config.output_attentions final_booleans["output_hidden_states"] = config.output_hidden_states if kwargs.get("return_dict", None) not in (None, True): @@ -403,7 +408,7 @@ def input_processing(func, config, input_ids, **kwargs): Two lists, one for the missing layers, and another one for the unexpected layers. 
""" signature = dict(inspect.signature(func).parameters) - signature.pop("kwargs", None) + has_kwargs = bool(signature.pop("kwargs", None)) signature.pop("self", None) parameter_names = list(signature.keys()) output = {} @@ -433,12 +438,14 @@ def input_processing(func, config, input_ids, **kwargs): elif "past_key_values" in kwargs["kwargs_call"] and "past" in parameter_names: kwargs["past"] = kwargs["kwargs_call"].pop("past_key_values") - if len(kwargs["kwargs_call"]) > 0: - raise ValueError( - f"The following keyword arguments are not supported by this model: {list(kwargs['kwargs_call'].keys())}." - ) - - kwargs.pop("kwargs_call") + if has_kwargs: + output["kwargs"] = kwargs.pop("kwargs_call", {}) + else: + if len(kwargs["kwargs_call"]) > 0: + raise ValueError( + f"The following keyword arguments are not supported by this model: {list(kwargs['kwargs_call'].keys())}." + ) + kwargs.pop("kwargs_call") for k, v in kwargs.items(): if isinstance(v, allowed_types) or v is None: diff --git a/src/transformers/models/albert/modeling_tf_albert.py b/src/transformers/models/albert/modeling_tf_albert.py index 51bc5c0ae77b..ae325558cd73 100644 --- a/src/transformers/models/albert/modeling_tf_albert.py +++ b/src/transformers/models/albert/modeling_tf_albert.py @@ -551,7 +551,6 @@ def call( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: bool = False, - **kwargs, ) -> Union[TFBaseModelOutputWithPooling, Tuple[tf.Tensor]]: if input_ids is not None and inputs_embeds is not None: @@ -785,7 +784,6 @@ def call( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFBaseModelOutputWithPooling, Tuple[tf.Tensor]]: outputs = self.albert( input_ids=input_ids, @@ -854,7 +852,6 @@ def call( labels: Optional[Union[np.ndarray, tf.Tensor]] = None, sentence_order_label: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFAlbertForPreTrainingOutput, Tuple[tf.Tensor]]: r""" Return: @@ -976,7 +973,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFMaskedLMOutput, Tuple[tf.Tensor]]: r""" labels (`tf.Tensor` of shape `(batch_size, sequence_length)`, *optional*): @@ -1064,7 +1060,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFSequenceClassifierOutput, Tuple[tf.Tensor]]: r""" labels (`tf.Tensor` of shape `(batch_size,)`, *optional*): @@ -1158,7 +1153,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFTokenClassifierOutput, Tuple[tf.Tensor]]: r""" labels (`tf.Tensor` of shape `(batch_size, sequence_length)`, *optional*): @@ -1244,7 +1238,6 @@ def call( start_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, end_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFQuestionAnsweringModelOutput, Tuple[tf.Tensor]]: r""" start_positions (`tf.Tensor` of shape `(batch_size,)`, *optional*): @@ -1355,7 +1348,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFMultipleChoiceModelOutput, Tuple[tf.Tensor]]: r""" labels (`tf.Tensor` of shape 
`(batch_size,)`, *optional*): diff --git a/src/transformers/models/bart/modeling_tf_bart.py b/src/transformers/models/bart/modeling_tf_bart.py index 106a87c043c9..9cf3e04054ec 100644 --- a/src/transformers/models/bart/modeling_tf_bart.py +++ b/src/transformers/models/bart/modeling_tf_bart.py @@ -679,7 +679,6 @@ def call( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFBaseModelOutput, Tuple[tf.Tensor]]: """ Args: @@ -834,7 +833,6 @@ def call( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFBaseModelOutputWithPastAndCrossAttentions, Tuple[tf.Tensor]]: r""" Args: @@ -1273,7 +1271,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[tf.Tensor] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFSeq2SeqLMOutput, Tuple[tf.Tensor]]: r""" labels (`tf.Tensor` of shape `(batch_size, sequence_length)`, *optional*): diff --git a/src/transformers/models/bert/modeling_tf_bert.py b/src/transformers/models/bert/modeling_tf_bert.py index 6dfae3d5fb60..5e8775ab0deb 100644 --- a/src/transformers/models/bert/modeling_tf_bert.py +++ b/src/transformers/models/bert/modeling_tf_bert.py @@ -737,7 +737,6 @@ def call( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: bool = False, - **kwargs, ) -> Union[TFBaseModelOutputWithPoolingAndCrossAttentions, Tuple[tf.Tensor]]: if not self.config.is_decoder: @@ -1067,7 +1066,6 @@ def call( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFBaseModelOutputWithPoolingAndCrossAttentions, Tuple[tf.Tensor]]: r""" encoder_hidden_states (`tf.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*): @@ -1174,7 +1172,6 @@ def call( labels: Optional[Union[np.ndarray, tf.Tensor]] = None, next_sentence_label: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFBertForPreTrainingOutput, Tuple[tf.Tensor]]: r""" labels (`tf.Tensor` of shape `(batch_size, sequence_length)`, *optional*): @@ -1302,7 +1299,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFMaskedLMOutput, Tuple[tf.Tensor]]: r""" labels (`tf.Tensor` or `np.ndarray` of shape `(batch_size, sequence_length)`, *optional*): @@ -1520,7 +1516,6 @@ def call( return_dict: Optional[bool] = None, next_sentence_label: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFNextSentencePredictorOutput, Tuple[tf.Tensor]]: r""" Return: @@ -1628,7 +1623,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFSequenceClassifierOutput, Tuple[tf.Tensor]]: r""" labels (`tf.Tensor` or `np.ndarray` of shape `(batch_size,)`, *optional*): @@ -1723,7 +1717,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFMultipleChoiceModelOutput, Tuple[tf.Tensor]]: r""" labels (`tf.Tensor` or `np.ndarray` of shape `(batch_size,)`, *optional*): @@ -1857,7 +1850,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: 
Optional[bool] = False, - **kwargs, ) -> Union[TFTokenClassifierOutput, Tuple[tf.Tensor]]: r""" labels (`tf.Tensor` or `np.ndarray` of shape `(batch_size, sequence_length)`, *optional*): @@ -1949,7 +1941,6 @@ def call( start_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, end_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFQuestionAnsweringModelOutput, Tuple[tf.Tensor]]: r""" start_positions (`tf.Tensor` or `np.ndarray` of shape `(batch_size,)`, *optional*): diff --git a/src/transformers/models/blenderbot/modeling_tf_blenderbot.py b/src/transformers/models/blenderbot/modeling_tf_blenderbot.py index 80236fab0211..4225f8e14e58 100644 --- a/src/transformers/models/blenderbot/modeling_tf_blenderbot.py +++ b/src/transformers/models/blenderbot/modeling_tf_blenderbot.py @@ -662,7 +662,6 @@ def call( output_hidden_states=None, return_dict=None, training=False, - **kwargs, ): """ Args: @@ -823,7 +822,6 @@ def call( output_hidden_states=None, return_dict=None, training=False, - **kwargs, ): r""" Args: @@ -1276,7 +1274,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[tf.Tensor] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[Tuple[tf.Tensor], TFSeq2SeqLMOutput]: r""" labels (`tf.tensor` of shape `(batch_size, sequence_length)`, *optional*): diff --git a/src/transformers/models/blenderbot_small/modeling_tf_blenderbot_small.py b/src/transformers/models/blenderbot_small/modeling_tf_blenderbot_small.py index af575e6418b7..2d7fe2af6137 100644 --- a/src/transformers/models/blenderbot_small/modeling_tf_blenderbot_small.py +++ b/src/transformers/models/blenderbot_small/modeling_tf_blenderbot_small.py @@ -667,7 +667,6 @@ def call( output_hidden_states=None, return_dict=None, training=False, - **kwargs, ): """ Args: @@ -827,7 +826,6 @@ def call( output_hidden_states=None, return_dict=None, training=False, - **kwargs, ): r""" Args: @@ -1253,7 +1251,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[tf.Tensor] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[Tuple[tf.Tensor], TFSeq2SeqLMOutput]: r""" labels (`tf.tensor` of shape `(batch_size, sequence_length)`, *optional*): diff --git a/src/transformers/models/clip/modeling_tf_clip.py b/src/transformers/models/clip/modeling_tf_clip.py index f8192ac7aa05..366d0a9eb1dd 100644 --- a/src/transformers/models/clip/modeling_tf_clip.py +++ b/src/transformers/models/clip/modeling_tf_clip.py @@ -504,7 +504,6 @@ def call( output_hidden_states: bool, return_dict: bool, training: bool = False, - **kwargs, ) -> Union[TFBaseModelOutputWithPooling, Tuple[tf.Tensor]]: input_shape = shape_list(input_ids) @@ -593,7 +592,6 @@ def call( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: bool = False, - **kwargs, ) -> Union[TFBaseModelOutputWithPooling, Tuple[tf.Tensor]]: if input_ids is None: raise ValueError("You have to specify input_ids") @@ -632,7 +630,6 @@ def call( output_hidden_states: bool, return_dict: bool, training: bool = False, - **kwargs, ) -> Union[TFBaseModelOutputWithPooling, Tuple[tf.Tensor]]: embedding_output = self.embeddings(pixel_values=pixel_values) @@ -683,7 +680,6 @@ def call( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: bool = False, - **kwargs, ) -> Union[TFBaseModelOutputWithPooling, Tuple[tf.Tensor]]: if pixel_values is None: @@ -762,7 +758,6 @@ def get_text_features( output_hidden_states: Optional[bool] = None, return_dict: 
Optional[bool] = None, training: bool = False, - **kwargs, ) -> tf.Tensor: if input_ids is None: @@ -796,7 +791,6 @@ def get_image_features( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: bool = False, - **kwargs, ) -> tf.Tensor: if pixel_values is None: raise ValueError("You have to specify pixel_values") @@ -826,7 +820,6 @@ def call( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: bool = False, - **kwargs, ) -> Union[TFCLIPOutput, Tuple[tf.Tensor]]: if input_ids is None: @@ -1058,7 +1051,6 @@ def call( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFBaseModelOutputWithPooling, Tuple[tf.Tensor]]: r""" Returns: @@ -1153,7 +1145,6 @@ def call( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFBaseModelOutputWithPooling, Tuple[tf.Tensor]]: r""" Returns: @@ -1258,7 +1249,6 @@ def get_text_features( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: bool = False, - **kwargs, ) -> tf.Tensor: r""" Returns: @@ -1297,7 +1287,6 @@ def get_image_features( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: bool = False, - **kwargs, ) -> tf.Tensor: r""" Returns: @@ -1345,7 +1334,6 @@ def call( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: bool = False, - **kwargs, ) -> Union[TFCLIPOutput, Tuple[tf.Tensor]]: r""" Returns: diff --git a/src/transformers/models/convbert/modeling_tf_convbert.py b/src/transformers/models/convbert/modeling_tf_convbert.py index f167325527b6..8ec1b18ae748 100644 --- a/src/transformers/models/convbert/modeling_tf_convbert.py +++ b/src/transformers/models/convbert/modeling_tf_convbert.py @@ -581,7 +581,6 @@ def call( output_hidden_states=None, return_dict=None, training=False, - **kwargs, ): if input_ids is not None and inputs_embeds is not None: raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time") @@ -751,7 +750,6 @@ def call( output_hidden_states=None, return_dict=None, training=False, - **kwargs, ): outputs = self.convbert( input_ids=input_ids, @@ -870,7 +868,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[tf.Tensor] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[Tuple, TFMaskedLMOutput]: r""" labels (`tf.Tensor` of shape `(batch_size, sequence_length)`, *optional*): @@ -979,7 +976,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[tf.Tensor] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[Tuple, TFSequenceClassifierOutput]: r""" labels (`tf.Tensor` of shape `(batch_size,)`, *optional*): @@ -1073,7 +1069,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[tf.Tensor] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[Tuple, TFMultipleChoiceModelOutput]: r""" labels (`tf.Tensor` of shape `(batch_size,)`, *optional*): @@ -1188,7 +1183,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[tf.Tensor] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[Tuple, TFTokenClassifierOutput]: r""" labels (`tf.Tensor` of shape `(batch_size, sequence_length)`, *optional*): @@ -1268,7 +1262,6 @@ def call( start_positions: Optional[tf.Tensor] = None, end_positions: Optional[tf.Tensor] = None, training: Optional[bool] = False, - **kwargs, ) -> 
Union[Tuple, TFQuestionAnsweringModelOutput]: r""" start_positions (`tf.Tensor` of shape `(batch_size,)`, *optional*): diff --git a/src/transformers/models/convnext/modeling_tf_convnext.py b/src/transformers/models/convnext/modeling_tf_convnext.py index b952b6775248..1cb1b71b6130 100644 --- a/src/transformers/models/convnext/modeling_tf_convnext.py +++ b/src/transformers/models/convnext/modeling_tf_convnext.py @@ -293,7 +293,6 @@ def call( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: bool = False, - **kwargs, ) -> Union[TFBaseModelOutputWithPooling, Tuple[tf.Tensor]]: output_hidden_states = ( output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states @@ -439,7 +438,6 @@ def call( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: bool = False, - **kwargs, ) -> Union[TFBaseModelOutputWithPooling, Tuple[tf.Tensor]]: r""" Returns: @@ -518,7 +516,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFSequenceClassifierOutput, Tuple[tf.Tensor]]: r""" labels (`tf.Tensor` or `np.ndarray` of shape `(batch_size,)`, *optional*): diff --git a/src/transformers/models/ctrl/modeling_tf_ctrl.py b/src/transformers/models/ctrl/modeling_tf_ctrl.py index 89d3ef561141..2a58467119ae 100644 --- a/src/transformers/models/ctrl/modeling_tf_ctrl.py +++ b/src/transformers/models/ctrl/modeling_tf_ctrl.py @@ -268,7 +268,6 @@ def call( output_hidden_states=None, return_dict=None, training=False, - **kwargs, ): # If using past key value states, only the last tokens @@ -541,7 +540,6 @@ def call( output_hidden_states=None, return_dict=None, training=False, - **kwargs, ): outputs = self.transformer( input_ids=input_ids, @@ -653,7 +651,6 @@ def call( return_dict=None, labels=None, training=False, - **kwargs, ): r""" labels (`tf.Tensor` of shape `(batch_size, sequence_length)`, *optional*): @@ -765,7 +762,6 @@ def call( return_dict=None, labels=None, training=False, - **kwargs, ): r""" labels (`tf.Tensor` of shape `(batch_size, sequence_length)`, *optional*): diff --git a/src/transformers/models/deberta/modeling_tf_deberta.py b/src/transformers/models/deberta/modeling_tf_deberta.py index c97b676596fb..90ec5ca2c89e 100644 --- a/src/transformers/models/deberta/modeling_tf_deberta.py +++ b/src/transformers/models/deberta/modeling_tf_deberta.py @@ -928,7 +928,6 @@ def call( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: bool = False, - **kwargs, ) -> Union[TFBaseModelOutput, Tuple[tf.Tensor]]: if input_ids is not None and inputs_embeds is not None: @@ -1096,7 +1095,6 @@ def call( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFBaseModelOutput, Tuple[tf.Tensor]]: outputs = self.deberta( input_ids=input_ids, @@ -1156,7 +1154,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFMaskedLMOutput, Tuple[tf.Tensor]]: r""" labels (`tf.Tensor` or `np.ndarray` of shape `(batch_size, sequence_length)`, *optional*): @@ -1242,7 +1239,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFSequenceClassifierOutput, Tuple[tf.Tensor]]: r""" labels (`tf.Tensor` or 
`np.ndarray` of shape `(batch_size,)`, *optional*): @@ -1325,7 +1321,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFTokenClassifierOutput, Tuple[tf.Tensor]]: r""" labels (`tf.Tensor` or `np.ndarray` of shape `(batch_size, sequence_length)`, *optional*): @@ -1404,7 +1399,6 @@ def call( start_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, end_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFQuestionAnsweringModelOutput, Tuple[tf.Tensor]]: r""" start_positions (`tf.Tensor` or `np.ndarray` of shape `(batch_size,)`, *optional*): diff --git a/src/transformers/models/deberta_v2/modeling_tf_deberta_v2.py b/src/transformers/models/deberta_v2/modeling_tf_deberta_v2.py index 0a77a6057d9d..39cf57a146f6 100644 --- a/src/transformers/models/deberta_v2/modeling_tf_deberta_v2.py +++ b/src/transformers/models/deberta_v2/modeling_tf_deberta_v2.py @@ -1028,7 +1028,6 @@ def call( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: bool = False, - **kwargs, ) -> Union[TFBaseModelOutput, Tuple[tf.Tensor]]: if input_ids is not None and inputs_embeds is not None: @@ -1198,7 +1197,6 @@ def call( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFBaseModelOutput, Tuple[tf.Tensor]]: outputs = self.deberta( input_ids=input_ids, @@ -1259,7 +1257,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFMaskedLMOutput, Tuple[tf.Tensor]]: r""" labels (`tf.Tensor` or `np.ndarray` of shape `(batch_size, sequence_length)`, *optional*): @@ -1346,7 +1343,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFSequenceClassifierOutput, Tuple[tf.Tensor]]: r""" labels (`tf.Tensor` or `np.ndarray` of shape `(batch_size,)`, *optional*): @@ -1430,7 +1426,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFTokenClassifierOutput, Tuple[tf.Tensor]]: r""" labels (`tf.Tensor` or `np.ndarray` of shape `(batch_size, sequence_length)`, *optional*): @@ -1510,7 +1505,6 @@ def call( start_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, end_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFQuestionAnsweringModelOutput, Tuple[tf.Tensor]]: r""" start_positions (`tf.Tensor` or `np.ndarray` of shape `(batch_size,)`, *optional*): diff --git a/src/transformers/models/distilbert/modeling_tf_distilbert.py b/src/transformers/models/distilbert/modeling_tf_distilbert.py index ccae454ebe05..07aeee9e1f97 100644 --- a/src/transformers/models/distilbert/modeling_tf_distilbert.py +++ b/src/transformers/models/distilbert/modeling_tf_distilbert.py @@ -372,7 +372,6 @@ def call( output_hidden_states=None, return_dict=None, training=False, - **kwargs, ): if input_ids is not None and inputs_embeds is not None: raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time") @@ -543,7 +542,6 @@ def call( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: Optional[bool] = False, - 
**kwargs, ) -> Union[TFBaseModelOutput, Tuple[tf.Tensor]]: outputs = self.distilbert( input_ids=input_ids, @@ -647,7 +645,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFMaskedLMOutput, Tuple[tf.Tensor]]: r""" labels (`tf.Tensor` of shape `(batch_size, sequence_length)`, *optional*): @@ -735,7 +732,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFSequenceClassifierOutput, Tuple[tf.Tensor]]: r""" labels (`tf.Tensor` of shape `(batch_size,)`, *optional*): @@ -817,7 +813,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFTokenClassifierOutput, Tuple[tf.Tensor]]: r""" labels (`tf.Tensor` of shape `(batch_size, sequence_length)`, *optional*): @@ -911,7 +906,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFMultipleChoiceModelOutput, Tuple[tf.Tensor]]: r""" labels (`tf.Tensor` of shape `(batch_size,)`, *optional*): @@ -1021,7 +1015,6 @@ def call( start_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, end_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFQuestionAnsweringModelOutput, Tuple[tf.Tensor]]: r""" start_positions (`tf.Tensor` of shape `(batch_size,)`, *optional*): diff --git a/src/transformers/models/dpr/modeling_tf_dpr.py b/src/transformers/models/dpr/modeling_tf_dpr.py index f2b1a1606e4d..df290f6f5d72 100644 --- a/src/transformers/models/dpr/modeling_tf_dpr.py +++ b/src/transformers/models/dpr/modeling_tf_dpr.py @@ -174,7 +174,6 @@ def call( output_hidden_states: bool = None, return_dict: bool = None, training: bool = False, - **kwargs, ) -> Union[TFBaseModelOutputWithPooling, Tuple[tf.Tensor, ...]]: outputs = self.bert_model( input_ids=input_ids, @@ -235,7 +234,6 @@ def call( output_hidden_states: bool = False, return_dict: bool = False, training: bool = False, - **kwargs, ) -> Union[TFDPRReaderOutput, Tuple[tf.Tensor, ...]]: # notations: N - number of questions in a batch, M - number of passages per questions, L - sequence length n_passages, sequence_length = shape_list(input_ids) if input_ids is not None else shape_list(inputs_embeds)[:2] @@ -294,7 +292,6 @@ def call( output_hidden_states: bool = False, return_dict: bool = False, training: bool = False, - **kwargs, ) -> Union[TFDPRReaderOutput, Tuple[tf.Tensor, ...]]: outputs = self.encoder( input_ids=input_ids, @@ -328,7 +325,6 @@ def call( output_hidden_states: bool = False, return_dict: bool = False, training: bool = False, - **kwargs, ) -> Union[TFDPRReaderOutput, Tuple[tf.Tensor, ...]]: outputs = self.encoder( input_ids=input_ids, @@ -560,7 +556,6 @@ def call( output_hidden_states=None, return_dict=None, training: bool = False, - **kwargs, ) -> Union[TFDPRContextEncoderOutput, Tuple[tf.Tensor, ...]]: r""" Return: @@ -648,7 +643,6 @@ def call( output_hidden_states=None, return_dict=None, training: bool = False, - **kwargs, ) -> Union[TFDPRQuestionEncoderOutput, Tuple[tf.Tensor, ...]]: r""" Return: @@ -734,7 +728,6 @@ def call( output_hidden_states: bool = None, return_dict=None, training: bool = False, - **kwargs, ) -> Union[TFDPRReaderOutput, Tuple[tf.Tensor, ...]]: r""" Return: diff --git 
a/src/transformers/models/electra/modeling_tf_electra.py b/src/transformers/models/electra/modeling_tf_electra.py index 9cbbd4b7e1e5..eccb321f1005 100644 --- a/src/transformers/models/electra/modeling_tf_electra.py +++ b/src/transformers/models/electra/modeling_tf_electra.py @@ -719,7 +719,6 @@ def call( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFBaseModelOutputWithPastAndCrossAttentions, Tuple[tf.Tensor]]: if not self.config.is_decoder: use_cache = False @@ -953,7 +952,6 @@ def call( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFBaseModelOutputWithPastAndCrossAttentions, Tuple[tf.Tensor]]: r""" encoder_hidden_states (`tf.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*): @@ -1043,7 +1041,6 @@ def call( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFElectraForPreTrainingOutput, Tuple[tf.Tensor]]: r""" Returns: @@ -1180,7 +1177,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFMaskedLMOutput, Tuple[tf.Tensor]]: r""" labels (`tf.Tensor` of shape `(batch_size, sequence_length)`, *optional*): @@ -1290,7 +1286,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFSequenceClassifierOutput, Tuple[tf.Tensor]]: r""" labels (`tf.Tensor` of shape `(batch_size,)`, *optional*): @@ -1383,7 +1378,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFMultipleChoiceModelOutput, Tuple[tf.Tensor]]: r""" labels (`tf.Tensor` of shape `(batch_size,)`, *optional*): @@ -1501,7 +1495,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFTokenClassifierOutput, Tuple[tf.Tensor]]: r""" labels (`tf.Tensor` of shape `(batch_size, sequence_length)`, *optional*): @@ -1583,7 +1576,6 @@ def call( start_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, end_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFQuestionAnsweringModelOutput, Tuple[tf.Tensor]]: r""" start_positions (`tf.Tensor` of shape `(batch_size,)`, *optional*): diff --git a/src/transformers/models/encoder_decoder/modeling_tf_encoder_decoder.py b/src/transformers/models/encoder_decoder/modeling_tf_encoder_decoder.py index 1c59493e1bf7..9e92e767b1b8 100644 --- a/src/transformers/models/encoder_decoder/modeling_tf_encoder_decoder.py +++ b/src/transformers/models/encoder_decoder/modeling_tf_encoder_decoder.py @@ -23,7 +23,7 @@ from ...configuration_utils import PretrainedConfig from ...modeling_tf_outputs import TFBaseModelOutput, TFSeq2SeqLMOutput -from ...modeling_tf_utils import TFCausalLanguageModelingLoss, TFPreTrainedModel, get_initializer, input_processing +from ...modeling_tf_utils import TFCausalLanguageModelingLoss, TFPreTrainedModel, get_initializer, unpack_inputs from ...tf_utils import shape_list from ...utils import ( DUMMY_INPUTS, @@ -491,6 +491,7 @@ def from_encoder_decoder_pretrained( config = 
EncoderDecoderConfig.from_encoder_decoder_configs(encoder.config, decoder.config, **kwargs) return cls(encoder=encoder, decoder=decoder, config=config) + @unpack_inputs @add_start_docstrings_to_model_forward(ENCODER_DECODER_INPUTS_DOCSTRING.format("batch_size, sequence_length")) @replace_return_docstrings(output_type=TFSeq2SeqLMOutput, config_class=_CONFIG_FOR_DOC) def call( @@ -559,9 +560,7 @@ def call( if encoder_outputs is None: - encoder_processing_inputs = { - "func": self.encoder.call, - "config": self.encoder.config, + encoder_inputs = { "input_ids": input_ids, "attention_mask": attention_mask, "inputs_embeds": inputs_embeds, @@ -569,14 +568,10 @@ def call( "output_hidden_states": output_hidden_states, "return_dict": return_dict, "training": training, - "kwargs_call": {}, } # Add arguments to encoder from `kwargs_encoder` - for k, v in kwargs_encoder.items(): - encoder_processing_inputs[k] = v - - encoder_inputs = input_processing(**encoder_processing_inputs) + encoder_inputs.update(kwargs_encoder) # Handle the case where the inputs are passed as a single dict which contains `labels`. # The `labels` shouldn't be passed to `self.encoder` below, because it is a based model without this @@ -607,9 +602,7 @@ def call( labels, self.config.pad_token_id, self.config.decoder_start_token_id ) - decoder_processing_inputs = { - "func": self.decoder.call, - "config": self.decoder.config, + decoder_inputs = { "input_ids": decoder_input_ids, "attention_mask": decoder_attention_mask, "encoder_hidden_states": encoder_hidden_states, @@ -621,14 +614,11 @@ def call( "past_key_values": past_key_values, "return_dict": return_dict, "training": training, - "kwargs_call": {}, } # Add arguments to decoder from `kwargs_decoder` - for k, v in kwargs_decoder.items(): - decoder_processing_inputs[k] = v + decoder_inputs.update(kwargs_decoder) - decoder_inputs = input_processing(**decoder_processing_inputs) decoder_outputs = self.decoder(**decoder_inputs) logits = decoder_outputs[0] diff --git a/src/transformers/models/flaubert/modeling_tf_flaubert.py b/src/transformers/models/flaubert/modeling_tf_flaubert.py index 8441e1801730..f751c0f22502 100644 --- a/src/transformers/models/flaubert/modeling_tf_flaubert.py +++ b/src/transformers/models/flaubert/modeling_tf_flaubert.py @@ -258,7 +258,6 @@ def call( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[Tuple, TFBaseModelOutput]: outputs = self.transformer( input_ids=input_ids, @@ -490,7 +489,6 @@ def call( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[Tuple, TFBaseModelOutput]: # removed: src_enc=None, src_len=None @@ -808,7 +806,6 @@ def call( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[Tuple, TFFlaubertWithLMHeadModelOutput]: transformer_outputs = self.transformer( diff --git a/src/transformers/models/funnel/modeling_tf_funnel.py b/src/transformers/models/funnel/modeling_tf_funnel.py index 56e6bf13b494..c1ddef0ad9cd 100644 --- a/src/transformers/models/funnel/modeling_tf_funnel.py +++ b/src/transformers/models/funnel/modeling_tf_funnel.py @@ -761,7 +761,6 @@ def call( output_hidden_states=None, return_dict=None, training=False, - **kwargs, ): if input_ids is not None and inputs_embeds is not None: @@ -835,7 +834,6 @@ def call( output_hidden_states=None, return_dict=None, training=False, - 
**kwargs, ): if input_ids is not None and inputs_embeds is not None: raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time") @@ -1117,7 +1115,6 @@ def call( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: bool = False, - **kwargs, ) -> Union[Tuple[tf.Tensor], TFBaseModelOutput]: return self.funnel( input_ids=input_ids, @@ -1165,7 +1162,6 @@ def call( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: bool = False, - **kwargs, ) -> Union[Tuple[tf.Tensor], TFBaseModelOutput]: return self.funnel( @@ -1293,7 +1289,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: bool = False, - **kwargs, ) -> Union[Tuple[tf.Tensor], TFMaskedLMOutput]: r""" labels (`tf.Tensor` of shape `(batch_size, sequence_length)`, *optional*): @@ -1369,7 +1364,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: bool = False, - **kwargs, ) -> Union[Tuple[tf.Tensor], TFSequenceClassifierOutput]: r""" labels (`tf.Tensor` of shape `(batch_size,)`, *optional*): @@ -1455,7 +1449,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: bool = False, - **kwargs, ) -> Union[Tuple[tf.Tensor], TFMultipleChoiceModelOutput]: r""" labels (`tf.Tensor` of shape `(batch_size,)`, *optional*): @@ -1566,7 +1559,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: bool = False, - **kwargs, ) -> Union[Tuple[tf.Tensor], TFTokenClassifierOutput]: r""" labels (`tf.Tensor` of shape `(batch_size, sequence_length)`, *optional*): @@ -1645,7 +1637,6 @@ def call( start_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, end_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, training: bool = False, - **kwargs, ) -> Union[Tuple[tf.Tensor], TFQuestionAnsweringModelOutput]: r""" start_positions (`tf.Tensor` of shape `(batch_size,)`, *optional*): diff --git a/src/transformers/models/gpt2/modeling_tf_gpt2.py b/src/transformers/models/gpt2/modeling_tf_gpt2.py index 88b4fb5ed607..8a35208b52e8 100644 --- a/src/transformers/models/gpt2/modeling_tf_gpt2.py +++ b/src/transformers/models/gpt2/modeling_tf_gpt2.py @@ -367,7 +367,6 @@ def call( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFBaseModelOutputWithPastAndCrossAttentions, Tuple[tf.Tensor]]: if input_ids is not None and inputs_embeds is not None: @@ -730,7 +729,6 @@ def call( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFBaseModelOutputWithPastAndCrossAttentions, Tuple[tf.Tensor]]: r""" encoder_hidden_states (`tf.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*): @@ -920,7 +918,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFCausalLMOutputWithCrossAttentions, Tuple[tf.Tensor]]: r""" encoder_hidden_states (`tf.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*): @@ -1038,7 +1035,6 @@ def call( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFGPT2DoubleHeadsModelOutput, Tuple[tf.Tensor]]: r""" 
mc_token_ids (`tf.Tensor` or `Numpy array` of shape `(batch_size, num_choices)`, *optional*, default to index of the last token of the input): @@ -1195,7 +1191,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFSequenceClassifierOutputWithPast, Tuple[tf.Tensor]]: r""" labels (`tf.Tensor` of shape `(batch_size, sequence_length)`, *optional*): diff --git a/src/transformers/models/gptj/modeling_tf_gptj.py b/src/transformers/models/gptj/modeling_tf_gptj.py index ce5c5d78e5ae..702b163f4719 100644 --- a/src/transformers/models/gptj/modeling_tf_gptj.py +++ b/src/transformers/models/gptj/modeling_tf_gptj.py @@ -390,7 +390,6 @@ def call( output_hidden_states=None, return_dict=None, training=False, - **kwargs, ): if input_ids is not None and inputs_embeds is not None: @@ -672,7 +671,6 @@ def call( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: Optional[bool] = False, - **kwargs, ): r""" use_cache (`bool`, *optional*, defaults to `True`): @@ -781,7 +779,6 @@ def call( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: Optional[bool] = False, - **kwargs, ): r""" labels (`np.ndarray` or `tf.Tensor` of shape `(batch_size, sequence_length)`, *optional*): @@ -886,7 +883,6 @@ def call( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: Optional[bool] = False, - **kwargs, ): r""" labels (`np.ndarray` or `tf.Tensor` of shape `(batch_size,)`, *optional*): @@ -1011,7 +1007,6 @@ def call( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: Optional[bool] = False, - **kwargs, ): r""" start_positions (`np.ndarray` or `tf.Tensor` of shape `(batch_size,)`, *optional*): diff --git a/src/transformers/models/layoutlm/modeling_tf_layoutlm.py b/src/transformers/models/layoutlm/modeling_tf_layoutlm.py index e6fd771d37e2..86b2fc5a38ae 100644 --- a/src/transformers/models/layoutlm/modeling_tf_layoutlm.py +++ b/src/transformers/models/layoutlm/modeling_tf_layoutlm.py @@ -706,7 +706,6 @@ def call( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: bool = False, - **kwargs, ) -> Union[TFBaseModelOutputWithPoolingAndCrossAttentions, Tuple[tf.Tensor]]: if input_ids is not None and inputs_embeds is not None: @@ -928,7 +927,6 @@ def call( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFBaseModelOutputWithPoolingAndCrossAttentions, Tuple[tf.Tensor]]: r""" Returns: @@ -1048,7 +1046,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFMaskedLMOutput, Tuple[tf.Tensor]]: r""" labels (`tf.Tensor` or `np.ndarray` of shape `(batch_size, sequence_length)`, *optional*): @@ -1172,7 +1169,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFSequenceClassifierOutput, Tuple[tf.Tensor]]: r""" labels (`tf.Tensor` or `np.ndarray` of shape `(batch_size,)`, *optional*): @@ -1303,7 +1299,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFTokenClassifierOutput, Tuple[tf.Tensor]]: r""" labels (`tf.Tensor` or 
`np.ndarray` of shape `(batch_size, sequence_length)`, *optional*): diff --git a/src/transformers/models/led/modeling_tf_led.py b/src/transformers/models/led/modeling_tf_led.py index 4519f5df9808..8381d81afb4d 100644 --- a/src/transformers/models/led/modeling_tf_led.py +++ b/src/transformers/models/led/modeling_tf_led.py @@ -1666,7 +1666,6 @@ def call( output_hidden_states=None, return_dict=None, training=False, - **kwargs, ): """ Args: @@ -1911,7 +1910,6 @@ def call( output_hidden_states=None, return_dict=None, training=False, - **kwargs, ): r""" Args: @@ -2333,7 +2331,6 @@ def call( return_dict=None, labels=None, training=False, - **kwargs, ): """ Returns: @@ -2429,7 +2426,7 @@ def prepare_inputs_for_generation( decoder_head_mask=None, use_cache=None, encoder_outputs=None, - **kwargs, + **kwargs ): # cut decoder_input_ids if past is used if past is not None: diff --git a/src/transformers/models/longformer/modeling_tf_longformer.py b/src/transformers/models/longformer/modeling_tf_longformer.py index 762f872ee709..850a8113f6ad 100644 --- a/src/transformers/models/longformer/modeling_tf_longformer.py +++ b/src/transformers/models/longformer/modeling_tf_longformer.py @@ -1676,7 +1676,6 @@ def call( output_hidden_states=None, return_dict=None, training=False, - **kwargs, ): if input_ids is not None and inputs_embeds is not None: @@ -2023,7 +2022,6 @@ def call( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFLongformerBaseModelOutputWithPooling, Tuple[tf.Tensor]]: outputs = self.longformer( @@ -2100,7 +2098,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFLongformerMaskedLMOutput, Tuple[tf.Tensor]]: r""" labels (`tf.Tensor` of shape `(batch_size, sequence_length)`, *optional*): @@ -2194,7 +2191,6 @@ def call( start_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, end_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFLongformerQuestionAnsweringModelOutput, Tuple[tf.Tensor]]: r""" start_positions (`tf.Tensor` of shape `(batch_size,)`, *optional*): @@ -2340,7 +2336,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFLongformerSequenceClassifierOutput, Tuple[tf.Tensor]]: if global_attention_mask is None and input_ids is not None: @@ -2450,7 +2445,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFLongformerMultipleChoiceModelOutput, Tuple[tf.Tensor]]: r""" labels (`tf.Tensor` of shape `(batch_size,)`, *optional*): @@ -2580,7 +2574,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.array, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFLongformerTokenClassifierOutput, Tuple[tf.Tensor]]: r""" labels (`tf.Tensor` of shape `(batch_size, sequence_length)`, *optional*): diff --git a/src/transformers/models/lxmert/modeling_tf_lxmert.py b/src/transformers/models/lxmert/modeling_tf_lxmert.py index efa812a59654..2101b7cf1f54 100644 --- a/src/transformers/models/lxmert/modeling_tf_lxmert.py +++ b/src/transformers/models/lxmert/modeling_tf_lxmert.py @@ -685,7 +685,6 @@ def call( output_hidden_states=None, return_dict=None, training=False, - **kwargs, 
): if input_ids is not None and inputs_embeds is not None: @@ -946,7 +945,6 @@ def call( output_hidden_states=None, return_dict=None, training=False, - **kwargs, ): outputs = self.lxmert( input_ids, @@ -1282,7 +1280,6 @@ def call( output_hidden_states=None, return_dict=None, training=False, - **kwargs, ): r""" masked_lm_labels (`tf.Tensor` of shape `(batch_size, sequence_length)`, *optional*): diff --git a/src/transformers/models/marian/modeling_tf_marian.py b/src/transformers/models/marian/modeling_tf_marian.py index aa766e681544..a696a5648fe4 100644 --- a/src/transformers/models/marian/modeling_tf_marian.py +++ b/src/transformers/models/marian/modeling_tf_marian.py @@ -707,7 +707,6 @@ def call( output_hidden_states=None, return_dict=None, training=False, - **kwargs, ): """ Args: @@ -866,7 +865,6 @@ def call( output_hidden_states=None, return_dict=None, training=False, - **kwargs, ): r""" Args: @@ -1296,7 +1294,6 @@ def call( return_dict=None, labels=None, training=False, - **kwargs, ): r""" labels (`tf.tensor` of shape `(batch_size, sequence_length)`, *optional*): diff --git a/src/transformers/models/mbart/modeling_tf_mbart.py b/src/transformers/models/mbart/modeling_tf_mbart.py index 3f2ea655f455..021dc21f21a1 100644 --- a/src/transformers/models/mbart/modeling_tf_mbart.py +++ b/src/transformers/models/mbart/modeling_tf_mbart.py @@ -684,7 +684,6 @@ def call( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFBaseModelOutput, Tuple[tf.Tensor]]: """ Args: @@ -848,7 +847,6 @@ def call( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[ TFBaseModelOutputWithPastAndCrossAttentions, Tuple[tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor] ]: @@ -1278,7 +1276,7 @@ def call( decoder_head_mask: Optional[tf.Tensor] = None, cross_attn_head_mask: Optional[tf.Tensor] = None, encoder_outputs: Optional[TFBaseModelOutput] = None, - past_key_values: [Tuple[Tuple[tf.Tensor]]] = None, + past_key_values: Tuple[Tuple[tf.Tensor]] = None, inputs_embeds: Optional[tf.Tensor] = None, decoder_inputs_embeds: Optional[tf.Tensor] = None, use_cache: Optional[bool] = None, @@ -1287,7 +1285,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[tf.Tensor] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFSeq2SeqLMOutput, Tuple[tf.Tensor]]: """ labels (`tf.Tensor` of shape `(batch_size, sequence_length)`, *optional*): diff --git a/src/transformers/models/mobilebert/modeling_tf_mobilebert.py b/src/transformers/models/mobilebert/modeling_tf_mobilebert.py index 007be43f5f06..5d1c74252e9b 100644 --- a/src/transformers/models/mobilebert/modeling_tf_mobilebert.py +++ b/src/transformers/models/mobilebert/modeling_tf_mobilebert.py @@ -692,7 +692,6 @@ def call( output_hidden_states=None, return_dict=None, training=False, - **kwargs, ): if input_ids is not None and inputs_embeds is not None: raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time") @@ -928,7 +927,6 @@ def call( output_hidden_states=None, return_dict=None, training=False, - **kwargs, ): outputs = self.mobilebert( input_ids=input_ids, @@ -993,7 +991,6 @@ def call( output_hidden_states=None, return_dict=None, training=False, - **kwargs, ): r""" Return: @@ -1092,7 +1089,6 @@ def call( return_dict=None, labels=None, training=False, - **kwargs, ): r""" labels (`tf.Tensor` of shape `(batch_size, sequence_length)`, *optional*): @@ 
-1176,7 +1172,6 @@ def call( return_dict=None, next_sentence_label=None, training=False, - **kwargs, ): r""" Return: @@ -1287,7 +1282,6 @@ def call( return_dict=None, labels=None, training=False, - **kwargs, ): r""" labels (`tf.Tensor` of shape `(batch_size,)`, *optional*): @@ -1381,7 +1375,6 @@ def call( start_positions=None, end_positions=None, training=False, - **kwargs, ): r""" start_positions (`tf.Tensor` of shape `(batch_size,)`, *optional*): @@ -1498,7 +1491,6 @@ def call( return_dict=None, labels=None, training=False, - **kwargs, ): r""" labels (`tf.Tensor` of shape `(batch_size,)`, *optional*): @@ -1626,7 +1618,6 @@ def call( return_dict=None, labels=None, training=False, - **kwargs, ): r""" labels (`tf.Tensor` of shape `(batch_size, sequence_length)`, *optional*): diff --git a/src/transformers/models/mpnet/modeling_tf_mpnet.py b/src/transformers/models/mpnet/modeling_tf_mpnet.py index 5edd73c4170b..0e8c61e3403c 100644 --- a/src/transformers/models/mpnet/modeling_tf_mpnet.py +++ b/src/transformers/models/mpnet/modeling_tf_mpnet.py @@ -497,7 +497,6 @@ def call( output_hidden_states=None, return_dict=None, training=False, - **kwargs, ): if input_ids is not None and inputs_embeds is not None: @@ -686,7 +685,6 @@ def call( output_hidden_states=None, return_dict=None, training=False, - **kwargs, ): outputs = self.mpnet( input_ids=input_ids, @@ -803,7 +801,6 @@ def call( return_dict=None, labels=None, training=False, - **kwargs, ): r""" labels (`tf.Tensor` of shape `(batch_size, sequence_length)`, *optional*): @@ -909,7 +906,6 @@ def call( return_dict=None, labels=None, training=False, - **kwargs, ): r""" labels (`tf.Tensor` of shape `(batch_size,)`, *optional*): @@ -1000,7 +996,6 @@ def call( return_dict=None, labels=None, training=False, - **kwargs, ): r""" labels (`tf.Tensor` of shape `(batch_size,)`, *optional*): @@ -1112,7 +1107,6 @@ def call( return_dict=None, labels=None, training=False, - **kwargs, ): r""" labels (`tf.Tensor` of shape `(batch_size, sequence_length)`, *optional*): diff --git a/src/transformers/models/openai/modeling_tf_openai.py b/src/transformers/models/openai/modeling_tf_openai.py index 80d7a9abd192..40a94c18815e 100644 --- a/src/transformers/models/openai/modeling_tf_openai.py +++ b/src/transformers/models/openai/modeling_tf_openai.py @@ -249,7 +249,6 @@ def call( output_hidden_states=None, return_dict=None, training=False, - **kwargs, ): if input_ids is not None and inputs_embeds is not None: @@ -522,7 +521,6 @@ def call( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[Tuple, TFBaseModelOutput]: outputs = self.transformer( @@ -586,7 +584,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[Tuple, TFCausalLMOutput]: r""" labels (`tf.Tensor` of shape `(batch_size, sequence_length)`, *optional*): @@ -669,7 +666,6 @@ def call( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[Tuple, TFOpenAIGPTDoubleHeadsModelOutput]: r""" mc_token_ids (`tf.Tensor` or `Numpy array` of shape `(batch_size, num_choices)`, *optional*, default to index of the last token of the input): @@ -813,7 +809,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[Tuple, TFSequenceClassifierOutput]: r""" 
labels (`tf.Tensor` of shape `(batch_size, sequence_length)`, *optional*): diff --git a/src/transformers/models/pegasus/modeling_tf_pegasus.py b/src/transformers/models/pegasus/modeling_tf_pegasus.py index 26f3ef461198..d7eea1660a40 100644 --- a/src/transformers/models/pegasus/modeling_tf_pegasus.py +++ b/src/transformers/models/pegasus/modeling_tf_pegasus.py @@ -710,7 +710,6 @@ def call( output_hidden_states=None, return_dict=None, training=False, - **kwargs, ): """ Args: @@ -872,7 +871,6 @@ def call( output_hidden_states=None, return_dict=None, training=False, - **kwargs, ): r""" Args: @@ -1305,7 +1303,6 @@ def call( return_dict=None, labels=None, training=False, - **kwargs, ): """ labels (`tf.tensor` of shape `(batch_size, sequence_length)`, *optional*): diff --git a/src/transformers/models/rembert/modeling_tf_rembert.py b/src/transformers/models/rembert/modeling_tf_rembert.py index 9a3892f409fe..f40ea6f6f1c4 100644 --- a/src/transformers/models/rembert/modeling_tf_rembert.py +++ b/src/transformers/models/rembert/modeling_tf_rembert.py @@ -660,7 +660,6 @@ def call( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: bool = False, - **kwargs, ) -> Union[TFBaseModelOutputWithPoolingAndCrossAttentions, Tuple[tf.Tensor]]: if not self.config.is_decoder: @@ -959,7 +958,6 @@ def call( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFBaseModelOutputWithPoolingAndCrossAttentions, Tuple[tf.Tensor]]: r""" encoder_hidden_states (`tf.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*): @@ -1060,7 +1058,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFMaskedLMOutput, Tuple[tf.Tensor]]: r""" labels (`tf.Tensor` or `np.ndarray` of shape `(batch_size, sequence_length)`, *optional*): @@ -1155,7 +1152,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFCausalLMOutputWithCrossAttentions, Tuple[tf.Tensor]]: r""" encoder_hidden_states (`tf.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*): @@ -1283,7 +1279,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFSequenceClassifierOutput, Tuple[tf.Tensor]]: r""" labels (`tf.Tensor` or `np.ndarray` of shape `(batch_size,)`, *optional*): @@ -1374,7 +1369,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFMultipleChoiceModelOutput, Tuple[tf.Tensor]]: r""" labels (`tf.Tensor` or `np.ndarray` of shape `(batch_size,)`, *optional*): @@ -1494,7 +1488,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFTokenClassifierOutput, Tuple[tf.Tensor]]: r""" labels (`tf.Tensor` or `np.ndarray` of shape `(batch_size, sequence_length)`, *optional*): @@ -1575,7 +1568,6 @@ def call( start_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, end_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFQuestionAnsweringModelOutput, Tuple[tf.Tensor]]: r""" start_positions (`tf.Tensor` or 
`np.ndarray` of shape `(batch_size,)`, *optional*): diff --git a/src/transformers/models/roberta/modeling_tf_roberta.py b/src/transformers/models/roberta/modeling_tf_roberta.py index a62659582b7e..b63d99a901b7 100644 --- a/src/transformers/models/roberta/modeling_tf_roberta.py +++ b/src/transformers/models/roberta/modeling_tf_roberta.py @@ -624,7 +624,6 @@ def call( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: bool = False, - **kwargs, ) -> Union[TFBaseModelOutputWithPoolingAndCrossAttentions, Tuple[tf.Tensor]]: if not self.config.is_decoder: @@ -936,7 +935,6 @@ def call( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[Tuple, TFBaseModelOutputWithPoolingAndCrossAttentions]: r""" encoder_hidden_states (`tf.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*): @@ -1093,7 +1091,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFMaskedLMOutput, Tuple[tf.Tensor]]: r""" labels (`tf.Tensor` of shape `(batch_size, sequence_length)`, *optional*): @@ -1196,7 +1193,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFCausalLMOutputWithCrossAttentions, Tuple[tf.Tensor]]: r""" encoder_hidden_states (`tf.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*): @@ -1353,7 +1349,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFSequenceClassifierOutput, Tuple[tf.Tensor]]: r""" labels (`tf.Tensor` of shape `(batch_size,)`, *optional*): @@ -1449,7 +1444,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFMultipleChoiceModelOutput, Tuple[tf.Tensor]]: r""" labels (`tf.Tensor` of shape `(batch_size,)`, *optional*): @@ -1567,7 +1561,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFTokenClassifierOutput, Tuple[tf.Tensor]]: r""" labels (`tf.Tensor` of shape `(batch_size, sequence_length)`, *optional*): @@ -1655,7 +1648,6 @@ def call( start_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, end_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFQuestionAnsweringModelOutput, Tuple[tf.Tensor]]: r""" start_positions (`tf.Tensor` of shape `(batch_size,)`, *optional*): diff --git a/src/transformers/models/roformer/modeling_tf_roformer.py b/src/transformers/models/roformer/modeling_tf_roformer.py index bed8ecf975c2..020824bb37e2 100644 --- a/src/transformers/models/roformer/modeling_tf_roformer.py +++ b/src/transformers/models/roformer/modeling_tf_roformer.py @@ -614,7 +614,6 @@ def call( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: bool = False, - **kwargs, ) -> Union[TFBaseModelOutput, Tuple[tf.Tensor]]: if input_ids is not None and inputs_embeds is not None: @@ -817,7 +816,6 @@ def call( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFBaseModelOutputWithPooling, 
Tuple[tf.Tensor]]: outputs = self.roformer( input_ids=input_ids, @@ -877,7 +875,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFMaskedLMOutput, Tuple[tf.Tensor]]: r""" labels (`tf.Tensor` or `np.ndarray` of shape `(batch_size, sequence_length)`, *optional*): @@ -953,7 +950,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFCausalLMOutput, Tuple[tf.Tensor]]: r""" labels (`tf.Tensor` or `np.ndarray` of shape `(batch_size, sequence_length)`, *optional*): @@ -1064,7 +1060,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFSequenceClassifierOutput, Tuple[tf.Tensor]]: r""" labels (`tf.Tensor` or `np.ndarray` of shape `(batch_size,)`, *optional*): @@ -1155,7 +1150,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFMultipleChoiceModelOutput, Tuple[tf.Tensor]]: r""" labels (`tf.Tensor` or `np.ndarray` of shape `(batch_size,)`, *optional*): @@ -1269,7 +1263,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFTokenClassifierOutput, Tuple[tf.Tensor]]: r""" labels (`tf.Tensor` or `np.ndarray` of shape `(batch_size, sequence_length)`, *optional*): @@ -1348,7 +1341,6 @@ def call( start_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, end_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFQuestionAnsweringModelOutput, Tuple[tf.Tensor]]: r""" start_positions (`tf.Tensor` or `np.ndarray` of shape `(batch_size,)`, *optional*): diff --git a/src/transformers/models/speech_to_text/modeling_tf_speech_to_text.py b/src/transformers/models/speech_to_text/modeling_tf_speech_to_text.py index 6c78ab1b58f3..7848630314d4 100755 --- a/src/transformers/models/speech_to_text/modeling_tf_speech_to_text.py +++ b/src/transformers/models/speech_to_text/modeling_tf_speech_to_text.py @@ -791,7 +791,6 @@ def call( output_hidden_states=None, return_dict=None, training=False, - **kwargs, ): """ Args: @@ -957,7 +956,6 @@ def call( output_hidden_states=None, return_dict=None, training=False, - **kwargs, ): r""" Args: diff --git a/src/transformers/models/t5/modeling_tf_t5.py b/src/transformers/models/t5/modeling_tf_t5.py index 133103f3e855..d7fd5b30145d 100644 --- a/src/transformers/models/t5/modeling_tf_t5.py +++ b/src/transformers/models/t5/modeling_tf_t5.py @@ -654,7 +654,6 @@ def call( output_hidden_states=None, return_dict=None, training=False, - **kwargs, ) -> Tuple: if input_ids is not None and inputs_embeds is not None: @@ -1152,7 +1151,6 @@ def call( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[Tuple, TFSeq2SeqModelOutput]: r""" Returns: @@ -1329,7 +1327,6 @@ def call( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[Tuple, TFSeq2SeqLMOutput]: r""" labels (`tf.Tensor` of shape `(batch_size, sequence_length)`, *optional*): @@ -1611,6 +1608,10 @@ def __init__(self, config, *inputs, **kwargs): 
encoder_config.use_cache = False self.encoder = TFT5MainLayer(encoder_config, embed_tokens, name="encoder") + @property + def dummy_inputs(self): + return {"input_ids": tf.constant(DUMMY_INPUTS)} + def get_encoder(self): return self.encoder @@ -1627,7 +1628,6 @@ def call( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[Tuple, TFBaseModelOutput]: r""" Returns: @@ -1670,6 +1670,19 @@ def call( attentions=encoder_outputs.attentions, ) + @tf.function( + input_signature=[ + { + "input_ids": tf.TensorSpec((None, None), tf.int32, name="input_ids"), + "attention_mask": tf.TensorSpec((None, None), tf.int32, name="attention_mask"), + } + ] + ) + def serving(self, inputs): + output = self.call(inputs) + + return self.serving_output(output) + # Copied from transformers.models.distilbert.modeling_tf_distilbert.TFDistilBertModel.serving_output def serving_output(self, output): hs = tf.convert_to_tensor(output.hidden_states) if self.config.output_hidden_states else None diff --git a/src/transformers/models/tapas/modeling_tf_tapas.py b/src/transformers/models/tapas/modeling_tf_tapas.py index 8f2138f2fbad..b6a2f10d1205 100644 --- a/src/transformers/models/tapas/modeling_tf_tapas.py +++ b/src/transformers/models/tapas/modeling_tf_tapas.py @@ -770,7 +770,6 @@ def call( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: bool = False, - **kwargs, ) -> Union[TFBaseModelOutputWithPooling, Tuple[tf.Tensor]]: if input_ids is not None and inputs_embeds is not None: @@ -980,7 +979,6 @@ def call( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFBaseModelOutputWithPooling, Tuple[tf.Tensor]]: r""" Returns: @@ -1067,7 +1065,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFMaskedLMOutput, Tuple[tf.Tensor]]: r""" labels (`tf.Tensor` or `np.ndarray` of shape `(batch_size, sequence_length)`, *optional*): @@ -1285,7 +1282,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFTableQuestionAnsweringOutput, Tuple[tf.Tensor]]: r""" table_mask (`tf.Tensor` of shape `(batch_size, seq_length)`, *optional*): @@ -1602,7 +1598,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFSequenceClassifierOutput, Tuple[tf.Tensor]]: r""" labels (`torch.LongTensor` of shape `(batch_size,)`, *optional*): diff --git a/src/transformers/models/transfo_xl/modeling_tf_transfo_xl.py b/src/transformers/models/transfo_xl/modeling_tf_transfo_xl.py index d5dc28c36503..8ad931150edc 100644 --- a/src/transformers/models/transfo_xl/modeling_tf_transfo_xl.py +++ b/src/transformers/models/transfo_xl/modeling_tf_transfo_xl.py @@ -550,7 +550,6 @@ def call( output_hidden_states=None, return_dict=None, training=False, - **kwargs, ): # the original code for Transformer-XL used shapes [len, bsz] but we want a unified interface in the library @@ -898,7 +897,6 @@ def call( output_hidden_states=None, return_dict=None, training=False, - **kwargs, ): outputs = self.transformer( input_ids=input_ids, @@ -979,7 +977,6 @@ def call( return_dict=None, labels=None, training=False, - **kwargs, ): if input_ids is not None: bsz, 
tgt_len = shape_list(input_ids)[:2] @@ -1088,7 +1085,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[Tuple, TFTransfoXLSequenceClassifierOutputWithPast]: r""" labels (`tf.Tensor` of shape `(batch_size, sequence_length)`, *optional*): diff --git a/src/transformers/models/vision_encoder_decoder/modeling_tf_vision_encoder_decoder.py b/src/transformers/models/vision_encoder_decoder/modeling_tf_vision_encoder_decoder.py index eeaca58c5a01..edc2973a0734 100644 --- a/src/transformers/models/vision_encoder_decoder/modeling_tf_vision_encoder_decoder.py +++ b/src/transformers/models/vision_encoder_decoder/modeling_tf_vision_encoder_decoder.py @@ -23,7 +23,7 @@ from ...configuration_utils import PretrainedConfig from ...modeling_tf_outputs import TFBaseModelOutput, TFSeq2SeqLMOutput -from ...modeling_tf_utils import TFCausalLanguageModelingLoss, TFPreTrainedModel, get_initializer, input_processing +from ...modeling_tf_utils import TFCausalLanguageModelingLoss, TFPreTrainedModel, get_initializer, unpack_inputs from ...tf_utils import shape_list from ...utils import ( DUMMY_INPUTS, @@ -510,6 +510,7 @@ def from_encoder_decoder_pretrained( config = VisionEncoderDecoderConfig.from_encoder_decoder_configs(encoder.config, decoder.config, **kwargs) return cls(encoder=encoder, decoder=decoder, config=config) + @unpack_inputs @add_start_docstrings_to_model_forward( VISION_ENCODER_DECODER_INPUTS_DOCSTRING.format("batch_size, sequence_length") ) @@ -585,21 +586,16 @@ def call( if encoder_outputs is None: - encoder_processing_inputs = { - "func": self.encoder.call, - "config": self.encoder.config, + encoder_inputs = { "input_ids": pixel_values, "output_attentions": output_attentions, "output_hidden_states": output_hidden_states, "return_dict": return_dict, "training": training, - "kwargs_call": {}, } # Add arguments to encoder from `kwargs_encoder` - encoder_processing_inputs.update(kwargs_encoder) - - encoder_inputs = input_processing(**encoder_processing_inputs) + encoder_inputs.update(kwargs_encoder) if "input_ids" in encoder_inputs: encoder_inputs["pixel_values"] = encoder_inputs.pop("input_ids") @@ -639,9 +635,7 @@ def call( batch_size, sequence_length = shape_list(encoder_hidden_states)[:2] encoder_attention_mask = tf.ones(shape=(batch_size, sequence_length), dtype=tf.int32) - decoder_processing_inputs = { - "func": self.decoder.call, - "config": self.decoder.config, + decoder_inputs = { "input_ids": decoder_input_ids, "attention_mask": decoder_attention_mask, "encoder_hidden_states": encoder_hidden_states, @@ -653,13 +647,11 @@ def call( "past_key_values": past_key_values, "return_dict": return_dict, "training": training, - "kwargs_call": {}, } # Add arguments to decoder from `kwargs_decoder` - decoder_processing_inputs.update(kwargs_decoder) + decoder_inputs.update(kwargs_decoder) - decoder_inputs = input_processing(**decoder_processing_inputs) decoder_outputs = self.decoder(**decoder_inputs) logits = decoder_outputs[0] diff --git a/src/transformers/models/vit/modeling_tf_vit.py b/src/transformers/models/vit/modeling_tf_vit.py index e2e946d8c9f4..cbf935f4f743 100644 --- a/src/transformers/models/vit/modeling_tf_vit.py +++ b/src/transformers/models/vit/modeling_tf_vit.py @@ -486,7 +486,6 @@ def call( interpolate_pos_encoding: Optional[bool] = None, return_dict: Optional[bool] = None, training: bool = False, - **kwargs, ) -> Union[TFBaseModelOutputWithPooling, Tuple[tf.Tensor]]: if pixel_values 
is None: @@ -656,7 +655,6 @@ def call( interpolate_pos_encoding: Optional[bool] = None, return_dict: Optional[bool] = None, training: bool = False, - **kwargs, ) -> Union[TFBaseModelOutputWithPooling, Tuple[tf.Tensor]]: r""" Returns: @@ -757,7 +755,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFSequenceClassifierOutput, Tuple[tf.Tensor]]: r""" labels (`tf.Tensor` or `np.ndarray` of shape `(batch_size,)`, *optional*): diff --git a/src/transformers/models/vit_mae/modeling_tf_vit_mae.py b/src/transformers/models/vit_mae/modeling_tf_vit_mae.py index 40f100b64ff1..6ff588fce3d4 100644 --- a/src/transformers/models/vit_mae/modeling_tf_vit_mae.py +++ b/src/transformers/models/vit_mae/modeling_tf_vit_mae.py @@ -647,7 +647,6 @@ def call( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: bool = False, - **kwargs, ) -> Union[TFViTMAEModelOutput, Tuple[tf.Tensor]]: embedding_output, mask, ids_restore = self.embeddings( pixel_values=pixel_values, training=training, noise=noise @@ -811,7 +810,6 @@ def call( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: bool = False, - **kwargs, ) -> Union[TFViTMAEModelOutput, Tuple[tf.Tensor]]: r""" Returns: @@ -1028,7 +1026,6 @@ def call( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: bool = False, - **kwargs, ) -> Union[TFViTMAEForPreTrainingOutput, Tuple[tf.Tensor]]: r""" Returns: diff --git a/src/transformers/models/xlm/modeling_tf_xlm.py b/src/transformers/models/xlm/modeling_tf_xlm.py index dbb994ed47c1..46b41fba3ae7 100644 --- a/src/transformers/models/xlm/modeling_tf_xlm.py +++ b/src/transformers/models/xlm/modeling_tf_xlm.py @@ -360,7 +360,6 @@ def call( output_hidden_states=None, return_dict=None, training=False, - **kwargs, ): # removed: src_enc=None, src_len=None @@ -707,7 +706,6 @@ def call( output_hidden_states=None, return_dict=None, training=False, - **kwargs, ): outputs = self.transformer( input_ids=input_ids, @@ -843,7 +841,6 @@ def call( output_hidden_states=None, return_dict=None, training=False, - **kwargs, ): transformer_outputs = self.transformer( input_ids=input_ids, @@ -917,7 +914,6 @@ def call( return_dict=None, labels=None, training=False, - **kwargs, ): r""" labels (`tf.Tensor` of shape `(batch_size,)`, *optional*): @@ -1025,7 +1021,6 @@ def call( return_dict=None, labels=None, training=False, - **kwargs, ): if input_ids is not None: num_choices = shape_list(input_ids)[1] @@ -1150,7 +1145,6 @@ def call( return_dict=None, labels=None, training=False, - **kwargs, ): r""" labels (`tf.Tensor` of shape `(batch_size, sequence_length)`, *optional*): @@ -1237,7 +1231,6 @@ def call( start_positions=None, end_positions=None, training=False, - **kwargs, ): r""" start_positions (`tf.Tensor` of shape `(batch_size,)`, *optional*): diff --git a/src/transformers/models/xlnet/modeling_tf_xlnet.py b/src/transformers/models/xlnet/modeling_tf_xlnet.py index 3a77c4845dfd..d81924d3451e 100644 --- a/src/transformers/models/xlnet/modeling_tf_xlnet.py +++ b/src/transformers/models/xlnet/modeling_tf_xlnet.py @@ -597,7 +597,6 @@ def call( output_hidden_states=None, return_dict=None, training=False, - **kwargs, ): if training and use_mems is None: @@ -1152,7 +1151,6 @@ def call( output_hidden_states=None, return_dict=None, training=False, - **kwargs, ): outputs = self.transformer( input_ids=input_ids, @@ -1262,7 +1260,6 @@ def 
call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFXLNetLMHeadModelOutput, Tuple[tf.Tensor]]: r""" labels (`tf.Tensor` of shape `(batch_size, sequence_length)`, *optional*): @@ -1394,7 +1391,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFXLNetForSequenceClassificationOutput, Tuple[tf.Tensor]]: r""" labels (`tf.Tensor` of shape `(batch_size,)`, *optional*): @@ -1501,7 +1497,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFXLNetForMultipleChoiceOutput, Tuple[tf.Tensor]]: r""" labels (`tf.Tensor` of shape `(batch_size,)`, *optional*): @@ -1623,7 +1618,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFXLNetForTokenClassificationOutput, Tuple[tf.Tensor]]: r""" labels (`tf.Tensor` of shape `(batch_size, sequence_length)`, *optional*): @@ -1711,7 +1705,6 @@ def call( start_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, end_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFXLNetForQuestionAnsweringSimpleOutput, Tuple[tf.Tensor]]: r""" start_positions (`tf.Tensor` of shape `(batch_size,)`, *optional*): diff --git a/templates/adding_a_new_model/cookiecutter-template-{{cookiecutter.modelname}}/modeling_tf_{{cookiecutter.lowercase_modelname}}.py b/templates/adding_a_new_model/cookiecutter-template-{{cookiecutter.modelname}}/modeling_tf_{{cookiecutter.lowercase_modelname}}.py index da2a0a3828f9..2d9914eebd6c 100644 --- a/templates/adding_a_new_model/cookiecutter-template-{{cookiecutter.modelname}}/modeling_tf_{{cookiecutter.lowercase_modelname}}.py +++ b/templates/adding_a_new_model/cookiecutter-template-{{cookiecutter.modelname}}/modeling_tf_{{cookiecutter.lowercase_modelname}}.py @@ -653,7 +653,6 @@ def call( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: bool = False, - **kwargs, ) -> Union[TFBaseModelOutputWithPastAndCrossAttentions, Tuple[tf.Tensor]]: if not self.config.is_decoder: @@ -949,7 +948,6 @@ def call( output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFBaseModelOutputWithPastAndCrossAttentions, Tuple[tf.Tensor]]: r""" encoder_hidden_states (`tf.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*): @@ -1049,7 +1047,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFMaskedLMOutput, Tuple[tf.Tensor]]: r""" labels (`tf.Tensor` or `np.ndarray` of shape `(batch_size, sequence_length)`, *optional*): @@ -1146,7 +1143,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFCausalLMOutputWithCrossAttentions, Tuple[tf.Tensor]]: r""" encoder_hidden_states (`tf.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*): @@ -1289,7 +1285,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> 
Union[TFSequenceClassifierOutput, Tuple[tf.Tensor]]: r""" labels (`tf.Tensor` or `np.ndarray` of shape `(batch_size,)`, *optional*): @@ -1379,7 +1374,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFMultipleChoiceModelOutput, Tuple[tf.Tensor]]: r""" labels (`tf.Tensor` or `np.ndarray` of shape `(batch_size,)`, *optional*): @@ -1506,7 +1500,6 @@ def call( return_dict: Optional[bool] = None, labels: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFTokenClassifierOutput, Tuple[tf.Tensor]]: r""" labels (`tf.Tensor` or `np.ndarray` of shape `(batch_size, sequence_length)`, *optional*): @@ -1588,7 +1581,6 @@ def call( start_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, end_positions: Optional[Union[np.ndarray, tf.Tensor]] = None, training: Optional[bool] = False, - **kwargs, ) -> Union[TFQuestionAnsweringModelOutput, Tuple[tf.Tensor]]: r""" start_positions (`tf.Tensor` or `np.ndarray` of shape `(batch_size,)`, *optional*): @@ -2262,7 +2254,6 @@ def call( output_hidden_states=None, return_dict=None, training=False, - **kwargs, ): """ Args: @@ -2421,7 +2412,6 @@ def call( output_hidden_states=None, return_dict=None, training=False, - **kwargs, ): r""" Args: @@ -2876,7 +2866,6 @@ def call( return_dict=None, labels=None, training=False, - **kwargs, ): """ Returns: diff --git a/tests/convbert/test_modeling_tf_convbert.py b/tests/convbert/test_modeling_tf_convbert.py index e2d68876263a..2ae29c3e4a5a 100644 --- a/tests/convbert/test_modeling_tf_convbert.py +++ b/tests/convbert/test_modeling_tf_convbert.py @@ -355,7 +355,6 @@ def check_encoder_attentions_output(outputs): for model_class in self.all_model_classes: inputs_dict["output_attentions"] = True - inputs_dict["use_cache"] = False config.output_hidden_states = False model = model_class(config) outputs = model(self._prepare_for_class(inputs_dict, model_class)) diff --git a/tests/t5/test_modeling_tf_t5.py b/tests/t5/test_modeling_tf_t5.py index c6585f83b18e..7ac0b33e426b 100644 --- a/tests/t5/test_modeling_tf_t5.py +++ b/tests/t5/test_modeling_tf_t5.py @@ -346,6 +346,11 @@ def test_resize_embeddings(self): self.assertEqual(model.get_input_embeddings().weight.shape[0], len(tokenizer)) self.assertNotEqual(model.get_input_embeddings().weight.shape[0], original_vocab_size) + # This test is run in `TFT5EncoderOnlyModelTest`, where the main layer has the same inputs as the model + @unittest.skip(reason="The inputs of the Main Layer are different.") + def test_keras_save_load(self): + pass + class TFT5EncoderOnlyModelTester: def __init__( diff --git a/tests/test_modeling_tf_common.py b/tests/test_modeling_tf_common.py index 9473a50f53aa..b72034de6958 100644 --- a/tests/test_modeling_tf_common.py +++ b/tests/test_modeling_tf_common.py @@ -573,7 +573,12 @@ def check_pt_tf_models(tf_model, pt_model): pt_model = pt_model_class(config) tf_inputs_dict = self._prepare_for_class(inputs_dict, model_class) - tf_inputs_dict_maybe_with_labels = self._prepare_for_class(inputs_dict, model_class, return_labels=True) + tf_inputs_dict_maybe_with_labels = self._prepare_for_class( + inputs_dict, + model_class, + # Not all models accept "labels" in the forward pass (yet :) ) + return_labels=True if "labels" in inspect.signature(model_class.call).parameters.keys() else False, + ) # Check we can load pt model in tf and vice-versa with model => model functions tf_model = 
transformers.load_pytorch_model_in_tf2_model(tf_model, pt_model, tf_inputs=tf_inputs_dict) @@ -722,7 +727,6 @@ def check_encoder_attentions_output(outputs): for model_class in self.all_model_classes: inputs_dict["output_attentions"] = True - inputs_dict["use_cache"] = False config.output_hidden_states = False model = model_class(config) outputs = model(self._prepare_for_class(inputs_dict, model_class)) @@ -944,10 +948,6 @@ def recursive_check(tuple_object, dict_object): dict_inputs = self._prepare_for_class(inputs_dict, model_class) check_equivalence(model, tuple_inputs, dict_inputs) - tuple_inputs = self._prepare_for_class(inputs_dict, model_class, return_labels=True) - dict_inputs = self._prepare_for_class(inputs_dict, model_class, return_labels=True) - check_equivalence(model, tuple_inputs, dict_inputs) - tuple_inputs = self._prepare_for_class(inputs_dict, model_class) dict_inputs = self._prepare_for_class(inputs_dict, model_class) check_equivalence(model, tuple_inputs, dict_inputs, {"output_hidden_states": True}) @@ -956,19 +956,25 @@ def recursive_check(tuple_object, dict_object): dict_inputs = self._prepare_for_class(inputs_dict, model_class) check_equivalence(model, tuple_inputs, dict_inputs, {"output_attentions": True}) - tuple_inputs = self._prepare_for_class(inputs_dict, model_class, return_labels=True) - dict_inputs = self._prepare_for_class(inputs_dict, model_class, return_labels=True) - check_equivalence(model, tuple_inputs, dict_inputs, {"output_hidden_states": True}) + # Not all models accept "labels" in the forward pass (yet :) ) + if "labels" in inspect.signature(model.call).parameters.keys(): + tuple_inputs = self._prepare_for_class(inputs_dict, model_class, return_labels=True) + dict_inputs = self._prepare_for_class(inputs_dict, model_class, return_labels=True) + check_equivalence(model, tuple_inputs, dict_inputs) - tuple_inputs = self._prepare_for_class(inputs_dict, model_class, return_labels=True) - dict_inputs = self._prepare_for_class(inputs_dict, model_class, return_labels=True) - check_equivalence(model, tuple_inputs, dict_inputs, {"output_attentions": True}) + tuple_inputs = self._prepare_for_class(inputs_dict, model_class, return_labels=True) + dict_inputs = self._prepare_for_class(inputs_dict, model_class, return_labels=True) + check_equivalence(model, tuple_inputs, dict_inputs, {"output_hidden_states": True}) - tuple_inputs = self._prepare_for_class(inputs_dict, model_class, return_labels=True) - dict_inputs = self._prepare_for_class(inputs_dict, model_class, return_labels=True) - check_equivalence( - model, tuple_inputs, dict_inputs, {"output_hidden_states": True, "output_attentions": True} - ) + tuple_inputs = self._prepare_for_class(inputs_dict, model_class, return_labels=True) + dict_inputs = self._prepare_for_class(inputs_dict, model_class, return_labels=True) + check_equivalence(model, tuple_inputs, dict_inputs, {"output_attentions": True}) + + tuple_inputs = self._prepare_for_class(inputs_dict, model_class, return_labels=True) + dict_inputs = self._prepare_for_class(inputs_dict, model_class, return_labels=True) + check_equivalence( + model, tuple_inputs, dict_inputs, {"output_hidden_states": True, "output_attentions": True} + ) def test_inputs_embeds(self): config, inputs_dict = self.model_tester.prepare_config_and_inputs_for_common() From b442b3348585c871c355a39171e89d6f047aeb95 Mon Sep 17 00:00:00 2001 From: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> Date: Mon, 4 Apr 2022 17:50:56 +0200 Subject: [PATCH 23/34] 
[SpeechEncoderDecoderModel] Correct Encoder Last Hidden State Output (#16586) --- .../speech_encoder_decoder/modeling_speech_encoder_decoder.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/transformers/models/speech_encoder_decoder/modeling_speech_encoder_decoder.py b/src/transformers/models/speech_encoder_decoder/modeling_speech_encoder_decoder.py index 45262ad940fe..3722c123c3bb 100644 --- a/src/transformers/models/speech_encoder_decoder/modeling_speech_encoder_decoder.py +++ b/src/transformers/models/speech_encoder_decoder/modeling_speech_encoder_decoder.py @@ -572,7 +572,7 @@ def forward( decoder_hidden_states=decoder_outputs.hidden_states, decoder_attentions=decoder_outputs.attentions, cross_attentions=decoder_outputs.cross_attentions, - encoder_last_hidden_state=encoder_outputs.last_hidden_state, + encoder_last_hidden_state=encoder_hidden_states, encoder_hidden_states=encoder_outputs.hidden_states, encoder_attentions=encoder_outputs.attentions, ) From 29a3b42737a1ee129f517491e90b627bf8d5c899 Mon Sep 17 00:00:00 2001 From: Andres Codas Date: Mon, 4 Apr 2022 12:20:26 -0400 Subject: [PATCH 24/34] initialize the default rank set on TrainerState (#16530) * initialize the default rank set on TrainerState * fix style --- src/transformers/trainer.py | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/src/transformers/trainer.py b/src/transformers/trainer.py index 921b9d27ac08..a2fb10b9e040 100755 --- a/src/transformers/trainer.py +++ b/src/transformers/trainer.py @@ -488,7 +488,11 @@ def __init__( else: self.label_smoother = None - self.state = TrainerState() + self.state = TrainerState( + is_local_process_zero=self.is_local_process_zero(), + is_world_process_zero=self.is_world_process_zero(), + ) + self.control = TrainerControl() # Internal variable to count flos in each process, will be accumulated in `self.state.total_flos` then # returned to 0 every time flos need to be logged From b96e629676ea57a09ee6e027d27af08ce34a392c Mon Sep 17 00:00:00 2001 From: Sylvain Gugger Date: Mon, 4 Apr 2022 14:06:49 -0400 Subject: [PATCH 25/34] Trigger doc build From 82ad581c3bdf294bff55810c6dac2796c0025c27 Mon Sep 17 00:00:00 2001 From: Yih-Dar <2521628+ydshieh@users.noreply.github.com> Date: Tue, 5 Apr 2022 10:00:03 +0200 Subject: [PATCH 26/34] Fix CI: test_inference_for_pretraining in ViTMAEModelTest (#16591) Co-authored-by: ydshieh --- tests/vit_mae/test_modeling_vit_mae.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tests/vit_mae/test_modeling_vit_mae.py b/tests/vit_mae/test_modeling_vit_mae.py index 6ae62cb1c2c3..8cbde5b2ce92 100644 --- a/tests/vit_mae/test_modeling_vit_mae.py +++ b/tests/vit_mae/test_modeling_vit_mae.py @@ -561,7 +561,7 @@ def test_inference_for_pretraining(self): # forward pass with torch.no_grad(): - outputs = model(**inputs, noise=torch.from_numpy(noise)) + outputs = model(**inputs, noise=torch.from_numpy(noise).to(device=torch_device)) # verify the logits expected_shape = torch.Size((1, 196, 768)) From cfb63da0bffe11a0609fdaa117d2d60d7b6b9ac1 Mon Sep 17 00:00:00 2001 From: SaulLu <55560583+SaulLu@users.noreply.github.com> Date: Tue, 5 Apr 2022 10:50:22 +0200 Subject: [PATCH 27/34] add a template to add missing tokenization test (#16553) * add a template to add missing tokenization test * add cookiecutter setting * improve doc * Update templates/adding_a_missing_tokenization_test/README.md Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Sylvain Gugger 
<35901082+sgugger@users.noreply.github.com> --- .../README.md | 39 ++++++++++ ...on_{{cookiecutter.lowercase_modelname}}.py | 78 +++++++++++++++++++ .../cookiecutter.json | 10 +++ 3 files changed, 127 insertions(+) create mode 100644 templates/adding_a_missing_tokenization_test/README.md create mode 100644 templates/adding_a_missing_tokenization_test/cookiecutter-template-{{cookiecutter.modelname}}/test_tokenization_{{cookiecutter.lowercase_modelname}}.py create mode 100644 templates/adding_a_missing_tokenization_test/cookiecutter.json diff --git a/templates/adding_a_missing_tokenization_test/README.md b/templates/adding_a_missing_tokenization_test/README.md new file mode 100644 index 000000000000..935f21c5ca8a --- /dev/null +++ b/templates/adding_a_missing_tokenization_test/README.md @@ -0,0 +1,39 @@ + + +This folder contains a template to add a tokenization test. + +## Usage + +Using the `cookiecutter` utility requires having all the `dev` dependencies installed. + +Let's first [fork](https://docs.github.com/en/get-started/quickstart/fork-a-repo) the `transformers` repo on GitHub. Once that's done, you can clone your fork and install `transformers` in your environment: + +```shell script +git clone https://github.com/YOUR-USERNAME/transformers +cd transformers +pip install -e ".[dev]" +``` + +Once the installation is done, you can generate the template by running the following command. Be careful: the template will be generated inside a new folder in your current working directory. + +```shell script +cookiecutter path-to-the-folder/adding_a_missing_tokenization_test/ +``` + +You will then have to answer some questions about the tokenizer for which you want to add tests. The `modelname` should be cased according to the plain text casing, i.e., BERT, RoBERTa, DeBERTa. + +Once the command has finished, you should have one new file inside the newly created folder named `test_tokenization_Xxx.py`. At this point, the template is finished and you can move it to the sub-folder of the corresponding model in the test folder. diff --git a/templates/adding_a_missing_tokenization_test/cookiecutter-template-{{cookiecutter.modelname}}/test_tokenization_{{cookiecutter.lowercase_modelname}}.py b/templates/adding_a_missing_tokenization_test/cookiecutter-template-{{cookiecutter.modelname}}/test_tokenization_{{cookiecutter.lowercase_modelname}}.py new file mode 100644 index 000000000000..631886f6b2eb --- /dev/null +++ b/templates/adding_a_missing_tokenization_test/cookiecutter-template-{{cookiecutter.modelname}}/test_tokenization_{{cookiecutter.lowercase_modelname}}.py @@ -0,0 +1,78 @@ +# coding=utf-8 +# Copyright 2022 {{cookiecutter.authors}}. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +""" Testing suite for the {{cookiecutter.modelname}} tokenizer.
""" + + +import unittest + +{% if cookiecutter.has_slow_class == "True" and cookiecutter.has_fast_class == "True" -%} +from transformers import {{cookiecutter.camelcase_modelname}}Tokenizer, {{cookiecutter.camelcase_modelname}}TokenizerFast +{% elif cookiecutter.has_slow_class == "True" -%} +from transformers import {{cookiecutter.camelcase_modelname}}Tokenizer +{% elif cookiecutter.has_fast_class == "True" -%} +from transformers import {{cookiecutter.camelcase_modelname}}TokenizerFast +{% endif -%} +{% if cookiecutter.has_fast_class == "True" and cookiecutter.slow_tokenizer_use_sentencepiece == "True" -%} +from transformers.testing_utils import require_sentencepiece, require_tokenizers +from ..test_tokenization_common import TokenizerTesterMixin + + +@require_sentencepiece +@require_tokenizers +{% elif cookiecutter.slow_tokenizer_use_sentencepiece == "True" -%} +from transformers.testing_utils import require_sentencepiece +from ..test_tokenization_common import TokenizerTesterMixin + + +@require_sentencepiece +{% elif cookiecutter.has_fast_class == "True" -%} +from transformers.testing_utils import require_tokenizers +from ..test_tokenization_common import TokenizerTesterMixin + + +@require_tokenizers +{% else -%} +from ..test_tokenization_common import TokenizerTesterMixin + + +{% endif -%} +class {{cookiecutter.camelcase_modelname}}TokenizationTest(TokenizerTesterMixin, unittest.TestCase): + {% if cookiecutter.has_slow_class == "True" -%} + tokenizer_class = {{cookiecutter.camelcase_modelname}}Tokenizer + test_slow_tokenizer = True + {% else -%} + tokenizer_class = None + test_slow_tokenizer = False + {% endif -%} + {% if cookiecutter.has_fast_class == "True" -%} + rust_tokenizer_class = {{cookiecutter.camelcase_modelname}}TokenizerFast + test_rust_tokenizer = True + {% else -%} + rust_tokenizer_class = None + test_rust_tokenizer = False + {% endif -%} + {% if cookiecutter.slow_tokenizer_use_sentencepiece == "True" -%} + test_sentencepiece = True + {% endif -%} + # TODO: Check in `TokenizerTesterMixin` if other attributes need to be changed + def setUp(self): + super().setUp() + + raise NotImplementedError( + "Here you have to implement the saving of a toy tokenizer in " + "`self.tmpdirname`." 
+ ) + + # TODO: add tests with hard-coded target values \ No newline at end of file diff --git a/templates/adding_a_missing_tokenization_test/cookiecutter.json b/templates/adding_a_missing_tokenization_test/cookiecutter.json new file mode 100644 index 000000000000..2e53818f9bb6 --- /dev/null +++ b/templates/adding_a_missing_tokenization_test/cookiecutter.json @@ -0,0 +1,10 @@ +{ + "modelname": "BrandNewBERT", + "uppercase_modelname": "BRAND_NEW_BERT", + "lowercase_modelname": "brand_new_bert", + "camelcase_modelname": "BrandNewBert", + "has_slow_class": ["True", "False"], + "has_fast_class": ["True", "False"], + "slow_tokenizer_use_sentencepiece": ["True", "False"], + "authors": "The HuggingFace Team" +} From e98825cf54a9a8065aa68dc49543fc6d3373cd46 Mon Sep 17 00:00:00 2001 From: Francesco Saverio Zuppichini Date: Tue, 5 Apr 2022 11:56:36 +0200 Subject: [PATCH 28/34] made _load_pretrained_model_low_mem static + bug fix (#16548) --- src/transformers/modeling_utils.py | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/transformers/modeling_utils.py b/src/transformers/modeling_utils.py index 33401c3c093f..0719700c0964 100644 --- a/src/transformers/modeling_utils.py +++ b/src/transformers/modeling_utils.py @@ -2103,8 +2103,8 @@ def retrieve_modules_from_names(self, names, add_prefix=False, remove_prefix=Fal return retrieved_modules - @classmethod - def _load_pretrained_model_low_mem(cls, model, loaded_state_dict_keys, resolved_archive_file): + @staticmethod + def _load_pretrained_model_low_mem(model, loaded_state_dict_keys, resolved_archive_file): """ This is an experimental function that loads the model using ~1.x model size CPU memory @@ -2159,7 +2159,7 @@ def find_submodule_and_param_name(model, long_key): resolved_archive_file = [resolved_archive_file] for archive_file in resolved_archive_file: - state_dict = torch.load(resolved_archive_file, map_location="cpu") + state_dict = torch.load(archive_file, map_location="cpu") # materialize state_dict entries one by one on CPU for k in loaded_state_dict_keys: From 83362822cedd2df4319ac6a23fd69461b16fa813 Mon Sep 17 00:00:00 2001 From: Suraj Patil Date: Tue, 5 Apr 2022 12:26:03 +0200 Subject: [PATCH 29/34] handle torch_dtype in low cpu mem usage (#16580) --- src/transformers/modeling_utils.py | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/transformers/modeling_utils.py b/src/transformers/modeling_utils.py index 0719700c0964..a1a0ad7d36fd 100644 --- a/src/transformers/modeling_utils.py +++ b/src/transformers/modeling_utils.py @@ -2165,7 +2165,8 @@ def find_submodule_and_param_name(model, long_key): for k in loaded_state_dict_keys: submodule, param_name = find_submodule_and_param_name(model, k) if submodule is not None: - new_val = state_dict[k] + param_dtype = getattr(submodule, param_name).dtype + new_val = state_dict[k].to(param_dtype) if isinstance(getattr(submodule, param_name), torch.nn.Parameter): new_val = torch.nn.Parameter(new_val) setattr(submodule, param_name, new_val) From 85f2bd96c2cac9a7842684b5fdf30329c74f0d0c Mon Sep 17 00:00:00 2001 From: Patrick von Platen Date: Tue, 5 Apr 2022 14:15:02 +0200 Subject: [PATCH 30/34] [Doctests] Correct filenaming (#16599) * [Doctests] Correct filenaming * improve quicktour * make style --- docs/source/en/quicktour.mdx | 14 +++++++------- docs/source/es/quicktour.mdx | 13 ++++++------- utils/documentation_tests.txt | 18 +++--------------- 3 files changed, 16 insertions(+), 29 deletions(-) diff --git a/docs/source/en/quicktour.mdx 
b/docs/source/en/quicktour.mdx index 1fc4f8b865dc..0d7edd630702 100644 --- a/docs/source/en/quicktour.mdx +++ b/docs/source/en/quicktour.mdx @@ -115,23 +115,23 @@ Create a [`pipeline`] with the task you want to solve for and the model you want >>> speech_recognizer = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h") ``` -Next, load a dataset (see the 🤗 Datasets [Quick Start](https://huggingface.co/docs/datasets/quickstart.html) for more details) you'd like to iterate over. For example, let's load the [SUPERB](https://huggingface.co/datasets/superb) dataset: +Next, load a dataset (see the 🤗 Datasets [Quick Start](https://huggingface.co/docs/datasets/quickstart.html) for more details) you'd like to iterate over. For example, let's load the [MInDS-14](https://huggingface.co/datasets/PolyAI/minds14) dataset: ```py >>> import datasets ->>> dataset = datasets.load_dataset("superb", name="asr", split="test") # doctest: +IGNORE_RESULT +>>> dataset = datasets.load_dataset("minds14", name="en-US", split="train") # doctest: +IGNORE_RESULT ``` You can pass a whole dataset pipeline: ```py ->>> files = dataset["file"] +>>> files = dataset["path"] >>> speech_recognizer(files[:4]) -[{'text': 'HE HOPED THERE WOULD BE STEW FOR DINNER TURNIPS AND CARROTS AND BRUISED POTATOES AND FAT MUTTON PIECES TO BE LADLED OUT IN THICK PEPPERED FLOWER FAT AND SAUCE'}, - {'text': 'STUFFERED INTO YOU HIS BELLY COUNSELLED HIM'}, - {'text': 'AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BROTHELS'}, - {'text': 'HO BERTIE ANY GOOD IN YOUR MIND'}] +[{'text': 'I WOULD LIKE TO SET UP A JOINT ACCOUNT WITH MY PARTNER HOW DO I PROCEED WITH DOING THAT'}, + {'text': "FONDERING HOW I'D SET UP A JOIN TO HELL T WITH MY WIFE AND WHERE THE AP MIGHT BE"}, + {'text': "I I'D LIKE TOY SET UP A JOINT ACCOUNT WITH MY PARTNER I'M NOT SEEING THE OPTION TO DO IT ON THE APSO I CALLED IN TO GET SOME HELP CAN I JUST DO IT OVER THE PHONE WITH YOU AND GIVE YOU THE INFORMATION OR SHOULD I DO IT IN THE AP AN I'M MISSING SOMETHING UQUETTE HAD PREFERRED TO JUST DO IT OVER THE PHONE OF POSSIBLE THINGS"}, + {'text': 'HOW DO I FURN A JOINA COUT'}] ``` For a larger dataset where the inputs are big (like in speech or vision), you will want to pass along a generator instead of a list that loads all the inputs in memory. See the [pipeline documentation](./main_classes/pipelines) for more information. diff --git a/docs/source/es/quicktour.mdx b/docs/source/es/quicktour.mdx index 8b400867099e..67ed7e7bb5c2 100644 --- a/docs/source/es/quicktour.mdx +++ b/docs/source/es/quicktour.mdx @@ -115,23 +115,22 @@ Crea un [`pipeline`] con la tarea que deseas resolver y el modelo que quieres usar >>> speech_recognizer = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h", device=0) ``` -A continuación, carga el dataset (ve 🤗 Datasets [Quick Start](https://huggingface.co/docs/datasets/quickstart.html) para más detalles) sobre el que quisieras iterar. Por ejemplo, vamos a cargar el dataset [SUPERB](https://huggingface.co/datasets/superb): +A continuación, carga el dataset (ve 🤗 Datasets [Quick Start](https://huggingface.co/docs/datasets/quickstart.html) para más detalles) sobre el que quisieras iterar.
Por ejemplo, vamos a cargar el dataset [MInDS-14](https://huggingface.co/datasets/PolyAI/minds14): ```py >>> import datasets ->>> dataset = datasets.load_dataset("superb", name="asr", split="test") # doctest: +IGNORE_RESULT +>>> dataset = datasets.load_dataset("minds14", name="en-US", split="train") # doctest: +IGNORE_RESULT ``` Puedes pasar un pipeline para un dataset: ```py ->>> files = dataset["file"] +>>> files = dataset["path"] >>> speech_recognizer(files[:4]) -[{'text': 'HE HOPED THERE WOULD BE STEW FOR DINNER TURNIPS AND CARROTS AND BRUISED POTATOES AND FAT MUTTON PIECES TO BE LADLED OUT IN THICK PEPPERED FLOWER FAT AND SAUCE'}, - {'text': 'STUFFERED INTO YOU HIS BELLY COUNSELLED HIM'}, - {'text': 'AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BROTHELS'}, - {'text': 'HO BERTIE ANY GOOD IN YOUR MIND'}] +[{'text': 'I WOULD LIKE TO SET UP A JOINT ACCOUNT WITH MY PARTNER HOW DO I PROCEED WITH DOING THAT'}, + {'text': "FONDERING HOW I'D SET UP A JOIN TO HELL T WITH MY WIFE AND WHERE THE AP MIGHT BE"}, + {'text': "I I'D LIKE TOY SET UP A JOINT ACCOUNT WITH MY PARTNER I'M NOT SEEING THE OPTION TO DO IT ON THE APSO I CALLED IN TO GET SOME HELP CAN I JUST DO IT OVER THE PHONE WITH YOU AND GIVE YOU THE INFORMATION OR SHOULD I DO IT IN THE AP AN I'M MISSING SOMETHING UQUETTE HAD PREFERRED TO JUST DO IT OVER THE PHONE OF POSSIBLE THINGS"}, ``` Para un dataset más grande, donde los inputs son de mayor tamaño (como en habla/audio o visión), querrás pasar un generador en lugar de una lista que carga todos los inputs en memoria. Ve la [documentación del pipeline](./main_classes/pipelines) para más información. diff --git a/utils/documentation_tests.txt b/utils/documentation_tests.txt index 372e63ad232b..f88974ed434e 100644 --- a/utils/documentation_tests.txt +++ b/utils/documentation_tests.txt @@ -1,17 +1,10 @@ -docs/source/quicktour.mdx -docs/source/quicktour.mdx -docs/source/task_summary.mdx -docs/source/task_summary.mdx +docs/source/en/quicktour.mdx +docs/source/en/task_summary.mdx src/transformers/generation_utils.py -src/transformers/generation_utils.py -src/transformers/models/bart/modeling_bart.py src/transformers/models/bart/modeling_bart.py src/transformers/models/beit/modeling_beit.py src/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py -src/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py src/transformers/models/blenderbot/modeling_blenderbot.py -src/transformers/models/blenderbot/modeling_blenderbot.py -src/transformers/models/blenderbot_small/modeling_blenderbot_small.py src/transformers/models/blenderbot_small/modeling_blenderbot_small.py src/transformers/models/convnext/modeling_convnext.py src/transformers/models/data2vec/modeling_data2vec_audio.py @@ -20,16 +13,11 @@ src/transformers/models/dpt/modeling_dpt.py src/transformers/models/glpn/modeling_glpn.py src/transformers/models/hubert/modeling_hubert.py src/transformers/models/marian/modeling_marian.py -src/transformers/models/marian/modeling_marian.py -src/transformers/models/mbart/modeling_mbart.py src/transformers/models/mbart/modeling_mbart.py src/transformers/models/pegasus/modeling_pegasus.py -src/transformers/models/pegasus/modeling_pegasus.py -src/transformers/models/plbart/modeling_plbart.py src/transformers/models/plbart/modeling_plbart.py src/transformers/models/poolformer/modeling_poolformer.py src/transformers/models/resnet/modeling_resnet.py -src/transformers/models/resnet/modeling_resnet.py
src/transformers/models/roberta/modeling_roberta.py src/transformers/models/roberta/modeling_tf_roberta.py src/transformers/models/segformer/modeling_segformer.py @@ -50,4 +38,4 @@ src/transformers/models/vit_mae/modeling_vit_mae.py src/transformers/models/wav2vec2/modeling_wav2vec2.py src/transformers/models/wav2vec2/tokenization_wav2vec2.py src/transformers/models/wav2vec2_with_lm/processing_wav2vec2_with_lm.py -src/transformers/models/wavlm/modeling_wavlm.py \ No newline at end of file +src/transformers/models/wavlm/modeling_wavlm.py From d726e679e48e82ed42e7646521f1172e80033ea8 Mon Sep 17 00:00:00 2001 From: Matt Date: Tue, 5 Apr 2022 14:23:27 +0100 Subject: [PATCH 31/34] Adding new train_step logic to make things less confusing for users (#15994) * Adding new train_step logic to make things less confusing for users * DO NOT ASK WHY WE NEED THAT SUBCLASS * Metrics now working, at least for single-output models with type annotations! * Updates and TODOs for the new train_step * Make fixup * Temporary test workaround until T5 has types * Temporary test workaround until T5 has types * I think this actually works! Needs a lot of tests though * MAke style/quality * Revert changes to T5 tests * Deleting the aforementioned unmentionable subclass * Deleting the aforementioned unmentionable subclass * Adding a Keras API test * Style fixes * Removing unneeded TODO and comments * Update test_step too * Stop trying to compute metrics with the dummy_loss, patch up test * Make style * make fixup * Docstring cleanup * make fixup * make fixup * Stop expanding 1D input tensors when using dummy loss * Adjust T5 test given the new compile() * make fixup * Skipping test for convnext * Removing old T5-specific Keras test now that we have a common one * make fixup * make fixup * Only skip convnext test on CPU * Update src/transformers/modeling_tf_utils.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/modeling_tf_utils.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Avoiding TF import issues * make fixup * Update compile() to support TF 2.3 * Skipping model.fit() on template classes for now * Skipping model.fit() on template class tests for now * Replace ad-hoc solution with find_labels * make fixup Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> --- src/transformers/modeling_tf_utils.py | 171 ++++++++++++------ ...tf_{{cookiecutter.lowercase_modelname}}.py | 9 + tests/convnext/test_modeling_tf_convnext.py | 7 + tests/t5/test_modeling_tf_t5.py | 30 --- tests/test_modeling_tf_common.py | 50 +++++ 5 files changed, 184 insertions(+), 83 deletions(-) diff --git a/src/transformers/modeling_tf_utils.py b/src/transformers/modeling_tf_utils.py index ee5b32886b07..efa37e32bd75 100644 --- a/src/transformers/modeling_tf_utils.py +++ b/src/transformers/modeling_tf_utils.py @@ -38,7 +38,6 @@ from .configuration_utils import PretrainedConfig from .dynamic_module_utils import custom_object_save from .generation_tf_utils import TFGenerationMixin -from .modeling_tf_outputs import TFSeq2SeqLMOutput from .tf_utils import shape_list from .tokenization_utils_base import BatchEncoding from .utils import ( @@ -53,6 +52,7 @@ RevisionNotFoundError, cached_path, copy_func, + find_labels, has_file, hf_bucket_url, is_offline_mode, @@ -715,6 +715,7 @@ class TFPreTrainedModel(tf.keras.Model, TFModelUtilsMixin, TFGenerationMixin, Pu base_model_prefix = "" main_input_name = "input_ids" _auto_class = None + _using_dummy_loss = 
None # a list of re pattern of tensor names to ignore from the model when loading the model weights # (and avoid unnecessary warnings). @@ -899,24 +900,46 @@ def compile( function themselves. """ if loss == "passthrough": + if metrics is not None: + raise ValueError( + "Passing metrics as a dict is not supported when using the internal loss! " + "Please either compile the model with a loss, or remove the metrics argument. " + "Note that advanced metrics using the `KerasMetricCallback` can still be used with the internal " + "loss." + ) logger.warning( "No loss specified in compile() - the model's internal loss computation will be used as the " "loss. Don't panic - this is a common way to train TensorFlow models in Transformers! " - "Please ensure your labels are passed as keys in the input dict so that they are " - "accessible to the model during the forward pass. To disable this behaviour, please pass a " - "loss argument, or explicitly pass loss=None if you do not want your model to compute a loss." + "To disable this behaviour, please pass a loss argument, or explicitly pass " + "`loss=None` if you do not want your model to compute a loss." + ) + loss = dummy_loss + self._using_dummy_loss = True + else: + self._using_dummy_loss = False + parent_args = list(inspect.signature(tf.keras.Model.compile).parameters.keys()) + if "steps_per_execution" in parent_args: + super().compile( + optimizer=optimizer, + loss=loss, + metrics=metrics, + loss_weights=loss_weights, + weighted_metrics=weighted_metrics, + run_eagerly=run_eagerly, + steps_per_execution=steps_per_execution, + **kwargs, + ) + else: + super().compile( + optimizer=optimizer, + loss=loss, + metrics=metrics, + loss_weights=loss_weights, + weighted_metrics=weighted_metrics, + run_eagerly=run_eagerly, + experimental_steps_per_execution=steps_per_execution, + **kwargs, ) - loss = {"loss": dummy_loss} - super().compile( - optimizer=optimizer, - loss=loss, - metrics=metrics, - loss_weights=loss_weights, - weighted_metrics=weighted_metrics, - run_eagerly=run_eagerly, - steps_per_execution=steps_per_execution, - **kwargs, - ) def compute_loss(self, *args, **kwargs): if hasattr(tf.keras.Model, "compute_loss"): @@ -935,40 +958,54 @@ def compute_loss(self, *args, **kwargs): def train_step(self, data): """ A modification of Keras's default `train_step` that cleans up the printed metrics when we use a dummy loss. If - a user specifies a loss at model compile time, this function behaves as the original Keras `train_step`. In - this case, it expects the same `data` as the original function (i.e. `(inputs, labels)`). - - However, when the model is compiled without specifying the loss AND the expected label columns are passed as - part of the input dictionary, the loss is computed internally (inside the model class) and is used in the - backwards pass. In this case, `data` is a singleton tuple containing `(inputs,)`. + a user specifies a loss at model compile time, this function behaves as the original Keras `train_step`. - This is possible under the aforementioned circumstances because our overriden compile function can set an - additional loss function that reduces a `loss` output, and the model will output a `loss` component (notice the - name matching) containing the loss that was used to train the pre-trained model. + When the model is compiled without specifying the loss, our overridden compile function can set a simple dummy + loss that just reads the loss output head of the model. 
When using this dummy loss, labels can be passed either
+        as keys in the input dictionary, or as normal Keras labels.
         """
+        # These are the only transformations `Model.fit` applies to user-input
+        # data when a `tf.data.Dataset` is provided.
-        data = data_adapter.expand_1d(data)
+        if not self._using_dummy_loss:
+            data = data_adapter.expand_1d(data)
         x, y, sample_weight = data_adapter.unpack_x_y_sample_weight(data)
-        # These next two lines differ from the base method - they avoid issues when the labels are in
-        # the input dict (and loss is computed internally)
-        if y is None and "labels" in x:
-            y = x["labels"]  # Stops confusion with metric computations
-        elif y is None and "input_ids" in x:
-            # Just make any kind of dummy array to make loss work
-            y = tf.zeros(tf.shape(x["input_ids"])[0], dtype=tf.int64)
+
+        # When using a dummy loss, we ensure that separate labels are copied to the correct model arguments,
+        # if those keys are not already present in the input dict
+        if self._using_dummy_loss and y is not None:
+            arg_names = list(dict(inspect.signature(self.call).parameters).keys())
+            label_kwargs = find_labels(self.__class__)
+            # If y is a tensor and the model only has one label-like input, map y to that input
+            if len(label_kwargs) == 1 and isinstance(y, tf.Tensor):
+                if isinstance(x, tf.Tensor):
+                    x = {arg_names[0]: x}
+                label_kwarg = next(iter(label_kwargs))
+                if label_kwarg not in x:
+                    x[label_kwarg] = y
+            # Otherwise, copy keys from y to x as long as they weren't already present in x
+            elif isinstance(y, dict):
+                if isinstance(x, tf.Tensor):
+                    x = {arg_names[0]: x}
+                for key, val in y.items():
+                    if key in arg_names and key not in x:
+                        x[key] = val
+
+        # Run forward pass.
         with tf.GradientTape() as tape:
             y_pred = self(x, training=True)
-            loss = self.compiled_loss(y, y_pred, sample_weight, regularization_losses=self.losses)
+            if self._using_dummy_loss:
+                loss = self.compiled_loss(y_pred.loss, y_pred.loss, sample_weight, regularization_losses=self.losses)
+            else:
+                loss = self.compiled_loss(y, y_pred, sample_weight, regularization_losses=self.losses)
         # Run backwards pass.
         self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
-        # When y_pred is a ModelOutput and y is a tf.Tensor the metrics update
-        # should be done only with the relevant ModelOutput param that is
-        # considered by the loss.
-        if isinstance(y_pred, TFSeq2SeqLMOutput) and isinstance(y, tf.Tensor):
-            y_pred = y_pred["logits"]
-        self.compiled_metrics.update_state(y, y_pred, sample_weight)
+
+        # When using the dummy_loss we know metrics are not present, so we can skip a lot of this
+        if self._using_dummy_loss:
+            self.compiled_metrics.update_state(y_pred.loss, y_pred.loss, sample_weight)
+        else:
+            self.compiled_metrics.update_state(y, y_pred, sample_weight)
         # Collect metrics to return
         return_metrics = {}
         for metric in self.metrics:
@@ -985,23 +1022,51 @@ def test_step(self, data):
         """
-        A modification of Keras's default test_step that cleans up the printed metrics when we use a dummy loss.
+        A modification of Keras's default `test_step` that cleans up the printed metrics when we use a dummy loss. If a
+        user specifies a loss at model compile time, this function behaves as the original Keras `test_step`.
+
+        When the model is compiled without specifying the loss, our overridden compile function can set a simple dummy
+        loss that just reads the loss output head of the model. When using this dummy loss, labels can be passed either
+        as keys in the input dictionary, or as normal Keras labels.
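+
+        As an illustrative sketch (an editor's addition, not lines from the original patch), the two
+        `evaluate` calls below are equivalent once the model is compiled without a loss:
+
+            model.compile(optimizer="adam")  # no loss argument, so the internal loss is used
+            model.evaluate({"input_ids": input_ids, "labels": labels})  # labels inside the input dict
+            model.evaluate({"input_ids": input_ids}, labels)  # labels passed the standard Keras way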
""" - data = data_adapter.expand_1d(data) + # These are the only transformations `Model.fit` applies to user-input + # data when a `tf.data.Dataset` is provided. + if not self._using_dummy_loss: + data = data_adapter.expand_1d(data) x, y, sample_weight = data_adapter.unpack_x_y_sample_weight(data) - # These next two lines differ from the base method - they avoid issues when the labels are in - # the input dict (and loss is computed internally) - if y is None and "labels" in x: - y = x["labels"] # Stops confusion with metric computations - elif y is None and "input_ids" in x: - # Just make any kind of dummy array to make loss work - y = tf.zeros(tf.shape(x["input_ids"])[0], dtype=tf.int64) + + # When using a dummy loss, we ensure that separate labels are copied to the correct model arguments, + # if those keys are not already present in the input dict + if self._using_dummy_loss and y is not None: + arg_names = list(dict(inspect.signature(self.call).parameters).keys()) + label_kwargs = find_labels(self.__class__) + # If y is a tensor and the model only has one label-like input, map y to that input + if len(label_kwargs) == 1 and isinstance(y, tf.Tensor): + if isinstance(x, tf.Tensor): + x = {arg_names[0]: x} + label_kwarg = next(iter(label_kwargs)) + if label_kwarg not in x: + x[label_kwarg] = y + # Otherwise, copy keys from y to x as long as they weren't already present in x + elif isinstance(y, dict): + if isinstance(x, tf.Tensor): + x = {arg_names[0]: x} + for key, val in y.items(): + if key in arg_names and key not in x: + x[key] = val + + # Run forward pass. y_pred = self(x, training=False) - self.compiled_loss(y, y_pred, sample_weight, regularization_losses=self.losses) - # Updates stateful loss metrics. - if isinstance(y_pred, TFSeq2SeqLMOutput) and isinstance(y, tf.Tensor): - y_pred = y_pred["logits"] - self.compiled_metrics.update_state(y, y_pred, sample_weight) + if self._using_dummy_loss: + self.compiled_loss(y_pred.loss, y_pred.loss, sample_weight, regularization_losses=self.losses) + else: + self.compiled_loss(y, y_pred, sample_weight, regularization_losses=self.losses) + + # When using the dummy_loss we know metrics are not present, so we can skip a lot of this + if self._using_dummy_loss: + self.compiled_metrics.update_state(y_pred.loss, y_pred.loss, sample_weight) + else: + self.compiled_metrics.update_state(y, y_pred, sample_weight) # Collect metrics to return return_metrics = {} for metric in self.metrics: diff --git a/templates/adding_a_new_model/cookiecutter-template-{{cookiecutter.modelname}}/test_modeling_tf_{{cookiecutter.lowercase_modelname}}.py b/templates/adding_a_new_model/cookiecutter-template-{{cookiecutter.modelname}}/test_modeling_tf_{{cookiecutter.lowercase_modelname}}.py index 57fd95dd3ff6..0f4d7824c164 100644 --- a/templates/adding_a_new_model/cookiecutter-template-{{cookiecutter.modelname}}/test_modeling_tf_{{cookiecutter.lowercase_modelname}}.py +++ b/templates/adding_a_new_model/cookiecutter-template-{{cookiecutter.modelname}}/test_modeling_tf_{{cookiecutter.lowercase_modelname}}.py @@ -259,6 +259,7 @@ def create_and_check_causal_lm_model_as_decoder( list(prediction_scores.numpy().shape), [self.batch_size, self.seq_length, self.vocab_size] ) + def create_and_check_causal_lm_model_past( self, config, @@ -597,6 +598,10 @@ def test_model(self): config_and_inputs = self.model_tester.prepare_config_and_inputs() self.model_tester.create_and_check_model(*config_and_inputs) + @unittest.skip(reason="Template classes interact badly with this test.") + def 
test_keras_fit(self): + pass + def test_causal_lm_base_model(self): """Test the base model of the causal LM model @@ -947,6 +952,10 @@ def _get_word_embedding_weight(model, embedding_layer): models_equal = False self.assertTrue(models_equal) + @unittest.skip(reason="Template classes interact badly with this test.") + def test_keras_fit(self): + pass + def _assert_tensors_equal(a, b, atol=1e-12, prefix=""): """If tensors not close, or a and b arent both tensors, raise a nice Assertion error.""" diff --git a/tests/convnext/test_modeling_tf_convnext.py b/tests/convnext/test_modeling_tf_convnext.py index edab09fb69b9..579c27dd27a6 100644 --- a/tests/convnext/test_modeling_tf_convnext.py +++ b/tests/convnext/test_modeling_tf_convnext.py @@ -143,6 +143,13 @@ def setUp(self): def test_inputs_embeds(self): pass + @unittest.skipIf( + not is_tf_available() or len(tf.config.list_physical_devices("GPU")) == 0, + reason="TF (<=2.8) does not support backprop for grouped convolutions on CPU.", + ) + def test_keras_fit(self): + pass + @unittest.skip(reason="ConvNext does not support input and output embeddings") def test_model_common_attributes(self): pass diff --git a/tests/t5/test_modeling_tf_t5.py b/tests/t5/test_modeling_tf_t5.py index 7ac0b33e426b..7445aae53001 100644 --- a/tests/t5/test_modeling_tf_t5.py +++ b/tests/t5/test_modeling_tf_t5.py @@ -804,33 +804,3 @@ def test_translation_en_to_ro(self): translation = tok.decode(output[0], skip_special_tokens=True, clean_up_tokenization_spaces=False) self.assertEqual(translation, expected_translation) - - def test_finetune_keras_trainer(self): - """Ensure that the model can be fine-tuned via the keras API and - that metrics work as expected. - """ - - # This metric expects to be called with the logits output - def _accuracy(y_true, y_pred): - return tf.keras.metrics.sparse_categorical_crossentropy(y_true[:, 0], y_pred[:, 0]) - - # measure the accuracy of the first token - class FirstTokenAccuracy(tf.keras.metrics.MeanMetricWrapper): - def __init__(self, name="accuracy", **kwargs): - super().__init__(_accuracy, name=name, **kwargs) - - model = self.model - model.compile("adam", metrics=FirstTokenAccuracy()) - tokenizer = T5Tokenizer.from_pretrained("t5-small") - - examples = [ - ("sentiment: Everything is awesome!", "positive"), - ("sentiment: Tensorflow datasets are hard to use", "negative"), - ] - - inputs = dict(tokenizer([x[0] for x in examples], padding=True, return_tensors="tf")) - inputs["labels"] = tokenizer([x[1] for x in examples], return_tensors="tf").input_ids - - model.fit(inputs) - m = model.evaluate(inputs) - self.assertEqual(len(m), 2) diff --git a/tests/test_modeling_tf_common.py b/tests/test_modeling_tf_common.py index b72034de6958..b7b4b68414a7 100644 --- a/tests/test_modeling_tf_common.py +++ b/tests/test_modeling_tf_common.py @@ -1302,6 +1302,56 @@ def test_loss_computation(self): self.assertEqual(loss.shape, [loss_size]) + def test_keras_fit(self): + config, inputs_dict = self.model_tester.prepare_config_and_inputs_for_common() + for model_class in self.all_model_classes: + model = model_class(config) + if getattr(model, "hf_compute_loss", None): + # Test that model correctly compute the loss with kwargs + prepared_for_class = self._prepare_for_class(inputs_dict.copy(), model_class, return_labels=True) + # Is there a better way to remove these decoder inputs? 
+ prepared_for_class = { + key: val + for key, val in prepared_for_class.items() + if key not in ("head_mask", "decoder_head_mask", "cross_attn_head_mask", "decoder_input_ids") + } + + possible_label_cols = { + "labels", + "label", + "label_ids", + "start_positions", + "start_position", + "end_positions", + "end_position", + "next_sentence_label", + } + label_names = possible_label_cols.intersection(set(prepared_for_class)) + self.assertGreater(len(label_names), 0, msg="No matching label names found!") + labels = {key: val for key, val in prepared_for_class.items() if key in label_names} + inputs_minus_labels = {key: val for key, val in prepared_for_class.items() if key not in label_names} + self.assertGreater(len(inputs_minus_labels), 0) + model.compile(optimizer=tf.keras.optimizers.SGD(0.0), run_eagerly=True) + # Make sure the model fits without crashing regardless of where we pass the labels + history1 = model.fit( + prepared_for_class, + validation_data=prepared_for_class, + steps_per_epoch=1, + validation_steps=1, + shuffle=False, + ) + val_loss1 = history1.history["val_loss"][0] + history2 = model.fit( + inputs_minus_labels, + labels, + validation_data=(inputs_minus_labels, labels), + steps_per_epoch=1, + validation_steps=1, + shuffle=False, + ) + val_loss2 = history2.history["val_loss"][0] + self.assertTrue(np.allclose(val_loss1, val_loss2, atol=1e-2, rtol=1e-3)) + def test_generate_with_headmasking(self): attention_names = ["encoder_attentions", "decoder_attentions", "cross_attentions"] config, inputs_dict = self.model_tester.prepare_config_and_inputs_for_common() From 7744ce7befeaf94007e44d66007de9a6be27d58f Mon Sep 17 00:00:00 2001 From: Rishav Chandra Varma Date: Tue, 5 Apr 2022 19:20:45 +0530 Subject: [PATCH 32/34] Adding missing type hints for BigBird model (#16555) * added type hints for mbart tensorflow tf implementation * Adding missing type hints for mBART model Tensorflow Implementation model added with missing type hints * Missing Type hints - correction For TF model * Code fixup using make quality tests * Hint types - typo error * make fix-copies and make fixup * type hints * updated files * type hints update * making dependent modesls coherent * Type hints for BigBird * removing typos Co-authored-by: matt --- .../models/big_bird/modeling_big_bird.py | 202 +++++++++--------- 1 file changed, 101 insertions(+), 101 deletions(-) diff --git a/src/transformers/models/big_bird/modeling_big_bird.py b/src/transformers/models/big_bird/modeling_big_bird.py index b765a854009d..85b48170f70c 100755 --- a/src/transformers/models/big_bird/modeling_big_bird.py +++ b/src/transformers/models/big_bird/modeling_big_bird.py @@ -18,7 +18,7 @@ import math import os from dataclasses import dataclass -from typing import Optional, Tuple +from typing import Optional, Tuple, Union import numpy as np import torch @@ -1592,7 +1592,7 @@ def forward( to_mask=None, blocked_encoder_mask=None, return_dict=True, - ): + ) -> Union[BaseModelOutputWithPastAndCrossAttentions, Tuple]: all_hidden_states = () if output_hidden_states else None all_self_attentions = () if output_attentions else None all_cross_attentions = () if output_attentions and self.config.add_cross_attention else None @@ -1986,20 +1986,20 @@ def set_attention_type(self, value: str): ) def forward( self, - input_ids=None, - attention_mask=None, - token_type_ids=None, - position_ids=None, - head_mask=None, - inputs_embeds=None, - encoder_hidden_states=None, - encoder_attention_mask=None, - past_key_values=None, - use_cache=None, - 
output_attentions=None, - output_hidden_states=None, - return_dict=None, - ): + input_ids: torch.LongTensor = None, + attention_mask: Optional[torch.FloatTensor] = None, + token_type_ids: Optional[torch.LongTensor] = None, + position_ids: Optional[torch.LongTensor] = None, + head_mask: Optional[torch.FloatTensor] = None, + inputs_embeds: Optional[torch.FloatTensor] = None, + encoder_hidden_states: Optional[torch.FloatTensor] = None, + encoder_attention_mask: Optional[torch.FloatTensor] = None, + past_key_values: Optional[Tuple[Tuple[torch.FloatTensor]]] = None, + use_cache: Optional[bool] = None, + output_attentions: Optional[bool] = None, + output_hidden_states: Optional[bool] = None, + return_dict: Optional[bool] = None, + ) -> Union[BaseModelOutputWithPoolingAndCrossAttentions, Tuple[torch.FloatTensor]]: r""" encoder_hidden_states (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*): Sequence of hidden-states at the output of the last layer of the encoder. Used in the cross-attention if @@ -2280,18 +2280,18 @@ def set_output_embeddings(self, new_embeddings): @replace_return_docstrings(output_type=BigBirdForPreTrainingOutput, config_class=_CONFIG_FOR_DOC) def forward( self, - input_ids=None, - attention_mask=None, - token_type_ids=None, - position_ids=None, - head_mask=None, - inputs_embeds=None, - labels=None, - next_sentence_label=None, - output_attentions=None, - output_hidden_states=None, - return_dict=None, - ): + input_ids: torch.LongTensor = None, + attention_mask: Optional[torch.FloatTensor] = None, + token_type_ids: Optional[torch.LongTensor] = None, + position_ids: Optional[torch.LongTensor] = None, + head_mask: Optional[torch.FloatTensor] = None, + inputs_embeds: Optional[torch.FloatTensor] = None, + labels: Optional[torch.FloatTensor] = None, + next_sentence_label: Optional[torch.LongTensor] = None, + output_attentions: Optional[bool] = None, + output_hidden_states: Optional[bool] = None, + return_dict: Optional[bool] = None, + ) -> Union[BigBirdForPreTrainingOutput, Tuple[torch.FloatTensor]]: r""" labels (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*): Labels for computing the masked language modeling loss. Indices should be in `[-100, 0, ..., @@ -2395,19 +2395,19 @@ def set_output_embeddings(self, new_embeddings): ) def forward( self, - input_ids=None, - attention_mask=None, - token_type_ids=None, - position_ids=None, - head_mask=None, - inputs_embeds=None, - encoder_hidden_states=None, - encoder_attention_mask=None, - labels=None, - output_attentions=None, - output_hidden_states=None, - return_dict=None, - ): + input_ids: torch.LongTensor = None, + attention_mask: Optional[torch.FloatTensor] = None, + token_type_ids: Optional[torch.LongTensor] = None, + position_ids: Optional[torch.LongTensor] = None, + head_mask: Optional[torch.FloatTensor] = None, + inputs_embeds: Optional[torch.FloatTensor] = None, + encoder_hidden_states: Optional[torch.FloatTensor] = None, + encoder_attention_mask: Optional[torch.FloatTensor] = None, + labels: Optional[torch.LongTensor] = None, + output_attentions: Optional[bool] = None, + output_hidden_states: Optional[bool] = None, + return_dict: Optional[bool] = None, + ) -> Union[MaskedLMOutput, Tuple[torch.FloatTensor]]: r""" labels (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*): Labels for computing the masked language modeling loss. 
Indices should be in `[-100, 0, ..., @@ -2493,21 +2493,21 @@ def set_output_embeddings(self, new_embeddings): @replace_return_docstrings(output_type=CausalLMOutputWithCrossAttentions, config_class=_CONFIG_FOR_DOC) def forward( self, - input_ids=None, - attention_mask=None, - token_type_ids=None, - position_ids=None, - head_mask=None, - inputs_embeds=None, - encoder_hidden_states=None, - encoder_attention_mask=None, - past_key_values=None, - labels=None, - use_cache=None, - output_attentions=None, - output_hidden_states=None, - return_dict=None, - ): + input_ids: torch.LongTensor = None, + attention_mask: Optional[torch.FloatTensor] = None, + token_type_ids: Optional[torch.LongTensor] = None, + position_ids: Optional[torch.LongTensor] = None, + head_mask: Optional[torch.FloatTensor] = None, + inputs_embeds: Optional[torch.FloatTensor] = None, + encoder_hidden_states: Optional[torch.FloatTensor] = None, + encoder_attention_mask: Optional[torch.FloatTensor] = None, + past_key_values: Optional[Tuple[Tuple[torch.FloatTensor]]] = None, + labels: Optional[torch.LongTensor] = None, + use_cache: Optional[bool] = None, + output_attentions: Optional[bool] = None, + output_hidden_states: Optional[bool] = None, + return_dict: Optional[bool] = None, + ) -> Union[CausalLMOutputWithCrossAttentions, Tuple[torch.FloatTensor]]: r""" encoder_hidden_states (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*): Sequence of hidden-states at the output of the last layer of the encoder. Used in the cross-attention if @@ -2664,17 +2664,17 @@ def __init__(self, config): ) def forward( self, - input_ids=None, - attention_mask=None, - token_type_ids=None, - position_ids=None, - head_mask=None, - inputs_embeds=None, - labels=None, - output_attentions=None, - output_hidden_states=None, - return_dict=None, - ): + input_ids: torch.LongTensor = None, + attention_mask: Optional[torch.FloatTensor] = None, + token_type_ids: Optional[torch.LongTensor] = None, + position_ids: Optional[torch.LongTensor] = None, + head_mask: Optional[torch.FloatTensor] = None, + inputs_embeds: Optional[torch.FloatTensor] = None, + labels: Optional[torch.LongTensor] = None, + output_attentions: Optional[bool] = None, + output_hidden_states: Optional[bool] = None, + return_dict: Optional[bool] = None, + ) -> Union[SequenceClassifierOutput, Tuple[torch.FloatTensor]]: r""" labels (`torch.LongTensor` of shape `(batch_size,)`, *optional*): Labels for computing the sequence classification/regression loss. Indices should be in `[0, ..., @@ -2762,17 +2762,17 @@ def __init__(self, config): ) def forward( self, - input_ids=None, - attention_mask=None, - token_type_ids=None, - position_ids=None, - head_mask=None, - inputs_embeds=None, - labels=None, - output_attentions=None, - output_hidden_states=None, - return_dict=None, - ): + input_ids: torch.LongTensor = None, + attention_mask: Optional[torch.FloatTensor] = None, + token_type_ids: Optional[torch.LongTensor] = None, + position_ids: Optional[torch.LongTensor] = None, + head_mask: Optional[torch.FloatTensor] = None, + inputs_embeds: Optional[torch.FloatTensor] = None, + labels: Optional[torch.LongTensor] = None, + output_attentions: Optional[bool] = None, + output_hidden_states: Optional[bool] = None, + return_dict: Optional[bool] = None, + ) -> Union[MultipleChoiceModelOutput, Tuple[torch.FloatTensor]]: r""" labels (`torch.LongTensor` of shape `(batch_size,)`, *optional*): Labels for computing the multiple choice classification loss. 
Indices should be in `[0, ..., @@ -2858,17 +2858,17 @@ def __init__(self, config): ) def forward( self, - input_ids=None, - attention_mask=None, - token_type_ids=None, - position_ids=None, - head_mask=None, - inputs_embeds=None, - labels=None, - output_attentions=None, - output_hidden_states=None, - return_dict=None, - ): + input_ids: torch.LongTensor = None, + attention_mask: Optional[torch.FloatTensor] = None, + token_type_ids: Optional[torch.LongTensor] = None, + position_ids: Optional[torch.LongTensor] = None, + head_mask: Optional[torch.FloatTensor] = None, + inputs_embeds: Optional[torch.FloatTensor] = None, + labels: Optional[torch.LongTensor] = None, + output_attentions: Optional[bool] = None, + output_hidden_states: Optional[bool] = None, + return_dict: Optional[bool] = None, + ) -> Union[TokenClassifierOutput, Tuple[torch.FloatTensor]]: r""" labels (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*): Labels for computing the token classification loss. Indices should be in `[0, ..., config.num_labels - 1]`. @@ -2957,19 +2957,19 @@ def __init__(self, config, add_pooling_layer=False): ) def forward( self, - input_ids=None, - attention_mask=None, + input_ids: torch.LongTensor = None, + attention_mask: Optional[torch.FloatTensor] = None, question_lengths=None, - token_type_ids=None, - position_ids=None, - head_mask=None, - inputs_embeds=None, - start_positions=None, - end_positions=None, - output_attentions=None, - output_hidden_states=None, - return_dict=None, - ): + token_type_ids: Optional[torch.LongTensor] = None, + position_ids: Optional[torch.LongTensor] = None, + head_mask: Optional[torch.FloatTensor] = None, + inputs_embeds: Optional[torch.FloatTensor] = None, + start_positions: Optional[torch.LongTensor] = None, + end_positions: Optional[torch.LongTensor] = None, + output_attentions: Optional[bool] = None, + output_hidden_states: Optional[bool] = None, + return_dict: Optional[bool] = None, + ) -> Union[BigBirdForQuestionAnsweringModelOutput, Tuple[torch.FloatTensor]]: r""" start_positions (`torch.LongTensor` of shape `(batch_size,)`, *optional*): Labels for position (index) of the start of the labelled span for computing the token classification loss. 
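
The pattern applied throughout this patch is uniform: untyped `=None` keyword arguments gain `Optional[...]` tensor annotations, `input_ids` keeps a plain `torch.LongTensor` type, and each `forward` gains a `Union[...]` return annotation covering both the model output class and the tuple fallback. A minimal sketch of the convention on a hypothetical toy module (an illustration, not code from `modeling_big_bird.py`):

```py
from typing import Optional, Tuple, Union

import torch
from torch import nn


class ToyClassifier(nn.Module):
    """A stand-in model illustrating the annotation convention above; not code from the patch."""

    def __init__(self, vocab_size: int = 100, hidden_size: int = 8, num_labels: int = 2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(
        self,
        input_ids: torch.LongTensor = None,
        attention_mask: Optional[torch.FloatTensor] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[dict, Tuple[torch.FloatTensor]]:
        hidden_states = self.embed(input_ids).mean(dim=1)  # crude mean pooling over the sequence
        logits = self.classifier(hidden_states)
        if return_dict:
            return {"logits": logits}
        return (logits,)
```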
From 20ef1a0ac58061a0bb6bb1ffbe4863845df27c7b Mon Sep 17 00:00:00 2001 From: Stas Bekman Date: Tue, 5 Apr 2022 08:13:12 -0700 Subject: [PATCH 33/34] [deepspeed] fix typo, adjust config name (#16597) --- src/transformers/deepspeed.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/transformers/deepspeed.py b/src/transformers/deepspeed.py index 46cfb9730760..6feabdaa8095 100644 --- a/src/transformers/deepspeed.py +++ b/src/transformers/deepspeed.py @@ -250,7 +250,7 @@ def trainer_config_process(self, args): self.fill_match("bf16.enabled", (args.bf16 or args.bf16_full_eval), "bf16|bf16_full_eval") # deepspeed's default mode is fp16 unless there is a config that says differently - if self.is_true("bfoat16.enabled"): + if self.is_true("bf16.enabled"): self._dtype = torch.bfloat16 elif self.is_false("fp16.enabled"): self._dtype = torch.float32 From 578abb1632b4e68581257c8fd33d11135630080b Mon Sep 17 00:00:00 2001 From: Steven Date: Tue, 5 Apr 2022 10:02:50 -0700 Subject: [PATCH 34/34] =?UTF-8?q?=20=F0=9F=96=8D=20apply=20feedback?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- docs/source/en/task_summary.mdx | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/docs/source/en/task_summary.mdx b/docs/source/en/task_summary.mdx index fd30add50729..8323e182a607 100644 --- a/docs/source/en/task_summary.mdx +++ b/docs/source/en/task_summary.mdx @@ -970,7 +970,7 @@ We get the same translation as with the pipeline example. ## Audio classification -Audio classification assigns a class to an audio signal. The Keyword Spotting dataset from the [SUPERB](https://huggingface.co/datasets/superb) benchmark is an example dataset that can be used for audio classification fine-tuning. This dataset contains ten classes of keywords for classification. If you'd like to fine-tune a model for audio classification, take a look at the [run_audio_classification.py](https://github.com/huggingface/transformers/blob/main/examples/pytorch/audio-classification/run_audio_classification.py) script or the how-to guide [here](./tasks/audio_classification). +Audio classification assigns a class to an audio signal. The Keyword Spotting dataset from the [SUPERB](https://huggingface.co/datasets/superb) benchmark is an example dataset that can be used for audio classification fine-tuning. This dataset contains ten classes of keywords for classification. If you'd like to fine-tune a model for audio classification, take a look at the [run_audio_classification.py](https://github.com/huggingface/transformers/blob/main/examples/pytorch/audio-classification/run_audio_classification.py) script or this [how-to guide](./tasks/audio_classification). The following examples demonstrate how to use a [`pipeline`] and a model and tokenizer for audio classification inference: @@ -988,9 +988,9 @@ The following examples demonstrate how to use a [`pipeline`] and a model and tok {'label': 'fearful', 'score': 0.12404385954141617}] ``` -The general process for using a model and tokenizer for audio classification is: +The general process for using a model and feature extractor for audio classification is: -1. Instantiate a tokenizer and a model from the checkpoint name. +1. Instantiate a feature extractor and a model from the checkpoint name. 2. Process the audio signal to be classified with a feature extractor. 3. Pass the input through the model and take the `argmax` to retrieve the most likely class. 4. 
Convert the class id to a class name with `id2label` to return an interpretable result. @@ -1023,7 +1023,7 @@ The general process for using a model and tokenizer for audio classification is: ## Automatic speech recognition -Automatic speech recognition transcribes an audio signal to text. The [Common Voice](https://huggingface.co/datasets/common_voice) dataset is an example dataset that can be used for automatic speech recognition fine-tuning. It contains an audio file of a speaker and the corresponding sentence. If you'd like to fine-tune a model for automatic speech recognition, take a look at the [run_speech_recognition_ctc.py](https://github.com/huggingface/transformers/blob/main/examples/pytorch/speech-recognition/run_speech_recognition_ctc.py) or [run_speech_recognition_seq2seq.py](https://github.com/huggingface/transformers/blob/main/examples/pytorch/speech-recognition/run_speech_recognition_seq2seq.py) scripts or the how-to guide [here](./tasks/asr). +Automatic speech recognition transcribes an audio signal to text. The [Common Voice](https://huggingface.co/datasets/common_voice) dataset is an example dataset that can be used for automatic speech recognition fine-tuning. It contains an audio file of a speaker and the corresponding sentence. If you'd like to fine-tune a model for automatic speech recognition, take a look at the [run_speech_recognition_ctc.py](https://github.com/huggingface/transformers/blob/main/examples/pytorch/speech-recognition/run_speech_recognition_ctc.py) or [run_speech_recognition_seq2seq.py](https://github.com/huggingface/transformers/blob/main/examples/pytorch/speech-recognition/run_speech_recognition_seq2seq.py) scripts or this [how-to guide](./tasks/asr). The following examples demonstrate how to use a [`pipeline`] and a model and tokenizer for automatic speech recognition inference: @@ -1037,9 +1037,9 @@ The following examples demonstrate how to use a [`pipeline`] and a model and tok {'text': "PRESENTETE MISTER VICE PRESIDENT GOVERNOR CONGRESSMEN THOMAS SAN O TE WILAN CONGRESSMAN MILLA MISTER WEBB MSTBELL SCIENIS DISTINGUISHED GUESS AT LADIES AND GENTLEMAN I APPRECIATE TO YOUR PRESIDENT HAVING MADE ME AN HONORARY VISITING PROFESSOR AND I WILL ASSURE YOU THAT MY FIRST LECTURE WILL BE A VERY BRIEF I AM DELIGHTED TO BE HERE AND I'M PARTICULARLY DELIGHTED TO BE HERE ON THIS OCCASION WE MEED AT A COLLEGE NOTED FOR KNOWLEGE IN A CITY NOTED FOR PROGRESS IN A STATE NOTED FOR STRAINTH AN WE STAND IN NEED OF ALL THREE"} ``` -The general process for using a model and tokenizer for automatic speech recognition is: +The general process for using a model and processor for automatic speech recognition is: -1. Instantiate a tokenizer and a model from the checkpoint name. +1. Instantiate a processor (which regroups a feature extractor for input processing and a tokenizer for decoding) and a model from the checkpoint name. 2. Process the audio signal and text with a processor. 3. Pass the input through the model and take the `argmax` to retrieve the predicted text. 4. Decode the text with a tokenizer to obtain the transcription. @@ -1071,7 +1071,7 @@ The general process for using a model and tokenizer for automatic speech recogni ## Image classification -Like text and audio classification, image classification assigns a class to an image. The [CIFAR-100](https://huggingface.co/datasets/cifar100) dataset is an example dataset that can be used for image classification fine-tuning. It contains an image and the corresponding class. 
If you'd like to fine-tune a model for image classification, take a look at the [run_image_classification.py](https://github.com/huggingface/transformers/blob/main/examples/pytorch/image-classification/run_image_classification.py) script or the how-to guide [here](./tasks/image_classification). +Like text and audio classification, image classification assigns a class to an image. The [CIFAR-100](https://huggingface.co/datasets/cifar100) dataset is an example dataset that can be used for image classification fine-tuning. It contains an image and the corresponding class. If you'd like to fine-tune a model for image classification, take a look at the [run_image_classification.py](https://github.com/huggingface/transformers/blob/main/examples/pytorch/image-classification/run_image_classification.py) script or this [how-to guide](./tasks/image_classification). The following examples demonstrate how to use a [`pipeline`] and a model and tokenizer for image classification inference: @@ -1091,9 +1091,9 @@ The following examples demonstrate how to use a [`pipeline`] and a model and tok {'label': 'tiger cat', 'score': 0.023034192621707916}] ``` -The general process for using a model and tokenizer for image classification is: +The general process for using a model and feature extractor for image classification is: -1. Instantiate a tokenizer and a model from the checkpoint name. +1. Instantiate a feature extractor and a model from the checkpoint name. 2. Process the image to be classified with a feature extractor. 3. Pass the input through the model and take the `argmax` to retrieve the predicted class. 4. Convert the class id to a class name with `id2label` to return an interpretable result.
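
The four steps above map directly onto code. A minimal sketch of them (an editor's illustration; the `google/vit-base-patch16-224` checkpoint and the local image path are assumptions, not taken from this patch):

```py
>>> import torch
>>> from PIL import Image
>>> from transformers import AutoFeatureExtractor, AutoModelForImageClassification

>>> feature_extractor = AutoFeatureExtractor.from_pretrained("google/vit-base-patch16-224")
>>> model = AutoModelForImageClassification.from_pretrained("google/vit-base-patch16-224")

>>> image = Image.open("cat.jpg")  # placeholder path; any RGB image works
>>> inputs = feature_extractor(images=image, return_tensors="pt")

>>> with torch.no_grad():
...     logits = model(**inputs).logits

>>> predicted_class_id = torch.argmax(logits, dim=-1).item()
>>> model.config.id2label[predicted_class_id]
```

This mirrors the structure of the audio-classification example earlier in the document, with the feature extractor taking the place of the tokenizer.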