diff --git a/docs/source/quicktour.rst b/docs/source/quicktour.rst
index b3005b59e8aa..c77da9894c9e 100644
--- a/docs/source/quicktour.rst
+++ b/docs/source/quicktour.rst
@@ -285,16 +285,24 @@ We can see we get the numbers from before:
     tensor([[2.2043e-04, 9.9978e-01],
             [5.3086e-01, 4.6914e-01]], grad_fn=)

-If you have labels, you can provide them to the model, it will return a tuple with the loss and the final activations.
+If you provide the model with labels in addition to inputs, the model output object will also contain a ``loss``
+attribute:

 .. code-block::

     >>> ## PYTORCH CODE
     >>> import torch
     >>> pt_outputs = pt_model(**pt_batch, labels = torch.tensor([1, 0]))
+    >>> print(pt_outputs)
+    SequenceClassifierOutput(loss=tensor(0.3167, grad_fn=), logits=tensor([[-4.0833, 4.3364],
+    [ 0.0818, -0.0418]], grad_fn=), hidden_states=None, attentions=None)
     >>> ## TENSORFLOW CODE
     >>> import tensorflow as tf
     >>> tf_outputs = tf_model(tf_batch, labels = tf.constant([1, 0]))
+    >>> print(tf_outputs)
+    TFSequenceClassifierOutput(loss=, logits=, hidden_states=None, attentions=None)

 Models are standard `torch.nn.Module `__ or `tf.keras.Model `__ so you can use them in your usual training loop. 🤗
@@ -322,6 +330,7 @@ loading a saved PyTorch model in a TensorFlow model, use :func:`~transformers.TF

 .. code-block::

+    from transformers import TFAutoModel
     tokenizer = AutoTokenizer.from_pretrained(save_directory)
     model = TFAutoModel.from_pretrained(save_directory, from_pt=True)

@@ -329,6 +338,7 @@ and if you are loading a saved TensorFlow model in a PyTorch model, you should u

 .. code-block::

+    from transformers import AutoModel
     tokenizer = AutoTokenizer.from_pretrained(save_directory)
     model = AutoModel.from_pretrained(save_directory, from_tf=True)

@@ -339,10 +349,12 @@ Lastly, you can also ask the model to return all hidden states and all attention

     >>> ## PYTORCH CODE
     >>> pt_outputs = pt_model(**pt_batch, output_hidden_states=True, output_attentions=True)
-    >>> all_hidden_states, all_attentions = pt_outputs[-2:]
+    >>> all_hidden_states = pt_outputs.hidden_states
+    >>> all_attentions = pt_outputs.attentions
     >>> ## TENSORFLOW CODE
     >>> tf_outputs = tf_model(tf_batch, output_hidden_states=True, output_attentions=True)
-    >>> all_hidden_states, all_attentions = tf_outputs[-2:]
+    >>> all_hidden_states = tf_outputs.hidden_states
+    >>> all_attentions = tf_outputs.attentions

 Accessing the code
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -375,16 +387,16 @@ directly instantiate model and tokenizer without the auto magic:
 Customizing the model
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-If you want to change how the model itself is built, you can define your custom configuration class. Each architecture
-comes with its own relevant configuration (in the case of DistilBERT, :class:`~transformers.DistilBertConfig`) which
-allows you to specify any of the hidden dimension, dropout rate, etc. If you do core modifications, like changing the
-hidden size, you won't be able to use a pretrained model anymore and will need to train from scratch. You would then
-instantiate the model directly from this configuration.
+If you want to change how the model itself is built, you can define a custom configuration class. Each architecture
+comes with its own relevant configuration. For example, :class:`~transformers.DistilBertConfig` allows you to specify
+parameters such as the hidden dimension, dropout rate, etc for DistilBERT. If you do core modifications, like changing
+the hidden size, you won't be able to use a pretrained model anymore and will need to train from scratch. You would
+then instantiate the model directly from this configuration.

-Here we use the predefined vocabulary of DistilBERT (hence load the tokenizer with the
-:func:`~transformers.DistilBertTokenizer.from_pretrained` method) and initialize the model from scratch (hence
-instantiate the model from the configuration instead of using the
-:func:`~transformers.DistilBertForSequenceClassification.from_pretrained` method).
+Below, we load a predefined vocabulary for a tokenizer with the
+:func:`~transformers.DistilBertTokenizer.from_pretrained` method. However, unlike the tokenizer, we wish to initialize
+the model from scratch. Therefore, we instantiate the model from a configuration instead of using the
+:func:`~transformers.DistilBertForSequenceClassification.from_pretrained` method.

 .. code-block::

@@ -401,9 +413,9 @@ instantiate the model from the configuration instead of using the

 For something that only changes the head of the model (for instance, the number of labels), you can still use a
 pretrained model for the body. For instance, let's define a classifier for 10 different labels using a pretrained body.
-We could create a configuration with all the default values and just change the number of labels, but more easily, you
-can directly pass any argument a configuration would take to the :func:`from_pretrained` method and it will update the
-default configuration with it:
+Instead of creating a new configuration with all the default values just to change the number of labels, we can instead
+pass any argument a configuration would take to the :func:`from_pretrained` method and it will update the default
+configuration appropriately:

 .. code-block::