diff --git a/docs/source/en/tasks/asr.mdx b/docs/source/en/tasks/asr.mdx
index daa627aaf131..4bb5b4ebec1c 100644
--- a/docs/source/en/tasks/asr.mdx
+++ b/docs/source/en/tasks/asr.mdx
@@ -14,9 +14,12 @@ specific language governing permissions and limitations under the License.
-Automatic speech recognition (ASR) converts a speech signal to text. It is an example of a sequence-to-sequence task, going from a sequence of audio inputs to textual outputs. Voice assistants like Siri and Alexa utilize ASR models to assist users.
+Automatic speech recognition (ASR) converts a speech signal to text, mapping a sequence of audio inputs to text outputs. Virtual assistants like Siri and Alexa use ASR models to help users every day, and there are many other useful user-facing applications like live captioning and note-taking during meetings.
-This guide will show you how to fine-tune [Wav2Vec2](https://huggingface.co/facebook/wav2vec2-base) on the [MInDS-14](https://huggingface.co/datasets/PolyAI/minds14) dataset to transcribe audio to text.
+This guide will show you how to:
+
+1. Finetune [Wav2Vec2](https://huggingface.co/facebook/wav2vec2-base) on the [MInDS-14](https://huggingface.co/datasets/PolyAI/minds14) dataset to transcribe audio to text.
+2. Use your finetuned model for inference.
@@ -24,17 +27,31 @@ See the automatic speech recognition [task page](https://huggingface.co/tasks/au
+Before you begin, make sure you have all the necessary libraries installed:
+
+```bash
+pip install transformers datasets evaluate jiwer
+```
+
+We encourage you to log in to your Hugging Face account so you can upload and share your model with the community. When prompted, enter your token to log in:
+
+```py
+>>> from huggingface_hub import notebook_login
+
+>>> notebook_login()
+```
+
## Load MInDS-14 dataset
-Load the [MInDS-14](https://huggingface.co/datasets/PolyAI/minds14) from the 🤗 Datasets library:
+Start by loading a smaller subset of the [MInDS-14](https://huggingface.co/datasets/PolyAI/minds14) dataset from the 🤗 Datasets library. This'll give you a chance to experiment and make sure everything works before spending more time training on the full dataset.
```py
>>> from datasets import load_dataset, Audio
->>> minds = load_dataset("PolyAI/minds14", name="en-US", split="train")
+>>> minds = load_dataset("PolyAI/minds14", name="en-US", split="train[:100]")
```
-Split this dataset into a train and test set:
+Split the dataset's `train` split into a train and test set with the [`~Dataset.train_test_split`] method:
```py
>>> minds = minds.train_test_split(test_size=0.2)
@@ -47,16 +64,16 @@ Then take a look at the dataset:
DatasetDict({
train: Dataset({
features: ['path', 'audio', 'transcription', 'english_transcription', 'intent_class', 'lang_id'],
- num_rows: 450
+ num_rows: 16
})
test: Dataset({
features: ['path', 'audio', 'transcription', 'english_transcription', 'intent_class', 'lang_id'],
- num_rows: 113
+ num_rows: 4
})
})
```
-While the dataset contains a lot of helpful information, like `lang_id` and `intent_class`, you will focus on the `audio` and `transcription` columns in this guide. Remove the other columns:
+While the dataset contains a lot of useful information, like `lang_id` and `english_transcription`, you'll focus on the `audio` and `transcription` in this guide. Remove the other columns with the [`~datasets.Dataset.remove_columns`] method:
```py
>>> minds = minds.remove_columns(["english_transcription", "intent_class", "lang_id"])
@@ -74,11 +91,14 @@ Take a look at the example again:
'transcription': "hi I'm trying to use the banking app on my phone and currently my checking and savings account balance is not refreshing"}
```
-The `audio` column contains a 1-dimensional `array` of the speech signal that must be called to load and resample the audio file.
+There are two fields:
+
+- `audio`: a 1-dimensional `array` of the speech signal; the audio file is loaded and resampled when you call this column (see the quick check after this list).
+- `transcription`: the target text.
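+
+Calling the `audio` column is what actually decodes the audio file. For example, indexing into it returns the raw waveform and its sampling rate (a quick check - the exact array values depend on the file):
+
+```py
+>>> sample = minds["train"][0]["audio"]  # indexing the column decodes the file on the fly
+>>> sample["sampling_rate"]
+8000
+```
+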
## Preprocess
-Load the Wav2Vec2 processor to process the audio signal and transcribed text:
+The next step is to load a Wav2Vec2 processor to process the audio signal:
```py
>>> from transformers import AutoProcessor
@@ -86,7 +106,7 @@ Load the Wav2Vec2 processor to process the audio signal and transcribed text:
>>> processor = AutoProcessor.from_pretrained("facebook/wav2vec2-base")
```
-The [MInDS-14](https://huggingface.co/datasets/PolyAI/minds14) dataset has a sampling rate of 8000khz. You will need to resample the dataset to use the pretrained Wav2Vec2 model:
+The MInDS-14 dataset has a sampling rate of 8000Hz (you can find this information in its [dataset card](https://huggingface.co/datasets/PolyAI/minds14)), which means you'll need to resample the dataset to 16000Hz to use the pretrained Wav2Vec2 model:
```py
>>> minds = minds.cast_column("audio", Audio(sampling_rate=16_000))
@@ -99,32 +119,38 @@ The [MInDS-14](https://huggingface.co/datasets/PolyAI/minds14) dataset has a sam
'transcription': "hi I'm trying to use the banking app on my phone and currently my checking and savings account balance is not refreshing"}
```
-The preprocessing function needs to:
+As you can see in the `transcription` above, the text contains a mix of upper and lowercase characters. The Wav2Vec2 tokenizer is only trained on uppercase characters so you'll need to make sure the text matches the tokenizer's vocabulary:
+
+```py
+>>> def uppercase(example):
+... return {"transcription": example["transcription"].upper()}
+
+
+>>> minds = minds.map(uppercase)
+```
+
+Now create a preprocessing function that:
-1. Call the `audio` column to load and resample the audio file.
-2. Extract the `input_values` from the audio file.
-3. Typically, when you call the processor, you call the feature extractor. Since you also want to tokenize text, instruct the processor to call the tokenizer instead with a context manager.
+1. Calls the `audio` column to load and resample the audio file.
+2. Extracts the `input_values` from the audio file and tokenizes the `transcription` column with the processor.
```py
>>> def prepare_dataset(batch):
... audio = batch["audio"]
-
-... batch = processor(audio=audio["array"], sampling_rate=audio["sampling_rate"]).input_values[0]
-... batch["input_length"] = len(batch["input_values"])
-
-... batch["labels"] = processor(text=batch["transcription"]).input_ids
+... batch = processor(audio["array"], sampling_rate=audio["sampling_rate"], text=batch["transcription"])
+... batch["input_length"] = len(batch["input_values"][0])
... return batch
```
-Use 🤗 Datasets [`~datasets.Dataset.map`] function to apply the preprocessing function over the entire dataset. You can speed up the map function by increasing the number of processes with `num_proc`. Remove the columns you don't need:
+To apply the preprocessing function over the entire dataset, use 🤗 Datasets [`~datasets.Dataset.map`] function. You can speed up `map` by increasing the number of processes with the `num_proc` parameter. Remove the columns you don't need with the [`~datasets.Dataset.remove_columns`] method:
```py
>>> encoded_minds = minds.map(prepare_dataset, remove_columns=minds.column_names["train"], num_proc=4)
```
-🤗 Transformers doesn't have a data collator for automatic speech recognition, so you will need to create one. You can adapt the [`DataCollatorWithPadding`] to create a batch of examples for automatic speech recognition. It will also dynamically pad your text and labels to the length of the longest element in its batch, so they are a uniform length. While it is possible to pad your text in the `tokenizer` function by setting `padding=True`, dynamic padding is more efficient.
+🤗 Transformers doesn't have a data collator for ASR, so you'll need to adapt the [`DataCollatorWithPadding`] to create a batch of examples. It'll also dynamically pad your text and labels to the length of the longest element in its batch (instead of the entire dataset) so they are a uniform length. While it is possible to pad your text in the `tokenizer` function by setting `padding=True`, dynamic padding is more efficient.
-Unlike other data collators, this specific data collator needs to apply a different padding method to `input_values` and `labels`. You can apply a different padding method with a context manager:
+Unlike other data collators, this specific data collator needs to apply a different padding method to `input_values` and `labels`:
```py
>>> import torch
@@ -137,12 +163,12 @@ Unlike other data collators, this specific data collator needs to apply a differ
... class DataCollatorCTCWithPadding:
... processor: AutoProcessor
-... padding: Union[bool, str] = True
+... padding: Union[bool, str] = "longest"
... def __call__(self, features: List[Dict[str, Union[List[int], torch.Tensor]]]) -> Dict[str, torch.Tensor]:
... # split inputs and labels since they have to be of different lengths and need
... # different padding methods
-... input_features = [{"input_values": feature["input_values"]} for feature in features]
+... input_features = [{"input_values": feature["input_values"][0]} for feature in features]
... label_features = [{"input_ids": feature["labels"]} for feature in features]
... batch = self.processor.pad(input_features, padding=self.padding, return_tensors="pt")
@@ -157,17 +183,55 @@ Unlike other data collators, this specific data collator needs to apply a differ
... return batch
```
-Create a batch of examples and dynamically pad them with `DataCollatorForCTCWithPadding`:
+Now instantiate your `DataCollatorCTCWithPadding`:
```py
->>> data_collator = DataCollatorCTCWithPadding(processor=processor, padding=True)
+>>> data_collator = DataCollatorCTCWithPadding(processor=processor, padding="longest")
+```
+
+## Evaluate
+
+Including a metric during training is often helpful for evaluating your model's performance. You can quickly load an evaluation method with the 🤗 [Evaluate](https://huggingface.co/docs/evaluate/index) library. For this task, load the [word error rate](https://huggingface.co/spaces/evaluate-metric/wer) (WER) metric (see the 🤗 Evaluate [quick tour](https://huggingface.co/docs/evaluate/a_quick_tour) to learn more about how to load and compute a metric):
+
+```py
+>>> import evaluate
+
+>>> wer = evaluate.load("wer")
```
+Then create a function that passes your predictions and labels to [`~evaluate.EvaluationModule.compute`] to calculate the WER:
+
+```py
+>>> import numpy as np
+
+
+>>> def compute_metrics(pred):
+... pred_logits = pred.predictions
+... pred_ids = np.argmax(pred_logits, axis=-1)
+
+... pred.label_ids[pred.label_ids == -100] = processor.tokenizer.pad_token_id
+
+... pred_str = processor.batch_decode(pred_ids)
+... label_str = processor.batch_decode(pred.label_ids, group_tokens=False)
+
+... wer_score = wer.compute(predictions=pred_str, references=label_str)
+
+... return {"wer": wer_score}
+```
+
+Your `compute_metrics` function is ready to go now, and you'll return to it when you set up your training.
+
## Train
-Load Wav2Vec2 with [`AutoModelForCTC`]. For `ctc_loss_reduction`, it is often better to use the average instead of the default summation:
+<Tip>
+
+If you aren't familiar with finetuning a model with the [`Trainer`], take a look at the basic tutorial [here](../training#train-with-pytorch-trainer)!
+
+</Tip>
+
+You're ready to start training your model now! Load Wav2Vec2 with [`AutoModelForCTC`]. Specify the reduction to apply with the `ctc_loss_reduction` parameter. It is often better to use the average instead of the default summation:
```py
>>> from transformers import AutoModelForCTC, TrainingArguments, Trainer
@@ -179,30 +243,32 @@ Load Wav2Vec2 with [`AutoModelForCTC`]. For `ctc_loss_reduction`, it is often be
... )
```
-
-
-If you aren't familiar with fine-tuning a model with the [`Trainer`], take a look at the basic tutorial [here](../training#finetune-with-trainer)!
-
-
-
At this point, only three steps remain:
-1. Define your training hyperparameters in [`TrainingArguments`].
-2. Pass the training arguments to [`Trainer`] along with the model, datasets, tokenizer, and data collator.
-3. Call [`~Trainer.train`] to fine-tune your model.
+1. Define your training hyperparameters in [`TrainingArguments`]. The only required parameter is `output_dir` which specifies where to save your model. You'll push this model to the Hub by setting `push_to_hub=True` (you need to be signed in to Hugging Face to upload your model). The [`Trainer`] will evaluate the WER and save a training checkpoint every 1000 steps, as set by `eval_steps` and `save_steps` below.
+2. Pass the training arguments to [`Trainer`] along with the model, dataset, tokenizer, data collator, and `compute_metrics` function.
+3. Call [`~Trainer.train`] to finetune your model.
```py
>>> training_args = TrainingArguments(
-... output_dir="./results",
+... output_dir="my_awesome_asr_mind_model",
+... per_device_train_batch_size=8,
+... gradient_accumulation_steps=2,
+... learning_rate=1e-5,
+... warmup_steps=500,
+... max_steps=2000,
+... gradient_checkpointing=True,
+... fp16=True,
... group_by_length=True,
-... per_device_train_batch_size=16,
... evaluation_strategy="steps",
-... num_train_epochs=3,
-... fp16=True,
-... gradient_checkpointing=True,
-... learning_rate=1e-4,
-... weight_decay=0.005,
-... save_total_limit=2,
+... per_device_eval_batch_size=8,
+... save_steps=1000,
+... eval_steps=1000,
+... logging_steps=25,
+... load_best_model_at_end=True,
+... metric_for_best_model="wer",
+... greater_is_better=False,
+... push_to_hub=True,
... )
>>> trainer = Trainer(
@@ -212,15 +278,89 @@ At this point, only three steps remain:
... eval_dataset=encoded_minds["test"],
... tokenizer=processor.feature_extractor,
... data_collator=data_collator,
+... compute_metrics=compute_metrics,
... )
>>> trainer.train()
```
+
+Once training is completed, share your model to the Hub with the [`~transformers.Trainer.push_to_hub`] method so everyone can use your model:
+
+```py
+>>> trainer.push_to_hub()
+```
-For a more in-depth example of how to fine-tune a model for automatic speech recognition, take a look at this blog [post](https://huggingface.co/blog/fine-tune-wav2vec2-english) for English ASR and this [post](https://huggingface.co/blog/fine-tune-xlsr-wav2vec2) for multilingual ASR.
+For a more in-depth example of how to finetune a model for automatic speech recognition, take a look at this blog [post](https://huggingface.co/blog/fine-tune-wav2vec2-english) for English ASR and this [post](https://huggingface.co/blog/fine-tune-xlsr-wav2vec2) for multilingual ASR.
+
+
+
+## Inference
+
+Great, now that you've finetuned a model, you can use it for inference!
+
+Load an audio file you'd like to run inference on. Remember to resample the audio file to match the model's sampling rate if you need to!
+
+```py
+>>> from datasets import load_dataset, Audio
+
+>>> dataset = load_dataset("PolyAI/minds14", "en-US", split="train")
+>>> dataset = dataset.cast_column("audio", Audio(sampling_rate=16000))
+>>> sampling_rate = dataset.features["audio"].sampling_rate
+>>> audio_file = dataset[0]["audio"]["path"]
+```
+
+The simplest way to try out your finetuned model for inference is to use it in a [`pipeline`]. Instantiate a `pipeline` for automatic speech recognition with your model, and pass your audio file to it:
+
+```py
+>>> from transformers import pipeline
+
+>>> transcriber = pipeline("automatic-speech-recognition", model="stevhliu/my_awesome_asr_minds_model")
+>>> transcriber(audio_file)
+{'text': 'I WOUD LIKE O SET UP JOINT ACOUNT WTH Y PARTNER'}
+```
+
+<Tip>
+
+The transcription is decent, but it could be better! Try finetuning your model on more examples to get even better results!
+
+</Tip>
+
+You can also manually replicate the results of the `pipeline` if you'd like:
+
+
+
+Load a processor to preprocess the audio file and return the input as PyTorch tensors:
+
+```py
+>>> from transformers import AutoProcessor
+
+>>> processor = AutoProcessor.from_pretrained("stevhliu/my_awesome_asr_mind_model")
+>>> inputs = processor(dataset[0]["audio"]["array"], sampling_rate=sampling_rate, return_tensors="pt")
+```
-
\ No newline at end of file
+Pass your inputs to the model and return the logits:
+
+```py
+>>> import torch
+>>> from transformers import AutoModelForCTC
+
+>>> model = AutoModelForCTC.from_pretrained("stevhliu/my_awesome_asr_mind_model")
+>>> with torch.no_grad():
+... logits = model(**inputs).logits
+```
+
+Get the predicted `input_ids` with the highest probability, and use the processor to decode the predicted `input_ids` back into text:
+
+```py
+>>> import torch
+
+>>> predicted_ids = torch.argmax(logits, dim=-1)
+>>> transcription = processor.batch_decode(predicted_ids)
+>>> transcription
+['I WOUL LIKE O SET UP JOINT ACOUNT WTH Y PARTNER']
+```
+
+
\ No newline at end of file
diff --git a/docs/source/en/tasks/audio_classification.mdx b/docs/source/en/tasks/audio_classification.mdx
index 99cc3064653f..ab0abbced785 100644
--- a/docs/source/en/tasks/audio_classification.mdx
+++ b/docs/source/en/tasks/audio_classification.mdx
@@ -14,9 +14,12 @@ specific language governing permissions and limitations under the License.
-Audio classification assigns a label or class to audio data. It is similar to text classification, except an audio input is continuous and must be discretized, whereas text can be split into tokens. Some practical applications of audio classification include identifying intent, speakers, and even animal species by their sounds.
+Audio classification - just like with text classification - assigns a class label to the input data. The only difference is that instead of text, the inputs are raw audio waveforms. Some practical applications of audio classification include identifying speaker intent, language classification, and even animal species by their sounds.
-This guide will show you how to fine-tune [Wav2Vec2](https://huggingface.co/facebook/wav2vec2-base) on the [MInDS-14](https://huggingface.co/datasets/PolyAI/minds14) to classify intent.
+This guide will show you how to:
+
+1. Finetune [Wav2Vec2](https://huggingface.co/facebook/wav2vec2-base) on the [MInDS-14](https://huggingface.co/datasets/PolyAI/minds14) dataset to classify speaker intent.
+2. Use your finetuned model for inference.
@@ -24,9 +27,23 @@ See the audio classification [task page](https://huggingface.co/tasks/audio-clas
+Before you begin, make sure you have all the necessary libraries installed:
+
+```bash
+pip install transformers datasets evaluate
+```
+
+We encourage you to log in to your Hugging Face account so you can upload and share your model with the community. When prompted, enter your token to log in:
+
+```py
+>>> from huggingface_hub import notebook_login
+
+>>> notebook_login()
+```
+
## Load MInDS-14 dataset
-Load the [MInDS-14](https://huggingface.co/datasets/PolyAI/minds14) from the 🤗 Datasets library:
+Start by loading the MInDS-14 dataset from the 🤗 Datasets library:
```py
>>> from datasets import load_dataset, Audio
@@ -34,7 +51,7 @@ Load the [MInDS-14](https://huggingface.co/datasets/PolyAI/minds14) from the
>>> minds = load_dataset("PolyAI/minds14", name="en-US", split="train")
```
-Split this dataset into a train and test set:
+Split the dataset's `train` split into a smaller train and test set with the [`~datasets.Dataset.train_test_split`] method. This'll give you a chance to experiment and make sure everything works before spending more time on the full dataset.
```py
>>> minds = minds.train_test_split(test_size=0.2)
@@ -56,7 +73,7 @@ DatasetDict({
})
```
-While the dataset contains a lot of other useful information, like `lang_id` and `english_transcription`, you will focus on the `audio` and `intent_class` in this guide. Remove the other columns:
+While the dataset contains a lot of useful information, like `lang_id` and `english_transcription`, you'll focus on the `audio` and `intent_class` in this guide. Remove the other columns with the [`~datasets.Dataset.remove_columns`] method:
```py
>>> minds = minds.remove_columns(["path", "transcription", "english_transcription", "lang_id"])
@@ -73,7 +90,12 @@ Take a look at an example now:
'intent_class': 2}
```
-The `audio` column contains a 1-dimensional `array` of the speech signal that must be called to load and resample the audio file. The `intent_class` column is an integer that represents the class id of intent. Create a dictionary that maps a label name to an integer and vice versa. The mapping will help the model recover the label name from the label number:
+There are two fields:
+
+- `audio`: a 1-dimensional `array` of the speech signal; the audio file is loaded and resampled when you call this column.
+- `intent_class`: represents the class id of the speaker's intent.
+
+To make it easier for the model to get the label name from the label id, create a dictionary that maps the label name to an integer and vice versa:
```py
>>> labels = minds["train"].features["intent_class"].names
@@ -83,18 +105,16 @@ The `audio` column contains a 1-dimensional `array` of the speech signal that mu
... id2label[str(i)] = label
```
-Now you can convert the label number to a label name for more information:
+Now you can convert the label id to a label name:
```py
>>> id2label[str(2)]
'app_error'
```
-Each keyword - or label - corresponds to a number; `2` indicates `app_error` in the example above.
-
## Preprocess
-Load the Wav2Vec2 feature extractor to process the audio signal:
+The next step is to load a Wav2Vec2 feature extractor to process the audio signal:
```py
>>> from transformers import AutoFeatureExtractor
@@ -102,7 +122,7 @@ Load the Wav2Vec2 feature extractor to process the audio signal:
>>> feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/wav2vec2-base")
```
-The [MInDS-14](https://huggingface.co/datasets/PolyAI/minds14) dataset has a sampling rate of 8000khz. You will need to resample the dataset to use the pretrained Wav2Vec2 model:
+The MInDS-14 dataset has a sampling rate of 8000Hz (you can find this information in its [dataset card](https://huggingface.co/datasets/PolyAI/minds14)), which means you'll need to resample the dataset to 16000Hz to use the pretrained Wav2Vec2 model:
```py
>>> minds = minds.cast_column("audio", Audio(sampling_rate=16_000))
@@ -114,11 +134,11 @@ The [MInDS-14](https://huggingface.co/datasets/PolyAI/minds14) dataset has a sam
'intent_class': 2}
```
-The preprocessing function needs to:
+Now create a preprocessing function that:
-1. Call the `audio` column to load and if necessary resample the audio file.
-2. Check the sampling rate of the audio file matches the sampling rate of the audio data a model was pretrained with. You can find this information on the Wav2Vec2 [model card](https://huggingface.co/facebook/wav2vec2-base).
-3. Set a maximum input length so longer inputs are batched without being truncated.
+1. Calls the `audio` column to load and, if necessary, resample the audio file.
+2. Checks if the sampling rate of the audio file matches the sampling rate of the audio data a model was pretrained with. You can find this information in the Wav2Vec2 [model card](https://huggingface.co/facebook/wav2vec2-base).
+3. Sets a maximum input length to batch longer inputs without truncating them.
```py
>>> def preprocess_function(examples):
@@ -129,18 +149,46 @@ The preprocessing function needs to:
... return inputs
```
-Use 🤗 Datasets [`~datasets.Dataset.map`] function to apply the preprocessing function over the entire dataset. You can speed up the `map` function by setting `batched=True` to process multiple elements of the dataset at once. Remove the columns you don't need, and rename `intent_class` to `label` because that is what the model expects:
+To apply the preprocessing function over the entire dataset, use 🤗 Datasets [`~datasets.Dataset.map`] function. You can speed up `map` by setting `batched=True` to process multiple elements of the dataset at once. Remove the columns you don't need, and rename `intent_class` to `label` because that's the name the model expects:
```py
>>> encoded_minds = minds.map(preprocess_function, remove_columns="audio", batched=True)
>>> encoded_minds = encoded_minds.rename_column("intent_class", "label")
```
+## Evaluate
+
+Including a metric during training is often helpful for evaluating your model's performance. You can quickly load an evaluation method with the 🤗 [Evaluate](https://huggingface.co/docs/evaluate/index) library. For this task, load the [accuracy](https://huggingface.co/spaces/evaluate-metric/accuracy) metric (see the 🤗 Evaluate [quick tour](https://huggingface.co/docs/evaluate/a_quick_tour) to learn more about how to load and compute a metric):
+
+```py
+>>> import evaluate
+
+>>> accuracy = evaluate.load("accuracy")
+```
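+
+To see what the metric returns, you can call [`~evaluate.EvaluationModule.compute`] directly on a few toy predictions (the numbers below are only an illustration):
+
+```py
+>>> accuracy.compute(predictions=[0, 1, 1], references=[0, 1, 0])
+{'accuracy': 0.6666666666666666}
+```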
+
+Then create a function that passes your predictions and labels to [`~evaluate.EvaluationModule.compute`] to calculate the accuracy:
+
+```py
+>>> import numpy as np
+
+
+>>> def compute_metrics(eval_pred):
+... predictions = np.argmax(eval_pred.predictions, axis=1)
+... return accuracy.compute(predictions=predictions, references=eval_pred.label_ids)
+```
+
+Your `compute_metrics` function is ready to go now, and you'll return to it when you set up your training.
+
## Train
-Load Wav2Vec2 with [`AutoModelForAudioClassification`]. Specify the number of labels, and pass the model the mapping between label number and label class:
+<Tip>
+
+If you aren't familiar with finetuning a model with the [`Trainer`], take a look at the basic tutorial [here](../training#train-with-pytorch-trainer)!
+
+</Tip>
+
+You're ready to start training your model now! Load Wav2Vec2 with [`AutoModelForAudioClassification`] along with the number of expected labels, and the label mappings:
```py
>>> from transformers import AutoModelForAudioClassification, TrainingArguments, Trainer
@@ -151,25 +199,28 @@ Load Wav2Vec2 with [`AutoModelForAudioClassification`]. Specify the number of la
... )
```
-
-
-If you aren't familiar with fine-tuning a model with the [`Trainer`], take a look at the basic tutorial [here](../training#finetune-with-trainer)!
-
-
-
At this point, only three steps remain:
-1. Define your training hyperparameters in [`TrainingArguments`].
-2. Pass the training arguments to [`Trainer`] along with the model, datasets, and feature extractor.
-3. Call [`~Trainer.train`] to fine-tune your model.
+1. Define your training hyperparameters in [`TrainingArguments`]. The only required parameter is `output_dir` which specifies where to save your model. You'll push this model to the Hub by setting `push_to_hub=True` (you need to be signed in to Hugging Face to upload your model). At the end of each epoch, the [`Trainer`] will evaluate the accuracy and save the training checkpoint.
+2. Pass the training arguments to [`Trainer`] along with the model, dataset, feature extractor, and `compute_metrics` function.
+3. Call [`~Trainer.train`] to finetune your model.
+
```py
>>> training_args = TrainingArguments(
-... output_dir="./results",
+... output_dir="my_awesome_mind_model",
... evaluation_strategy="epoch",
... save_strategy="epoch",
... learning_rate=3e-5,
-... num_train_epochs=5,
+... per_device_train_batch_size=32,
+... gradient_accumulation_steps=4,
+... per_device_eval_batch_size=32,
+... num_train_epochs=10,
+... warmup_ratio=0.1,
+... logging_steps=10,
+... load_best_model_at_end=True,
+... metric_for_best_model="accuracy",
+... push_to_hub=True,
... )
>>> trainer = Trainer(
@@ -178,15 +229,89 @@ At this point, only three steps remain:
... train_dataset=encoded_minds["train"],
... eval_dataset=encoded_minds["test"],
... tokenizer=feature_extractor,
+... compute_metrics=compute_metrics,
... )
>>> trainer.train()
```
+
+Once training is completed, share your model to the Hub with the [`~transformers.Trainer.push_to_hub`] method so everyone can use your model:
+
+```py
+>>> trainer.push_to_hub()
+```
-For a more in-depth example of how to fine-tune a model for audio classification, take a look at the corresponding [PyTorch notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/audio_classification.ipynb).
+For a more in-depth example of how to finetune a model for audio classification, take a look at the corresponding [PyTorch notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/audio_classification.ipynb).
+
+## Inference
+
+Great, now that you've finetuned a model, you can use it for inference!
+
+Load an audio file you'd like to run inference on. Remember to resample the audio file to match the model's sampling rate if you need to!
+
+```py
+>>> from datasets import load_dataset, Audio
+
+>>> dataset = load_dataset("PolyAI/minds14", name="en-US", split="train")
+>>> dataset = dataset.cast_column("audio", Audio(sampling_rate=16000))
+>>> sampling_rate = dataset.features["audio"].sampling_rate
+>>> audio_file = dataset[0]["audio"]["path"]
+```
+
+The simplest way to try out your finetuned model for inference is to use it in a [`pipeline`]. Instantiate a `pipeline` for audio classification with your model, and pass your audio file to it:
+
+```py
+>>> from transformers import pipeline
+
+>>> classifier = pipeline("audio-classification", model="stevhliu/my_awesome_minds_model")
+>>> classifier(audio_file)
+[
+ {'score': 0.09766869246959686, 'label': 'cash_deposit'},
+ {'score': 0.07998877018690109, 'label': 'app_error'},
+ {'score': 0.0781070664525032, 'label': 'joint_account'},
+ {'score': 0.07667109370231628, 'label': 'pay_bill'},
+ {'score': 0.0755252093076706, 'label': 'balance'}
+]
+```
+
+You can also manually replicate the results of the `pipeline` if you'd like:
+
+
+
+Load a feature extractor to preprocess the audio file and return the input as PyTorch tensors:
+
+```py
+>>> from transformers import AutoFeatureExtractor
+
+>>> feature_extractor = AutoFeatureExtractor.from_pretrained("stevhliu/my_awesome_minds_model")
+>>> inputs = feature_extractor(dataset[0]["audio"]["array"], sampling_rate=sampling_rate, return_tensors="pt")
+```
+
+Pass your inputs to the model and return the logits:
+
+```py
+>>> import torch
+>>> from transformers import AutoModelForAudioClassification
+
+>>> model = AutoModelForAudioClassification.from_pretrained("stevhliu/my_awesome_minds_model")
+>>> with torch.no_grad():
+... logits = model(**inputs).logits
+```
+
+Get the class with the highest probability, and use the model's `id2label` mapping to convert it to a label:
+
+```py
+>>> import torch
+
+>>> predicted_class_ids = torch.argmax(logits).item()
+>>> predicted_label = model.config.id2label[predicted_class_ids]
+>>> predicted_label
+'cash_deposit'
+```
+
+
\ No newline at end of file
diff --git a/docs/source/en/tasks/image_classification.mdx b/docs/source/en/tasks/image_classification.mdx
index d2d86df062b9..a7362e10c476 100644
--- a/docs/source/en/tasks/image_classification.mdx
+++ b/docs/source/en/tasks/image_classification.mdx
@@ -12,11 +12,16 @@ specific language governing permissions and limitations under the License.
# Image classification
+[[open-in-colab]]
+
-Image classification assigns a label or class to an image. Unlike text or audio classification, the inputs are the pixel values that represent an image. There are many uses for image classification, like detecting damage after a disaster, monitoring crop health, or helping screen medical images for signs of disease.
+Image classification assigns a label or class to an image. Unlike text or audio classification, the inputs are the pixel values that comprise an image. There are many applications for image classification such as detecting damage after a natural disaster, monitoring crop health, or helping screen medical images for signs of disease.
+
+This guide will show you how to:
-This guide will show you how to fine-tune [ViT](https://huggingface.co/docs/transformers/v4.16.2/en/model_doc/vit) on the [Food-101](https://huggingface.co/datasets/food101) dataset to classify a food item in an image.
+1. Finetune [ViT](https://huggingface.co/docs/transformers/v4.16.2/en/model_doc/vit) on the [Food-101](https://huggingface.co/datasets/food101) dataset to classify a food item in an image.
+2. Use your finetuned model for inference.
@@ -24,9 +29,23 @@ See the image classification [task page](https://huggingface.co/tasks/image-clas
+Before you begin, make sure you have all the necessary libraries installed:
+
+```bash
+pip install transformers datasets evaluate
+```
+
+We encourage you to log in to your Hugging Face account so you can upload and share your model with the community. When prompted, enter your token to log in:
+
+```py
+>>> from huggingface_hub import notebook_login
+
+>>> notebook_login()
+```
+
## Load Food-101 dataset
-Load only the first 5000 images of the Food-101 dataset from the 🤗 Datasets library since it is pretty large:
+Start by loading a smaller subset of the Food-101 dataset from the 🤗 Datasets library. This'll give you a chance to experiment and make sure everything works before spending more time training on the full dataset.
```py
>>> from datasets import load_dataset
@@ -34,7 +53,7 @@ Load only the first 5000 images of the Food-101 dataset from the 🤗 Datasets l
>>> food = load_dataset("food101", split="train[:5000]")
```
-Split this dataset into a train and test set:
+Split the dataset's `train` split into a train and test set with the [`~datasets.Dataset.train_test_split`] method:
```py
>>> food = food.train_test_split(test_size=0.2)
@@ -48,7 +67,12 @@ Then take a look at an example:
'label': 79}
```
-The `image` field contains a PIL image, and each `label` is an integer that represents a class. Create a dictionary that maps a label name to an integer and vice versa. The mapping will help the model recover the label name from the label number:
+There are two fields:
+
+- `image`: a PIL image of the food item.
+- `label`: the label class of the food item.
+
+To make it easier for the model to get the label name from the label id, create a dictionary that maps the label name to an integer and vice versa:
```py
>>> labels = food["train"].features["label"].names
@@ -58,18 +82,16 @@ The `image` field contains a PIL image, and each `label` is an integer that repr
... id2label[str(i)] = label
```
-Now you can convert the label number to a label name for more information:
+Now you can convert the label id to a label name:
```py
>>> id2label[str(79)]
'prime_rib'
```
-Each food class - or label - corresponds to a number; `79` indicates a prime rib in the example above.
-
## Preprocess
-Load the ViT feature extractor to process the image into a tensor:
+The next step is to load a ViT feature extractor to process the image into a tensor:
```py
>>> from transformers import AutoFeatureExtractor
@@ -77,7 +99,9 @@ Load the ViT feature extractor to process the image into a tensor:
>>> feature_extractor = AutoFeatureExtractor.from_pretrained("google/vit-base-patch16-224-in21k")
```
-Apply several image transformations to the dataset to make the model more robust against overfitting. Here you'll use torchvision's [`transforms`](https://pytorch.org/vision/stable/transforms.html) module. Crop a random part of the image, resize it, and normalize it with the image mean and standard deviation:
+Apply some image transformations to the images to make the model more robust against overfitting. Here you'll use torchvision's [`transforms`](https://pytorch.org/vision/stable/transforms.html) module, but you can also use any image library you like.
+
+Crop a random part of the image, resize it, and normalize it with the image mean and standard deviation:
```py
>>> from torchvision.transforms import RandomResizedCrop, Compose, Normalize, ToTensor
@@ -91,7 +115,7 @@ Apply several image transformations to the dataset to make the model more robust
>>> _transforms = Compose([RandomResizedCrop(size), ToTensor(), normalize])
```
-Create a preprocessing function that will apply the transforms and return the `pixel_values` - the inputs to the model - of the image:
+Then create a preprocessing function to apply the transforms and return the `pixel_values` - the inputs to the model - of the image:
```py
>>> def transforms(examples):
@@ -100,13 +124,13 @@ Create a preprocessing function that will apply the transforms and return the `p
... return examples
```
-Use 🤗 Dataset's [`~datasets.Dataset.with_transform`] method to apply the transforms over the entire dataset. The transforms are applied on-the-fly when you load an element of the dataset:
+To apply the preprocessing function over the entire dataset, use 🤗 Datasets [`~datasets.Dataset.with_transform`] method. The transforms are applied on the fly when you load an element of the dataset:
```py
>>> food = food.with_transform(transforms)
```
-Use [`DefaultDataCollator`] to create a batch of examples. Unlike other data collators in 🤗 Transformers, the DefaultDataCollator does not apply additional preprocessing such as padding.
+Now create a batch of examples using [`DefaultDataCollator`]. Unlike other data collators in 🤗 Transformers, the `DefaultDataCollator` does not apply additional preprocessing such as padding.
```py
>>> from transformers import DefaultDataCollator
@@ -114,11 +138,39 @@ Use [`DefaultDataCollator`] to create a batch of examples. Unlike other data col
>>> data_collator = DefaultDataCollator()
```
+## Evaluate
+
+Including a metric during training is often helpful for evaluating your model's performance. You can quickly load an evaluation method with the 🤗 [Evaluate](https://huggingface.co/docs/evaluate/index) library. For this task, load the [accuracy](https://huggingface.co/spaces/evaluate-metric/accuracy) metric (see the 🤗 Evaluate [quick tour](https://huggingface.co/docs/evaluate/a_quick_tour) to learn more about how to load and compute a metric):
+
+```py
+>>> import evaluate
+
+>>> accuracy = evaluate.load("accuracy")
+```
+
+Then create a function that passes your predictions and labels to [`~evaluate.EvaluationModule.compute`] to calculate the accuracy:
+
+```py
+>>> import numpy as np
+
+
+>>> def compute_metrics(eval_pred):
+... predictions = np.argmax(eval_pred.predictions, axis=1)
+... return accuracy.compute(predictions=predictions, references=eval_pred.label_ids)
+```
+
+Your `compute_metrics` function is ready to go now, and you'll return to it when you set up your training.
+
## Train
-Load ViT with [`AutoModelForImageClassification`]. Specify the number of labels, and pass the model the mapping between label number and label class:
+<Tip>
+
+If you aren't familiar with finetuning a model with the [`Trainer`], take a look at the basic tutorial [here](../training#train-with-pytorch-trainer)!
+
+</Tip>
+
+You're ready to start training your model now! Load ViT with [`AutoModelForImageClassification`] along with the number of expected labels, and the label mappings:
```py
>>> from transformers import AutoModelForImageClassification, TrainingArguments, Trainer
@@ -131,31 +183,28 @@ Load ViT with [`AutoModelForImageClassification`]. Specify the number of labels,
... )
```
-
-
-If you aren't familiar with fine-tuning a model with the [`Trainer`], take a look at the basic tutorial [here](../training#finetune-with-trainer)!
-
-
-
At this point, only three steps remain:
-1. Define your training hyperparameters in [`TrainingArguments`]. It is important you don't remove unused columns because this will drop the `image` column. Without the `image` column, you can't create `pixel_values`. Set `remove_unused_columns=False` to prevent this behavior!
-2. Pass the training arguments to [`Trainer`] along with the model, datasets, tokenizer, and data collator.
-3. Call [`~Trainer.train`] to fine-tune your model.
+1. Define your training hyperparameters in [`TrainingArguments`]. It is important you don't remove unused columns because this'll drop the `image` column. Without the `image` column, you can't create `pixel_values`. Set `remove_unused_columns=False` to prevent this behavior! The only other required parameter is `output_dir` which specifies where to save your model. You'll push this model to the Hub by setting `push_to_hub=True` (you need to be signed in to Hugging Face to upload your model). At the end of each epoch, the [`Trainer`] will evaluate the accuracy and save the training checkpoint.
+2. Pass the training arguments to [`Trainer`] along with the model, dataset, tokenizer, data collator, and `compute_metrics` function.
+3. Call [`~Trainer.train`] to finetune your model.
```py
>>> training_args = TrainingArguments(
-... output_dir="./results",
+... output_dir="my_awesome_food_model",
+... remove_unused_columns=False,
+... evaluation_strategy="epoch",
+... save_strategy="epoch",
+... learning_rate=5e-5,
... per_device_train_batch_size=16,
-... evaluation_strategy="steps",
-... num_train_epochs=4,
-... fp16=True,
-... save_steps=100,
-... eval_steps=100,
+... gradient_accumulation_steps=4,
+... per_device_eval_batch_size=16,
+... num_train_epochs=3,
+... warmup_ratio=0.1,
... logging_steps=10,
-... learning_rate=2e-4,
-... save_total_limit=2,
-... remove_unused_columns=False,
+... load_best_model_at_end=True,
+... metric_for_best_model="accuracy",
+... push_to_hub=True,
... )
>>> trainer = Trainer(
@@ -165,15 +214,84 @@ At this point, only three steps remain:
... train_dataset=food["train"],
... eval_dataset=food["test"],
... tokenizer=feature_extractor,
+... compute_metrics=compute_metrics,
... )
>>> trainer.train()
```
+
+Once training is completed, share your model to the Hub with the [`~transformers.Trainer.push_to_hub`] method so everyone can use your model:
+
+```py
+>>> trainer.push_to_hub()
+```
-For a more in-depth example of how to fine-tune a model for image classification, take a look at the corresponding [PyTorch notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/image_classification.ipynb).
+For a more in-depth example of how to finetune a model for image classification, take a look at the corresponding [PyTorch notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/image_classification.ipynb).
+
+## Inference
+
+Great, now that you've finetuned a model, you can use it for inference!
+
+Load an image you'd like to run inference on:
+
+```py
+>>> ds = load_dataset("food101", split="validation[:10]")
+>>> image = ds["image"][0]
+```
+
+
+

+
+
+The simplest way to try out your finetuned model for inference is to use it in a [`pipeline`]. Instantiate a `pipeline` for image classification with your model, and pass your image to it:
+
+```py
+>>> from transformers import pipeline
+
+>>> classifier = pipeline("image-classification", model="my_awesome_food_model")
+>>> classifier(image)
+[{'score': 0.35574808716773987, 'label': 'beignets'},
+ {'score': 0.018057454377412796, 'label': 'chicken_wings'},
+ {'score': 0.017733804881572723, 'label': 'prime_rib'},
+ {'score': 0.016335085034370422, 'label': 'bruschetta'},
+ {'score': 0.0160061065107584, 'label': 'ramen'}]
+```
+You can also manually replicate the results of the `pipeline` if you'd like:
+
+
+
+Load a feature extractor to preprocess the image and return the input as PyTorch tensors:
+
+```py
+>>> from transformers import AutoFeatureExtractor
+>>> import torch
+
+>>> feature_extractor = AutoFeatureExtractor.from_pretrained("my_awesome_food_model")
+>>> inputs = feature_extractor(image, return_tensors="pt")
+```
+
+Pass your inputs to the model and return the logits:
+
+```py
+>>> from transformers import AutoModelForImageClassification
+
+>>> model = AutoModelForImageClassification.from_pretrained("my_awesome_food_model")
+>>> with torch.no_grad():
+... logits = model(**inputs).logits
+```
+
+Get the predicted label with the highest probability, and use the model's `id2label` mapping to convert it to a label:
+
+```py
+>>> predicted_label = logits.argmax(-1).item()
+>>> model.config.id2label[predicted_label]
+'beignets'
+```
+
+
\ No newline at end of file
diff --git a/docs/source/en/tasks/language_modeling.mdx b/docs/source/en/tasks/language_modeling.mdx
index 82708f2f899a..eaf8fdc947f1 100644
--- a/docs/source/en/tasks/language_modeling.mdx
+++ b/docs/source/en/tasks/language_modeling.mdx
@@ -12,29 +12,44 @@ specific language governing permissions and limitations under the License.
# Language modeling
-Language modeling predicts words in a sentence. There are two forms of language modeling.
+Language modeling predicts words in a sentence, making these types of models great at generating text. You can use them for creative applications like a choose-your-own text adventure or for an intelligent coding assistant like Copilot or CodeParrot. There are two types of language modeling, causal and masked.
-Causal language modeling predicts the next token in a sequence of tokens, and the model can only attend to tokens on the left.
+Causal language modeling predicts the next token in a sequence of tokens, and the model can only attend to tokens on the left. This means the model cannot see future tokens. GPT-2 is an example of a causal language model.
-Masked language modeling predicts a masked token in a sequence, and the model can attend to tokens bidirectionally.
+Masked language modeling predicts a masked token in a sequence, and the model can attend to tokens bidirectionally. This means the model has full access to the tokens on the left and right. BERT is an example of a masked language model.
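+
+If you'd like a quick feel for the difference before training anything, you can try both flavors with the off-the-shelf checkpoints used in this guide (this is only an illustration, and the outputs will vary from run to run):
+
+```py
+>>> from transformers import pipeline
+
+>>> generator = pipeline("text-generation", model="distilgpt2")
+>>> generator("Somatic hypermutation allows the immune system to")  # causal: continues the prompt to the right
+
+>>> fill_mask = pipeline("fill-mask", model="distilroberta-base")
+>>> fill_mask("The capital of France is <mask>.")  # masked: fills in the blank using context on both sides
+```
+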
-This guide will show you how to fine-tune [DistilGPT2](https://huggingface.co/distilgpt2) for causal language modeling and [DistilRoBERTa](https://huggingface.co/distilroberta-base) for masked language modeling on the [r/askscience](https://www.reddit.com/r/askscience/) subset of the [ELI5](https://huggingface.co/datasets/eli5) dataset.
+This guide will show you how to:
-
+1. Finetune [DistilGPT2](https://huggingface.co/distilgpt2) for causal language modeling and [DistilRoBERTa](https://huggingface.co/distilroberta-base) for masked language modeling on the [r/askscience](https://www.reddit.com/r/askscience/) subset of the [ELI5](https://huggingface.co/datasets/eli5) dataset.
+2. Use your finetuned model for inference.
-You can fine-tune other architectures for language modeling such as [GPT-Neo](https://huggingface.co/EleutherAI/gpt-neo-125M), [GPT-J](https://huggingface.co/EleutherAI/gpt-j-6B), and [BERT](https://huggingface.co/bert-base-uncased), following the same steps presented in this guide!
+
-See the text generation [task page](https://huggingface.co/tasks/text-generation) and fill mask [task page](https://huggingface.co/tasks/fill-mask) for more information about their associated models, datasets, and metrics.
+You can finetune other architectures for language modeling such as [GPT-Neo](https://huggingface.co/EleutherAI/gpt-neo-125M), [GPT-J](https://huggingface.co/EleutherAI/gpt-j-6B), and [BERT](https://huggingface.co/bert-base-uncased), following the same steps in this guide! See the text generation [task page](https://huggingface.co/tasks/text-generation) and fill mask [task page](https://huggingface.co/tasks/fill-mask) for more information about their associated models, datasets, and metrics.
+Before you begin, make sure you have all the necessary libraries installed:
+
+```bash
+pip install transformers datasets evaluate
+```
+
+We encourage you to log in to your Hugging Face account so you can upload and share your model with the community. When prompted, enter your token to log in:
+
+```py
+>>> from huggingface_hub import notebook_login
+
+>>> notebook_login()
+```
+
## Load ELI5 dataset
-Load only the first 5000 rows of the ELI5 dataset from the 🤗 Datasets library since it is pretty large:
+Start by loading a smaller subset of the r/askscience portion of the ELI5 dataset from the 🤗 Datasets library. This'll give you a chance to experiment and make sure everything works before spending more time training on the full dataset.
```py
>>> from datasets import load_dataset
@@ -42,10 +57,10 @@ Load only the first 5000 rows of the ELI5 dataset from the 🤗 Datasets library
>>> eli5 = load_dataset("eli5", split="train_asks[:5000]")
```
-Split this dataset into a train and test set:
+Split the dataset's `train_asks` split into a train and test set with the [`~datasets.Dataset.train_test_split`] method:
```py
-eli5 = eli5.train_test_split(test_size=0.2)
+>>> eli5 = eli5.train_test_split(test_size=0.2)
```
Then take a look at an example:
@@ -66,13 +81,13 @@ Then take a look at an example:
'title_urls': {'url': []}}
```
-Notice `text` is a subfield nested inside the `answers` dictionary. When you preprocess the dataset, you will need to extract the `text` subfield into a separate column.
+While this may look like a lot, you're only really interested in the `text` field. What's cool about language modeling tasks is you don't need labels (also known as an unsupervised task) because the next word *is* the label.
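+
+For example, you can peek at just that field (the output is omitted here because the answers are long):
+
+```py
+>>> eli5["train"][0]["answers"]["text"]
+```
+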
## Preprocess
-For causal language modeling, load the DistilGPT2 tokenizer to process the `text` subfield:
+For causal language modeling, the next step is to load a DistilGPT2 tokenizer to process the `text` subfield:
```py
>>> from transformers import AutoTokenizer
@@ -82,7 +97,7 @@ For causal language modeling, load the DistilGPT2 tokenizer to process the `text
-For masked language modeling, load the DistilRoBERTa tokenizer instead:
+For masked language modeling, the next step is to load a DistilRoBERTa tokenizer to process the `text` subfield:
```py
>>> from transformers import AutoTokenizer
@@ -90,7 +105,7 @@ For masked language modeling, load the DistilRoBERTa tokenizer instead:
>>> tokenizer = AutoTokenizer.from_pretrained("distilroberta-base")
```
-Extract the `text` subfield from its nested structure with the [`flatten`](https://huggingface.co/docs/datasets/process.html#flatten) method:
+You'll notice from the example above, the `text` field is actually nested inside `answers`. This means you'll need to extract the `text` subfield from its nested structure with the [`flatten`](https://huggingface.co/docs/datasets/process.html#flatten) method:
```py
>>> eli5 = eli5.flatten()
@@ -109,16 +124,16 @@ Extract the `text` subfield from its nested structure with the [`flatten`](https
'title_urls.url': []}
```
-Each subfield is now a separate column as indicated by the `answers` prefix. Notice that `answers.text` is a list. Instead of tokenizing each sentence separately, convert the list to a string to jointly tokenize them.
+Each subfield is now a separate column as indicated by the `answers` prefix, and the `text` field is a list now. Instead of tokenizing each sentence separately, convert the list to a string so you can jointly tokenize them.
-Here is how you can create a preprocessing function to convert the list to a string and truncate sequences to be no longer than DistilGPT2's maximum input length:
+Here is how you can create a preprocessing function to convert the list to a string, and truncate sequences to be no longer than DistilGPT2's maximum input length:
```py
>>> def preprocess_function(examples):
... return tokenizer([" ".join(x) for x in examples["answers.text"]], truncation=True)
```
-Use 🤗 Datasets [`~datasets.Dataset.map`] function to apply the preprocessing function over the entire dataset. You can speed up the `map` function by setting `batched=True` to process multiple elements of the dataset at once and increasing the number of processes with `num_proc`. Remove the columns you don't need:
+To apply the preprocessing function over the entire dataset, use the 🤗 Datasets [`~datasets.Dataset.map`] function. You can speed up `map` by setting `batched=True` to process multiple elements of the dataset at once, and increasing the number of processes with `num_proc`. Remove any columns you don't need:
```py
>>> tokenized_eli5 = eli5.map(
@@ -129,7 +144,7 @@ Use 🤗 Datasets [`~datasets.Dataset.map`] function to apply the preprocessing
... )
```
-Now you need a second preprocessing function to capture text truncated from any lengthy examples to prevent loss of information. This preprocessing function should:
+Now you'll need a second preprocessing function to capture text truncated from the lengthier examples to avoid losing any information. This preprocessing function, sketched after the list below, should:
- Concatenate all the text.
- Split the concatenated text into smaller chunks defined by `block_size`.
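+
+A minimal sketch of such a `group_texts` function is shown below. It assumes a `block_size` of 128 and batched inputs from the tokenizer; your own implementation may differ in the details:
+
+```py
+>>> block_size = 128
+
+
+>>> def group_texts(examples):
+...     # concatenate all the tokenized sequences in the batch
+...     concatenated_examples = {k: sum(examples[k], []) for k in examples.keys()}
+...     total_length = len(concatenated_examples[list(examples.keys())[0]])
+...     # drop the small remainder so every chunk has exactly block_size tokens
+...     total_length = (total_length // block_size) * block_size
+...     # split the concatenated sequences into chunks of block_size
+...     result = {
+...         k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
+...         for k, t in concatenated_examples.items()
+...     }
+...     # copy input_ids to labels; the collator and model handle the shifting for causal LM
+...     result["labels"] = result["input_ids"].copy()
+...     return result
+```
+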
@@ -156,11 +171,11 @@ Apply the `group_texts` function over the entire dataset:
>>> lm_dataset = tokenized_eli5.map(group_texts, batched=True, num_proc=4)
```
-For causal language modeling, use [`DataCollatorForLanguageModeling`] to create a batch of examples. It will also *dynamically pad* your text to the length of the longest element in its batch, so they are a uniform length. While it is possible to pad your text in the `tokenizer` function by setting `padding=True`, dynamic padding is more efficient.
+Now create a batch of examples using [`DataCollatorForLanguageModeling`]. It's more efficient to *dynamically pad* the sentences to the longest length in a batch during collation, instead of padding the whole dataset to the maximum length.
-You can use the end of sequence token as the padding token, and set `mlm=False`. This will use the inputs as labels shifted to the right by one element:
+For causal language modeling, use the end-of-sequence token as the padding token and set `mlm=False`. This will use the inputs as labels shifted to the right by one element:
```py
>>> from transformers import DataCollatorForLanguageModeling
@@ -169,7 +184,7 @@ You can use the end of sequence token as the padding token, and set `mlm=False`.
>>> data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
```
-For masked language modeling, use the same [`DataCollatorForLanguageModeling`] except you should specify `mlm_probability` to randomly mask tokens each time you iterate over the data.
+For masked language modeling, use the end-of-sequence token as the padding token and specify `mlm_probability` to randomly mask tokens each time you iterate over the data:
```py
>>> from transformers import DataCollatorForLanguageModeling
@@ -179,7 +194,7 @@ For masked language modeling, use the same [`DataCollatorForLanguageModeling`] e
```
-You can use the end of sequence token as the padding token, and set `mlm=False`. This will use the inputs as labels shifted to the right by one element:
+For causal language modeling, use the end-of-sequence token as the padding token and set `mlm=False`. This will use the inputs as labels shifted to the right by one element:
```py
>>> from transformers import DataCollatorForLanguageModeling
@@ -187,25 +202,30 @@ You can use the end of sequence token as the padding token, and set `mlm=False`.
>>> data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False, return_tensors="tf")
```
-For masked language modeling, use the same [`DataCollatorForLanguageModeling`] except you should specify `mlm_probability` to randomly mask tokens each time you iterate over the data.
+For masked language modeling, use the end-of-sequence token as the padding token and specify `mlm_probability` to randomly mask tokens each time you iterate over the data:
```py
>>> from transformers import DataCollatorForLanguageModeling
->>> data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False, return_tensors="tf")
+>>> data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15, return_tensors="tf")
```
## Causal language modeling
-Causal language modeling is frequently used for text generation. This section shows you how to fine-tune [DistilGPT2](https://huggingface.co/distilgpt2) to generate new text.
+Causal language models are frequently used for text generation. This section shows you how to finetune [DistilGPT2](https://huggingface.co/distilgpt2) to generate new text.
### Train
-Load DistilGPT2 with [`AutoModelForCausalLM`]:
+<Tip>
+
+If you aren't familiar with finetuning a model with the [`Trainer`], take a look at the basic tutorial [here](../training#train-with-pytorch-trainer)!
+
+</Tip>
+
+You're ready to start training your model now! Load DistilGPT2 with [`AutoModelForCausalLM`]:
```py
>>> from transformers import AutoModelForCausalLM, TrainingArguments, Trainer
@@ -213,24 +233,19 @@ Load DistilGPT2 with [`AutoModelForCausalLM`]:
>>> model = AutoModelForCausalLM.from_pretrained("distilgpt2")
```
-
-
-If you aren't familiar with fine-tuning a model with the [`Trainer`], take a look at the basic tutorial [here](../training#finetune-with-trainer)!
-
-
-
At this point, only three steps remain:
-1. Define your training hyperparameters in [`TrainingArguments`].
+1. Define your training hyperparameters in [`TrainingArguments`]. The only required parameter is `output_dir` which specifies where to save your model. You'll push this model to the Hub by setting `push_to_hub=True` (you need to be signed in to Hugging Face to upload your model).
2. Pass the training arguments to [`Trainer`] along with the model, datasets, and data collator.
-3. Call [`~Trainer.train`] to fine-tune your model.
+3. Call [`~Trainer.train`] to finetune your model.
```py
>>> training_args = TrainingArguments(
-... output_dir="./results",
+... output_dir="my_awesome_eli5_clm-model",
... evaluation_strategy="epoch",
... learning_rate=2e-5,
... weight_decay=0.01,
+... push_to_hub=True,
... )
>>> trainer = Trainer(
@@ -243,9 +258,46 @@ At this point, only three steps remain:
>>> trainer.train()
```
+
+Once training is completed, use the [`~transformers.Trainer.evaluate`] method to evaluate your model and get its perplexity:
+
+```py
+>>> import math
+
+>>> eval_results = trainer.evaluate()
+>>> print(f"Perplexity: {math.exp(eval_results['eval_loss']):.2f}")
+Perplexity: 49.61
+```
+
+Then share your model to the Hub with the [`~transformers.Trainer.push_to_hub`] method so everyone can use your model:
+
+```py
+>>> trainer.push_to_hub()
+```
-To fine-tune a model in TensorFlow, start by converting your datasets to the `tf.data.Dataset` format with [`~TFPreTrainedModel.prepare_tf_dataset`].
+
+
+If you aren't familiar with finetuning a model with Keras, take a look at the basic tutorial [here](../training#train-a-tensorflow-model-with-keras)!
+
+
+To finetune a model in TensorFlow, start by setting up an optimizer function, learning rate schedule, and some training hyperparameters:
+
+```py
+>>> from transformers import create_optimizer, AdamWeightDecay
+
+>>> optimizer = AdamWeightDecay(learning_rate=2e-5, weight_decay_rate=0.01)
+```
+
+Then you can load DistilGPT2 with [`TFAutoModelForCausalLM`]:
+
+```py
+>>> from transformers import TFAutoModelForCausalLM
+
+>>> model = TFAutoModelForCausalLM.from_pretrained("distilgpt2")
+```
+
+Convert your datasets to the `tf.data.Dataset` format with [`~transformers.TFPreTrainedModel.prepare_tf_dataset`]:
```py
>>> tf_train_set = model.prepare_tf_dataset(
@@ -263,53 +315,132 @@ To fine-tune a model in TensorFlow, start by converting your datasets to the `tf
... )
```
+Configure the model for training with [`compile`](https://keras.io/api/models/model_training_apis/#compile-method):
+
+```py
+>>> import tensorflow as tf
+
+>>> model.compile(optimizer=optimizer)
+```
+
+The last thing to set up before you start training is to provide a way to push your model to the Hub. This can be done by specifying where to push your model and tokenizer in the [`~transformers.PushToHubCallback`]:
+
+```py
+>>> from transformers.keras_callbacks import PushToHubCallback
+
+>>> callback = PushToHubCallback(
+... output_dir="my_awesome_eli5_clm-model",
+... tokenizer=tokenizer,
+... )
+```
+
+Finally, you're ready to start training your model! Call [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) with your training and validation datasets, the number of epochs, and your callback to finetune the model:
+
+```py
+>>> model.fit(x=tf_train_set, validation_data=tf_test_set, epochs=3, callbacks=[callback])
+```
+
+Once training is completed, your model is automatically uploaded to the Hub so everyone can use it!
+
+
+
-If you aren't familiar with fine-tuning a model with Keras, take a look at the basic tutorial [here](training#finetune-with-keras)!
+For a more in-depth example of how to finetune a model for causal language modeling, take a look at the corresponding
+[PyTorch notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/language_modeling.ipynb)
+or [TensorFlow notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/language_modeling-tf.ipynb).
-Set up an optimizer function, learning rate, and some training hyperparameters:
+### Inference
+
+Great, now that you've finetuned a model, you can use it for inference!
+
+Come up with a prompt you'd like to generate text from:
```py
->>> from transformers import create_optimizer, AdamWeightDecay
+>>> prompt = "Somatic hypermutation allows the immune system to"
+```
->>> optimizer = AdamWeightDecay(learning_rate=2e-5, weight_decay_rate=0.01)
+The simplest way to try out your finetuned model for inference is to use it in a [`pipeline`]. Instantiate a `pipeline` for text generation with your model, and pass your text to it:
+
+```py
+>>> from transformers import pipeline
+
+>>> generator = pipeline("text-generation", model="my_awesome_eli5_clm-model")
+>>> generator(prompt)
+[{'generated_text': "Somatic hypermutation allows the immune system to be able to effectively reverse the damage caused by an infection.\n\n\nThe damage caused by an infection is caused by the immune system's ability to perform its own self-correcting tasks."}]
```
-Load DistilGPT2 with [`TFAutoModelForCausalLM`]:
+
+
+Tokenize the text and return the `input_ids` as PyTorch tensors:
```py
->>> from transformers import TFAutoModelForCausalLM
+>>> from transformers import AutoTokenizer
->>> model = TFAutoModelForCausalLM.from_pretrained("distilgpt2")
+>>> tokenizer = AutoTokenizer.from_pretrained("my_awesome_eli5_clm-model")
+>>> inputs = tokenizer(prompt, return_tensors="pt").input_ids
```
-Configure the model for training with [`compile`](https://keras.io/api/models/model_training_apis/#compile-method):
+Use the [`~transformers.generation_utils.GenerationMixin.generate`] method to generate text. For more details about the different text generation strategies and parameters for controlling generation, check out the [Text Generation](./main_classes/text_generation) API.
```py
->>> import tensorflow as tf
+>>> from transformers import AutoModelForCausalLM
->>> model.compile(optimizer=optimizer)
+>>> model = AutoModelForCausalLM.from_pretrained("my_awesome_eli5_clm-model")
+>>> outputs = model.generate(inputs, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
+```
+
+Decode the generated token ids back into text:
+
+```py
+>>> tokenizer.batch_decode(outputs, skip_special_tokens=True)
+["Somatic hypermutation allows the immune system to react to drugs with the ability to adapt to a different environmental situation. In other words, a system of 'hypermutation' can help the immune system to adapt to a different environmental situation or in some cases even a single life. In contrast, researchers at the University of Massachusetts-Boston have found that 'hypermutation' is much stronger in mice than in humans but can be found in humans, and that it's not completely unknown to the immune system. A study on how the immune system"]
+```
+
+
+Tokenize the text and return the `input_ids` as TensorFlow tensors:
+
+```py
+>>> from transformers import AutoTokenizer
+
+>>> tokenizer = AutoTokenizer.from_pretrained("my_awesome_eli5_clm-model")
+>>> inputs = tokenizer(prompt, return_tensors="tf").input_ids
+```
+
+Use the [`~transformers.generation_tf_utils.TFGenerationMixin.generate`] method to generate text. For more details about the different text generation strategies and parameters for controlling generation, check out the [Text Generation](./main_classes/text_generation) API.
+
+```py
+>>> from transformers import TFAutoModelForCausalLM
+
+>>> model = TFAutoModelForCausalLM.from_pretrained("my_awesome_eli5_clm-model")
+>>> outputs = model.generate(input_ids=inputs, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
```
-Call [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) to fine-tune the model:
+Decode the generated token ids back into text:
```py
->>> model.fit(x=tf_train_set, validation_data=tf_test_set, epochs=3)
+>>> tokenizer.batch_decode(outputs, skip_special_tokens=True)
+['Somatic hypermutation allows the immune system to detect the presence of other viruses as they become more prevalent. Therefore, researchers have identified a high proportion of human viruses. The proportion of virus-associated viruses in our study increases with age. Therefore, we propose a simple algorithm to detect the presence of these new viruses in our samples as a sign of improved immunity. A first study based on this algorithm, which will be published in Science on Friday, aims to show that this finding could translate into the development of a better vaccine that is more effective for']
```
## Masked language modeling
-Masked language modeling is also known as a fill-mask task because it predicts a masked token in a sequence. Models for masked language modeling require a good contextual understanding of an entire sequence instead of only the left context. This section shows you how to fine-tune [DistilRoBERTa](https://huggingface.co/distilroberta-base) to predict a masked word.
+Masked language modeling is good for tasks that require contextual understanding of an entire sequence, since the model can attend to tokens on both sides of the masked token. This section shows you how to finetune [DistilRoBERTa](https://huggingface.co/distilroberta-base) to predict a masked word.
### Train
-Load DistilRoBERTa with [`AutoModelForMaskedlM`]:
+
+
+If you aren't familiar with finetuning a model with the [`Trainer`], take a look at the basic tutorial [here](../training#train-with-pytorch-trainer)!
+
+
+You're ready to start training your model now! Load DistilRoBERTa with [`AutoModelForMaskedLM`]:
```py
>>> from transformers import AutoModelForMaskedLM
@@ -317,25 +448,20 @@ Load DistilRoBERTa with [`AutoModelForMaskedlM`]:
>>> model = AutoModelForMaskedLM.from_pretrained("distilroberta-base")
```
-
-
-If you aren't familiar with fine-tuning a model with the [`Trainer`], take a look at the basic tutorial [here](../training#finetune-with-trainer)!
-
-
-
At this point, only three steps remain:
-1. Define your training hyperparameters in [`TrainingArguments`].
+1. Define your training hyperparameters in [`TrainingArguments`]. The only required parameter is `output_dir` which specifies where to save your model. You'll push this model to the Hub by setting `push_to_hub=True` (you need to be signed in to Hugging Face to upload your model).
2. Pass the training arguments to [`Trainer`] along with the model, datasets, and data collator.
-3. Call [`~Trainer.train`] to fine-tune your model.
+3. Call [`~Trainer.train`] to finetune your model.
```py
>>> training_args = TrainingArguments(
-... output_dir="./results",
+... output_dir="my_awesome_eli5_mlm_model",
... evaluation_strategy="epoch",
... learning_rate=2e-5,
... num_train_epochs=3,
... weight_decay=0.01,
+... push_to_hub=True,
... )
>>> trainer = Trainer(
@@ -348,9 +474,46 @@ At this point, only three steps remain:
>>> trainer.train()
```
+
+Once training is completed, use the [`~transformers.Trainer.evaluate`] method to evaluate your model and get its perplexity:
+
+```py
+>>> import math
+
+>>> eval_results = trainer.evaluate()
+>>> print(f"Perplexity: {math.exp(eval_results['eval_loss']):.2f}")
+Perplexity: 8.76
+```
+
+Then share your model to the Hub with the [`~transformers.Trainer.push_to_hub`] method so everyone can use your model:
+
+```py
+>>> trainer.push_to_hub()
+```
-To fine-tune a model in TensorFlow, start by converting your datasets to the `tf.data.Dataset` format with [`~TFPreTrainedModel.prepare_tf_dataset`].
+
+
+If you aren't familiar with finetuning a model with Keras, take a look at the basic tutorial [here](../training#train-a-tensorflow-model-with-keras)!
+
+
+To finetune a model in TensorFlow, start by setting up an optimizer function, learning rate schedule, and some training hyperparameters:
+
+```py
+>>> from transformers import create_optimizer, AdamWeightDecay
+
+>>> optimizer = AdamWeightDecay(learning_rate=2e-5, weight_decay_rate=0.01)
+```
+
+Then you can load DistilRoBERTa with [`TFAutoModelForMaskedLM`]:
+
+```py
+>>> from transformers import TFAutoModelForMaskedLM
+
+>>> model = TFAutoModelForMaskedLM.from_pretrained("distilroberta-base")
+```
+
+Convert your datasets to the `tf.data.Dataset` format with [`~transformers.TFPreTrainedModel.prepare_tf_dataset`]:
```py
>>> tf_train_set = model.prepare_tf_dataset(
@@ -368,48 +531,139 @@ To fine-tune a model in TensorFlow, start by converting your datasets to the `tf
... )
```
+Configure the model for training with [`compile`](https://keras.io/api/models/model_training_apis/#compile-method):
+
+```py
+>>> import tensorflow as tf
+
+>>> model.compile(optimizer=optimizer)
+```
+
+The last thing to set up before you start training is to provide a way to push your model to the Hub. This can be done by specifying where to push your model and tokenizer in the [`~transformers.PushToHubCallback`]:
+
+```py
+>>> from transformers.keras_callbacks import PushToHubCallback
+
+>>> callback = PushToHubCallback(
+... output_dir="my_awesome_eli5_mlm_model",
+... tokenizer=tokenizer,
+... )
+```
+
+Finally, you're ready to start training your model! Call [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) with your training and validation datasets, the number of epochs, and your callback to finetune the model:
+
+```py
+>>> model.fit(x=tf_train_set, validation_data=tf_test_set, epochs=3, callbacks=[callback])
+```
+
+Once training is completed, your model is automatically uploaded to the Hub so everyone can use it!
+
+
+
-If you aren't familiar with fine-tuning a model with Keras, take a look at the basic tutorial [here](training#finetune-with-keras)!
+For a more in-depth example of how to finetune a model for masked language modeling, take a look at the corresponding
+[PyTorch notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/language_modeling.ipynb)
+or [TensorFlow notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/language_modeling-tf.ipynb).
-Set up an optimizer function, learning rate, and some training hyperparameters:
+### Inference
+
+Great, now that you've finetuned a model, you can use it for inference!
+
+Come up with some text you'd like the model to fill in the blank with, and use the special `<mask>` token to indicate the blank:
```py
->>> from transformers import create_optimizer, AdamWeightDecay
+>>> text = "The Milky Way is a <mask> galaxy."
+```
->>> optimizer = AdamWeightDecay(learning_rate=2e-5, weight_decay_rate=0.01)
+The simplest way to try out your finetuned model for inference is to use it in a [`pipeline`]. Instantiate a `pipeline` for fill-mask with your model, and pass your text to it. If you like, you can use the `top_k` parameter to specify how many predictions to return:
+
+```py
+>>> from transformers import pipeline
+
+>>> mask_filler = pipeline("fill-mask", "stevhliu/my_awesome_eli5_mlm_model")
+>>> mask_filler(text, top_k=3)
+[{'score': 0.5150994658470154,
+ 'token': 21300,
+ 'token_str': ' spiral',
+ 'sequence': 'The Milky Way is a spiral galaxy.'},
+ {'score': 0.07087188959121704,
+ 'token': 2232,
+ 'token_str': ' massive',
+ 'sequence': 'The Milky Way is a massive galaxy.'},
+ {'score': 0.06434620916843414,
+ 'token': 650,
+ 'token_str': ' small',
+ 'sequence': 'The Milky Way is a small galaxy.'}]
```
-Load DistilRoBERTa with [`TFAutoModelForMaskedLM`]:
+
+
+Tokenize the text and return the `input_ids` as PyTorch tensors. You'll also need to specify the position of the `<mask>` token:
```py
->>> from transformers import TFAutoModelForMaskedLM
+>>> import torch
+>>> from transformers import AutoTokenizer
->>> model = TFAutoModelForCausalLM.from_pretrained("distilroberta-base")
+>>> tokenizer = AutoTokenizer.from_pretrained("my_awesome_eli5_mlm_model")
+>>> inputs = tokenizer(text, return_tensors="pt")
+>>> mask_token_index = torch.where(inputs["input_ids"] == tokenizer.mask_token_id)[1]
```
-Configure the model for training with [`compile`](https://keras.io/api/models/model_training_apis/#compile-method):
+Pass your inputs to the model and return the `logits` of the masked token:
```py
->>> import tensorflow as tf
+>>> from transformers import AutoModelForMaskedLM
->>> model.compile(optimizer=optimizer)
+>>> model = AutoModelForMaskedLM.from_pretrained("stevhliu/my_awesome_eli5_mlm_model")
+>>> logits = model(**inputs).logits
+>>> mask_token_logits = logits[0, mask_token_index, :]
```
-Call [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) to fine-tune the model:
+Then return the three masked tokens with the highest probability and print them out:
```py
->>> model.fit(x=tf_train_set, validation_data=tf_test_set, epochs=3)
+>>> top_3_tokens = torch.topk(mask_token_logits, 3, dim=1).indices[0].tolist()
+
+>>> for token in top_3_tokens:
+... print(text.replace(tokenizer.mask_token, tokenizer.decode([token])))
+The Milky Way is a spiral galaxy.
+The Milky Way is a massive galaxy.
+The Milky Way is a small galaxy.
```
-
-
+
+
+Tokenize the text and return the `input_ids` as TensorFlow tensors. You'll also need to specify the position of the `<mask>` token:
-
+```py
+>>> from transformers import AutoTokenizer
-For a more in-depth example of how to fine-tune a model for causal language modeling, take a look at the corresponding
-[PyTorch notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/language_modeling.ipynb)
-or [TensorFlow notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/language_modeling-tf.ipynb).
+>>> tokenizer = AutoTokenizer.from_pretrained("my_awesome_eli5_mlm_model")
+>>> inputs = tokenizer(text, return_tensors="tf")
+>>> mask_token_index = tf.where(inputs["input_ids"] == tokenizer.mask_token_id)[0, 1]
+```
+
+Pass your inputs to the model and return the `logits` of the masked token:
+
+```py
+>>> from transformers import TFAutoModelForMaskedLM
+
+>>> model = TFAutoModelForMaskedLM.from_pretrained("stevhliu/my_awesome_eli5_mlm_model")
+>>> logits = model(**inputs).logits
+>>> mask_token_logits = logits[0, mask_token_index, :]
+```
+
+Then return the three masked tokens with the highest probability and print them out:
+
+```py
+>>> top_3_tokens = tf.math.top_k(mask_token_logits, 3).indices.numpy()
-
\ No newline at end of file
+>>> for token in top_3_tokens:
+... print(text.replace(tokenizer.mask_token, tokenizer.decode([token])))
+The Milky Way is a spiral galaxy.
+The Milky Way is a massive galaxy.
+The Milky Way is a small galaxy.
+```
+
+
\ No newline at end of file
diff --git a/docs/source/en/tasks/multiple_choice.mdx b/docs/source/en/tasks/multiple_choice.mdx
index 6ee0d7137f9c..1a1a517df7da 100644
--- a/docs/source/en/tasks/multiple_choice.mdx
+++ b/docs/source/en/tasks/multiple_choice.mdx
@@ -12,13 +12,30 @@ specific language governing permissions and limitations under the License.
# Multiple choice
-A multiple choice task is similar to question answering, except several candidate answers are provided along with a context. The model is trained to select the correct answer from multiple inputs given a context.
+A multiple choice task is similar to question answering, except several candidate answers are provided along with a context and the model is trained to select the correct answer.
-This guide will show you how to fine-tune [BERT](https://huggingface.co/bert-base-uncased) on the `regular` configuration of the [SWAG](https://huggingface.co/datasets/swag) dataset to select the best answer given multiple options and some context.
+This guide will show you how to:
+
+1. Finetune [BERT](https://huggingface.co/bert-base-uncased) on the `regular` configuration of the [SWAG](https://huggingface.co/datasets/swag) dataset to select the best answer given multiple options and some context.
+2. Use your finetuned model for inference.
+
+Before you begin, make sure you have all the necessary libraries installed:
+
+```bash
+pip install transformers datasets evaluate
+```
+
+We encourage you to login to your Hugging Face account so you can upload and share your model with the community. When prompted, enter your token to login:
+
+```py
+>>> from huggingface_hub import notebook_login
+
+>>> notebook_login()
+```
## Load SWAG dataset
-Load the SWAG dataset from the 🤗 Datasets library:
+Start by loading the `regular` configuration of the SWAG dataset from the 🤗 Datasets library:
```py
>>> from datasets import load_dataset
@@ -43,11 +60,15 @@ Then take a look at an example:
'video-id': 'anetv_jkn6uvmqwh4'}
```
-The `sent1` and `sent2` fields show how a sentence begins, and each `ending` field shows how a sentence could end. Given the sentence beginning, the model must pick the correct sentence ending as indicated by the `label` field.
+While it looks like there are a lot of fields here, it is actually pretty straightforward:
+
+- `sent1` and `sent2`: these fields show how a sentence starts, and if you put the two together, you get the `startphrase` field.
+- `ending`: suggests a possible ending for the sentence, but only one of them is correct.
+- `label`: identifies the correct sentence ending.
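+
+To make this concrete, here is a small sketch (assuming the `swag` dataset loaded above) of how the four full candidate sequences for a single example are assembled, and how `label` picks the correct one:
+
+```py
+>>> example = swag["train"][0]
+>>> startphrase = example["sent1"] + " " + example["sent2"]
+>>> endings = [example["ending0"], example["ending1"], example["ending2"], example["ending3"]]
+>>> candidates = [startphrase + " " + ending for ending in endings]
+>>> correct = candidates[example["label"]]
+```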
## Preprocess
-Load the BERT tokenizer to process the start of each sentence and the four possible endings:
+The next step is to load a BERT tokenizer to process the sentence starts and the four possible endings:
```py
>>> from transformers import AutoTokenizer
@@ -55,9 +76,9 @@ Load the BERT tokenizer to process the start of each sentence and the four possi
>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
```
-The preprocessing function needs to do:
+The preprocessing function you want to create needs to:
-1. Make four copies of the `sent1` field so you can combine each of them with `sent2` to recreate how a sentence starts.
+1. Make four copies of the `sent1` field and combine each of them with `sent2` to recreate how a sentence starts.
2. Combine `sent2` with each of the four possible sentence endings.
3. Flatten these two lists so you can tokenize them, and then unflatten them afterward so each example has a corresponding `input_ids`, `attention_mask`, and `labels` field.
@@ -79,15 +100,15 @@ The preprocessing function needs to do:
... return {k: [v[i : i + 4] for i in range(0, len(v), 4)] for k, v in tokenized_examples.items()}
```
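+
+Putting the three steps above together, a complete `preprocess_function` might look roughly like the following sketch (it assumes the BERT `tokenizer` loaded earlier and the standard SWAG column names `sent1`, `sent2`, and `ending0` through `ending3`):
+
+```py
+ending_names = ["ending0", "ending1", "ending2", "ending3"]
+
+
+def preprocess_function(examples):
+    # step 1: repeat each sentence start four times
+    first_sentences = [[context] * 4 for context in examples["sent1"]]
+    # step 2: pair `sent2` with each of the four candidate endings
+    question_headers = examples["sent2"]
+    second_sentences = [
+        [f"{header} {examples[end][i]}" for end in ending_names] for i, header in enumerate(question_headers)
+    ]
+    # step 3: flatten, tokenize, then regroup every four rows back into one example
+    first_sentences = sum(first_sentences, [])
+    second_sentences = sum(second_sentences, [])
+    tokenized_examples = tokenizer(first_sentences, second_sentences, truncation=True)
+    return {k: [v[i : i + 4] for i in range(0, len(v), 4)] for k, v in tokenized_examples.items()}
+```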
-Use 🤗 Datasets [`~datasets.Dataset.map`] function to apply the preprocessing function over the entire dataset. You can speed up the `map` function by setting `batched=True` to process multiple elements of the dataset at once:
+To apply the preprocessing function over the entire dataset, use 🤗 Datasets [`~datasets.Dataset.map`] method. You can speed up the `map` function by setting `batched=True` to process multiple elements of the dataset at once:
```py
tokenized_swag = swag.map(preprocess_function, batched=True)
```
-🤗 Transformers doesn't have a data collator for multiple choice, so you will need to create one. You can adapt the [`DataCollatorWithPadding`] to create a batch of examples for multiple choice. It will also *dynamically pad* your text and labels to the length of the longest element in its batch, so they are a uniform length. While it is possible to pad your text in the `tokenizer` function by setting `padding=True`, dynamic padding is more efficient.
+🤗 Transformers doesn't have a data collator for multiple choice, so you'll need to adapt the [`DataCollatorWithPadding`] to create a batch of examples. It's more efficient to *dynamically pad* the sentences to the longest length in a batch during collation, instead of padding the whole dataset to the maximum length.
-`DataCollatorForMultipleChoice` will flatten all the model inputs, apply padding, and then unflatten the results:
+`DataCollatorForMultipleChoice` flattens all the model inputs, applies padding, and then unflattens the results:
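+
+A PyTorch sketch of such a collator is shown below (the `tokenizer` and the per-choice layout produced by `preprocess_function` are assumed; the exact class used in this guide may differ slightly):
+
+```py
+from dataclasses import dataclass
+from typing import Optional, Union
+
+import torch
+from transformers.tokenization_utils_base import PaddingStrategy, PreTrainedTokenizerBase
+
+
+@dataclass
+class DataCollatorForMultipleChoice:
+    """Dynamically pads a batch of multiple choice examples."""
+
+    tokenizer: PreTrainedTokenizerBase
+    padding: Union[bool, str, PaddingStrategy] = True
+    max_length: Optional[int] = None
+
+    def __call__(self, features):
+        label_name = "label" if "label" in features[0] else "labels"
+        labels = [feature.pop(label_name) for feature in features]
+        batch_size = len(features)
+        num_choices = len(features[0]["input_ids"])
+        # flatten: one row per (example, choice) pair
+        flattened_features = [
+            [{k: v[i] for k, v in feature.items()} for i in range(num_choices)] for feature in features
+        ]
+        flattened_features = sum(flattened_features, [])
+        # pad every row to the longest sequence in the batch
+        batch = self.tokenizer.pad(
+            flattened_features, padding=self.padding, max_length=self.max_length, return_tensors="pt"
+        )
+        # unflatten back to (batch_size, num_choices, seq_len) and reattach the labels
+        batch = {k: v.view(batch_size, num_choices, -1) for k, v in batch.items()}
+        batch["labels"] = torch.tensor(labels, dtype=torch.int64)
+        return batch
+```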
@@ -176,39 +197,65 @@ tokenized_swag = swag.map(preprocess_function, batched=True)
-## Train
+## Evaluate
-
-
-Load BERT with [`AutoModelForMultipleChoice`]:
+Including a metric during training is often helpful for evaluating your model's performance. You can quickly load an evaluation method with the 🤗 [Evaluate](https://huggingface.co/docs/evaluate/index) library. For this task, load the [accuracy](https://huggingface.co/spaces/evaluate-metric/accuracy) metric (see the 🤗 Evaluate [quick tour](https://huggingface.co/docs/evaluate/a_quick_tour) to learn more about how to load and compute a metric):
```py
->>> from transformers import AutoModelForMultipleChoice, TrainingArguments, Trainer
+>>> import evaluate
->>> model = AutoModelForMultipleChoice.from_pretrained("bert-base-uncased")
+>>> accuracy = evaluate.load("accuracy")
```
+Then create a function that passes your predictions and labels to [`~evaluate.EvaluationModule.compute`] to calculate the accuracy:
+
+```py
+>>> import numpy as np
+
+
+>>> def compute_metrics(eval_pred):
+... predictions, labels = eval_pred
+... predictions = np.argmax(predictions, axis=1)
+... return accuracy.compute(predictions=predictions, references=labels)
+```
+
+Your `compute_metrics` function is ready to go now, and you'll return to it when you set up your training.
+
+## Train
+
+
+
-If you aren't familiar with fine-tuning a model with Trainer, take a look at the basic tutorial [here](../training#finetune-with-trainer)!
+If you aren't familiar with finetuning a model with the [`Trainer`], take a look at the basic tutorial [here](../training#train-with-pytorch-trainer)!
+You're ready to start training your model now! Load BERT with [`AutoModelForMultipleChoice`]:
+
+```py
+>>> from transformers import AutoModelForMultipleChoice, TrainingArguments, Trainer
+
+>>> model = AutoModelForMultipleChoice.from_pretrained("bert-base-uncased")
+```
At this point, only three steps remain:
-1. Define your training hyperparameters in [`TrainingArguments`].
-2. Pass the training arguments to [`Trainer`] along with the model, dataset, tokenizer, and data collator.
-3. Call [`~Trainer.train`] to fine-tune your model.
+1. Define your training hyperparameters in [`TrainingArguments`]. The only required parameter is `output_dir` which specifies where to save your model. You'll push this model to the Hub by setting `push_to_hub=True` (you need to be signed in to Hugging Face to upload your model). At the end of each epoch, the [`Trainer`] will evaluate the accuracy and save the training checkpoint.
+2. Pass the training arguments to [`Trainer`] along with the model, dataset, tokenizer, data collator, and `compute_metrics` function.
+3. Call [`~Trainer.train`] to finetune your model.
```py
>>> training_args = TrainingArguments(
-... output_dir="./results",
+... output_dir="my_awesome_swag_model",
... evaluation_strategy="epoch",
+... save_strategy="epoch",
+... load_best_model_at_end=True,
... learning_rate=5e-5,
... per_device_train_batch_size=16,
... per_device_eval_batch_size=16,
... num_train_epochs=3,
... weight_decay=0.01,
+... push_to_hub=True,
... )
>>> trainer = Trainer(
@@ -218,13 +265,44 @@ At this point, only three steps remain:
... eval_dataset=tokenized_swag["validation"],
... tokenizer=tokenizer,
... data_collator=DataCollatorForMultipleChoice(tokenizer=tokenizer),
+... compute_metrics=compute_metrics,
... )
>>> trainer.train()
```
+
+Once training is completed, share your model to the Hub with the [`~transformers.Trainer.push_to_hub`] method so everyone can use your model:
+
+```py
+>>> trainer.push_to_hub()
+```
-To fine-tune a model in TensorFlow, start by converting your datasets to the `tf.data.Dataset` format with [`~TFPreTrainedModel.prepare_tf_dataset`].
+
+
+If you aren't familiar with finetuning a model with Keras, take a look at the basic tutorial [here](../training#train-a-tensorflow-model-with-keras)!
+
+
+To finetune a model in TensorFlow, start by setting up an optimizer function, learning rate schedule, and some training hyperparameters:
+
+```py
+>>> from transformers import create_optimizer
+
+>>> batch_size = 16
+>>> num_train_epochs = 2
+>>> total_train_steps = (len(tokenized_swag["train"]) // batch_size) * num_train_epochs
+>>> optimizer, schedule = create_optimizer(init_lr=5e-5, num_warmup_steps=0, num_train_steps=total_train_steps)
+```
+
+Then you can load BERT with [`TFAutoModelForMultipleChoice`]:
+
+```py
+>>> from transformers import TFAutoModelForMultipleChoice
+
+>>> model = TFAutoModelForMultipleChoice.from_pretrained("bert-base-uncased")
+```
+
+Convert your datasets to the `tf.data.Dataset` format with [`~transformers.TFPreTrainedModel.prepare_tf_dataset`]:
```py
>>> data_collator = DataCollatorForMultipleChoice(tokenizer=tokenizer)
@@ -243,41 +321,127 @@ To fine-tune a model in TensorFlow, start by converting your datasets to the `tf
... )
```
+Configure the model for training with [`compile`](https://keras.io/api/models/model_training_apis/#compile-method):
+
+```py
+>>> model.compile(optimizer=optimizer)
+```
+
+The last two things to set up before you start training are computing the accuracy from the predictions and providing a way to push your model to the Hub. Both are done with [Keras callbacks](./main_classes/keras_callbacks).
+
+Pass your `compute_metrics` function to [`~transformers.KerasMetricCallback`]:
+
+```py
+>>> from transformers.keras_callbacks import KerasMetricCallback
+
+>>> metric_callback = KerasMetricCallback(metric_fn=compute_metrics, eval_dataset=tf_validation_set)
+```
+
+Specify where to push your model and tokenizer in the [`~transformers.PushToHubCallback`]:
+
+```py
+>>> from transformers.keras_callbacks import PushToHubCallback
+
+>>> push_to_hub_callback = PushToHubCallback(
+... output_dir="my_awesome_model",
+... tokenizer=tokenizer,
+... )
+```
+
+Then bundle your callbacks together:
+
+```py
+>>> callbacks = [metric_callback, push_to_hub_callback]
+```
+
+Finally, you're ready to start training your model! Call [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) with your training and validation datasets, the number of epochs, and your callbacks to finetune the model:
+
+```py
+>>> model.fit(x=tf_train_set, validation_data=tf_validation_set, epochs=2, callbacks=callbacks)
+```
+
+Once training is completed, your model is automatically uploaded to the Hub so everyone can use it!
+
+
+
+
-If you aren't familiar with fine-tuning a model with Keras, take a look at the basic tutorial [here](training#finetune-with-keras)!
+For a more in-depth example of how to finetune a model for multiple choice, take a look at the corresponding
+[PyTorch notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/multiple_choice.ipynb)
+or [TensorFlow notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/multiple_choice-tf.ipynb).
-Set up an optimizer function, learning rate schedule, and some training hyperparameters:
+## Inference
+
+Great, now that you've finetuned a model, you can use it for inference!
+
+Come up with some text and two candidate answers:
```py
->>> from transformers import create_optimizer
+>>> prompt = "France has a bread law, Le Décret Pain, with strict rules on what is allowed in a traditional baguette."
+>>> candidate1 = "The law does not apply to croissants and brioche."
+>>> candidate2 = "The law applies to baguettes."
+```
->>> batch_size = 16
->>> num_train_epochs = 2
->>> total_train_steps = (len(tokenized_swag["train"]) // batch_size) * num_train_epochs
->>> optimizer, schedule = create_optimizer(init_lr=5e-5, num_warmup_steps=0, num_train_steps=total_train_steps)
+
+
+Tokenize each prompt and candidate answer pair and return PyTorch tensors. You should also create some `labels`:
+
+```py
+>>> import torch
+>>> from transformers import AutoTokenizer
+
+>>> tokenizer = AutoTokenizer.from_pretrained("my_awesome_swag_model")
+>>> inputs = tokenizer([[prompt, candidate1], [prompt, candidate2]], return_tensors="pt", padding=True)
+>>> labels = torch.tensor(0).unsqueeze(0)
```
-Load BERT with [`TFAutoModelForMultipleChoice`]:
+Pass your inputs and labels to the model and return the `logits`:
```py
->>> from transformers import TFAutoModelForMultipleChoice
+>>> from transformers import AutoModelForMultipleChoice
->>> model = TFAutoModelForMultipleChoice.from_pretrained("bert-base-uncased")
+>>> model = AutoModelForMultipleChoice.from_pretrained("my_awesome_swag_model")
+>>> outputs = model(**{k: v.unsqueeze(0) for k, v in inputs.items()}, labels=labels)
+>>> logits = outputs.logits
```
-Configure the model for training with [`compile`](https://keras.io/api/models/model_training_apis/#compile-method):
+Get the class with the highest probability:
```py
->>> model.compile(optimizer=optimizer)
+>>> predicted_class = logits.argmax().item()
+>>> predicted_class
+0
+```
+
+
+Tokenize each prompt and candidate answer pair and return TensorFlow tensors:
+
+```py
+>>> from transformers import AutoTokenizer
+
+>>> tokenizer = AutoTokenizer.from_pretrained("my_awesome_swag_model")
+>>> inputs = tokenizer([[prompt, candidate1], [prompt, candidate2]], return_tensors="tf", padding=True)
+```
+
+Pass your inputs to the model and return the `logits`:
+
+```py
+>>> import tensorflow as tf
+>>> from transformers import TFAutoModelForMultipleChoice
+
+>>> model = TFAutoModelForMultipleChoice.from_pretrained("my_awesome_swag_model")
+>>> inputs = {k: tf.expand_dims(v, 0) for k, v in inputs.items()}
+>>> outputs = model(inputs)
+>>> logits = outputs.logits
```
-Call [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) to fine-tune the model:
+Get the class with the highest probability:
```py
->>> model.fit(x=tf_train_set, validation_data=tf_validation_set, epochs=2)
+>>> predicted_class = int(tf.math.argmax(logits, axis=-1)[0])
+>>> predicted_class
+0
```
\ No newline at end of file
diff --git a/docs/source/en/tasks/question_answering.mdx b/docs/source/en/tasks/question_answering.mdx
index 218fa7bb5520..deabef4f04f4 100644
--- a/docs/source/en/tasks/question_answering.mdx
+++ b/docs/source/en/tasks/question_answering.mdx
@@ -12,14 +12,19 @@ specific language governing permissions and limitations under the License.
# Question answering
+[[open-in-colab]]
+
-Question answering tasks return an answer given a question. There are two common forms of question answering:
+Question answering tasks return an answer given a question. If you've ever asked a virtual assistant like Alexa, Siri or Google what the weather is, then you've used a question answering model before. There are two common types of question answering tasks:
- Extractive: extract the answer from the given context.
- Abstractive: generate an answer from the context that correctly answers the question.
-This guide will show you how to fine-tune [DistilBERT](https://huggingface.co/distilbert-base-uncased) on the [SQuAD](https://huggingface.co/datasets/squad) dataset for extractive question answering.
+This guide will show you how to:
+
+1. Finetune [DistilBERT](https://huggingface.co/distilbert-base-uncased) on the [SQuAD](https://huggingface.co/datasets/squad) dataset for extractive question answering.
+2. Use your finetuned model for inference.
@@ -27,14 +32,34 @@ See the question answering [task page](https://huggingface.co/tasks/question-ans
+Before you begin, make sure you have all the necessary libraries installed:
+
+```bash
+pip install transformers datasets evaluate
+```
+
+We encourage you to login to your Hugging Face account so you can upload and share your model with the community. When prompted, enter your token to login:
+
+```py
+>>> from huggingface_hub import notebook_login
+
+>>> notebook_login()
+```
+
## Load SQuAD dataset
-Load the SQuAD dataset from the 🤗 Datasets library:
+Start by loading a smaller subset of the SQuAD dataset from the 🤗 Datasets library. This'll give you a chance to experiment and make sure everything works before spending more time training on the full dataset.
```py
>>> from datasets import load_dataset
->>> squad = load_dataset("squad")
+>>> squad = load_dataset("squad", split="train[:5000]")
+```
+
+Split the dataset's `train` split into a train and test set with the [`~datasets.Dataset.train_test_split`] method:
+
+```py
+>>> squad = squad.train_test_split(test_size=0.2)
```
Then take a look at an example:
@@ -49,13 +74,17 @@ Then take a look at an example:
}
```
-The `answers` field is a dictionary containing the starting position of the answer and the `text` of the answer.
+There are several important fields here:
+
+- `answers`: the starting location of the answer token and the answer text.
+- `context`: background information from which the model needs to extract the answer.
+- `question`: the question a model should answer.
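+
+As a quick check of how these fields relate (assuming the `squad` split loaded above), the answer text can be recovered by slicing `context` at `answer_start`:
+
+```py
+>>> example = squad["train"][0]
+>>> start = example["answers"]["answer_start"][0]
+>>> answer = example["answers"]["text"][0]
+>>> example["context"][start : start + len(answer)] == answer
+True
+```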
## Preprocess
-Load the DistilBERT tokenizer to process the `question` and `context` fields:
+The next step is to load a DistilBERT tokenizer to process the `question` and `context` fields:
```py
>>> from transformers import AutoTokenizer
@@ -63,15 +92,15 @@ Load the DistilBERT tokenizer to process the `question` and `context` fields:
>>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
```
-There are a few preprocessing steps particular to question answering that you should be aware of:
+There are a few preprocessing steps particular to question answering tasks you should be aware of:
-1. Some examples in a dataset may have a very long `context` that exceeds the maximum input length of the model. Truncate only the `context` by setting `truncation="only_second"`.
+1. Some examples in a dataset may have a very long `context` that exceeds the maximum input length of the model. To deal with longer sequences, truncate only the `context` by setting `truncation="only_second"`.
2. Next, map the start and end positions of the answer to the original `context` by setting
`return_offsets_mapping=True`.
-3. With the mapping in hand, you can find the start and end tokens of the answer. Use the [`sequence_ids`](https://huggingface.co/docs/tokenizers/python/latest/api/reference.html#tokenizers.Encoding.sequence_ids) method to
+3. With the mapping in hand, now you can find the start and end tokens of the answer. Use the [`~tokenizers.Encoding.sequence_ids`] method to
find which part of the offset corresponds to the `question` and which corresponds to the `context`.
-Here is how you can create a function to truncate and map the start and end tokens of the answer to the `context`:
+Here is how you can create a function to truncate and map the start and end tokens of the `answer` to the `context`:
```py
>>> def preprocess_function(examples):
@@ -126,13 +155,13 @@ Here is how you can create a function to truncate and map the start and end toke
... return inputs
```
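+
+Since only the signature and the final `return` are shown above, here is a rough sketch of what the complete function could look like; `max_length=384` and labeling answers that fall outside the truncated context as `(0, 0)` are common choices rather than requirements:
+
+```py
+def preprocess_function(examples):
+    questions = [q.strip() for q in examples["question"]]
+    inputs = tokenizer(
+        questions,
+        examples["context"],
+        max_length=384,
+        truncation="only_second",
+        return_offsets_mapping=True,
+        padding="max_length",
+    )
+
+    offset_mapping = inputs.pop("offset_mapping")
+    answers = examples["answers"]
+    start_positions = []
+    end_positions = []
+
+    for i, offset in enumerate(offset_mapping):
+        answer = answers[i]
+        start_char = answer["answer_start"][0]
+        end_char = start_char + len(answer["text"][0])
+        sequence_ids = inputs.sequence_ids(i)
+
+        # find where the context starts and ends in the tokenized sequence
+        idx = 0
+        while sequence_ids[idx] != 1:
+            idx += 1
+        context_start = idx
+        while sequence_ids[idx] == 1:
+            idx += 1
+        context_end = idx - 1
+
+        # if the answer is not fully inside the context, label it (0, 0)
+        if offset[context_start][0] > end_char or offset[context_end][1] < start_char:
+            start_positions.append(0)
+            end_positions.append(0)
+        else:
+            # otherwise locate the start and end token positions of the answer
+            idx = context_start
+            while idx <= context_end and offset[idx][0] <= start_char:
+                idx += 1
+            start_positions.append(idx - 1)
+
+            idx = context_end
+            while idx >= context_start and offset[idx][1] >= end_char:
+                idx -= 1
+            end_positions.append(idx + 1)
+
+    inputs["start_positions"] = start_positions
+    inputs["end_positions"] = end_positions
+    return inputs
+```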
-Use 🤗 Datasets [`~datasets.Dataset.map`] function to apply the preprocessing function over the entire dataset. You can speed up the `map` function by setting `batched=True` to process multiple elements of the dataset at once. Remove the columns you don't need:
+To apply the preprocessing function over the entire dataset, use 🤗 Datasets [`~datasets.Dataset.map`] function. You can speed up the `map` function by setting `batched=True` to process multiple elements of the dataset at once. Remove any columns you don't need:
```py
>>> tokenized_squad = squad.map(preprocess_function, batched=True, remove_columns=squad["train"].column_names)
```
-Use [`DefaultDataCollator`] to create a batch of examples. Unlike other data collators in 🤗 Transformers, the `DefaultDataCollator` does not apply additional preprocessing such as padding.
+Now create a batch of examples using [`DefaultDataCollator`]. Unlike other data collators in 🤗 Transformers, the [`DefaultDataCollator`] does not apply any additional preprocessing such as padding.
@@ -155,7 +184,12 @@ Use [`DefaultDataCollator`] to create a batch of examples. Unlike other data col
-Load DistilBERT with [`AutoModelForQuestionAnswering`]:
+
+
+If you aren't familiar with finetuning a model with the [`Trainer`], take a look at the basic tutorial [here](../training#train-with-pytorch-trainer)!
+
+
+You're ready to start training your model now! Load DistilBERT with [`AutoModelForQuestionAnswering`]:
```py
>>> from transformers import AutoModelForQuestionAnswering, TrainingArguments, Trainer
@@ -163,67 +197,49 @@ Load DistilBERT with [`AutoModelForQuestionAnswering`]:
>>> model = AutoModelForQuestionAnswering.from_pretrained("distilbert-base-uncased")
```
-
-
-If you aren't familiar with fine-tuning a model with the [`Trainer`], take a look at the basic tutorial [here](../training#finetune-with-trainer)!
-
-
-
At this point, only three steps remain:
-1. Define your training hyperparameters in [`TrainingArguments`].
+1. Define your training hyperparameters in [`TrainingArguments`]. The only required parameter is `output_dir` which specifies where to save your model. You'll push this model to the Hub by setting `push_to_hub=True` (you need to be signed in to Hugging Face to upload your model).
2. Pass the training arguments to [`Trainer`] along with the model, dataset, tokenizer, and data collator.
-3. Call [`~Trainer.train`] to fine-tune your model.
+3. Call [`~Trainer.train`] to finetune your model.
```py
>>> training_args = TrainingArguments(
-... output_dir="./results",
+... output_dir="my_awesome_qa_model",
... evaluation_strategy="epoch",
... learning_rate=2e-5,
... per_device_train_batch_size=16,
... per_device_eval_batch_size=16,
... num_train_epochs=3,
... weight_decay=0.01,
+... push_to_hub=True,
... )
>>> trainer = Trainer(
... model=model,
... args=training_args,
... train_dataset=tokenized_squad["train"],
-... eval_dataset=tokenized_squad["validation"],
+... eval_dataset=tokenized_squad["test"],
... tokenizer=tokenizer,
... data_collator=data_collator,
... )
>>> trainer.train()
```
-
-
-To fine-tune a model in TensorFlow, start by converting your datasets to the `tf.data.Dataset` format with [`~TFPreTrainedModel.prepare_tf_dataset`].
-```py
->>> tf_train_set = model.prepare_tf_dataset(
-... tokenized_squad["train"],
-... shuffle=True,
-... batch_size=16,
-... collate_fn=data_collator,
-... )
+Once training is completed, share your model to the Hub with the [`~transformers.Trainer.push_to_hub`] method so everyone can use your model:
->>> tf_validation_set = model.prepare_tf_dataset(
-... tokenized_squad["validation"],
-... shuffle=False,
-... batch_size=16,
-... collate_fn=data_collator,
-... )
+```py
+>>> trainer.push_to_hub()
```
-
+
+
-If you aren't familiar with fine-tuning a model with Keras, take a look at the basic tutorial [here](training#finetune-with-keras)!
+If you aren't familiar with finetuning a model with Keras, take a look at the basic tutorial [here](../training#train-a-tensorflow-model-with-keras)!
-
-Set up an optimizer function, learning rate schedule, and some training hyperparameters:
+To finetune a model in TensorFlow, start by setting up an optimizer function, learning rate schedule, and some training hyperparameters:
```py
>>> from transformers import create_optimizer
@@ -238,7 +254,7 @@ Set up an optimizer function, learning rate schedule, and some training hyperpar
... )
```
-Load DistilBERT with [`TFAutoModelForQuestionAnswering`]:
+Then you can load DistilBERT with [`TFAutoModelForQuestionAnswering`]:
```py
>>> from transformers import TFAutoModelForQuestionAnswering
@@ -246,6 +262,24 @@ Load DistilBERT with [`TFAutoModelForQuestionAnswering`]:
>>> model = TFAutoModelForQuestionAnswering.from_pretrained("distilbert-base-uncased")
```
+Convert your datasets to the `tf.data.Dataset` format with [`~transformers.TFPreTrainedModel.prepare_tf_dataset`]:
+
+```py
+>>> tf_train_set = model.prepare_tf_dataset(
+... tokenized_squad["train"],
+... shuffle=True,
+... batch_size=16,
+... collate_fn=data_collator,
+... )
+
+>>> tf_validation_set = model.prepare_tf_dataset(
+... tokenized_squad["test"],
+... shuffle=False,
+... batch_size=16,
+... collate_fn=data_collator,
+... )
+```
+
Configure the model for training with [`compile`](https://keras.io/api/models/model_training_apis/#compile-method):
```py
@@ -254,18 +288,134 @@ Configure the model for training with [`compile`](https://keras.io/api/models/mo
>>> model.compile(optimizer=optimizer)
```
-Call [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) to fine-tune the model:
+The last thing to set up before you start training is to provide a way to push your model to the Hub. This can be done by specifying where to push your model and tokenizer in the [`~transformers.PushToHubCallback`]:
+
+```py
+>>> from transformers.keras_callbacks import PushToHubCallback
+
+>>> callback = PushToHubCallback(
+... output_dir="my_awesome_qa_model",
+... tokenizer=tokenizer,
+... )
+```
+
+Finally, you're ready to start training your model! Call [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) with your training and validation datasets, the number of epochs, and your callback to finetune the model:
```py
->>> model.fit(x=tf_train_set, validation_data=tf_validation_set, epochs=3)
+>>> model.fit(x=tf_train_set, validation_data=tf_validation_set, epochs=3, callbacks=[callback])
```
+Once training is completed, your model is automatically uploaded to the Hub so everyone can use it!
-For a more in-depth example of how to fine-tune a model for question answering, take a look at the corresponding
+For a more in-depth example of how to finetune a model for question answering, take a look at the corresponding
[PyTorch notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/question_answering.ipynb)
or [TensorFlow notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/question_answering-tf.ipynb).
-
\ No newline at end of file
+
+
+## Evaluate
+
+Evaluation for question answering requires a significant amount of postprocessing. To avoid taking up too much of your time, this guide skips the evaluation step. The [`Trainer`] still calculates the evaluation loss during training so you're not completely in the dark about your model's performance.
+
+If you have more time and you're interested in how to evaluate your model for question answering, take a look at the [Question answering](https://huggingface.co/course/chapter7/7?fw=pt#postprocessing) chapter from the 🤗 Hugging Face Course!
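+
+If you just want a feel for the metric itself, the SQuAD metric in 🤗 Evaluate expects predictions and references in the following format (a toy example with a made-up id):
+
+```py
+>>> import evaluate
+
+>>> squad_metric = evaluate.load("squad")
+>>> predictions = [{"id": "0", "prediction_text": "Denver Broncos"}]
+>>> references = [{"id": "0", "answers": {"text": ["Denver Broncos"], "answer_start": [177]}}]
+>>> squad_metric.compute(predictions=predictions, references=references)
+{'exact_match': 100.0, 'f1': 100.0}
+```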
+
+## Inference
+
+Great, now that you've finetuned a model, you can use it for inference!
+
+Come up with a question and some context you'd like the model to predict:
+
+```py
+>>> question = "How many programming languages does BLOOM support?"
+>>> context = "BLOOM has 176 billion parameters and can generate text in 46 natural languages and 13 programming languages."
+```
+
+The simplest way to try out your finetuned model for inference is to use it in a [`pipeline`]. Instantiate a `pipeline` for question answering with your model, and pass your text to it:
+
+```py
+>>> from transformers import pipeline
+
+>>> question_answerer = pipeline("question-answering", model="my_awesome_qa_model")
+>>> question_answerer(question=question, context=context)
+{'score': 0.2058267742395401,
+ 'start': 10,
+ 'end': 85,
+ 'answer': '176 billion parameters and can generate text in 46 natural languages and 13'}
+```
+
+You can also manually replicate the results of the `pipeline` if you'd like:
+
+
+
+Tokenize the text and return PyTorch tensors:
+
+```py
+>>> from transformers import AutoTokenizer
+
+>>> tokenizer = AutoTokenizer.from_pretrained("my_awesome_qa_model")
+>>> inputs = tokenizer(question, context, return_tensors="pt")
+```
+
+Pass your inputs to the model and return the `logits`:
+
+```py
+>>> import torch
+>>> from transformers import AutoModelForQuestionAnswering
+
+>>> model = AutoModelForQuestionAnswering.from_pretrained("my_awesome_qa_model")
+>>> with torch.no_grad():
+... outputs = model(**inputs)
+```
+
+Get the highest probability from the model output for the start and end positions:
+
+```py
+>>> answer_start_index = outputs.start_logits.argmax()
+>>> answer_end_index = outputs.end_logits.argmax()
+```
+
+Decode the predicted tokens to get the answer:
+
+```py
+>>> predict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1]
+>>> tokenizer.decode(predict_answer_tokens)
+'176 billion parameters and can generate text in 46 natural languages and 13'
+```
+
+
+Tokenize the text and return TensorFlow tensors:
+
+```py
+>>> from transformers import AutoTokenizer
+
+>>> tokenizer = AutoTokenizer.from_pretrained("my_awesome_qa_model")
+>>> inputs = tokenizer(question, context, return_tensors="tf")
+```
+
+Pass your inputs to the model and return the `logits`:
+
+```py
+>>> from transformers import TFAutoModelForQuestionAnswering
+
+>>> model = TFAutoModelForQuestionAnswering.from_pretrained("my_awesome_qa_model")
+>>> outputs = model(**inputs)
+```
+
+Get the highest probability from the model output for the start and end positions:
+
+```py
+>>> answer_start_index = int(tf.math.argmax(outputs.start_logits, axis=-1)[0])
+>>> answer_end_index = int(tf.math.argmax(outputs.end_logits, axis=-1)[0])
+```
+
+Decode the predicted tokens to get the answer:
+
+```py
+>>> predict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1]
+>>> tokenizer.decode(predict_answer_tokens)
+'176 billion parameters and can generate text in 46 natural languages and 13'
+```
+
+
\ No newline at end of file
diff --git a/docs/source/en/tasks/semantic_segmentation.mdx b/docs/source/en/tasks/semantic_segmentation.mdx
index 941f0c7f8a54..069f3128021b 100644
--- a/docs/source/en/tasks/semantic_segmentation.mdx
+++ b/docs/source/en/tasks/semantic_segmentation.mdx
@@ -18,7 +18,10 @@ specific language governing permissions and limitations under the License.
Semantic segmentation assigns a label or class to each individual pixel of an image. There are several types of segmentation, and in the case of semantic segmentation, no distinction is made between unique instances of the same object. Both objects are given the same label (for example, "car" instead of "car-1" and "car-2"). Common real-world applications of semantic segmentation include training self-driving cars to identify pedestrians and important traffic information, identifying cells and abnormalities in medical imagery, and monitoring environmental changes from satellite imagery.
-This guide will show you how to finetune [SegFormer](https://huggingface.co/docs/transformers/main/en/model_doc/segformer#segformer) on the [SceneParse150](https://huggingface.co/datasets/scene_parse_150) dataset.
+This guide will show you how to:
+
+1. Finetune [SegFormer](https://huggingface.co/docs/transformers/main/en/model_doc/segformer#segformer) on the [SceneParse150](https://huggingface.co/datasets/scene_parse_150) dataset.
+2. Use your finetuned model for inference.
@@ -32,9 +35,17 @@ Before you begin, make sure you have all the necessary libraries installed:
pip install -q datasets transformers evaluate
```
+We encourage you to login to your Hugging Face account so you can upload and share your model with the community. When prompted, enter your token to login:
+
+```py
+>>> from huggingface_hub import notebook_login
+
+>>> notebook_login()
+```
+
## Load SceneParse150 dataset
-Load the first 50 examples of the SceneParse150 dataset from the 🤗 Datasets library so you can quickly train and test a model:
+Start by loading a smaller subset of the SceneParse150 dataset from the 🤗 Datasets library. This'll give you a chance to experiment and make sure everything works before spending more time training on the full dataset.
```py
>>> from datasets import load_dataset
@@ -42,7 +53,7 @@ Load the first 50 examples of the SceneParse150 dataset from the 🤗 Datasets l
>>> ds = load_dataset("scene_parse_150", split="train[:50]")
```
-Split this dataset into a train and test set:
+Split the dataset's `train` split into a train and test set with the [`~datasets.Dataset.train_test_split`] method:
```py
>>> ds = ds.train_test_split(test_size=0.2)
@@ -59,7 +70,9 @@ Then take a look at an example:
'scene_category': 368}
```
-There is an `image`, an `annotation` (this is the segmentation map or label), and a `scene_category` field that describes the image scene, like "kitchen" or "office". In this guide, you'll only need `image` and `annotation`, both of which are PIL images.
+- `image`: a PIL image of the scene.
+- `annotation`: a PIL image of the segmentation map, which is also the model's target.
+- `scene_category`: a category id that describes the image scene like "kitchen" or "office". In this guide, you'll only need `image` and `annotation`, both of which are PIL images.
You'll also want to create a dictionary that maps a label id to a label class which will be useful when you set up the model later. Download the mappings from the Hub and create the `id2label` and `label2id` dictionaries:
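+
+One way to do this (a sketch, assuming the ADE20K label file `ade20k-id2label.json` hosted in the `huggingface/label-files` dataset repository) is:
+
+```py
+>>> import json
+>>> from huggingface_hub import hf_hub_download
+
+>>> filename = hf_hub_download(repo_id="huggingface/label-files", filename="ade20k-id2label.json", repo_type="dataset")
+>>> id2label = {int(k): v for k, v in json.load(open(filename)).items()}
+>>> label2id = {v: k for k, v in id2label.items()}
+>>> num_labels = len(id2label)
+```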
@@ -77,7 +90,7 @@ You'll also want to create a dictionary that maps a label id to a label class wh
## Preprocess
-Next, load a SegFormer feature extractor to prepare the images and annotations for the model. Some datasets, like this one, use the zero-index as the background class. However, the background class isn't included in the 150 classes, so you'll need to set `reduce_labels=True` to subtract one from all the labels. The zero-index is replaced by `255` so it's ignored by SegFormer's loss function:
+The next step is to load a SegFormer feature extractor to prepare the images and annotations for the model. Some datasets, like this one, use the zero-index as the background class. However, the background class isn't actually included in the 150 classes, so you'll need to set `reduce_labels=True` to subtract one from all the labels. The zero-index is replaced by `255` so it's ignored by SegFormer's loss function:
```py
>>> from transformers import AutoFeatureExtractor
@@ -85,7 +98,7 @@ Next, load a SegFormer feature extractor to prepare the images and annotations f
>>> feature_extractor = AutoFeatureExtractor.from_pretrained("nvidia/mit-b0", reduce_labels=True)
```
-It is common to apply some data augmentations to an image dataset to make a model more robust against overfitting. In this guide, you'll use the [`ColorJitter`](https://pytorch.org/vision/stable/generated/torchvision.transforms.ColorJitter.html) function from [torchvision](https://pytorch.org/vision/stable/index.html) to randomly change the color properties of an image:
+It is common to apply some data augmentations to an image dataset to make a model more robust against overfitting. In this guide, you'll use the [`ColorJitter`](https://pytorch.org/vision/stable/generated/torchvision.transforms.ColorJitter.html) function from [torchvision](https://pytorch.org/vision/stable/index.html) to randomly change the color properties of an image, but you can also use any image library you like.
```py
>>> from torchvision.transforms import ColorJitter
@@ -117,53 +130,9 @@ To apply the `jitter` over the entire dataset, use the 🤗 Datasets [`~datasets
>>> test_ds.set_transform(val_transforms)
```
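+
+For reference, the `train_transforms` and `val_transforms` functions applied above could look roughly like this (a sketch assuming `jitter` is a `ColorJitter` instance like the one described above and `feature_extractor` is the SegFormer feature extractor loaded earlier; only the training images get the color jitter):
+
+```py
+def train_transforms(example_batch):
+    images = [jitter(x) for x in example_batch["image"]]
+    labels = [x for x in example_batch["annotation"]]
+    inputs = feature_extractor(images, labels)
+    return inputs
+
+
+def val_transforms(example_batch):
+    images = [x for x in example_batch["image"]]
+    labels = [x for x in example_batch["annotation"]]
+    inputs = feature_extractor(images, labels)
+    return inputs
+```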
-## Train
-
-Load SegFormer with [`AutoModelForSemanticSegmentation`], and pass the model the mapping between label ids and label classes:
-
-```py
->>> from transformers import AutoModelForSemanticSegmentation
-
->>> pretrained_model_name = "nvidia/mit-b0"
->>> model = AutoModelForSemanticSegmentation.from_pretrained(
-... pretrained_model_name, id2label=id2label, label2id=label2id
-... )
-```
-
-
-
-If you aren't familiar with finetuning a model with the [`Trainer`], take a look at the basic tutorial [here](../training#finetune-with-trainer)!
-
-
-
-Define your training hyperparameters in [`TrainingArguments`]. It is important not to remove unused columns because this will drop the `image` column. Without the `image` column, you can't create `pixel_values`. Set `remove_unused_columns=False` to prevent this behavior!
+## Evaluate
-To save and push a model under your namespace to the Hub, set `push_to_hub=True`:
-
-```py
->>> from transformers import TrainingArguments
-
->>> training_args = TrainingArguments(
-... output_dir="segformer-b0-scene-parse-150",
-... learning_rate=6e-5,
-... num_train_epochs=50,
-... per_device_train_batch_size=2,
-... per_device_eval_batch_size=2,
-... save_total_limit=3,
-... evaluation_strategy="steps",
-... save_strategy="steps",
-... save_steps=20,
-... eval_steps=20,
-... logging_steps=1,
-... eval_accumulation_steps=5,
-... remove_unused_columns=False,
-... push_to_hub=True,
-... )
-```
-
-To evaluate model performance during training, you'll need to create a function to compute and report metrics. For semantic segmentation, you'll typically compute the [mean Intersection over Union](https://huggingface.co/spaces/evaluate-metric/mean_iou) (IoU). The mean IoU measures the overlapping area between the predicted and ground truth segmentation maps.
-
-Load the mean IoU from the 🤗 Evaluate library:
+Including a metric during training is often helpful for evaluating your model's performance. You can quickly load an evaluation method with the 🤗 [Evaluate](https://huggingface.co/docs/evaluate/index) library. For this task, load the [mean Intersection over Union](https://huggingface.co/spaces/evaluate-metric/mean_iou) (IoU) metric (see the 🤗 Evaluate [quick tour](https://huggingface.co/docs/evaluate/a_quick_tour) to learn more about how to load and compute a metric):
```py
>>> import evaluate
@@ -199,10 +168,50 @@ Then create a function to [`~evaluate.EvaluationModule.compute`] the metrics. Yo
... return metrics
```
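+
+Here is a sketch of what that function might look like for semantic segmentation (assuming the metric was loaded as `metric = evaluate.load("mean_iou")` and that `num_labels = len(id2label)`). The logits are upsampled to the size of the label maps before taking the argmax:
+
+```py
+import numpy as np
+import torch
+from torch import nn
+
+
+def compute_metrics(eval_pred):
+    with torch.no_grad():
+        logits, labels = eval_pred
+        logits_tensor = torch.from_numpy(logits)
+        # rescale the logits to the size of the ground-truth segmentation maps
+        logits_tensor = nn.functional.interpolate(
+            logits_tensor,
+            size=labels.shape[-2:],
+            mode="bilinear",
+            align_corners=False,
+        ).argmax(dim=1)
+
+        pred_labels = logits_tensor.detach().cpu().numpy()
+        metrics = metric.compute(
+            predictions=pred_labels,
+            references=labels,
+            num_labels=num_labels,
+            ignore_index=255,
+            reduce_labels=False,
+        )
+        # per-category metrics come back as numpy arrays; convert them so they can be logged
+        for key, value in metrics.items():
+            if type(value) is np.ndarray:
+                metrics[key] = value.tolist()
+        return metrics
+```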
-Pass your model, training arguments, datasets, and metrics function to the [`Trainer`]:
+Your `compute_metrics` function is ready to go now, and you'll return to it when you set up your training.
+
+## Train
+
+
+
+If you aren't familiar with finetuning a model with the [`Trainer`], take a look at the basic tutorial [here](../training#finetune-with-trainer)!
+
+
+
+You're ready to start training your model now! Load SegFormer with [`AutoModelForSemanticSegmentation`], and pass the model the mapping between label ids and label classes:
```py
->>> from transformers import Trainer
+>>> from transformers import AutoModelForSemanticSegmentation, TrainingArguments, Trainer
+
+>>> pretrained_model_name = "nvidia/mit-b0"
+>>> model = AutoModelForSemanticSegmentation.from_pretrained(
+... pretrained_model_name, id2label=id2label, label2id=label2id
+... )
+```
+
+At this point, only three steps remain:
+
+1. Define your training hyperparameters in [`TrainingArguments`]. It is important you don't remove unused columns because this'll drop the `image` column. Without the `image` column, you can't create `pixel_values`. Set `remove_unused_columns=False` to prevent this behavior! The only other required parameter is `output_dir` which specifies where to save your model. You'll push this model to the Hub by setting `push_to_hub=True` (you need to be signed in to Hugging Face to upload your model). Every `eval_steps` (20 steps here), the [`Trainer`] will evaluate the IoU metric and save a training checkpoint.
+2. Pass the training arguments to [`Trainer`] along with the model, dataset, tokenizer, data collator, and `compute_metrics` function.
+3. Call [`~Trainer.train`] to finetune your model.
+
+```py
+>>> training_args = TrainingArguments(
+... output_dir="segformer-b0-scene-parse-150",
+... learning_rate=6e-5,
+... num_train_epochs=50,
+... per_device_train_batch_size=2,
+... per_device_eval_batch_size=2,
+... save_total_limit=3,
+... evaluation_strategy="steps",
+... save_strategy="steps",
+... save_steps=20,
+... eval_steps=20,
+... logging_steps=1,
+... eval_accumulation_steps=5,
+... remove_unused_columns=False,
+... push_to_hub=True,
+... )
>>> trainer = Trainer(
... model=model,
@@ -211,12 +220,14 @@ Pass your model, training arguments, datasets, and metrics function to the [`Tra
... eval_dataset=test_ds,
... compute_metrics=compute_metrics,
... )
+
+>>> trainer.train()
```
-Lastly, call [`~Trainer.train`] to finetune your model:
+Once training is completed, share your model to the Hub with the [`~transformers.Trainer.push_to_hub`] method so everyone can use your model:
```py
->>> trainer.train()
+>>> trainer.push_to_hub()
```
## Inference
@@ -234,7 +245,43 @@ Load an image for inference:
-Process the image with a feature extractor and place the `pixel_values` on a GPU:
+The simplest way to try out your finetuned model for inference is to use it in a [`pipeline`]. Instantiate a `pipeline` for image segmentation with your model, and pass your image to it:
+
+```py
+>>> from transformers import pipeline
+
+>>> segmenter = pipeline("image-segmentation", model="my_awesome_seg_model")
+>>> segmenter(image)
+[{'score': None,
+ 'label': 'wall',
+ 'mask': },
+ {'score': None,
+ 'label': 'sky',
+ 'mask': },
+ {'score': None,
+ 'label': 'floor',
+ 'mask': },
+ {'score': None,
+ 'label': 'ceiling',
+ 'mask': },
+ {'score': None,
+ 'label': 'bed ',
+ 'mask': },
+ {'score': None,
+ 'label': 'windowpane',
+ 'mask': },
+ {'score': None,
+ 'label': 'cabinet',
+ 'mask': },
+ {'score': None,
+ 'label': 'chair',
+ 'mask': },
+ {'score': None,
+ 'label': 'armchair',
+ 'mask': }]
+```
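+
+Each `mask` in the output above is a PIL image (its printed representation is omitted here). As a minimal sketch for inspecting one of them (the `results` variable and the filename below are just illustrative), you could save the first mask to disk:
+
+```py
+>>> results = segmenter(image)
+>>> results[0]["mask"].save("wall_mask.png")  # each entry's mask is a PIL.Image you can save or display
+```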
+
+You can also manually replicate the results of the `pipeline` if you'd like. Process the image with a feature extractor and place the `pixel_values` on a GPU:
```py
>>> device = torch.device("cuda" if torch.cuda.is_available() else "cpu") # use GPU if available, otherwise use a CPU
diff --git a/docs/source/en/tasks/sequence_classification.mdx b/docs/source/en/tasks/sequence_classification.mdx
index 2ef8a9619ce5..bc9c5f20e722 100644
--- a/docs/source/en/tasks/sequence_classification.mdx
+++ b/docs/source/en/tasks/sequence_classification.mdx
@@ -12,11 +12,16 @@ specific language governing permissions and limitations under the License.
# Text classification
+[[open-in-colab]]
+
-Text classification is a common NLP task that assigns a label or class to text. There are many practical applications of text classification widely used in production by some of today's largest companies. One of the most popular forms of text classification is sentiment analysis, which assigns a label like positive, negative, or neutral to a sequence of text.
+Text classification is a common NLP task that assigns a label or class to text. Some of the largest companies run text classification in production for a wide range of practical applications. One of the most popular forms of text classification is sentiment analysis, which assigns a label like 🙂 positive, 🙁 negative, or 😐 neutral to a sequence of text.
+
+This guide will show you how to:
-This guide will show you how to fine-tune [DistilBERT](https://huggingface.co/distilbert-base-uncased) on the [IMDb](https://huggingface.co/datasets/imdb) dataset to determine whether a movie review is positive or negative.
+1. Finetune [DistilBERT](https://huggingface.co/distilbert-base-uncased) on the [IMDb](https://huggingface.co/datasets/imdb) dataset to determine whether a movie review is positive or negative.
+2. Use your finetuned model for inference.
@@ -24,9 +29,23 @@ See the text classification [task page](https://huggingface.co/tasks/text-classi
+Before you begin, make sure you have all the necessary libraries installed:
+
+```bash
+pip install transformers datasets evaluate
+```
+
+We encourage you to login to your Hugging Face account so you can upload and share your model with the community. When prompted, enter your token to login:
+
+```py
+>>> from huggingface_hub import notebook_login
+
+>>> notebook_login()
+```
+
## Load IMDb dataset
-Load the IMDb dataset from the 🤗 Datasets library:
+Start by loading the IMDb dataset from the 🤗 Datasets library:
```py
>>> from datasets import load_dataset
@@ -46,12 +65,12 @@ Then take a look at an example:
There are two fields in this dataset:
-- `text`: a string containing the text of the movie review.
-- `label`: a value that can either be `0` for a negative review or `1` for a positive review.
+- `text`: the movie review text.
+- `label`: a value that is either `0` for a negative review or `1` for a positive review.
## Preprocess
-Load the DistilBERT tokenizer to process the `text` field:
+The next step is to load a DistilBERT tokenizer to preprocess the `text` field:
```py
>>> from transformers import AutoTokenizer
@@ -66,13 +85,13 @@ Create a preprocessing function to tokenize `text` and truncate sequences to be
... return tokenizer(examples["text"], truncation=True)
```
-Use 🤗 Datasets [`~datasets.Dataset.map`] function to apply the preprocessing function over the entire dataset. You can speed up the `map` function by setting `batched=True` to process multiple elements of the dataset at once:
+To apply the preprocessing function over the entire dataset, use the 🤗 Datasets [`~datasets.Dataset.map`] function. You can speed up `map` by setting `batched=True` to process multiple elements of the dataset at once:
```py
tokenized_imdb = imdb.map(preprocess_function, batched=True)
```
-Use [`DataCollatorWithPadding`] to create a batch of examples. It will also *dynamically pad* your text to the length of the longest element in its batch, so they are a uniform length. While it is possible to pad your text in the `tokenizer` function by setting `padding=True`, dynamic padding is more efficient.
+Now create a batch of examples using [`DataCollatorWithPadding`]. It's more efficient to *dynamically pad* the sentences to the longest length in a batch during collation, instead of padding the whole dataset to the maximum length.
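+
+For example, a minimal PyTorch setup looks like this (pass `return_tensors="tf"` as well if you're working in TensorFlow):
+
+```py
+>>> from transformers import DataCollatorWithPadding
+
+>>> data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
+```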
@@ -91,38 +110,74 @@ Use [`DataCollatorWithPadding`] to create a batch of examples. It will also *dyn
-## Train
+## Evaluate
-
-
-Load DistilBERT with [`AutoModelForSequenceClassification`] along with the number of expected labels:
+Including a metric during training is often helpful for evaluating your model's performance. You can quickly load an evaluation method with the 🤗 [Evaluate](https://huggingface.co/docs/evaluate/index) library. For this task, load the [accuracy](https://huggingface.co/spaces/evaluate-metric/accuracy) metric (see the 🤗 Evaluate [quick tour](https://huggingface.co/docs/evaluate/a_quick_tour) to learn more about how to load and compute a metric):
```py
->>> from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer
+>>> import evaluate
->>> model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)
+>>> accuracy = evaluate.load("accuracy")
```
+Then create a function that passes your predictions and labels to [`~evaluate.EvaluationModule.compute`] to calculate the accuracy:
+
+```py
+>>> import numpy as np
+
+
+>>> def compute_metrics(eval_pred):
+... predictions, labels = eval_pred
+... predictions = np.argmax(predictions, axis=1)
+... return accuracy.compute(predictions=predictions, references=labels)
+```
+
+Your `compute_metrics` function is ready to go now, and you'll return to it when you set up your training.
+
+## Train
+
+Before you start training your model, create a map of the expected ids to their labels with `id2label` and `label2id`:
+
+```py
+>>> id2label = {0: "NEGATIVE", 1: "POSITIVE"}
+>>> label2id = {"NEGATIVE": 0, "POSITIVE": 1}
+```
+
+
+
-If you aren't familiar with fine-tuning a model with the [`Trainer`], take a look at the basic tutorial [here](../training#finetune-with-trainer)!
+If you aren't familiar with finetuning a model with the [`Trainer`], take a look at the basic tutorial [here](../training#train-with-pytorch-trainer)!
+You're ready to start training your model now! Load DistilBERT with [`AutoModelForSequenceClassification`] along with the number of expected labels, and the label mappings:
+
+```py
+>>> from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer
+
+>>> model = AutoModelForSequenceClassification.from_pretrained(
+... "distilbert-base-uncased", num_labels=2, id2label=id2label, label2id=label2id
+... )
+```
At this point, only three steps remain:
-1. Define your training hyperparameters in [`TrainingArguments`].
-2. Pass the training arguments to [`Trainer`] along with the model, dataset, tokenizer, and data collator.
-3. Call [`~Trainer.train`] to fine-tune your model.
+1. Define your training hyperparameters in [`TrainingArguments`]. The only required parameter is `output_dir` which specifies where to save your model. You'll push this model to the Hub by setting `push_to_hub=True` (you need to be signed in to Hugging Face to upload your model). At the end of each epoch, the [`Trainer`] will evaluate the accuracy and save the training checkpoint.
+2. Pass the training arguments to [`Trainer`] along with the model, dataset, tokenizer, data collator, and `compute_metrics` function.
+3. Call [`~Trainer.train`] to finetune your model.
```py
>>> training_args = TrainingArguments(
-... output_dir="./results",
+... output_dir="my_awesome_model",
... learning_rate=2e-5,
... per_device_train_batch_size=16,
... per_device_eval_batch_size=16,
-... num_train_epochs=5,
+... num_train_epochs=2,
... weight_decay=0.01,
+... evaluation_strategy="epoch",
+... save_strategy="epoch",
+... load_best_model_at_end=True,
+... push_to_hub=True,
... )
>>> trainer = Trainer(
@@ -132,6 +187,7 @@ At this point, only three steps remain:
... eval_dataset=tokenized_imdb["test"],
... tokenizer=tokenizer,
... data_collator=data_collator,
+... compute_metrics=compute_metrics,
... )
>>> trainer.train()
@@ -139,13 +195,46 @@ At this point, only three steps remain:
-[`Trainer`] will apply dynamic padding by default when you pass `tokenizer` to it. In this case, you don't need to specify a data collator explicitly.
+[`Trainer`] applies dynamic padding by default when you pass `tokenizer` to it. In this case, you don't need to specify a data collator explicitly.
+
+Once training is completed, share your model to the Hub with the [`~transformers.Trainer.push_to_hub`] method so everyone can use your model:
+
+```py
+>>> trainer.push_to_hub()
+```
-To fine-tune a model in TensorFlow, start by converting your datasets to the `tf.data.Dataset` format with [`~TFPreTrainedModel.prepare_tf_dataset`].
+
+If you aren't familiar with finetuning a model with Keras, take a look at the basic tutorial [here](../training#train-a-tensorflow-model-with-keras)!
+
+
+To finetune a model in TensorFlow, start by setting up an optimizer function, learning rate schedule, and some training hyperparameters:
+
+```py
+>>> from transformers import create_optimizer
+>>> import tensorflow as tf
+
+>>> batch_size = 16
+>>> num_epochs = 5
+>>> batches_per_epoch = len(tokenized_imdb["train"]) // batch_size
+>>> total_train_steps = int(batches_per_epoch * num_epochs)
+>>> optimizer, schedule = create_optimizer(init_lr=2e-5, num_warmup_steps=0, num_train_steps=total_train_steps)
+```
+
+Then you can load DistilBERT with [`TFAutoModelForSequenceClassification`] along with the number of expected labels, and the label mappings:
+
+```py
+>>> from transformers import TFAutoModelForSequenceClassification
+
+>>> model = TFAutoModelForSequenceClassification.from_pretrained(
+... "distilbert-base-uncased", num_labels=2, id2label=id2label, label2id=label2id
+... )
+```
+
+Convert your datasets to the `tf.data.Dataset` format with [`~transformers.TFPreTrainedModel.prepare_tf_dataset`]:
```py
>>> tf_train_set = model.prepare_tf_dataset(
@@ -163,53 +252,135 @@ To fine-tune a model in TensorFlow, start by converting your datasets to the `tf
... )
```
-
+Configure the model for training with [`compile`](https://keras.io/api/models/model_training_apis/#compile-method):
-If you aren't familiar with fine-tuning a model with Keras, take a look at the basic tutorial [here](training#finetune-with-keras)!
+```py
+>>> import tensorflow as tf
-
+>>> model.compile(optimizer=optimizer)
+```
+
+The last two things to set up before you start training are to compute the accuracy from the predictions, and to provide a way to push your model to the Hub. Both are done with [Keras callbacks](./main_classes/keras_callbacks).
-Set up an optimizer function, learning rate schedule, and some training hyperparameters:
+Pass your `compute_metrics` function to [`~transformers.KerasMetricCallback`]:
```py
->>> from transformers import create_optimizer
->>> import tensorflow as tf
+>>> from transformers.keras_callbacks import KerasMetricCallback
->>> batch_size = 16
->>> num_epochs = 5
->>> batches_per_epoch = len(tokenized_imdb["train"]) // batch_size
->>> total_train_steps = int(batches_per_epoch * num_epochs)
->>> optimizer, schedule = create_optimizer(init_lr=2e-5, num_warmup_steps=0, num_train_steps=total_train_steps)
+>>> metric_callback = KerasMetricCallback(metric_fn=compute_metrics, eval_dataset=tf_validation_set)
```
-Load DistilBERT with [`TFAutoModelForSequenceClassification`] along with the number of expected labels:
+Specify where to push your model and tokenizer in the [`~transformers.PushToHubCallback`]:
```py
->>> from transformers import TFAutoModelForSequenceClassification
+>>> from transformers.keras_callbacks import PushToHubCallback
->>> model = TFAutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)
+>>> push_to_hub_callback = PushToHubCallback(
+... output_dir="my_awesome_model",
+... tokenizer=tokenizer,
+... )
```
-Configure the model for training with [`compile`](https://keras.io/api/models/model_training_apis/#compile-method):
+Then bundle your callbacks together:
```py
->>> import tensorflow as tf
-
->>> model.compile(optimizer=optimizer)
+>>> callbacks = [metric_callback, push_to_hub_callback]
```
-Call [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) to fine-tune the model:
+Finally, you're ready to start training your model! Call [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) with your training and validation datasets, the number of epochs, and your callbacks to finetune the model:
```py
->>> model.fit(x=tf_train_set, validation_data=tf_validation_set, epochs=3)
+>>> model.fit(x=tf_train_set, validation_data=tf_validation_set, epochs=3, callbacks=callbacks)
```
+
+Once training is completed, your model is automatically uploaded to the Hub so everyone can use it!
-For a more in-depth example of how to fine-tune a model for text classification, take a look at the corresponding
+For a more in-depth example of how to finetune a model for text classification, take a look at the corresponding
[PyTorch notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/text_classification.ipynb)
or [TensorFlow notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/text_classification-tf.ipynb).
-
\ No newline at end of file
+
+
+## Inference
+
+Great, now that you've finetuned a model, you can use it for inference!
+
+Grab some text you'd like to run inference on:
+
+```py
+>>> text = "This was a masterpiece. Not completely faithful to the books, but enthralling from beginning to end. Might be my favorite of the three."
+```
+
+The simplest way to try out your finetuned model for inference is to use it in a [`pipeline`]. Instantiate a `pipeline` for sentiment analysis with your model, and pass your text to it:
+
+```py
+>>> from transformers import pipeline
+
+>>> classifier = pipeline("sentiment-analysis", model="stevhliu/my_awesome_model")
+>>> classifier(text)
+[{'label': 'POSITIVE', 'score': 0.9994940757751465}]
+```
+
+You can also manually replicate the results of the `pipeline` if you'd like:
+
+
+
+Tokenize the text and return PyTorch tensors:
+
+```py
+>>> from transformers import AutoTokenizer
+
+>>> tokenizer = AutoTokenizer.from_pretrained("stevhliu/my_awesome_model")
+>>> inputs = tokenizer(text, return_tensors="pt")
+```
+
+Pass your inputs to the model and return the `logits`:
+
+```py
+>>> import torch
+>>> from transformers import AutoModelForSequenceClassification
+
+>>> model = AutoModelForSequenceClassification.from_pretrained("stevhliu/my_awesome_model")
+>>> with torch.no_grad():
+... logits = model(**inputs).logits
+```
+
+Get the class with the highest probability, and use the model's `id2label` mapping to convert it to a text label:
+
+```py
+>>> predicted_class_id = logits.argmax().item()
+>>> model.config.id2label[predicted_class_id]
+'POSITIVE'
+```
+
+
+Tokenize the text and return TensorFlow tensors:
+
+```py
+>>> from transformers import AutoTokenizer
+
+>>> tokenizer = AutoTokenizer.from_pretrained("stevhliu/my_awesome_model")
+>>> inputs = tokenizer(text, return_tensors="tf")
+```
+
+Pass your inputs to the model and return the `logits`:
+
+```py
+>>> from transformers import TFAutoModelForSequenceClassification
+
+>>> model = TFAutoModelForSequenceClassification.from_pretrained("stevhliu/my_awesome_model")
+>>> logits = model(**inputs).logits
+```
+
+Get the class with the highest probability, and use the model's `id2label` mapping to convert it to a text label:
+
+```py
+>>> import tensorflow as tf
+
+>>> predicted_class_id = int(tf.math.argmax(logits, axis=-1)[0])
+>>> model.config.id2label[predicted_class_id]
+'POSITIVE'
+```
+
+
\ No newline at end of file
diff --git a/docs/source/en/tasks/summarization.mdx b/docs/source/en/tasks/summarization.mdx
index 1b2eafcb5f9f..1663c1f71348 100644
--- a/docs/source/en/tasks/summarization.mdx
+++ b/docs/source/en/tasks/summarization.mdx
@@ -19,7 +19,10 @@ Summarization creates a shorter version of a document or an article that capture
- Extractive: extract the most relevant information from a document.
- Abstractive: generate new text that captures the most relevant information.
-This guide will show you how to fine-tune [T5](https://huggingface.co/t5-small) on the California state bill subset of the [BillSum](https://huggingface.co/datasets/billsum) dataset for abstractive summarization.
+This guide will show you how to:
+
+1. Finetune [T5](https://huggingface.co/t5-small) on the California state bill subset of the [BillSum](https://huggingface.co/datasets/billsum) dataset for abstractive summarization.
+2. Use your finetuned model for inference.
@@ -27,9 +30,23 @@ See the summarization [task page](https://huggingface.co/tasks/summarization) fo
+Before you begin, make sure you have all the necessary libraries installed:
+
+```bash
+pip install transformers datasets evaluate
+```
+
+We encourage you to login to your Hugging Face account so you can upload and share your model with the community. When prompted, enter your token to login:
+
+```py
+>>> from huggingface_hub import notebook_login
+
+>>> notebook_login()
+```
+
## Load BillSum dataset
-Load the BillSum dataset from the 🤗 Datasets library:
+Start by loading the smaller California state bill subset of the BillSum dataset from the 🤗 Datasets library:
```py
>>> from datasets import load_dataset
@@ -37,7 +54,7 @@ Load the BillSum dataset from the 🤗 Datasets library:
>>> billsum = load_dataset("billsum", split="ca_test")
```
-Split this dataset into a train and test set:
+Split the dataset into a train and test set with the [`~datasets.Dataset.train_test_split`] method:
```py
>>> billsum = billsum.train_test_split(test_size=0.2)
@@ -52,11 +69,14 @@ Then take a look at an example:
'title': 'An act to add Section 10295.35 to the Public Contract Code, relating to public contracts.'}
```
-The `text` field is the input and the `summary` field is the target.
+There are two fields that you'll want to use:
+
+- `text`: the text of the bill which'll be the input to the model.
+- `summary`: a condensed version of `text` which'll be the model target.
## Preprocess
-Load the T5 tokenizer to process `text` and `summary`:
+The next step is to load a T5 tokenizer to process `text` and `summary`:
```py
>>> from transformers import AutoTokenizer
@@ -64,7 +84,7 @@ Load the T5 tokenizer to process `text` and `summary`:
>>> tokenizer = AutoTokenizer.from_pretrained("t5-small")
```
-The preprocessing function needs to:
+The preprocessing function you want to create needs to:
1. Prefix the input with a prompt so T5 knows this is a summarization task. Some models capable of multiple NLP tasks require prompting for specific tasks.
2. Use the keyword `text_target` argument when tokenizing labels.
@@ -84,13 +104,13 @@ The preprocessing function needs to:
... return model_inputs
```
-Use 🤗 Datasets [`~datasets.Dataset.map`] function to apply the preprocessing function over the entire dataset. You can speed up the `map` function by setting `batched=True` to process multiple elements of the dataset at once:
+To apply the preprocessing function over the entire dataset, use the 🤗 Datasets [`~datasets.Dataset.map`] method. You can speed up the `map` function by setting `batched=True` to process multiple elements of the dataset at once:
```py
>>> tokenized_billsum = billsum.map(preprocess_function, batched=True)
```
-Use [`DataCollatorForSeq2Seq`] to create a batch of examples. It will also *dynamically pad* your text and labels to the length of the longest element in its batch, so they are a uniform length. While it is possible to pad your text in the `tokenizer` function by setting `padding=True`, dynamic padding is more efficient.
+Now create a batch of examples using [`DataCollatorForSeq2Seq`]. It's more efficient to *dynamically pad* the sentences to the longest length in a batch during collation, instead of padding the whole dataset to the maximum length.
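+
+A quick sketch of the PyTorch version looks like this (the checkpoint string matches the tokenizer loaded earlier, and you'd pass `return_tensors="tf"` as well for TensorFlow):
+
+```py
+>>> from transformers import DataCollatorForSeq2Seq
+
+>>> data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model="t5-small")
+```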
@@ -109,41 +129,74 @@ Use [`DataCollatorForSeq2Seq`] to create a batch of examples. It will also *dyna
-## Train
+## Evaluate
-
-
-Load T5 with [`AutoModelForSeq2SeqLM`]:
+Including a metric during training is often helpful for evaluating your model's performance. You can quickly load an evaluation method with the 🤗 [Evaluate](https://huggingface.co/docs/evaluate/index) library. For this task, load the [ROUGE](https://huggingface.co/spaces/evaluate-metric/rouge) metric (see the 🤗 Evaluate [quick tour](https://huggingface.co/docs/evaluate/a_quick_tour) to learn more about how to load and compute a metric):
```py
->>> from transformers import AutoModelForSeq2SeqLM, Seq2SeqTrainingArguments, Seq2SeqTrainer
+>>> import evaluate
->>> model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
+>>> rouge = evaluate.load("rouge")
```
+Then create a function that passes your predictions and labels to [`~evaluate.EvaluationModule.compute`] to calculate the ROUGE metric:
+
+```py
+>>> import numpy as np
+
+
+>>> def compute_metrics(eval_pred):
+... predictions, labels = eval_pred
+... decoded_preds = tokenizer.batch_decode(predictions, skip_special_tokens=True)
+... labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
+... decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
+
+... result = rouge.compute(predictions=decoded_preds, references=decoded_labels, use_stemmer=True)
+
+... prediction_lens = [np.count_nonzero(pred != tokenizer.pad_token_id) for pred in predictions]
+... result["gen_len"] = np.mean(prediction_lens)
+
+... return {k: round(v, 4) for k, v in result.items()}
+```
+
+Your `compute_metrics` function is ready to go now, and you'll return to it when you set up your training.
+
+## Train
+
+
+
-If you aren't familiar with fine-tuning a model with the [`Trainer`], take a look at the basic tutorial [here](../training#finetune-with-trainer)!
+If you aren't familiar with finetuning a model with the [`Trainer`], take a look at the basic tutorial [here](../training#train-with-pytorch-trainer)!
+You're ready to start training your model now! Load T5 with [`AutoModelForSeq2SeqLM`]:
+
+```py
+>>> from transformers import AutoModelForSeq2SeqLM, Seq2SeqTrainingArguments, Seq2SeqTrainer
+
+>>> model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
+```
At this point, only three steps remain:
-1. Define your training hyperparameters in [`Seq2SeqTrainingArguments`].
-2. Pass the training arguments to [`Seq2SeqTrainer`] along with the model, dataset, tokenizer, and data collator.
-3. Call [`~Trainer.train`] to fine-tune your model.
+1. Define your training hyperparameters in [`Seq2SeqTrainingArguments`]. The only required parameter is `output_dir` which specifies where to save your model. You'll push this model to the Hub by setting `push_to_hub=True` (you need to be signed in to Hugging Face to upload your model). At the end of each epoch, the [`Trainer`] will evaluate the ROUGE metric and save the training checkpoint.
+2. Pass the training arguments to [`Seq2SeqTrainer`] along with the model, dataset, tokenizer, data collator, and `compute_metrics` function.
+3. Call [`~Trainer.train`] to finetune your model.
```py
>>> training_args = Seq2SeqTrainingArguments(
-... output_dir="./results",
+... output_dir="my_awesome_billsum_model",
... evaluation_strategy="epoch",
... learning_rate=2e-5,
... per_device_train_batch_size=16,
... per_device_eval_batch_size=16,
... weight_decay=0.01,
... save_total_limit=3,
-... num_train_epochs=1,
+... num_train_epochs=4,
+... predict_with_generate=True,
... fp16=True,
+... push_to_hub=True,
... )
>>> trainer = Seq2SeqTrainer(
@@ -153,13 +206,41 @@ At this point, only three steps remain:
... eval_dataset=tokenized_billsum["test"],
... tokenizer=tokenizer,
... data_collator=data_collator,
+... compute_metrics=compute_metrics,
... )
>>> trainer.train()
```
+
+Once training is completed, share your model to the Hub with the [`~transformers.Trainer.push_to_hub`] method so everyone can use your model:
+
+```py
+>>> trainer.push_to_hub()
+```
-To fine-tune a model in TensorFlow, start by converting your datasets to the `tf.data.Dataset` format with [`~TFPreTrainedModel.prepare_tf_dataset`].
+
+
+If you aren't familiar with finetuning a model with Keras, take a look at the basic tutorial [here](../training#train-a-tensorflow-model-with-keras)!
+
+
+To finetune a model in TensorFlow, start by setting up an optimizer function, learning rate schedule, and some training hyperparameters:
+
+```py
+>>> from transformers import create_optimizer, AdamWeightDecay
+
+>>> optimizer = AdamWeightDecay(learning_rate=2e-5, weight_decay_rate=0.01)
+```
+
+Then you can load T5 with [`TFAutoModelForSeq2SeqLM`]:
+
+```py
+>>> from transformers import TFAutoModelForSeq2SeqLM
+
+>>> model = TFAutoModelForSeq2SeqLM.from_pretrained("t5-small")
+```
+
+Convert your datasets to the `tf.data.Dataset` format with [`~transformers.TFPreTrainedModel.prepare_tf_dataset`]:
```py
>>> tf_train_set = model.prepare_tf_dataset(
@@ -177,46 +258,133 @@ To fine-tune a model in TensorFlow, start by converting your datasets to the `tf
... )
```
-
+Configure the model for training with [`compile`](https://keras.io/api/models/model_training_apis/#compile-method):
-If you aren't familiar with fine-tuning a model with Keras, take a look at the basic tutorial [here](training#finetune-with-keras)!
+```py
+>>> import tensorflow as tf
-
+>>> model.compile(optimizer=optimizer)
+```
-Set up an optimizer function, learning rate schedule, and some training hyperparameters:
+The last two things to set up before you start training are to compute the ROUGE score from the predictions, and to provide a way to push your model to the Hub. Both are done with [Keras callbacks](./main_classes/keras_callbacks).
+
+Pass your `compute_metrics` function to [`~transformers.KerasMetricCallback`]:
```py
->>> from transformers import create_optimizer, AdamWeightDecay
+>>> from transformers.keras_callbacks import KerasMetricCallback
->>> optimizer = AdamWeightDecay(learning_rate=2e-5, weight_decay_rate=0.01)
+>>> metric_callback = KerasMetricCallback(metric_fn=compute_metrics, eval_dataset=tf_validation_set)
```
-Load T5 with [`TFAutoModelForSeq2SeqLM`]:
+Specify where to push your model and tokenizer in the [`~transformers.PushToHubCallback`]:
```py
->>> from transformers import TFAutoModelForSeq2SeqLM
+>>> from transformers.keras_callbacks import PushToHubCallback
->>> model = TFAutoModelForSeq2SeqLM.from_pretrained("t5-small")
+>>> push_to_hub_callback = PushToHubCallback(
+... output_dir="my_awesome_billsum_model",
+... tokenizer=tokenizer,
+... )
```
-Configure the model for training with [`compile`](https://keras.io/api/models/model_training_apis/#compile-method):
+Then bundle your callbacks together:
```py
->>> model.compile(optimizer=optimizer)
+>>> callbacks = [metric_callback, push_to_hub_callback]
```
-Call [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) to fine-tune the model:
+Finally, you're ready to start training your model! Call [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) with your training and validation datasets, the number of epochs, and your callbacks to finetune the model:
```py
->>> model.fit(x=tf_train_set, validation_data=tf_test_set, epochs=3)
+>>> model.fit(x=tf_train_set, validation_data=tf_test_set, epochs=3, callbacks=callbacks)
```
+
+Once training is completed, your model is automatically uploaded to the Hub so everyone can use it!
-For a more in-depth example of how to fine-tune a model for summarization, take a look at the corresponding
+For a more in-depth example of how to finetune a model for summarization, take a look at the corresponding
[PyTorch notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/summarization.ipynb)
or [TensorFlow notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/summarization-tf.ipynb).
-
\ No newline at end of file
+
+
+## Inference
+
+Great, now that you've finetuned a model, you can use it for inference!
+
+Come up with some text you'd like to summarize. For T5, you need to prefix your input depending on the task you're working on. For summarization you should prefix your input as shown below:
+
+```py
+>>> text = "summarize: The Inflation Reduction Act lowers prescription drug costs, health care costs, and energy costs. It's the most aggressive action on tackling the climate crisis in American history, which will lift up American workers and create good-paying, union jobs across the country. It'll lower the deficit and ask the ultra-wealthy and corporations to pay their fair share. And no one making under $400,000 per year will pay a penny more in taxes."
+```
+
+The simplest way to try out your finetuned model for inference is to use it in a [`pipeline`]. Instantiate a `pipeline` for summarization with your model, and pass your text to it:
+
+```py
+>>> from transformers import pipeline
+
+>>> summarizer = pipeline("summarization", model="stevhliu/my_awesome_billsum_model")
+>>> summarizer(text)
+[{'summary_text': "The Inflation Reduction Act lowers prescription drug costs, health care costs, and energy costs. It's the most aggressive action on tackling the climate crisis in American history, which will lift up American workers and create good-paying, union jobs across the country."}]
+```
+
+You can also manually replicate the results of the `pipeline` if you'd like:
+
+
+
+
+Tokenize the text and return the `input_ids` as PyTorch tensors:
+
+```py
+>>> from transformers import AutoTokenizer
+
+>>> tokenizer = AutoTokenizer.from_pretrained("stevhliu/my_awesome_billsum_model")
+>>> inputs = tokenizer(text, return_tensors="pt").input_ids
+```
+
+Use the [`~transformers.generation_utils.GenerationMixin.generate`] method to create the summarization. For more details about the different text generation strategies and parameters for controlling generation, check out the [Text Generation](./main_classes/text_generation) API.
+
+```py
+>>> from transformers import AutoModelForSeq2SeqLM
+
+>>> model = AutoModelForSeq2SeqLM.from_pretrained("stevhliu/my_awesome_billsum_model")
+>>> outputs = model.generate(inputs, max_new_tokens=100, do_sample=False)
+```
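+
+The same `generate` call accepts other decoding strategies as well. As one illustrative variation (the settings below are examples, not values from this guide), you could switch from greedy decoding to beam search:
+
+```py
+>>> # beam search with 4 beams instead of greedy decoding
+>>> outputs = model.generate(inputs, max_new_tokens=100, num_beams=4, early_stopping=True)
+```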
+
+Decode the generated token ids back into text:
+
+```py
+>>> tokenizer.decode(outputs[0], skip_special_tokens=True)
+"the inflation reduction act lowers prescription drug costs, health care costs, and energy costs. it's the most aggressive action on tackling the climate crisis in american history. it will ask the ultra-wealthy and corporations to pay their fair share."
+```
+
+
+Tokenize the text and return the `input_ids` as TensorFlow tensors:
+
+```py
+>>> from transformers import AutoTokenizer
+
+>>> tokenizer = AutoTokenizer.from_pretrained("stevhliu/my_awesome_billsum_model")
+>>> inputs = tokenizer(text, return_tensors="tf").input_ids
+```
+
+Use the [`~transformers.generation_tf_utils.TFGenerationMixin.generate`] method to create the summarization. For more details about the different text generation strategies and parameters for controlling generation, check out the [Text Generation](./main_classes/text_generation) API.
+
+```py
+>>> from transformers import TFAutoModelForSeq2SeqLM
+
+>>> model = TFAutoModelForSeq2SeqLM.from_pretrained("stevhliu/my_awesome_billsum_model")
+>>> outputs = model.generate(inputs, max_new_tokens=100, do_sample=False)
+```
+
+Decode the generated token ids back into text:
+
+```py
+>>> tokenizer.decode(outputs[0], skip_special_tokens=True)
+"the inflation reduction act lowers prescription drug costs, health care costs, and energy costs. it's the most aggressive action on tackling the climate crisis in american history. it will ask the ultra-wealthy and corporations to pay their fair share."
+```
+
+
\ No newline at end of file
diff --git a/docs/source/en/tasks/token_classification.mdx b/docs/source/en/tasks/token_classification.mdx
index 3d2a3ccb05cc..8c7ceac48f46 100644
--- a/docs/source/en/tasks/token_classification.mdx
+++ b/docs/source/en/tasks/token_classification.mdx
@@ -12,11 +12,16 @@ specific language governing permissions and limitations under the License.
# Token classification
+[[open-in-colab]]
+
Token classification assigns a label to individual tokens in a sentence. One of the most common token classification tasks is Named Entity Recognition (NER). NER attempts to find a label for each entity in a sentence, such as a person, location, or organization.
-This guide will show you how to fine-tune [DistilBERT](https://huggingface.co/distilbert-base-uncased) on the [WNUT 17](https://huggingface.co/datasets/wnut_17) dataset to detect new entities.
+This guide will show you how to:
+
+1. Finetune [DistilBERT](https://huggingface.co/distilbert-base-uncased) on the [WNUT 17](https://huggingface.co/datasets/wnut_17) dataset to detect new entities.
+2. Use your finetuned model for inference.
@@ -24,9 +29,23 @@ See the token classification [task page](https://huggingface.co/tasks/token-clas
+Before you begin, make sure you have all the necessary libraries installed:
+
+```bash
+pip install transformers datasets evaluate
+```
+
+We encourage you to login to your Hugging Face account so you can upload and share your model with the community. When prompted, enter your token to login:
+
+```py
+>>> from huggingface_hub import notebook_login
+
+>>> notebook_login()
+```
+
## Load WNUT 17 dataset
-Load the WNUT 17 dataset from the 🤗 Datasets library:
+Start by loading the WNUT 17 dataset from the 🤗 Datasets library:
```py
>>> from datasets import load_dataset
@@ -44,7 +63,7 @@ Then take a look at an example:
}
```
-Each number in `ner_tags` represents an entity. Convert the number to a label name for more information:
+Each number in `ner_tags` represents an entity. Convert the numbers to their label names to find out what the entities are:
```py
>>> label_list = wnut["train"].features[f"ner_tags"].feature.names
@@ -66,10 +85,10 @@ Each number in `ner_tags` represents an entity. Convert the number to a label na
]
```
-The `ner_tag` describes an entity, such as a corporation, location, or person. The letter that prefixes each `ner_tag` indicates the token position of the entity:
+The letter that prefixes each `ner_tag` indicates the token position of the entity:
- `B-` indicates the beginning of an entity.
-- `I-` indicates a token is contained inside the same entity (e.g., the `State` token is a part of an entity like
+- `I-` indicates a token is contained inside the same entity (for example, the `State` token is a part of an entity like
`Empire State Building`).
-- `0` indicates the token doesn't correspond to any entity.
+- `O` indicates the token doesn't correspond to any entity.
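+
+For example, under this scheme a sentence mentioning the Empire State Building could be hand-labeled like this (a hypothetical illustration, not an example from the dataset):
+
+```py
+>>> tokens = ["The", "Empire", "State", "Building", "is", "in", "New", "York"]
+>>> tags = ["O", "B-location", "I-location", "I-location", "O", "O", "B-location", "I-location"]
+```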
@@ -77,7 +96,7 @@ The `ner_tag` describes an entity, such as a corporation, location, or person. T
-Load the DistilBERT tokenizer to process the `tokens`:
+The next step is to load a DistilBERT tokenizer to preprocess the `tokens` field:
```py
>>> from transformers import AutoTokenizer
@@ -85,23 +104,23 @@ Load the DistilBERT tokenizer to process the `tokens`:
>>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
```
-Since the input has already been split into words, set `is_split_into_words=True` to tokenize the words into subwords:
+As you saw in the example `tokens` field above, it looks like the input has already been tokenized. But the text has actually only been split into words, not tokenized into subwords, so you'll need to set `is_split_into_words=True` to tokenize the pre-split words into subwords. For example:
```py
+>>> example = wnut["train"][0]
>>> tokenized_input = tokenizer(example["tokens"], is_split_into_words=True)
>>> tokens = tokenizer.convert_ids_to_tokens(tokenized_input["input_ids"])
>>> tokens
['[CLS]', '@', 'paul', '##walk', 'it', "'", 's', 'the', 'view', 'from', 'where', 'i', "'", 'm', 'living', 'for', 'two', 'weeks', '.', 'empire', 'state', 'building', '=', 'es', '##b', '.', 'pretty', 'bad', 'storm', 'here', 'last', 'evening', '.', '[SEP]']
```
-Adding the special tokens `[CLS]` and `[SEP]` and subword tokenization creates a mismatch between the input and labels. A single word corresponding to a single label may be split into two subwords. You will need to realign the tokens and labels by:
+However, this adds some special tokens `[CLS]` and `[SEP]` and the subword tokenization creates a mismatch between the input and labels. A single word corresponding to a single label may now be split into two subwords. You'll need to realign the tokens and labels by:
1. Mapping all tokens to their corresponding word with the [`word_ids`](https://huggingface.co/docs/tokenizers/python/latest/api/reference.html#tokenizers.Encoding.word_ids) method.
-2. Assigning the label `-100` to the special tokens `[CLS]` and `[SEP]` so the PyTorch loss function ignores
- them.
+2. Assigning the label `-100` to the special tokens `[CLS]` and `[SEP]` so they're ignored by the PyTorch loss function.
3. Only labeling the first token of a given word. Assign `-100` to other subtokens from the same word.
-Here is how you can create a function to realign the tokens and labels, and truncate sequences to be no longer than DistilBERT's maximum input length::
+Here is how you can create a function to realign the tokens and labels, and truncate sequences to be no longer than DistilBERT's maximum input length:
```py
>>> def tokenize_and_align_labels(examples):
@@ -126,13 +145,13 @@ Here is how you can create a function to realign the tokens and labels, and trun
... return tokenized_inputs
```
-Use 🤗 Datasets [`~datasets.Dataset.map`] function to tokenize and align the labels over the entire dataset. You can speed up the `map` function by setting `batched=True` to process multiple elements of the dataset at once:
+To apply the preprocessing function over the entire dataset, use the 🤗 Datasets [`~datasets.Dataset.map`] function. You can speed up the `map` function by setting `batched=True` to process multiple elements of the dataset at once:
```py
>>> tokenized_wnut = wnut.map(tokenize_and_align_labels, batched=True)
```
-Use [`DataCollatorForTokenClassification`] to create a batch of examples. It will also *dynamically pad* your text and labels to the length of the longest element in its batch, so they are a uniform length. While it is possible to pad your text in the `tokenizer` function by setting `padding=True`, dynamic padding is more efficient.
+Now create a batch of examples using [`DataCollatorForTokenClassification`]. It's more efficient to *dynamically pad* the sentences and labels to the longest length in a batch during collation, instead of padding the whole dataset to the maximum length.
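+
+A minimal sketch of the PyTorch version looks like this (pass `return_tensors="tf"` as well if you're working in TensorFlow):
+
+```py
+>>> from transformers import DataCollatorForTokenClassification
+
+>>> data_collator = DataCollatorForTokenClassification(tokenizer=tokenizer)
+```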
@@ -151,39 +170,120 @@ Use [`DataCollatorForTokenClassification`] to create a batch of examples. It wil
-## Train
+## Evaluate
-
-
-Load DistilBERT with [`AutoModelForTokenClassification`] along with the number of expected labels:
+Including a metric during training is often helpful for evaluating your model's performance. You can quickly load an evaluation method with the 🤗 [Evaluate](https://huggingface.co/docs/evaluate/index) library. For this task, load the [seqeval](https://huggingface.co/spaces/evaluate-metric/seqeval) framework (see the 🤗 Evaluate [quick tour](https://huggingface.co/docs/evaluate/a_quick_tour) to learn more about how to load and compute a metric). Seqeval actually produces several scores: precision, recall, F1, and accuracy.
```py
->>> from transformers import AutoModelForTokenClassification, TrainingArguments, Trainer
+>>> import evaluate
->>> model = AutoModelForTokenClassification.from_pretrained("distilbert-base-uncased", num_labels=14)
+>>> seqeval = evaluate.load("seqeval")
+```
+
+Get the NER labels first, and then create a function that passes your true predictions and true labels to [`~evaluate.EvaluationModule.compute`] to calculate the scores:
+
+```py
+>>> import numpy as np
+
+>>> labels = [label_list[i] for i in example[f"ner_tags"]]
+
+
+>>> def compute_metrics(p):
+... predictions, labels = p
+... predictions = np.argmax(predictions, axis=2)
+
+... true_predictions = [
+... [label_list[p] for (p, l) in zip(prediction, label) if l != -100]
+... for prediction, label in zip(predictions, labels)
+... ]
+... true_labels = [
+... [label_list[l] for (p, l) in zip(prediction, label) if l != -100]
+... for prediction, label in zip(predictions, labels)
+... ]
+
+... results = seqeval.compute(predictions=true_predictions, references=true_labels)
+... return {
+... "precision": results["overall_precision"],
+... "recall": results["overall_recall"],
+... "f1": results["overall_f1"],
+... "accuracy": results["overall_accuracy"],
+... }
```
+Your `compute_metrics` function is ready to go now, and you'll return to it when you set up your training.
+
+## Train
+
+Before you start training your model, create a map of the expected ids to their labels with `id2label` and `label2id`:
+
+```py
+>>> id2label = {
+... 0: "O",
+... 1: "B-corporation",
+... 2: "I-corporation",
+... 3: "B-creative-work",
+... 4: "I-creative-work",
+... 5: "B-group",
+... 6: "I-group",
+... 7: "B-location",
+... 8: "I-location",
+... 9: "B-person",
+... 10: "I-person",
+... 11: "B-product",
+... 12: "I-product",
+... }
+>>> label2id = {
+... "O": 0,
+... "B-corporation": 1,
+... "I-corporation": 2,
+... "B-creative-work": 3,
+... "I-creative-work": 4,
+... "B-group": 5,
+... "I-group": 6,
+... "B-location": 7,
+... "I-location": 8,
+... "B-person": 9,
+... "I-person": 10,
+... "B-product": 11,
+... "I-product": 12,
+... }
+```
+
+
+
-If you aren't familiar with fine-tuning a model with the [`Trainer`], take a look at the basic tutorial [here](../training#finetune-with-trainer)!
+If you aren't familiar with finetuning a model with the [`Trainer`], take a look at the basic tutorial [here](../training#train-with-pytorch-trainer)!
+You're ready to start training your model now! Load DistilBERT with [`AutoModelForTokenClassification`] along with the number of expected labels, and the label mappings:
+
+```py
+>>> from transformers import AutoModelForTokenClassification, TrainingArguments, Trainer
+
+>>> model = AutoModelForTokenClassification.from_pretrained(
+... "distilbert-base-uncased", num_labels=13, id2label=id2label, label2id=label2id
+... )
+```
At this point, only three steps remain:
-1. Define your training hyperparameters in [`TrainingArguments`].
-2. Pass the training arguments to [`Trainer`] along with the model, dataset, tokenizer, and data collator.
-3. Call [`~Trainer.train`] to fine-tune your model.
+1. Define your training hyperparameters in [`TrainingArguments`]. The only required parameter is `output_dir` which specifies where to save your model. You'll push this model to the Hub by setting `push_to_hub=True` (you need to be signed in to Hugging Face to upload your model). At the end of each epoch, the [`Trainer`] will evaluate the seqeval scores and save the training checkpoint.
+2. Pass the training arguments to [`Trainer`] along with the model, dataset, tokenizer, data collator, and `compute_metrics` function.
+3. Call [`~Trainer.train`] to finetune your model.
```py
>>> training_args = TrainingArguments(
-... output_dir="./results",
-... evaluation_strategy="epoch",
+... output_dir="my_awesome_wnut_model",
... learning_rate=2e-5,
... per_device_train_batch_size=16,
... per_device_eval_batch_size=16,
-... num_train_epochs=3,
+... num_train_epochs=2,
... weight_decay=0.01,
+... evaluation_strategy="epoch",
+... save_strategy="epoch",
+... load_best_model_at_end=True,
+... push_to_hub=True,
... )
>>> trainer = Trainer(
@@ -193,37 +293,25 @@ At this point, only three steps remain:
... eval_dataset=tokenized_wnut["test"],
... tokenizer=tokenizer,
... data_collator=data_collator,
+... compute_metrics=compute_metrics,
... )
>>> trainer.train()
```
-
-
-To fine-tune a model in TensorFlow, start by converting your datasets to the `tf.data.Dataset` format with [`~TFPreTrainedModel.prepare_tf_dataset`].
-```py
->>> tf_train_set = model.prepare_tf_dataset(
-... tokenized_wnut["train"],
-... shuffle=True,
-... batch_size=16,
-... collate_fn=data_collator,
-... )
+Once training is completed, share your model to the Hub with the [`~transformers.Trainer.push_to_hub`] method so everyone can use your model:
->>> tf_validation_set = model.prepare_tf_dataset(
-... tokenized_wnut["validation"],
-... shuffle=False,
-... batch_size=16,
-... collate_fn=data_collator,
-... )
+```py
+>>> trainer.push_to_hub()
```
-
+
+
-If you aren't familiar with fine-tuning a model with Keras, take a look at the basic tutorial [here](training#finetune-with-keras)!
+If you aren't familiar with finetuning a model with Keras, take a look at the basic tutorial [here](../training#train-a-tensorflow-model-with-keras)!
-
-Set up an optimizer function, learning rate schedule, and some training hyperparameters:
+To finetune a model in TensorFlow, start by setting up an optimizer function, learning rate schedule, and some training hyperparameters:
```py
>>> from transformers import create_optimizer
@@ -239,12 +327,32 @@ Set up an optimizer function, learning rate schedule, and some training hyperpar
... )
```
-Load DistilBERT with [`TFAutoModelForTokenClassification`] along with the number of expected labels:
+Then you can load DistilBERT with [`TFAutoModelForTokenClassification`] along with the number of expected labels, and the label mappings:
```py
>>> from transformers import TFAutoModelForTokenClassification
->>> model = TFAutoModelForTokenClassification.from_pretrained("distilbert-base-uncased", num_labels=2)
+>>> model = TFAutoModelForTokenClassification.from_pretrained(
+... "distilbert-base-uncased", num_labels=13, id2label=id2label, label2id=label2id
+... )
+```
+
+Convert your datasets to the `tf.data.Dataset` format with [`~transformers.TFPreTrainedModel.prepare_tf_dataset`]:
+
+```py
+>>> tf_train_set = model.prepare_tf_dataset(
+... tokenized_wnut["train"],
+... shuffle=True,
+... batch_size=16,
+... collate_fn=data_collator,
+... )
+
+>>> tf_validation_set = model.prepare_tf_dataset(
+... tokenized_wnut["validation"],
+... shuffle=False,
+... batch_size=16,
+... collate_fn=data_collator,
+... )
```
Configure the model for training with [`compile`](https://keras.io/api/models/model_training_apis/#compile-method):
@@ -255,18 +363,190 @@ Configure the model for training with [`compile`](https://keras.io/api/models/mo
>>> model.compile(optimizer=optimizer)
```
-Call [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) to fine-tune the model:
+The last two things to set up before you start training are to compute the seqeval scores from the predictions, and to provide a way to push your model to the Hub. Both are done with [Keras callbacks](./main_classes/keras_callbacks).
+
+Pass your `compute_metrics` function to [`~transformers.KerasMetricCallback`]:
+
+```py
+>>> from transformers.keras_callbacks import KerasMetricCallback
+
+>>> metric_callback = KerasMetricCallback(metric_fn=compute_metrics, eval_dataset=tf_validation_set)
+```
+
+Specify where to push your model and tokenizer in the [`~transformers.PushToHubCallback`]:
```py
->>> model.fit(x=tf_train_set, validation_data=tf_validation_set, epochs=3)
+>>> from transformers.keras_callbacks import PushToHubCallback
+
+>>> push_to_hub_callback = PushToHubCallback(
+... output_dir="my_awesome_wnut_model",
+... tokenizer=tokenizer,
+... )
```
+
+Then bundle your callbacks together:
+
+```py
+>>> callbacks = [metric_callback, push_to_hub_callback]
+```
+
+Finally, you're ready to start training your model! Call [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) with your training and validation datasets, the number of epochs, and your callbacks to finetune the model:
+
+```py
+>>> model.fit(x=tf_train_set, validation_data=tf_validation_set, epochs=3, callbacks=callbacks)
+```
+
+Once training is completed, your model is automatically uploaded to the Hub so everyone can use it!
-For a more in-depth example of how to fine-tune a model for token classification, take a look at the corresponding
+For a more in-depth example of how to finetune a model for token classification, take a look at the corresponding
[PyTorch notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/token_classification.ipynb)
or [TensorFlow notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/token_classification-tf.ipynb).
-
\ No newline at end of file
+
+
+## Inference
+
+Great, now that you've finetuned a model, you can use it for inference!
+
+Grab some text you'd like to run inference on:
+
+```py
+>>> text = "The Golden State Warriors are an American professional basketball team based in San Francisco."
+```
+
+The simplest way to try out your finetuned model for inference is to use it in a [`pipeline`]. Instantiate a `pipeline` for NER with your model, and pass your text to it:
+
+```py
+>>> from transformers import pipeline
+
+>>> classifier = pipeline("ner", model="stevhliu/my_awesome_wnut_model")
+>>> classifier(text)
+[{'entity': 'B-location',
+ 'score': 0.42658573,
+ 'index': 2,
+ 'word': 'golden',
+ 'start': 4,
+ 'end': 10},
+ {'entity': 'I-location',
+ 'score': 0.35856336,
+ 'index': 3,
+ 'word': 'state',
+ 'start': 11,
+ 'end': 16},
+ {'entity': 'B-group',
+ 'score': 0.3064001,
+ 'index': 4,
+ 'word': 'warriors',
+ 'start': 17,
+ 'end': 25},
+ {'entity': 'B-location',
+ 'score': 0.65523505,
+ 'index': 13,
+ 'word': 'san',
+ 'start': 80,
+ 'end': 83},
+ {'entity': 'B-location',
+ 'score': 0.4668663,
+ 'index': 14,
+ 'word': 'francisco',
+ 'start': 84,
+ 'end': 93}]
+```
+
+You can also manually replicate the results of the `pipeline` if you'd like:
+
+
+
+Tokenize the text and return PyTorch tensors:
+
+```py
+>>> from transformers import AutoTokenizer
+
+>>> tokenizer = AutoTokenizer.from_pretrained("stevhliu/my_awesome_wnut_model")
+>>> inputs = tokenizer(text, return_tensors="pt")
+```
+
+Pass your inputs to the model and return the `logits`:
+
+```py
+>>> import torch
+>>> from transformers import AutoModelForTokenClassification
+
+>>> model = AutoModelForTokenClassification.from_pretrained("stevhliu/my_awesome_wnut_model")
+>>> with torch.no_grad():
+... logits = model(**inputs).logits
+```
+
+Get the class with the highest probability, and use the model's `id2label` mapping to convert it to a text label:
+
+```py
+>>> predictions = torch.argmax(logits, dim=2)
+>>> predicted_token_class = [model.config.id2label[t.item()] for t in predictions[0]]
+>>> predicted_token_class
+['O',
+ 'O',
+ 'B-location',
+ 'I-location',
+ 'B-group',
+ 'O',
+ 'O',
+ 'O',
+ 'O',
+ 'O',
+ 'O',
+ 'O',
+ 'O',
+ 'B-location',
+ 'B-location',
+ 'O',
+ 'O']
+```
+
+
+Tokenize the text and return TensorFlow tensors:
+
+```py
+>>> from transformers import AutoTokenizer
+
+>>> tokenizer = AutoTokenizer.from_pretrained("stevhliu/my_awesome_wnut_model")
+>>> inputs = tokenizer(text, return_tensors="tf")
+```
+
+Pass your inputs to the model and return the `logits`:
+
+```py
+>>> from transformers import TFAutoModelForTokenClassification
+
+>>> model = TFAutoModelForTokenClassification.from_pretrained("stevhliu/my_awesome_wnut_model")
+>>> logits = model(**inputs).logits
+```
+
+Get the class with the highest probability, and use the model's `id2label` mapping to convert it to a text label:
+
+```py
+>>> import tensorflow as tf
+
+>>> predicted_token_class_ids = tf.math.argmax(logits, axis=-1)
+>>> predicted_token_class = [model.config.id2label[t] for t in predicted_token_class_ids[0].numpy().tolist()]
+>>> predicted_token_class
+['O',
+ 'O',
+ 'B-location',
+ 'I-location',
+ 'B-group',
+ 'O',
+ 'O',
+ 'O',
+ 'O',
+ 'O',
+ 'O',
+ 'O',
+ 'O',
+ 'B-location',
+ 'B-location',
+ 'O',
+ 'O']
+```
+
+
\ No newline at end of file
diff --git a/docs/source/en/tasks/translation.mdx b/docs/source/en/tasks/translation.mdx
index 7439bc7b61c4..318cb2b1a9d2 100644
--- a/docs/source/en/tasks/translation.mdx
+++ b/docs/source/en/tasks/translation.mdx
@@ -14,9 +14,12 @@ specific language governing permissions and limitations under the License.
-Translation converts a sequence of text from one language to another. It is one of several tasks you can formulate as a sequence-to-sequence problem, a powerful framework that extends to vision and audio tasks.
+Translation converts a sequence of text from one language to another. It is one of several tasks you can formulate as a sequence-to-sequence problem, a powerful framework for returning some output from an input, like translation or summarization. Translation systems are commonly used for translation between texts in different languages, but they can also be used for speech or some combination in between, like text-to-speech or speech-to-text.
-This guide will show you how to fine-tune [T5](https://huggingface.co/t5-small) on the English-French subset of the [OPUS Books](https://huggingface.co/datasets/opus_books) dataset to translate English text to French.
+This guide will show you how to:
+
+1. Finetune [T5](https://huggingface.co/t5-small) on the English-French subset of the [OPUS Books](https://huggingface.co/datasets/opus_books) dataset to translate English text to French.
+2. Use your finetuned model for inference.
@@ -24,9 +27,23 @@ See the translation [task page](https://huggingface.co/tasks/translation) for mo
+Before you begin, make sure you have all the necessary libraries installed:
+
+```bash
+pip install transformers datasets evaluate
+```
+
+We encourage you to login to your Hugging Face account so you can upload and share your model with the community. When prompted, enter your token to login:
+
+```py
+>>> from huggingface_hub import notebook_login
+
+>>> notebook_login()
+```
+
## Load OPUS Books dataset
-Load the OPUS Books dataset from the 🤗 Datasets library:
+Start by loading the English-French subset of the [OPUS Books](https://huggingface.co/datasets/opus_books) dataset from the 🤗 Datasets library:
```py
>>> from datasets import load_dataset
@@ -34,10 +51,10 @@ Load the OPUS Books dataset from the 🤗 Datasets library:
>>> books = load_dataset("opus_books", "en-fr")
```
-Split this dataset into a train and test set:
+Split the dataset into a train and test set with the [`~datasets.Dataset.train_test_split`] method:
```py
-books = books["train"].train_test_split(test_size=0.2)
+>>> books = books["train"].train_test_split(test_size=0.2)
```
Then take a look at an example:
@@ -49,13 +66,13 @@ Then take a look at an example:
'fr': 'Mais ce plateau élevé ne mesurait que quelques toises, et bientôt nous fûmes rentrés dans notre élément.'}}
```
-The `translation` field is a dictionary containing the English and French translations of the text.
+`translation`: an English and French translation of the text.
## Preprocess
-Load the T5 tokenizer to process the language pairs:
+The next step is to load a T5 tokenizer to process the English-French language pairs:
```py
>>> from transformers import AutoTokenizer
@@ -63,10 +80,10 @@ Load the T5 tokenizer to process the language pairs:
>>> tokenizer = AutoTokenizer.from_pretrained("t5-small")
```
-The preprocessing function needs to:
+The preprocessing function you want to create needs to:
1. Prefix the input with a prompt so T5 knows this is a translation task. Some models capable of multiple NLP tasks require prompting for specific tasks.
-2. Tokenize the input (English) and target (French) separately. You can't tokenize French text with a tokenizer pretrained on an English vocabulary. A context manager will help set the tokenizer to French first before tokenizing it.
+2. Tokenize the input (English) and target (French) separately because you can't tokenize French text with a tokenizer pretrained on an English vocabulary.
3. Truncate sequences to be no longer than the maximum length set by the `max_length` parameter.
```py
@@ -82,84 +99,113 @@ The preprocessing function needs to:
... return model_inputs
```
-Use 🤗 Datasets [`~datasets.Dataset.map`] function to apply the preprocessing function over the entire dataset. You can speed up the `map` function by setting `batched=True` to process multiple elements of the dataset at once:
+To apply the preprocessing function over the entire dataset, use 🤗 Datasets [`~datasets.Dataset.map`] method. You can speed up the `map` function by setting `batched=True` to process multiple elements of the dataset at once:
```py
>>> tokenized_books = books.map(preprocess_function, batched=True)
```
+Now create a batch of examples using [`DataCollatorForSeq2Seq`]. It's more efficient to *dynamically pad* the sentences to the longest length in a batch during collation, instead of padding the whole dataset to the maximum length.
+
-Load T5 with [`AutoModelForSeq2SeqLM`]:
-
```py
->>> from transformers import AutoModelForSeq2SeqLM
+>>> from transformers import DataCollatorForSeq2Seq
->>> model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
+>>> data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model="t5-small")
```
-Load T5 with [`TFAutoModelForSeq2SeqLM`]:
```py
->>> from transformers import TFAutoModelForSeq2SeqLM
+>>> from transformers import DataCollatorForSeq2Seq
->>> model = TFAutoModelForSeq2SeqLM.from_pretrained("t5-small")
+>>> data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model="t5-small", return_tensors="tf")
```
-Use [`DataCollatorForSeq2Seq`] to create a batch of examples. It will also *dynamically pad* your text and labels to the length of the longest element in its batch, so they are a uniform length. While it is possible to pad your text in the `tokenizer` function by setting `padding=True`, dynamic padding is more efficient.
+## Evaluate
-
-
+Including a metric during training is often helpful for evaluating your model's performance. You can quickly load an evaluation method with the 🤗 [Evaluate](https://huggingface.co/docs/evaluate/index) library. For this task, load the [SacreBLEU](https://huggingface.co/spaces/evaluate-metric/sacrebleu) metric (see the 🤗 Evaluate [quick tour](https://huggingface.co/docs/evaluate/a_quick_tour) to learn more about how to load and compute a metric):
```py
->>> from transformers import DataCollatorForSeq2Seq
+>>> import evaluate
->>> data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model=model)
+>>> sacrebleu = evaluate.load("sacrebleu")
```
-
-
+
+Then create a function that passes your predictions and labels to [`~evaluate.EvaluationModule.compute`] to calculate the SacreBLEU score:
```py
->>> from transformers import DataCollatorForSeq2Seq
+>>> import numpy as np
->>> data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model=model, return_tensors="tf")
+
+>>> def postprocess_text(preds, labels):
+... preds = [pred.strip() for pred in preds]
+... labels = [[label.strip()] for label in labels]
+
+... return preds, labels
+
+
+>>> def compute_metrics(eval_preds):
+... preds, labels = eval_preds
+... if isinstance(preds, tuple):
+... preds = preds[0]
+... decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
+
+... labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
+... decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
+
+... decoded_preds, decoded_labels = postprocess_text(decoded_preds, decoded_labels)
+
+...     result = sacrebleu.compute(predictions=decoded_preds, references=decoded_labels)
+... result = {"bleu": result["score"]}
+
+... prediction_lens = [np.count_nonzero(pred != tokenizer.pad_token_id) for pred in preds]
+... result["gen_len"] = np.mean(prediction_lens)
+... result = {k: round(v, 4) for k, v in result.items()}
+... return result
```
-
-
+
+Your `compute_metrics` function is ready to go now, and you'll return to it when you set up your training.
## Train
-
-If you aren't familiar with fine-tuning a model with the [`Trainer`], take a look at the basic tutorial [here](../training#finetune-with-trainer)!
+If you aren't familiar with finetuning a model with the [`Trainer`], take a look at the basic tutorial [here](../training#train-with-pytorch-trainer)!
+You're ready to start training your model now! Load T5 with [`AutoModelForSeq2SeqLM`]:
+
+```py
+>>> from transformers import AutoModelForSeq2SeqLM, Seq2SeqTrainingArguments, Seq2SeqTrainer
+
+>>> model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
+```
At this point, only three steps remain:
-1. Define your training hyperparameters in [`Seq2SeqTrainingArguments`].
-2. Pass the training arguments to [`Seq2SeqTrainer`] along with the model, dataset, tokenizer, and data collator.
-3. Call [`~Trainer.train`] to fine-tune your model.
+1. Define your training hyperparameters in [`Seq2SeqTrainingArguments`]. The only required parameter is `output_dir`, which specifies where to save your model. You'll push this model to the Hub by setting `push_to_hub=True` (you need to be signed in to Hugging Face to upload your model). At the end of each epoch, the [`Trainer`] will evaluate the SacreBLEU metric and save the training checkpoint.
+2. Pass the training arguments to [`Seq2SeqTrainer`] along with the model, dataset, tokenizer, data collator, and `compute_metrics` function.
+3. Call [`~Trainer.train`] to finetune your model.
```py
->>> from transformers import Seq2SeqTrainingArguments, Seq2SeqTrainer
-
>>> training_args = Seq2SeqTrainingArguments(
-... output_dir="./results",
+... output_dir="my_awesome_opus_books_model",
... evaluation_strategy="epoch",
... learning_rate=2e-5,
... per_device_train_batch_size=16,
... per_device_eval_batch_size=16,
... weight_decay=0.01,
... save_total_limit=3,
-... num_train_epochs=1,
+... num_train_epochs=2,
+... predict_with_generate=True,
... fp16=True,
+... push_to_hub=True,
... )
>>> trainer = Seq2SeqTrainer(
@@ -169,13 +215,41 @@ At this point, only three steps remain:
... eval_dataset=tokenized_books["test"],
... tokenizer=tokenizer,
... data_collator=data_collator,
+... compute_metrics=compute_metrics,
... )
>>> trainer.train()
+```
+
+Once training is completed, share your model to the Hub with the [`~transformers.Trainer.push_to_hub`] method so everyone can use your model:
+
+```py
+>>> trainer.push_to_hub()
```
-To fine-tune a model in TensorFlow, start by converting your datasets to the `tf.data.Dataset` format with [`~TFPreTrainedModel.prepare_tf_dataset`].
+
+
+If you aren't familiar with finetuning a model with Keras, take a look at the basic tutorial [here](../training#train-a-tensorflow-model-with-keras)!
+
+
+To finetune a model in TensorFlow, start by setting up an optimizer function, learning rate schedule, and some training hyperparameters:
+
+```py
+>>> from transformers import AdamWeightDecay
+
+>>> optimizer = AdamWeightDecay(learning_rate=2e-5, weight_decay_rate=0.01)
+```
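+
+If you'd rather use a learning rate schedule that decays over training instead of a fixed rate, you could build one with [`~transformers.create_optimizer`]. This is just a sketch; the epoch count and batch size below are assumptions you should match to your own training setup:
+
+```py
+>>> from transformers import create_optimizer
+
+>>> # assumed values: 3 epochs with a batch size of 16
+>>> num_train_steps = (len(tokenized_books["train"]) // 16) * 3
+>>> optimizer, lr_schedule = create_optimizer(
+...     init_lr=2e-5, num_warmup_steps=0, num_train_steps=num_train_steps, weight_decay_rate=0.01
+... )
+```
+
+If you go this route, pass the returned `optimizer` to `compile` exactly as shown below.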
+
+Then you can load T5 with [`TFAutoModelForSeq2SeqLM`]:
+
+```py
+>>> from transformers import TFAutoModelForSeq2SeqLM
+
+>>> model = TFAutoModelForSeq2SeqLM.from_pretrained("t5-small")
+```
+
+Convert your datasets to the `tf.data.Dataset` format with [`~transformers.TFPreTrainedModel.prepare_tf_dataset`]:
```py
>>> tf_train_set = model.prepare_tf_dataset(
@@ -193,38 +267,132 @@ To fine-tune a model in TensorFlow, start by converting your datasets to the `tf
... )
```
-
+Configure the model for training with [`compile`](https://keras.io/api/models/model_training_apis/#compile-method):
-If you aren't familiar with fine-tuning a model with Keras, take a look at the basic tutorial [here](training#finetune-with-keras)!
+```py
+>>> import tensorflow as tf
-
+>>> model.compile(optimizer=optimizer)
+```
+
+The last two things to set up before you start training are computing the SacreBLEU metric from the predictions, and providing a way to push your model to the Hub. Both are done with [Keras callbacks](./main_classes/keras_callbacks).
-Set up an optimizer function, learning rate schedule, and some training hyperparameters:
+Pass your `compute_metrics` function to [`~transformers.KerasMetricCallback`]:
```py
->>> from transformers import create_optimizer, AdamWeightDecay
+>>> from transformers.keras_callbacks import KerasMetricCallback
->>> optimizer = AdamWeightDecay(learning_rate=2e-5, weight_decay_rate=0.01)
+>>> metric_callback = KerasMetricCallback(metric_fn=compute_metrics, eval_dataset=tf_validation_set)
```
-Configure the model for training with [`compile`](https://keras.io/api/models/model_training_apis/#compile-method):
+Specify where to push your model and tokenizer in the [`~transformers.PushToHubCallback`]:
```py
->>> model.compile(optimizer=optimizer)
+>>> from transformers.keras_callbacks import PushToHubCallback
+
+>>> push_to_hub_callback = PushToHubCallback(
+... output_dir="my_awesome_opus_books_model",
+... tokenizer=tokenizer,
+... )
+```
+
+Then bundle your callbacks together:
+
+```py
+>>> callbacks = [metric_callback, push_to_hub_callback]
```
-Call [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) to fine-tune the model:
+Finally, you're ready to start training your model! Call [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) with your training and validation datasets, the number of epochs, and your callbacks to finetune the model:
```py
->>> model.fit(tf_train_set, validation_data=tf_test_set, epochs=3)
+>>> model.fit(x=tf_train_set, validation_data=tf_test_set, epochs=3, callbacks=callbacks)
```
+
+Once training is completed, your model is automatically uploaded to the Hub so everyone can use it!
-For a more in-depth example of how to fine-tune a model for translation, take a look at the corresponding
+For a more in-depth example of how to finetune a model for translation, take a look at the corresponding
[PyTorch notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/translation.ipynb)
or [TensorFlow notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/translation-tf.ipynb).
+
+## Inference
+
+Great, now that you've finetuned a model, you can use it for inference!
+
+Come up with some text you'd like to translate to another language. For T5, you need to prefix your input depending on the task you're working on. For translation from English to French, you should prefix your input as shown below:
+
+```py
+>>> text = "translate English to French: Legumes share resources with nitrogen-fixing bacteria."
+```
+
+The simplest way to try out your finetuned model for inference is to use it in a [`pipeline`]. Instantiate a `pipeline` for translation with your model, and pass your text to it:
+
+```py
+>>> from transformers import pipeline
+
+>>> translator = pipeline("translation", model="my_awesome_opus_books_model")
+>>> translator(text)
+[{'translation_text': 'Legumes partagent des ressources avec des bactéries azotantes.'}]
+```
+
+You can also manually replicate the results of the `pipeline` if you'd like:
+
+
+
+Tokenize the text and return the `input_ids` as PyTorch tensors:
+
+```py
+>>> from transformers import AutoTokenizer
+
+>>> tokenizer = AutoTokenizer.from_pretrained("my_awesome_opus_books_model")
+>>> inputs = tokenizer(text, return_tensors="pt").input_ids
+```
+
+Use the [`~transformers.generation_utils.GenerationMixin.generate`] method to create the translation. For more details about the different text generation strategies and parameters for controlling generation, check out the [Text Generation](./main_classes/text_generation) API.
+
+```py
+>>> from transformers import AutoModelForSeq2SeqLM
+
+>>> model = AutoModelForSeq2SeqLM.from_pretrained("my_awesome_opus_books_model")
+>>> outputs = model.generate(inputs, max_new_tokens=40, do_sample=True, top_k=30, top_p=0.95)
+```
+
+Decode the generated token ids back into text:
+
+```py
+>>> tokenizer.decode(outputs[0], skip_special_tokens=True)
+"Les lignées partagent des ressources avec des bactéries enfixant l'azote."
+```
+
+
+Tokenize the text and return the `input_ids` as TensorFlow tensors:
+
+```py
+>>> from transformers import AutoTokenizer
+
+>>> tokenizer = AutoTokenizer.from_pretrained("my_awesome_opus_books_model")
+>>> inputs = tokenizer(text, return_tensors="tf").input_ids
+```
+
+Use the [`~transformers.generation_tf_utils.TFGenerationMixin.generate`] method to create the translation. For more details about the different text generation strategies and parameters for controlling generation, check out the [Text Generation](./main_classes/text_generation) API.
+
+```py
+>>> from transformers import TFAutoModelForSeq2SeqLM
+
+>>> model = TFAutoModelForSeq2SeqLM.from_pretrained("my_awesome_opus_books_model")
+>>> outputs = model.generate(inputs, max_new_tokens=40, do_sample=True, top_k=30, top_p=0.95)
+```
+
+Decode the generated token ids back into text:
+
+```py
+>>> tokenizer.decode(outputs[0], skip_special_tokens=True)
+"Les lugumes partagent les ressources avec des bactéries fixatrices d'azote."
+```
+
+
\ No newline at end of file