diff --git a/docs/source/en/model_doc/audio-spectrogram-transformer.md b/docs/source/en/model_doc/audio-spectrogram-transformer.md
index bced0a4b2bcc..33233816a14b 100644
--- a/docs/source/en/model_doc/audio-spectrogram-transformer.md
+++ b/docs/source/en/model_doc/audio-spectrogram-transformer.md
@@ -17,10 +17,12 @@ rendered properly in your Markdown viewer.
# Audio Spectrogram Transformer
-
-

-

-

+
## Overview
@@ -41,6 +43,61 @@ alt="drawing" width="600"/>
This model was contributed by [nielsr](https://huggingface.co/nielsr).
The original code can be found [here](https://github.com/YuanGongND/ast).
+You can find all the original AST checkpoints under the [MIT](https://huggingface.co/MIT?search_models=ast) organization.
+
+> [!TIP]
+> Click on the AST models in the right sidebar for more examples of how to apply AST to different audio classification tasks.
+
+The example below demonstrates how to classify audio with [`Pipeline`] or the [`AutoModel`] class.
+
+
+
+
+```py
+import torch
+from transformers import pipeline
+
+pipeline = pipeline(
+ task="audio-classification",
+ model="MIT/ast-finetuned-audioset-10-10-0.4593",
+ dtype=torch.float16,
+ device=0
+)
+pipeline("path/to/your/audio.wav")
+```
+
+
+
+
+```py
+import torch
+import librosa
+from transformers import AutoFeatureExtractor, AutoModelForAudioClassification
+
+feature_extractor = AutoFeatureExtractor.from_pretrained("MIT/ast-finetuned-audioset-10-10-0.4593")
+model = AutoModelForAudioClassification.from_pretrained(
+ "MIT/ast-finetuned-audioset-10-10-0.4593",
+ dtype=torch.float16,
+ device_map="auto",
+ attn_implementation="sdpa"
+)
+
+# Load and preprocess audio
+audio, sr = librosa.load("path/to/your/audio.wav", sr=16000)
+inputs = feature_extractor(audio, sampling_rate=16000, return_tensors="pt").to(model.device, model.dtype)
+
+with torch.no_grad():
+ logits = model(**inputs).logits
+predicted_class_id = logits.argmax(dim=-1).item()
+
+class_labels = model.config.id2label
+predicted_class_label = class_labels[predicted_class_id]
+print(f"Predicted class: {predicted_class_label}")
+```
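+
+The example above reports only the single highest-scoring class. As a small optional extension, the sketch below reuses the `logits` and `model` variables from the snippet above to rank the top predictions with `torch.topk`.
+
+```py
+# Continues from the example above: rank the top 5 classes instead of only the argmax
+probabilities = torch.softmax(logits.float(), dim=-1)[0]
+top_scores, top_ids = torch.topk(probabilities, k=5)
+for score, class_id in zip(top_scores.tolist(), top_ids.tolist()):
+    print(f"{model.config.id2label[class_id]}: {score:.3f}")
+```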
+
+
+
+
## Usage tips
- When fine-tuning the Audio Spectrogram Transformer (AST) on your own dataset, it's recommended to take care of the input normalization (to make
diff --git a/docs/source/en/model_doc/roberta.md b/docs/source/en/model_doc/roberta.md
index 896156520c5d..b1ca786eef3f 100644
--- a/docs/source/en/model_doc/roberta.md
+++ b/docs/source/en/model_doc/roberta.md
@@ -9,7 +9,7 @@ Unless required by applicable law or agreed to in writing, software distributed
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
+Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
@@ -18,20 +18,22 @@ rendered properly in your Markdown viewer.
# RoBERTa
-[RoBERTa](https://huggingface.co/papers/1907.11692) improves BERT with new pretraining objectives, demonstrating [BERT](./bert) was undertrained and training design is important. The pretraining objectives include dynamic masking, sentence packing, larger batches and a byte-level BPE tokenizer.
+[RoBERTa](https://huggingface.co/papers/1907.11692) is a robustly optimized version of [BERT](./bert). It showed that BERT was significantly undertrained and that careful pretraining design matters: RoBERTa replaces static masking with dynamic masking, drops the next sentence prediction objective, and trains with larger batches on far more data. These changes make RoBERTa a strong choice for tasks such as sentiment analysis, text classification, and other language understanding problems.
-You can find all the original RoBERTa checkpoints under the [Facebook AI](https://huggingface.co/FacebookAI) organization.
+You can find all the original RoBERTa checkpoints under the [FacebookAI](https://huggingface.co/FacebookAI) organization.
> [!TIP]
-> Click on the RoBERTa models in the right sidebar for more examples of how to apply RoBERTa to different language tasks.
+> This model was contributed by [julien-c](https://huggingface.co/julien-c). Click on the RoBERTa models in the right sidebar for more examples of how to apply RoBERTa to different language tasks.
-The example below demonstrates how to predict the `<mask>` token with [`Pipeline`], [`AutoModel`], and from the command line.
+The example below demonstrates how to fill the `<mask>` token with [`Pipeline`], [`AutoModel`], and from the command line.
@@ -46,7 +48,8 @@ pipeline = pipeline(
dtype=torch.float16,
device=0
)
-pipeline("Plants create through a process known as photosynthesis.")
+# returns a list of {'score', 'token', 'token_str', 'sequence'} dicts for the top <mask> predictions
+pipeline("I love using <mask> for NLP tasks!")
```
@@ -54,18 +57,19 @@ pipeline("Plants create <mask> through a process known as photosynthesis.")
```py
import torch
-from transformers import AutoModelForMaskedLM, AutoTokenizer
+from transformers import AutoTokenizer, AutoModelForMaskedLM
-tokenizer = AutoTokenizer.from_pretrained(
- "FacebookAI/roberta-base",
-)
+tokenizer = AutoTokenizer.from_pretrained("FacebookAI/roberta-base")
model = AutoModelForMaskedLM.from_pretrained(
"FacebookAI/roberta-base",
dtype=torch.float16,
device_map="auto",
attn_implementation="sdpa"
)
-inputs = tokenizer("Plants create through a process known as photosynthesis.", return_tensors="pt").to(model.device)
+
+# Predict masked token in a sample sentence
+text = "I love using for NLP tasks!"
+inputs = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
outputs = model(**inputs)
@@ -82,15 +86,27 @@ print(f"The predicted token is: {predicted_token}")
```bash
-echo -e "Plants create through a process known as photosynthesis." | transformers run --task fill-mask --model FacebookAI/roberta-base --device 0
+echo "I love using for NLP tasks!" | transformers run --task fill-mask --model FacebookAI/roberta-base --device 0
```
+## Resources
+
+A list of official Hugging Face and community resources to help you get started with RoBERTa.
+
+- [RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://huggingface.co/papers/1907.11692) - the original paper
+- [Official RoBERTa implementation](https://github.com/pytorch/fairseq/tree/main/examples/roberta) - Facebook AI's original code in fairseq
+- [Text classification task guide](https://huggingface.co/docs/transformers/tasks/sequence_classification) - official guide to fine-tuning encoder models like RoBERTa for classification
+
## Notes
- RoBERTa doesn't have `token_type_ids` so you don't need to indicate which token belongs to which segment. Separate your segments with the separation token `tokenizer.sep_token` or `</s>`.
+- Unlike BERT, RoBERTa uses dynamic masking during training, which means the model sees different masked tokens in each epoch, making it more robust.
+- RoBERTa uses a byte-level BPE tokenizer, which handles out-of-vocabulary words better than BERT's WordPiece tokenizer. Both the tokenizer and dynamic masking are illustrated in the sketch after this list.
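+
+The sketch below is a minimal illustration of these two notes rather than a training recipe: the byte-level BPE tokenizer splits an unseen word into subword pieces instead of mapping it to an unknown token, and [`DataCollatorForLanguageModeling`] applies masking when each batch is built, so repeated passes over the same sentence can mask different positions.
+
+```py
+from transformers import AutoTokenizer, DataCollatorForLanguageModeling
+
+tokenizer = AutoTokenizer.from_pretrained("FacebookAI/roberta-base")
+
+# Byte-level BPE breaks a rare word into subword pieces instead of producing an unknown token
+print(tokenizer.tokenize("photosynthesis"))
+
+# Masking is applied on the fly, so the masked positions can differ each time
+# the same example is collated (dynamic masking)
+collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
+example = tokenizer("I love using RoBERTa for NLP tasks!")["input_ids"]
+for _ in range(2):
+    batch = collator([example])
+    print(tokenizer.decode(batch["input_ids"][0]))
+```
+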
## RobertaConfig