Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
The table of contents is too big for display.
Diff view
Diff view
  •  
  •  
  •  
1 change: 0 additions & 1 deletion docs/source/en/model_doc/biogpt.md
Original file line number Diff line number Diff line change
Expand Up @@ -121,7 +121,6 @@ print(output)

- Pad inputs on the right because BioGPT uses absolute position embeddings.
- BioGPT can reuse previously computed key-value attention pairs. Access this feature with the [past_key_values](https://huggingface.co/docs/transformers/main/en/model_doc/biogpt#transformers.BioGptModel.forward.past_key_values) parameter in [`BioGPTModel.forward`].
- The `head_mask` argument is ignored when using an attention implementation other than "eager". If you want to use `head_mask`, make sure `attn_implementation="eager"`).

```py
from transformers import AutoModelForCausalLM
Expand Down
1 change: 0 additions & 1 deletion docs/source/en/model_doc/data2vec.md
Original file line number Diff line number Diff line change
Expand Up @@ -53,7 +53,6 @@ The original code for vision can be found [here](https://github.com/facebookrese
- For Data2VecAudio, preprocessing is identical to [`Wav2Vec2Model`], including feature extraction
- For Data2VecText, preprocessing is identical to [`RobertaModel`], including tokenization.
- For Data2VecVision, preprocessing is identical to [`BeitModel`], including feature extraction.
- The `head_mask` argument is ignored when using all attention implementation other than "eager". If you have a `head_mask` and want it to have effect, load the model with `XXXModel.from_pretrained(model_id, attn_implementation="eager")`

### Using Scaled Dot Product Attention (SDPA)

Expand Down
3 changes: 0 additions & 3 deletions docs/source/en/model_doc/gpt_bigcode.md
Original file line number Diff line number Diff line change
Expand Up @@ -49,9 +49,6 @@ The main differences compared to GPT2.

You can read more about the optimizations in the [original pull request](https://github.com/huggingface/transformers/pull/22575)

> [!NOTE]
> The `head_mask` argument is ignored when using all attention implementation other than "eager". If you have a `head_mask` and want it to have effect, load the model with `XXXModel.from_pretrained(model_id, attn_implementation="eager")`

## Combining Starcoder and Flash Attention 2

First, make sure to install the latest version of Flash Attention 2 to include the sliding window attention feature.
Expand Down
5 changes: 0 additions & 5 deletions docs/source/en/model_doc/hubert.md
Original file line number Diff line number Diff line change
Expand Up @@ -114,11 +114,6 @@ print(transcription[0])
## Notes

- HuBERT models expect raw audio input as a 1D float array sampled at 16kHz.
- If you want to use a `head_mask`, use the model with `attn_implementation="eager"`.

```python
model = HubertModel.from_pretrained("facebook/hubert-base-ls960", attn_implementation="eager")
```

## HubertConfig

Expand Down
3 changes: 0 additions & 3 deletions docs/source/en/model_doc/m2m_100.md
Original file line number Diff line number Diff line change
Expand Up @@ -51,9 +51,6 @@ multilingual it expects the sequences in a certain format: A special language id
source and target text. The source text format is `[lang_code] X [eos]`, where `lang_code` is source language
id for source text and target language id for target text, with `X` being the source or target text.

> [!NOTE]
> The `head_mask` argument is ignored when using all attention implementation other than "eager". If you have a `head_mask` and want it to have effect, load the model with `XXXModel.from_pretrained(model_id, attn_implementation="eager")`

The [`M2M100Tokenizer`] depends on `sentencepiece` so be sure to install it before running the
examples. To install `sentencepiece` run `pip install sentencepiece`.

Expand Down
3 changes: 0 additions & 3 deletions docs/source/en/model_doc/mbart.md
Original file line number Diff line number Diff line change
Expand Up @@ -34,9 +34,6 @@ You can find all the original mBART checkpoints under the [AI at Meta](https://h
> [!TIP]
> Click on the mBART models in the right sidebar for more examples of applying mBART to different language tasks.

> [!NOTE]
> The `head_mask` argument is ignored when using all attention implementation other than "eager". If you have a `head_mask` and want it to have effect, load the model with `XXXModel.from_pretrained(model_id, attn_implementation="eager")`

The example below demonstrates how to translate text with [`Pipeline`] or the [`AutoModel`] class.

<hfoptions id="usage">
Expand Down
3 changes: 0 additions & 3 deletions docs/source/en/model_doc/musicgen.md
Original file line number Diff line number Diff line change
Expand Up @@ -63,9 +63,6 @@ python src/transformers/models/musicgen/convert_musicgen_transformers.py \
--checkpoint small --pytorch_dump_folder /output/path --safe_serialization
```

> [!NOTE]
> The `head_mask` argument is ignored when using all attention implementation other than "eager". If you have a `head_mask` and want it to have effect, load the model with `XXXModel.from_pretrained(model_id, attn_implementation="eager")`

## Generation

MusicGen is compatible with two generation modes: greedy and sampling. In practice, sampling leads to significantly
Expand Down
3 changes: 0 additions & 3 deletions docs/source/en/model_doc/musicgen_melody.md
Original file line number Diff line number Diff line change
Expand Up @@ -43,9 +43,6 @@ There are two key differences with MusicGen:
1. The audio prompt is used here as a conditional signal for the generated audio sample, whereas it's used for audio continuation in [MusicGen](https://huggingface.co/docs/transformers/main/en/model_doc/musicgen).
2. Conditional text and audio signals are concatenated to the decoder's hidden states instead of being used as a cross-attention signal, as in MusicGen.

> [!NOTE]
> The `head_mask` argument is ignored when using all attention implementation other than "eager". If you have a `head_mask` and want it to have effect, load the model with `XXXModel.from_pretrained(model_id, attn_implementation="eager")`

## Generation

MusicGen Melody is compatible with two generation modes: greedy and sampling. In practice, sampling leads to significantly better results than greedy, thus we encourage sampling mode to be used where possible. Sampling is enabled by default, and can be explicitly specified by setting `do_sample=True` in the call to [`MusicgenMelodyForConditionalGeneration.generate`], or by overriding the model's generation config (see below).
Expand Down
2 changes: 0 additions & 2 deletions docs/source/en/model_doc/opt.md
Original file line number Diff line number Diff line change
Expand Up @@ -101,8 +101,6 @@ tokenizer.batch_decode(generated_ids)[0]

- OPT adds an `EOS` token `</s>` to the beginning of every prompt.

- The `head_mask` argument is ignored if the attention implementation isn't `"eager"`. Set `attn_implementation="eager"` to enable the `head_mask`.

## Resources

- Refer to this [notebook](https://colab.research.google.com/drive/1jCkpikz0J2o20FBQmYmAGdiKmJGOMo-o?usp=sharing) for an example of fine-tuning OPT with PEFT, bitsandbytes, and Transformers.
Expand Down
3 changes: 0 additions & 3 deletions docs/source/en/model_doc/qwen2_audio.md
Original file line number Diff line number Diff line change
Expand Up @@ -40,9 +40,6 @@ The abstract from the paper is the following:

`Qwen2-Audio-7B` and `Qwen2-Audio-7B-Instruct` can be found on the [Huggingface Hub](https://huggingface.co/Qwen)

> [!NOTE]
> The `head_mask` argument is ignored when using all attention implementation other than "eager". If you have a `head_mask` and want it to have effect, load the model with `XXXModel.from_pretrained(model_id, attn_implementation="eager")`

### Inference

```python
Expand Down
3 changes: 0 additions & 3 deletions docs/source/en/model_doc/sew.md
Original file line number Diff line number Diff line change
Expand Up @@ -47,9 +47,6 @@ This model was contributed by [anton-l](https://huggingface.co/anton-l).
- SEWForCTC is fine-tuned using connectionist temporal classification (CTC) so the model output has to be decoded using
[`Wav2Vec2CTCTokenizer`].

> [!NOTE]
> The `head_mask` argument is ignored when using all attention implementation other than "eager". If you have a `head_mask` and want it to have effect, load the model with `XXXModel.from_pretrained(model_id, attn_implementation="eager")`

## Resources

- [Audio classification task guide](../tasks/audio_classification)
Expand Down
2 changes: 0 additions & 2 deletions docs/source/en/model_doc/unispeech-sat.md
Original file line number Diff line number Diff line change
Expand Up @@ -55,8 +55,6 @@ found [here](https://github.com/microsoft/UniSpeech/tree/main/UniSpeech-SAT).
decoded using [`Wav2Vec2CTCTokenizer`].
- UniSpeechSat performs especially well on speaker verification, speaker identification, and speaker diarization tasks.

> [!NOTE]
> The `head_mask` argument is ignored when using all attention implementation other than "eager". If you have a `head_mask` and want it to have effect, load the model with `XXXModel.from_pretrained(model_id, attn_implementation="eager")`

## Resources

Expand Down
2 changes: 0 additions & 2 deletions docs/source/en/model_doc/unispeech.md
Original file line number Diff line number Diff line change
Expand Up @@ -50,8 +50,6 @@ found [here](https://github.com/microsoft/UniSpeech/tree/main/UniSpeech).
- UniSpeech model can be fine-tuned using connectionist temporal classification (CTC) so the model output has to be
decoded using [`Wav2Vec2CTCTokenizer`].

> [!NOTE]
> The `head_mask` argument is ignored when using all attention implementation other than "eager". If you have a `head_mask` and want it to have effect, load the model with `XXXModel.from_pretrained(model_id, attn_implementation="eager")`

## Resources

Expand Down
2 changes: 0 additions & 2 deletions docs/source/en/model_doc/wav2vec2.md
Original file line number Diff line number Diff line change
Expand Up @@ -48,8 +48,6 @@ Note: Meta (FAIR) released a new version of [Wav2Vec2-BERT 2.0](https://huggingf
- Wav2Vec2 model was trained using connectionist temporal classification (CTC) so the model output has to be decoded
using [`Wav2Vec2CTCTokenizer`].

> [!NOTE]
> The `head_mask` argument is ignored when using all attention implementation other than "eager". If you have a `head_mask` and want it to have effect, load the model with `XXXModel.from_pretrained(model_id, attn_implementation="eager")`

## Using Flash Attention 2

Expand Down
2 changes: 0 additions & 2 deletions docs/source/en/model_doc/whisper.md
Original file line number Diff line number Diff line change
Expand Up @@ -29,8 +29,6 @@ rendered properly in your Markdown viewer.

You can find all the original Whisper checkpoints under the [Whisper](https://huggingface.co/collections/openai/whisper-release-6501bba2cf999715fd953013) collection.

> [!NOTE]
> The `head_mask` argument is ignored when using all attention implementation other than "eager". If you have a `head_mask` and want it to have effect, load the model with `XXXModel.from_pretrained(model_id, attn_implementation="eager")`

> [!TIP]
> Click on the Whisper models in the right sidebar for more examples of how to apply Whisper to different audio tasks.
Expand Down
2 changes: 1 addition & 1 deletion docs/source/en/perf_infer_gpu_one.md
Original file line number Diff line number Diff line change
Expand Up @@ -175,7 +175,7 @@ There are three supported implementations available.
- [xFormers](https://github.com/facebookresearch/xformers) or Memory-Efficient Attention is able to support models with the fp32 torch type.
- C++ implementation of scaled dot product attention

SDPA is used by default for PyTorch v2.1.1. and greater when an implementation is available. You could explicitly enable SDPA by setting `attn_implementation="sdpa"` in [`~PreTrainedModel.from_pretrained`] though. Certain attention parameters, such as `head_mask` and `output_attentions=True`, are unsupported and returns a warning that Transformers will fall back to the (slower) eager implementation.
SDPA is used by default for PyTorch v2.1.1. and greater when an implementation is available. You could explicitly enable SDPA by setting `attn_implementation="sdpa"` in [`~PreTrainedModel.from_pretrained`] though. Certain attention parameters, such as `output_attentions=True`, are unsupported and returns a warning that Transformers will fall back to the (slower) eager implementation.

Refer to the [AttentionInterface](./attention_interface) guide to learn how to change the attention implementation after loading a model.

Expand Down
Loading