huggingface · regisss · Sep 9, 2025 · Sep 4, 2025
@@ -16,7 +16,7 @@ limitations under the License.
 
 # GaudiTrainer
 
-The [`GaudiTrainer`](https://huggingface.co/docs/optimum/habana/package_reference/trainer#optimum.habana.GaudiTrainer) class provides an extended API for the feature-complete [Transformers Trainer](https://huggingface.co/docs/transformers/main_classes/trainer). It is used in all the [example scripts](https://github.com/huggingface/optimum-habana/tree/main/examples).
+The [`GaudiTrainer`](https://huggingface.co/docs/optimum/habana/package_reference/trainer#optimum.habana.GaudiTrainer) class provides an extended API for the feature-complete [Transformers Trainer](https://huggingface.co/docs/transformers/main_classes/trainer). It is used in all the [example scripts](https://github.com/huggingface/optimum-habana/tree/v1.20-release/examples).
 
 Before instantiating your [`GaudiTrainer`](https://huggingface.co/docs/optimum/habana/package_reference/trainer#optimum.habana.GaudiTrainer), create a [`GaudiTrainingArguments`] object to access all the points of customization during training.
 

@@ -73,9 +73,9 @@ git clone -b v1.19.0 https://github.com/huggingface/optimum-habana
 pip install ./optimum-habana
 ```
 
-All available examples are under [optimum-habana/examples](https://github.com/huggingface/optimum-habana/tree/main/examples).
+All available examples are under [optimum-habana/examples](https://github.com/huggingface/optimum-habana/tree/v1.20-release/examples).
 
-Here is [text-generation](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation) example,
+Here is the [text-generation](https://github.com/huggingface/optimum-habana/tree/v1.20-release/examples/text-generation) example;
 to run Llama-2 7B text generation example on Gaudi, complete the prerequisite setup:
 ```bash
 cd ~/optimum-habana/examples/text-generation
@@ -136,7 +136,7 @@ run_generation.py \
 🤗 Optimum for Intel Gaudi contains a number of examples demonstrating single and multi Gaudi device training/fine-tuning.
 
 For example, a number of language models can be trained with the scripts provided
-[language modeling examples section](https://github.com/huggingface/optimum-habana/tree/main/examples/language-modeling).
+in the [language modeling examples section](https://github.com/huggingface/optimum-habana/tree/v1.20-release/examples/language-modeling).
 
 As an illustration, let us run GPT-2 single and multi card training examples on Gaudi.
 
@@ -240,7 +240,7 @@ outputs = pipeline(
 ```
 
 In addition, sample scripts for fine-tuning diffusion models are given in
-[Stable Diffusion training section](https://github.com/huggingface/optimum-habana/tree/main/examples/stable-diffusion/training).
+the [Stable Diffusion training section](https://github.com/huggingface/optimum-habana/tree/v1.20-release/examples/stable-diffusion/training).
 
 A more comprehensive list of examples in Optimum for Intel Gaudi is given next.
 
@@ -253,37 +253,37 @@ to see more options for running inference.
 Here are examples for various modalities and tasks that can be used out of the box:
 
 - **Text**
-  - [language modeling](https://github.com/huggingface/optimum-habana/tree/main/examples/language-modeling)
-  - [multi node training](https://github.com/huggingface/optimum-habana/tree/main/examples/multi-node-training)
-  - [protein folding](https://github.com/huggingface/optimum-habana/tree/main/examples/protein-folding)
-  - [question answering](https://github.com/huggingface/optimum-habana/tree/main/examples/question-answering)
-  - [sentence transformers training](https://github.com/huggingface/optimum-habana/tree/main/examples/sentence-transformers-training)
-  - [summarization](https://github.com/huggingface/optimum-habana/tree/main/examples/summarization)
-  - [table detection](https://github.com/huggingface/optimum-habana/tree/main/examples/table-detection)
-  - [text classification](https://github.com/huggingface/optimum-habana/tree/main/examples/text-classification)
-  - [text feature extraction](https://github.com/huggingface/optimum-habana/tree/main/examples/text-feature-extraction)
-  - [text generation](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation)
-  - [translation](https://github.com/huggingface/optimum-habana/tree/main/examples/translation)
-  - [trl](https://github.com/huggingface/optimum-habana/tree/main/examples/trl)
+  - [language modeling](https://github.com/huggingface/optimum-habana/tree/v1.20-release/examples/language-modeling)
+  - [multi node training](https://github.com/huggingface/optimum-habana/tree/v1.20-release/examples/multi-node-training)
+  - [protein folding](https://github.com/huggingface/optimum-habana/tree/v1.20-release/examples/protein-folding)
+  - [question answering](https://github.com/huggingface/optimum-habana/tree/v1.20-release/examples/question-answering)
+  - [sentence transformers training](https://github.com/huggingface/optimum-habana/tree/v1.20-release/examples/sentence-transformers-training)
+  - [summarization](https://github.com/huggingface/optimum-habana/tree/v1.20-release/examples/summarization)
+  - [table detection](https://github.com/huggingface/optimum-habana/tree/v1.20-release/examples/table-detection)
+  - [text classification](https://github.com/huggingface/optimum-habana/tree/v1.20-release/examples/text-classification)
+  - [text feature extraction](https://github.com/huggingface/optimum-habana/tree/v1.20-release/examples/text-feature-extraction)
+  - [text generation](https://github.com/huggingface/optimum-habana/tree/v1.20-release/examples/text-generation)
+  - [translation](https://github.com/huggingface/optimum-habana/tree/v1.20-release/examples/translation)
+  - [trl](https://github.com/huggingface/optimum-habana/tree/v1.20-release/examples/trl)
 
 - **Audio**
-  - [audio classification](https://github.com/huggingface/optimum-habana/tree/main/examples/audio-classification)
-  - [speech recognition](https://github.com/huggingface/optimum-habana/tree/main/examples/speech-recognition)
-  - [text to speech](https://github.com/huggingface/optimum-habana/tree/main/examples/text-to-speech)
+  - [audio classification](https://github.com/huggingface/optimum-habana/tree/v1.20-release/examples/audio-classification)
+  - [speech recognition](https://github.com/huggingface/optimum-habana/tree/v1.20-release/examples/speech-recognition)
+  - [text to speech](https://github.com/huggingface/optimum-habana/tree/v1.20-release/examples/text-to-speech)
 
 - **Images**
-  - [object detection](https://github.com/huggingface/optimum-habana/tree/main/examples/object-detection)
-  - [object segementation](https://github.com/huggingface/optimum-habana/tree/main/examples/object-segementation)
-  - [image classification](https://github.com/huggingface/optimum-habana/tree/main/examples/image-classification)
-  - [image to text](https://github.com/huggingface/optimum-habana/tree/main/examples/image-to-text)
-  - [contrastive image text](https://github.com/huggingface/optimum-habana/tree/main/examples/contrastive-image-text)
-  - [stable diffusion](https://github.com/huggingface/optimum-habana/tree/main/examples/stable-diffusion)
-  - [visual question answering](https://github.com/huggingface/optimum-habana/tree/main/examples/visual-question-answering)
-  - [zero-shot object detection](https://github.com/huggingface/optimum-habana/tree/main/examples/zero-shot-object-detection)
+  - [object detection](https://github.com/huggingface/optimum-habana/tree/v1.20-release/examples/object-detection)
+  - [object segmentation](https://github.com/huggingface/optimum-habana/tree/v1.20-release/examples/object-segementation)
+  - [image classification](https://github.com/huggingface/optimum-habana/tree/v1.20-release/examples/image-classification)
+  - [image to text](https://github.com/huggingface/optimum-habana/tree/v1.20-release/examples/image-to-text)
+  - [contrastive image text](https://github.com/huggingface/optimum-habana/tree/v1.20-release/examples/contrastive-image-text)
+  - [stable diffusion](https://github.com/huggingface/optimum-habana/tree/v1.20-release/examples/stable-diffusion)
+  - [visual question answering](https://github.com/huggingface/optimum-habana/tree/v1.20-release/examples/visual-question-answering)
+  - [zero-shot object detection](https://github.com/huggingface/optimum-habana/tree/v1.20-release/examples/zero-shot-object-detection)
 
 - **Video**
-  - [stable-video-diffusion](https://github.com/huggingface/optimum-habana/tree/main/examples/stable-diffusion)
-  - [video-classification](https://github.com/huggingface/optimum-habana/tree/main/examples/video-classification)
+  - [stable-video-diffusion](https://github.com/huggingface/optimum-habana/tree/v1.20-release/examples/stable-diffusion)
+  - [video-classification](https://github.com/huggingface/optimum-habana/tree/v1.20-release/examples/video-classification)
 
 To learn more about how to adapt 🤗 Transformers or Diffusers scripts for Intel Gaudi, check out
 [Script Adaptation](https://huggingface.co/docs/optimum/habana/usage_guides/script_adaptation) guide.
@@ -18,10 +18,10 @@ limitations under the License.
 
 As models get bigger, parallelism has emerged as a strategy for training larger models on limited hardware and accelerating training speed by several orders of magnitude.
 
-All the [PyTorch examples](https://github.com/huggingface/optimum-habana/tree/main/examples) and the `GaudiTrainer` script work out of the box with distributed training.
+All the [PyTorch examples](https://github.com/huggingface/optimum-habana/tree/v1.20-release/examples) and the `GaudiTrainer` script work out of the box with distributed training.
 There are two ways of launching them:
 
-1. Using the [gaudi_spawn.py](https://github.com/huggingface/optimum-habana/blob/main/examples/gaudi_spawn.py) script:
+1. Using the [gaudi_spawn.py](https://github.com/huggingface/optimum-habana/blob/v1.20-release/examples/gaudi_spawn.py) script:
 
    - Use MPI for distributed training:
 
@@ -32,7 +32,7 @@ There are two ways of launching them:
      ```
 
      where `--argX` is an argument of the script to run in a distributed way.
-     Examples are given for question answering [here](https://github.com/huggingface/optimum-habana/blob/main/examples/question-answering/README.md#multi-card-training) and text classification [here](https://github.com/huggingface/optimum-habana/tree/main/examples/text-classification#multi-card-training).
+     Examples are given for question answering [here](https://github.com/huggingface/optimum-habana/blob/v1.20-release/examples/question-answering/README.md#multi-card-training) and text classification [here](https://github.com/huggingface/optimum-habana/tree/v1.20-release/examples/text-classification#multi-card-training).
 
    - Use DeepSpeed for distributed training:
 
@@ -43,7 +43,7 @@ There are two ways of launching them:
      ```
 
      where `--argX` is an argument of the script to run in a distributed way.
-     Examples are given for question answering [here](https://github.com/huggingface/optimum-habana/blob/main/examples/question-answering/README.md#using-deepspeed) and text classification [here](https://github.com/huggingface/optimum-habana/tree/main/examples/text-classification#using-deepspeed).
+     Examples are given for question answering [here](https://github.com/huggingface/optimum-habana/blob/v1.20-release/examples/question-answering/README.md#using-deepspeed) and text classification [here](https://github.com/huggingface/optimum-habana/tree/v1.20-release/examples/text-classification#using-deepspeed).
 
 2. Using the `DistributedRunner` directly in code:
 

@@ -22,7 +22,7 @@ An effective quick start would be to review the inference examples provided in t
 [here].
 
 You can also explore the 
-[examples in the Optimum for Intel Gaudi repository]((https://github.com/huggingface/optimum-habana/tree/main/examples)).
+[examples in the Optimum for Intel Gaudi repository](https://github.com/huggingface/optimum-habana/tree/v1.20-release/examples).
 While the examples folder includes both training and inference, the inference-specific content
 provides valuable guidance for optimizing and running workloads on Intel Gaudi accelerators.
 
@@ -64,7 +64,7 @@ The variable `my_args` should contain some inference-specific arguments, you can
 
 ## In our Examples
 
-All [our examples](https://github.com/huggingface/optimum-habana/tree/main/examples) contain instructions for running inference with a given model on a given dataset.
+All [our examples](https://github.com/huggingface/optimum-habana/tree/v1.20-release/examples) contain instructions for running inference with a given model on a given dataset.
 The reasoning is the same for every example: run the example script with `--do_eval` and `--per_device_eval_batch_size` and without `--do_train`.
 A simple template is the following:
 ```bash

@@ -60,7 +60,7 @@ Generated images can be returned as either PIL images or NumPy arrays, depending
 
 <Tip>
 
-Check out the [example](https://github.com/huggingface/optimum-habana/tree/main/examples/stable-diffusion) provided in the official Github repository.
+Check out the [example](https://github.com/huggingface/optimum-habana/tree/v1.20-release/examples/stable-diffusion) provided in the official GitHub repository.
 
 </Tip>
 
@@ -179,4 +179,4 @@ pipeline = GaudiStableDiffusionPipeline.from_pretrained(
 
 [Textual Inversion](https://arxiv.org/abs/2208.01618) is a method to personalize text2image models like Stable Diffusion on your own images using just 3-5 examples.
 
-You can find [here](https://github.com/huggingface/optimum-habana/blob/main/examples/stable-diffusion/textual_inversion.py) an example script that implements this training method.
+You can find [here](https://github.com/huggingface/optimum-habana/blob/v1.20-release/examples/stable-diffusion/textual_inversion.py) an example script that implements this training method.
@@ -105,7 +105,7 @@ This argument both indicates that DeepSpeed should be used and points to your De
 
 Finally, there are two possible ways to launch your script:
 
-1. Using the [gaudi_spawn.py](https://github.com/huggingface/optimum-habana/blob/main/examples/gaudi_spawn.py) script:
+1. Using the [gaudi_spawn.py](https://github.com/huggingface/optimum-habana/blob/v1.20-release/examples/gaudi_spawn.py) script:
 
 ```bash
 python gaudi_spawn.py \

@@ -46,7 +46,7 @@ Once your Intel Gaudi instances are ready, follow the steps for [setting up a mu
 
 Finally, there are two possible ways to run your training script on several nodes:
 
-1. With the [`gaudi_spawn.py`](https://github.com/huggingface/optimum-habana/blob/main/examples/gaudi_spawn.py) script, you can run the following command:
+1. With the [`gaudi_spawn.py`](https://github.com/huggingface/optimum-habana/blob/v1.20-release/examples/gaudi_spawn.py) script, you can run the following command:
 ```bash
 python gaudi_spawn.py \
     --hostfile path_to_my_hostfile --use_deepspeed \
@@ -79,7 +79,7 @@ env_variable_2_name=value
 
 ## Recommendations
 
-- It is strongly recommended to use gradient checkpointing for multi-node runs to get the highest speedups. You can enable it with `--gradient_checkpointing` in [these examples](https://github.com/huggingface/optimum-habana/tree/main/examples) or with `gradient_checkpointing=True` in your `GaudiTrainingArguments`.
+- It is strongly recommended to use gradient checkpointing for multi-node runs to get the highest speedups. You can enable it with `--gradient_checkpointing` in [these examples](https://github.com/huggingface/optimum-habana/tree/v1.20-release/examples) or with `gradient_checkpointing=True` in your `GaudiTrainingArguments`.
 - Larger batch sizes should lead to higher speedups.
 - Multi-node inference is not recommended and can provide inconsistent results.
 - On Intel Tiber AI Cloud instances, run your Docker containers with the `--privileged` flag so that EFA devices are visible.
@@ -88,7 +88,7 @@ env_variable_2_name=value
 ## Example
 
 In this example, we fine-tune a pre-trained GPT2-XL model on the [WikiText dataset](https://huggingface.co/datasets/wikitext).
-We are going to use the [causal language modeling example which is given in the Github repository](https://github.com/huggingface/optimum-habana/tree/main/examples/language-modeling#gpt-2gpt-and-causal-language-modeling).
+We are going to use the [causal language modeling example which is given in the GitHub repository](https://github.com/huggingface/optimum-habana/tree/v1.20-release/examples/language-modeling#gpt-2gpt-and-causal-language-modeling).
 
 The first step consists in training the model on several nodes with this command:
 ```bash

@@ -17,7 +17,7 @@ limitations under the License.
 # Quantization
 
 Intel® Gaudi® offers several possibilities to make inference faster. For examples of FP8 and UINT4 for Inference, see the
-[text-generation](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation) example.
+[text-generation](https://github.com/huggingface/optimum-habana/tree/v1.20-release/examples/text-generation) example.
 
 This guide provides the steps required to enable FP8 and UINT4 precision on your Intel® Gaudi® AI
 accelerator using the Intel® Neural Compressor (INC) package.

@@ -17,7 +17,7 @@ limitations under the License.
 
 This folder contains actively maintained examples of use of 🤗 Optimum Habana for various ML tasks.
 
 Other [examples](https://github.com/huggingface/transformers/tree/main/examples/pytorch) from the 🤗 Transformers library can be adapted the same way to enable deployment on Gaudi processors. This simply consists in:
 - replacing the `Trainer` from 🤗 Transformers with the `GaudiTrainer` from 🤗 Optimum Habana,
 - replacing the `TrainingArguments` from 🤗 Transformers with the `GaudiTrainingArguments` from 🤗 Optimum Habana.
 
@@ -70,7 +70,7 @@ ip_2 slots=8
 ip_n slots=8
 ```
 
-You can find more information about multi-node training in the [documentation](https://huggingface.co/docs/optimum/habana/usage_guides/multi_node_training) and in the [`multi-node-training`](https://github.com/huggingface/optimum-habana/tree/main/examples/multi-node-training) folder where a Dockerfile is provided to easily set up your environment.
+You can find more information about multi-node training in the [documentation](https://huggingface.co/docs/optimum/habana/usage_guides/multi_node_training) and in the [`multi-node-training`](https://github.com/huggingface/optimum-habana/tree/v1.20-release/examples/multi-node-training) folder where a Dockerfile is provided to easily set up your environment.
 
 
 ## Loading from a Tensorflow/Flax checkpoint file instead of a PyTorch model

@@ -164,7 +164,7 @@ python3 ../gaudi_spawn.py --world_size 8 --use_mpi run_clip.py \
 
 ### DeepSpeed
 
-You can check the [DeepSpeed](https://github.com/huggingface/optimum-habana/tree/main/examples#deepspeed) section in Optimum Habana examples for how to run DeepSpeed.
+You can check the [DeepSpeed](https://github.com/huggingface/optimum-habana/tree/v1.20-release/examples#deepspeed) section in Optimum Habana examples for how to run DeepSpeed.
 You can also look at the [documentation](https://huggingface.co/docs/optimum/habana/usage_guides/deepspeed) for more information about how to use DeepSpeed in Optimum Habana.
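
Because every hunk above applies the same substitution, the link pinning can be scripted. The following is a minimal sketch, not the tooling used for this change: `pin_links`, `REPO`, and `RELEASE_BRANCH` are hypothetical names, and the branch value is simply the one that appears in the added lines. It rewrites only `tree/main` and `blob/main` links that belong to the `huggingface/optimum-habana` repository and leaves links to any other repository untouched.

```python
import re

# Assumed values for illustration; adjust them to the release being documented.
REPO = "huggingface/optimum-habana"
RELEASE_BRANCH = "v1.20-release"

# Match GitHub tree/blob links of the target repository that point at `main`.
# The lookahead ensures `main` is a whole path segment (not e.g. `mainline`).
_MAIN_LINK = re.compile(
    r"(https://github\.com/" + re.escape(REPO) + r"/(?:tree|blob)/)main(?=[/#)\s]|$)"
)


def pin_links(text: str, branch: str = RELEASE_BRANCH) -> str:
    """Return `text` with the target repository's `main` links re-pointed at `branch`."""
    return _MAIN_LINK.sub(lambda m: m.group(1) + branch, text)


if __name__ == "__main__":
    sample = (
        "[ours](https://github.com/huggingface/optimum-habana/tree/main/examples) "
        "[theirs](https://github.com/huggingface/transformers/tree/main/examples/pytorch)"
    )
    # Only the optimum-habana link is rewritten; the transformers link is kept as-is.
    print(pin_links(sample))
```

Restricting the pattern to one repository is a deliberate choice: a plain search-and-replace of `tree/main` would also rewrite links to repositories that may not have a branch of that name.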