10 changes: 5 additions & 5 deletions docs/source/quickstart.mdx
@@ -92,7 +92,7 @@ To be able to run gated models like [Llama-2 7B](https://huggingface.co/meta-lla

Run single Gaudi device (HPU) inference with Llama-2 7B model:
```bash
-python run_generation.py \
+PT_HPU_LAZY_MODE=1 python run_generation.py \
--model_name_or_path meta-llama/Llama-2-7b-hf \
--use_hpu_graphs \
--use_kv_cache \
@@ -121,7 +121,7 @@ pip install git+https://github.com/HabanaAI/DeepSpeed.git@1.20.0
With DeepSpeed successfully installed we can now run a distributed GPT-2 inference on an 8 HPU system as follows:
```bash
number_of_devices=8 \
-python ../gaudi_spawn.py --use_deepspeed --world_size ${number_of_devices} \
+PT_HPU_LAZY_MODE=1 python ../gaudi_spawn.py --use_deepspeed --world_size ${number_of_devices} \
run_generation.py \
--model_name_or_path meta-llama/Llama-2-7b-hf \
--use_hpu_graphs \
@@ -148,7 +148,7 @@ pip install -r requirements.txt

To train GPT-2 model on a single card, use:
```bash
-python run_clm.py \
+PT_HPU_LAZY_MODE=1 python run_clm.py \
--model_name_or_path gpt2 \
--dataset_name wikitext \
--dataset_config_name wikitext-2-raw-v1 \
@@ -167,7 +167,7 @@ python run_clm.py \
To train GPT-2 model using multi-card Gaudi system:
```bash
number_of_devices=8 \
-python ../gaudi_spawn.py --use_deepspeed --world_size ${number_of_devices} \
+PT_HPU_LAZY_MODE=1 python ../gaudi_spawn.py --use_deepspeed --world_size ${number_of_devices} \
run_clm.py \
--model_name_or_path gpt2 \
--dataset_name wikitext \
@@ -200,7 +200,7 @@ pip install -r requirements.txt

Here is an example of running Stable Diffusion text to image inference on Gaudi:
```bash
-python text_to_image_generation.py \
+PT_HPU_LAZY_MODE=1 python text_to_image_generation.py \
--model_name_or_path CompVis/stable-diffusion-v1-4 \
--prompts "An image of a squirrel in Picasso style" \
--num_images_per_prompt 10 \
2 changes: 1 addition & 1 deletion docs/source/tutorials/inference.mdx
@@ -68,7 +68,7 @@ All [our examples](https://github.com/huggingface/optimum-habana/tree/main/examp
The reasoning is the same for every example: run the example script with `--do_eval` and `--per_device_eval_batch_size` and without `--do_train`.
A simple template is the following:
```bash
-python path_to_the_example_script \
+PT_HPU_LAZY_MODE=1 python path_to_the_example_script \
--model_name_or_path my_model_name \
--gaudi_config_name my_gaudi_config_name \
--dataset_name my_dataset_name \
2 changes: 1 addition & 1 deletion docs/source/tutorials/stable_diffusion.mdx
@@ -161,7 +161,7 @@ This will also save memory.
You just need to pass `torch_dtype=torch.bfloat16` to `from_pretrained` when instantiating your pipeline.
Here is how to do it:

-```py
+```python
import torch

pipeline = GaudiStableDiffusionPipeline.from_pretrained(
4 changes: 2 additions & 2 deletions docs/source/usage_guides/multi_node_training.mdx
@@ -92,7 +92,7 @@ We are going to use the [causal language modeling example which is given in the

The first step consists in training the model on several nodes with this command:
```bash
-python ../gaudi_spawn.py \
+PT_HPU_LAZY_MODE=1 python ../gaudi_spawn.py \
--hostfile path_to_hostfile --use_deepspeed run_clm.py \
--model_name_or_path gpt2-xl \
--gaudi_config_name Habana/gpt2 \
@@ -115,7 +115,7 @@ Evaluation is not performed in the same command because we do not recommend perf
Once the model is trained, we can evaluate it with the following command.
The argument `--model_name_or_path` should be equal to the argument `--output_dir` of the previous command.
```bash
-python run_clm.py \
+PT_HPU_LAZY_MODE=1 python run_clm.py \
--model_name_or_path /tmp/gpt2_xl_multi_node \
--gaudi_config_name Habana/gpt2 \
--dataset_name wikitext \
6 changes: 3 additions & 3 deletions examples/audio-classification/README.md
@@ -35,7 +35,7 @@ pip install -r requirements.txt
The following command shows how to fine-tune [wav2vec2-base](https://huggingface.co/facebook/wav2vec2-base) on the 🗣️ [Keyword Spotting subset](https://huggingface.co/datasets/superb#ks) of the SUPERB dataset on a single HPU.

```bash
-python run_audio_classification.py \
+PT_HPU_LAZY_MODE=1 python run_audio_classification.py \
--model_name_or_path facebook/wav2vec2-base \
--dataset_name superb \
--dataset_config_name ks \
@@ -75,7 +75,7 @@ On a single HPU, this script should run in ~13 minutes and yield an accuracy of
The following command shows how to fine-tune [wav2vec2-base](https://huggingface.co/facebook/wav2vec2-base) for 🌎 **Language Identification** on the [CommonLanguage dataset](https://huggingface.co/datasets/anton-l/common_language) on 8 HPUs.

```bash
-PT_HPU_LAZY_MODE=0 python ../gaudi_spawn.py \
+python ../gaudi_spawn.py \
--world_size 8 --use_mpi run_audio_classification.py \
--model_name_or_path facebook/wav2vec2-base \
--dataset_name common_language \
@@ -118,7 +118,7 @@ To run only inference, you can start from the commands above and you just have t

For instance, you can run inference with Wav2Vec2 on the Keyword Spotting subset on 1 Gaudi card with the following command:
```bash
-python run_audio_classification.py \
+PT_HPU_LAZY_MODE=1 python run_audio_classification.py \
--model_name_or_path facebook/wav2vec2-base \
--dataset_name superb \
--dataset_config_name ks \
12 changes: 6 additions & 6 deletions examples/contrastive-image-text/README.md
@@ -47,7 +47,7 @@ cd ..

Having downloaded COCO dataset manually you should be able to load with the `ydshieh/coco_dataset_script` dataset loading script:

-```py
+```python
import os
import datasets

@@ -65,7 +65,7 @@ Next, we create a [VisionTextDualEncoderModel](https://huggingface.co/docs/trans
The `VisionTextDualEncoderModel` class lets you load any vision and text encoder model to create a dual encoder.
Here is an example of how to load the model using pre-trained vision and text models.

-```python3
+```python
from transformers import (
VisionTextDualEncoderModel,
VisionTextDualEncoderProcessor,
@@ -96,7 +96,7 @@ Finally, we can run the example script to train the model.
Run the following command for single-device training:

```bash
-PT_HPU_LAZY_MODE=0 python run_clip.py \
+python run_clip.py \
--output_dir ./clip-roberta-finetuned \
--model_name_or_path ./clip-roberta \
--data_dir $PWD/data \
@@ -128,7 +128,7 @@ PT_HPU_LAZY_MODE=0 python run_clip.py \
Run the following command for distributed training:

```bash
-PT_HPU_LAZY_MODE=0 PT_ENABLE_INT64_SUPPORT=1 \
+PT_ENABLE_INT64_SUPPORT=1 \
python3 ../gaudi_spawn.py --world_size 8 --use_mpi run_clip.py \
--output_dir=/tmp/clip_roberta \
--model_name_or_path=./clip-roberta \
@@ -173,7 +173,7 @@ For training BridgeTower, you need to run the `run_bridgetower.py` script.
For instance, to reproduce the results presented in [this blog post](https://huggingface.co/blog/bridgetower), you should run:

```bash
-python ../gaudi_spawn.py --use_mpi --world_size 8 run_bridgetower.py \
+PT_HPU_LAZY_MODE=1 python ../gaudi_spawn.py --use_mpi --world_size 8 run_bridgetower.py \
--output_dir /tmp/bridgetower-test \
--model_name_or_path BridgeTower/bridgetower-large-itm-mlm-itc \
--dataset_name jmhessel/newyorker_caption_contest --dataset_config_name matching \
@@ -204,7 +204,7 @@ To run only inference, you can start from the commands above and you just have t

For instance, you can run inference with CLIP on COCO on 1 Gaudi card with the following command:
```bash
-python run_clip.py \
+PT_HPU_LAZY_MODE=1 python run_clip.py \
--output_dir ./clip-roberta-finetuned \
--model_name_or_path ./clip-roberta \
--data_dir $PWD/data \
14 changes: 7 additions & 7 deletions examples/image-classification/README.md
@@ -33,7 +33,7 @@ pip install -r requirements.txt
Here we show how to fine-tune a Vision Transformer (`ViT`) on Cifar10:

```bash
-PT_HPU_LAZY_MODE=0 python run_image_classification.py \
+python run_image_classification.py \
--model_name_or_path google/vit-base-patch16-224-in21k \
--dataset_name cifar10 \
--output_dir /tmp/outputs/ \
@@ -94,7 +94,7 @@ root/cat/[...]/asd932_.png
In other words, you need to organize your images in subfolders, based on their class. You can then run the script like this:

```bash
-PT_HPU_LAZY_MODE=0 python run_image_classification.py \
+python run_image_classification.py \
--model_name_or_path google/vit-base-patch16-224-in21k \
--train_dir <path-to-train-root> \
--output_dir /tmp/outputs/ \
@@ -176,7 +176,7 @@ $ huggingface-cli login
3. When running the script, pass the following arguments:

```bash
-python run_image_classification.py \
+PT_HPU_LAZY_MODE=1 python run_image_classification.py \
--push_to_hub \
--push_to_hub_model_id <name-your-model> \
...
@@ -188,7 +188,7 @@ python run_image_classification.py \
Here is how you would fine-tune ViT on Cifar10 using 8 HPUs:

```bash
-PT_HPU_LAZY_MODE=0 python ../gaudi_spawn.py \
+python ../gaudi_spawn.py \
--world_size 8 --use_mpi run_image_classification.py \
--model_name_or_path google/vit-base-patch16-224-in21k \
--dataset_name cifar10 \
@@ -230,7 +230,7 @@ For Swin, you need to change/add the following arguments:
Similarly to multi-HPU training, here is how you would fine-tune ViT on Cifar10 using 8 HPUs with DeepSpeed:

```bash
-PT_HPU_LAZY_MODE=0 python ../gaudi_spawn.py \
+python ../gaudi_spawn.py \
--world_size 8 --use_deepspeed run_image_classification.py \
--model_name_or_path google/vit-base-patch16-224-in21k \
--dataset_name cifar10 \
@@ -288,7 +288,7 @@ To run only inference, you can start from the commands above and you just have t

For instance, you can run inference with ViT on Cifar10 on 1 Gaudi card with the following command:
```bash
-python run_image_classification.py \
+PT_HPU_LAZY_MODE=1 python run_image_classification.py \
--model_name_or_path google/vit-base-patch16-224-in21k \
--dataset_name cifar10 \
--output_dir /tmp/outputs/ \
@@ -312,7 +312,7 @@ This directory contains an example script that demonstrates using FastViT with g
### Single-HPU inference

```bash
-python3 run_timm_example.py \
+PT_HPU_LAZY_MODE=1 python3 run_timm_example.py \
--model_name_or_path "timm/fastvit_t8.apple_in1k" \
--image_path "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png" \
--warmup 3 \
20 changes: 10 additions & 10 deletions examples/image-to-text/README.md
@@ -25,7 +25,7 @@ Habana FusedSDPA is a fused and optimized implementation of torch.nn.functional.
To run Llama inference with SDPA, use the following command:

```bash
-python3 run_pipeline.py \
+PT_HPU_LAZY_MODE=1 python3 run_pipeline.py \
--model_name_or_path meta-llama/Llama-3.2-11B-Vision-Instruct \
--use_hpu_graphs \
--bf16 \
@@ -35,20 +35,20 @@ python3 run_pipeline.py \

To run inference with THUDM/glm-4v-9b, use the following command (Note that you need to set the environment variable `GLM=4v` to distinguish between glm4v and chatglm, as these models are customized and share the same model type named "chatglm"):
```bash
-GLM=4v python3 run_pipeline.py \
+PT_HPU_LAZY_MODE=1 GLM=4v python3 run_pipeline.py \
--model_name_or_path THUDM/glm-4v-9b \
--use_hpu_graphs \
--bf16 \
--sdp_on_bf16 \
--use_flash_attention \
--use_kv_cache

```

### Multi-cards inference with BF16

Use the following commands to run Llama-3.2-90B-Vision-Instruct BF16 inference with FusedSDPA on 8 HPUs:
```bash
-PT_HPU_ENABLE_LAZY_COLLECTIVES=true python ../gaudi_spawn.py --use_deepspeed --world_size 8 run_pipeline.py \
+PT_HPU_LAZY_MODE=1 PT_HPU_ENABLE_LAZY_COLLECTIVES=true python ../gaudi_spawn.py --use_deepspeed --world_size 8 run_pipeline.py \
--model_name_or_path meta-llama/Llama-3.2-90B-Vision-Instruct \
--image_path "https://llava-vl.github.io/static/images/view.jpg" \
--use_hpu_graphs \
@@ -66,7 +66,7 @@ More information on enabling FP8 in SynapseAI is available here:
### Single card inference with FP8
Here is an example to measure the tensor quantization statistics on Llava-v1.6-vicuna-13b with SDPA:
```bash
-QUANT_CONFIG=./quantization_config/maxabs_measure.json python run_pipeline.py \
+PT_HPU_LAZY_MODE=1 QUANT_CONFIG=./quantization_config/maxabs_measure.json python run_pipeline.py \
--model_name_or_path llava-hf/llava-v1.6-vicuna-13b-hf \
--image_path "https://llava-vl.github.io/static/images/view.jpg" \
--use_hpu_graphs \
@@ -76,7 +76,7 @@ QUANT_CONFIG=./quantization_config/maxabs_measure.json python run_pipeline.py \

Here is an example to quantize the model based on previous measurements for Llava-v1.6-vicuna-13b with SDPA:
```bash
-QUANT_CONFIG=./quantization_config/maxabs_quant_scale_format_const.json python run_pipeline.py \
+PT_HPU_LAZY_MODE=1 QUANT_CONFIG=./quantization_config/maxabs_quant_scale_format_const.json python run_pipeline.py \
--model_name_or_path llava-hf/llava-v1.6-vicuna-13b-hf \
--image_path "https://llava-vl.github.io/static/images/view.jpg" \
--use_hpu_graphs \
@@ -87,7 +87,7 @@ QUANT_CONFIG=./quantization_config/maxabs_quant_scale_format_const.json python r
### Multi-cards inference with FP8
Here is an example of measuring the tensor quantization statistics on Llava-v1.6-mistral-7b with FusedSDPA on 8 HPUs:
```bash
-QUANT_CONFIG=./quantization_config/maxabs_measure.json python ../gaudi_spawn.py --use_deepspeed --world_size 8 run_pipeline.py \
+PT_HPU_LAZY_MODE=1 QUANT_CONFIG=./quantization_config/maxabs_measure.json python ../gaudi_spawn.py --use_deepspeed --world_size 8 run_pipeline.py \
--model_name_or_path llava-hf/llava-v1.6-mistral-7b-hf \
--image_path "https://llava-vl.github.io/static/images/view.jpg" \
--use_hpu_graphs \
@@ -98,7 +98,7 @@ QUANT_CONFIG=./quantization_config/maxabs_measure.json python ../gaudi_spawn.py

Here is an example of quantizing the model based on previous measurements for Llava-v1.6-mistral-7b with FusedSDPA on 8 HPUs:
```bash
-QUANT_CONFIG=./quantization_config/maxabs_quant_scale_format_const.json python ../gaudi_spawn.py --use_deepspeed --world_size 8 run_pipeline.py \
+PT_HPU_LAZY_MODE=1 QUANT_CONFIG=./quantization_config/maxabs_quant_scale_format_const.json python ../gaudi_spawn.py --use_deepspeed --world_size 8 run_pipeline.py \
--model_name_or_path llava-hf/llava-v1.6-mistral-7b-hf \
--image_path "https://llava-vl.github.io/static/images/view.jpg" \
--use_hpu_graphs \
@@ -112,7 +112,7 @@ QUANT_CONFIG=./quantization_config/maxabs_quant_scale_format_const.json python .
Here are single-/multi-device command examples for meta-llama/Llama-3.2-11B-Vision-Instruct.

```bash
-python3 run_image2text_lora_finetune.py \
+PT_HPU_LAZY_MODE=1 python3 run_image2text_lora_finetune.py \
--model_name_or_path meta-llama/Llama-3.2-11B-Vision-Instruct \
--dataset_name nielsr/docvqa_1200_examples \
--bf16 True \
@@ -145,7 +145,7 @@ python3 run_image2text_lora_finetune.py \
```

```bash
-python3 ../gaudi_spawn.py \
+PT_HPU_LAZY_MODE=1 python3 ../gaudi_spawn.py \
--world_size 8 --use_mpi run_image2text_lora_finetune.py \
--model_name_or_path meta-llama/Llama-3.2-11B-Vision-Instruct \
--dataset_name nielsr/docvqa_1200_examples \