Closed
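The hunks in this diff prefix the documented launch commands with `PT_HPU_LAZY_MODE=1`. As a minimal sketch of how that shell prefix behaves (plain shell semantics only, nothing Gaudi-specific; the `sh -c` child is a stand-in for the Python scripts, and the names `with_prefix`, `without_prefix` and `after_export` are illustrative, not part of the repository):

```bash
# Start from a clean state so the result does not depend on the caller's session.
unset PT_HPU_LAZY_MODE

# Prefix form: the assignment is placed in the environment of this one command only.
# `sh -c` is a stand-in for the Python launch commands.
with_prefix=$(PT_HPU_LAZY_MODE=1 sh -c 'echo "${PT_HPU_LAZY_MODE:-unset}"')

# The next command started from the same shell does not see the variable.
without_prefix=$(sh -c 'echo "${PT_HPU_LAZY_MODE:-unset}"')

# `export` is the session-wide alternative: every later command inherits it.
export PT_HPU_LAZY_MODE=1
after_export=$(sh -c 'echo "${PT_HPU_LAZY_MODE:-unset}"')

echo "with prefix:    ${with_prefix}"
echo "without prefix: ${without_prefix}"
echo "after export:   ${after_export}"
```

The prefix form keeps each documented command self-contained, at the cost of repeating the variable in every example.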
4 changes: 2 additions & 2 deletions docs/source/quickstart.mdx
@@ -121,7 +121,7 @@ pip install git+https://github.com/HabanaAI/DeepSpeed.git@1.19.0
With DeepSpeed successfully installed we can now run a distributed GPT-2 inference on an 8 HPU system as follows:
```bash
number_of_devices=8 \
-python ../gaudi_spawn.py --use_deepspeed --world_size ${number_of_devices} \
+PT_HPU_LAZY_MODE=1 python ../gaudi_spawn.py --use_deepspeed --world_size ${number_of_devices} \
run_generation.py \
--model_name_or_path meta-llama/Llama-2-7b-hf \
--use_hpu_graphs \
@@ -167,7 +167,7 @@ python run_clm.py \
To train GPT-2 model using multi-card Gaudi system:
```bash
number_of_devices=8 \
-python ../gaudi_spawn.py --use_deepspeed --world_size ${number_of_devices} \
+PT_HPU_LAZY_MODE=1 python ../gaudi_spawn.py --use_deepspeed --world_size ${number_of_devices} \
run_clm.py \
--model_name_or_path gpt2 \
--dataset_name wikitext \
2 changes: 1 addition & 1 deletion docs/source/usage_guides/multi_node_training.mdx
@@ -92,7 +92,7 @@ We are going to use the [causal language modeling example which is given in the

The first step consists in training the model on several nodes with this command:
```bash
-python ../gaudi_spawn.py \
+PT_HPU_LAZY_MODE=1 python ../gaudi_spawn.py \
--hostfile path_to_hostfile --use_deepspeed run_clm.py \
--model_name_or_path gpt2-xl \
--gaudi_config_name Habana/gpt2 \
2 changes: 1 addition & 1 deletion examples/contrastive-image-text/README.md
@@ -173,7 +173,7 @@ For training BridgeTower, you need to run the `run_bridgetower.py` script.
For instance, to reproduce the results presented in [this blog post](https://huggingface.co/blog/bridgetower), you should run:

```bash
-python ../gaudi_spawn.py --use_mpi --world_size 8 run_bridgetower.py \
+PT_HPU_LAZY_MODE=1 python ../gaudi_spawn.py --use_mpi --world_size 8 run_bridgetower.py \
--output_dir /tmp/bridgetower-test \
--model_name_or_path BridgeTower/bridgetower-large-itm-mlm-itc \
--dataset_name jmhessel/newyorker_caption_contest --dataset_config_name matching \
4 changes: 2 additions & 2 deletions examples/image-classification/README.md
@@ -176,7 +176,7 @@ $ huggingface-cli login
3. When running the script, pass the following arguments:

```bash
-python run_image_classification.py \
+PT_HPU_LAZY_MODE=1 python run_image_classification.py \
--push_to_hub \
--push_to_hub_model_id <name-your-model> \
...
@@ -288,7 +288,7 @@ To run only inference, you can start from the commands above and you just have t

For instance, you can run inference with ViT on Cifar10 on 1 Gaudi card with the following command:
```bash
-python run_image_classification.py \
+PT_HPU_LAZY_MODE=1 python run_image_classification.py \
--model_name_or_path google/vit-base-patch16-224-in21k \
--dataset_name cifar10 \
--output_dir /tmp/outputs/ \
12 changes: 6 additions & 6 deletions examples/language-modeling/README.md
@@ -79,7 +79,7 @@ python run_clm.py \
### Multi-card Training (GPT2)

```bash
-python ../gaudi_spawn.py \
+PT_HPU_LAZY_MODE=1 python ../gaudi_spawn.py \
--world_size 8 --use_mpi run_clm.py \
--model_name_or_path gpt2 \
--dataset_name wikitext \
@@ -109,7 +109,7 @@ Fine tuning on 8 HPU cards takes around 6 minutes with a batch size of 32 (4 per
It reaches a perplexity of 14.011.

```bash
-python ../gaudi_spawn.py \
+PT_HPU_LAZY_MODE=1 python ../gaudi_spawn.py \
--world_size 8 --use_deepspeed run_clm.py \
--model_name_or_path EleutherAI/gpt-j-6b \
--dataset_name wikitext \
@@ -143,7 +143,7 @@ It reaches a perplexity of 10.469.
> Please refer to [this page](https://github.com/huggingface/optimum-habana/tree/main/examples/multi-node-training) for performing multi-node training properly.

```bash
-python ../gaudi_spawn.py \
+PT_HPU_LAZY_MODE=1 python ../gaudi_spawn.py \
--hostfile path_to_my_hostfile --use_deepspeed run_clm.py \
--model_name_or_path EleutherAI/gpt-neox-20b \
--dataset_name wikitext \
@@ -175,7 +175,7 @@ converge slightly slower (over-fitting takes more epochs).
### Multi-card Training

```bash
-python ../gaudi_spawn.py \
+PT_HPU_LAZY_MODE=1 python ../gaudi_spawn.py \
--world_size 8 --use_mpi run_mlm.py \
--model_name_or_path roberta-base \
--dataset_name wikitext \
@@ -292,7 +292,7 @@ python3 run_lora_clm.py \

- Multi-card finetuning of gemma2 using chat template:
```bash
-python ../gaudi_spawn.py \
+PT_HPU_LAZY_MODE=1 python ../gaudi_spawn.py \
--world_size 2 --use_mpi run_lora_clm.py \
--model_name_or_path google/gemma-2b-it \
--per_device_train_batch_size 16 \
@@ -509,7 +509,7 @@ Default `peft_type` is `prompt_tuning`, you could enable prefix-tuning or p-tuni

Use the prompt finetuned model for text-generation:
```bash
-python3 ../text-generation/run_generation.py \
+PT_HPU_LAZY_MODE=1 python3 ../text-generation/run_generation.py \
--model_name_or_path meta-llama/Llama-2-7b-hf \
--max_new_tokens 128 \
--bf16 \
2 changes: 1 addition & 1 deletion examples/protein-folding/README.md
@@ -50,7 +50,7 @@ python run_zero_shot_eval.py --bf16 --max_seq_length 1024
## Multi-HPU finetune for sequence classification task

```bash
-python ../gaudi_spawn.py --world_size 8 --use_mpi run_sequence_classification.py \
+PT_HPU_LAZY_MODE=1 python ../gaudi_spawn.py --world_size 8 --use_mpi run_sequence_classification.py \
--output_dir ./out \
--model_name_or_path mila-intel/protst-esm1b-for-sequential-classification \
--tokenizer_name facebook/esm1b_t33_650M_UR50S \
2 changes: 1 addition & 1 deletion examples/question-answering/README.md
@@ -40,7 +40,7 @@ pip install -r requirements.txt

Here is a command you can run to train a Llama model for question answering:
```bash
-python ../gaudi_spawn.py \
+PT_HPU_LAZY_MODE=1 python ../gaudi_spawn.py \
--world_size 8 --use_deepspeed run_qa.py \
--model_name_or_path meta-llama/Llama-2-7b-chat-hf \
--gaudi_config_name Habana/bert-large-uncased-whole-word-masking \
6 changes: 3 additions & 3 deletions examples/speech-recognition/README.md
@@ -102,7 +102,7 @@ On a single HPU, this script should run in *ca.* 6 hours and yield a CTC loss of
The following command shows how to fine-tune [wav2vec2-large-lv60](https://huggingface.co/facebook/wav2vec2-large-lv60) on [Librispeech](https://huggingface.co/datasets/librispeech_asr) using 8 HPUs.

```bash
-python ../gaudi_spawn.py \
+PT_HPU_LAZY_MODE=1 python ../gaudi_spawn.py \
--world_size 8 --use_mpi run_speech_recognition_ctc.py \
--dataset_name librispeech_asr \
--model_name_or_path facebook/wav2vec2-large-lv60 \
@@ -154,7 +154,7 @@ DeepSpeed can be used with almost the same command as for a multi-card run:

For example:
```bash
-python ../gaudi_spawn.py \
+PT_HPU_LAZY_MODE=1 python ../gaudi_spawn.py \
--world_size 8 --use_deepspeed run_speech_recognition_ctc.py \
--dataset_name librispeech_asr \
--model_name_or_path facebook/wav2vec2-large-lv60 \
@@ -273,7 +273,7 @@ If training on a different language, you should be sure to change the `language`
### Multi HPU Whisper Training with Seq2Seq
The following example shows how to fine-tune the [Whisper large](https://huggingface.co/openai/whisper-large) checkpoint on the Hindi subset of [Common Voice 11](https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0) using 8 HPU devices in half-precision:
```bash
-python ../gaudi_spawn.py \
+PT_HPU_LAZY_MODE=1 python ../gaudi_spawn.py \
--world_size 8 --use_mpi run_speech_recognition_seq2seq.py \
--model_name_or_path="openai/whisper-large" \
--dataset_name="mozilla-foundation/common_voice_11_0" \
31 changes: 28 additions & 3 deletions examples/stable-diffusion/README.md
@@ -72,7 +72,7 @@ python text_to_image_generation.py \
Distributed inference with multiple HPUs is also supported. Below is an example demonstrating how to generate images with two prompts on two HPUs:

```bash
-python ../gaudi_spawn.py \
+PT_HPU_LAZY_MODE=1 python ../gaudi_spawn.py \
--world_size 2 text_to_image_generation.py \
--model_name_or_path CompVis/stable-diffusion-v1-4 \
--prompts "An image of a squirrel in Picasso style" "A shiny flying horse taking off" \
@@ -147,7 +147,8 @@ python text_to_image_generation.py \
Here is how to generate images and depth maps with two prompts on two HPUs:

```bash
-python ../gaudi_spawn.py --world_size 2 text_to_image_generation.py \
+PT_HPU_LAZY_MODE=1 python ../gaudi_spawn.py \
+--world_size 2 text_to_image_generation.py \
--model_name_or_path "Intel/ldm3d-4c" \
--prompts "An image of a squirrel in Picasso style" "A shiny flying horse taking off" \
--num_images_per_prompt 10 \
@@ -219,7 +220,8 @@ python text_to_image_generation.py \
SDXL also supports distributed inferencing with Intel Gaudi accelerators. Below is an example of generating SDXL images in a distributed manner using two prompts on two HPUs:

```bash
-python ../gaudi_spawn.py --world_size 2 text_to_image_generation.py \
+PT_HPU_LAZY_MODE=1 python ../gaudi_spawn.py \
+--world_size 2 text_to_image_generation.py \
--model_name_or_path stabilityai/stable-diffusion-xl-base-1.0 \
--prompts "Sailing ship painting by Van Gogh" "A shiny flying horse taking off" \
--prompts_2 "Red tone" "Blue tone" \
@@ -481,7 +483,30 @@ The ControlNet example can be run with multiple prompts by supplying more than o
Additionally, it supports distributed execution. Below is an example of generating images conditioned by the Canny edge model using two prompts on two HPUs:

```bash
+<<<<<<< HEAD
python ../gaudi_spawn.py --world_size 2 text_to_image_generation.py \
+=======
+python text_to_image_generation.py \
+--model_name_or_path CompVis/stable-diffusion-v1-4 \
+--controlnet_model_name_or_path lllyasviel/sd-controlnet-canny \
+--prompts "futuristic-looking woman" "a rusty robot" \
+--control_image https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png \
+--num_images_per_prompt 28 \
+--batch_size 7 \
+--image_save_dir /tmp/controlnet_images \
+--use_habana \
+--use_hpu_graphs \
+--gaudi_config Habana/stable-diffusion \
+--sdp_on_bf16 \
+--bf16
+```
+
+Here is how to generate images conditioned by canny edge model and with two prompts on two HPUs:
+
+```bash
+PT_HPU_LAZY_MODE=1 python ../gaudi_spawn.py \
+--world_size 2 text_to_image_generation.py \
+>>>>>>> c6d15a26 ([SW-218526] Updated Readme files for explicite lazy mode (#174))
--model_name_or_path CompVis/stable-diffusion-v1-4 \
--controlnet_model_name_or_path lllyasviel/sd-controlnet-canny \
--prompts "futuristic-looking woman" "a rusty robot" \
2 changes: 1 addition & 1 deletion examples/summarization/README.md
@@ -152,7 +152,7 @@ And as with the CSV files, you can specify which values to select from the file,

Here is an example on 8 HPUs:
```bash
-python ../gaudi_spawn.py \
+PT_HPU_LAZY_MODE=1 python ../gaudi_spawn.py \
--world_size 8 --use_mpi run_summarization.py \
--model_name_or_path t5-small \
--do_train \
6 changes: 3 additions & 3 deletions examples/text-classification/README.md
@@ -72,7 +72,7 @@ python run_glue.py \
Here is how you would fine-tune the BERT large model (with whole word masking) on the text classification MRPC task using the `run_glue` script, with 8 HPUs:

```bash
-python ../gaudi_spawn.py \
+PT_HPU_LAZY_MODE=1 python ../gaudi_spawn.py \
--world_size 8 --use_mpi run_glue.py \
--model_name_or_path bert-large-uncased-whole-word-masking \
--gaudi_config_name Habana/bert-large-uncased-whole-word-masking \
@@ -101,7 +101,7 @@ python ../gaudi_spawn.py \
Similarly to multi-card training, here is how you would fine-tune the BERT large model (with whole word masking) on the text classification MRPC task using DeepSpeed with 8 HPUs:

```bash
-python ../gaudi_spawn.py \
+PT_HPU_LAZY_MODE=1 python ../gaudi_spawn.py \
--world_size 8 --use_deepspeed run_glue.py \
--model_name_or_path bert-large-uncased-whole-word-masking \
--gaudi_config_name Habana/bert-large-uncased-whole-word-masking \
@@ -176,7 +176,7 @@ Llama Guard can be used for text classification. The Transformers library will c
Llama Guard can be fine-tuned with DeepSpeed, here is how you would do it on the text classification MRPC task using DeepSpeed with 8 HPUs:

```bash
-python ../gaudi_spawn.py \
+PT_HPU_LAZY_MODE=1 python ../gaudi_spawn.py \
--world_size 8 --use_deepspeed run_glue.py \
--model_name_or_path meta-llama/LlamaGuard-7b \
--gaudi_config Habana/llama \