huggingface · jasi306 · Feb 27, 2025 · Mar 26, 2025
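
Every hunk below makes the same change: the launch command gains a `PT_HPU_LAZY_MODE=1` prefix. As a minimal sketch of the shell behavior this relies on, a `VAR=value command` prefix sets the variable for that one command only and leaves the calling shell untouched. The `sh -c` calls here are illustrative stand-ins for the example scripts and are not part of the PR.

```shell
# Illustrative only: `sh -c` stands in for the python launch commands in the diff.
# Start from a clean state so the demonstration is deterministic.
unset PT_HPU_LAZY_MODE

# A leading assignment is placed in the environment of that single command.
inline=$(PT_HPU_LAZY_MODE=1 sh -c 'echo "${PT_HPU_LAZY_MODE:-unset}"')
echo "inside the prefixed command: $inline"

# The calling shell is not modified, so a later command does not see the variable.
after=$(sh -c 'echo "${PT_HPU_LAZY_MODE:-unset}"')
echo "in a later command: $after"
```

Running `export PT_HPU_LAZY_MODE=1` once would instead apply the setting to every later command in the session. The diff uses the per-command prefix, so each documented command is self-contained when copied on its own.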
@@ -35,7 +35,7 @@ pip install -r requirements.txt
 The following command shows how to fine-tune [wav2vec2-base](https://huggingface.co/facebook/wav2vec2-base) on the 🗣️ [Keyword Spotting subset](https://huggingface.co/datasets/superb#ks) of the SUPERB dataset on a single HPU.
 
 ```bash
-python run_audio_classification.py \
+PT_HPU_LAZY_MODE=1 python run_audio_classification.py \
     --model_name_or_path facebook/wav2vec2-base \
     --dataset_name superb \
     --dataset_config_name ks \
@@ -118,7 +118,7 @@ To run only inference, you can start from the commands above and you just have t
 
 For instance, you can run inference with Wav2Vec2 on the Keyword Spotting subset on 1 Gaudi card with the following command:
 ```bash
-python run_audio_classification.py \
+PT_HPU_LAZY_MODE=1 python run_audio_classification.py \
     --model_name_or_path facebook/wav2vec2-base \
     --dataset_name superb \
     --dataset_config_name ks \

@@ -204,7 +204,7 @@ To run only inference, you can start from the commands above and you just have t
 
 For instance, you can run inference with CLIP on COCO on 1 Gaudi card with the following command:
 ```bash
-python run_clip.py \
+PT_HPU_LAZY_MODE=1 python run_clip.py \
     --output_dir ./clip-roberta-finetuned \
     --model_name_or_path ./clip-roberta \
     --data_dir $PWD/data \

@@ -312,7 +312,7 @@ This directory contains an example script that demonstrates using FastViT with g
 ### Single-HPU inference
 
 ```bash
-python3 run_timm_example.py \
+PT_HPU_LAZY_MODE=1 python3 run_timm_example.py \
     --model_name_or_path "timm/fastvit_t8.apple_in1k" \
     --image_path "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png" \
     --warmup 3 \

@@ -37,7 +37,7 @@ The following examples fine-tune GPT-2, GPT-J-6B and GPT-NeoX-20B on WikiText-2.
 ### Single-card Training (GPT2)
 
 ```bash
-python run_clm.py \
+PT_HPU_LAZY_MODE=1 python run_clm.py \
     --model_name_or_path gpt2 \
     --dataset_name wikitext \
     --dataset_config_name wikitext-2-raw-v1 \
@@ -59,7 +59,7 @@ a perplexity of about 20.9963 once fine-tuned on the dataset.
 To run on your own training and validation files, use the following command:
 
 ```bash
-python run_clm.py \
+PT_HPU_LAZY_MODE=1 python run_clm.py \
     --model_name_or_path gpt2 \
     --train_file path_to_train_file \
     --validation_file path_to_validation_file \
@@ -175,7 +175,7 @@ converge slightly slower (over-fitting takes more epochs).
 ### Multi-card Training
 
 ```bash
-python ../gaudi_spawn.py \
+PT_HPU_LAZY_MODE=1 python ../gaudi_spawn.py \
     --world_size 8 --use_mpi run_mlm.py \
     --model_name_or_path roberta-base \
     --dataset_name wikitext \
@@ -211,7 +211,7 @@ You can easily train a model from scratch by replacing `--model_name_or_path my_
 
 For example with GPT2:
 ```bash
-python run_clm.py \
+PT_HPU_LAZY_MODE=1 python run_clm.py \
     --config_name gpt2 \
     --tokenizer_name gpt2 \
     --dataset_name wikitext \
@@ -235,7 +235,7 @@ To run only inference, you can start from the commands above and you just have t
 
 For instance, you can run inference with GPT2 on the Wikitext dataset on 1 Gaudi card with the following command:
 ```bash
-python run_clm.py \
+PT_HPU_LAZY_MODE=1 python run_clm.py \
     --model_name_or_path gpt2 \
     --dataset_name wikitext \
     --dataset_config_name wikitext-2-raw-v1 \
@@ -321,7 +321,7 @@ python ../gaudi_spawn.py \
 
 - Multi-card finetuning of Falcon-40B:
 ```bash
-PT_HPU_AUTOCAST_LOWER_PRECISION_OPS_LIST=ops_bf16.txt python3 ../gaudi_spawn.py \
+PT_HPU_AUTOCAST_LOWER_PRECISION_OPS_LIST=ops_bf16.txt PT_HPU_LAZY_MODE=1 python3 ../gaudi_spawn.py \
     --world_size 8 --use_mpi run_lora_clm.py \
     --model_name_or_path tiiuae/falcon-40b \
     --dataset_name timdettmers/openassistant-guanaco \
@@ -361,8 +361,8 @@ PT_HPU_AUTOCAST_LOWER_PRECISION_OPS_LIST=ops_bf16.txt python3 ../gaudi_spawn.py
   > The following command requires Habana DeepSpeed 1.13.0 or later.
 
 ```bash
-PT_HPU_MAX_COMPOUND_OP_SIZE=10 \
-python3 ../gaudi_spawn.py --use_deepspeed  --world_size 8  run_lora_clm.py \
+PT_HPU_MAX_COMPOUND_OP_SIZE=10 PT_HPU_LAZY_MODE=1 \
+python3 ../gaudi_spawn.py --use_deepspeed --world_size 8 run_lora_clm.py \
   --model_name_or_path meta-llama/Llama-2-70b-hf \
   --deepspeed llama2_ds_zero3_config.json \
   --dataset_name tatsu-lab/alpaca \
@@ -445,7 +445,7 @@ Default `peft_type` is `lora`, you could enable adalora or ia3 using `--peft_typ
 To run on your own training and validation files, use the following command:
 
 ```bash
-python run_lora_clm.py \
+PT_HPU_LAZY_MODE=1 python run_lora_clm.py \
     --model_name_or_path bigcode/starcoder \
     --train_file path_to_train_file \
     --validation_file path_to_validation_file \
@@ -488,7 +488,7 @@ To run prompt tuning finetuning, you can use `run_prompt_tuning_clm.py`.
 Here are single-card command examples for Llama2-7B:
 - single-card finetuning of meta-llama/Llama-2-7b-hf with dataset "ought/raft" and config "twitter_complaints":
 ```bash
-python3 run_prompt_tuning_clm.py \
+PT_HPU_LAZY_MODE=1 python3 run_prompt_tuning_clm.py \
     --model_name_or_path meta-llama/Llama-2-7b-hf \
     --output_dir prompt_tuning_out \
     --bf16 True \
@@ -526,7 +526,7 @@ python3 ../text-generation/run_generation.py \
 To run multitask prompt seq2seq finetuning, you can use `run_multitask_prompt_tuning.py`.
 Here is a multi-device command example for [google/flan-t5-base](https://huggingface.co/google/flan-t5-base):
 ```bash
-python3 ../gaudi_spawn.py --world_size 8 --use_mpi run_multitask_prompt_tuning.py \
+PT_HPU_LAZY_MODE=1 python3 ../gaudi_spawn.py --world_size 8 --use_mpi run_multitask_prompt_tuning.py \
     --model_name_or_path google/flan-t5-base \
     --do_train \
     --report_to=none \
@@ -548,7 +548,7 @@ python3 ../gaudi_spawn.py --world_size 8 --use_mpi run_multitask_prompt_tuning.p
 To run poly seq2seq finetuning, you can use `peft_poly_seq2seq_with_generate.py`.
 Here is a multi-device command example for [google/flan-t5-xl](https://huggingface.co/google/flan-t5-xl):
 ```bash
-python3 ../gaudi_spawn.py --world_size 8 --use_mpi peft_poly_seq2seq_with_generate.py \
+PT_HPU_LAZY_MODE=1 python3 ../gaudi_spawn.py --world_size 8 --use_mpi peft_poly_seq2seq_with_generate.py \
     --model_name_or_path google/flan-t5-xl \
     --do_train \
     --report_to=none \
@@ -578,7 +578,7 @@ We have added support for [Deepspeed Ulysses](https://github.com/microsoft/DeepS
 
 ```bash
 HL_DS_DISTRIBUTED_ATTENTION_SEQ_DIM=1   \
-python3 ../gaudi_spawn.py  \
+PT_HPU_LAZY_MODE=1 python3 ../gaudi_spawn.py  \
         --world_size 8  --use_deepspeed run_lora_clm.py \
         --model_name_or_path meta-llama/Llama-3.1-8B \
         --dataset_name tatsu-lab/alpaca \
@@ -622,7 +622,7 @@ To use the streaming dataset mode which can be very useful for large datasets, a
 
 For example:
 ```bash
-python run_clm.py \
+PT_HPU_LAZY_MODE=1 python run_clm.py \
     --model_name_or_path gpt2 \
     --dataset_name wikitext \
     --dataset_config_name wikitext-2-raw-v1 \
@@ -646,7 +646,7 @@ python run_clm.py \
 When training a model from scratch, configuration values may be overridden with the help of `--config_overrides`:
 
 ```bash
-python run_clm.py \
+PT_HPU_LAZY_MODE=1 python run_clm.py \
     --model_type gpt2 \
     --tokenizer_name gpt2 \
     --config_overrides="n_embd=1024,n_head=16,n_layer=48,n_positions=1024" \

@@ -21,7 +21,7 @@ This folder contains an example script which demonstrates the usage of DETR to r
 ## Single-HPU inference
 
 ```bash
-python3 run_example.py \
+PT_HPU_LAZY_MODE=1 python3 run_example.py \
 	--model_name_or_path facebook/detr-resnet-101 \
 	--image_path "http://images.cocodataset.org/val2017/000000039769.jpg" \
 	--use_hpu_graphs \

@@ -20,7 +20,7 @@ This directory contains two example scripts that demonstrate how to perform obje
 ### ClipSeg Model
 
 ```bash
-python3 run_example.py \
+PT_HPU_LAZY_MODE=1 python3 run_example.py \
     --model_name_or_path "CIDAS/clipseg-rd64-refined" \
     --image_path "http://images.cocodataset.org/val2017/000000039769.jpg" \
     --prompt "cat, remote, blanket" \
@@ -34,7 +34,7 @@ python3 run_example.py \
 ### Segment Anything Model
 
 ```bash
-python3 run_example_sam.py \
+PT_HPU_LAZY_MODE=1 python3 run_example_sam.py \
     --model_name_or_path "facebook/sam-vit-huge" \
     --image_path "https://huggingface.co/ybelkada/segment-anything/resolve/main/assets/car.png" \
     --point_prompt "450,600" \

@@ -36,7 +36,7 @@ Here we show how to fine-tune the [imagenette2-320 dataset](https://huggingface.
 ### Training with HPU graph mode
 
 ```bash
-python train_hpu_graph.py \
+PT_HPU_LAZY_MODE=1 python train_hpu_graph.py \
     --data-dir ./ \
     --dataset hfds/johnowhitaker/imagenette2-320 \
     --device 'hpu' \
@@ -53,7 +53,7 @@ Here we show how to fine-tune the [imagenette2-320 dataset](https://huggingface.
 ### Training with HPU graph mode
 
 ```bash
-torchrun --nnodes 1 --nproc_per_node 2 \
+PT_HPU_LAZY_MODE=1 torchrun --nnodes 1 --nproc_per_node 2 \
     train_hpu_graph.py \
     --data-dir ./ \
     --dataset hfds/johnowhitaker/imagenette2-320 \
@@ -71,7 +71,7 @@ Here we show how to fine-tune the [imagenette2-320 dataset](https://huggingface.
 
 ### HPU with graph mode
 ```bash
-python inference.py \
+PT_HPU_LAZY_MODE=1 python inference.py \
     --data-dir='./' \
     --dataset hfds/johnowhitaker/imagenette2-320 \
     --device='hpu' \

@@ -197,7 +197,7 @@ To run only inference, you can start from the commands above and you just have t
 
 For instance, you can run inference with Wav2Vec2 on the Librispeech dataset on 1 Gaudi card with the following command:
 ```bash
-python run_speech_recognition_ctc.py \
+PT_HPU_LAZY_MODE=1 python run_speech_recognition_ctc.py \
     --dataset_name="librispeech_asr" \
     --model_name_or_path="facebook/wav2vec2-large-lv60" \
     --dataset_config_name="clean" \