diff --git a/examples/audio-classification/README.md b/examples/audio-classification/README.md
index c8dd7b126c..40d5434ddc 100644
--- a/examples/audio-classification/README.md
+++ b/examples/audio-classification/README.md
@@ -35,7 +35,7 @@ pip install -r requirements.txt
 The following command shows how to fine-tune [wav2vec2-base](https://huggingface.co/facebook/wav2vec2-base) on the 🗣️ [Keyword Spotting subset](https://huggingface.co/datasets/superb#ks) of the SUPERB dataset on a single HPU.
 
 ```bash
-python run_audio_classification.py \
+PT_HPU_LAZY_MODE=1 python run_audio_classification.py \
     --model_name_or_path facebook/wav2vec2-base \
     --dataset_name superb \
     --dataset_config_name ks \
@@ -118,7 +118,7 @@ To run only inference, you can start from the commands above and you just have t
 
 For instance, you can run inference with Wav2Vec2 on the Keyword Spotting subset on 1 Gaudi card with the following command:
 ```bash
-python run_audio_classification.py \
+PT_HPU_LAZY_MODE=1 python run_audio_classification.py \
     --model_name_or_path facebook/wav2vec2-base \
     --dataset_name superb \
     --dataset_config_name ks \
diff --git a/examples/contrastive-image-text/README.md b/examples/contrastive-image-text/README.md
index def6d74ec0..63378e3aea 100644
--- a/examples/contrastive-image-text/README.md
+++ b/examples/contrastive-image-text/README.md
@@ -204,7 +204,7 @@ To run only inference, you can start from the commands above and you just have t
 
 For instance, you can run inference with CLIP on COCO on 1 Gaudi card with the following command:
 ```bash
-python run_clip.py \
+PT_HPU_LAZY_MODE=1 python run_clip.py \
     --output_dir ./clip-roberta-finetuned \
     --model_name_or_path ./clip-roberta \
     --data_dir $PWD/data \
diff --git a/examples/image-classification/README.md b/examples/image-classification/README.md
index 01b19b25ba..58352650cc 100644
--- a/examples/image-classification/README.md
+++ b/examples/image-classification/README.md
@@ -312,7 +312,7 @@ This directory contains an example script that demonstrates using FastViT with g
 ### Single-HPU inference
 
 ```bash
-python3 run_timm_example.py \
+PT_HPU_LAZY_MODE=1 python3 run_timm_example.py \
     --model_name_or_path "timm/fastvit_t8.apple_in1k" \
     --image_path "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png" \
     --warmup 3 \
diff --git a/examples/language-modeling/README.md b/examples/language-modeling/README.md
index 5cce1528dc..138f4e8b2a 100644
--- a/examples/language-modeling/README.md
+++ b/examples/language-modeling/README.md
@@ -37,7 +37,7 @@ The following examples fine-tune GPT-2, GPT-J-6B and GPT-NeoX-20B on WikiText-2.
 ### Single-card Training (GPT2)
 
 ```bash
-python run_clm.py \
+PT_HPU_LAZY_MODE=1 python run_clm.py \
     --model_name_or_path gpt2 \
     --dataset_name wikitext \
     --dataset_config_name wikitext-2-raw-v1 \
@@ -59,7 +59,7 @@ a perplexity of about 20.9963 once fine-tuned on the dataset.
 To run on your own training and validation files, use the following command:
 
 ```bash
-python run_clm.py \
+PT_HPU_LAZY_MODE=1 python run_clm.py \
     --model_name_or_path gpt2 \
     --train_file path_to_train_file \
     --validation_file path_to_validation_file \
@@ -175,7 +175,7 @@ converge slightly slower (over-fitting takes more epochs).
 ### Multi-card Training
 
 ```bash
-python ../gaudi_spawn.py \
+PT_HPU_LAZY_MODE=1 python ../gaudi_spawn.py \
     --world_size 8 --use_mpi run_mlm.py \
     --model_name_or_path roberta-base \
     --dataset_name wikitext \
@@ -211,7 +211,7 @@ You can easily train a model from scratch by replacing `--model_name_or_path my_
 
 For example with GPT2:
 ```bash
-python run_clm.py \
+PT_HPU_LAZY_MODE=1 python run_clm.py \
     --config_name gpt2 \
     --tokenizer_name gpt2 \
     --dataset_name wikitext \
@@ -235,7 +235,7 @@ To run only inference, you can start from the commands above and you just have t
 
 For instance, you can run inference with GPT2 on the Wikitext dataset on 1 Gaudi card with the following command:
 ```bash
-python run_clm.py \
+PT_HPU_LAZY_MODE=1 python run_clm.py \
     --model_name_or_path gpt2 \
     --dataset_name wikitext \
     --dataset_config_name wikitext-2-raw-v1 \
@@ -321,7 +321,7 @@ python ../gaudi_spawn.py \
 
 - Multi-card finetuning of Falcon-40B:
 ```bash
-PT_HPU_AUTOCAST_LOWER_PRECISION_OPS_LIST=ops_bf16.txt python3 ../gaudi_spawn.py \
+PT_HPU_AUTOCAST_LOWER_PRECISION_OPS_LIST=ops_bf16.txt PT_HPU_LAZY_MODE=1 python3 ../gaudi_spawn.py \
     --world_size 8 --use_mpi run_lora_clm.py \
     --model_name_or_path tiiuae/falcon-40b \
     --dataset_name timdettmers/openassistant-guanaco \
@@ -361,8 +361,8 @@ PT_HPU_AUTOCAST_LOWER_PRECISION_OPS_LIST=ops_bf16.txt python3 ../gaudi_spawn.py
   > The following command requires Habana DeepSpeed 1.13.0 or later.
 
 ```bash
-PT_HPU_MAX_COMPOUND_OP_SIZE=10 \
-python3 ../gaudi_spawn.py --use_deepspeed  --world_size 8  run_lora_clm.py \
+PT_HPU_MAX_COMPOUND_OP_SIZE=10 PT_HPU_LAZY_MODE=1 \
+python3 ../gaudi_spawn.py --use_deepspeed --world_size 8 run_lora_clm.py \
   --model_name_or_path meta-llama/Llama-2-70b-hf \
   --deepspeed llama2_ds_zero3_config.json \
   --dataset_name tatsu-lab/alpaca \
@@ -445,7 +445,7 @@ Default `peft_type` is `lora`, you could enable adalora or ia3 using `--peft_typ
 To run on your own training and validation files, use the following command:
 
 ```bash
-python run_lora_clm.py \
+PT_HPU_LAZY_MODE=1 python run_lora_clm.py \
     --model_name_or_path bigcode/starcoder \
     --train_file path_to_train_file \
     --validation_file path_to_validation_file \
@@ -488,7 +488,7 @@ To run prompt tuning finetuning, you can use `run_prompt_tuning_clm.py`.
 Here are single-card command examples for Llama2-7B:
 - single-card finetuning of meta-llama/Llama-2-7b-hf with dataset "ought/raft" and config "twitter_complaints":
 ```bash
-python3 run_prompt_tuning_clm.py \
+PT_HPU_LAZY_MODE=1 python3 run_prompt_tuning_clm.py \
     --model_name_or_path meta-llama/Llama-2-7b-hf \
     --output_dir prompt_tuning_out \
     --bf16 True \
@@ -526,7 +526,7 @@ python3 ../text-generation/run_generation.py \
 To run multitask prompt seq2seq finetuning, you can use `run_multitask_prompt_tuning.py`.
 Here is a multi-device command example for [google/flan-t5-base](https://huggingface.co/google/flan-t5-base):
 ```bash
-python3 ../gaudi_spawn.py --world_size 8 --use_mpi run_multitask_prompt_tuning.py \
+PT_HPU_LAZY_MODE=1 python3 ../gaudi_spawn.py --world_size 8 --use_mpi run_multitask_prompt_tuning.py \
     --model_name_or_path google/flan-t5-base \
     --do_train \
     --report_to=none \
@@ -548,7 +548,7 @@ python3 ../gaudi_spawn.py --world_size 8 --use_mpi run_multitask_prompt_tuning.p
 To run poly seq2seq finetuning, you can use `peft_poly_seq2seq_with_generate.py`.
 Here is a multi-device command example for [google/flan-t5-xl](https://huggingface.co/google/flan-t5-xl):
 ```bash
-python3 ../gaudi_spawn.py --world_size 8 --use_mpi peft_poly_seq2seq_with_generate.py \
+PT_HPU_LAZY_MODE=1 python3 ../gaudi_spawn.py --world_size 8 --use_mpi peft_poly_seq2seq_with_generate.py \
     --model_name_or_path google/flan-t5-xl \
     --do_train \
     --report_to=none \
@@ -578,7 +578,7 @@ We have added support for [Deepspeed Ulysses](https://github.com/microsoft/DeepS
 
 ```bash
 HL_DS_DISTRIBUTED_ATTENTION_SEQ_DIM=1   \
-python3 ../gaudi_spawn.py  \
+PT_HPU_LAZY_MODE=1 python3 ../gaudi_spawn.py  \
         --world_size 8  --use_deepspeed run_lora_clm.py \
         --model_name_or_path meta-llama/Llama-3.1-8B \
         --dataset_name tatsu-lab/alpaca \
@@ -622,7 +622,7 @@ To use the streaming dataset mode which can be very useful for large datasets, a
 
 For example:
 ```bash
-python run_clm.py \
+PT_HPU_LAZY_MODE=1 python run_clm.py \
     --model_name_or_path gpt2 \
     --dataset_name wikitext \
     --dataset_config_name wikitext-2-raw-v1 \
@@ -646,7 +646,7 @@ python run_clm.py \
 When training a model from scratch, configuration values may be overridden with the help of `--config_overrides`:
 
 ```bash
-python run_clm.py \
+PT_HPU_LAZY_MODE=1 python run_clm.py \
     --model_type gpt2 \
     --tokenizer_name gpt2 \
     --config_overrides="n_embd=1024,n_head=16,n_layer=48,n_positions=1024" \
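Review note on the language-modeling hunks above: several of them put `PT_HPU_LAZY_MODE=1` on a continued line next to another assignment instead of directly before `python`. The sketch below is a minimal illustration of why both placements should behave the same in a POSIX shell, and of the fact that the prefix applies to that one command only. `DEMO_LAZY_MODE` and `DEMO_COMPOUND_OP_SIZE` are hypothetical stand-in names, and `sh -c` stands in for the real training script. It does not show how `gaudi_spawn.py` forwards the environment to its workers.

```bash
# Stand-in for a training script: print what the child process sees.
# DEMO_LAZY_MODE and DEMO_COMPOUND_OP_SIZE are hypothetical names for this demo only.
show_env='echo "lazy=${DEMO_LAZY_MODE:-unset} size=${DEMO_COMPOUND_OP_SIZE:-unset}"'

# Start from a clean state so the result does not depend on the caller's shell.
unset DEMO_LAZY_MODE DEMO_COMPOUND_OP_SIZE

# Assignments on a continued line still prefix the command that follows.
prefixed=$(DEMO_COMPOUND_OP_SIZE=10 DEMO_LAZY_MODE=1 \
sh -c "$show_env")

# The prefix is scoped to that one command; the next command does not inherit it.
after=$(sh -c "$show_env")

echo "$prefixed"
echo "$after"
```

If the variable should apply to every command in a session, `export` would be the usual alternative. The per-command prefix used in this diff keeps each README snippet self-contained when copied on its own.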
diff --git a/examples/object-detection/README.md b/examples/object-detection/README.md
index 0ce639dc9b..8060d0434b 100644
--- a/examples/object-detection/README.md
+++ b/examples/object-detection/README.md
@@ -21,7 +21,7 @@ This folder contains an example script which demonstrates the usage of DETR to r
 ## Single-HPU inference
 
 ```bash
-python3 run_example.py \
+PT_HPU_LAZY_MODE=1 python3 run_example.py \
 	--model_name_or_path facebook/detr-resnet-101 \
 	--image_path "http://images.cocodataset.org/val2017/000000039769.jpg" \
 	--use_hpu_graphs \
diff --git a/examples/object-segementation/README.md b/examples/object-segementation/README.md
index 2b8728eb56..99c1d89657 100644
--- a/examples/object-segementation/README.md
+++ b/examples/object-segementation/README.md
@@ -20,7 +20,7 @@ This directory contains two example scripts that demonstrate how to perform obje
 ### ClipSeg Model
 
 ```bash
-python3 run_example.py \
+PT_HPU_LAZY_MODE=1 python3 run_example.py \
     --model_name_or_path "CIDAS/clipseg-rd64-refined" \
     --image_path "http://images.cocodataset.org/val2017/000000039769.jpg" \
     --prompt "cat, remote, blanket" \
@@ -34,7 +34,7 @@ python3 run_example.py \
 ### Segment Anything Model
 
 ```bash
-python3 run_example_sam.py \
+PT_HPU_LAZY_MODE=1 python3 run_example_sam.py \
     --model_name_or_path "facebook/sam-vit-huge" \
     --image_path "https://huggingface.co/ybelkada/segment-anything/resolve/main/assets/car.png" \
     --point_prompt "450,600" \
diff --git a/examples/pytorch-image-models/README.md b/examples/pytorch-image-models/README.md
index 731e61d612..392a35d3b1 100644
--- a/examples/pytorch-image-models/README.md
+++ b/examples/pytorch-image-models/README.md
@@ -36,7 +36,7 @@ Here we show how to fine-tune the [imagenette2-320 dataset](https://huggingface.
 ### Training with HPU graph mode
 
 ```bash
-python train_hpu_graph.py \
+PT_HPU_LAZY_MODE=1 python train_hpu_graph.py \
     --data-dir ./ \
     --dataset hfds/johnowhitaker/imagenette2-320 \
     --device 'hpu' \
@@ -53,7 +53,7 @@ Here we show how to fine-tune the [imagenette2-320 dataset](https://huggingface.
 ### Training with HPU graph mode
 
 ```bash
-torchrun --nnodes 1 --nproc_per_node 2 \
+PT_HPU_LAZY_MODE=1 torchrun --nnodes 1 --nproc_per_node 2 \
     train_hpu_graph.py \
     --data-dir ./ \
     --dataset hfds/johnowhitaker/imagenette2-320 \
@@ -71,7 +71,7 @@ Here we show how to fine-tune the [imagenette2-320 dataset](https://huggingface.
 
 ### HPU with graph mode
 ```bash
-python inference.py \
+PT_HPU_LAZY_MODE=1 python inference.py \
     --data-dir='./' \
     --dataset hfds/johnowhitaker/imagenette2-320 \
     --device='hpu' \
diff --git a/examples/speech-recognition/README.md b/examples/speech-recognition/README.md
index 1f0f8fbe38..342e98d7da 100644
--- a/examples/speech-recognition/README.md
+++ b/examples/speech-recognition/README.md
@@ -197,7 +197,7 @@ To run only inference, you can start from the commands above and you just have t
 
 For instance, you can run inference with Wav2Vec2 on the Librispeech dataset on 1 Gaudi card with the following command:
 ```bash
-python run_speech_recognition_ctc.py \
+PT_HPU_LAZY_MODE=1 python run_speech_recognition_ctc.py \
     --dataset_name="librispeech_asr" \
     --model_name_or_path="facebook/wav2vec2-large-lv60" \
     --dataset_config_name="clean" \
diff --git a/examples/stable-diffusion/README.md b/examples/stable-diffusion/README.md
index 9919780543..a806388c0f 100644
--- a/examples/stable-diffusion/README.md
+++ b/examples/stable-diffusion/README.md
@@ -35,7 +35,7 @@ pip install -r requirements.txt
 Here's how to generate images using the Stable Diffusion 1.4 model with a single prompt:
 
 ```bash
-python text_to_image_generation.py \
+PT_HPU_LAZY_MODE=1 python text_to_image_generation.py \
     --model_name_or_path CompVis/stable-diffusion-v1-4 \
     --prompts "An image of a squirrel in Picasso style" \
     --num_images_per_prompt 28 \
@@ -56,7 +56,7 @@ python text_to_image_generation.py \
 To generate images with multiple prompts, simply include two prompts in your input as shown below:
 
 ```bash
-python text_to_image_generation.py \
+PT_HPU_LAZY_MODE=1 python text_to_image_generation.py \
     --model_name_or_path CompVis/stable-diffusion-v1-4 \
     --prompts "An image of a squirrel in Picasso style" "A shiny flying horse taking off" \
     --num_images_per_prompt 32 \
@@ -101,7 +101,7 @@ You can run other older Stable Diffusion models in a similar manner. For example
 to generate images with this script. Here is an example demonstrating image generation with a single prompt:
 
 ```bash
-python text_to_image_generation.py \
+PT_HPU_LAZY_MODE=1 python text_to_image_generation.py \
     --model_name_or_path stabilityai/stable-diffusion-2-1 \
     --prompts "An image of a squirrel in Picasso style" \
     --num_images_per_prompt 28 \
@@ -130,7 +130,7 @@ to generate RGBD images from text prompts.
 are open source. A [demo](https://huggingface.co/spaces/Intel/ldm3d) is also available. Here is how to run this model:
 
 ```bash
-python text_to_image_generation.py \
+PT_HPU_LAZY_MODE=1 python text_to_image_generation.py \
     --model_name_or_path "Intel/ldm3d-4c" \
     --prompts "An image of a squirrel in Picasso style" \
     --num_images_per_prompt 28 \
@@ -176,7 +176,7 @@ by the Stability AI team.
 Here is how to generate SDXL images with a single prompt:
 
 ```bash
-python text_to_image_generation.py \
+PT_HPU_LAZY_MODE=1 python text_to_image_generation.py \
     --model_name_or_path stabilityai/stable-diffusion-xl-base-1.0 \
     --prompts "Sailing ship painting by Van Gogh" \
     --num_images_per_prompt 28 \
@@ -199,7 +199,7 @@ python text_to_image_generation.py \
 SDXL integrates a second text encoder (OpenCLIP ViT-bigG/14), alongside the original Stable Diffusion text encoder. This addition significantly increases the number of parameters, enabling more detailed and descriptive prompts. Below is an example of how to generate images using multiple prompts for both `prompt` (primary text encoder) and `prompt_2` (secondary text encoder), along with their respective negative prompts:
 
 ```bash
-python text_to_image_generation.py \
+PT_HPU_LAZY_MODE=1 python text_to_image_generation.py \
     --model_name_or_path stabilityai/stable-diffusion-xl-base-1.0 \
     --prompts "Sailing ship painting by Van Gogh" "A shiny flying horse taking off" \
     --prompts_2 "Red tone" "Blue tone" \
@@ -243,7 +243,7 @@ inference in mixed FP8 precision.
 Here is how to generate SDXL images with optimized pipeline in FP8 precision:
 ```bash
 QUANT_CONFIG=quantization/stable-diffusion-xl/quantize_config.json \
-python text_to_image_generation.py \
+PT_HPU_LAZY_MODE=1 python text_to_image_generation.py \
     --model_name_or_path stabilityai/stable-diffusion-xl-base-1.0 \
     --prompts "Sailing ship painting by Van Gogh" \
     --num_images_per_prompt 28 \
@@ -267,7 +267,7 @@ optimized for real-time synthesis.
 Here is how to generate images with multiple prompts:
 
 ```bash
-python text_to_image_generation.py \
+PT_HPU_LAZY_MODE=1 python text_to_image_generation.py \
     --model_name_or_path stabilityai/sdxl-turbo \
     --prompts "Sailing ship painting by Van Gogh" "A shiny flying horse taking off" \
     --num_images_per_prompt 32 \
@@ -305,6 +305,7 @@ huggingface-cli login
 Here is how to generate SD3 images with a single prompt:
 
 ```bash
+PT_HPU_MAX_COMPOUND_OP_SIZE=1 PT_HPU_LAZY_MODE=1 \
 python text_to_image_generation.py \
     --model_name_or_path stabilityai/stable-diffusion-3-medium-diffusers \
     --prompts "Sailing ship painting by Van Gogh" \
@@ -369,7 +370,7 @@ FLUX.1 was introduced by Black Forest Labs [here](https://blackforestlabs.ai/ann
 Here is how to run FLUX.1-schnell model (distilled fast version of FLUX.1):
 
 ```bash
-python text_to_image_generation.py \
+PT_HPU_LAZY_MODE=1 python text_to_image_generation.py \
     --model_name_or_path black-forest-labs/FLUX.1-schnell \
     --prompts "A cat holding a sign that says hello world" \
     --num_images_per_prompt 10 \
@@ -396,7 +397,7 @@ huggingface-cli login
 Here is how to run FLUX.1-dev model:
 
 ```bash
-python text_to_image_generation.py \
+PT_HPU_LAZY_MODE=1 python text_to_image_generation.py \
     --model_name_or_path black-forest-labs/FLUX.1-dev \
     --prompts "A cat holding a sign that says hello world" \
     --num_images_per_prompt 10 \
@@ -416,7 +417,7 @@ This model can also be quantized with some ops running in FP8 precision.
 Before quantization, run stats collection using measure mode:
 
 ```bash
-QUANT_CONFIG=quantization/flux/measure_config.json \
+QUANT_CONFIG=quantization/flux/measure_config.json PT_HPU_LAZY_MODE=1 \
 python text_to_image_generation.py \
     --model_name_or_path black-forest-labs/FLUX.1-dev \
     --prompts "A cat holding a sign that says hello world" \
@@ -436,7 +437,7 @@ python text_to_image_generation.py \
 After stats collection, here is how to run FLUX.1-dev in quantization mode:
 
 ```bash
-QUANT_CONFIG=quantization/flux/quantize_config.json \
+QUANT_CONFIG=quantization/flux/quantize_config.json PT_HPU_LAZY_MODE=1 \
 python text_to_image_generation.py \
     --model_name_or_path black-forest-labs/FLUX.1-dev \
     --prompts "A cat holding a sign that says hello world" \
@@ -462,7 +463,7 @@ by Lvmin Zhang and Maneesh Agrawala, enables conditioning the Stable Diffusion m
 Here is how to generate images conditioned by Canny edge model:
 
 ```bash
-python text_to_image_generation.py \
+PT_HPU_LAZY_MODE=1 python text_to_image_generation.py \
     --model_name_or_path CompVis/stable-diffusion-v1-4 \
     --controlnet_model_name_or_path lllyasviel/sd-controlnet-canny \
     --prompts "futuristic-looking woman" \
@@ -481,7 +482,8 @@ The ControlNet example can be run with multiple prompts by supplying more than o
 Additionally, it supports distributed execution. Below is an example of generating images conditioned by the Canny edge model using two prompts on two HPUs:
 
 ```bash
-python ../gaudi_spawn.py --world_size 2 text_to_image_generation.py \
+PT_HPU_LAZY_MODE=1 python ../gaudi_spawn.py \
+    --world_size 2 text_to_image_generation.py \
     --model_name_or_path CompVis/stable-diffusion-v1-4 \
     --controlnet_model_name_or_path lllyasviel/sd-controlnet-canny \
     --prompts "futuristic-looking woman" "a rusty robot" \
@@ -507,7 +509,7 @@ please refer to [Hugging Face Diffusers doc](https://huggingface.co/docs/diffuse
 ### Stable Diffusion Inpainting
 
 ```bash
-python text_to_image_generation.py \
+PT_HPU_LAZY_MODE=1 python text_to_image_generation.py \
     --model_name_or_path  stabilityai/stable-diffusion-2-inpainting \
     --base_image https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png \
     --mask_image https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png \
@@ -526,8 +528,8 @@ python text_to_image_generation.py \
 ### Stable Diffusion XL Inpainting
 
 ```bash
-python text_to_image_generation.py \
+PT_HPU_LAZY_MODE=1 python text_to_image_generation.py \
     --model_name_or_path  diffusers/stable-diffusion-xl-1.0-inpainting-0.1 \
     --base_image https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png \
     --mask_image https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint_mask.png \
     --prompts "concept art digital painting of an elven castle, inspired by lord of the rings, highly detailed, 8k" \
@@ -639,7 +641,7 @@ Images can also be generated using initial input images to guide the diffusion-b
 Here is how to generate images using a single prompt and an input image with the `timbrooks/instruct-pix2pix` model, which is based on Stable Diffusion:
 
 ```bash
-python image_to_image_generation.py \
+PT_HPU_LAZY_MODE=1 python image_to_image_generation.py \
     --model_name_or_path "timbrooks/instruct-pix2pix" \
     --src_image_path "https://raw.githubusercontent.com/timothybrooks/instruct-pix2pix/main/imgs/example.jpg" \
     --prompts "turn him into cyborg" \
@@ -666,7 +668,7 @@ python image_to_image_generation.py \
 Here is how to refine SDXL images using a single image and prompt:
 
 ```bash
-python image_to_image_generation.py \
+PT_HPU_LAZY_MODE=1 python image_to_image_generation.py \
     --model_name_or_path "stabilityai/stable-diffusion-xl-refiner-1.0" \
     --src_image_path "https://raw.githubusercontent.com/timothybrooks/instruct-pix2pix/main/imgs/example.jpg" \
     --prompts "turn him into cyborg" \
@@ -687,7 +689,7 @@ python image_to_image_generation.py \
 Here is how to generate a FLUX.1 image using a single input image and prompt:
 
 ```bash
-python image_to_image_generation.py \
+PT_HPU_LAZY_MODE=1 python image_to_image_generation.py \
     --model_name_or_path "black-forest-labs/FLUX.1-dev" \
     --src_image_path "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png" \
     --prompts "cat wizard, gandalf, lord of the rings, detailed, fantasy, cute, adorable, Pixar, Disney, 8k" \
@@ -709,7 +711,7 @@ python image_to_image_generation.py \
 Here is how to generate image variations of a single image (without any input prompts):
 
 ```bash
-python image_to_image_generation.py \
+PT_HPU_LAZY_MODE=1 python image_to_image_generation.py \
     --model_name_or_path "lambdalabs/sd-image-variations-diffusers" \
     --src_image_path "https://github.com/SHI-Labs/Versatile-Diffusion/blob/master/assets/demo/reg_example/ghibli.jpg?raw=true" \
     --num_images_per_prompt 20 \
@@ -728,7 +730,7 @@ python image_to_image_generation.py \
 Here is an example of performing depth-guided image generation:
 
 ```bash
-python depth_to_image_generation.py \
+PT_HPU_LAZY_MODE=1 python depth_to_image_generation.py \
     --model_name_or_path "stabilityai/stable-diffusion-2-depth" \
     --prompts "two tigers" \
     --base_image "http://images.cocodataset.org/val2017/000000039769.jpg" \
@@ -768,7 +770,7 @@ Here is how to generate video with one image prompt:
 
 ```bash
 PT_HPU_MAX_COMPOUND_OP_SIZE=1 \
-python image_to_video_generation.py \
+PT_HPU_LAZY_MODE=1 python image_to_video_generation.py \
     --model_name_or_path "stabilityai/stable-video-diffusion-img2vid-xt" \
     --image_path "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/svd/rocket.png" \
     --num_videos_per_prompt 1 \
@@ -791,7 +793,7 @@ Here is how to generate videos with several image prompts:
 
 ```bash
 PT_HPU_MAX_COMPOUND_OP_SIZE=1 \
-python image_to_video_generation.py \
+PT_HPU_LAZY_MODE=1 python image_to_video_generation.py \
     --model_name_or_path "stabilityai/stable-video-diffusion-img2vid-xt" \
     --image_path \
         "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/svd/rocket.png" \
@@ -816,8 +818,8 @@ python image_to_video_generation.py \
 
 Here is how to generate video conditioned by depth:
 
 ```bash
-python image_to_video_generation.py \
+PT_HPU_LAZY_MODE=1 python image_to_video_generation.py \
     --model_name_or_path "stabilityai/stable-video-diffusion-img2vid" \
     --controlnet_model_name_or_path "CiaraRowles/temporal-controlnet-depth-svd-v1" \
     --control_image_path \
diff --git a/examples/summarization/README.md b/examples/summarization/README.md
index bdaef78edf..9e1e7820d7 100644
--- a/examples/summarization/README.md
+++ b/examples/summarization/README.md
@@ -35,7 +35,7 @@ pip install -r requirements.txt
 Here is an example of a summarization task with T5:
 
 ```bash
-python run_summarization.py \
+PT_HPU_LAZY_MODE=1 python run_summarization.py \
     --model_name_or_path t5-small \
     --do_train \
     --do_eval \
@@ -68,7 +68,7 @@ And here is how you would use it on your own files, after adjusting the values f
 `--train_file`, `--validation_file`, `--text_column` and `--summary_column` to match your setup:
 
 ```bash
-python run_summarization.py \
+PT_HPU_LAZY_MODE=1 python run_summarization.py \
     --model_name_or_path t5-small \
     --do_train \
     --do_eval \
@@ -189,7 +189,7 @@ To run only inference, you can start from the commands above and you just have t
 
 For instance, you can run inference with T5 on the CNN-DailyMail dataset on 1 Gaudi card with the following command:
 ```bash
-python run_summarization.py \
+PT_HPU_LAZY_MODE=1 python run_summarization.py \
     --model_name_or_path t5-small \
     --do_eval \
     --dataset_name cnn_dailymail \
diff --git a/examples/table-detection/README.md b/examples/table-detection/README.md
index b7bbef51c2..8577c766d4 100644
--- a/examples/table-detection/README.md
+++ b/examples/table-detection/README.md
@@ -28,7 +28,7 @@ pip install -r requirements.txt
 ## Single HPU Inference
 
 ```bash
-python run_example.py \
+PT_HPU_LAZY_MODE=1 python run_example.py \
     --model_name_or_path microsoft/table-transformer-detection \
     --dataset_name nielsr/example-pdf \
     --filename example_pdf.png \
diff --git a/examples/text-classification/README.md b/examples/text-classification/README.md
index 9ffc78ae43..5e18b16495 100644
--- a/examples/text-classification/README.md
+++ b/examples/text-classification/README.md
@@ -45,7 +45,7 @@ For the following cases, an example of a Gaudi configuration file is given
 The following example fine-tunes BERT Large (lazy mode) on the `mrpc` dataset hosted on our [hub](https://huggingface.co/datasets):
 
 ```bash
-python run_glue.py \
+PT_HPU_LAZY_MODE=1 python run_glue.py \
   --model_name_or_path bert-large-uncased-whole-word-masking \
   --gaudi_config_name Habana/bert-large-uncased-whole-word-masking \
   --task_name mrpc \
@@ -152,7 +152,7 @@ To run only inference, you can start from the commands above and you just have t
 
 For instance, you can run inference with BERT on GLUE on 1 Gaudi card with the following command:
 ```bash
-python run_glue.py \
+PT_HPU_LAZY_MODE=1 python run_glue.py \
   --model_name_or_path bert-large-uncased-whole-word-masking \
   --gaudi_config_name Habana/bert-large-uncased-whole-word-masking \
   --task_name mrpc \
@@ -176,7 +176,7 @@ Llama Guard can be used for text classification. The Transformers library will c
 Llama Guard can be fine-tuned with DeepSpeed, here is how you would do it on the text classification MRPC task using DeepSpeed with 8 HPUs:
 
 ```bash
-python ../gaudi_spawn.py \
+PT_HPU_LAZY_MODE=1 python ../gaudi_spawn.py \
     --world_size 8 --use_deepspeed run_glue.py \
     --model_name_or_path meta-llama/LlamaGuard-7b \
     --gaudi_config Habana/llama \
@@ -207,7 +207,7 @@ You can look at the [documentation](https://huggingface.co/docs/optimum/habana/u
 You can run inference with Llama Guard on GLUE on 1 Gaudi card with the following command:
 
 ```bash
-python run_glue.py \
+PT_HPU_LAZY_MODE=1 python run_glue.py \
   --model_name_or_path meta-llama/LlamaGuard-7b \
   --gaudi_config Habana/llama \
   --task_name mrpc \
diff --git a/examples/text-feature-extraction/README.md b/examples/text-feature-extraction/README.md
index e46168840b..ec835e8d8f 100644
--- a/examples/text-feature-extraction/README.md
+++ b/examples/text-feature-extraction/README.md
@@ -21,7 +21,7 @@ This directory contains a script that showcases how to use text embedding models
 ## Single-HPU inference
 
 ```bash
-python run_feature_extraction.py \
+PT_HPU_LAZY_MODE=1 python run_feature_extraction.py \
     --model_name_or_path Supabase/gte-small \
     --source_sentence "What is a deep learning architecture for feature extraction?" \
     --input_texts "There are many different variants of apples created every year." \
diff --git a/examples/text-generation/README.md b/examples/text-generation/README.md
index fb9bf9b0a8..a53c0613a1 100755
--- a/examples/text-generation/README.md
+++ b/examples/text-generation/README.md
@@ -73,7 +73,7 @@ python run_generation.py --help
 If you want to generate a sequence of text from a prompt of your choice, you should use the `--prompt` argument.
 For example:
 ```
-python run_generation.py \
+PT_HPU_LAZY_MODE=1 python run_generation.py \
 --model_name_or_path gpt2 \
 --use_hpu_graphs \
 --use_kv_cache \
@@ -85,7 +85,7 @@ python run_generation.py \
 
 If you want to provide several prompts as inputs, here is how to do it:
 ```
-python run_generation.py \
+PT_HPU_LAZY_MODE=1 python run_generation.py \
 --model_name_or_path gpt2 \
 --use_hpu_graphs \
 --use_kv_cache \
@@ -103,7 +103,7 @@ python run_generation.py \
 If you want to generate a sequence of text from a prompt of your choice using assisted decoding, you can use the following command as an example:
 
 ```
-python run_generation.py \
+PT_HPU_LAZY_MODE=1 python run_generation.py \
 --model_name_or_path gpt2 \
 --assistant_model distilgpt2 \
 --batch_size 1 \
@@ -163,7 +163,7 @@ python ../gaudi_spawn.py --use_deepspeed --world_size 8 run_generation.py \
 
 To run Falcon-7B inference, use the following command:
 ```bash
-python run_generation.py \
+PT_HPU_LAZY_MODE=1 python run_generation.py \
  --model_name_or_path tiiuae/falcon-7b \
  --bf16 \
  --use_hpu_graphs \
@@ -244,7 +244,7 @@ By default, the first column in the dataset of type `string` will be used as pro
 
 Here is an example with [JulesBelveze/tldr_news](https://huggingface.co/datasets/JulesBelveze/tldr_news):
 ```bash
-python run_generation.py \
+PT_HPU_LAZY_MODE=1 python run_generation.py \
 --model_name_or_path gpt2 \
 --batch_size 2 \
 --max_new_tokens 100 \
@@ -265,7 +265,7 @@ You can also provide the path to a PEFT model to perform generation with the arg
 
 For example:
 ```bash
-python run_generation.py \
+PT_HPU_LAZY_MODE=1 python run_generation.py \
 --model_name_or_path meta-llama/Llama-2-7b-hf \
 --use_hpu_graphs \
 --use_kv_cache \
@@ -285,7 +285,7 @@ With `--bucket_size`, instead of padding up the kv-cache up to full size before
 
 Here is an example:
 ```bash
-python run_generation.py \
+PT_HPU_LAZY_MODE=1 python run_generation.py \
 --model_name_or_path path_to_model    \
 --use_hpu_graphs \
 --use_kv_cache \
@@ -308,7 +308,7 @@ While `--bucket_size` works for any model without model file changes, an even mo
 
 Here is an example:
 ```bash
-python run_generation.py \
+PT_HPU_LAZY_MODE=1 python run_generation.py \
 --model_name_or_path Qwen/Qwen2-7b-Instruct \
 --use_hpu_graphs \
 --use_kv_cache \
@@ -432,7 +432,7 @@ QUANT_CONFIG=./quantization_config/maxabs_quant.json python ../gaudi_spawn.py \
 
 Here is an example to measure the tensor quantization statistics on Mixtral-8x7B with 1 card:
 ```bash
-QUANT_CONFIG=./quantization_config/maxabs_measure.json python run_generation.py \
+QUANT_CONFIG=./quantization_config/maxabs_measure.json PT_HPU_LAZY_MODE=1 python run_generation.py \
 --model_name_or_path mistralai/Mixtral-8x7B-v0.1 \
 --use_hpu_graphs \
 --use_kv_cache \
@@ -445,7 +445,7 @@ QUANT_CONFIG=./quantization_config/maxabs_measure.json python run_generation.py
 
 Here is an example to quantize the model based on previous measurements for Mixtral-8x7B with 1 card:
 ```bash
-QUANT_CONFIG=./quantization_config/maxabs_quant_mixtral.json python run_generation.py \
+QUANT_CONFIG=./quantization_config/maxabs_quant_mixtral.json PT_HPU_LAZY_MODE=1 python run_generation.py \
 --model_name_or_path mistralai/Mixtral-8x7B-v0.1 \
 --use_hpu_graphs \
 --use_kv_cache \
@@ -534,7 +534,7 @@ QUANT_CONFIG=./quantization_config/maxabs_quant.json python ../gaudi_spawn.py \
 Here is an example to measure the tensor quantization statistics on phi-2 with 1 card:
 
 ```bash
-QUANT_CONFIG=./quantization_config/maxabs_measure.json python run_lm_eval.py \
+QUANT_CONFIG=./quantization_config/maxabs_measure.json PT_HPU_LAZY_MODE=1 python run_lm_eval.py \
 -o acc_phi-2_bs1_measure.txt  \
 --model_name_or_path microsoft/phi-2 \
 --use_hpu_graphs \
@@ -548,7 +548,7 @@ QUANT_CONFIG=./quantization_config/maxabs_measure.json python run_lm_eval.py \
 
 Here is an example to quantize the model based on previous measurements for phi-2 with 1 card:
 ```bash
-QUANT_CONFIG=./quantization_config/maxabs_quant_phi.json python run_generation.py \
+QUANT_CONFIG=./quantization_config/maxabs_quant_phi.json PT_HPU_LAZY_MODE=1 python run_generation.py \
 --model_name_or_path microsoft/phi-2 \
 --use_hpu_graphs \
 --use_kv_cache \
@@ -562,7 +562,7 @@ QUANT_CONFIG=./quantization_config/maxabs_quant_phi.json python run_generation.p
 Here is an example to measure the tensor quantization statistics on gemma with 1 card:
 
 ```bash
-QUANT_CONFIG=./quantization_config/maxabs_measure.json python run_generation.py \
+QUANT_CONFIG=./quantization_config/maxabs_measure.json PT_HPU_LAZY_MODE=1 python run_generation.py \
 --model_name_or_path google/gemma-7b \
 --use_hpu_graphs \
 --use_kv_cache \
@@ -575,7 +575,7 @@ QUANT_CONFIG=./quantization_config/maxabs_measure.json python run_generation.py
 
 Here is an example to quantize the model based on previous measurements for gemma with 1 card:
 ```bash
-QUANT_CONFIG=./quantization_config/maxabs_quant_gemma.json python run_generation.py \
+QUANT_CONFIG=./quantization_config/maxabs_quant_gemma.json PT_HPU_LAZY_MODE=1 python run_generation.py \
 --model_name_or_path google/gemma-7b \
 --use_hpu_graphs \
 --use_kv_cache \
@@ -618,7 +618,7 @@ Here is an example of using disk_offload in quantize command.
 Please follow the [Running FP8 models on single device](#running-fp8-models-on-single-device) section first before running the cmd below.
 
 ```bash
-QUANT_CONFIG=./quantization_config/maxabs_quant.json TQDM_DISABLE=1 \
+QUANT_CONFIG=./quantization_config/maxabs_quant.json TQDM_DISABLE=1 PT_HPU_LAZY_MODE=1 \
 python run_generation.py \
 --model_name_or_path meta-llama/Llama-2-70b-hf \
 --attn_softmax_bf16 \
@@ -645,7 +645,7 @@ After quantizing the model, we can save it to a local path.
 
 Here is an example of how to quantize and save the LLama3.1-70B model on two cards:
 ```bash
-QUANT_CONFIG=./quantization_config/maxabs_quant.json python ../gaudi_spawn.py \
+QUANT_CONFIG=./quantization_config/maxabs_quant.json PT_HPU_LAZY_MODE=1 python ../gaudi_spawn.py \
 --use_deepspeed --world_size 2 run_generation.py \
 --model_name_or_path meta-llama/Llama-3.1-70B \
 --attn_softmax_bf16 \
@@ -664,15 +664,40 @@ QUANT_CONFIG=./quantization_config/maxabs_quant.json python ../gaudi_spawn.py \
 --saved_model_path <model_path_on_local_disk>
 ```
 
+
 > [!NOTE]
 > For multi-card usage, the number of cards loaded and used needs to be kept consistent with that when saving.
+
+### Loading FP8 Checkpoints saved in Hugging Face format
+
+You can load pre-quantized FP8 models with the `--load_quantized_model_with_inc` argument. The `model_name_or_path` is the local path where the model was saved by the command above.
+
+Below is an example of loading a model from FP8 checkpoints on 1 card.
+Please note that the model name is denoted as `<model_path_on_local_disk>`.
+```bash
+PT_HPU_LAZY_MODE=1 python run_lm_eval.py \
+-o acc_load_fp8_model.txt \
+--model_name_or_path <model_path_on_local_disk> \
+--use_hpu_graphs \
+--use_kv_cache \
+--trim_logits \
+--batch_size 1 \
+--bf16 \
+--use_flash_attention \
+--flash_attention_recompute \
+--attn_softmax_bf16 \
+--bucket_size=128 \
+--bucket_internal \
+--load_quantized_model_with_inc
+```
+
 
 ### Loading FP8 Checkpoints from Hugging Face
 You can load pre-quantized FP8 models using the `--load_quantized_model_with_inc` argument. The `model_name_or_path` should be a model name from [Neural Magic](https://huggingface.co/collections/neuralmagic/fp8-llms-for-vllm-666742ed2b78b7ac8df13127) or a path to FP8 Checkpoints saved in Hugging Face format.
 
 Below is an example of how to load `neuralmagic/Meta-Llama-3.1-70B-Instruct-FP8` on two cards.
 ```bash
-python ../gaudi_spawn.py \
+PT_HPU_LAZY_MODE=1 python ../gaudi_spawn.py \
 --use_deepspeed --world_size 2 run_lm_eval.py \
 -o acc_load_fp8_model.txt \
 --model_name_or_path neuralmagic/Meta-Llama-3.1-70B-Instruct-FP8 \
@@ -700,7 +725,7 @@ Below is an example to load a model with 4bit checkpoints from Hugging Face.
 Please note that model name is denoted as `<model_path_in_hugging_face>`.
 
 ```bash
-python run_lm_eval.py \
+PT_HPU_LAZY_MODE=1 python run_lm_eval.py \
 -o acc_load_uint4_model.txt \
 --model_name_or_path <model_path_in_hugging_face> \
 --use_hpu_graphs \
@@ -726,7 +751,7 @@ Below is an example of loading a llama2-7b model with a 4bit checkpoint quantize
 Please note that the model checkpoint name is denoted as `<local_model_path_from_inc>`.
 
 ```bash
-python run_lm_eval.py \
+PT_HPU_LAZY_MODE=1 python run_lm_eval.py \
 -o acc_load_uint4_model.txt \
 --model_name_or_path meta-llama/Llama-2-7b-hf \
 --use_hpu_graphs \
@@ -781,7 +806,7 @@ You can run a *UINT4 weight quantized* model using AutoGPTQ by adding the argume
 
 Here is an example to run a quantized model <quantized_gptq_model>:
 ```bash
-python run_generation.py \
+PT_HPU_LAZY_MODE=1 python run_generation.py \
 --attn_softmax_bf16 \
 --model_name_or_path <quantized_gptq_model> \
 --use_hpu_graphs \
@@ -858,7 +883,7 @@ pip install -r requirements_lm_eval.txt
 
 Evaluate Llama 7B on Gaudi on task PiQA, using the BF16 data type:
 ```
-python run_lm_eval.py \
+PT_HPU_LAZY_MODE=1 python run_lm_eval.py \
 --model_name_or_path meta-llama/Llama-2-7b-hf \
 --use_hpu_graphs \
 --use_kv_cache \
@@ -870,7 +895,7 @@ python run_lm_eval.py \
 
 Evaluate Llama 70B on 8 Gaudi2 cards on task WinoGrande, using the BF16 data type:
 ```
-deepspeed --num_gpus 8 run_lm_eval.py \
+PT_HPU_LAZY_MODE=1 deepspeed --num_gpus 8 run_lm_eval.py \
 --model_name_or_path meta-llama/Llama-2-70b-hf \
 --use_hpu_graphs \
 --use_kv_cache \
diff --git a/examples/text-to-speech/README.md b/examples/text-to-speech/README.md
index 21070d275f..7e6429f22a 100644
--- a/examples/text-to-speech/README.md
+++ b/examples/text-to-speech/README.md
@@ -28,7 +28,7 @@ pip install -r requirements.txt
 ## Single-HPU inference
 
 ```bash
-python3 run_pipeline.py \
+PT_HPU_LAZY_MODE=1 python3 run_pipeline.py \
     --model_name_or_path microsoft/speecht5_tts \
     --text "Hello, my dog is cooler than you!" \
     --use_hpu_graphs \
diff --git a/examples/translation/README.md b/examples/translation/README.md
index 1d705d23fc..c88ed47bf1 100644
--- a/examples/translation/README.md
+++ b/examples/translation/README.md
@@ -34,7 +34,7 @@ Here is an example of a translation fine-tuning with a T5 model.
 T5 models `t5-small`, `t5-base`, `t5-large`, `t5-3b` and `t5-11b` must use an additional argument: `--source_prefix "translate {source_lang} to {target_lang}"`. For instance:
 
 ```bash
-python run_translation.py \
+PT_HPU_LAZY_MODE=1 python run_translation.py \
     --model_name_or_path t5-small \
     --do_train \
     --do_eval \
@@ -69,7 +69,7 @@ And here is how you would use the translation finetuning on your own files, afte
 values for the arguments `--train_file`, `--validation_file` to match your setup:
 
 ```bash
-python run_translation.py \
+PT_HPU_LAZY_MODE=1 python run_translation.py \
     --model_name_or_path t5-small \
     --do_train \
     --do_eval \
@@ -106,7 +106,7 @@ Here the languages are Romanian (`ro`) and English (`en`).
 If you want to use a pre-processed dataset that leads to high BLEU scores, but for the `en-de` language pair, you can use `--dataset_name stas/wmt14-en-de-pre-processed`, as follows:
 
 ```bash
-python run_translation.py \
+PT_HPU_LAZY_MODE=1 python run_translation.py \
     --model_name_or_path t5-small \
     --do_train \
     --do_eval \
@@ -221,7 +221,7 @@ To run only inference, you can start from the commands above and you just have t
 
 For instance, you can run inference with BERT on GLUE on 1 Gaudi card with the following command:
 ```bash
-python run_translation.py \
+PT_HPU_LAZY_MODE=1 python run_translation.py \
     --model_name_or_path t5-small \
     --do_eval \
     --source_lang en \
diff --git a/examples/trl/README.md b/examples/trl/README.md
index 5e488e7072..59790b264b 100644
--- a/examples/trl/README.md
+++ b/examples/trl/README.md
@@ -12,7 +12,7 @@ $ pip install -U -r requirements.txt
 1. The following example is for the supervised Lora finetune with Qwen2 model for conversational format dataset.
 
     ```
-    python sft.py \
+    PT_HPU_LAZY_MODE=1 python sft.py \
         --model_name_or_path "Qwen/Qwen2-7B" \
         --dataset_name "philschmid/dolly-15k-oai-style" \
         --streaming False \
@@ -88,7 +88,7 @@ steps like:
 1. Supervised fine-tuning of the base llama-v2-70b model to create llama-v2-70b-se:
 
     ```
-    DEEPSPEED_HPU_ZERO3_SYNC_MARK_STEP_REQUIRED=1 python ../gaudi_spawn.py --world_size 8 --use_deepspeed sft.py \
+    DEEPSPEED_HPU_ZERO3_SYNC_MARK_STEP_REQUIRED=1 PT_HPU_LAZY_MODE=1 python ../gaudi_spawn.py --world_size 8 --use_deepspeed sft.py \
         --model_name_or_path meta-llama/Llama-2-70b-hf \
         --dataset_name "lvwerra/stack-exchange-paired" \
         --deepspeed ../language-modeling/llama2_ds_zero3_config.json \
@@ -163,7 +163,7 @@ The following example is for the creation of StackLlaMa 2: a Stack exchange llam
 There are three main steps to the PPO training process:
 1. Supervised fine-tuning of the base llama-v2-7b model to create llama-v2-7b-se:
     ```
-    python ../gaudi_spawn.py --world_size 8 --use_mpi sft.py \
+    PT_HPU_LAZY_MODE=1 python ../gaudi_spawn.py --world_size 8 --use_mpi sft.py \
         --model_name_or_path meta-llama/Llama-2-7b-hf \
         --dataset_name "lvwerra/stack-exchange-paired" \
         --output_dir="./sft" \
@@ -193,7 +193,7 @@ There are three main steps to the PPO training process:
     ```
 2. Reward modeling using dialog pairs from the SE dataset on the llama-v2-7b-se to create llama-v2-7b-se-rm
     ```
-    python ../gaudi_spawn.py --world_size 8 --use_mpi reward_modeling.py \
+    PT_HPU_LAZY_MODE=1 python ../gaudi_spawn.py --world_size 8 --use_mpi reward_modeling.py \
         --model_name_or_path=./sft/final_merged_checkpoint \
         --tokenizer_name_or_path=meta-llama/Llama-2-7b-hf \
         --output_dir=./rm
@@ -206,7 +206,7 @@ There are three main steps to the PPO training process:
 
 3. RL fine-tuning of llama-v2-7b-se with the llama-v2-7b-se-rm reward model:
     ```
-    python ../gaudi_spawn.py --world_size 8 --use_mpi ppo.py \
+    PT_HPU_LAZY_MODE=1 python ../gaudi_spawn.py --world_size 8 --use_mpi ppo.py \
         --model_name_or_path=./sft/final_merged_checkpoint \
         --reward_model_name=./rm_merged_checkpoint \
         --tokenizer_name_or_path=meta-llama/Llama-2-7b-hf \
@@ -231,7 +231,7 @@ There are three main steps to the PPO training process:
 We can load the PPO-trained LoRA adaptors which were saved by the PPO training step and run it through the [text-generation example](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation).
 
 ```
-python run_generation.py \
+PT_HPU_LAZY_MODE=1 python run_generation.py \
 --model_name_or_path ../trl/rl_merged_checkpoint/ \
 --use_hpu_graphs --use_kv_cache --batch_size 1 --bf16 --max_new_tokens 100 \
 --prompt "Here is my prompt"
@@ -251,7 +251,7 @@ There are two main steps to the DDPO training process:
 
 1. Fine-tuning of the base stable-diffusion model with LoRA to create ddpo-aesthetic-predictor:
 ```
-python ddpo.py \
+PT_HPU_LAZY_MODE=1 python ddpo.py \
   --num_epochs=200 \
   --train_gradient_accumulation_steps=1 \
   --sample_num_steps=50 \
diff --git a/examples/video-classification/README.md b/examples/video-classification/README.md
index 6e672b5c7c..ec762b7861 100644
--- a/examples/video-classification/README.md
+++ b/examples/video-classification/README.md
@@ -30,7 +30,7 @@ pip install -r requirements.txt
 ### Single video example
 
 ```bash
-python3 run_example.py \
+PT_HPU_LAZY_MODE=1 python3 run_example.py \
     --model_name_or_path MCG-NJU/videomae-base-finetuned-kinetics \
     --video_paths "https://ak.picdn.net/shutterstock/videos/21179416/preview/stock-footage-aerial-shot-winter-forest.mp4" \
     --use_hpu_graphs \
@@ -45,7 +45,7 @@ Predicted class for stock-footage-aerial-shot-winter-forest.mp4 is sled dog raci
 ### Multi-video example
 
 ```bash
-python3 run_example.py \
+PT_HPU_LAZY_MODE=1 python3 run_example.py \
     --model_name_or_path MCG-NJU/videomae-base-finetuned-kinetics \
     --use_hpu_graphs \
     --bf16 \
@@ -57,7 +57,7 @@ python3 run_example.py \
     "https://ak.picdn.net/shutterstock/videos/9607838/preview/stock-footage-zrenjanin-serbia-march-fans-watching-live-concert-bokeh-blur-urban-background-x.mp4"
 ```
 
-Outputs: 
+Outputs:
 ```
 Predicted class for stock-footage-senior-couple-looking-through-binoculars-on-sailboat-together-shot-on-red-epic-for-high-quality-k.mp4 is sailing and took 3.372e-01 seconds
 Predicted class for stock-footage-aerial-shot-winter-forest.mp4 is sled dog racing and took 3.360e-01 seconds
diff --git a/examples/video-comprehension/README.md b/examples/video-comprehension/README.md
index da54f26740..4a2790063a 100644
--- a/examples/video-comprehension/README.md
+++ b/examples/video-comprehension/README.md
@@ -20,7 +20,7 @@ This directory contains example scripts that demonstrate how to perform video co
 ### Video-LLaVA Model
 
 ```bash
-python3 run_example.py \
+PT_HPU_LAZY_MODE=1 python3 run_example.py \
     --model_name_or_path "LanguageBind/Video-LLaVA-7B-hf" \
     --warmup 3 \
     --n_iterations 5 \
diff --git a/examples/visual-question-answering/README.md b/examples/visual-question-answering/README.md
index 36f81e481b..89e25ae603 100644
--- a/examples/visual-question-answering/README.md
+++ b/examples/visual-question-answering/README.md
@@ -21,7 +21,7 @@ limitations under the License.
 The `run_pipeline.py` script showcases how to use the Transformers pipeline API to run visual question answering task on HPUs.
 
 ```bash
-python3 run_pipeline.py \
+PT_HPU_LAZY_MODE=1 python3 run_pipeline.py \
     --model_name_or_path Salesforce/blip-vqa-capfilt-large \
     --image_path "https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg" \
     --question "how many dogs are in the picture?" \
@@ -40,7 +40,7 @@ pip install -r openclip_requirements.txt
 By default, the script runs the sample outlined in [BiomedCLIP-PubMedBERT_256-vit_base_patch16_224 notebook](https://huggingface.co/microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224/blob/main/biomed_clip_example.ipynb). One can also can also run other OpenCLIP models by specifying model, classifier labels and image URL(s) like so:
 
 ```bash
-python run_openclip_vqa.py \
+PT_HPU_LAZY_MODE=1 python run_openclip_vqa.py \
     --model_name_or_path laion/CLIP-ViT-g-14-laion2B-s12B-b42K \
     --labels "a dog" "a cat" \
     --image_path "http://images.cocodataset.org/val2017/000000039769.jpg" \
diff --git a/examples/zero-shot-object-detection/README.md b/examples/zero-shot-object-detection/README.md
index eea67a8ce8..d80890cff0 100644
--- a/examples/zero-shot-object-detection/README.md
+++ b/examples/zero-shot-object-detection/README.md
@@ -21,7 +21,7 @@ This folder contains an example script which demonstrates the usage of OWL-ViT t
 ## Single-HPU inference
 
 ```bash
-python3 run_example.py \
+PT_HPU_LAZY_MODE=1 python3 run_example.py \
     --model_name_or_path google/owlvit-base-patch32 \
     --image_path "http://images.cocodataset.org/val2017/000000039769.jpg" \
     --prompt "a photo of a cat, a photo of a dog" \