From f48a84d5cfcff04a98cc7ff22b0b644dd4a850a2 Mon Sep 17 00:00:00 2001
From: Harish Subramony <81822986+hsubramony@users.noreply.github.com>
Date: Fri, 18 Oct 2024 10:42:23 -0700
Subject: [PATCH 1/2] Update text-gen README.md to add auto-gptq fork install
 steps

---
 examples/text-generation/README.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/examples/text-generation/README.md b/examples/text-generation/README.md
index 4257827596..e7769fff6b 100755
--- a/examples/text-generation/README.md
+++ b/examples/text-generation/README.md
@@ -593,6 +593,10 @@ For more details see [documentation](https://docs.habana.ai/en/latest/PyTorch/Mo
 Llama2-7b in UINT4 weight only quantization is enabled using [AutoGPTQ Fork](https://github.com/HabanaAI/AutoGPTQ), which provides quantization capabilities in PyTorch.
 Currently, the support is for UINT4 inference of pre-quantized models only.
 
+```bash
+BUILD_CUDA_EXT=0 python -m pip install -vvv --no-build-isolation git+https://github.com/HabanaAI/AutoGPTQ.git
+```
+
 You can run a *UINT4 weight quantized* model using AutoGPTQ by setting the following environment variables:
 `SRAM_SLICER_SHARED_MME_INPUT_EXPANSION_ENABLED=false ENABLE_EXPERIMENTAL_FLAGS=true` before running the command,
 and by adding the argument `--load_quantized_model_with_autogptq`.

From cf2b58a65e25d6393160d1462a8ff881477a634d Mon Sep 17 00:00:00 2001
From: Harish Subramony <81822986+hsubramony@users.noreply.github.com>
Date: Fri, 18 Oct 2024 11:19:06 -0700
Subject: [PATCH 2/2] Update text-gen README.md for TP strategy

---
 examples/text-generation/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/examples/text-generation/README.md b/examples/text-generation/README.md
index e7769fff6b..8aaccfd124 100755
--- a/examples/text-generation/README.md
+++ b/examples/text-generation/README.md
@@ -282,7 +282,7 @@ You will also need to add `--torch_compile` and `--parallel_strategy="tp"` in yo
 Here is an example:
 ```bash
 PT_ENABLE_INT64_SUPPORT=1 PT_HPU_LAZY_MODE=0 python ../gaudi_spawn.py  --world_size 8 run_generation.py \
---model_name_or_path meta-llama/Llama-2-70b-hf  \
+--model_name_or_path meta-llama/Llama-2-7b-hf  \
 --trim_logits \
 --use_kv_cache \
 --attn_softmax_bf16 \