From 926111f9fc7f8c645a2f7b2e29c767ba1c964e9e Mon Sep 17 00:00:00 2001
From: "Wang, Yi A" <yi.a.wang@intel.com>
Date: Wed, 16 Oct 2024 03:20:52 -0700
Subject: [PATCH] add peft generation example

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
---
 examples/text-generation/README.md | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/examples/text-generation/README.md b/examples/text-generation/README.md
index 22390d6497..b99d3d8a6a 100755
--- a/examples/text-generation/README.md
+++ b/examples/text-generation/README.md
@@ -214,6 +214,22 @@ python run_generation.py \
 
 > The prompt length is limited to 16 tokens. Prompts longer than this will be truncated.
 
+### Use PEFT models for generation
+
+To run generation with a PEFT model, pass its local path or Hugging Face Hub ID with the `--peft_model` argument.
+
+For example:
+```bash
+python run_generation.py \
+--model_name_or_path meta-llama/Llama-2-7b-hf \
+--use_hpu_graphs \
+--use_kv_cache \
+--batch_size 1 \
+--bf16 \
+--max_new_tokens 100 \
+--prompt "Here is my prompt" \
+--peft_model yard1/llama-2-7b-sql-lora-test
+```
 
 ### Using growing bucket optimization