From 0a5272a27d1245ad5a4c3c442d92461e15cb914d Mon Sep 17 00:00:00 2001
From: yan tomsinsky <ytomsinsky@habana.ai>
Date: Wed, 8 May 2024 12:30:08 +0300
Subject: [PATCH] Remove --fp8 flag from README

Drop the `--fp8` option from the settings list and from four example
commands in the text-generation README, along with the note stating
that the flag is required to enable quantization in fp8.

---
 examples/text-generation/README.md | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/examples/text-generation/README.md b/examples/text-generation/README.md
index 2844e8fb24..18a38ca707 100644
--- a/examples/text-generation/README.md
+++ b/examples/text-generation/README.md
@@ -107,7 +107,6 @@ Here are a few settings you may be interested in:
 - `--prompt` to benchmark the model on one or several prompts of your choice
 - `--attn_softmax_bf16` to run attention softmax layer in bfloat16 precision provided that the model (such as Llama) supports it
 - `--trim_logits` to calculate logits only for the last token in the first time step provided that the model (such as Llama) supports it
-- `--fp8` Enable Quantization to fp8
 
 For example, you can reproduce the results presented in [this blog post](https://huggingface.co/blog/habana-gaudi-2-bloom) with the following command:
 ```bash
@@ -284,7 +283,6 @@ QUANT_CONFIG=./quantization_config/maxabs_quant.json python ../gaudi_spawn.py \
 --reuse_cache \
 --bf16 \
 --batch_size 1 \
---fp8
 ```
 
 Alternatively, here is another example to quantize the model based on previous measurements for LLama2-70b:
@@ -302,7 +300,6 @@ QUANT_CONFIG=./quantization_config/maxabs_quant.json python ../gaudi_spawn.py \
 --max_new_tokens 2048 \
 --max_input_tokens 2048 \
 --limit_hpu_graphs \
---fp8
 ```
 
 Here is an example to measure the tensor quantization statistics on Mixtral-8x7B with 1 card:
@@ -329,7 +326,6 @@ QUANT_CONFIG=./quantization_config/maxabs_quant_mixtral.json python run_generati
 --max_new_tokens 2048 \
 --batch_size 16 \
 --bf16 \
---fp8
 ```
 
 Here is an example to measure the tensor quantization statistics on Falcon-180B with 8 cards:
@@ -361,9 +357,7 @@ QUANT_CONFIG=./quantization_config/maxabs_quant.json python ../gaudi_spawn.py \
 --bf16 \
 --reuse_cache \
 --trim_logits \
---fp8
 ```
-`--fp8` is required to enable quantization in fp8.
 
 
 ### Using Habana Flash Attention