Gaudi Text-Generation Pipeline Blog by sjagtap1803 · Pull Request #1734 · huggingface/blog

sjagtap1803 · 2024-01-04T19:14:16Z

This blogpost provides instructions to use the Gaudi Text-Generation Pipeline from optimum-habana: (huggingface/optimum-habana#526)

regisss

I think it would be nice to show an example of integration of the pipeline with LangChain as you shared with me at the beginning. Is that possible?
IMO what the pipeline brings to the table is to generate text from a prompt in one function call without caring about the (pre-/post-)processing, so it would be really cool to show an example of such an integration 🙂

I also left a couple of cosmetic comments.

regisss · 2024-01-05T09:37:14Z

+  title: "Text-Generation Pipeline on Habana Gaudi2 Accelerator"
+  author: siddjags
+  thumbnail: /blog/assets/textgen-pipe-gaudi/thumbnail.png
+  date: Jan 4, 2024


Let's make sure the date is updated when we merge it

I have changed the date for now. Will change again before merging.

sjagtap1803 · 2024-01-05T21:37:38Z

> I think it would be nice to show an example of integration of the pipeline with LangChain as you shared with me at the beginning. Is that possible? IMO what the pipeline brings to the table is to generate text from a prompt in one function call without caring about the (pre-/post-)processing, so it would be really cool to show an example of such an integration 🙂
>
> I also left a couple of cosmetic comments.

Thanks for reviewing this PR. I added a python snippet that shows how the pipeline can be used in custom scripts.

Regarding LangChain, I am not sure if it's possible anymore as the original pipeline class was modified to incorporate methods from text-generation/utils.py.

regisss

For LangChain, could you make sure it doesn't work as it is? If that's confirmed, we'll keep the example you added.

regisss · 2024-01-08T09:19:42Z

+    print(f"Generated Text: {repr(output)}")
+```
+
+Note: You will have to run the above script with `python <name_of_script>.py --model_name_or_path gpt2` as `--model_name_or_path` is a required argument. However, the model name can be programatically changed as shown in the python snippet.


Good catch! Maybe we can make the model_name_or_path arg not required removing the line here, WDYT?

We can make this arg not required and possibly set a small model (maybe gpt2) as the default value. Please let me know if the README needs to be edited to incorporate this change.

Sounds good to me. I'll open a PR today and I'll let you know here when it is merged.
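
For reference, a minimal sketch of the change discussed in this thread, assuming the script builds its arguments with `argparse`. The `build_parser` helper is hypothetical and only `--model_name_or_path` mirrors the flag mentioned above; the actual argument setup in the example script may look different.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser; only --model_name_or_path mirrors the flag discussed above."""
    parser = argparse.ArgumentParser(description="Text-generation pipeline example")
    # No `required=True`: the flag falls back to a small default model when omitted.
    parser.add_argument(
        "--model_name_or_path",
        type=str,
        default="gpt2",
        help="Model name on the Hub or path to a local checkpoint.",
    )
    return parser


# Parsing an empty argument list shows the default being used.
default_args = build_parser().parse_args([])
# Passing the flag explicitly still overrides the default.
custom_args = build_parser().parse_args(
    ["--model_name_or_path", "meta-llama/Llama-2-7b-hf"]
)
print(default_args.model_name_or_path)
print(custom_args.model_name_or_path)
```

With a default in place, the note in the blog post about `--model_name_or_path` being required would no longer apply.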

sjagtap1803 · 2024-01-08T16:29:59Z

> For LangChain, could you make sure it doesn't work as it is? If that's confirmed, we'll keep the example you added.

Yes, I tried feeding the pipeline to a few LangChain classes and it did not work.

regisss · 2024-01-09T08:55:04Z

> > For LangChain, could you make sure it doesn't work as it is? If that's confirmed, we'll keep the example you added.
>
> Yes, I tried feeding the pipeline to a few LangChain classes and it did not work.

What is the error you got? Is it fixable keeping the same pipeline structure?

sjagtap1803 · 2024-01-09T20:03:23Z

> > > For LangChain, could you make sure it doesn't work as it is? If that's confirmed, we'll keep the example you added.
>
> > Yes, I tried feeding the pipeline to a few LangChain classes and it did not work.
>
> What is the error you got? Is it fixable keeping the same pipeline structure?

Here's the code snippet and the corresponding error.

# Imports as laid out in older LangChain releases (assumption; newer releases move these modules)
from langchain.chains import LLMChain
from langchain.llms import HuggingFacePipeline
from langchain.prompts import PromptTemplate

# The pipeline class can also be used with langchain
# `pipe` is assumed to be the Gaudi text-generation pipeline instance created earlier in the script
llm = HuggingFacePipeline(pipeline=pipe)

template = """Answer the question based on the context below. If the question cannot be answered using the information provided answer with "I don't know".

Context: Large Language Models (LLMs) are the latest models used in NLP. Their superior performance over smaller models has made them incredibly useful for developers building NLP enabled applications. These models can be accessed via Hugging Face's `transformers` library, via OpenAI using the `openai` library, and via Cohere using the `cohere` library.

Question: {question}

Answer: """

# Build the prompt template and chain it to the LLM wrapper
prompt = PromptTemplate(template=template, input_variables=["question"])
llm_chain = LLMChain(prompt=prompt, llm=llm)

print(llm_chain.run("What are LLMs and what are their advantages?"))

I don't think it's fixable without making major changes to the pipeline structure.

regisss · 2024-01-12T14:04:52Z

What's the change that led to this error compared to your first version that worked?
Because here it seems the type of the input provided to the tokenizer is wrong, but you were also using a tokenizer in your first version I guess?

sjagtap1803 · 2024-01-12T16:19:44Z

I figured out the cause of the error message. The original pipeline was developed for an older version of LangChain. The newer versions require the pipeline input to be a list of strings instead of a single string. Also, we may have to add a custom stopping criterion, as the models ignore EOS tokens and keep generating text until max_new_tokens is reached. Let me know what you think.
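
To illustrate the input-type mismatch, here is a minimal sketch of accepting both calling conventions by normalizing the input to a list of strings. The helper name `as_prompt_list` is hypothetical and is not part of optimum-habana or LangChain; it only shows the idea.

```python
from typing import List, Union


def as_prompt_list(prompts: Union[str, List[str]]) -> List[str]:
    """Return the prompts as a list, wrapping a single string if needed."""
    if isinstance(prompts, str):
        # Older callers pass one string; wrap it so downstream code sees a list.
        return [prompts]
    # Newer callers already pass a list; copy it to avoid mutating the caller's object.
    return list(prompts)


print(as_prompt_list("Hello world"))
print(as_prompt_list(["Hello world", "How are you?"]))
```

A pipeline's call method could apply such a normalization first and then process each prompt in turn.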

regisss · 2024-01-12T16:35:54Z

> I figured out the cause of the error message. The original pipeline was developed for an older version of LangChain. The newer versions require the pipeline input to be a list of strings instead of a single string. Also, we may have to add a custom stopping criterion, as the models ignore EOS tokens and keep generating text until max_new_tokens is reached. Let me know what you think.

Okay, so if you provide a list of strings it works now right?
For the generation process ignoring eos tokens, have you tried setting ignore_eos=False in the generation config?

sjagtap1803 · 2024-01-12T17:21:33Z

> > I figured out the cause of the error message. The original pipeline was developed for an older version of LangChain. The newer versions require the pipeline input to be a list of strings instead of a single string. Also, we may have to add a custom stopping criterion, as the models ignore EOS tokens and keep generating text until max_new_tokens is reached. Let me know what you think.
>
> Okay, so if you provide a list of strings it works now right? For the generation process ignoring eos tokens, have you tried setting ignore_eos=False in the generation config?

Yeah, it does seem to work with a list of strings but leads to the following warning

I tried setting ignore_eos=False but the generated response does have some additional content at the end.
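
As a possible stopgap for the trailing content, the generated text could be trimmed in post-processing. This is a plain string-level substitute, not the custom stopping criterion mentioned earlier (which would act during generation). The helper `trim_at_stop` and the stop strings used here are hypothetical.

```python
from typing import Sequence


def trim_at_stop(text: str, stop_sequences: Sequence[str]) -> str:
    """Cut `text` at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for stop in stop_sequences:
        if not stop:
            # Skip empty stop strings, which would match at position 0.
            continue
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


raw = "LLMs are large language models.</s>Question: unrelated follow-up"
print(trim_at_stop(raw, ["</s>", "\nQuestion:"]))
```

Trimming after the fact does not save any generation time, so it only hides the symptom until the underlying behavior is understood.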

regisss · 2024-01-12T18:03:54Z

> I tried setting ignore_eos=False but the generated response does have some additional content at the end.

This may be a bug in Optimum Habana, I'll take a look 👍

sjagtap1803 · 2024-02-07T06:26:23Z

> > I tried setting ignore_eos=False but the generated response does have some additional content at the end.
>
> This may be a bug in Optimum Habana, I'll take a look 👍

FYI, the pipeline class works fine with LangChain version 0.0.191. Here's the output generated by the sample script.

[WARNING|utils.py:198] 2024-02-07 05:09:18,834 >> optimum-habana v1.11.0.dev0 has been validated for SynapseAI v1.14.0 but the driver version is v1.12.0, this could lead to undefined behavior!
Fetching 3 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 16666.11it/s]
Fetching 3 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 6598.28it/s]
02/07/2024 05:09:19 - INFO - __main__ - Single-device run.
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:02<00:00,  1.05it/s]
============================= HABANA PT BRIDGE CONFIGURATION =========================== 
 PT_HPU_LAZY_MODE = 1
 PT_RECIPE_CACHE_PATH = 
 PT_CACHE_FOLDER_DELETE = 0
 PT_HPU_RECIPE_CACHE_CONFIG = 
 PT_HPU_MAX_COMPOUND_OP_SIZE = 9223372036854775807
 PT_HPU_LAZY_ACC_PAR_MODE = 1
 PT_HPU_ENABLE_REFINE_DYNAMIC_SHAPES = 0
---------------------------: System Configuration :---------------------------
Num CPU Cores : 160
CPU RAM       : 1056399508 KB
------------------------------------------------------------------------------
02/07/2024 05:10:46 - INFO - __main__ - Args: Namespace(device='hpu', model_name_or_path='meta-llama/Llama-2-13b-chat-hf', bf16=False, max_new_tokens=1000, max_input_tokens=2048, batch_size=1, warmup=3, n_iterations=5, local_rank=0, use_kv_cache=True, use_hpu_graphs=True, dataset_name=None, column_name=None, do_sample=True, num_beams=1, trim_logits=False, seed=27, profiling_warmup_steps=0, profiling_steps=0, prompt=None, bad_words=None, force_words=None, peft_model=None, num_return_sequences=1, token=None, model_revision='main', attn_softmax_bf16=False, output_dir=None, bucket_size=-1, dataset_max_samples=-1, limit_hpu_graphs=False, reuse_cache=False, verbose_workers=False, simulate_dyn_prompt=None, reduce_recompile=False, kv_cache_fp8=False, fp8=False, use_flash_attention=False, torch_compile=False, temperature=0.2, top_p=0.95, quant_config='', world_size=0, global_rank=0)
02/07/2024 05:10:46 - INFO - __main__ - device: hpu, n_hpu: 0, bf16: False
02/07/2024 05:10:46 - INFO - __main__ - Model initialization took 87.551s
02/07/2024 05:10:46 - INFO - __main__ - Graph compilation...
Question 1: Which libraries and model providers offer LLMs?
Response 1:  Based on the context, the following libraries and model providers offer LLMs:

1. Hugging Face's `transformers` library
2. OpenAI using the `openai` library
3. Cohere using the `cohere` library.

Question 2: What is the provided context about?
Response 2:  The provided context is about Large Language Models (LLMs) and how they can be accessed via different libraries such as Hugging Face's `transformers` library, OpenAI's `openai` library, and Cohere's `cohere` library.

regisss · 2024-02-07T07:39:00Z

LGTM!

Gently pinging @pcuenca for approval and after that we can share a draft with Intel/Habana.

pcuenca

Thank you! Made a couple of minor suggestions, and suggested to include the Meta part of the approval process.

pcuenca · 2024-02-07T08:05:39Z

+
+# Text-Generation Pipeline on Habana Gaudi2 Accelerator
+
+With the Generative AI (GenAI) revolution in full swing, text-generation with open-source transformer models like Llama-2 has become the talk of the town. AI enthusiasts as well as developers are looking to leverage the generative abilities of such models for their own use cases and applications. This article will demonstrate how easy it is to generate text with the Llama-2 family of models (7b, 13b and 70b) using Optimum Habana and a custom pipeline class. You will then be able to generate text with only a few lines of code.


Suggested change

With the Generative AI (GenAI) revolution in full swing, text-generation with open-source transformer models like Llama-2 has become the talk of the town. AI enthusiasts as well as developers are looking to leverage the generative abilities of such models for their own use cases and applications. This article will demonstrate how easy it is to generate text with the Llama-2 family of models (7b, 13b and 70b) using Optimum Habana and a custom pipeline class. You will then be able to generate text with only a few lines of code.

With the Generative AI (GenAI) revolution in full swing, text-generation with open-source transformer models like Llama 2 has become the talk of the town. AI enthusiasts as well as developers are looking to leverage the generative abilities of such models for their own use cases and applications. This article shows how easy it is to generate text with the Llama 2 family of models (7b, 13b and 70b) using Optimum Habana and a custom pipeline class – you'll be able to run the models with just a few lines of code!

I think the official family name is "Llama 2"

pcuenca · 2024-02-07T08:15:52Z

+git clone https://github.com/huggingface/optimum-habana.git
+```
+
+In case you are planning to run distributed inference, install DeepSpeed depending on SynapseAI version. In this case, I am using SynapseAI 1.14.0.


Suggested change

In case you are planning to run distributed inference, install DeepSpeed depending on SynapseAI version. In this case, I am using SynapseAI 1.14.0.

In case you are planning to run distributed inference, install DeepSpeed depending on your SynapseAI version. In this case, I am using SynapseAI 1.14.0.

pcuenca · 2024-02-07T08:21:42Z

+Now you are all set to perform text-generation with the pipeline!
+
+## Using the Pipeline
+Run the following command to access the pipeline scripts and follow the instructions provided in the README to update your `PYTHONPATH`.


Suggested change

Run the following command to access the pipeline scripts and follow the instructions provided in the README to update your `PYTHONPATH`.

First, go to the following directory in your `optimum-habana` checkout where the pipeline scripts are located, and follow the instructions in the `README` to update your `PYTHONPATH`.

pcuenca · 2024-02-07T08:23:51Z

+python ../../gaudi_spawn.py --use_deepspeed --world_size 8 run_pipeline.py --model_name_or_path meta-llama/Llama-2-70b-hf --max_new_tokens 100 --bf16 --use_hpu_graphs --use_kv_cache --do_sample --temperature 0.5 --top_p 0.95 --prompt "Hello world" "How are you?" "Here is my prompt" "Once upon a time"
+```
+
+Last but not the least, you can use the pipeline class in your own scripts as shown in the example below. Run the following sample script from `optimum-habana/examples/text-generation/text-generation-pipeline`.


Maybe add a subsection for Python use?

I added a new subsection 'Usage in Python Scripts'.

pcuenca · 2024-02-07T08:26:55Z

+python run_pipeline.py  --model_name_or_path meta-llama/Llama-2-7b-hf --use_hpu_graphs --use_kv_cache --max_new_tokens 100 --do_sample --prompt "Here is my prompt"
+```
+
+You can also pass multiple prompts as input and change the temperature and top_p values for generation as follows.


Are they batched? Do we observe performance benefits if so?

Unfortunately, the pipeline does not support batching.

pcuenca · 2024-02-07T08:29:25Z

+
+## Conclusion
+
+In this blog, we presented a custom text-generation pipeline that accepts single as well as multiple prompts as input. This pipeline offers great flexibility in terms of model size as well as parameters affecting text-generation quality. Furthermore, it is also very easy to use and to plug into your scripts and is compatible with LangChain.


Suggested change

In this blog, we presented a custom text-generation pipeline that accepts single as well as multiple prompts as input. This pipeline offers great flexibility in terms of model size as well as parameters affecting text-generation quality. Furthermore, it is also very easy to use and to plug into your scripts and is compatible with LangChain.

We presented a custom text-generation pipeline on Habana Gaudi2 that accepts single or multiple prompts as input. This pipeline offers great flexibility in terms of model size as well as parameters affecting text-generation quality. Furthermore, it is also very easy to use and to plug into your scripts, and is compatible with LangChain.

regisss

LGTM!
I'm going to generate a PDF preview of this blog post and share it with you @sjagtap1803.

sjagtap1803 · 2024-02-12T04:13:59Z

Hi @regisss. I pushed some new changes to this PR. Would be great if you could review and approve it.

Thanks!

regisss

LGTM

I'll also sync with my contacts at Habana to make sure they are in the loop.

regisss · 2024-02-12T05:47:15Z

+> Use of the pretrained model is subject to compliance with third party licenses, including the “Llama 2 Community License Agreement” (LLAMAV2). For guidance on the intended use of the LLAMA2 model, what will be considered misuse and out-of-scope uses, who are the intended users and additional terms please review and read the instructions in this link [https://ai.meta.com/llama/license/](https://ai.meta.com/llama/license/). Users bear sole liability and responsibility to follow and comply with any third party licenses, and Habana Labs disclaims and will bear no liability with respect to users’ use or compliance with third party licenses.
+To be able to run gated models like this Llama-2-70b-hf, you need the following:
+> * Have a HuggingFace account
+> * Agree to the terms of use of the model in its model card on the HF Hub
+> * set a read token
+> * Login to your account using the HF CLI: run huggingface-cli login before launching your script


Redundant with line 15 IMO. Let's keep it if that matters for Intel, otherwise we can remove it.

regisss

@sjagtap1803 I left a couple of comments to update the date and the version to install. Can you quickly run the example on your side to make sure it works with Optimum Habana v1.10.4 please? It should, but let's be sure.

Also, there are some merge conflicts to solve related to the blog posts that were published in the meantime.

sjagtap1803 · 2024-02-29T16:37:10Z

> @sjagtap1803 I left a couple of comments to update the date and the version to install. Can you quickly run the example on your side to make sure it works with Optimum Habana v1.10.4 please? It should, but let's be sure.
>
> Also, there are some merge conflicts to solve related to the blog posts that were published in the meantime.

I tested the blog examples with optimum habana v1.10.4 and they seem to work fine. Sharing a screenshot for your reference.

regisss reviewed Jan 5, 2024

regisss reviewed Jan 8, 2024

sjagtap1803 force-pushed the sjagtap1803/textgen-pipe-gaudi branch from 659921f to 7cfa1ec on February 7, 2024 04:54

regisss reviewed Feb 7, 2024

pcuenca approved these changes Feb 7, 2024

regisss approved these changes Feb 8, 2024

sjagtap1803 added 12 commits February 12, 2024 09:26

- 7a57dca: added files required for textgen pipeline blog
- 4e026c8: changed thumbnail extension
- b3657ca: made cosmetic changes and added python snippet
- b2c1eaa: added langchain content and snippet
- 9fb0842: included langchain object usage examples
- 226347d: made some cosmetic changes
- 3904b21: included all suggestions
- 3b6e3cf: fixed typos
- f6d9164: added git clone command
- 910e26b: included accelerator official name and Llama 2 disclaimer
- 5218f63: install text-gen examples requirements
- a372903: added paragraph explaining benefits of pipeline class

sjagtap1803 force-pushed the sjagtap1803/textgen-pipe-gaudi branch from b48cab2 to a372903 on February 12, 2024 03:57

regisss approved these changes Feb 12, 2024

regisss reviewed Feb 12, 2024

- c5a9e90: locked in version 1.10.0

regisss reviewed Feb 29, 2024

sjagtap1803 added 2 commits February 29, 2024 10:28

- 0673d00: update optimum habana version and publish date
- e371ae6: fixed merge conflicts in _blog.yml

regisss merged commit 804022d into huggingface:main Mar 1, 2024


