
Gaudi Text-Generation Pipeline Blog #1734

Merged
regisss merged 15 commits into huggingface:main from sjagtap1803:sjagtap1803/textgen-pipe-gaudi on Mar 1, 2024

Conversation

@sjagtap1803
Contributor

This blog post provides instructions for using the Gaudi Text-Generation Pipeline from optimum-habana (huggingface/optimum-habana#526).

Contributor

@regisss regisss left a comment

I think it would be nice to show an example of integration of the pipeline with LangChain as you shared with me at the beginning. Is that possible?
IMO what the pipeline brings to the table is to generate text from a prompt in one function call without caring about the (pre-/post-)processing, so it would be really cool to show an example of such an integration 🙂

I also left a couple of cosmetic comments.

Comment thread _blog.yml
Comment thread _blog.yml Outdated
title: "Text-Generation Pipeline on Habana Gaudi2 Accelerator"
author: siddjags
thumbnail: /blog/assets/textgen-pipe-gaudi/thumbnail.png
date: Jan 4, 2024
Contributor

Let's make sure the date is updated when we merge it

Contributor Author

I have changed the date for now. Will change again before merging.

Comment thread textgen-pipe-gaudi.md Outdated
Comment thread textgen-pipe-gaudi.md Outdated
@sjagtap1803
Contributor Author

sjagtap1803 commented Jan 5, 2024

> I think it would be nice to show an example of integration of the pipeline with LangChain as you shared with me at the beginning. Is that possible? IMO what the pipeline brings to the table is to generate text from a prompt in one function call without caring about the (pre-/post-)processing, so it would be really cool to show an example of such an integration 🙂
>
> I also left a couple of cosmetic comments.

Thanks for reviewing this PR. I added a Python snippet that shows how the pipeline can be used in custom scripts.

Regarding LangChain, I am not sure it's possible anymore, as the original pipeline class was modified to incorporate methods from `text-generation/utils.py`.

Contributor

@regisss regisss left a comment

For LangChain, could you make sure it doesn't work as it is? If that's confirmed, we'll keep the example you added.

Comment thread textgen-pipe-gaudi.md Outdated
```python
print(f"Generated Text: {repr(output)}")
```

Note: You will have to run the above script with `python <name_of_script>.py --model_name_or_path gpt2`, as `--model_name_or_path` is a required argument. However, the model name can be changed programmatically, as shown in the Python snippet.
Contributor

Good catch! Maybe we can make the `model_name_or_path` arg not required by removing the line here, WDYT?

Contributor Author

@sjagtap1803 sjagtap1803 Jan 8, 2024

We can make this arg not required and possibly set a small model (maybe gpt2) as the default value. Please let me know if the README needs to be edited to incorporate this change.
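A minimal sketch of what this change could look like, assuming the example script defines its CLI with Python's `argparse`. The `build_parser` function below is hypothetical and models only this one flag; it is not the actual `run_generation.py` code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser modeling only the --model_name_or_path flag."""
    parser = argparse.ArgumentParser(description="Text-generation pipeline (sketch)")
    # No required=True: fall back to a small model when the flag is omitted.
    parser.add_argument(
        "--model_name_or_path",
        type=str,
        default="gpt2",
        help="Model ID on the Hugging Face Hub or local path (default: gpt2).",
    )
    return parser


if __name__ == "__main__":
    # An empty argument list exercises the default.
    args = build_parser().parse_args([])
    print(args.model_name_or_path)  # gpt2
    # The value can still be overridden in code before building the pipeline.
    args.model_name_or_path = "meta-llama/Llama-2-7b-hf"
    print(args.model_name_or_path)
```

With a default like this, omitting the flag would no longer be an error, while an explicit `--model_name_or_path` still takes precedence.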

Contributor

Sounds good to me. I'll open a PR today and I'll let you know here when it is merged.

Contributor Author

Done

@sjagtap1803
Contributor Author

> For LangChain, could you make sure it doesn't work as it is? If that's confirmed, we'll keep the example you added.

Yes, I tried feeding the pipeline to a few LangChain classes and it did not work.

@regisss
Contributor

regisss commented Jan 9, 2024

> > For LangChain, could you make sure it doesn't work as it is? If that's confirmed, we'll keep the example you added.
>
> Yes, I tried feeding the pipeline to a few LangChain classes and it did not work.

What is the error you got? Is it fixable keeping the same pipeline structure?

@sjagtap1803
Contributor Author

> > > For LangChain, could you make sure it doesn't work as it is? If that's confirmed, we'll keep the example you added.
> >
> > Yes, I tried feeding the pipeline to a few LangChain classes and it did not work.
>
> What is the error you got? Is it fixable keeping the same pipeline structure?

Here's the code snippet and the corresponding error.

```python
# Assumes `pipe` is the Gaudi text-generation pipeline instance created earlier in the script
# and an older LangChain release (0.0.191 is reported to work later in this thread).
from langchain import LLMChain, PromptTemplate
from langchain.llms import HuggingFacePipeline

# The pipeline class can also be used with LangChain
llm = HuggingFacePipeline(pipeline=pipe)

template = """Answer the question based on the context below. If the question cannot be answered using the information provided answer with "I don't know".

Context: Large Language Models (LLMs) are the latest models used in NLP. Their superior performance over smaller models has made them incredibly useful for developers building NLP enabled applications. These models can be accessed via Hugging Face's `transformers` library, via OpenAI using the `openai` library, and via Cohere using the `cohere` library.

Question: {question}

Answer: """

prompt = PromptTemplate(template=template, input_variables=["question"])
llm_chain = LLMChain(prompt=prompt, llm=llm)

print(llm_chain.run("What are LLMs and what are their advantages?"))
```

[Screenshot of the error message]

I don't think it's fixable without making major changes to the pipeline structure.
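To check exactly what text the chain above sends to the model, independently of any LangChain version, the prompt can be approximated with plain `str.format`, since `PromptTemplate` uses f-string-style substitution of `{question}` by default. `build_prompt` is a hypothetical helper, not part of LangChain or Optimum Habana:

```python
# Same template as in the LangChain snippet above (wording unchanged).
TEMPLATE = """Answer the question based on the context below. If the question cannot be answered using the information provided answer with "I don't know".

Context: Large Language Models (LLMs) are the latest models used in NLP. Their superior performance over smaller models has made them incredibly useful for developers building NLP enabled applications. These models can be accessed via Hugging Face's `transformers` library, via OpenAI using the `openai` library, and via Cohere using the `cohere` library.

Question: {question}

Answer: """


def build_prompt(question: str) -> str:
    """Fill the single {question} placeholder (hypothetical helper)."""
    return TEMPLATE.format(question=question)


if __name__ == "__main__":
    print(build_prompt("Which libraries and model providers offer LLMs?"))
```

The resulting string could then be passed directly to the pipeline as a single prompt, bypassing LangChain entirely.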

@regisss
Contributor

regisss commented Jan 12, 2024

What's the change that led to this error compared to your first version that worked?
Because here it seems the type of the input provided to the tokenizer is wrong, but I guess you were also using a tokenizer in your first version?

@sjagtap1803
Contributor Author

I figured out the cause of the error message. The original pipeline was developed for an older version of LangChain. Newer versions require the pipeline input to be a list of strings instead of a single string. Also, we may have to add a custom stopping criterion, as the models ignore EOS tokens and keep generating text until `max_new_tokens` is reached. Let me know what you think.
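One way to accept both input styles without restructuring the pipeline is a thin adapter that turns a single string into a one-element list and runs each prompt in turn. This is a minimal sketch under stated assumptions: `ListInputAdapter` and `fake_pipe` are hypothetical names, the wrapped callable stands in for the Gaudi pipeline (assumed to take one prompt string and return generated text), and it does not attempt to match any specific LangChain version's expected output format.

```python
from typing import Callable, List, Union


class ListInputAdapter:
    """Hypothetical wrapper: lets a single-prompt callable accept str or list[str]."""

    def __init__(self, single_prompt_fn: Callable[[str], str]) -> None:
        self.single_prompt_fn = single_prompt_fn

    def __call__(self, prompts: Union[str, List[str]]) -> List[str]:
        # Normalize a lone string into a one-element list.
        if isinstance(prompts, str):
            prompts = [prompts]
        # Prompts are processed one after another; no batching happens here.
        return [self.single_prompt_fn(prompt) for prompt in prompts]


def fake_pipe(prompt: str) -> str:
    # Stand-in for the real pipeline call, used only for illustration.
    return f"{prompt} ... generated text"


if __name__ == "__main__":
    adapter = ListInputAdapter(fake_pipe)
    print(adapter("Once upon a time"))
    print(adapter(["Hello world", "How are you?"]))
```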

@regisss
Contributor

regisss commented Jan 12, 2024

> I figured out the cause of the error message. The original pipeline was developed for an older version of LangChain. Newer versions require the pipeline input to be a list of strings instead of a single string. Also, we may have to add a custom stopping criterion, as the models ignore EOS tokens and keep generating text until `max_new_tokens` is reached. Let me know what you think.

Okay, so if you provide a list of strings, it works now, right?
For the generation process ignoring EOS tokens, have you tried setting `ignore_eos=False` in the generation config?
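To make the `ignore_eos` question concrete, here is a toy decoding loop in plain Python. Everything in it (`toy_generate`, `fake_next_token`, `EOS_TOKEN_ID = 2`) is hypothetical; it only illustrates the semantics under discussion (stop at EOS vs. always run to `max_new_tokens`), not how Optimum Habana implements generation.

```python
from typing import Callable, List

EOS_TOKEN_ID = 2  # hypothetical EOS id for this toy example


def toy_generate(
    prompt_ids: List[int],
    next_token: Callable[[List[int]], int],
    max_new_tokens: int,
    ignore_eos: bool,
) -> List[int]:
    """Toy decoding loop that returns only the newly generated token ids."""
    ids = list(prompt_ids)
    new_ids: List[int] = []
    for _ in range(max_new_tokens):
        token = next_token(ids)
        ids.append(token)
        new_ids.append(token)
        # With ignore_eos=False, generation stops as soon as EOS is produced;
        # with ignore_eos=True, it always runs to max_new_tokens.
        if token == EOS_TOKEN_ID and not ignore_eos:
            break
    return new_ids


def fake_next_token(ids: List[int]) -> int:
    # Fake "model": emits EOS once the sequence has 3 tokens, otherwise token 7.
    return EOS_TOKEN_ID if len(ids) == 3 else 7


if __name__ == "__main__":
    print(toy_generate([5], fake_next_token, max_new_tokens=6, ignore_eos=False))  # [7, 7, 2]
    print(toy_generate([5], fake_next_token, max_new_tokens=6, ignore_eos=True))  # [7, 7, 2, 7, 7, 7]
```

The tokens after the EOS in the second call correspond to the "additional content at the end" that shows up when EOS is not honored.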

@sjagtap1803
Contributor Author

sjagtap1803 commented Jan 12, 2024

> I figured out the cause of the error message. The original pipeline was developed for an older version of LangChain. Newer versions require the pipeline input to be a list of strings instead of a single string. Also, we may have to add a custom stopping criterion, as the models ignore EOS tokens and keep generating text until `max_new_tokens` is reached. Let me know what you think.
>
> Okay, so if you provide a list of strings, it works now, right? For the generation process ignoring EOS tokens, have you tried setting `ignore_eos=False` in the generation config?

Yeah, it does seem to work with a list of strings, but it leads to the following warning:

[Screenshot of the warning message]

I tried setting `ignore_eos=False`, but the generated response still has some additional content at the end.
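While the root cause is investigated, a common post-processing workaround is to cut the decoded text at the first stop sequence. This is a minimal sketch, not an Optimum Habana or LangChain feature: `truncate_at_stop` is a hypothetical helper, and the stop strings are assumptions chosen to match the question-answering template used earlier in this thread (the model continuing with a new `Question:` turn).

```python
from typing import Iterable


def truncate_at_stop(text: str, stop_sequences: Iterable[str]) -> str:
    """Cut `text` at the earliest occurrence of any stop sequence (hypothetical helper)."""
    cut = len(text)
    for stop in stop_sequences:
        if not stop:
            continue  # an empty stop string would match at index 0
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    # Drop trailing whitespace left before the stop sequence.
    return text[:cut].rstrip()


if __name__ == "__main__":
    raw = " Hugging Face, OpenAI and Cohere offer LLMs.\n\nQuestion: What else?\nAnswer: ..."
    print(repr(truncate_at_stop(raw, ["\nQuestion:"])))
```

This hides the symptom rather than stopping generation early, so the extra tokens are still computed.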

@regisss
Contributor

regisss commented Jan 12, 2024

> I tried setting `ignore_eos=False`, but the generated response still has some additional content at the end.

This may be a bug in Optimum Habana, I'll take a look 👍

@sjagtap1803 sjagtap1803 force-pushed the sjagtap1803/textgen-pipe-gaudi branch from 659921f to 7cfa1ec on February 7, 2024 04:54
Comment thread textgen-pipe-gaudi.md Outdated
Comment thread textgen-pipe-gaudi.md Outdated
Comment thread textgen-pipe-gaudi.md Outdated
@sjagtap1803
Contributor Author

> > I tried setting `ignore_eos=False`, but the generated response still has some additional content at the end.
>
> This may be a bug in Optimum Habana, I'll take a look 👍

FYI, the pipeline class works fine with LangChain version 0.0.191. Here's the output generated by the sample script.

```
[WARNING|utils.py:198] 2024-02-07 05:09:18,834 >> optimum-habana v1.11.0.dev0 has been validated for SynapseAI v1.14.0 but the driver version is v1.12.0, this could lead to undefined behavior!
Fetching 3 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 16666.11it/s]
Fetching 3 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 6598.28it/s]
02/07/2024 05:09:19 - INFO - __main__ - Single-device run.
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:02<00:00,  1.05it/s]
============================= HABANA PT BRIDGE CONFIGURATION =========================== 
 PT_HPU_LAZY_MODE = 1
 PT_RECIPE_CACHE_PATH = 
 PT_CACHE_FOLDER_DELETE = 0
 PT_HPU_RECIPE_CACHE_CONFIG = 
 PT_HPU_MAX_COMPOUND_OP_SIZE = 9223372036854775807
 PT_HPU_LAZY_ACC_PAR_MODE = 1
 PT_HPU_ENABLE_REFINE_DYNAMIC_SHAPES = 0
---------------------------: System Configuration :---------------------------
Num CPU Cores : 160
CPU RAM       : 1056399508 KB
------------------------------------------------------------------------------
02/07/2024 05:10:46 - INFO - __main__ - Args: Namespace(device='hpu', model_name_or_path='meta-llama/Llama-2-13b-chat-hf', bf16=False, max_new_tokens=1000, max_input_tokens=2048, batch_size=1, warmup=3, n_iterations=5, local_rank=0, use_kv_cache=True, use_hpu_graphs=True, dataset_name=None, column_name=None, do_sample=True, num_beams=1, trim_logits=False, seed=27, profiling_warmup_steps=0, profiling_steps=0, prompt=None, bad_words=None, force_words=None, peft_model=None, num_return_sequences=1, token=None, model_revision='main', attn_softmax_bf16=False, output_dir=None, bucket_size=-1, dataset_max_samples=-1, limit_hpu_graphs=False, reuse_cache=False, verbose_workers=False, simulate_dyn_prompt=None, reduce_recompile=False, kv_cache_fp8=False, fp8=False, use_flash_attention=False, torch_compile=False, temperature=0.2, top_p=0.95, quant_config='', world_size=0, global_rank=0)
02/07/2024 05:10:46 - INFO - __main__ - device: hpu, n_hpu: 0, bf16: False
02/07/2024 05:10:46 - INFO - __main__ - Model initialization took 87.551s
02/07/2024 05:10:46 - INFO - __main__ - Graph compilation...
Question 1: Which libraries and model providers offer LLMs?
Response 1:  Based on the context, the following libraries and model providers offer LLMs:

1. Hugging Face's `transformers` library
2. OpenAI using the `openai` library
3. Cohere using the `cohere` library.

Question 2: What is the provided context about?
Response 2:  The provided context is about Large Language Models (LLMs) and how they can be accessed via different libraries such as Hugging Face's `transformers` library, OpenAI's `openai` library, and Cohere's `cohere` library.
```

@regisss
Contributor

regisss commented Feb 7, 2024

LGTM!

Gently pinging @pcuenca for approval and after that we can share a draft with Intel/Habana.

Member

@pcuenca pcuenca left a comment

Thank you! I made a couple of minor suggestions, and suggested including the Meta part of the approval process.

Comment thread textgen-pipe-gaudi.md
Comment thread textgen-pipe-gaudi.md Outdated

# Text-Generation Pipeline on Habana Gaudi2 Accelerator

With the Generative AI (GenAI) revolution in full swing, text-generation with open-source transformer models like Llama-2 has become the talk of the town. AI enthusiasts as well as developers are looking to leverage the generative abilities of such models for their own use cases and applications. This article will demonstrate how easy it is to generate text with the Llama-2 family of models (7b, 13b and 70b) using Optimum Habana and a custom pipeline class. You will then be able to generate text with only a few lines of code.
Member

Suggested change
With the Generative AI (GenAI) revolution in full swing, text-generation with open-source transformer models like Llama-2 has become the talk of the town. AI enthusiasts as well as developers are looking to leverage the generative abilities of such models for their own use cases and applications. This article will demonstrate how easy it is to generate text with the Llama-2 family of models (7b, 13b and 70b) using Optimum Habana and a custom pipeline class. You will then be able to generate text with only a few lines of code.
With the Generative AI (GenAI) revolution in full swing, text-generation with open-source transformer models like Llama 2 has become the talk of the town. AI enthusiasts as well as developers are looking to leverage the generative abilities of such models for their own use cases and applications. This article shows how easy it is to generate text with the Llama 2 family of models (7b, 13b and 70b) using Optimum Habana and a custom pipeline class – you'll be able to run the models with just a few lines of code!

I think the official family name is "Llama 2"

Contributor Author

Done

Comment thread textgen-pipe-gaudi.md Outdated
Comment thread textgen-pipe-gaudi.md Outdated
```bash
git clone https://github.com/huggingface/optimum-habana.git
```

In case you are planning to run distributed inference, install DeepSpeed depending on SynapseAI version. In this case, I am using SynapseAI 1.14.0.
Member

Suggested change
In case you are planning to run distributed inference, install DeepSpeed depending on SynapseAI version. In this case, I am using SynapseAI 1.14.0.
In case you are planning to run distributed inference, install DeepSpeed depending on your SynapseAI version. In this case, I am using SynapseAI 1.14.0.

Contributor Author

Done

Comment thread textgen-pipe-gaudi.md
Comment thread textgen-pipe-gaudi.md Outdated
Now you are all set to perform text-generation with the pipeline!

## Using the Pipeline
Run the following command to access the pipeline scripts and follow the instructions provided in the README to update your `PYTHONPATH`.
Member

Suggested change
Run the following command to access the pipeline scripts and follow the instructions provided in the README to update your `PYTHONPATH`.
First, go to the following directory in your `optimum-habana` checkout where the pipeline scripts are located, and follow the instructions in the `README` to update your `PYTHONPATH`.

Contributor Author

Done

Comment thread textgen-pipe-gaudi.md Outdated
```bash
python ../../gaudi_spawn.py --use_deepspeed --world_size 8 run_pipeline.py --model_name_or_path meta-llama/Llama-2-70b-hf --max_new_tokens 100 --bf16 --use_hpu_graphs --use_kv_cache --do_sample --temperature 0.5 --top_p 0.95 --prompt "Hello world" "How are you?" "Here is my prompt" "Once upon a time"
```

Last but not least, you can use the pipeline class in your own scripts, as shown in the example below. Run the following sample script from `optimum-habana/examples/text-generation/text-generation-pipeline`.
Member

Maybe add a subsection for Python use?

Contributor Author

I added a new subsection 'Usage in Python Scripts'.

Comment thread textgen-pipe-gaudi.md
```bash
python run_pipeline.py --model_name_or_path meta-llama/Llama-2-7b-hf --use_hpu_graphs --use_kv_cache --max_new_tokens 100 --do_sample --prompt "Here is my prompt"
```

You can also pass multiple prompts as input and change the temperature and top_p values for generation as follows.
Member

Are they batched? Do we observe performance benefits if so?

Contributor Author

Unfortunately, the pipeline does not support batching.
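Since prompts are not batched, passing several prompts on the command line amounts to parsing a list and running the prompts one by one. A minimal sketch, assuming an `argparse`-based CLI: the flag names mirror the `--prompt`, `--temperature` and `--top_p` options used in the commands above, but `build_prompt_parser` and its default values are hypothetical, not the script's actual definitions.

```python
import argparse


def build_prompt_parser() -> argparse.ArgumentParser:
    """Hypothetical subset of the CLI: multiple prompts plus sampling parameters."""
    parser = argparse.ArgumentParser()
    # nargs="+" collects one or more space-separated prompts into a list.
    parser.add_argument("--prompt", type=str, nargs="+", default=None)
    # Placeholder defaults for the sketch; the real script's defaults may differ.
    parser.add_argument("--temperature", type=float, default=1.0)
    parser.add_argument("--top_p", type=float, default=1.0)
    return parser


if __name__ == "__main__":
    args = build_prompt_parser().parse_args(
        ["--prompt", "Hello world", "How are you?", "--temperature", "0.5", "--top_p", "0.95"]
    )
    # Without batching, each prompt would be sent to the pipeline in its own call.
    for prompt in args.prompt:
        print(f"prompt={prompt!r} temperature={args.temperature} top_p={args.top_p}")
```

Because each prompt is a separate call, multiple prompts on the command line are a convenience rather than a throughput optimization.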

Comment thread textgen-pipe-gaudi.md Outdated

## Conclusion

In this blog, we presented a custom text-generation pipeline that accepts single as well as multiple prompts as input. This pipeline offers great flexibility in terms of model size as well as parameters affecting text-generation quality. Furthermore, it is also very easy to use and to plug into your scripts and is compatible with LangChain.
Member

Suggested change
In this blog, we presented a custom text-generation pipeline that accepts single as well as multiple prompts as input. This pipeline offers great flexibility in terms of model size as well as parameters affecting text-generation quality. Furthermore, it is also very easy to use and to plug into your scripts and is compatible with LangChain.
We presented a custom text-generation pipeline on Habana Gaudi2 that accepts single or multiple prompts as input. This pipeline offers great flexibility in terms of model size as well as parameters affecting text-generation quality. Furthermore, it is also very easy to use and to plug into your scripts, and is compatible with LangChain.

Contributor Author

Done

Contributor

@regisss regisss left a comment

LGTM!
I'm going to generate a PDF preview of this blog post and share it with you @sjagtap1803.

@sjagtap1803 sjagtap1803 force-pushed the sjagtap1803/textgen-pipe-gaudi branch from b48cab2 to a372903 on February 12, 2024 03:57
@sjagtap1803
Contributor Author

Hi @regisss. I pushed some new changes to this PR. Would be great if you could review and approve it.

Thanks!

Contributor

@regisss regisss left a comment

LGTM

I'll also sync with my contacts at Habana to make sure they are in the loop.

Comment thread textgen-pipe-gaudi.md
Comment on lines +192 to +197
> Use of the pretrained model is subject to compliance with third party licenses, including the “Llama 2 Community License Agreement” (LLAMAV2). For guidance on the intended use of the LLAMA2 model, what will be considered misuse and out-of-scope uses, who are the intended users and additional terms please review and read the instructions in this link [https://ai.meta.com/llama/license/](https://ai.meta.com/llama/license/). Users bear sole liability and responsibility to follow and comply with any third party licenses, and Habana Labs disclaims and will bear no liability with respect to users’ use or compliance with third party licenses.
> To be able to run gated models like Llama-2-70b-hf, you need the following:
> * Have a Hugging Face account
> * Agree to the terms of use of the model in its model card on the HF Hub
> * Set a read token
> * Log in to your account using the HF CLI: run `huggingface-cli login` before launching your script
Contributor

Redundant with line 15 IMO. Let's keep it if that matters for Intel, otherwise we can remove it.

Comment thread textgen-pipe-gaudi.md Outdated
Contributor

@regisss regisss left a comment

@sjagtap1803 I left a couple of comments to update the date and the version to install. Can you quickly run the example on your side to make sure it works with Optimum Habana v1.10.4, please? It should, but let's be sure.

Also, there are some merge conflicts to solve related to the blog posts that were published in the meantime.

Comment thread textgen-pipe-gaudi.md Outdated
Comment thread _blog.yml Outdated
@sjagtap1803
Contributor Author

> @sjagtap1803 I left a couple of comments to update the date and the version to install. Can you quickly run the example on your side to make sure it works with Optimum Habana v1.10.4, please? It should, but let's be sure.
>
> Also, there are some merge conflicts to solve related to the blog posts that were published in the meantime.

I tested the blog examples with Optimum Habana v1.10.4 and they seem to work fine. Sharing a screenshot for your reference.

[Screenshot of the blog examples running with Optimum Habana v1.10.4]

@regisss regisss merged commit 804022d into huggingface:main Mar 1, 2024