Gaudi Text-Generation Pipeline Blog #1734
regisss
left a comment
I think it would be nice to show an example of integration of the pipeline with LangChain as you shared with me at the beginning. Is that possible?
IMO what the pipeline brings to the table is to generate text from a prompt in one function call without caring about the (pre-/post-)processing, so it would be really cool to show an example of such an integration 🙂
I also left a couple of cosmetic comments.
| title: "Text-Generation Pipeline on Habana Gaudi2 Accelerator" | ||
| author: siddjags | ||
| thumbnail: /blog/assets/textgen-pipe-gaudi/thumbnail.png | ||
| date: Jan 4, 2024 |
Let's make sure the date is updated when we merge it
I have changed the date for now. Will change again before merging.
Thanks for reviewing this PR. I added a Python snippet that shows how the pipeline can be used in custom scripts. Regarding LangChain, I am not sure it's possible anymore, as the original pipeline class was modified to incorporate methods from text-generation/utils.py.
regisss
left a comment
For LangChain, could you make sure it doesn't work as it is? If that's confirmed, we'll keep the example you added.
| print(f"Generated Text: {repr(output)}") | ||
| ``` | ||
| Note: You will have to run the above script with `python <name_of_script>.py --model_name_or_path gpt2` as `--model_name_or_path` is a required argument. However, the model name can be programmatically changed as shown in the python snippet. |
Good catch! Maybe we can make the `model_name_or_path` arg optional by removing the line here, WDYT?
We can make this arg not required and possibly set a small model (maybe gpt2) as the default value. Please let me know if the README needs to be edited to incorporate this change.
Sounds good to me. I'll open a PR today and I'll let you know here when it is merged.
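For reference, the change discussed above could look roughly like the following. This is only a minimal sketch using the standard `argparse` module: `build_parser` is a hypothetical stand-in, not the actual parser setup in Optimum Habana, and the help text is an assumption. It shows `--model_name_or_path` as an optional argument with `gpt2` as the default, which can still be overridden on the command line or programmatically.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in for the script's parser (not the real Optimum Habana code)."""
    parser = argparse.ArgumentParser(description="Text-generation pipeline (sketch)")
    # No `required=True`: the argument falls back to a small default model.
    parser.add_argument(
        "--model_name_or_path",
        default="gpt2",
        type=str,
        help="Model name on the Hub or local path (assumed help text).",
    )
    return parser


# No CLI arguments: the default is used.
args = build_parser().parse_args([])
print(args.model_name_or_path)

# The value can still be set on the command line...
cli_args = build_parser().parse_args(["--model_name_or_path", "meta-llama/Llama-2-7b-hf"])
print(cli_args.model_name_or_path)

# ...or changed programmatically after parsing.
args.model_name_or_path = "meta-llama/Llama-2-7b-hf"
```

With a default in place, the note about having to pass `--model_name_or_path gpt2` on the command line would no longer be needed.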
Yes, I tried feeding the pipeline to a few LangChain classes and it did not work.
What is the error you got? Is it fixable while keeping the same pipeline structure?
Here's the code snippet and the corresponding error.
# The pipeline class can also be used with langchain
llm = HuggingFacePipeline(pipeline=pipe)
template = """Answer the question based on the context below. If the question cannot be answered using the information provided answer with "I don't know".
Context: Large Language Models (LLMs) are the latest models used in NLP. Their superior performance over smaller models has made them incredibly useful for developers building NLP enabled applications. These models can be accessed via Hugging Face's `transformers` library, via OpenAI using the `openai` library, and via Cohere using the `cohere` library.
Question: {question}
Answer: """
prompt = PromptTemplate(template=template, input_variables=["question"])
llm_chain = LLMChain(prompt=prompt, llm=llm)
print(llm_chain.run("What are LLMs and what are their advantages?"))
I don't think it's fixable without making major changes to the pipeline structure.
What's the change that led to this error compared to your first version that worked?
I figured out the cause of the error message. The original pipeline was developed for an older version of LangChain. Newer versions require the pipeline input to be a list of strings instead of a single string. Also, we may have to add a custom stopping criterion, as the models ignore EOS tokens and keep generating text until max_new_tokens is reached. Let me know what you think.
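To illustrate the two points above, here is a small, self-contained sketch. All names (`normalize_prompts`, `truncate_at_stop`, `run`, `fake_generate`) are hypothetical and are not part of Optimum Habana or LangChain. The first helper accepts either a single string or a list of strings. The second one cuts the generated text at the first stop sequence as a post-processing step, which is a simpler stand-in for a real stopping criterion inside the generation loop: it tidies the output but does not save any compute.

```python
from typing import Callable, List, Optional, Sequence, Union


def normalize_prompts(prompts: Union[str, Sequence[str]]) -> List[str]:
    """Accept a single prompt or several prompts and always return a list."""
    if isinstance(prompts, str):
        return [prompts]
    return list(prompts)


def truncate_at_stop(text: str, stop: Optional[Sequence[str]] = None) -> str:
    """Cut `text` at the earliest occurrence of any non-empty stop sequence."""
    if not stop:
        return text
    cut = len(text)
    for sequence in stop:
        if not sequence:
            continue  # an empty stop string would match at index 0
        index = text.find(sequence)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


def run(
    generate: Callable[[str], str],
    prompts: Union[str, Sequence[str]],
    stop: Optional[Sequence[str]] = None,
) -> List[str]:
    """Generate one output per prompt, one prompt at a time, then apply stop sequences."""
    return [truncate_at_stop(generate(p), stop) for p in normalize_prompts(prompts)]


def fake_generate(prompt: str) -> str:
    """Toy generator standing in for the real model call (assumption for this sketch)."""
    return prompt.upper() + " END extra text"


print(run(fake_generate, "hi", stop=["END"]))
print(run(fake_generate, ["a", "b"]))
```

A stopping criterion that actually ends generation early would have to be hooked into the model's generate call instead. The sketch only shows the input and output handling.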
Okay, so if you provide a list of strings, it works now, right?
Yeah, it does seem to work with a list of strings but leads to the following warning I tried setting |
This may be a bug in Optimum Habana, I'll take a look 👍
659921f to
7cfa1ec
FYI, the pipeline class works fine with LangChain version 0.0.191. Here's the output generated by the sample script.
[WARNING|utils.py:198] 2024-02-07 05:09:18,834 >> optimum-habana v1.11.0.dev0 has been validated for SynapseAI v1.14.0 but the driver version is v1.12.0, this could lead to undefined behavior!
Fetching 3 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 16666.11it/s]
Fetching 3 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 6598.28it/s]
02/07/2024 05:09:19 - INFO - __main__ - Single-device run.
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:02<00:00, 1.05it/s]
============================= HABANA PT BRIDGE CONFIGURATION ===========================
PT_HPU_LAZY_MODE = 1
PT_RECIPE_CACHE_PATH =
PT_CACHE_FOLDER_DELETE = 0
PT_HPU_RECIPE_CACHE_CONFIG =
PT_HPU_MAX_COMPOUND_OP_SIZE = 9223372036854775807
PT_HPU_LAZY_ACC_PAR_MODE = 1
PT_HPU_ENABLE_REFINE_DYNAMIC_SHAPES = 0
---------------------------: System Configuration :---------------------------
Num CPU Cores : 160
CPU RAM : 1056399508 KB
------------------------------------------------------------------------------
02/07/2024 05:10:46 - INFO - __main__ - Args: Namespace(device='hpu', model_name_or_path='meta-llama/Llama-2-13b-chat-hf', bf16=False, max_new_tokens=1000, max_input_tokens=2048, batch_size=1, warmup=3, n_iterations=5, local_rank=0, use_kv_cache=True, use_hpu_graphs=True, dataset_name=None, column_name=None, do_sample=True, num_beams=1, trim_logits=False, seed=27, profiling_warmup_steps=0, profiling_steps=0, prompt=None, bad_words=None, force_words=None, peft_model=None, num_return_sequences=1, token=None, model_revision='main', attn_softmax_bf16=False, output_dir=None, bucket_size=-1, dataset_max_samples=-1, limit_hpu_graphs=False, reuse_cache=False, verbose_workers=False, simulate_dyn_prompt=None, reduce_recompile=False, kv_cache_fp8=False, fp8=False, use_flash_attention=False, torch_compile=False, temperature=0.2, top_p=0.95, quant_config='', world_size=0, global_rank=0)
02/07/2024 05:10:46 - INFO - __main__ - device: hpu, n_hpu: 0, bf16: False
02/07/2024 05:10:46 - INFO - __main__ - Model initialization took 87.551s
02/07/2024 05:10:46 - INFO - __main__ - Graph compilation...
Question 1: Which libraries and model providers offer LLMs?
Response 1: Based on the context, the following libraries and model providers offer LLMs:
1. Hugging Face's `transformers` library
2. OpenAI using the `openai` library
3. Cohere using the `cohere` library.
Question 2: What is the provided context about?
Response 2: The provided context is about Large Language Models (LLMs) and how they can be accessed via different libraries such as Hugging Face's `transformers` library, OpenAI's `openai` library, and Cohere's `cohere` library.
LGTM! Gently pinging @pcuenca for approval and after that we can share a draft with Intel/Habana.
pcuenca
left a comment
Thank you! Made a couple of minor suggestions, and suggested to include the Meta part of the approval process.
| # Text-Generation Pipeline on Habana Gaudi2 Accelerator | ||
| With the Generative AI (GenAI) revolution in full swing, text-generation with open-source transformer models like Llama-2 has become the talk of the town. AI enthusiasts as well as developers are looking to leverage the generative abilities of such models for their own use cases and applications. This article will demonstrate how easy it is to generate text with the Llama-2 family of models (7b, 13b and 70b) using Optimum Habana and a custom pipeline class. You will then be able to generate text with only a few lines of code. |
| With the Generative AI (GenAI) revolution in full swing, text-generation with open-source transformer models like Llama-2 has become the talk of the town. AI enthusiasts as well as developers are looking to leverage the generative abilities of such models for their own use cases and applications. This article will demonstrate how easy it is to generate text with the Llama-2 family of models (7b, 13b and 70b) using Optimum Habana and a custom pipeline class. You will then be able to generate text with only a few lines of code. | |
| With the Generative AI (GenAI) revolution in full swing, text-generation with open-source transformer models like Llama 2 has become the talk of the town. AI enthusiasts as well as developers are looking to leverage the generative abilities of such models for their own use cases and applications. This article shows how easy it is to generate text with the Llama 2 family of models (7b, 13b and 70b) using Optimum Habana and a custom pipeline class – you'll be able to run the models with just a few lines of code! |
I think the official family name is "Llama 2"
| git clone https://github.com/huggingface/optimum-habana.git | ||
| ``` | ||
| In case you are planning to run distributed inference, install DeepSpeed depending on SynapseAI version. In this case, I am using SynapseAI 1.14.0. |
| In case you are planning to run distributed inference, install DeepSpeed depending on SynapseAI version. In this case, I am using SynapseAI 1.14.0. | |
| In case you are planning to run distributed inference, install DeepSpeed depending on your SynapseAI version. In this case, I am using SynapseAI 1.14.0. |
| Now you are all set to perform text-generation with the pipeline! | ||
| ## Using the Pipeline | ||
| Run the following command to access the pipeline scripts and follow the instructions provided in the README to update your `PYTHONPATH`. |
| Run the following command to access the pipeline scripts and follow the instructions provided in the README to update your `PYTHONPATH`. | |
| First, go to the following directory in your `optimum-habana` checkout where the pipeline scripts are located, and follow the instructions in the `README` to update your `PYTHONPATH`. |
| python ../../gaudi_spawn.py --use_deepspeed --world_size 8 run_pipeline.py --model_name_or_path meta-llama/Llama-2-70b-hf --max_new_tokens 100 --bf16 --use_hpu_graphs --use_kv_cache --do_sample --temperature 0.5 --top_p 0.95 --prompt "Hello world" "How are you?" "Here is my prompt" "Once upon a time" | ||
| ``` | ||
| Last but not least, you can use the pipeline class in your own scripts as shown in the example below. Run the following sample script from `optimum-habana/examples/text-generation/text-generation-pipeline`. |
Maybe add a subsection for Python use?
I added a new subsection 'Usage in Python Scripts'.
| python run_pipeline.py --model_name_or_path meta-llama/Llama-2-7b-hf --use_hpu_graphs --use_kv_cache --max_new_tokens 100 --do_sample --prompt "Here is my prompt" | ||
| ``` | ||
| You can also pass multiple prompts as input and change the temperature and top_p values for generation as follows. |
Are they batched? Do we observe performance benefits if so?
Unfortunately, the pipeline does not support batching.
| ## Conclusion | ||
| In this blog, we presented a custom text-generation pipeline that accepts single as well as multiple prompts as input. This pipeline offers great flexibility in terms of model size as well as parameters affecting text-generation quality. Furthermore, it is also very easy to use and to plug into your scripts and is compatible with LangChain. |
| In this blog, we presented a custom text-generation pipeline that accepts single as well as multiple prompts as input. This pipeline offers great flexibility in terms of model size as well as parameters affecting text-generation quality. Furthermore, it is also very easy to use and to plug into your scripts and is compatible with LangChain. | |
| We presented a custom text-generation pipeline on Habana Gaudi2 that accepts single or multiple prompts as input. This pipeline offers great flexibility in terms of model size as well as parameters affecting text-generation quality. Furthermore, it is also very easy to use and to plug into your scripts, and is compatible with LangChain. |
regisss
left a comment
LGTM!
I'm going to generate a PDF preview of this blog post and share it with you @sjagtap1803.
b48cab2 to
a372903
Hi @regisss. I pushed some new changes to this PR. It would be great if you could review and approve it. Thanks!
regisss
left a comment
LGTM
I'll also sync with my contacts at Habana to make sure they are in the loop.
| > Use of the pretrained model is subject to compliance with third party licenses, including the “Llama 2 Community License Agreement” (LLAMAV2). For guidance on the intended use of the LLAMA2 model, what will be considered misuse and out-of-scope uses, who are the intended users and additional terms please review and read the instructions in this link [https://ai.meta.com/llama/license/](https://ai.meta.com/llama/license/). Users bear sole liability and responsibility to follow and comply with any third party licenses, and Habana Labs disclaims and will bear no liability with respect to users’ use or compliance with third party licenses. | ||
| To be able to run gated models like this Llama-2-70b-hf, you need the following: | ||
| > * Have a HuggingFace account | ||
| > * Agree to the terms of use of the model in its model card on the HF Hub | ||
| > * set a read token | ||
| > * Login to your account using the HF CLI: run huggingface-cli login before launching your script |
Redundant with line 15 IMO. Let's keep it if that matters for Intel, otherwise we can remove it.
regisss
left a comment
@sjagtap1803 I left a couple of comments to update the date and the version to install. Can you quickly run the example on your side to make sure it works with Optimum Habana v1.10.4, please? It should, but let's be sure.
Also, there are some merge conflicts to solve related to the blog posts that were published in the meantime.
I tested the blog examples with Optimum Habana v1.10.4 and they seem to work fine. Sharing a screenshot for your reference.



This blogpost provides instructions to use the Gaudi Text-Generation Pipeline from optimum-habana: (huggingface/optimum-habana#526)