---
title: Haystack
---
[](){ #deployment-haystack }

# Haystack

[Haystack](https://github.com/deepset-ai/haystack) is an end-to-end LLM framework that allows you to build applications powered by LLMs, Transformer models, vector search and more. Whether you want to perform retrieval-augmented generation (RAG), document search, question answering or answer generation, Haystack can orchestrate state-of-the-art embedding models and LLMs into pipelines to build end-to-end NLP applications that solve your use case.

With vLLM as the backend, you can deploy a large language model (LLM) server that exposes OpenAI-compatible endpoints for Haystack to query.

## Prerequisites

- Set up your vLLM and Haystack environment:

```console
pip install vllm haystack-ai
```

## Deploy

- Start the vLLM server with a supported chat completion model, e.g.:

```console
vllm serve mistralai/Mistral-7B-Instruct-v0.1
```

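Before wiring the server into Haystack, you can sanity-check the OpenAI-compatible endpoint directly. The sketch below uses only the Python standard library; the `localhost:8000` address assumes the server above is running locally with the default port:

```python
import json
from urllib import request


def build_chat_request(model: str, prompt: str, max_tokens: int = 64) -> dict:
    """Build a payload for the OpenAI-compatible /v1/chat/completions route."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def query_vllm(base_url: str, payload: dict) -> dict:
    """POST the payload to the server and decode the JSON reply."""
    req = request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())


# With the server from the previous step running locally:
# payload = build_chat_request("mistralai/Mistral-7B-Instruct-v0.1", "Hello!")
# print(query_vllm("http://localhost:8000/v1", payload))
```
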
- Use the `OpenAIGenerator` or `OpenAIChatGenerator` component in Haystack to query the vLLM server, for example:

```python
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret

generator = OpenAIChatGenerator(
    # for compatibility with the OpenAI API, a placeholder api_key is needed
    api_key=Secret.from_token("VLLM-PLACEHOLDER-API-KEY"),
    model="mistralai/Mistral-7B-Instruct-v0.1",
    api_base_url="http://{your-vLLM-host-ip}:{your-vLLM-host-port}/v1",
    generation_kwargs={"max_tokens": 512},
)

response = generator.run(
    messages=[ChatMessage.from_user("Hi. Can you help me plan my next trip to Italy?")]
)

print("-" * 30)
print(response)
print("-" * 30)
```

Example output:

```console
------------------------------
{'replies': [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=[TextContent(text=' Of course! Where in Italy would you like to go and what type of trip are you looking to plan?')], _name=None, _meta={'model': 'mistralai/Mistral-7B-Instruct-v0.1', 'index': 0, 'finish_reason': 'stop', 'usage': {'completion_tokens': 23, 'prompt_tokens': 21, 'total_tokens': 44, 'completion_tokens_details': None, 'prompt_tokens_details': None}})]}
------------------------------
```

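If you query the endpoint directly rather than through Haystack, the server returns JSON in the standard OpenAI chat-completions shape rather than Haystack's `ChatMessage` wrapper. A minimal sketch of extracting the assistant reply; the `response` dict below is illustrative, abbreviated from the output above:

```python
# Illustrative response in the OpenAI chat-completions shape,
# abbreviated from the example output above.
response = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": " Of course! Where in Italy would you like to go?",
            },
            "finish_reason": "stop",
        }
    ],
    "usage": {"completion_tokens": 23, "prompt_tokens": 21, "total_tokens": 44},
}

# The assistant reply lives at choices[0].message.content.
reply = response["choices"][0]["message"]["content"].strip()
print(reply)  # Of course! Where in Italy would you like to go?
```
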
For details, see the tutorial [Using vLLM in Haystack](https://github.com/deepset-ai/haystack-integrations/blob/main/integrations/vllm.md).