vllm-project · DarkLight1337 · May 26, 2025
diff --git a/docs/deployment/frameworks/autogen.md b/docs/deployment/frameworks/autogen.md
@@ -0,0 +1,83 @@
+---
+title: AutoGen
+---
+[](){ #deployment-autogen }
+
+[AutoGen](https://github.com/microsoft/autogen) is a framework for creating multi-agent AI applications that can act autonomously or work alongside humans.
+
+## Prerequisites
+
+- Set up the vLLM environment
+
+- Set up the [AutoGen](https://microsoft.github.io/autogen/0.2/docs/installation/) environment
+
+```console
+pip install vllm
+
+# Install AgentChat and OpenAI client from Extensions
+# AutoGen requires Python 3.10 or later.
+pip install -U "autogen-agentchat" "autogen-ext[openai]"
+```
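+Before starting the server, you can optionally confirm that the packages installed above are importable and that the interpreter is new enough. The sketch below is illustrative and not part of vLLM or AutoGen: `missing_modules` and `check_environment` are hypothetical helpers, and the import names (`autogen_agentchat`, `autogen_ext`, `openai`) are assumed to correspond to the pip packages above. Drop `vllm` from the list if the server runs on a different machine than the AutoGen client.
+
+```python
+import importlib.util
+import sys
+
+# Assumed import names for the pip packages installed above.
+REQUIRED_MODULES = ["vllm", "autogen_agentchat", "autogen_ext", "openai"]
+
+
+def missing_modules(names: list[str]) -> list[str]:
+    """Return the top-level module names that cannot be found on the import path."""
+    return [name for name in names if importlib.util.find_spec(name) is None]
+
+
+def check_environment() -> list[str]:
+    """Collect human-readable problems with the current interpreter."""
+    problems = []
+    if sys.version_info < (3, 10):
+        found = f"{sys.version_info.major}.{sys.version_info.minor}"
+        problems.append(f"Python {found} found; AutoGen requires 3.10 or later")
+    problems.extend(f"module not found: {name}" for name in missing_modules(REQUIRED_MODULES))
+    return problems
+
+
+if __name__ == "__main__":
+    issues = check_environment()
+    print("\n".join(issues) if issues else "Environment looks ready.")
+```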
+
+## Deploy
+
+- Start the vLLM server with a supported chat completion model, for example:
+
+```console
+python -m vllm.entrypoints.openai.api_server \
+    --model mistralai/Mistral-7B-Instruct-v0.2
+```
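+Once the server is up, it is worth checking which model IDs it serves, because the `model` argument passed to AutoGen must match one of them (vLLM uses the `--model` value unless `--served-model-name` is set). A minimal standard-library sketch, assuming the server's default port 8000 and the OpenAI-style `GET /v1/models` endpoint; `build_base_url`, `parse_model_ids`, and `list_served_models` are hypothetical helper names:
+
+```python
+import json
+import urllib.request
+
+
+def build_base_url(host: str, port: int = 8000) -> str:
+    """Build the base_url that OpenAI-compatible clients expect (vLLM defaults to port 8000)."""
+    return f"http://{host}:{port}/v1"
+
+
+def parse_model_ids(payload: dict) -> list[str]:
+    """Extract model IDs from an OpenAI-style GET /v1/models response body."""
+    return [entry["id"] for entry in payload.get("data", [])]
+
+
+def list_served_models(base_url: str, timeout: float = 5.0) -> list[str]:
+    """Query the running server and return the IDs of the models it serves."""
+    with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
+        return parse_model_ids(json.load(resp))
+```
+
+For example, `list_served_models(build_base_url("localhost"))` should return a list containing `mistralai/Mistral-7B-Instruct-v0.2` once the model has finished loading.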
+
+- Call the server from AutoGen:
+
+```python
+import asyncio
+
+from autogen_core.models import ModelFamily, UserMessage
+from autogen_ext.models.openai import OpenAIChatCompletionClient
+
+
+async def main() -> None:
+    # Create a model client that talks to the vLLM OpenAI-compatible server.
+    model_client = OpenAIChatCompletionClient(
+        model="mistralai/Mistral-7B-Instruct-v0.2",  # must match a model ID served by vLLM
+        base_url="http://{your-vllm-host-ip}:{your-vllm-host-port}/v1",
+        api_key="EMPTY",  # only checked if the server was configured with a key (e.g. --api-key)
+        # AutoGen cannot infer capabilities for non-OpenAI model names, so declare them here.
+        model_info={
+            "vision": False,
+            "function_calling": False,
+            "json_output": False,
+            "family": ModelFamily.MISTRAL,
+            "structured_output": True,
+        },
+    )
+
+    messages = [UserMessage(content="Write a very short story about a dragon.", source="user")]
+
+    # Create a stream.
+    stream = model_client.create_stream(messages=messages)
+
+    # Iterate over the stream and print the responses.
+    print("Streamed responses:")
+    async for response in stream:
+        if isinstance(response, str):
+            # A partial response is a string.
+            print(response, flush=True, end="")
+        else:
+            # The last response is a CreateResult object with the complete message.
+            print("\n\n------------\n")
+            print("The complete response:", flush=True)
+            print(response.content, flush=True)
+
+    # Close the client when done.
+    await model_client.close()
+
+
+asyncio.run(main())
+```
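+If you need the whole reply as a string (for example, to hand it to another agent) rather than printing chunks as they arrive, the streaming loop above can be folded into a small helper. This is a sketch: `collect_stream` is a hypothetical name, and it relies only on the behavior described in the comments above (partial chunks are `str`, the last item is a `CreateResult`).
+
+```python
+import asyncio
+from typing import Any, AsyncIterator
+
+
+async def collect_stream(stream: AsyncIterator[Any]) -> tuple[str, Any]:
+    """Concatenate streamed str chunks; return (text, last non-str item or None)."""
+    chunks: list[str] = []
+    final = None
+    async for item in stream:
+        if isinstance(item, str):
+            chunks.append(item)  # partial response text
+        else:
+            final = item  # e.g. the CreateResult emitted at the end of the stream
+    return "".join(chunks), final
+```
+
+Inside `main()`, this would be used as `text, result = await collect_stream(model_client.create_stream(messages=messages))`. When streaming is not needed at all, `await model_client.create(messages)` returns the `CreateResult` directly.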
+
+For more details, see:
+
+- [Using vLLM in AutoGen](https://microsoft.github.io/autogen/0.2/docs/topics/non-openai-models/local-vllm/)
+
+- [`OpenAIChatCompletionClient` API reference](https://microsoft.github.io/autogen/stable/reference/python/autogen_ext.models.openai.html#autogen_ext.models.openai.OpenAIChatCompletionClient)