
Commit 34ab013

olgavrouso and sonichi authored
Custom Model Client docs follow-up (microsoft#1545)
* custom model client docs followup
* fix function name in docs
* Update website/docs/Use-Cases/enhanced_inference.md
* Update website/docs/Use-Cases/enhanced_inference.md
* Update website/docs/Use-Cases/enhanced_inference.md
* Update website/docs/Use-Cases/enhanced_inference.md

Co-authored-by: Chi Wang <[email protected]>
1 parent 69a976f commit 34ab013

File tree

5 files changed: +20 −15 lines


autogen/oai/client.py (+2)

@@ -77,6 +77,8 @@ class Choice(Protocol):
             class Message(Protocol):
                 content: Optional[str]
 
+            message: Message
+
         choices: List[Choice]
         model: str
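The new `message: Message` field hangs off `Choice`, mirroring the OpenAI response shape where each choice wraps one message; the companion `str | None` → `Optional[str]` change elsewhere in this commit keeps the annotation valid on Python versions below 3.10. As a minimal illustrative sketch (the `make_response` helper is ours, not part of the commit), any object with this shape satisfies the response protocol by duck typing:

```python
from types import SimpleNamespace
from typing import Optional

# Illustrative only: build a response that satisfies
# ModelClient.ModelClientResponseProtocol by duck typing.
def make_response(text: Optional[str], model_name: str) -> SimpleNamespace:
    message = SimpleNamespace(content=text)    # Message.content: Optional[str]
    choice = SimpleNamespace(message=message)  # Choice.message: Message (the field this commit adds)
    return SimpleNamespace(choices=[choice], model=model_name)

response = make_response("Hello from a custom client.", "my-local-model")
print(response.choices[0].message.content)  # Hello from a custom client.
```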

notebook/agentchat_custom_model.ipynb (+3 −1)

@@ -94,7 +94,9 @@
     "    class ModelClientResponseProtocol(Protocol):\n",
     "        class Choice(Protocol):\n",
     "            class Message(Protocol):\n",
-    "                content: str | None\n",
+    "                content: Optional[str]\n",
+    "\n",
+    "            message: Message\n",
     "\n",
     "        choices: List[Choice]\n",
     "        model: str\n",

website/blog/2024-01-26-Custom-Models/index.mdx (+3 −1)

@@ -122,7 +122,9 @@ class ModelClient(Protocol):
     class ModelClientResponseProtocol(Protocol):
         class Choice(Protocol):
             class Message(Protocol):
-                content: str | None
+                content: Optional[str]
+
+            message: Message
 
         choices: List[Choice]
         model: str

website/docs/FAQ.md (+4 −1)

@@ -89,7 +89,10 @@ In version >=1, OpenAI renamed their `api_base` parameter to `base_url`. So for
 
 ### Can I use non-OpenAI models?
 
-Yes. Autogen can work with any API endpoint which complies with OpenAI-compatible RESTful APIs - e.g. serving local LLM via FastChat or LM Studio. Please check https://microsoft.github.io/autogen/blog/2023/07/14/Local-LLMs for an example.
+Yes. You currently have two options:
+
+- Autogen can work with any API endpoint which complies with OpenAI-compatible RESTful APIs - e.g. serving local LLM via FastChat or LM Studio. Please check https://microsoft.github.io/autogen/blog/2023/07/14/Local-LLMs for an example.
+- You can supply your own custom model implementation and use it with Autogen. Please check https://microsoft.github.io/autogen/blog/2024/01/26/Custom-Models for more information.
 
 ## Handle Rate Limit Error and Timeout Error
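For the second option, the linked Custom Models blog post describes the flow: implement autogen's `ModelClient` protocol, reference the class by name in a config entry via `model_client_cls`, then register it on the agent. A condensed, hedged sketch, with a stub `CustomModelClient` that returns a canned reply instead of calling a real backend:

```python
from types import SimpleNamespace
from autogen import AssistantAgent, UserProxyAgent


class CustomModelClient:
    """Stub implementation of autogen's ModelClient protocol."""

    def __init__(self, config, **kwargs):
        self.model = config["model"]

    def create(self, params):
        # A real client would call a local or remote model here.
        message = SimpleNamespace(content="stub reply")
        choice = SimpleNamespace(message=message)
        return SimpleNamespace(choices=[choice], model=self.model, cost=0.0)

    def message_retrieval(self, response):
        return [choice.message.content for choice in response.choices]

    def cost(self, response):
        return response.cost

    @staticmethod
    def get_usage(response):
        return {}  # no usage tracking in this stub; usage summary stays empty


config_list = [
    {
        "model": "microsoft/phi-2",
        "model_client_cls": "CustomModelClient",  # matched by class name at registration
    }
]

assistant = AssistantAgent("assistant", llm_config={"config_list": config_list})
user_proxy = UserProxyAgent(
    "user_proxy",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=0,
    code_execution_config=False,
)

# Register the implementation; autogen instantiates it for the matching config entry.
assistant.register_model_client(model_client_cls=CustomModelClient)
user_proxy.initiate_chat(assistant, message="Say hi.")
```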

website/docs/Use-Cases/enhanced_inference.md (+8 −12)

@@ -107,9 +107,6 @@ The tuned config can be used to perform inference.
 
 ## API unification
 
-<!-- `autogen.Completion.create` is compatible with both `openai.Completion.create` and `openai.ChatCompletion.create`, and both OpenAI API and Azure OpenAI API. So models such as "text-davinci-003", "gpt-3.5-turbo" and "gpt-4" can share a common API.
-When chat models are used and `prompt` is given as the input to `autogen.Completion.create`, the prompt will be automatically converted into `messages` to fit the chat completion API requirement. One advantage is that one can experiment with both chat and non-chat models for the same prompt in a unified API. -->
-
 `autogen.OpenAIWrapper.create()` can be used to create completions for both chat and non-chat models, and both OpenAI API and Azure OpenAI API.
 
 ```python
@@ -133,7 +130,7 @@ print(client.extract_text_or_completion_object(response))
 
 For local LLMs, one can spin up an endpoint using a package like [FastChat](https://github.com/lm-sys/FastChat), and then use the same API to send a request. See [here](/blog/2023/07/14/Local-LLMs) for examples on how to make inference with local LLMs.
 
-<!-- When only working with the chat-based models, `autogen.ChatCompletion` can be used. It also does automatic conversion from prompt to messages, if prompt is provided instead of messages. -->
+For custom model clients, one can register the client with `autogen.OpenAIWrapper.register_model_client` and then use the same API to send a request. See [here](/blog/2024/01/26/Custom-Models) for examples on how to make inference with custom model clients.
 
 ## Usage Summary
 
@@ -166,6 +163,8 @@ Total cost: 0.00027
 * Model 'gpt-3.5-turbo': cost: 0.00027, prompt_tokens: 50, completion_tokens: 100, total_tokens: 150
 ```
 
+Note: if using a custom model client (see [here](/blog/2024/01/26/Custom-Models) for details) and if usage summary is not implemented, then the usage summary will not be available.
+
 ## Caching
 
 API call results are cached locally and reused when the same request is issued.
@@ -241,13 +240,6 @@ The differences between autogen's `cache_seed` and openai's `seed`:
 
 ### Runtime error
 
-<!-- It is easy to hit error when calling OpenAI APIs, due to connection, rate limit, or timeout. Some of the errors are transient. `autogen.Completion.create` deals with the transient errors and retries automatically. Request timeout, max retry period and retry wait time can be configured via `request_timeout`, `max_retry_period` and `retry_wait_time`.
-
-- `request_timeout` (int): the timeout (in seconds) sent with a single request.
-- `max_retry_period` (int): the total time (in seconds) allowed for retrying failed requests.
-- `retry_wait_time` (int): the time interval to wait (in seconds) before retrying a failed request.
-
-Moreover, -->
 One can pass a list of configurations of different models/endpoints to mitigate the rate limits and other runtime error. For example,
 
 ```python
@@ -268,12 +260,16 @@ client = OpenAIWrapper(
         {
             "model": "llama2-chat-7B",
             "base_url": "http://127.0.0.1:8080",
+        },
+        {
+            "model": "microsoft/phi-2",
+            "model_client_cls": "CustomModelClient"
         }
     ],
 )
 ```
 
-`client.create()` will try querying Azure OpenAI gpt-4, OpenAI gpt-3.5-turbo, and a locally hosted llama2-chat-7B one by one,
+`client.create()` will try querying Azure OpenAI gpt-4, OpenAI gpt-3.5-turbo, a locally hosted llama2-chat-7B, and phi-2 using a custom model client class named `CustomModelClient`, one by one,
 until a valid result is returned. This can speed up the development process where the rate limit is a bottleneck. An error will be raised if the last choice fails. So make sure the last choice in the list has the best availability.
 
 For convenience, we provide a number of utility functions to load config lists.
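To make the mixed config list above concrete, here is a hedged sketch of the registration step; it assumes a `CustomModelClient` class implementing the `ModelClient` protocol is already in scope (e.g. the stub sketched after the FAQ change above):

```python
from autogen import OpenAIWrapper

# Assumes CustomModelClient (see the FAQ sketch above) is defined in scope.
client = OpenAIWrapper(
    config_list=[
        {"model": "llama2-chat-7B", "base_url": "http://127.0.0.1:8080"},
        {"model": "microsoft/phi-2", "model_client_cls": "CustomModelClient"},
    ],
)

# OpenAIWrapper matches the registered class to config entries by name,
# so registration must happen before the first create() call.
client.register_model_client(model_client_cls=CustomModelClient)

response = client.create(messages=[{"role": "user", "content": "2 + 2 = ?"}])
print(client.extract_text_or_completion_object(response))
```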
