* custom model client docs followup
* fix function name in docs
* Update website/docs/Use-Cases/enhanced_inference.md
Co-authored-by: Chi Wang <[email protected]>
* Update website/docs/Use-Cases/enhanced_inference.md
Co-authored-by: Chi Wang <[email protected]>
* Update website/docs/Use-Cases/enhanced_inference.md
Co-authored-by: Chi Wang <[email protected]>
* Update website/docs/Use-Cases/enhanced_inference.md
Co-authored-by: Chi Wang <[email protected]>
---------
Co-authored-by: Chi Wang <[email protected]>
website/docs/FAQ.md (+4 -1)

@@ -89,7 +89,10 @@ In version >=1, OpenAI renamed their `api_base` parameter to `base_url`. So for
 ### Can I use non-OpenAI models?

-Yes. Autogen can work with any API endpoint which complies with OpenAI-compatible RESTful APIs - e.g. serving local LLM via FastChat or LM Studio. Please check https://microsoft.github.io/autogen/blog/2023/07/14/Local-LLMs for an example.
+Yes. You currently have two options:
+
+- Autogen can work with any API endpoint which complies with OpenAI-compatible RESTful APIs - e.g. serving local LLM via FastChat or LM Studio. Please check https://microsoft.github.io/autogen/blog/2023/07/14/Local-LLMs for an example.
+- You can supply your own custom model implementation and use it with Autogen. Please check https://microsoft.github.io/autogen/blog/2024/01/26/Custom-Models for more information.
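As a rough illustration of the first option, a config entry for a locally served OpenAI-compatible endpoint might look like the sketch below; the model name, port, and placeholder API key are assumptions that depend on how the local server (FastChat, LM Studio, etc.) was started.

```python
import autogen

# Hypothetical config for a local OpenAI-compatible server (e.g. FastChat or LM Studio).
# The model name, base_url, and api_key are placeholders for whatever the server exposes.
config_list = [
    {
        "model": "chatglm2-6b",                  # assumed name of the locally served model
        "base_url": "http://localhost:8000/v1",  # assumed address of the local endpoint
        "api_key": "NULL",                       # local servers usually accept any placeholder key
    }
]

assistant = autogen.AssistantAgent("assistant", llm_config={"config_list": config_list})
```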
website/docs/Use-Cases/enhanced_inference.md (+8 -12)

@@ -107,9 +107,6 @@ The tuned config can be used to perform inference.
 ## API unification

-<!-- `autogen.Completion.create` is compatible with both `openai.Completion.create` and `openai.ChatCompletion.create`, and both OpenAI API and Azure OpenAI API. So models such as "text-davinci-003", "gpt-3.5-turbo" and "gpt-4" can share a common API.
-
-When chat models are used and `prompt` is given as the input to `autogen.Completion.create`, the prompt will be automatically converted into `messages` to fit the chat completion API requirement. One advantage is that one can experiment with both chat and non-chat models for the same prompt in a unified API. -->
-
 `autogen.OpenAIWrapper.create()` can be used to create completions for both chat and non-chat models, and both OpenAI API and Azure OpenAI API.
 For local LLMs, one can spin up an endpoint using a package like [FastChat](https://github.com/lm-sys/FastChat), and then use the same API to send a request. See [here](/blog/2023/07/14/Local-LLMs) for examples on how to make inference with local LLMs.
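A minimal sketch of that unified call is shown below; the config entry, environment variable, and prompt are illustrative, and the same `create()` call works unchanged once an Azure OpenAI or local-endpoint config is supplied instead.

```python
import os
from autogen import OpenAIWrapper

# One OpenAI config entry; an Azure OpenAI or local-endpoint entry (with api_type,
# base_url, api_version, ...) can be swapped in without changing the create() call.
client = OpenAIWrapper(
    config_list=[{"model": "gpt-3.5-turbo", "api_key": os.environ.get("OPENAI_API_KEY")}]
)
response = client.create(messages=[{"role": "user", "content": "Summarize AutoGen in one sentence."}])
print(client.extract_text_or_completion_object(response))
```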
-<!-- When only working with the chat-based models, `autogen.ChatCompletion` can be used. It also does automatic conversion from prompt to messages, if prompt is provided instead of messages. -->
+For custom model clients, one can register the client with `autogen.OpenAIWrapper.register_model_client` and then use the same API to send a request. See [here](/blog/2024/01/26/Custom-Models) for examples on how to make inference with custom model clients.

+Note: if using a custom model client (see [here](/blog/2024/01/26/Custom-Models) for details) and if usage summary is not implemented, then the usage summary will not be available.
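Going by the protocol described in the linked Custom Models blog post, a custom client is a class exposing roughly `create`, `message_retrieval`, `cost`, and `get_usage`; the sketch below is a minimal stand-in, where the `SimpleNamespace` response and the zeroed usage numbers are placeholders rather than a real model.

```python
from types import SimpleNamespace
from autogen import OpenAIWrapper

class CustomModelClient:
    """Minimal sketch of a client following autogen's ModelClient protocol."""

    def __init__(self, config, **kwargs):
        self.model = config["model"]  # load or connect to the actual model here

    def create(self, params):
        # Run inference and wrap the output in an OpenAI-style response object.
        response = SimpleNamespace()
        response.choices = [SimpleNamespace(message=SimpleNamespace(content=f"Hello from {self.model}"))]
        response.model = self.model
        return response

    def message_retrieval(self, response):
        return [choice.message.content for choice in response.choices]

    def cost(self, response):
        return 0  # local inference treated as free in this sketch

    @staticmethod
    def get_usage(response):
        # Keys reported in the usage summary; left at zero for this sketch.
        return {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0, "cost": 0, "model": response.model}


client = OpenAIWrapper(config_list=[{"model": "microsoft/phi-2", "model_client_cls": "CustomModelClient"}])
client.register_model_client(model_client_cls=CustomModelClient)
response = client.create(messages=[{"role": "user", "content": "Hi"}])
```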
 ## Caching

 API call results are cached locally and reused when the same request is issued.
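For instance, a sketch of how the cache behaves with the `cache_seed` parameter discussed in this section (the seed values are arbitrary and the config entry is a placeholder):

```python
import os
from autogen import OpenAIWrapper

client = OpenAIWrapper(
    config_list=[{"model": "gpt-3.5-turbo", "api_key": os.environ.get("OPENAI_API_KEY")}]
)

# Same request and same cache_seed: the second call is served from the local cache.
r1 = client.create(messages=[{"role": "user", "content": "What is 2 + 2?"}], cache_seed=41)
r2 = client.create(messages=[{"role": "user", "content": "What is 2 + 2?"}], cache_seed=41)

# A different cache_seed uses a separate cache, so the endpoint is queried again.
r3 = client.create(messages=[{"role": "user", "content": "What is 2 + 2?"}], cache_seed=17)
```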
@@ -241,13 +240,6 @@ The differences between autogen's `cache_seed` and openai's `seed`:
 ### Runtime error

-<!-- It is easy to hit error when calling OpenAI APIs, due to connection, rate limit, or timeout. Some of the errors are transient. `autogen.Completion.create` deals with the transient errors and retries automatically. Request timeout, max retry period and retry wait time can be configured via `request_timeout`, `max_retry_period` and `retry_wait_time`.
-
-- `request_timeout` (int): the timeout (in seconds) sent with a single request.
-- `max_retry_period` (int): the total time (in seconds) allowed for retrying failed requests.
-- `retry_wait_time` (int): the time interval to wait (in seconds) before retrying a failed request.
-
-Moreover, -->

 One can pass a list of configurations of different models/endpoints to mitigate the rate limits and other runtime errors. For example,

 ```python
@@ -268,12 +260,16 @@ client = OpenAIWrapper(
         {
             "model": "llama2-chat-7B",
             "base_url": "http://127.0.0.1:8080",
+        },
+        {
+            "model": "microsoft/phi-2",
+            "model_client_cls": "CustomModelClient"
         }
     ],
 )
 ```
-`client.create()` will try querying Azure OpenAI gpt-4, OpenAI gpt-3.5-turbo, and a locally hosted llama2-chat-7B one by one,
+`client.create()` will try querying Azure OpenAI gpt-4, OpenAI gpt-3.5-turbo, a locally hosted llama2-chat-7B, and phi-2 using a custom model client class named `CustomModelClient`, one by one,
 until a valid result is returned. This can speed up the development process where the rate limit is a bottleneck. An error will be raised if the last choice fails. So make sure the last choice in the list has the best availability.

 For convenience, we provide a number of utility functions to load config lists.
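One such utility, for example, reads the list from an environment variable or a JSON file; in the sketch below the file name and filter values are placeholders.

```python
import autogen

# Load configs from the OAI_CONFIG_LIST environment variable or a file with that name,
# keeping only the entries for the listed models.
config_list = autogen.config_list_from_json(
    env_or_file="OAI_CONFIG_LIST",
    filter_dict={"model": ["gpt-4", "gpt-3.5-turbo"]},
)
client = autogen.OpenAIWrapper(config_list=config_list)
```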