Description
Describe the feature or improvement you are requesting
Today, the flow of a request through to an OpenAI service relies on simple JSON serialization of a model to encode the message to BinaryContent and send it through the pipeline.
This does not maximize prompt caching, which works best when the completion request contains the tools first, then the history, then the new content, in that order.
Additionally, the tools and history must be in the same order on every request (alphabetical order by tool name is suggested).
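As an illustration, here is a minimal sketch of what a cache-friendly write order could look like. This is a hypothetical helper, not the library's serializer; the property names follow the Chat Completions wire format, and the tuple-based parameters stand in for the real model types:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.Json;

// Hypothetical sketch of a cache-friendly write order: tools first
// (alphabetical by name), then the message history (newest last), then
// the remaining, more volatile request fields. Identical prefixes across
// requests are what prompt caching rewards.
static BinaryData SerializeCacheFriendly(
    string model,
    IReadOnlyList<(string Name, string ParametersJson)> tools,
    IReadOnlyList<(string Role, string Content)> messages)
{
    using MemoryStream stream = new();
    using Utf8JsonWriter writer = new(stream);

    writer.WriteStartObject();

    // 1. Tools first, sorted so the serialized array is identical every time.
    writer.WriteStartArray("tools");
    foreach ((string name, string parametersJson) in
             tools.OrderBy(t => t.Name, StringComparer.Ordinal))
    {
        writer.WriteStartObject();
        writer.WriteString("type", "function");
        writer.WriteStartObject("function");
        writer.WriteString("name", name);
        writer.WritePropertyName("parameters");
        writer.WriteRawValue(parametersJson);
        writer.WriteEndObject(); // function
        writer.WriteEndObject(); // tool
    }
    writer.WriteEndArray();

    // 2. History next, in conversation order, so only the tail changes.
    writer.WriteStartArray("messages");
    foreach ((string role, string content) in messages)
    {
        writer.WriteStartObject();
        writer.WriteString("role", role);
        writer.WriteString("content", content);
        writer.WriteEndObject();
    }
    writer.WriteEndArray();

    // 3. Everything else (model, sampling settings, ...) after the
    //    cache-friendly prefix.
    writer.WriteString("model", model);

    writer.WriteEndObject();
    writer.Flush();
    return new BinaryData(stream.ToArray());
}
```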
Sources:
https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/prompt-caching
https://openai.com/index/api-prompt-caching/
https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/prompt-caching#what-is-cached
Asks for BinaryContent from the options:
openai-dotnet/src/Custom/Chat/ChatClient.cs, line 196 (commit c49dd70):

```csharp
using BinaryContent content = options.ToBinaryContent();
```
Writes the JSON document in a non-optimal order:

```csharp
void IJsonModel<ChatCompletionOptions>.Write(Utf8JsonWriter writer, ModelReaderWriterOptions options)
```
Uses the non-optimal serialization when constructing the BinaryContent for the options:
openai-dotnet/src/Generated/Models/ChatCompletionOptions.Serialization.cs, lines 625 to 628 (commit c49dd70):

```csharp
internal virtual BinaryContent ToBinaryContent()
{
    return BinaryContent.Create(this, ModelSerializationExtensions.WireOptions);
}
```
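Until the serializer changes, a possible client-side mitigation for the ordering-stability part of this request is to normalize the tool list before each call. This is a sketch, assuming ChatCompletionOptions.Tools is a mutable IList<ChatTool> and ChatTool exposes FunctionName, as in the current openai-dotnet surface:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using OpenAI.Chat;

// Sketch: sort the tools alphabetically by function name before every
// request so the serialized "tools" array is byte-for-byte stable,
// maximizing the shared prefix that prompt caching can reuse.
static void NormalizeToolOrder(ChatCompletionOptions options)
{
    List<ChatTool> sorted = options.Tools
        .OrderBy(tool => tool.FunctionName, StringComparer.Ordinal)
        .ToList();

    options.Tools.Clear();
    foreach (ChatTool tool in sorted)
    {
        options.Tools.Add(tool);
    }
}
```

This keeps the tool order deterministic across requests, but it cannot fix the field order produced by the generated Write implementation, which is the core ask here.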