
Structure completion request to maximize Prompt Caching #281

@brandonh-msft

Description


Describe the feature or improvement you are requesting

Today, the flow of a request to an OpenAI service relies on simple JSON serialization of the options model: the model is encoded to BinaryData and sent through the pipeline as-is.

This does not maximize Prompt Caching, which matches requests on the longest common prefix of the prompt. To benefit, the completion request should serialize tools first, then conversation history, then the new content, in that order.
Additionally, the tools and history must appear in the same order on every request (suggestion: alphabetical order by tool name). See the caller-side sketch after the sources below.

Sources:
https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/prompt-caching
https://openai.com/index/api-prompt-caching/
https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/prompt-caching#what-is-cached
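
As a caller-side illustration of that ordering, here is a minimal sketch against the OpenAI .NET `ChatClient` surface; `chatClient`, `availableTools`, `history`, and `newUserInput` are assumed placeholders:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using OpenAI.Chat;

ChatCompletionOptions options = new();

// Sort tools alphabetically by function name so they serialize in the
// same order on every request, keeping the prompt prefix byte-identical.
foreach (ChatTool tool in availableTools.OrderBy(t => t.FunctionName, StringComparer.Ordinal))
{
    options.Tools.Add(tool);
}

// Stable history first, the new (variable) turn last.
List<ChatMessage> messages = history.ToList();
messages.Add(new UserChatMessage(newUserInput));

ChatCompletion completion = chatClient.CompleteChat(messages, options);
```

Even with careful ordering at the call site, though, the property order of the serialized body is fixed by the SDK's `Write` implementation shown below, which is what this issue asks to change.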

Asks for BinaryData from the options:

```csharp
using BinaryContent content = options.ToBinaryContent();
```
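
For context, that `BinaryContent` becomes the request body verbatim; nothing downstream reorders it. A simplified `System.ClientModel` sketch (not the exact generated code; `pipeline` is an assumed `ClientPipeline`):

```csharp
using PipelineMessage message = pipeline.CreateMessage();
message.Request.Method = "POST";
message.Request.Content = content; // the serialized bytes go on the wire as-is
pipeline.Send(message);
```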

Writes the JSON doc in non-optimal order:

```csharp
void IJsonModel<ChatCompletionOptions>.Write(Utf8JsonWriter writer, ModelReaderWriterOptions options)
```

Uses the non-optimal serialization when constructing the BinaryData for the options:

```csharp
internal virtual BinaryContent ToBinaryContent()
{
    return BinaryContent.Create(this, ModelSerializationExtensions.WireOptions);
}
```
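
A possible shape for the fix, as a standalone sketch with hypothetical stand-in types (the real SDK models are richer, and the ordering would live in the `IJsonModel<ChatCompletionOptions>.Write` implementation above): tools first, alphabetized, then messages with the new content last, then everything else.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.Json;

// Hypothetical stand-in types for illustration only.
record ToolDef(string Name, string Description);
record Message(string Role, string Content);

static class CacheOptimalSerializer
{
    // Writes the body in the order prompt caching rewards:
    // tools (alphabetized) -> history -> new content -> remaining options.
    public static byte[] Serialize(
        IEnumerable<ToolDef> tools,
        IReadOnlyList<Message> history,
        Message newMessage,
        string model)
    {
        using var stream = new MemoryStream();
        using var writer = new Utf8JsonWriter(stream);

        writer.WriteStartObject();

        // 1. Tools first, sorted by name, so the byte sequence is identical
        //    across requests and forms the longest cacheable prefix.
        writer.WriteStartArray("tools");
        foreach (ToolDef tool in tools.OrderBy(t => t.Name, StringComparer.Ordinal))
        {
            writer.WriteStartObject();
            writer.WriteString("name", tool.Name);
            writer.WriteString("description", tool.Description);
            writer.WriteEndObject();
        }
        writer.WriteEndArray();

        // 2. Conversation history next, in its original (stable) order,
        //    with the new content appended at the very end.
        writer.WriteStartArray("messages");
        foreach (Message m in history.Append(newMessage))
        {
            writer.WriteStartObject();
            writer.WriteString("role", m.Role);
            writer.WriteString("content", m.Content);
            writer.WriteEndObject();
        }
        writer.WriteEndArray();

        // 3. Everything after the cacheable prefix.
        writer.WriteString("model", model);

        writer.WriteEndObject();
        writer.Flush();
        return stream.ToArray();
    }
}
```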

Additional context

microsoft/semantic-kernel#9444
