diff --git a/.dotnet/CHANGELOG.md b/.dotnet/CHANGELOG.md index 8a8d68fa1..7dacb2b6b 100644 --- a/.dotnet/CHANGELOG.md +++ b/.dotnet/CHANGELOG.md @@ -1,5 +1,16 @@ # Release History +## 2.2.0-beta.1 (Unreleased) + +### Features added + +- Chat completion now supports audio input and output! + - To configure a chat completion to request audio output using the `gpt-4o-audio-preview` model, use `ChatResponseModalities.Text | ChatResponseModalities.Audio` as the value for `ChatCompletionOptions.ResponseModalities` and create a `ChatAudioOptions` instance for `ChatCompletionOptions.AudioOptions`. + - Input chat audio is provided to `UserChatMessage` instances using `ChatMessageContentPart.CreateInputAudioPart()` + - Output chat audio is provided on the `OutputAudio` property of `ChatCompletion` + - References to prior assistant audio are provided via `ChatOutputAudioReference` instances on the `OutputAudioReference` property of `AssistantChatMessage`; `AssistantChatMessage(chatCompletion)` will automatically handle this, too + - For more information, see the example in the README + ## 2.1.0 (2024-12-04) ### Features added diff --git a/.dotnet/README.md b/.dotnet/README.md index 2d138e8a6..10a815604 100644 --- a/.dotnet/README.md +++ b/.dotnet/README.md @@ -18,6 +18,7 @@ It is generated from our [OpenAPI specification](https://github.com/openai/opena - [How to use chat completions with streaming](#how-to-use-chat-completions-with-streaming) - [How to use chat completions with tools and function calling](#how-to-use-chat-completions-with-tools-and-function-calling) - [How to use chat completions with structured outputs](#how-to-use-chat-completions-with-structured-outputs) +- [How to use chat completions with audio](#how-to-use-chat-completions-with-audio) - [How to generate text embeddings](#how-to-generate-text-embeddings) - [How to generate images](#how-to-generate-images) - [How to transcribe audio](#how-to-transcribe-audio) @@ -354,6 +355,75 @@ foreach (JsonElement stepElement in 
structuredJson.RootElement.GetProperty("step } ``` +## How to use chat completions with audio + +Starting with the `gpt-4o-audio-preview` model, chat completions can process audio input and output. + +This example demonstrates: + 1. Configuring the client with the supported `gpt-4o-audio-preview` model + 1. Supplying user audio input on a chat completion request + 1. Requesting model audio output from the chat completion operation + 1. Retrieving audio output from a `ChatCompletion` instance + 1. Using past audio output as `ChatMessage` conversation history + +```csharp +// Chat audio input and output is only supported on specific models, beginning with gpt-4o-audio-preview +ChatClient client = new("gpt-4o-audio-preview", Environment.GetEnvironmentVariable("OPENAI_API_KEY")); + +// Input audio is provided to a request by adding an audio content part to a user message +string audioFilePath = Path.Combine("Assets", "realtime_whats_the_weather_pcm16_24khz_mono.wav"); +byte[] audioFileRawBytes = File.ReadAllBytes(audioFilePath); +BinaryData audioData = BinaryData.FromBytes(audioFileRawBytes); +List<ChatMessage> messages = + [ + new UserChatMessage(ChatMessageContentPart.CreateInputAudioPart(audioData, ChatInputAudioFormat.Wav)), + ]; + +// Output audio is requested by configuring ChatCompletionOptions to include the appropriate +// ResponseModalities values and corresponding AudioOptions. 
+ChatCompletionOptions options = new() +{ + ResponseModalities = ChatResponseModalities.Text | ChatResponseModalities.Audio, + AudioOptions = new(ChatOutputAudioVoice.Alloy, ChatOutputAudioFormat.Mp3), +}; + +ChatCompletion completion = client.CompleteChat(messages, options); + +void PrintAudioContent() +{ + if (completion.OutputAudio is ChatOutputAudio outputAudio) + { + Console.WriteLine($"Response audio transcript: {outputAudio.Transcript}"); + string outputFilePath = $"{outputAudio.Id}.mp3"; + using (FileStream outputFileStream = File.OpenWrite(outputFilePath)) + { + outputFileStream.Write(outputAudio.AudioBytes); + } + Console.WriteLine($"Response audio written to file: {outputFilePath}"); + Console.WriteLine($"Valid on followup requests until: {outputAudio.ExpiresAt}"); + } +} + +PrintAudioContent(); + +// To refer to past audio output, create an assistant message from the earlier ChatCompletion, or pass a +// ChatOutputAudioReference built from the earlier audio Id to the AssistantChatMessage constructor. 
+ +messages.Add(new AssistantChatMessage(completion)); +messages.Add("Can you say that like a pirate?"); + +completion = client.CompleteChat(messages, options); + +PrintAudioContent(); +``` + +Streaming closely parallels the non-streaming case: `StreamingChatCompletionUpdate` instances can include an `OutputAudioUpdate` that may +contain any of: + +- The `Id` of the streamed audio content, which can be referenced by subsequent `AssistantChatMessage` instances via `ChatOutputAudioReference` once the streaming response is complete; this may appear across multiple `StreamingChatCompletionUpdate` instances but will always be the same value when present +- The `ExpiresAt` value that describes when the `Id` will no longer be valid for use with `ChatOutputAudioReference` in subsequent requests; this typically appears once and only once, in the final `StreamingChatOutputAudioUpdate` +- Incremental `TranscriptUpdate` and/or `AudioBytesUpdate` values, which can be incrementally consumed and, when concatenated, form the complete audio transcript and audio output for the overall response; many of these typically appear + ## How to generate text embeddings In this example, you want to create a trip-planning website that allows customers to write a prompt describing the kind of hotel that they are looking for and then offers hotel recommendations that closely match this description. To achieve this, it is possible to use text embeddings to measure the relatedness of text strings. In summary, you can get embeddings of the hotel descriptions, store them in a vector database, and use them to build a search index that you can query using the embedding of a given customer's prompt. 
diff --git a/.dotnet/api/OpenAI.netstandard2.0.cs b/.dotnet/api/OpenAI.netstandard2.0.cs index b5b587809..61cafcddf 100644 --- a/.dotnet/api/OpenAI.netstandard2.0.cs +++ b/.dotnet/api/OpenAI.netstandard2.0.cs @@ -1131,11 +1131,13 @@ public class AssistantChatMessage : ChatMessage, IJsonModel parameter instead.")] public AssistantChatMessage(ChatFunctionCall functionCall); public AssistantChatMessage(params ChatMessageContentPart[] contentParts); + public AssistantChatMessage(ChatOutputAudioReference outputAudioReference); public AssistantChatMessage(IEnumerable contentParts); public AssistantChatMessage(IEnumerable toolCalls); public AssistantChatMessage(string content); [Obsolete("This property is obsolete. Please use ToolCalls instead.")] public ChatFunctionCall FunctionCall { get; set; } + public ChatOutputAudioReference OutputAudioReference { get; set; } public string ParticipantName { get; set; } public string Refusal { get; set; } public IList ToolCalls { get; } @@ -1146,6 +1148,13 @@ public class AssistantChatMessage : ChatMessage, IJsonModel, IPersistableModel { + public ChatAudioOptions(ChatOutputAudioVoice outputAudioVoice, ChatOutputAudioFormat outputAudioFormat); + public ChatOutputAudioFormat OutputAudioFormat { get; } + public ChatOutputAudioVoice OutputAudioVoice { get; } + public static explicit operator ChatAudioOptions(ClientResult result); + public static implicit operator BinaryContent(ChatAudioOptions chatAudioOptions); + } public class ChatClient { protected ChatClient(); protected internal ChatClient(ClientPipeline pipeline, string model, OpenAIClientOptions options); @@ -1175,6 +1184,7 @@ public class ChatCompletion : IJsonModel, IPersistableModel RefusalTokenLogProbabilities { get; } public ChatMessageRole Role { get; } @@ -1186,6 +1196,7 @@ public class ChatCompletion : IJsonModel, IPersistableModel, IPersistableModel { public bool? 
AllowParallelToolCalls { get; set; } + public ChatAudioOptions AudioOptions { get; set; } public string EndUserId { get; set; } public float? FrequencyPenalty { get; set; } [Obsolete("This property is obsolete. Please use ToolChoice instead.")] @@ -1198,6 +1209,7 @@ public class ChatCompletionOptions : IJsonModel, IPersist public IDictionary Metadata { get; } public float? PresencePenalty { get; set; } public ChatResponseFormat ResponseFormat { get; set; } + public ChatResponseModalities ResponseModalities { get; set; } public long? Seed { get; set; } public IList StopSequences { get; } public bool? StoredOutputEnabled { get; set; } @@ -1256,6 +1268,20 @@ public class ChatFunctionChoice : IJsonModel, IPersistableMo public static bool operator !=(ChatImageDetailLevel left, ChatImageDetailLevel right); public override readonly string ToString(); } + public readonly partial struct ChatInputAudioFormat : IEquatable { + public ChatInputAudioFormat(string value); + public static ChatInputAudioFormat Mp3 { get; } + public static ChatInputAudioFormat Wav { get; } + public readonly bool Equals(ChatInputAudioFormat other); + [EditorBrowsable(EditorBrowsableState.Never)] + public override readonly bool Equals(object obj); + [EditorBrowsable(EditorBrowsableState.Never)] + public override readonly int GetHashCode(); + public static bool operator ==(ChatInputAudioFormat left, ChatInputAudioFormat right); + public static implicit operator ChatInputAudioFormat(string value); + public static bool operator !=(ChatInputAudioFormat left, ChatInputAudioFormat right); + public override readonly string ToString(); + } public class ChatInputTokenUsageDetails : IJsonModel, IPersistableModel { public int AudioTokenCount { get; } public int CachedTokenCount { get; } @@ -1267,6 +1293,7 @@ public class ChatMessage : IJsonModel, IPersistableModel contentParts); public static AssistantChatMessage CreateAssistantMessage(IEnumerable toolCalls); public static AssistantChatMessage 
CreateAssistantMessage(string content); @@ -1296,11 +1323,14 @@ public class ChatMessageContentPart : IJsonModel, IPersi public string ImageBytesMediaType { get; } public ChatImageDetailLevel? ImageDetailLevel { get; } public Uri ImageUri { get; } + public BinaryData InputAudioBytes { get; } + public ChatInputAudioFormat? InputAudioFormat { get; } public ChatMessageContentPartKind Kind { get; } public string Refusal { get; } public string Text { get; } public static ChatMessageContentPart CreateImagePart(BinaryData imageBytes, string imageBytesMediaType, ChatImageDetailLevel? imageDetailLevel = null); public static ChatMessageContentPart CreateImagePart(Uri imageUri, ChatImageDetailLevel? imageDetailLevel = null); + public static ChatMessageContentPart CreateInputAudioPart(BinaryData inputAudioBytes, ChatInputAudioFormat inputAudioFormat); public static ChatMessageContentPart CreateRefusalPart(string refusal); public static ChatMessageContentPart CreateTextPart(string text); public static explicit operator ChatMessageContentPart(ClientResult result); @@ -1310,7 +1340,8 @@ public class ChatMessageContentPart : IJsonModel, IPersi public enum ChatMessageContentPartKind { Text = 0, Refusal = 1, - Image = 2 + Image = 2, + InputAudio = 3 } public enum ChatMessageRole { System = 0, @@ -1319,6 +1350,55 @@ public enum ChatMessageRole { Tool = 3, Function = 4 } + public class ChatOutputAudio : IJsonModel, IPersistableModel { + public BinaryData AudioBytes { get; } + public DateTimeOffset ExpiresAt { get; } + public string Id { get; } + public string Transcript { get; } + public static explicit operator ChatOutputAudio(ClientResult result); + public static implicit operator BinaryContent(ChatOutputAudio chatOutputAudio); + } + public readonly partial struct ChatOutputAudioFormat : IEquatable { + public ChatOutputAudioFormat(string value); + public static ChatOutputAudioFormat Flac { get; } + public static ChatOutputAudioFormat Mp3 { get; } + public static 
ChatOutputAudioFormat Opus { get; } + public static ChatOutputAudioFormat Pcm16 { get; } + public static ChatOutputAudioFormat Wav { get; } + public readonly bool Equals(ChatOutputAudioFormat other); + [EditorBrowsable(EditorBrowsableState.Never)] + public override readonly bool Equals(object obj); + [EditorBrowsable(EditorBrowsableState.Never)] + public override readonly int GetHashCode(); + public static bool operator ==(ChatOutputAudioFormat left, ChatOutputAudioFormat right); + public static implicit operator ChatOutputAudioFormat(string value); + public static bool operator !=(ChatOutputAudioFormat left, ChatOutputAudioFormat right); + public override readonly string ToString(); + } + public class ChatOutputAudioReference : IJsonModel, IPersistableModel { + public ChatOutputAudioReference(string id); + public string Id { get; } + public static explicit operator ChatOutputAudioReference(ClientResult result); + public static implicit operator BinaryContent(ChatOutputAudioReference chatOutputAudioReference); + } + public readonly partial struct ChatOutputAudioVoice : IEquatable { + public ChatOutputAudioVoice(string value); + public static ChatOutputAudioVoice Alloy { get; } + public static ChatOutputAudioVoice Echo { get; } + public static ChatOutputAudioVoice Fable { get; } + public static ChatOutputAudioVoice Nova { get; } + public static ChatOutputAudioVoice Onyx { get; } + public static ChatOutputAudioVoice Shimmer { get; } + public readonly bool Equals(ChatOutputAudioVoice other); + [EditorBrowsable(EditorBrowsableState.Never)] + public override readonly bool Equals(object obj); + [EditorBrowsable(EditorBrowsableState.Never)] + public override readonly int GetHashCode(); + public static bool operator ==(ChatOutputAudioVoice left, ChatOutputAudioVoice right); + public static implicit operator ChatOutputAudioVoice(string value); + public static bool operator !=(ChatOutputAudioVoice left, ChatOutputAudioVoice right); + public override readonly string 
ToString(); + } public class ChatOutputTokenUsageDetails : IJsonModel, IPersistableModel { public int AudioTokenCount { get; } public int ReasoningTokenCount { get; } @@ -1332,6 +1412,12 @@ public class ChatResponseFormat : IJsonModel, IPersistableMo public static explicit operator ChatResponseFormat(ClientResult result); public static implicit operator BinaryContent(ChatResponseFormat chatResponseFormat); } + [Flags] + public enum ChatResponseModalities { + Default = 0, + Text = 1, + Audio = 2 + } public class ChatTokenLogProbabilityDetails : IJsonModel, IPersistableModel { public float LogProbability { get; } public string Token { get; } @@ -1401,15 +1487,17 @@ public class FunctionChatMessage : ChatMessage, IJsonModel, protected override BinaryData PersistableModelWriteCore(ModelReaderWriterOptions options); } public static class OpenAIChatModelFactory { - public static ChatCompletion ChatCompletion(string id = null, ChatFinishReason finishReason = ChatFinishReason.Stop, ChatMessageContent content = null, string refusal = null, IEnumerable toolCalls = null, ChatMessageRole role = ChatMessageRole.System, ChatFunctionCall functionCall = null, IEnumerable contentTokenLogProbabilities = null, IEnumerable refusalTokenLogProbabilities = null, DateTimeOffset createdAt = default, string model = null, string systemFingerprint = null, ChatTokenUsage usage = null); + public static ChatCompletion ChatCompletion(string id = null, ChatFinishReason finishReason = ChatFinishReason.Stop, ChatMessageContent content = null, string refusal = null, IEnumerable toolCalls = null, ChatMessageRole role = ChatMessageRole.System, ChatFunctionCall functionCall = null, IEnumerable contentTokenLogProbabilities = null, IEnumerable refusalTokenLogProbabilities = null, DateTimeOffset createdAt = default, string model = null, string systemFingerprint = null, ChatTokenUsage usage = null, ChatOutputAudio outputAudio = null); public static ChatInputTokenUsageDetails ChatInputTokenUsageDetails(int 
audioTokenCount = 0, int cachedTokenCount = 0); + public static ChatOutputAudio ChatOutputAudio(BinaryData audioBytes, string id = null, string transcript = null, DateTimeOffset expiresAt = default); public static ChatOutputTokenUsageDetails ChatOutputTokenUsageDetails(int reasoningTokenCount = 0, int audioTokenCount = 0); public static ChatTokenLogProbabilityDetails ChatTokenLogProbabilityDetails(string token = null, float logProbability = 0, ReadOnlyMemory? utf8Bytes = null, IEnumerable topLogProbabilities = null); public static ChatTokenTopLogProbabilityDetails ChatTokenTopLogProbabilityDetails(string token = null, float logProbability = 0, ReadOnlyMemory? utf8Bytes = null); public static ChatTokenUsage ChatTokenUsage(int outputTokenCount = 0, int inputTokenCount = 0, int totalTokenCount = 0, ChatOutputTokenUsageDetails outputTokenDetails = null, ChatInputTokenUsageDetails inputTokenDetails = null); - public static StreamingChatCompletionUpdate StreamingChatCompletionUpdate(string completionId = null, ChatMessageContent contentUpdate = null, StreamingChatFunctionCallUpdate functionCallUpdate = null, IEnumerable toolCallUpdates = null, ChatMessageRole? role = null, string refusalUpdate = null, IEnumerable contentTokenLogProbabilities = null, IEnumerable refusalTokenLogProbabilities = null, ChatFinishReason? finishReason = null, DateTimeOffset createdAt = default, string model = null, string systemFingerprint = null, ChatTokenUsage usage = null); + public static StreamingChatCompletionUpdate StreamingChatCompletionUpdate(string completionId = null, ChatMessageContent contentUpdate = null, StreamingChatFunctionCallUpdate functionCallUpdate = null, IEnumerable toolCallUpdates = null, ChatMessageRole? role = null, string refusalUpdate = null, IEnumerable contentTokenLogProbabilities = null, IEnumerable refusalTokenLogProbabilities = null, ChatFinishReason? 
finishReason = null, DateTimeOffset createdAt = default, string model = null, string systemFingerprint = null, ChatTokenUsage usage = null, StreamingChatOutputAudioUpdate outputAudioUpdate = null); [Obsolete("This class is obsolete. Please use StreamingChatToolCallUpdate instead.")] public static StreamingChatFunctionCallUpdate StreamingChatFunctionCallUpdate(string functionName = null, BinaryData functionArgumentsUpdate = null); + public static StreamingChatOutputAudioUpdate StreamingChatOutputAudioUpdate(string id = null, DateTimeOffset? expiresAt = null, string transcriptUpdate = null, BinaryData audioBytesUpdate = null); public static StreamingChatToolCallUpdate StreamingChatToolCallUpdate(int index = 0, string toolCallId = null, ChatToolCallKind kind = ChatToolCallKind.Function, string functionName = null, BinaryData functionArgumentsUpdate = null); } public class StreamingChatCompletionUpdate : IJsonModel, IPersistableModel { @@ -1421,6 +1509,7 @@ public class StreamingChatCompletionUpdate : IJsonModel RefusalTokenLogProbabilities { get; } public string RefusalUpdate { get; } public ChatMessageRole? Role { get; } @@ -1437,6 +1526,14 @@ public class StreamingChatFunctionCallUpdate : IJsonModel, IPersistableModel { + public BinaryData AudioBytesUpdate { get; } + public DateTimeOffset? 
ExpiresAt { get; } + public string Id { get; } + public string TranscriptUpdate { get; } + public static explicit operator StreamingChatOutputAudioUpdate(ClientResult result); + public static implicit operator BinaryContent(StreamingChatOutputAudioUpdate streamingChatOutputAudioUpdate); + } public class StreamingChatToolCallUpdate : IJsonModel, IPersistableModel { public BinaryData FunctionArgumentsUpdate { get; } public string FunctionName { get; } @@ -1945,6 +2042,7 @@ namespace OpenAI.RealtimeConversation { } [Flags] public enum ConversationContentModalities { + Default = 0, Text = 1, Audio = 2 } diff --git a/.dotnet/examples/Chat/Example09_ChatWithAudio.cs b/.dotnet/examples/Chat/Example09_ChatWithAudio.cs new file mode 100644 index 000000000..b18ad79e2 --- /dev/null +++ b/.dotnet/examples/Chat/Example09_ChatWithAudio.cs @@ -0,0 +1,63 @@ +using NUnit.Framework; +using OpenAI.Chat; +using System; +using System.Collections.Generic; +using System.IO; + +namespace OpenAI.Examples; + +public partial class ChatExamples +{ + [Test] + public void Example09_ChatWithAudio() + { + // Chat audio input and output is only supported on specific models, beginning with gpt-4o-audio-preview + ChatClient client = new("gpt-4o-audio-preview", Environment.GetEnvironmentVariable("OPENAI_API_KEY")); + + // Input audio is provided to a request by adding an audio content part to a user message + string audioFilePath = Path.Combine("Assets", "realtime_whats_the_weather_pcm16_24khz_mono.wav"); + byte[] audioFileRawBytes = File.ReadAllBytes(audioFilePath); + BinaryData audioData = BinaryData.FromBytes(audioFileRawBytes); + List messages = + [ + new UserChatMessage(ChatMessageContentPart.CreateInputAudioPart(audioData, ChatInputAudioFormat.Wav)), + ]; + + // Output audio is requested by configuring ChatCompletionOptions to include the appropriate + // ResponseModalities values and corresponding AudioOptions. 
+ ChatCompletionOptions options = new() + { + ResponseModalities = ChatResponseModalities.Text | ChatResponseModalities.Audio, + AudioOptions = new(ChatOutputAudioVoice.Alloy, ChatOutputAudioFormat.Mp3), + }; + + ChatCompletion completion = client.CompleteChat(messages, options); + + void PrintAudioContent() + { + if (completion.OutputAudio is ChatOutputAudio outputAudio) + { + Console.WriteLine($"Response audio transcript: {outputAudio.Transcript}"); + string outputFilePath = $"{outputAudio.Id}.mp3"; + using (FileStream outputFileStream = File.OpenWrite(outputFilePath)) + { + outputFileStream.Write(outputAudio.AudioBytes); + } + Console.WriteLine($"Response audio written to file: {outputFilePath}"); + Console.WriteLine($"Valid on followup requests until: {outputAudio.ExpiresAt}"); + } + } + + PrintAudioContent(); + + // To refer to past audio output, create an assistant message from the earlier ChatCompletion, or pass a + // ChatOutputAudioReference built from the earlier audio Id to the AssistantChatMessage constructor. 
+ + messages.Add(new AssistantChatMessage(completion)); + messages.Add("Can you say that like a pirate?"); + + completion = client.CompleteChat(messages, options); + + PrintAudioContent(); + } +} diff --git a/.dotnet/examples/Chat/Example09_ChatWithAudioAsync.cs b/.dotnet/examples/Chat/Example09_ChatWithAudioAsync.cs new file mode 100644 index 000000000..529c0c604 --- /dev/null +++ b/.dotnet/examples/Chat/Example09_ChatWithAudioAsync.cs @@ -0,0 +1,64 @@ +using NUnit.Framework; +using OpenAI.Chat; +using System; +using System.Collections.Generic; +using System.IO; +using System.Threading.Tasks; + +namespace OpenAI.Examples; + +public partial class ChatExamples +{ + [Test] + public async Task Example09_ChatWithAudioAsync() + { + // Chat audio input and output is only supported on specific models, beginning with gpt-4o-audio-preview + ChatClient client = new("gpt-4o-audio-preview", Environment.GetEnvironmentVariable("OPENAI_API_KEY")); + + // Input audio is provided to a request by adding an audio content part to a user message + string audioFilePath = Path.Combine("Assets", "realtime_whats_the_weather_pcm16_24khz_mono.wav"); + byte[] audioFileRawBytes = await File.ReadAllBytesAsync(audioFilePath); + BinaryData audioData = BinaryData.FromBytes(audioFileRawBytes); + List messages = + [ + new UserChatMessage(ChatMessageContentPart.CreateInputAudioPart(audioData, ChatInputAudioFormat.Wav)), + ]; + + // Output audio is requested by configuring ChatCompletionOptions to include the appropriate + // ResponseModalities values and corresponding AudioOptions. 
+ ChatCompletionOptions options = new() + { + ResponseModalities = ChatResponseModalities.Text | ChatResponseModalities.Audio, + AudioOptions = new(ChatOutputAudioVoice.Alloy, ChatOutputAudioFormat.Mp3), + }; + + ChatCompletion completion = await client.CompleteChatAsync(messages, options); + + async Task PrintAudioContentAsync() + { + if (completion.OutputAudio is ChatOutputAudio outputAudio) + { + Console.WriteLine($"Response audio transcript: {outputAudio.Transcript}"); + string outputFilePath = $"{outputAudio.Id}.mp3"; + using (FileStream outputFileStream = File.OpenWrite(outputFilePath)) + { + await outputFileStream.WriteAsync(outputAudio.AudioBytes); + } + Console.WriteLine($"Response audio written to file: {outputFilePath}"); + Console.WriteLine($"Valid on followup requests until: {outputAudio.ExpiresAt}"); + } + } + + await PrintAudioContentAsync(); + + // To refer to past audio output, create an assistant message from the earlier ChatCompletion, or pass a + // ChatOutputAudioReference built from the earlier audio Id to the AssistantChatMessage constructor. 
+ + messages.Add(new AssistantChatMessage(completion)); + messages.Add("Can you say that like a pirate?"); + + completion = await client.CompleteChatAsync(messages, options); + + await PrintAudioContentAsync(); + } +} diff --git a/.dotnet/src/Custom/Chat/AssistantChatMessage.Serialization.cs b/.dotnet/src/Custom/Chat/AssistantChatMessage.Serialization.cs index d5fac5775..f12246a67 100644 --- a/.dotnet/src/Custom/Chat/AssistantChatMessage.Serialization.cs +++ b/.dotnet/src/Custom/Chat/AssistantChatMessage.Serialization.cs @@ -44,6 +44,7 @@ internal override void WriteCore(Utf8JsonWriter writer, ModelReaderWriterOptions writer.WriteOptionalProperty("name"u8, ParticipantName, options); writer.WriteOptionalCollection("tool_calls"u8, ToolCalls, options); writer.WriteOptionalProperty("function_call"u8, FunctionCall, options); + writer.WriteOptionalProperty("audio"u8, OutputAudioReference, options); writer.WriteSerializedAdditionalRawData(_additionalBinaryDataProperties, options); writer.WriteEndObject(); } diff --git a/.dotnet/src/Custom/Chat/AssistantChatMessage.cs b/.dotnet/src/Custom/Chat/AssistantChatMessage.cs index db2b80de3..99d83eec3 100644 --- a/.dotnet/src/Custom/Chat/AssistantChatMessage.cs +++ b/.dotnet/src/Custom/Chat/AssistantChatMessage.cs @@ -1,5 +1,6 @@ using System; using System.Collections.Generic; +using System.Linq; namespace OpenAI.Chat; @@ -83,6 +84,18 @@ public AssistantChatMessage(ChatFunctionCall functionCall) FunctionCall = functionCall; } + /// + /// Creates a new instance of that represents a prior response from the model + /// that included audio with a correlation ID. + /// + /// The audio reference with an id, produced by the model. + public AssistantChatMessage(ChatOutputAudioReference outputAudioReference) + { + Argument.AssertNotNull(outputAudioReference, nameof(outputAudioReference)); + + OutputAudioReference = outputAudioReference; + } + /// /// Creates a new instance of from a with /// an assistant role response. 
@@ -109,6 +122,10 @@ public AssistantChatMessage(ChatCompletion chatCompletion) Refusal = chatCompletion.Refusal; FunctionCall = chatCompletion.FunctionCall; + if (chatCompletion.OutputAudio is not null) + { + OutputAudioReference = new(chatCompletion.OutputAudio.Id); + } foreach (ChatToolCall toolCall in chatCompletion.ToolCalls ?? []) { ToolCalls.Add(toolCall); @@ -129,4 +146,8 @@ public AssistantChatMessage(ChatCompletion chatCompletion) [Obsolete($"This property is obsolete. Please use {nameof(ToolCalls)} instead.")] public ChatFunctionCall FunctionCall { get; set; } + + // CUSTOM: Renamed. + [CodeGenMember("Audio")] + public ChatOutputAudioReference OutputAudioReference { get; set; } } \ No newline at end of file diff --git a/.dotnet/src/Custom/Chat/ChatAudioOptions.cs b/.dotnet/src/Custom/Chat/ChatAudioOptions.cs new file mode 100644 index 000000000..03b4f6a51 --- /dev/null +++ b/.dotnet/src/Custom/Chat/ChatAudioOptions.cs @@ -0,0 +1,30 @@ +using System; +using System.Collections.Generic; +using System.Diagnostics.CodeAnalysis; + +namespace OpenAI.Chat; + +/// +/// Represents the configuration details for output audio requested in a chat completion request. +/// +/// +/// When provided to a instance's property, +/// the request's specified content modalities will be automatically updated to reflect desired audio output. +/// +[CodeGenModel("CreateChatCompletionRequestAudio")] +public partial class ChatAudioOptions +{ + // CUSTOM: Renamed. + /// + /// Gets or sets the voice model that the response should use to synthesize audio. + /// + [CodeGenMember("Voice")] + public ChatOutputAudioVoice OutputAudioVoice { get; } + + // CUSTOM: Renamed. + /// + /// Specifies the output format desired for synthesized audio. 
+ /// + [CodeGenMember("Format")] + public ChatOutputAudioFormat OutputAudioFormat { get; } +} diff --git a/.dotnet/src/Custom/Chat/ChatCompletion.cs b/.dotnet/src/Custom/Chat/ChatCompletion.cs index 03f9ea3f0..bb57e3c0f 100644 --- a/.dotnet/src/Custom/Chat/ChatCompletion.cs +++ b/.dotnet/src/Custom/Chat/ChatCompletion.cs @@ -84,4 +84,7 @@ public partial class ChatCompletion // CUSTOM: Flattened choice message property. [Obsolete($"This property is obsolete. Please use {nameof(ToolCalls)} instead.")] public ChatFunctionCall FunctionCall => Choices[0].Message.FunctionCall; + + /// The audio response generated by the model. + public ChatOutputAudio OutputAudio => Choices[0].Message.Audio; } diff --git a/.dotnet/src/Custom/Chat/ChatCompletionOptions.cs b/.dotnet/src/Custom/Chat/ChatCompletionOptions.cs index d5b3b4545..a78f659d0 100644 --- a/.dotnet/src/Custom/Chat/ChatCompletionOptions.cs +++ b/.dotnet/src/Custom/Chat/ChatCompletionOptions.cs @@ -176,4 +176,38 @@ public ChatCompletionOptions() /// [CodeGenMember("Store")] public bool? StoredOutputEnabled { get; set; } + + // CUSTOM: Made internal for automatic enablement via audio options. + [CodeGenMember("Modalities")] + private IList _internalModalities = new ChangeTrackingList(); + + /// + /// Specifies the content types that the model should generate in its responses. + /// + /// + /// Most models can generate text and the default ["text"] value, from , requests this. + /// Some models like gpt-4o-audio-preview can also generate audio, and this can be requested by combining ["text","audio"] via + /// the flags | . + /// + public ChatResponseModalities ResponseModalities + { + get => ChatResponseModalitiesExtensions.FromInternalModalities(_internalModalities); + set => _internalModalities = value.ToInternalModalities(); + } + + // CUSTOM: supplemented with custom setter to internally enable audio output via modalities. 
+ [CodeGenMember("Audio")] + private ChatAudioOptions _audioOptions; + + public ChatAudioOptions AudioOptions + { + get => _audioOptions; + set + { + _audioOptions = value; + _internalModalities = value is null + ? new ChangeTrackingList() + : [InternalCreateChatCompletionRequestModality.Text, InternalCreateChatCompletionRequestModality.Audio]; + } + } } diff --git a/.dotnet/src/Custom/Chat/ChatInputAudioFormat.cs b/.dotnet/src/Custom/Chat/ChatInputAudioFormat.cs new file mode 100644 index 000000000..cd32981ed --- /dev/null +++ b/.dotnet/src/Custom/Chat/ChatInputAudioFormat.cs @@ -0,0 +1,11 @@ +using System; +using System.Collections.Generic; +using System.Diagnostics.CodeAnalysis; + +namespace OpenAI.Chat; + +[CodeGenModel("ChatCompletionRequestMessageContentPartAudioInputAudioFormat")] +public readonly partial struct ChatInputAudioFormat +{ + +} diff --git a/.dotnet/src/Custom/Chat/ChatMessage.cs b/.dotnet/src/Custom/Chat/ChatMessage.cs index 78dcab46a..9b276c299 100644 --- a/.dotnet/src/Custom/Chat/ChatMessage.cs +++ b/.dotnet/src/Custom/Chat/ChatMessage.cs @@ -56,6 +56,11 @@ namespace OpenAI.Chat; [CodeGenSerialization(nameof(Content), SerializationValueHook = nameof(SerializeContentValue), DeserializationValueHook = nameof(DeserializeContentValue))] public partial class ChatMessage { + /// + /// The content associated with the message. The interpretation of this content will vary depending on the message type. + /// + public ChatMessageContent Content { get; } = new ChatMessageContent(); + // CUSTOM: Changed type from string to ChatMessageRole. [CodeGenMember("Role")] internal ChatMessageRole Role { get; set; } @@ -89,11 +94,6 @@ internal ChatMessage(ChatMessageRole role, string content = null) : this(role) } } - /// - /// The content associated with the message. The interpretation of this content will vary depending on the message type. 
- /// - public ChatMessageContent Content { get; } = new ChatMessageContent(); - #region SystemChatMessage /// public static SystemChatMessage CreateSystemMessage(string content) => new(content); @@ -134,6 +134,10 @@ internal ChatMessage(ChatMessageRole role, string content = null) : this(role) /// public static AssistantChatMessage CreateAssistantMessage(ChatCompletion chatCompletion) => new(chatCompletion); + + /// + public static AssistantChatMessage CreateAssistantMessage(ChatOutputAudioReference outputAudioReference) => new(outputAudioReference); + #endregion #region ToolChatMessage diff --git a/.dotnet/src/Custom/Chat/ChatMessageContentPart.Serialization.cs b/.dotnet/src/Custom/Chat/ChatMessageContentPart.Serialization.cs index eb2277bf6..20faaf3f5 100644 --- a/.dotnet/src/Custom/Chat/ChatMessageContentPart.Serialization.cs +++ b/.dotnet/src/Custom/Chat/ChatMessageContentPart.Serialization.cs @@ -33,6 +33,11 @@ internal static void WriteCoreContentPart(ChatMessageContentPart instance, Utf8J writer.WritePropertyName("image_url"u8); writer.WriteObjectValue(instance._imageUri, options); } + else if (instance._kind == ChatMessageContentPartKind.InputAudio) + { + writer.WritePropertyName("input_audio"u8); + writer.WriteObjectValue(instance._inputAudio, options); + } writer.WriteSerializedAdditionalRawData(instance._additionalBinaryDataProperties, options); writer.WriteEndObject(); } @@ -50,6 +55,7 @@ internal static ChatMessageContentPart DeserializeChatMessageContentPart(JsonEle string text = default; string refusal = default; InternalChatCompletionRequestMessageContentPartImageImageUrl imageUri = default; + InternalChatCompletionRequestMessageContentPartAudioInputAudio inputAudio = default; IDictionary serializedAdditionalRawData = default; Dictionary rawDataDictionary = new Dictionary(); foreach (var property in element.EnumerateObject()) @@ -74,12 +80,18 @@ internal static ChatMessageContentPart DeserializeChatMessageContentPart(JsonEle refusal = 
property.Value.GetString(); continue; } + if (property.NameEquals("input_audio"u8)) + { + inputAudio = InternalChatCompletionRequestMessageContentPartAudioInputAudio + .DeserializeInternalChatCompletionRequestMessageContentPartAudioInputAudio(property.Value, options); + continue; + } if (true) { rawDataDictionary.Add(property.Name, BinaryData.FromString(property.Value.GetRawText())); } } serializedAdditionalRawData = rawDataDictionary; - return new ChatMessageContentPart(kind, text, imageUri, refusal, serializedAdditionalRawData); + return new ChatMessageContentPart(kind, text, imageUri, refusal, inputAudio, serializedAdditionalRawData); } } diff --git a/.dotnet/src/Custom/Chat/ChatMessageContentPart.cs b/.dotnet/src/Custom/Chat/ChatMessageContentPart.cs index f170d8b83..1d9607808 100644 --- a/.dotnet/src/Custom/Chat/ChatMessageContentPart.cs +++ b/.dotnet/src/Custom/Chat/ChatMessageContentPart.cs @@ -19,6 +19,10 @@ namespace OpenAI.Chat; /// Call to create a that /// encapsulates a refusal coming from the model. /// +/// +/// Call to create a content part +/// encapsulating input audio for user role messages. +/// /// /// [CodeGenModel("ChatMessageContentPart")] @@ -28,6 +32,7 @@ public partial class ChatMessageContentPart private readonly ChatMessageContentPartKind _kind; private readonly string _text; private readonly InternalChatCompletionRequestMessageContentPartImageImageUrl _imageUri; + private readonly InternalChatCompletionRequestMessageContentPartAudioInputAudio _inputAudio; private readonly string _refusal; // CUSTOM: Made internal. @@ -36,12 +41,19 @@ internal ChatMessageContentPart() } // CUSTOM: Added to support deserialization. 
- internal ChatMessageContentPart(ChatMessageContentPartKind kind, string text, InternalChatCompletionRequestMessageContentPartImageImageUrl imageUri, string refusal, IDictionary serializedAdditionalRawData) + internal ChatMessageContentPart( + ChatMessageContentPartKind kind, + string text = default, + InternalChatCompletionRequestMessageContentPartImageImageUrl imageUri = default, + string refusal = default, + InternalChatCompletionRequestMessageContentPartAudioInputAudio inputAudio = default, + IDictionary serializedAdditionalRawData = default) { _kind = kind; _text = text; _imageUri = imageUri; _refusal = refusal; + _inputAudio = inputAudio; _additionalBinaryDataProperties = serializedAdditionalRawData; } @@ -68,6 +80,24 @@ internal ChatMessageContentPart(ChatMessageContentPartKind kind, string text, In /// Present when is . public string ImageBytesMediaType => _imageUri?.ImageBytesMediaType; + /// + /// The encoded binary audio payload associated with the content part. + /// + /// + /// Present when is . The content part + /// represents user role audio input. + /// + public BinaryData InputAudioBytes => _inputAudio?.Data; + + /// + /// The encoding format that the audio data provided in should be interpreted with. + /// + /// + /// Present when is . The content part + /// represents user role audio input. + /// + public ChatInputAudioFormat? InputAudioFormat => _inputAudio?.Format; + // CUSTOM: Spread. /// /// The level of detail with which the model should process the image and generate its textual understanding of @@ -88,12 +118,7 @@ public static ChatMessageContentPart CreateTextPart(string text) { Argument.AssertNotNull(text, nameof(text)); - return new ChatMessageContentPart( - kind: ChatMessageContentPartKind.Text, - text: text, - imageUri: null, - refusal: null, - serializedAdditionalRawData: null); + return new ChatMessageContentPart(ChatMessageContentPartKind.Text, text: text); } /// Creates a new that encapsulates an image. 
@@ -109,10 +134,7 @@ public static ChatMessageContentPart CreateImagePart(Uri imageUri, ChatImageDeta return new ChatMessageContentPart( kind: ChatMessageContentPartKind.Image, - text: null, - imageUri: new(imageUri) { Detail = imageDetailLevel }, - refusal: null, - serializedAdditionalRawData: null); + imageUri: new(imageUri) { Detail = imageDetailLevel }); } /// Creates a new that encapsulates an image. @@ -131,10 +153,7 @@ public static ChatMessageContentPart CreateImagePart(BinaryData imageBytes, stri return new ChatMessageContentPart( kind: ChatMessageContentPartKind.Image, - text: null, - imageUri: new(imageBytes, imageBytesMediaType) { Detail = imageDetailLevel }, - refusal: null, - serializedAdditionalRawData: null); + imageUri: new(imageBytes, imageBytesMediaType) { Detail = imageDetailLevel }); } /// Creates a new that encapsulates a refusal coming from the model. @@ -146,10 +165,23 @@ public static ChatMessageContentPart CreateRefusalPart(string refusal) return new ChatMessageContentPart( kind: ChatMessageContentPartKind.Refusal, - text: null, - imageUri: null, - refusal: refusal, - serializedAdditionalRawData: null); + refusal: refusal); + } + + /// Creates a new that encapsulates user role input audio in a known format. + /// + /// Binary audio content parts may only be used with instances to represent user audio input. When referring to + /// past audio output from the model, use instead. + /// + /// The audio data. + /// The format of the audio data. 
+ public static ChatMessageContentPart CreateInputAudioPart(BinaryData inputAudioBytes, ChatInputAudioFormat inputAudioFormat) + { + Argument.AssertNotNull(inputAudioBytes, nameof(inputAudioBytes)); + + return new ChatMessageContentPart( + kind: ChatMessageContentPartKind.InputAudio, + inputAudio: new(inputAudioBytes, inputAudioFormat)); } /// diff --git a/.dotnet/src/Custom/Chat/ChatMessageContentPartKind.Serialization.cs b/.dotnet/src/Custom/Chat/ChatMessageContentPartKind.Serialization.cs index 2ef688cd0..eae666c74 100644 --- a/.dotnet/src/Custom/Chat/ChatMessageContentPartKind.Serialization.cs +++ b/.dotnet/src/Custom/Chat/ChatMessageContentPartKind.Serialization.cs @@ -13,6 +13,7 @@ internal static partial class ChatMessageContentPartKindExtensions ChatMessageContentPartKind.Text => "text", ChatMessageContentPartKind.Refusal => "refusal", ChatMessageContentPartKind.Image => "image_url", + ChatMessageContentPartKind.InputAudio => "input_audio", _ => throw new ArgumentOutOfRangeException(nameof(value), value, "Unknown ChatMessageContentPartKind value.") }; @@ -21,6 +22,7 @@ public static ChatMessageContentPartKind ToChatMessageContentPartKind(this strin if (StringComparer.OrdinalIgnoreCase.Equals(value, "text")) return ChatMessageContentPartKind.Text; if (StringComparer.OrdinalIgnoreCase.Equals(value, "refusal")) return ChatMessageContentPartKind.Refusal; if (StringComparer.OrdinalIgnoreCase.Equals(value, "image_url")) return ChatMessageContentPartKind.Image; + if (StringComparer.OrdinalIgnoreCase.Equals(value, "input_audio")) return ChatMessageContentPartKind.InputAudio; throw new ArgumentOutOfRangeException(nameof(value), value, "Unknown ChatMessageContentPartKind value."); } } diff --git a/.dotnet/src/Custom/Chat/ChatMessageContentPartKind.cs b/.dotnet/src/Custom/Chat/ChatMessageContentPartKind.cs index a6cc4f5e4..c371039b4 100644 --- a/.dotnet/src/Custom/Chat/ChatMessageContentPartKind.cs +++ b/.dotnet/src/Custom/Chat/ChatMessageContentPartKind.cs @@ -9,5 
+9,7 @@ public enum ChatMessageContentPartKind Refusal, - Image + Image, + + InputAudio, } \ No newline at end of file diff --git a/.dotnet/src/Custom/Chat/ChatOutputAudio.cs b/.dotnet/src/Custom/Chat/ChatOutputAudio.cs new file mode 100644 index 000000000..1c151bc93 --- /dev/null +++ b/.dotnet/src/Custom/Chat/ChatOutputAudio.cs @@ -0,0 +1,16 @@ +using System; +using System.Collections.Generic; +using System.Text.RegularExpressions; + +namespace OpenAI.Chat; + +/// +/// Represents the audio output generated by the model as part of a chat completion response. +/// +[CodeGenModel("ChatCompletionResponseMessageAudio")] +public partial class ChatOutputAudio +{ + // CUSTOM: Renamed. + [CodeGenMember("Data")] + public BinaryData AudioBytes { get; } +} \ No newline at end of file diff --git a/.dotnet/src/Custom/Chat/ChatOutputAudioFormat.cs b/.dotnet/src/Custom/Chat/ChatOutputAudioFormat.cs new file mode 100644 index 000000000..12cb5e909 --- /dev/null +++ b/.dotnet/src/Custom/Chat/ChatOutputAudioFormat.cs @@ -0,0 +1,15 @@ +using System; +using System.Collections.Generic; +using System.Diagnostics.CodeAnalysis; + +namespace OpenAI.Chat; + +/// +/// Specifies the audio format the model should use when generating output audio as part of a chat completion +/// response. +/// +[CodeGenModel("CreateChatCompletionRequestAudioFormat")] +public readonly partial struct ChatOutputAudioFormat +{ + +} diff --git a/.dotnet/src/Custom/Chat/ChatOutputAudioReference.cs b/.dotnet/src/Custom/Chat/ChatOutputAudioReference.cs new file mode 100644 index 000000000..2991ca107 --- /dev/null +++ b/.dotnet/src/Custom/Chat/ChatOutputAudioReference.cs @@ -0,0 +1,20 @@ +using System; +using System.Collections.Generic; +using System.Text.RegularExpressions; + +namespace OpenAI.Chat; + +/// +/// Represents an ID-based reference to a past audio output as received from a prior chat completion response, as +/// provided when creating an instance for use in a conversation history. 
+/// +/// +/// This value is obtained from the or +/// properties for streaming and non-streaming +/// responses, respectively. The constructor overload can also be +/// used to automatically populate the appropriate properties from a instance. +/// +[CodeGenModel("ChatCompletionRequestAssistantMessageAudio")] +public partial class ChatOutputAudioReference +{ +} \ No newline at end of file diff --git a/.dotnet/src/Custom/Chat/ChatOutputAudioVoice.cs b/.dotnet/src/Custom/Chat/ChatOutputAudioVoice.cs new file mode 100644 index 000000000..4d8fc3bd1 --- /dev/null +++ b/.dotnet/src/Custom/Chat/ChatOutputAudioVoice.cs @@ -0,0 +1,14 @@ +using System; +using System.Collections.Generic; +using System.Diagnostics.CodeAnalysis; + +namespace OpenAI.Chat; + +/// +/// Specifies the available voices that the model can use when generating output audio as part of a chat completion. +/// +[CodeGenModel("CreateChatCompletionRequestAudioVoice")] +public readonly partial struct ChatOutputAudioVoice +{ + +} diff --git a/.dotnet/src/Custom/Chat/ChatResponseModalities.Serialization.cs b/.dotnet/src/Custom/Chat/ChatResponseModalities.Serialization.cs new file mode 100644 index 000000000..252d9b7ab --- /dev/null +++ b/.dotnet/src/Custom/Chat/ChatResponseModalities.Serialization.cs @@ -0,0 +1,37 @@ +using System.Collections.Generic; + +namespace OpenAI.Chat; + +internal static partial class ChatResponseModalitiesExtensions +{ + internal static IList ToInternalModalities(this ChatResponseModalities modalities) + { + ChangeTrackingList internalModalities = new(); + if (modalities.HasFlag(ChatResponseModalities.Text)) + { + internalModalities.Add(InternalCreateChatCompletionRequestModality.Text); + } + if (modalities.HasFlag(ChatResponseModalities.Audio)) + { + internalModalities.Add(InternalCreateChatCompletionRequestModality.Audio); + } + return internalModalities; + } + + internal static ChatResponseModalities FromInternalModalities(IEnumerable internalModalities) + { + 
ChatResponseModalities result = 0; + foreach (InternalCreateChatCompletionRequestModality internalModality in internalModalities ?? []) + { + if (internalModality == InternalCreateChatCompletionRequestModality.Text) + { + result |= ChatResponseModalities.Text; + } + else if (internalModality == InternalCreateChatCompletionRequestModality.Audio) + { + result |= ChatResponseModalities.Audio; + } + } + return result; + } +} \ No newline at end of file diff --git a/.dotnet/src/Custom/Chat/ChatResponseModalities.cs b/.dotnet/src/Custom/Chat/ChatResponseModalities.cs new file mode 100644 index 000000000..d6bb276da --- /dev/null +++ b/.dotnet/src/Custom/Chat/ChatResponseModalities.cs @@ -0,0 +1,30 @@ +using System; + +namespace OpenAI.Chat; + +/// +/// Specifies the types of output content the model should generate for a chat completion request. +/// +/// +/// Most models can generate text, which is the default . Some models like +/// gpt-4o-audio-preview can also generate audio, which can be requested by combining +/// +/// | . +/// +/// +[Flags] +public enum ChatResponseModalities : int +{ + /// + /// The value which specifies that the model should produce its default set of output content modalities. + /// + Default = 0, + /// + /// The flag that, if included, specifies that the model should generate text content in its response. + /// + Text = 1 << 0, + /// + /// The flag that, if included, specifies that the model should generate audio content in its response. 
+ /// + Audio = 1 << 1, +} \ No newline at end of file diff --git a/.dotnet/src/Custom/Chat/Internal/GeneratorStubs.cs b/.dotnet/src/Custom/Chat/Internal/GeneratorStubs.cs index 660bc6d7e..f9ab32374 100644 --- a/.dotnet/src/Custom/Chat/Internal/GeneratorStubs.cs +++ b/.dotnet/src/Custom/Chat/Internal/GeneratorStubs.cs @@ -94,4 +94,16 @@ internal readonly partial struct InternalCreateChatCompletionStreamResponseServi internal partial class InternalCreateChatCompletionStreamResponseUsage { } [CodeGenModel("FunctionParameters")] -internal partial class InternalFunctionParameters { } \ No newline at end of file +internal partial class InternalFunctionParameters { } + +[CodeGenModel("CreateChatCompletionRequestModality")] +internal readonly partial struct InternalCreateChatCompletionRequestModality { } + +[CodeGenModel("ChatCompletionRequestMessageContentPartAudioType")] +internal readonly partial struct InternalChatCompletionRequestMessageContentPartAudioType { } + +[CodeGenModel("ChatCompletionRequestMessageContentPartAudio")] +internal partial class InternalChatCompletionRequestMessageContentPartAudio { } + +[CodeGenModel("ChatCompletionRequestMessageContentPartAudioInputAudio")] +internal partial class InternalChatCompletionRequestMessageContentPartAudioInputAudio { } diff --git a/.dotnet/src/Custom/Chat/OpenAIChatModelFactory.cs b/.dotnet/src/Custom/Chat/OpenAIChatModelFactory.cs index f4503f98e..d31de831b 100644 --- a/.dotnet/src/Custom/Chat/OpenAIChatModelFactory.cs +++ b/.dotnet/src/Custom/Chat/OpenAIChatModelFactory.cs @@ -22,7 +22,8 @@ public static ChatCompletion ChatCompletion( DateTimeOffset createdAt = default, string model = null, string systemFingerprint = null, - ChatTokenUsage usage = null) + ChatTokenUsage usage = null, + ChatOutputAudio outputAudio = null) { content ??= new ChatMessageContent(); toolCalls ??= new List(); @@ -32,6 +33,7 @@ public static ChatCompletion ChatCompletion( InternalChatCompletionResponseMessage message = new 
InternalChatCompletionResponseMessage( refusal, toolCalls.ToList(), + outputAudio, role, content, functionCall, @@ -63,6 +65,7 @@ public static ChatCompletion ChatCompletion( additionalBinaryDataProperties: null); } + /// Initializes a new instance of . /// A new instance for mocking. public static ChatTokenLogProbabilityDetails ChatTokenLogProbabilityDetails(string token = null, float logProbability = default, ReadOnlyMemory? utf8Bytes = null, IEnumerable topLogProbabilities = null) @@ -118,6 +121,16 @@ public static ChatOutputTokenUsageDetails ChatOutputTokenUsageDetails(int reason return new ChatOutputTokenUsageDetails(reasoningTokenCount, audioTokenCount, additionalBinaryDataProperties: null); } + public static ChatOutputAudio ChatOutputAudio(BinaryData audioBytes, string id = null, string transcript = null, DateTimeOffset expiresAt = default) + { + return new ChatOutputAudio( + id, + expiresAt, + transcript, + audioBytes, + additionalBinaryDataProperties: null); + } + /// Initializes a new instance of . /// A new instance for mocking. 
public static StreamingChatCompletionUpdate StreamingChatCompletionUpdate( @@ -133,7 +146,8 @@ public static StreamingChatCompletionUpdate StreamingChatCompletionUpdate( DateTimeOffset createdAt = default, string model = null, string systemFingerprint = null, - ChatTokenUsage usage = null) + ChatTokenUsage usage = null, + StreamingChatOutputAudioUpdate outputAudioUpdate = null) { contentUpdate ??= new ChatMessageContent(); toolCallUpdates ??= new List(); @@ -141,6 +155,7 @@ public static StreamingChatCompletionUpdate StreamingChatCompletionUpdate( refusalTokenLogProbabilities ??= new List(); InternalChatCompletionStreamResponseDelta delta = new InternalChatCompletionStreamResponseDelta( + outputAudioUpdate, functionCallUpdate, toolCallUpdates.ToList(), refusalUpdate, @@ -185,6 +200,28 @@ public static StreamingChatFunctionCallUpdate StreamingChatFunctionCallUpdate(st additionalBinaryDataProperties: null); } + /// + /// Initializes a new instance of . + /// + /// + /// + /// + /// + /// + public static StreamingChatOutputAudioUpdate StreamingChatOutputAudioUpdate( + string id = null, + DateTimeOffset? expiresAt = null, + string transcriptUpdate = null, + BinaryData audioBytesUpdate = null) + { + return new StreamingChatOutputAudioUpdate( + id, + expiresAt, + transcriptUpdate, + audioBytesUpdate, + additionalBinaryDataProperties: null); + } + /// Initializes a new instance of . /// A new instance for mocking. 
public static StreamingChatToolCallUpdate StreamingChatToolCallUpdate(int index = default, string toolCallId = null, ChatToolCallKind kind = default, string functionName = null, BinaryData functionArgumentsUpdate = null) diff --git a/.dotnet/src/Custom/Chat/Streaming/StreamingChatCompletionUpdate.cs b/.dotnet/src/Custom/Chat/Streaming/StreamingChatCompletionUpdate.cs index 54441795b..93d392013 100644 --- a/.dotnet/src/Custom/Chat/Streaming/StreamingChatCompletionUpdate.cs +++ b/.dotnet/src/Custom/Chat/Streaming/StreamingChatCompletionUpdate.cs @@ -93,10 +93,7 @@ public partial class StreamingChatCompletionUpdate /// Each streaming update contains only a small portion of tokens. To reconstitute the entire chat completion, /// all values across streaming updates must be combined. /// - public ChatMessageContent ContentUpdate => - _contentUpdate - ??= InternalChoiceDelta?.Content - ?? new ChatMessageContent(); + public ChatMessageContent ContentUpdate => _contentUpdate ??= InternalChoiceDelta?.Content ?? []; // CUSTOM: Flattened choice delta property. /// The tool calls generated by the model, such as function calls. @@ -112,4 +109,12 @@ public IReadOnlyList ToolCallUpdates // CUSTOM: Flattened choice delta property. [Obsolete($"This property is obsolete. Please use {nameof(ToolCallUpdates)} instead.")] public StreamingChatFunctionCallUpdate FunctionCallUpdate => InternalChoiceDelta?.FunctionCall; + + // CUSTOM: Flattened choice delta property. + /// + /// Incremental output audio generated by the model. Only expected when output audio has been requested via providing + /// to and only available with + /// supported models. 
+ /// + public StreamingChatOutputAudioUpdate OutputAudioUpdate => InternalChoiceDelta?.Audio; } diff --git a/.dotnet/src/Custom/Chat/Streaming/StreamingChatOutputAudioUpdate.cs b/.dotnet/src/Custom/Chat/Streaming/StreamingChatOutputAudioUpdate.cs new file mode 100644 index 000000000..ef1058f99 --- /dev/null +++ b/.dotnet/src/Custom/Chat/Streaming/StreamingChatOutputAudioUpdate.cs @@ -0,0 +1,28 @@ +namespace OpenAI.Chat; + +using System; + +/// +/// Represents an audio update in a streaming chat response. +/// +[CodeGenModel("ChatCompletionMessageAudioChunk")] +public partial class StreamingChatOutputAudioUpdate +{ + // CUSTOM: Renamed for clarity of incremental data availability while streaming. + /// + /// The next, incremental audio transcript part from the streaming response. payloads + /// across all received instances should be concatenated to form the + /// full response audio transcript. + /// + [CodeGenMember("Transcript")] + public string TranscriptUpdate { get; } + + // CUSTOM: Renamed for clarity of incremental data availability while streaming. + /// + /// The next, incremental response audio data chunk from the streaming response. payloads + /// across all received instances should be concatenated to form the + /// full response audio. 
+ /// + [CodeGenMember("Data")] + public BinaryData AudioBytesUpdate { get; } +} diff --git a/.dotnet/src/Custom/RealtimeConversation/ConversationContentModalities.Serialization.cs b/.dotnet/src/Custom/RealtimeConversation/ConversationContentModalities.Serialization.cs index 134a83526..742db33fd 100644 --- a/.dotnet/src/Custom/RealtimeConversation/ConversationContentModalities.Serialization.cs +++ b/.dotnet/src/Custom/RealtimeConversation/ConversationContentModalities.Serialization.cs @@ -9,7 +9,7 @@ internal static partial class ConversationContentModalitiesExtensions { internal static IList ToInternalModalities(this ConversationContentModalities modalities) { - List internalModalities = []; + ChangeTrackingList internalModalities = new(); if (modalities.HasFlag(ConversationContentModalities.Text)) { internalModalities.Add(InternalRealtimeRequestSessionModality.Text); diff --git a/.dotnet/src/Custom/RealtimeConversation/ConversationContentModalities.cs b/.dotnet/src/Custom/RealtimeConversation/ConversationContentModalities.cs index 4b7cecba6..e1f8dd4bc 100644 --- a/.dotnet/src/Custom/RealtimeConversation/ConversationContentModalities.cs +++ b/.dotnet/src/Custom/RealtimeConversation/ConversationContentModalities.cs @@ -8,6 +8,7 @@ namespace OpenAI.RealtimeConversation; [Flags] public enum ConversationContentModalities : int { + Default = 0, Text = 1 << 0, Audio = 1 << 1, } \ No newline at end of file diff --git a/.dotnet/src/Custom/RealtimeConversation/ConversationItem.cs b/.dotnet/src/Custom/RealtimeConversation/ConversationItem.cs index 7d24f9543..96bc20804 100644 --- a/.dotnet/src/Custom/RealtimeConversation/ConversationItem.cs +++ b/.dotnet/src/Custom/RealtimeConversation/ConversationItem.cs @@ -5,7 +5,7 @@ namespace OpenAI.RealtimeConversation; [Experimental("OPENAI002")] -[CodeGenModel("RealtimeRequestItem")] +[CodeGenModel("RealtimeConversationRequestItem")] public partial class ConversationItem { public string FunctionCallId => (this as 
InternalRealtimeRequestFunctionCallItem)?.CallId; diff --git a/.dotnet/src/Custom/RealtimeConversation/Internal/GeneratorStubs.cs b/.dotnet/src/Custom/RealtimeConversation/Internal/GeneratorStubs.cs index 74c2eca27..3b1c192ec 100644 --- a/.dotnet/src/Custom/RealtimeConversation/Internal/GeneratorStubs.cs +++ b/.dotnet/src/Custom/RealtimeConversation/Internal/GeneratorStubs.cs @@ -12,6 +12,7 @@ namespace OpenAI.RealtimeConversation; [Experimental("OPENAI002")][CodeGenModel("RealtimeClientEventResponseCancel")] internal partial class InternalRealtimeClientEventResponseCancel { } [Experimental("OPENAI002")][CodeGenModel("RealtimeClientEventSessionUpdate")] internal partial class InternalRealtimeClientEventSessionUpdate { } [Experimental("OPENAI002")][CodeGenModel("RealtimeClientEventType")] internal readonly partial struct InternalRealtimeClientEventType { } +[Experimental("OPENAI002")][CodeGenModel("RealtimeConversationResponseItemObject")] internal readonly partial struct InternalRealtimeConversationResponseItemObject { } [Experimental("OPENAI002")][CodeGenModel("RealtimeItemType")] internal readonly partial struct InternalRealtimeItemType { } [Experimental("OPENAI002")][CodeGenModel("RealtimeRequestAudioContentPart")] internal partial class InternalRealtimeRequestAudioContentPart { } [Experimental("OPENAI002")][CodeGenModel("RealtimeRequestFunctionCallItem")] internal partial class InternalRealtimeRequestFunctionCallItem { } @@ -22,7 +23,6 @@ namespace OpenAI.RealtimeConversation; [Experimental("OPENAI002")][CodeGenModel("RealtimeResponseAudioContentPart")] internal partial class InternalRealtimeResponseAudioContentPart { } [Experimental("OPENAI002")][CodeGenModel("RealtimeResponseFunctionCallItem")] internal partial class InternalRealtimeResponseFunctionCallItem { } [Experimental("OPENAI002")][CodeGenModel("RealtimeResponseFunctionCallOutputItem")] internal partial class InternalRealtimeResponseFunctionCallOutputItem { } 
-[Experimental("OPENAI002")][CodeGenModel("RealtimeResponseItemObject")] internal readonly partial struct InternalRealtimeResponseItemObject { } [Experimental("OPENAI002")][CodeGenModel("RealtimeResponseObject")] internal readonly partial struct InternalRealtimeResponseObject { } [Experimental("OPENAI002")][CodeGenModel("RealtimeResponseSessionObject")] internal readonly partial struct InternalRealtimeResponseSessionObject { } [Experimental("OPENAI002")][CodeGenModel("RealtimeResponseTextContentPart")] internal partial class InternalRealtimeResponseTextContentPart { } @@ -35,9 +35,9 @@ namespace OpenAI.RealtimeConversation; [Experimental("OPENAI002")][CodeGenModel("RealtimeToolChoiceObject")] internal partial class InternalRealtimeToolChoiceObject { } [Experimental("OPENAI002")][CodeGenModel("UnknownRealtimeClientEvent")] internal partial class UnknownRealtimeClientEvent { } [Experimental("OPENAI002")][CodeGenModel("UnknownRealtimeContentPart")] internal partial class UnknownRealtimeContentPart { } -[Experimental("OPENAI002")][CodeGenModel("UnknownRealtimeRequestItem")] internal partial class UnknownRealtimeRequestItem { } +[Experimental("OPENAI002")][CodeGenModel("UnknownRealtimeConversationRequestItem")] internal partial class UnknownRealtimeRequestItem { } [Experimental("OPENAI002")][CodeGenModel("UnknownRealtimeRequestMessageItem")] internal partial class UnknownRealtimeRequestMessageItem { } -[Experimental("OPENAI002")][CodeGenModel("UnknownRealtimeResponseItem")] internal partial class UnknownRealtimeResponseItem { } +[Experimental("OPENAI002")][CodeGenModel("UnknownRealtimeConversationResponseItem")] internal partial class UnknownRealtimeResponseItem { } [Experimental("OPENAI002")][CodeGenModel("UnknownRealtimeResponseStatusDetails")] internal partial class UnknownRealtimeResponseStatusDetails { } [Experimental("OPENAI002")][CodeGenModel("UnknownRealtimeServerEvent")] internal partial class UnknownRealtimeServerEvent { } 
[Experimental("OPENAI002")][CodeGenModel("UnknownRealtimeTool")] internal partial class UnknownRealtimeTool { } diff --git a/.dotnet/src/Custom/RealtimeConversation/Internal/InternalRealtimeClientEventResponseCreateResponse.cs b/.dotnet/src/Custom/RealtimeConversation/Internal/InternalRealtimeClientEventResponseCreateResponse.cs index b4338272c..e6e6112f5 100644 --- a/.dotnet/src/Custom/RealtimeConversation/Internal/InternalRealtimeClientEventResponseCreateResponse.cs +++ b/.dotnet/src/Custom/RealtimeConversation/Internal/InternalRealtimeClientEventResponseCreateResponse.cs @@ -8,13 +8,13 @@ namespace OpenAI.RealtimeConversation; [Experimental("OPENAI002")] -[CodeGenModel("RealtimeClientEventResponseCreateResponse")] -internal partial class InternalRealtimeClientEventResponseCreateResponse +[CodeGenModel("RealtimeResponseOptions")] +internal partial class InternalRealtimeResponseOptions { [CodeGenMember("ToolChoice")] public BinaryData ToolChoice { get; set; } - public static InternalRealtimeClientEventResponseCreateResponse FromSessionOptions( + public static InternalRealtimeResponseOptions FromSessionOptions( ConversationSessionOptions sessionOptions) { Argument.AssertNotNull(sessionOptions, nameof(sessionOptions)); @@ -28,17 +28,14 @@ public static InternalRealtimeClientEventResponseCreateResponse FromSessionOptio : null; IList internalModalities = sessionOptions.ContentModalities.ToInternalModalities(); - IList rawModalities = internalModalities.Count > 0 - ? internalModalities.Select(modality => modality.ToString()).ToList() - : new ChangeTrackingList(); BinaryData toolChoice = Optional.IsDefined(sessionOptions.ToolChoice) ? 
ModelReaderWriter.Write(sessionOptions.ToolChoice) : null; - InternalRealtimeClientEventResponseCreateResponse internalOptions = new( - modalities: rawModalities, + InternalRealtimeResponseOptions internalOptions = new( + modalities: internalModalities, instructions: sessionOptions.Instructions, - voice: sessionOptions.Voice?.ToString(), - outputAudioFormat: sessionOptions.OutputAudioFormat?.ToString(), + voice: sessionOptions.Voice, + outputAudioFormat: sessionOptions.OutputAudioFormat, tools: sessionOptions.Tools, toolChoice: toolChoice, temperature: sessionOptions.Temperature, diff --git a/.dotnet/src/Custom/RealtimeConversation/Internal/InternalRealtimeResponseItem.cs b/.dotnet/src/Custom/RealtimeConversation/Internal/InternalRealtimeResponseItem.cs index 0679556f0..6dd2e9143 100644 --- a/.dotnet/src/Custom/RealtimeConversation/Internal/InternalRealtimeResponseItem.cs +++ b/.dotnet/src/Custom/RealtimeConversation/Internal/InternalRealtimeResponseItem.cs @@ -4,8 +4,8 @@ namespace OpenAI.RealtimeConversation; [Experimental("OPENAI002")] -[CodeGenModel("RealtimeResponseItem")] -internal partial class InternalRealtimeResponseItem +[CodeGenModel("RealtimeConversationResponseItem")] +internal partial class InternalRealtimeConversationResponseItem { public string ResponseId => (this as InternalRealtimeResponseMessageItem)?.ResponseId diff --git a/.dotnet/src/Custom/RealtimeConversation/Internal/InternalRealtimeResponseMessageItem.cs b/.dotnet/src/Custom/RealtimeConversation/Internal/InternalRealtimeResponseMessageItem.cs index a69346981..25dc0c539 100644 --- a/.dotnet/src/Custom/RealtimeConversation/Internal/InternalRealtimeResponseMessageItem.cs +++ b/.dotnet/src/Custom/RealtimeConversation/Internal/InternalRealtimeResponseMessageItem.cs @@ -1,3 +1,4 @@ +using System.Collections.Generic; using System.Diagnostics.CodeAnalysis; namespace OpenAI.RealtimeConversation; @@ -6,6 +7,13 @@ namespace OpenAI.RealtimeConversation; [CodeGenModel("RealtimeResponseMessageItem")] 
internal partial class InternalRealtimeResponseMessageItem { + // CUSTOM: Use the available strong type for roles. + [CodeGenMember("Role")] public ConversationMessageRole Role { get; } + + // CUSTOM: Explicitly apply response model read-only. + + [CodeGenMember("Content")] + public IReadOnlyList Content { get; } } diff --git a/.dotnet/src/Custom/RealtimeConversation/RealtimeConversationSession.cs b/.dotnet/src/Custom/RealtimeConversation/RealtimeConversationSession.cs index 6ba3f6ab5..07f375398 100644 --- a/.dotnet/src/Custom/RealtimeConversation/RealtimeConversationSession.cs +++ b/.dotnet/src/Custom/RealtimeConversation/RealtimeConversationSession.cs @@ -278,14 +278,14 @@ public virtual void InterruptResponse(CancellationToken cancellationToken = defa public virtual async Task StartResponseAsync(CancellationToken cancellationToken = default) { - InternalRealtimeClientEventResponseCreateResponse internalOptions = new(); + InternalRealtimeResponseOptions internalOptions = new(); InternalRealtimeClientEventResponseCreate internalCommand = new(internalOptions); await SendCommandAsync(internalCommand, cancellationToken).ConfigureAwait(false); } public virtual void StartResponse(CancellationToken cancellationToken = default) { - InternalRealtimeClientEventResponseCreateResponse internalOptions = new(); + InternalRealtimeResponseOptions internalOptions = new(); InternalRealtimeClientEventResponseCreate internalCommand = new(internalOptions); SendCommand(internalCommand, cancellationToken); } @@ -293,8 +293,8 @@ public virtual void StartResponse(CancellationToken cancellationToken = default) public virtual async Task StartResponseAsync(ConversationSessionOptions sessionOptionOverrides, CancellationToken cancellationToken = default) { Argument.AssertNotNull(sessionOptionOverrides, nameof(sessionOptionOverrides)); - InternalRealtimeClientEventResponseCreateResponse internalOptions - = 
InternalRealtimeClientEventResponseCreateResponse.FromSessionOptions(sessionOptionOverrides); + InternalRealtimeResponseOptions internalOptions + = InternalRealtimeResponseOptions.FromSessionOptions(sessionOptionOverrides); InternalRealtimeClientEventResponseCreate internalCommand = new(internalOptions); await SendCommandAsync(internalCommand, cancellationToken).ConfigureAwait(false); } @@ -302,8 +302,8 @@ InternalRealtimeClientEventResponseCreateResponse internalOptions public virtual void StartResponse(ConversationSessionOptions sessionOptionOverrides, CancellationToken cancellationToken = default) { Argument.AssertNotNull(sessionOptionOverrides, nameof(sessionOptionOverrides)); - InternalRealtimeClientEventResponseCreateResponse internalOptions - = InternalRealtimeClientEventResponseCreateResponse.FromSessionOptions(sessionOptionOverrides); + InternalRealtimeResponseOptions internalOptions + = InternalRealtimeResponseOptions.FromSessionOptions(sessionOptionOverrides); InternalRealtimeClientEventResponseCreate internalCommand = new(internalOptions); SendCommand(internalCommand, cancellationToken); } diff --git a/.dotnet/src/Custom/RealtimeConversation/ResponseUpdates/ConversationItemCreatedUpdate.cs b/.dotnet/src/Custom/RealtimeConversation/ResponseUpdates/ConversationItemCreatedUpdate.cs index 9b98488e4..ca5600641 100644 --- a/.dotnet/src/Custom/RealtimeConversation/ResponseUpdates/ConversationItemCreatedUpdate.cs +++ b/.dotnet/src/Custom/RealtimeConversation/ResponseUpdates/ConversationItemCreatedUpdate.cs @@ -17,7 +17,7 @@ namespace OpenAI.RealtimeConversation; public partial class ConversationItemCreatedUpdate { [CodeGenMember("Item")] - private readonly InternalRealtimeResponseItem _internalItem; + private readonly InternalRealtimeConversationResponseItem _internalItem; public string ItemId => _internalItem.Id; diff --git a/.dotnet/src/Custom/RealtimeConversation/ResponseUpdates/ConversationItemStreamingFinishedUpdate.cs 
b/.dotnet/src/Custom/RealtimeConversation/ResponseUpdates/ConversationItemStreamingFinishedUpdate.cs index 86881012d..4f68f7d95 100644 --- a/.dotnet/src/Custom/RealtimeConversation/ResponseUpdates/ConversationItemStreamingFinishedUpdate.cs +++ b/.dotnet/src/Custom/RealtimeConversation/ResponseUpdates/ConversationItemStreamingFinishedUpdate.cs @@ -15,7 +15,7 @@ namespace OpenAI.RealtimeConversation; public partial class ConversationItemStreamingFinishedUpdate { [CodeGenMember("Item")] - private readonly InternalRealtimeResponseItem _internalItem; + private readonly InternalRealtimeConversationResponseItem _internalItem; public string ItemId => _internalItem.Id; diff --git a/.dotnet/src/Custom/RealtimeConversation/ResponseUpdates/ConversationItemStreamingStartedUpdate.cs b/.dotnet/src/Custom/RealtimeConversation/ResponseUpdates/ConversationItemStreamingStartedUpdate.cs index d1d4b538a..a5ce8974e 100644 --- a/.dotnet/src/Custom/RealtimeConversation/ResponseUpdates/ConversationItemStreamingStartedUpdate.cs +++ b/.dotnet/src/Custom/RealtimeConversation/ResponseUpdates/ConversationItemStreamingStartedUpdate.cs @@ -14,7 +14,7 @@ namespace OpenAI.RealtimeConversation; public partial class ConversationItemStreamingStartedUpdate { [CodeGenMember("Item")] - private readonly InternalRealtimeResponseItem _internalItem; + private readonly InternalRealtimeConversationResponseItem _internalItem; public string ItemId => _internalItem.Id; diff --git a/.dotnet/src/Generated/Models/AssistantChatMessage.Serialization.cs b/.dotnet/src/Generated/Models/AssistantChatMessage.Serialization.cs index dd6564373..529601ef0 100644 --- a/.dotnet/src/Generated/Models/AssistantChatMessage.Serialization.cs +++ b/.dotnet/src/Generated/Models/AssistantChatMessage.Serialization.cs @@ -60,6 +60,18 @@ protected override void JsonModelWriteCore(Utf8JsonWriter writer, ModelReaderWri writer.WriteNull("functionCall"u8); } } + if (Optional.IsDefined(OutputAudioReference) && 
_additionalBinaryDataProperties?.ContainsKey("audio") != true) + { + if (OutputAudioReference != null) + { + writer.WritePropertyName("audio"u8); + writer.WriteObjectValue(OutputAudioReference, options); + } + else + { + writer.WriteNull("audio"u8); + } + } } AssistantChatMessage IJsonModel.Create(ref Utf8JsonReader reader, ModelReaderWriterOptions options) => (AssistantChatMessage)JsonModelCreateCore(ref reader, options); @@ -81,23 +93,24 @@ internal static AssistantChatMessage DeserializeAssistantChatMessage(JsonElement { return null; } - Chat.ChatMessageRole role = default; ChatMessageContent content = default; + Chat.ChatMessageRole role = default; IDictionary additionalBinaryDataProperties = new ChangeTrackingDictionary(); string refusal = default; string participantName = default; IList toolCalls = default; ChatFunctionCall functionCall = default; + ChatOutputAudioReference outputAudioReference = default; foreach (var prop in element.EnumerateObject()) { - if (prop.NameEquals("role"u8)) + if (prop.NameEquals("content"u8)) { - role = prop.Value.GetString().ToChatMessageRole(); + DeserializeContentValue(prop, ref content); continue; } - if (prop.NameEquals("content"u8)) + if (prop.NameEquals("role"u8)) { - DeserializeContentValue(prop, ref content); + role = prop.Value.GetString().ToChatMessageRole(); continue; } if (prop.NameEquals("refusal"u8)) @@ -139,6 +152,16 @@ internal static AssistantChatMessage DeserializeAssistantChatMessage(JsonElement functionCall = ChatFunctionCall.DeserializeChatFunctionCall(prop.Value, options); continue; } + if (prop.NameEquals("audio"u8)) + { + if (prop.Value.ValueKind == JsonValueKind.Null) + { + outputAudioReference = null; + continue; + } + outputAudioReference = ChatOutputAudioReference.DeserializeChatOutputAudioReference(prop.Value, options); + continue; + } if (true) { additionalBinaryDataProperties.Add(prop.Name, BinaryData.FromString(prop.Value.GetRawText())); @@ -146,13 +169,14 @@ internal static AssistantChatMessage 
DeserializeAssistantChatMessage(JsonElement } // CUSTOM: Initialize Content collection property. return new AssistantChatMessage( - role, content ?? new ChatMessageContent(), + role, additionalBinaryDataProperties, refusal, participantName, toolCalls ?? new ChangeTrackingList(), - functionCall); + functionCall, + outputAudioReference); } BinaryData IPersistableModel.Write(ModelReaderWriterOptions options) => PersistableModelWriteCore(options); diff --git a/.dotnet/src/Generated/Models/AssistantChatMessage.cs b/.dotnet/src/Generated/Models/AssistantChatMessage.cs index e93ddae80..0fd143349 100644 --- a/.dotnet/src/Generated/Models/AssistantChatMessage.cs +++ b/.dotnet/src/Generated/Models/AssistantChatMessage.cs @@ -9,12 +9,13 @@ namespace OpenAI.Chat { public partial class AssistantChatMessage : ChatMessage { - internal AssistantChatMessage(Chat.ChatMessageRole role, ChatMessageContent content, IDictionary additionalBinaryDataProperties, string refusal, string participantName, IList toolCalls, ChatFunctionCall functionCall) : base(role, content, additionalBinaryDataProperties) + internal AssistantChatMessage(ChatMessageContent content, Chat.ChatMessageRole role, IDictionary additionalBinaryDataProperties, string refusal, string participantName, IList toolCalls, ChatFunctionCall functionCall, ChatOutputAudioReference outputAudioReference) : base(content, role, additionalBinaryDataProperties) { Refusal = refusal; ParticipantName = participantName; ToolCalls = toolCalls; FunctionCall = functionCall; + OutputAudioReference = outputAudioReference; } public string Refusal { get; set; } diff --git a/.dotnet/src/Generated/Models/ChatAudioOptions.Serialization.cs b/.dotnet/src/Generated/Models/ChatAudioOptions.Serialization.cs new file mode 100644 index 000000000..1b6186096 --- /dev/null +++ b/.dotnet/src/Generated/Models/ChatAudioOptions.Serialization.cs @@ -0,0 +1,156 @@ +// + +#nullable disable + +using System; +using System.ClientModel; +using 
System.ClientModel.Primitives; +using System.Collections.Generic; +using System.Text.Json; +using OpenAI; + +namespace OpenAI.Chat +{ + public partial class ChatAudioOptions : IJsonModel + { + internal ChatAudioOptions() + { + } + + void IJsonModel.Write(Utf8JsonWriter writer, ModelReaderWriterOptions options) + { + writer.WriteStartObject(); + JsonModelWriteCore(writer, options); + writer.WriteEndObject(); + } + + protected virtual void JsonModelWriteCore(Utf8JsonWriter writer, ModelReaderWriterOptions options) + { + string format = options.Format == "W" ? ((IPersistableModel)this).GetFormatFromOptions(options) : options.Format; + if (format != "J") + { + throw new FormatException($"The model {nameof(ChatAudioOptions)} does not support writing '{format}' format."); + } + if (_additionalBinaryDataProperties?.ContainsKey("voice") != true) + { + writer.WritePropertyName("voice"u8); + writer.WriteStringValue(OutputAudioVoice.ToString()); + } + if (_additionalBinaryDataProperties?.ContainsKey("format") != true) + { + writer.WritePropertyName("format"u8); + writer.WriteStringValue(OutputAudioFormat.ToString()); + } + if (true && _additionalBinaryDataProperties != null) + { + foreach (var item in _additionalBinaryDataProperties) + { + if (ModelSerializationExtensions.IsSentinelValue(item.Value)) + { + continue; + } + writer.WritePropertyName(item.Key); +#if NET6_0_OR_GREATER + writer.WriteRawValue(item.Value); +#else + using (JsonDocument document = JsonDocument.Parse(item.Value)) + { + JsonSerializer.Serialize(writer, document.RootElement); + } +#endif + } + } + } + + ChatAudioOptions IJsonModel.Create(ref Utf8JsonReader reader, ModelReaderWriterOptions options) => JsonModelCreateCore(ref reader, options); + + protected virtual ChatAudioOptions JsonModelCreateCore(ref Utf8JsonReader reader, ModelReaderWriterOptions options) + { + string format = options.Format == "W" ? 
((IPersistableModel)this).GetFormatFromOptions(options) : options.Format; + if (format != "J") + { + throw new FormatException($"The model {nameof(ChatAudioOptions)} does not support reading '{format}' format."); + } + using JsonDocument document = JsonDocument.ParseValue(ref reader); + return DeserializeChatAudioOptions(document.RootElement, options); + } + + internal static ChatAudioOptions DeserializeChatAudioOptions(JsonElement element, ModelReaderWriterOptions options) + { + if (element.ValueKind == JsonValueKind.Null) + { + return null; + } + ChatOutputAudioVoice outputAudioVoice = default; + ChatOutputAudioFormat outputAudioFormat = default; + IDictionary additionalBinaryDataProperties = new ChangeTrackingDictionary(); + foreach (var prop in element.EnumerateObject()) + { + if (prop.NameEquals("voice"u8)) + { + outputAudioVoice = new ChatOutputAudioVoice(prop.Value.GetString()); + continue; + } + if (prop.NameEquals("format"u8)) + { + outputAudioFormat = new ChatOutputAudioFormat(prop.Value.GetString()); + continue; + } + if (true) + { + additionalBinaryDataProperties.Add(prop.Name, BinaryData.FromString(prop.Value.GetRawText())); + } + } + return new ChatAudioOptions(outputAudioVoice, outputAudioFormat, additionalBinaryDataProperties); + } + + BinaryData IPersistableModel.Write(ModelReaderWriterOptions options) => PersistableModelWriteCore(options); + + protected virtual BinaryData PersistableModelWriteCore(ModelReaderWriterOptions options) + { + string format = options.Format == "W" ? 
((IPersistableModel)this).GetFormatFromOptions(options) : options.Format; + switch (format) + { + case "J": + return ModelReaderWriter.Write(this, options); + default: + throw new FormatException($"The model {nameof(ChatAudioOptions)} does not support writing '{options.Format}' format."); + } + } + + ChatAudioOptions IPersistableModel.Create(BinaryData data, ModelReaderWriterOptions options) => PersistableModelCreateCore(data, options); + + protected virtual ChatAudioOptions PersistableModelCreateCore(BinaryData data, ModelReaderWriterOptions options) + { + string format = options.Format == "W" ? ((IPersistableModel)this).GetFormatFromOptions(options) : options.Format; + switch (format) + { + case "J": + using (JsonDocument document = JsonDocument.Parse(data)) + { + return DeserializeChatAudioOptions(document.RootElement, options); + } + default: + throw new FormatException($"The model {nameof(ChatAudioOptions)} does not support reading '{options.Format}' format."); + } + } + + string IPersistableModel.GetFormatFromOptions(ModelReaderWriterOptions options) => "J"; + + public static implicit operator BinaryContent(ChatAudioOptions chatAudioOptions) + { + if (chatAudioOptions == null) + { + return null; + } + return BinaryContent.Create(chatAudioOptions, ModelSerializationExtensions.WireOptions); + } + + public static explicit operator ChatAudioOptions(ClientResult result) + { + using PipelineResponse response = result.GetRawResponse(); + using JsonDocument document = JsonDocument.Parse(response.Content); + return DeserializeChatAudioOptions(document.RootElement, ModelSerializationExtensions.WireOptions); + } + } +} diff --git a/.dotnet/src/Generated/Models/ChatAudioOptions.cs b/.dotnet/src/Generated/Models/ChatAudioOptions.cs new file mode 100644 index 000000000..18b018e26 --- /dev/null +++ b/.dotnet/src/Generated/Models/ChatAudioOptions.cs @@ -0,0 +1,33 @@ +// + +#nullable disable + +using System; +using System.Collections.Generic; + +namespace OpenAI.Chat +{ + 
public partial class ChatAudioOptions + { + private protected IDictionary _additionalBinaryDataProperties; + + public ChatAudioOptions(ChatOutputAudioVoice outputAudioVoice, ChatOutputAudioFormat outputAudioFormat) + { + OutputAudioVoice = outputAudioVoice; + OutputAudioFormat = outputAudioFormat; + } + + internal ChatAudioOptions(ChatOutputAudioVoice outputAudioVoice, ChatOutputAudioFormat outputAudioFormat, IDictionary additionalBinaryDataProperties) + { + OutputAudioVoice = outputAudioVoice; + OutputAudioFormat = outputAudioFormat; + _additionalBinaryDataProperties = additionalBinaryDataProperties; + } + + internal IDictionary SerializedAdditionalRawData + { + get => _additionalBinaryDataProperties; + set => _additionalBinaryDataProperties = value; + } + } +} diff --git a/.dotnet/src/Generated/Models/ChatCompletionOptions.Serialization.cs b/.dotnet/src/Generated/Models/ChatCompletionOptions.Serialization.cs index 8f1a29eca..42cdea309 100644 --- a/.dotnet/src/Generated/Models/ChatCompletionOptions.Serialization.cs +++ b/.dotnet/src/Generated/Models/ChatCompletionOptions.Serialization.cs @@ -297,6 +297,35 @@ protected virtual void JsonModelWriteCore(Utf8JsonWriter writer, ModelReaderWrit writer.WriteNull("serviceTier"u8); } } + if (Optional.IsCollectionDefined(_internalModalities) && _additionalBinaryDataProperties?.ContainsKey("modalities") != true) + { + if (_internalModalities != null) + { + writer.WritePropertyName("modalities"u8); + writer.WriteStartArray(); + foreach (InternalCreateChatCompletionRequestModality item in _internalModalities) + { + writer.WriteStringValue(item.ToString()); + } + writer.WriteEndArray(); + } + else + { + writer.WriteNull("modalities"u8); + } + } + if (Optional.IsDefined(_audioOptions) && _additionalBinaryDataProperties?.ContainsKey("audio") != true) + { + if (_audioOptions != null) + { + writer.WritePropertyName("audio"u8); + writer.WriteObjectValue(_audioOptions, options); + } + else + { + writer.WriteNull("audio"u8); + } + } if 
(true && _additionalBinaryDataProperties != null) { foreach (var item in _additionalBinaryDataProperties) @@ -363,6 +392,8 @@ internal static ChatCompletionOptions DeserializeChatCompletionOptions(JsonEleme IDictionary metadata = default; bool? storedOutputEnabled = default; InternalCreateChatCompletionRequestServiceTier? serviceTier = default; + IList internalModalities = default; + ChatAudioOptions audioOptions = default; IDictionary additionalBinaryDataProperties = new ChangeTrackingDictionary(); foreach (var prop in element.EnumerateObject()) { @@ -621,6 +652,30 @@ internal static ChatCompletionOptions DeserializeChatCompletionOptions(JsonEleme serviceTier = new InternalCreateChatCompletionRequestServiceTier(prop.Value.GetString()); continue; } + if (prop.NameEquals("modalities"u8)) + { + if (prop.Value.ValueKind == JsonValueKind.Null) + { + continue; + } + List array = new List(); + foreach (var item in prop.Value.EnumerateArray()) + { + array.Add(new InternalCreateChatCompletionRequestModality(item.GetString())); + } + internalModalities = array; + continue; + } + if (prop.NameEquals("audio"u8)) + { + if (prop.Value.ValueKind == JsonValueKind.Null) + { + audioOptions = null; + continue; + } + audioOptions = ChatAudioOptions.DeserializeChatAudioOptions(prop.Value, options); + continue; + } if (true) { additionalBinaryDataProperties.Add(prop.Name, BinaryData.FromString(prop.Value.GetRawText())); @@ -653,6 +708,8 @@ internal static ChatCompletionOptions DeserializeChatCompletionOptions(JsonEleme metadata ?? 
new ChangeTrackingDictionary(), storedOutputEnabled, serviceTier, + internalModalities, + audioOptions, additionalBinaryDataProperties); } diff --git a/.dotnet/src/Generated/Models/ChatCompletionOptions.cs b/.dotnet/src/Generated/Models/ChatCompletionOptions.cs index 431b4bed0..b5e6dfe2b 100644 --- a/.dotnet/src/Generated/Models/ChatCompletionOptions.cs +++ b/.dotnet/src/Generated/Models/ChatCompletionOptions.cs @@ -11,7 +11,7 @@ public partial class ChatCompletionOptions { private protected IDictionary _additionalBinaryDataProperties; - internal ChatCompletionOptions(float? frequencyPenalty, float? presencePenalty, ChatResponseFormat responseFormat, float? temperature, float? topP, IList tools, IList messages, InternalCreateChatCompletionRequestModel model, int? n, bool? stream, InternalChatCompletionStreamOptions streamOptions, bool? includeLogProbabilities, int? topLogProbabilityCount, IList stopSequences, IDictionary logitBiases, ChatToolChoice toolChoice, ChatFunctionChoice functionChoice, bool? allowParallelToolCalls, string endUserId, long? seed, int? deprecatedMaxTokens, int? maxOutputTokenCount, IList functions, IDictionary metadata, bool? storedOutputEnabled, InternalCreateChatCompletionRequestServiceTier? serviceTier, IDictionary additionalBinaryDataProperties) + internal ChatCompletionOptions(float? frequencyPenalty, float? presencePenalty, ChatResponseFormat responseFormat, float? temperature, float? topP, IList tools, IList messages, InternalCreateChatCompletionRequestModel model, int? n, bool? stream, InternalChatCompletionStreamOptions streamOptions, bool? includeLogProbabilities, int? topLogProbabilityCount, IList stopSequences, IDictionary logitBiases, ChatToolChoice toolChoice, ChatFunctionChoice functionChoice, bool? allowParallelToolCalls, string endUserId, long? seed, int? deprecatedMaxTokens, int? maxOutputTokenCount, IList functions, IDictionary metadata, bool? storedOutputEnabled, InternalCreateChatCompletionRequestServiceTier? 
serviceTier, IList internalModalities, ChatAudioOptions audioOptions, IDictionary additionalBinaryDataProperties) { FrequencyPenalty = frequencyPenalty; PresencePenalty = presencePenalty; @@ -39,6 +39,8 @@ internal ChatCompletionOptions(float? frequencyPenalty, float? presencePenalty, Metadata = metadata; StoredOutputEnabled = storedOutputEnabled; _serviceTier = serviceTier; + _internalModalities = internalModalities; + _audioOptions = audioOptions; _additionalBinaryDataProperties = additionalBinaryDataProperties; } diff --git a/.dotnet/src/Generated/Models/ChatInputAudioFormat.cs b/.dotnet/src/Generated/Models/ChatInputAudioFormat.cs new file mode 100644 index 000000000..ab86472b5 --- /dev/null +++ b/.dotnet/src/Generated/Models/ChatInputAudioFormat.cs @@ -0,0 +1,44 @@ +// + +#nullable disable + +using System; +using System.ComponentModel; +using OpenAI; + +namespace OpenAI.Chat +{ + public readonly partial struct ChatInputAudioFormat : IEquatable + { + private readonly string _value; + private const string WavValue = "wav"; + private const string Mp3Value = "mp3"; + + public ChatInputAudioFormat(string value) + { + Argument.AssertNotNull(value, nameof(value)); + + _value = value; + } + + public static ChatInputAudioFormat Wav { get; } = new ChatInputAudioFormat(WavValue); + + public static ChatInputAudioFormat Mp3 { get; } = new ChatInputAudioFormat(Mp3Value); + + public static bool operator ==(ChatInputAudioFormat left, ChatInputAudioFormat right) => left.Equals(right); + + public static bool operator !=(ChatInputAudioFormat left, ChatInputAudioFormat right) => !left.Equals(right); + + public static implicit operator ChatInputAudioFormat(string value) => new ChatInputAudioFormat(value); + + [EditorBrowsable(EditorBrowsableState.Never)] + public override bool Equals(object obj) => obj is ChatInputAudioFormat other && Equals(other); + + public bool Equals(ChatInputAudioFormat other) => string.Equals(_value, other._value, 
StringComparison.InvariantCultureIgnoreCase); + + [EditorBrowsable(EditorBrowsableState.Never)] + public override int GetHashCode() => _value != null ? StringComparer.InvariantCultureIgnoreCase.GetHashCode(_value) : 0; + + public override string ToString() => _value; + } +} diff --git a/.dotnet/src/Generated/Models/ChatMessage.Serialization.cs b/.dotnet/src/Generated/Models/ChatMessage.Serialization.cs index b39d4ef3b..0f27516c0 100644 --- a/.dotnet/src/Generated/Models/ChatMessage.Serialization.cs +++ b/.dotnet/src/Generated/Models/ChatMessage.Serialization.cs @@ -20,17 +20,17 @@ protected virtual void JsonModelWriteCore(Utf8JsonWriter writer, ModelReaderWrit { throw new FormatException($"The model {nameof(ChatMessage)} does not support writing '{format}' format."); } - if (_additionalBinaryDataProperties?.ContainsKey("role") != true) - { - writer.WritePropertyName("role"u8); - writer.WriteStringValue(Role.ToSerialString()); - } // CUSTOM: Check inner collection is defined. if (true && Optional.IsDefined(Content) && Content.IsInnerCollectionDefined() && _additionalBinaryDataProperties?.ContainsKey("content") != true) { writer.WritePropertyName("content"u8); this.SerializeContentValue(writer, options); } + if (_additionalBinaryDataProperties?.ContainsKey("role") != true) + { + writer.WritePropertyName("role"u8); + writer.WriteStringValue(Role.ToSerialString()); + } if (true && _additionalBinaryDataProperties != null) { foreach (var item in _additionalBinaryDataProperties) diff --git a/.dotnet/src/Generated/Models/ChatMessage.cs b/.dotnet/src/Generated/Models/ChatMessage.cs index 368ee1c8d..89391cdf9 100644 --- a/.dotnet/src/Generated/Models/ChatMessage.cs +++ b/.dotnet/src/Generated/Models/ChatMessage.cs @@ -11,10 +11,10 @@ public partial class ChatMessage { private protected IDictionary _additionalBinaryDataProperties; - internal ChatMessage(Chat.ChatMessageRole role, ChatMessageContent content, IDictionary additionalBinaryDataProperties) + internal 
ChatMessage(ChatMessageContent content, Chat.ChatMessageRole role, IDictionary additionalBinaryDataProperties) { - Role = role; Content = content; + Role = role; _additionalBinaryDataProperties = additionalBinaryDataProperties; } diff --git a/.dotnet/src/Generated/Models/ChatOutputAudio.Serialization.cs b/.dotnet/src/Generated/Models/ChatOutputAudio.Serialization.cs new file mode 100644 index 000000000..42f31f886 --- /dev/null +++ b/.dotnet/src/Generated/Models/ChatOutputAudio.Serialization.cs @@ -0,0 +1,178 @@ +// + +#nullable disable + +using System; +using System.ClientModel; +using System.ClientModel.Primitives; +using System.Collections.Generic; +using System.Text.Json; +using OpenAI; + +namespace OpenAI.Chat +{ + public partial class ChatOutputAudio : IJsonModel + { + internal ChatOutputAudio() + { + } + + void IJsonModel.Write(Utf8JsonWriter writer, ModelReaderWriterOptions options) + { + writer.WriteStartObject(); + JsonModelWriteCore(writer, options); + writer.WriteEndObject(); + } + + protected virtual void JsonModelWriteCore(Utf8JsonWriter writer, ModelReaderWriterOptions options) + { + string format = options.Format == "W" ? 
((IPersistableModel)this).GetFormatFromOptions(options) : options.Format; + if (format != "J") + { + throw new FormatException($"The model {nameof(ChatOutputAudio)} does not support writing '{format}' format."); + } + if (_additionalBinaryDataProperties?.ContainsKey("id") != true) + { + writer.WritePropertyName("id"u8); + writer.WriteStringValue(Id); + } + if (_additionalBinaryDataProperties?.ContainsKey("expires_at") != true) + { + writer.WritePropertyName("expires_at"u8); + writer.WriteNumberValue(ExpiresAt, "U"); + } + if (_additionalBinaryDataProperties?.ContainsKey("transcript") != true) + { + writer.WritePropertyName("transcript"u8); + writer.WriteStringValue(Transcript); + } + if (_additionalBinaryDataProperties?.ContainsKey("data") != true) + { + writer.WritePropertyName("data"u8); + writer.WriteBase64StringValue(AudioBytes.ToArray(), "D"); + } + if (true && _additionalBinaryDataProperties != null) + { + foreach (var item in _additionalBinaryDataProperties) + { + if (ModelSerializationExtensions.IsSentinelValue(item.Value)) + { + continue; + } + writer.WritePropertyName(item.Key); +#if NET6_0_OR_GREATER + writer.WriteRawValue(item.Value); +#else + using (JsonDocument document = JsonDocument.Parse(item.Value)) + { + JsonSerializer.Serialize(writer, document.RootElement); + } +#endif + } + } + } + + ChatOutputAudio IJsonModel.Create(ref Utf8JsonReader reader, ModelReaderWriterOptions options) => JsonModelCreateCore(ref reader, options); + + protected virtual ChatOutputAudio JsonModelCreateCore(ref Utf8JsonReader reader, ModelReaderWriterOptions options) + { + string format = options.Format == "W" ? 
((IPersistableModel)this).GetFormatFromOptions(options) : options.Format; + if (format != "J") + { + throw new FormatException($"The model {nameof(ChatOutputAudio)} does not support reading '{format}' format."); + } + using JsonDocument document = JsonDocument.ParseValue(ref reader); + return DeserializeChatOutputAudio(document.RootElement, options); + } + + internal static ChatOutputAudio DeserializeChatOutputAudio(JsonElement element, ModelReaderWriterOptions options) + { + if (element.ValueKind == JsonValueKind.Null) + { + return null; + } + string id = default; + DateTimeOffset expiresAt = default; + string transcript = default; + BinaryData audioBytes = default; + IDictionary additionalBinaryDataProperties = new ChangeTrackingDictionary(); + foreach (var prop in element.EnumerateObject()) + { + if (prop.NameEquals("id"u8)) + { + id = prop.Value.GetString(); + continue; + } + if (prop.NameEquals("expires_at"u8)) + { + expiresAt = DateTimeOffset.FromUnixTimeSeconds(prop.Value.GetInt64()); + continue; + } + if (prop.NameEquals("transcript"u8)) + { + transcript = prop.Value.GetString(); + continue; + } + if (prop.NameEquals("data"u8)) + { + audioBytes = BinaryData.FromBytes(prop.Value.GetBytesFromBase64("D")); + continue; + } + if (true) + { + additionalBinaryDataProperties.Add(prop.Name, BinaryData.FromString(prop.Value.GetRawText())); + } + } + return new ChatOutputAudio(id, expiresAt, transcript, audioBytes, additionalBinaryDataProperties); + } + + BinaryData IPersistableModel.Write(ModelReaderWriterOptions options) => PersistableModelWriteCore(options); + + protected virtual BinaryData PersistableModelWriteCore(ModelReaderWriterOptions options) + { + string format = options.Format == "W" ? 
((IPersistableModel)this).GetFormatFromOptions(options) : options.Format; + switch (format) + { + case "J": + return ModelReaderWriter.Write(this, options); + default: + throw new FormatException($"The model {nameof(ChatOutputAudio)} does not support writing '{options.Format}' format."); + } + } + + ChatOutputAudio IPersistableModel.Create(BinaryData data, ModelReaderWriterOptions options) => PersistableModelCreateCore(data, options); + + protected virtual ChatOutputAudio PersistableModelCreateCore(BinaryData data, ModelReaderWriterOptions options) + { + string format = options.Format == "W" ? ((IPersistableModel)this).GetFormatFromOptions(options) : options.Format; + switch (format) + { + case "J": + using (JsonDocument document = JsonDocument.Parse(data)) + { + return DeserializeChatOutputAudio(document.RootElement, options); + } + default: + throw new FormatException($"The model {nameof(ChatOutputAudio)} does not support reading '{options.Format}' format."); + } + } + + string IPersistableModel.GetFormatFromOptions(ModelReaderWriterOptions options) => "J"; + + public static implicit operator BinaryContent(ChatOutputAudio chatOutputAudio) + { + if (chatOutputAudio == null) + { + return null; + } + return BinaryContent.Create(chatOutputAudio, ModelSerializationExtensions.WireOptions); + } + + public static explicit operator ChatOutputAudio(ClientResult result) + { + using PipelineResponse response = result.GetRawResponse(); + using JsonDocument document = JsonDocument.Parse(response.Content); + return DeserializeChatOutputAudio(document.RootElement, ModelSerializationExtensions.WireOptions); + } + } +} diff --git a/.dotnet/src/Generated/Models/ChatOutputAudio.cs b/.dotnet/src/Generated/Models/ChatOutputAudio.cs new file mode 100644 index 000000000..cf03c955a --- /dev/null +++ b/.dotnet/src/Generated/Models/ChatOutputAudio.cs @@ -0,0 +1,43 @@ +// + +#nullable disable + +using System; +using System.Collections.Generic; + +namespace OpenAI.Chat +{ + public partial 
class ChatOutputAudio + { + private protected IDictionary _additionalBinaryDataProperties; + + internal ChatOutputAudio(string id, DateTimeOffset expiresAt, string transcript, BinaryData audioBytes) + { + Id = id; + ExpiresAt = expiresAt; + Transcript = transcript; + AudioBytes = audioBytes; + } + + internal ChatOutputAudio(string id, DateTimeOffset expiresAt, string transcript, BinaryData audioBytes, IDictionary additionalBinaryDataProperties) + { + Id = id; + ExpiresAt = expiresAt; + Transcript = transcript; + AudioBytes = audioBytes; + _additionalBinaryDataProperties = additionalBinaryDataProperties; + } + + public string Id { get; } + + public DateTimeOffset ExpiresAt { get; } + + public string Transcript { get; } + + internal IDictionary SerializedAdditionalRawData + { + get => _additionalBinaryDataProperties; + set => _additionalBinaryDataProperties = value; + } + } +} diff --git a/.dotnet/src/Generated/Models/ChatOutputAudioFormat.cs b/.dotnet/src/Generated/Models/ChatOutputAudioFormat.cs new file mode 100644 index 000000000..bb9b98365 --- /dev/null +++ b/.dotnet/src/Generated/Models/ChatOutputAudioFormat.cs @@ -0,0 +1,53 @@ +// + +#nullable disable + +using System; +using System.ComponentModel; +using OpenAI; + +namespace OpenAI.Chat +{ + public readonly partial struct ChatOutputAudioFormat : IEquatable + { + private readonly string _value; + private const string WavValue = "wav"; + private const string Mp3Value = "mp3"; + private const string FlacValue = "flac"; + private const string OpusValue = "opus"; + private const string Pcm16Value = "pcm16"; + + public ChatOutputAudioFormat(string value) + { + Argument.AssertNotNull(value, nameof(value)); + + _value = value; + } + + public static ChatOutputAudioFormat Wav { get; } = new ChatOutputAudioFormat(WavValue); + + public static ChatOutputAudioFormat Mp3 { get; } = new ChatOutputAudioFormat(Mp3Value); + + public static ChatOutputAudioFormat Flac { get; } = new ChatOutputAudioFormat(FlacValue); + + public 
static ChatOutputAudioFormat Opus { get; } = new ChatOutputAudioFormat(OpusValue); + + public static ChatOutputAudioFormat Pcm16 { get; } = new ChatOutputAudioFormat(Pcm16Value); + + public static bool operator ==(ChatOutputAudioFormat left, ChatOutputAudioFormat right) => left.Equals(right); + + public static bool operator !=(ChatOutputAudioFormat left, ChatOutputAudioFormat right) => !left.Equals(right); + + public static implicit operator ChatOutputAudioFormat(string value) => new ChatOutputAudioFormat(value); + + [EditorBrowsable(EditorBrowsableState.Never)] + public override bool Equals(object obj) => obj is ChatOutputAudioFormat other && Equals(other); + + public bool Equals(ChatOutputAudioFormat other) => string.Equals(_value, other._value, StringComparison.InvariantCultureIgnoreCase); + + [EditorBrowsable(EditorBrowsableState.Never)] + public override int GetHashCode() => _value != null ? StringComparer.InvariantCultureIgnoreCase.GetHashCode(_value) : 0; + + public override string ToString() => _value; + } +} diff --git a/.dotnet/src/Generated/Models/ChatOutputAudioReference.Serialization.cs b/.dotnet/src/Generated/Models/ChatOutputAudioReference.Serialization.cs new file mode 100644 index 000000000..590c2dd88 --- /dev/null +++ b/.dotnet/src/Generated/Models/ChatOutputAudioReference.Serialization.cs @@ -0,0 +1,145 @@ +// + +#nullable disable + +using System; +using System.ClientModel; +using System.ClientModel.Primitives; +using System.Collections.Generic; +using System.Text.Json; +using OpenAI; + +namespace OpenAI.Chat +{ + public partial class ChatOutputAudioReference : IJsonModel + { + internal ChatOutputAudioReference() + { + } + + void IJsonModel.Write(Utf8JsonWriter writer, ModelReaderWriterOptions options) + { + writer.WriteStartObject(); + JsonModelWriteCore(writer, options); + writer.WriteEndObject(); + } + + protected virtual void JsonModelWriteCore(Utf8JsonWriter writer, ModelReaderWriterOptions options) + { + string format = options.Format == 
"W" ? ((IPersistableModel)this).GetFormatFromOptions(options) : options.Format; + if (format != "J") + { + throw new FormatException($"The model {nameof(ChatOutputAudioReference)} does not support writing '{format}' format."); + } + if (_additionalBinaryDataProperties?.ContainsKey("id") != true) + { + writer.WritePropertyName("id"u8); + writer.WriteStringValue(Id); + } + if (true && _additionalBinaryDataProperties != null) + { + foreach (var item in _additionalBinaryDataProperties) + { + if (ModelSerializationExtensions.IsSentinelValue(item.Value)) + { + continue; + } + writer.WritePropertyName(item.Key); +#if NET6_0_OR_GREATER + writer.WriteRawValue(item.Value); +#else + using (JsonDocument document = JsonDocument.Parse(item.Value)) + { + JsonSerializer.Serialize(writer, document.RootElement); + } +#endif + } + } + } + + ChatOutputAudioReference IJsonModel.Create(ref Utf8JsonReader reader, ModelReaderWriterOptions options) => JsonModelCreateCore(ref reader, options); + + protected virtual ChatOutputAudioReference JsonModelCreateCore(ref Utf8JsonReader reader, ModelReaderWriterOptions options) + { + string format = options.Format == "W" ? 
((IPersistableModel)this).GetFormatFromOptions(options) : options.Format; + if (format != "J") + { + throw new FormatException($"The model {nameof(ChatOutputAudioReference)} does not support reading '{format}' format."); + } + using JsonDocument document = JsonDocument.ParseValue(ref reader); + return DeserializeChatOutputAudioReference(document.RootElement, options); + } + + internal static ChatOutputAudioReference DeserializeChatOutputAudioReference(JsonElement element, ModelReaderWriterOptions options) + { + if (element.ValueKind == JsonValueKind.Null) + { + return null; + } + string id = default; + IDictionary additionalBinaryDataProperties = new ChangeTrackingDictionary(); + foreach (var prop in element.EnumerateObject()) + { + if (prop.NameEquals("id"u8)) + { + id = prop.Value.GetString(); + continue; + } + if (true) + { + additionalBinaryDataProperties.Add(prop.Name, BinaryData.FromString(prop.Value.GetRawText())); + } + } + return new ChatOutputAudioReference(id, additionalBinaryDataProperties); + } + + BinaryData IPersistableModel.Write(ModelReaderWriterOptions options) => PersistableModelWriteCore(options); + + protected virtual BinaryData PersistableModelWriteCore(ModelReaderWriterOptions options) + { + string format = options.Format == "W" ? ((IPersistableModel)this).GetFormatFromOptions(options) : options.Format; + switch (format) + { + case "J": + return ModelReaderWriter.Write(this, options); + default: + throw new FormatException($"The model {nameof(ChatOutputAudioReference)} does not support writing '{options.Format}' format."); + } + } + + ChatOutputAudioReference IPersistableModel.Create(BinaryData data, ModelReaderWriterOptions options) => PersistableModelCreateCore(data, options); + + protected virtual ChatOutputAudioReference PersistableModelCreateCore(BinaryData data, ModelReaderWriterOptions options) + { + string format = options.Format == "W" ? 
((IPersistableModel)this).GetFormatFromOptions(options) : options.Format; + switch (format) + { + case "J": + using (JsonDocument document = JsonDocument.Parse(data)) + { + return DeserializeChatOutputAudioReference(document.RootElement, options); + } + default: + throw new FormatException($"The model {nameof(ChatOutputAudioReference)} does not support reading '{options.Format}' format."); + } + } + + string IPersistableModel.GetFormatFromOptions(ModelReaderWriterOptions options) => "J"; + + public static implicit operator BinaryContent(ChatOutputAudioReference chatOutputAudioReference) + { + if (chatOutputAudioReference == null) + { + return null; + } + return BinaryContent.Create(chatOutputAudioReference, ModelSerializationExtensions.WireOptions); + } + + public static explicit operator ChatOutputAudioReference(ClientResult result) + { + using PipelineResponse response = result.GetRawResponse(); + using JsonDocument document = JsonDocument.Parse(response.Content); + return DeserializeChatOutputAudioReference(document.RootElement, ModelSerializationExtensions.WireOptions); + } + } +} diff --git a/.dotnet/src/Generated/Models/ChatOutputAudioReference.cs b/.dotnet/src/Generated/Models/ChatOutputAudioReference.cs new file mode 100644 index 000000000..905a12475 --- /dev/null +++ b/.dotnet/src/Generated/Models/ChatOutputAudioReference.cs @@ -0,0 +1,36 @@ +// + +#nullable disable + +using System; +using System.Collections.Generic; +using OpenAI; + +namespace OpenAI.Chat +{ + public partial class ChatOutputAudioReference + { + private protected IDictionary _additionalBinaryDataProperties; + + public ChatOutputAudioReference(string id) + { + Argument.AssertNotNull(id, nameof(id)); + + Id = id; + } + + internal ChatOutputAudioReference(string id, IDictionary additionalBinaryDataProperties) + { + Id = id; + _additionalBinaryDataProperties = additionalBinaryDataProperties; + } + + public string Id { get; } + + internal IDictionary SerializedAdditionalRawData + { + get => 
_additionalBinaryDataProperties; + set => _additionalBinaryDataProperties = value; + } + } +} diff --git a/.dotnet/src/Generated/Models/ChatOutputAudioVoice.cs b/.dotnet/src/Generated/Models/ChatOutputAudioVoice.cs new file mode 100644 index 000000000..8facf812d --- /dev/null +++ b/.dotnet/src/Generated/Models/ChatOutputAudioVoice.cs @@ -0,0 +1,56 @@ +// <auto-generated/> + +#nullable disable + +using System; +using System.ComponentModel; +using OpenAI; + +namespace OpenAI.Chat +{ + public readonly partial struct ChatOutputAudioVoice : IEquatable<ChatOutputAudioVoice> + { + private readonly string _value; + private const string AlloyValue = "alloy"; + private const string EchoValue = "echo"; + private const string FableValue = "fable"; + private const string OnyxValue = "onyx"; + private const string NovaValue = "nova"; + private const string ShimmerValue = "shimmer"; + + public ChatOutputAudioVoice(string value) + { + Argument.AssertNotNull(value, nameof(value)); + + _value = value; + } + + public static ChatOutputAudioVoice Alloy { get; } = new ChatOutputAudioVoice(AlloyValue); + + public static ChatOutputAudioVoice Echo { get; } = new ChatOutputAudioVoice(EchoValue); + + public static ChatOutputAudioVoice Fable { get; } = new ChatOutputAudioVoice(FableValue); + + public static ChatOutputAudioVoice Onyx { get; } = new ChatOutputAudioVoice(OnyxValue); + + public static ChatOutputAudioVoice Nova { get; } = new ChatOutputAudioVoice(NovaValue); + + public static ChatOutputAudioVoice Shimmer { get; } = new ChatOutputAudioVoice(ShimmerValue); + + public static bool operator ==(ChatOutputAudioVoice left, ChatOutputAudioVoice right) => left.Equals(right); + + public static bool operator !=(ChatOutputAudioVoice left, ChatOutputAudioVoice right) => !left.Equals(right); + + public static implicit operator ChatOutputAudioVoice(string value) => new ChatOutputAudioVoice(value); + + [EditorBrowsable(EditorBrowsableState.Never)] + public override bool Equals(object obj) => obj is ChatOutputAudioVoice other &&
Equals(other); + + public bool Equals(ChatOutputAudioVoice other) => string.Equals(_value, other._value, StringComparison.InvariantCultureIgnoreCase); + + [EditorBrowsable(EditorBrowsableState.Never)] + public override int GetHashCode() => _value != null ? StringComparer.InvariantCultureIgnoreCase.GetHashCode(_value) : 0; + + public override string ToString() => _value; + } +} diff --git a/.dotnet/src/Generated/Models/ConversationItemCreatedUpdate.Serialization.cs b/.dotnet/src/Generated/Models/ConversationItemCreatedUpdate.Serialization.cs index 57384093e..62b71004a 100644 --- a/.dotnet/src/Generated/Models/ConversationItemCreatedUpdate.Serialization.cs +++ b/.dotnet/src/Generated/Models/ConversationItemCreatedUpdate.Serialization.cs @@ -40,7 +40,7 @@ protected override void JsonModelWriteCore(Utf8JsonWriter writer, ModelReaderWri if (_additionalBinaryDataProperties?.ContainsKey("item") != true) { writer.WritePropertyName("item"u8); - writer.WriteObjectValue<InternalRealtimeResponseItem>(_internalItem, options); + writer.WriteObjectValue<InternalRealtimeConversationResponseItem>(_internalItem, options); } } @@ -67,7 +67,7 @@ internal static ConversationItemCreatedUpdate DeserializeConversationItemCreated RealtimeConversation.ConversationUpdateKind kind = default; IDictionary<string, BinaryData> additionalBinaryDataProperties = new ChangeTrackingDictionary<string, BinaryData>(); string previousItemId = default; - InternalRealtimeResponseItem internalItem = default; + InternalRealtimeConversationResponseItem internalItem = default; foreach (var prop in element.EnumerateObject()) { if (prop.NameEquals("event_id"u8)) @@ -87,7 +87,7 @@ internal static ConversationItemCreatedUpdate DeserializeConversationItemCreated } if (prop.NameEquals("item"u8)) { - internalItem = InternalRealtimeResponseItem.DeserializeInternalRealtimeResponseItem(prop.Value, options); + internalItem = InternalRealtimeConversationResponseItem.DeserializeInternalRealtimeConversationResponseItem(prop.Value, options); continue; } if (true) diff --git
a/.dotnet/src/Generated/Models/ConversationItemCreatedUpdate.cs b/.dotnet/src/Generated/Models/ConversationItemCreatedUpdate.cs index ea9d33b3d..23da0c644 100644 --- a/.dotnet/src/Generated/Models/ConversationItemCreatedUpdate.cs +++ b/.dotnet/src/Generated/Models/ConversationItemCreatedUpdate.cs @@ -9,13 +9,13 @@ namespace OpenAI.RealtimeConversation { public partial class ConversationItemCreatedUpdate : ConversationUpdate { - internal ConversationItemCreatedUpdate(string eventId, string previousItemId, InternalRealtimeResponseItem internalItem) : base(eventId, RealtimeConversation.ConversationUpdateKind.ItemCreated) + internal ConversationItemCreatedUpdate(string eventId, string previousItemId, InternalRealtimeConversationResponseItem internalItem) : base(eventId, RealtimeConversation.ConversationUpdateKind.ItemCreated) { PreviousItemId = previousItemId; _internalItem = internalItem; } - internal ConversationItemCreatedUpdate(string eventId, RealtimeConversation.ConversationUpdateKind kind, IDictionary<string, BinaryData> additionalBinaryDataProperties, string previousItemId, InternalRealtimeResponseItem internalItem) : base(eventId, kind, additionalBinaryDataProperties) + internal ConversationItemCreatedUpdate(string eventId, RealtimeConversation.ConversationUpdateKind kind, IDictionary<string, BinaryData> additionalBinaryDataProperties, string previousItemId, InternalRealtimeConversationResponseItem internalItem) : base(eventId, kind, additionalBinaryDataProperties) { PreviousItemId = previousItemId; _internalItem = internalItem; diff --git a/.dotnet/src/Generated/Models/ConversationItemStreamingFinishedUpdate.Serialization.cs b/.dotnet/src/Generated/Models/ConversationItemStreamingFinishedUpdate.Serialization.cs index 2fc545199..6709cf10b 100644 --- a/.dotnet/src/Generated/Models/ConversationItemStreamingFinishedUpdate.Serialization.cs +++ b/.dotnet/src/Generated/Models/ConversationItemStreamingFinishedUpdate.Serialization.cs @@ -45,7 +45,7 @@ protected override void
JsonModelWriteCore(Utf8JsonWriter writer, ModelReaderWri if (_additionalBinaryDataProperties?.ContainsKey("item") != true) { writer.WritePropertyName("item"u8); - writer.WriteObjectValue<InternalRealtimeResponseItem>(_internalItem, options); + writer.WriteObjectValue<InternalRealtimeConversationResponseItem>(_internalItem, options); } } @@ -73,7 +73,7 @@ internal static ConversationItemStreamingFinishedUpdate DeserializeConversationI IDictionary<string, BinaryData> additionalBinaryDataProperties = new ChangeTrackingDictionary<string, BinaryData>(); string responseId = default; int outputIndex = default; - InternalRealtimeResponseItem internalItem = default; + InternalRealtimeConversationResponseItem internalItem = default; foreach (var prop in element.EnumerateObject()) { if (prop.NameEquals("event_id"u8)) @@ -98,7 +98,7 @@ internal static ConversationItemStreamingFinishedUpdate DeserializeConversationI } if (prop.NameEquals("item"u8)) { - internalItem = InternalRealtimeResponseItem.DeserializeInternalRealtimeResponseItem(prop.Value, options); + internalItem = InternalRealtimeConversationResponseItem.DeserializeInternalRealtimeConversationResponseItem(prop.Value, options); continue; } if (true) diff --git a/.dotnet/src/Generated/Models/ConversationItemStreamingFinishedUpdate.cs b/.dotnet/src/Generated/Models/ConversationItemStreamingFinishedUpdate.cs index a178fe402..ab9cf68c8 100644 --- a/.dotnet/src/Generated/Models/ConversationItemStreamingFinishedUpdate.cs +++ b/.dotnet/src/Generated/Models/ConversationItemStreamingFinishedUpdate.cs @@ -9,14 +9,14 @@ namespace OpenAI.RealtimeConversation { public partial class ConversationItemStreamingFinishedUpdate : ConversationUpdate { - internal ConversationItemStreamingFinishedUpdate(string eventId, string responseId, int outputIndex, InternalRealtimeResponseItem internalItem) : base(eventId, RealtimeConversation.ConversationUpdateKind.ItemStreamingFinished) + internal ConversationItemStreamingFinishedUpdate(string eventId, string responseId, int outputIndex, InternalRealtimeConversationResponseItem internalItem) : base(eventId,
RealtimeConversation.ConversationUpdateKind.ItemStreamingFinished) { ResponseId = responseId; OutputIndex = outputIndex; _internalItem = internalItem; } - internal ConversationItemStreamingFinishedUpdate(string eventId, RealtimeConversation.ConversationUpdateKind kind, IDictionary<string, BinaryData> additionalBinaryDataProperties, string responseId, int outputIndex, InternalRealtimeResponseItem internalItem) : base(eventId, kind, additionalBinaryDataProperties) + internal ConversationItemStreamingFinishedUpdate(string eventId, RealtimeConversation.ConversationUpdateKind kind, IDictionary<string, BinaryData> additionalBinaryDataProperties, string responseId, int outputIndex, InternalRealtimeConversationResponseItem internalItem) : base(eventId, kind, additionalBinaryDataProperties) { ResponseId = responseId; OutputIndex = outputIndex; diff --git a/.dotnet/src/Generated/Models/ConversationItemStreamingStartedUpdate.Serialization.cs b/.dotnet/src/Generated/Models/ConversationItemStreamingStartedUpdate.Serialization.cs index e0e227030..61d947d1e 100644 --- a/.dotnet/src/Generated/Models/ConversationItemStreamingStartedUpdate.Serialization.cs +++ b/.dotnet/src/Generated/Models/ConversationItemStreamingStartedUpdate.Serialization.cs @@ -45,7 +45,7 @@ protected override void JsonModelWriteCore(Utf8JsonWriter writer, ModelReaderWri if (_additionalBinaryDataProperties?.ContainsKey("item") != true) { writer.WritePropertyName("item"u8); - writer.WriteObjectValue<InternalRealtimeResponseItem>(_internalItem, options); + writer.WriteObjectValue<InternalRealtimeConversationResponseItem>(_internalItem, options); } } @@ -73,7 +73,7 @@ internal static ConversationItemStreamingStartedUpdate DeserializeConversationIt IDictionary<string, BinaryData> additionalBinaryDataProperties = new ChangeTrackingDictionary<string, BinaryData>(); string responseId = default; int itemIndex = default; - InternalRealtimeResponseItem internalItem = default; + InternalRealtimeConversationResponseItem internalItem = default; foreach (var prop in element.EnumerateObject()) { if (prop.NameEquals("event_id"u8)) @@ -98,7 +98,7 @@ internal static
ConversationItemStreamingStartedUpdate DeserializeConversationIt } if (prop.NameEquals("item"u8)) { - internalItem = InternalRealtimeResponseItem.DeserializeInternalRealtimeResponseItem(prop.Value, options); + internalItem = InternalRealtimeConversationResponseItem.DeserializeInternalRealtimeConversationResponseItem(prop.Value, options); continue; } if (true) diff --git a/.dotnet/src/Generated/Models/ConversationItemStreamingStartedUpdate.cs b/.dotnet/src/Generated/Models/ConversationItemStreamingStartedUpdate.cs index 227fad3fd..ddb0b6e65 100644 --- a/.dotnet/src/Generated/Models/ConversationItemStreamingStartedUpdate.cs +++ b/.dotnet/src/Generated/Models/ConversationItemStreamingStartedUpdate.cs @@ -9,14 +9,14 @@ namespace OpenAI.RealtimeConversation { public partial class ConversationItemStreamingStartedUpdate : ConversationUpdate { - internal ConversationItemStreamingStartedUpdate(string eventId, string responseId, int itemIndex, InternalRealtimeResponseItem internalItem) : base(eventId, RealtimeConversation.ConversationUpdateKind.ItemStreamingStarted) + internal ConversationItemStreamingStartedUpdate(string eventId, string responseId, int itemIndex, InternalRealtimeConversationResponseItem internalItem) : base(eventId, RealtimeConversation.ConversationUpdateKind.ItemStreamingStarted) { ResponseId = responseId; ItemIndex = itemIndex; _internalItem = internalItem; } - internal ConversationItemStreamingStartedUpdate(string eventId, RealtimeConversation.ConversationUpdateKind kind, IDictionary<string, BinaryData> additionalBinaryDataProperties, string responseId, int itemIndex, InternalRealtimeResponseItem internalItem) : base(eventId, kind, additionalBinaryDataProperties) + internal ConversationItemStreamingStartedUpdate(string eventId, RealtimeConversation.ConversationUpdateKind kind, IDictionary<string, BinaryData> additionalBinaryDataProperties, string responseId, int itemIndex, InternalRealtimeConversationResponseItem internalItem) : base(eventId, kind, additionalBinaryDataProperties) { ResponseId
= responseId; ItemIndex = itemIndex; diff --git a/.dotnet/src/Generated/Models/FunctionChatMessage.Serialization.cs b/.dotnet/src/Generated/Models/FunctionChatMessage.Serialization.cs index 0c452e997..ae92c684e 100644 --- a/.dotnet/src/Generated/Models/FunctionChatMessage.Serialization.cs +++ b/.dotnet/src/Generated/Models/FunctionChatMessage.Serialization.cs @@ -51,20 +51,20 @@ internal static FunctionChatMessage DeserializeFunctionChatMessage(JsonElement e { return null; } - Chat.ChatMessageRole role = default; ChatMessageContent content = default; + Chat.ChatMessageRole role = default; IDictionary<string, BinaryData> additionalBinaryDataProperties = new ChangeTrackingDictionary<string, BinaryData>(); string functionName = default; foreach (var prop in element.EnumerateObject()) { - if (prop.NameEquals("role"u8)) + if (prop.NameEquals("content"u8)) { - role = prop.Value.GetString().ToChatMessageRole(); + DeserializeContentValue(prop, ref content); continue; } - if (prop.NameEquals("content"u8)) + if (prop.NameEquals("role"u8)) { - DeserializeContentValue(prop, ref content); + role = prop.Value.GetString().ToChatMessageRole(); continue; } if (prop.NameEquals("name"u8)) @@ -78,7 +78,7 @@ internal static FunctionChatMessage DeserializeFunctionChatMessage(JsonElement e } } // CUSTOM: Initialize Content collection property. - return new FunctionChatMessage(role, content ?? new ChatMessageContent(), additionalBinaryDataProperties, functionName); + return new FunctionChatMessage(content ??
new ChatMessageContent(), role, additionalBinaryDataProperties, functionName); } BinaryData IPersistableModel.Write(ModelReaderWriterOptions options) => PersistableModelWriteCore(options); diff --git a/.dotnet/src/Generated/Models/FunctionChatMessage.cs b/.dotnet/src/Generated/Models/FunctionChatMessage.cs index 688e7abc6..4fa37e086 100644 --- a/.dotnet/src/Generated/Models/FunctionChatMessage.cs +++ b/.dotnet/src/Generated/Models/FunctionChatMessage.cs @@ -9,7 +9,7 @@ namespace OpenAI.Chat { public partial class FunctionChatMessage : ChatMessage { - internal FunctionChatMessage(Chat.ChatMessageRole role, ChatMessageContent content, IDictionary additionalBinaryDataProperties, string functionName) : base(role, content, additionalBinaryDataProperties) + internal FunctionChatMessage(ChatMessageContent content, Chat.ChatMessageRole role, IDictionary additionalBinaryDataProperties, string functionName) : base(content, role, additionalBinaryDataProperties) { FunctionName = functionName; } diff --git a/.dotnet/src/Generated/Models/InternalChatCompletionRequestMessageContentPartAudio.Serialization.cs b/.dotnet/src/Generated/Models/InternalChatCompletionRequestMessageContentPartAudio.Serialization.cs new file mode 100644 index 000000000..4066d42a3 --- /dev/null +++ b/.dotnet/src/Generated/Models/InternalChatCompletionRequestMessageContentPartAudio.Serialization.cs @@ -0,0 +1,156 @@ +// + +#nullable disable + +using System; +using System.ClientModel; +using System.ClientModel.Primitives; +using System.Collections.Generic; +using System.Text.Json; +using OpenAI; + +namespace OpenAI.Chat +{ + internal partial class InternalChatCompletionRequestMessageContentPartAudio : IJsonModel + { + internal InternalChatCompletionRequestMessageContentPartAudio() + { + } + + void IJsonModel.Write(Utf8JsonWriter writer, ModelReaderWriterOptions options) + { + writer.WriteStartObject(); + JsonModelWriteCore(writer, options); + writer.WriteEndObject(); + } + + protected virtual void 
JsonModelWriteCore(Utf8JsonWriter writer, ModelReaderWriterOptions options) + { + string format = options.Format == "W" ? ((IPersistableModel)this).GetFormatFromOptions(options) : options.Format; + if (format != "J") + { + throw new FormatException($"The model {nameof(InternalChatCompletionRequestMessageContentPartAudio)} does not support writing '{format}' format."); + } + if (_additionalBinaryDataProperties?.ContainsKey("type") != true) + { + writer.WritePropertyName("type"u8); + writer.WriteStringValue(Type.ToString()); + } + if (_additionalBinaryDataProperties?.ContainsKey("input_audio") != true) + { + writer.WritePropertyName("input_audio"u8); + writer.WriteObjectValue(InputAudio, options); + } + if (true && _additionalBinaryDataProperties != null) + { + foreach (var item in _additionalBinaryDataProperties) + { + if (ModelSerializationExtensions.IsSentinelValue(item.Value)) + { + continue; + } + writer.WritePropertyName(item.Key); +#if NET6_0_OR_GREATER + writer.WriteRawValue(item.Value); +#else + using (JsonDocument document = JsonDocument.Parse(item.Value)) + { + JsonSerializer.Serialize(writer, document.RootElement); + } +#endif + } + } + } + + InternalChatCompletionRequestMessageContentPartAudio IJsonModel.Create(ref Utf8JsonReader reader, ModelReaderWriterOptions options) => JsonModelCreateCore(ref reader, options); + + protected virtual InternalChatCompletionRequestMessageContentPartAudio JsonModelCreateCore(ref Utf8JsonReader reader, ModelReaderWriterOptions options) + { + string format = options.Format == "W" ? 
((IPersistableModel)this).GetFormatFromOptions(options) : options.Format; + if (format != "J") + { + throw new FormatException($"The model {nameof(InternalChatCompletionRequestMessageContentPartAudio)} does not support reading '{format}' format."); + } + using JsonDocument document = JsonDocument.ParseValue(ref reader); + return DeserializeInternalChatCompletionRequestMessageContentPartAudio(document.RootElement, options); + } + + internal static InternalChatCompletionRequestMessageContentPartAudio DeserializeInternalChatCompletionRequestMessageContentPartAudio(JsonElement element, ModelReaderWriterOptions options) + { + if (element.ValueKind == JsonValueKind.Null) + { + return null; + } + InternalChatCompletionRequestMessageContentPartAudioType @type = default; + InternalChatCompletionRequestMessageContentPartAudioInputAudio inputAudio = default; + IDictionary additionalBinaryDataProperties = new ChangeTrackingDictionary(); + foreach (var prop in element.EnumerateObject()) + { + if (prop.NameEquals("type"u8)) + { + @type = new InternalChatCompletionRequestMessageContentPartAudioType(prop.Value.GetString()); + continue; + } + if (prop.NameEquals("input_audio"u8)) + { + inputAudio = InternalChatCompletionRequestMessageContentPartAudioInputAudio.DeserializeInternalChatCompletionRequestMessageContentPartAudioInputAudio(prop.Value, options); + continue; + } + if (true) + { + additionalBinaryDataProperties.Add(prop.Name, BinaryData.FromString(prop.Value.GetRawText())); + } + } + return new InternalChatCompletionRequestMessageContentPartAudio(@type, inputAudio, additionalBinaryDataProperties); + } + + BinaryData IPersistableModel.Write(ModelReaderWriterOptions options) => PersistableModelWriteCore(options); + + protected virtual BinaryData PersistableModelWriteCore(ModelReaderWriterOptions options) + { + string format = options.Format == "W" ? 
((IPersistableModel)this).GetFormatFromOptions(options) : options.Format; + switch (format) + { + case "J": + return ModelReaderWriter.Write(this, options); + default: + throw new FormatException($"The model {nameof(InternalChatCompletionRequestMessageContentPartAudio)} does not support writing '{options.Format}' format."); + } + } + + InternalChatCompletionRequestMessageContentPartAudio IPersistableModel.Create(BinaryData data, ModelReaderWriterOptions options) => PersistableModelCreateCore(data, options); + + protected virtual InternalChatCompletionRequestMessageContentPartAudio PersistableModelCreateCore(BinaryData data, ModelReaderWriterOptions options) + { + string format = options.Format == "W" ? ((IPersistableModel)this).GetFormatFromOptions(options) : options.Format; + switch (format) + { + case "J": + using (JsonDocument document = JsonDocument.Parse(data)) + { + return DeserializeInternalChatCompletionRequestMessageContentPartAudio(document.RootElement, options); + } + default: + throw new FormatException($"The model {nameof(InternalChatCompletionRequestMessageContentPartAudio)} does not support reading '{options.Format}' format."); + } + } + + string IPersistableModel.GetFormatFromOptions(ModelReaderWriterOptions options) => "J"; + + public static implicit operator BinaryContent(InternalChatCompletionRequestMessageContentPartAudio internalChatCompletionRequestMessageContentPartAudio) + { + if (internalChatCompletionRequestMessageContentPartAudio == null) + { + return null; + } + return BinaryContent.Create(internalChatCompletionRequestMessageContentPartAudio, ModelSerializationExtensions.WireOptions); + } + + public static explicit operator InternalChatCompletionRequestMessageContentPartAudio(ClientResult result) + { + using PipelineResponse response = result.GetRawResponse(); + using JsonDocument document = JsonDocument.Parse(response.Content); + return DeserializeInternalChatCompletionRequestMessageContentPartAudio(document.RootElement, 
ModelSerializationExtensions.WireOptions); + } + } +} diff --git a/.dotnet/src/Generated/Models/InternalChatCompletionRequestMessageContentPartAudio.cs b/.dotnet/src/Generated/Models/InternalChatCompletionRequestMessageContentPartAudio.cs new file mode 100644 index 000000000..c1d6b3edf --- /dev/null +++ b/.dotnet/src/Generated/Models/InternalChatCompletionRequestMessageContentPartAudio.cs @@ -0,0 +1,39 @@ +// + +#nullable disable + +using System; +using System.Collections.Generic; +using OpenAI; + +namespace OpenAI.Chat +{ + internal partial class InternalChatCompletionRequestMessageContentPartAudio + { + private protected IDictionary _additionalBinaryDataProperties; + + public InternalChatCompletionRequestMessageContentPartAudio(InternalChatCompletionRequestMessageContentPartAudioInputAudio inputAudio) + { + Argument.AssertNotNull(inputAudio, nameof(inputAudio)); + + InputAudio = inputAudio; + } + + internal InternalChatCompletionRequestMessageContentPartAudio(InternalChatCompletionRequestMessageContentPartAudioType @type, InternalChatCompletionRequestMessageContentPartAudioInputAudio inputAudio, IDictionary additionalBinaryDataProperties) + { + Type = @type; + InputAudio = inputAudio; + _additionalBinaryDataProperties = additionalBinaryDataProperties; + } + + public InternalChatCompletionRequestMessageContentPartAudioType Type { get; } = "input_audio"; + + public InternalChatCompletionRequestMessageContentPartAudioInputAudio InputAudio { get; } + + internal IDictionary SerializedAdditionalRawData + { + get => _additionalBinaryDataProperties; + set => _additionalBinaryDataProperties = value; + } + } +} diff --git a/.dotnet/src/Generated/Models/InternalChatCompletionRequestMessageContentPartAudioInputAudio.Serialization.cs b/.dotnet/src/Generated/Models/InternalChatCompletionRequestMessageContentPartAudioInputAudio.Serialization.cs new file mode 100644 index 000000000..99ed0481a --- /dev/null +++ 
b/.dotnet/src/Generated/Models/InternalChatCompletionRequestMessageContentPartAudioInputAudio.Serialization.cs @@ -0,0 +1,156 @@ +// + +#nullable disable + +using System; +using System.ClientModel; +using System.ClientModel.Primitives; +using System.Collections.Generic; +using System.Text.Json; +using OpenAI; + +namespace OpenAI.Chat +{ + internal partial class InternalChatCompletionRequestMessageContentPartAudioInputAudio : IJsonModel + { + internal InternalChatCompletionRequestMessageContentPartAudioInputAudio() + { + } + + void IJsonModel.Write(Utf8JsonWriter writer, ModelReaderWriterOptions options) + { + writer.WriteStartObject(); + JsonModelWriteCore(writer, options); + writer.WriteEndObject(); + } + + protected virtual void JsonModelWriteCore(Utf8JsonWriter writer, ModelReaderWriterOptions options) + { + string format = options.Format == "W" ? ((IPersistableModel)this).GetFormatFromOptions(options) : options.Format; + if (format != "J") + { + throw new FormatException($"The model {nameof(InternalChatCompletionRequestMessageContentPartAudioInputAudio)} does not support writing '{format}' format."); + } + if (_additionalBinaryDataProperties?.ContainsKey("data") != true) + { + writer.WritePropertyName("data"u8); + writer.WriteBase64StringValue(Data.ToArray(), "D"); + } + if (_additionalBinaryDataProperties?.ContainsKey("format") != true) + { + writer.WritePropertyName("format"u8); + writer.WriteStringValue(Format.ToString()); + } + if (true && _additionalBinaryDataProperties != null) + { + foreach (var item in _additionalBinaryDataProperties) + { + if (ModelSerializationExtensions.IsSentinelValue(item.Value)) + { + continue; + } + writer.WritePropertyName(item.Key); +#if NET6_0_OR_GREATER + writer.WriteRawValue(item.Value); +#else + using (JsonDocument document = JsonDocument.Parse(item.Value)) + { + JsonSerializer.Serialize(writer, document.RootElement); + } +#endif + } + } + } + + InternalChatCompletionRequestMessageContentPartAudioInputAudio 
IJsonModel.Create(ref Utf8JsonReader reader, ModelReaderWriterOptions options) => JsonModelCreateCore(ref reader, options); + + protected virtual InternalChatCompletionRequestMessageContentPartAudioInputAudio JsonModelCreateCore(ref Utf8JsonReader reader, ModelReaderWriterOptions options) + { + string format = options.Format == "W" ? ((IPersistableModel)this).GetFormatFromOptions(options) : options.Format; + if (format != "J") + { + throw new FormatException($"The model {nameof(InternalChatCompletionRequestMessageContentPartAudioInputAudio)} does not support reading '{format}' format."); + } + using JsonDocument document = JsonDocument.ParseValue(ref reader); + return DeserializeInternalChatCompletionRequestMessageContentPartAudioInputAudio(document.RootElement, options); + } + + internal static InternalChatCompletionRequestMessageContentPartAudioInputAudio DeserializeInternalChatCompletionRequestMessageContentPartAudioInputAudio(JsonElement element, ModelReaderWriterOptions options) + { + if (element.ValueKind == JsonValueKind.Null) + { + return null; + } + BinaryData data = default; + ChatInputAudioFormat format = default; + IDictionary additionalBinaryDataProperties = new ChangeTrackingDictionary(); + foreach (var prop in element.EnumerateObject()) + { + if (prop.NameEquals("data"u8)) + { + data = BinaryData.FromBytes(prop.Value.GetBytesFromBase64("D")); + continue; + } + if (prop.NameEquals("format"u8)) + { + format = new ChatInputAudioFormat(prop.Value.GetString()); + continue; + } + if (true) + { + additionalBinaryDataProperties.Add(prop.Name, BinaryData.FromString(prop.Value.GetRawText())); + } + } + return new InternalChatCompletionRequestMessageContentPartAudioInputAudio(data, format, additionalBinaryDataProperties); + } + + BinaryData IPersistableModel.Write(ModelReaderWriterOptions options) => PersistableModelWriteCore(options); + + protected virtual BinaryData PersistableModelWriteCore(ModelReaderWriterOptions options) + { + string format = 
options.Format == "W" ? ((IPersistableModel)this).GetFormatFromOptions(options) : options.Format; + switch (format) + { + case "J": + return ModelReaderWriter.Write(this, options); + default: + throw new FormatException($"The model {nameof(InternalChatCompletionRequestMessageContentPartAudioInputAudio)} does not support writing '{options.Format}' format."); + } + } + + InternalChatCompletionRequestMessageContentPartAudioInputAudio IPersistableModel.Create(BinaryData data, ModelReaderWriterOptions options) => PersistableModelCreateCore(data, options); + + protected virtual InternalChatCompletionRequestMessageContentPartAudioInputAudio PersistableModelCreateCore(BinaryData data, ModelReaderWriterOptions options) + { + string format = options.Format == "W" ? ((IPersistableModel)this).GetFormatFromOptions(options) : options.Format; + switch (format) + { + case "J": + using (JsonDocument document = JsonDocument.Parse(data)) + { + return DeserializeInternalChatCompletionRequestMessageContentPartAudioInputAudio(document.RootElement, options); + } + default: + throw new FormatException($"The model {nameof(InternalChatCompletionRequestMessageContentPartAudioInputAudio)} does not support reading '{options.Format}' format."); + } + } + + string IPersistableModel.GetFormatFromOptions(ModelReaderWriterOptions options) => "J"; + + public static implicit operator BinaryContent(InternalChatCompletionRequestMessageContentPartAudioInputAudio internalChatCompletionRequestMessageContentPartAudioInputAudio) + { + if (internalChatCompletionRequestMessageContentPartAudioInputAudio == null) + { + return null; + } + return BinaryContent.Create(internalChatCompletionRequestMessageContentPartAudioInputAudio, ModelSerializationExtensions.WireOptions); + } + + public static explicit operator InternalChatCompletionRequestMessageContentPartAudioInputAudio(ClientResult result) + { + using PipelineResponse response = result.GetRawResponse(); + using JsonDocument document = 
JsonDocument.Parse(response.Content); + return DeserializeInternalChatCompletionRequestMessageContentPartAudioInputAudio(document.RootElement, ModelSerializationExtensions.WireOptions); + } + } +} diff --git a/.dotnet/src/Generated/Models/InternalChatCompletionRequestMessageContentPartAudioInputAudio.cs b/.dotnet/src/Generated/Models/InternalChatCompletionRequestMessageContentPartAudioInputAudio.cs new file mode 100644 index 000000000..f490ce017 --- /dev/null +++ b/.dotnet/src/Generated/Models/InternalChatCompletionRequestMessageContentPartAudioInputAudio.cs @@ -0,0 +1,40 @@ +// + +#nullable disable + +using System; +using System.Collections.Generic; +using OpenAI; + +namespace OpenAI.Chat +{ + internal partial class InternalChatCompletionRequestMessageContentPartAudioInputAudio + { + private protected IDictionary _additionalBinaryDataProperties; + + public InternalChatCompletionRequestMessageContentPartAudioInputAudio(BinaryData data, ChatInputAudioFormat format) + { + Argument.AssertNotNull(data, nameof(data)); + + Data = data; + Format = format; + } + + internal InternalChatCompletionRequestMessageContentPartAudioInputAudio(BinaryData data, ChatInputAudioFormat format, IDictionary additionalBinaryDataProperties) + { + Data = data; + Format = format; + _additionalBinaryDataProperties = additionalBinaryDataProperties; + } + + public BinaryData Data { get; } + + public ChatInputAudioFormat Format { get; } + + internal IDictionary SerializedAdditionalRawData + { + get => _additionalBinaryDataProperties; + set => _additionalBinaryDataProperties = value; + } + } +} diff --git a/.dotnet/src/Generated/Models/InternalChatCompletionRequestMessageContentPartAudioType.cs b/.dotnet/src/Generated/Models/InternalChatCompletionRequestMessageContentPartAudioType.cs new file mode 100644 index 000000000..e7d2eb1b6 --- /dev/null +++ b/.dotnet/src/Generated/Models/InternalChatCompletionRequestMessageContentPartAudioType.cs @@ -0,0 +1,41 @@ +// + +#nullable disable + +using System; 
+using System.ComponentModel; +using OpenAI; + +namespace OpenAI.Chat +{ + internal readonly partial struct InternalChatCompletionRequestMessageContentPartAudioType : IEquatable<InternalChatCompletionRequestMessageContentPartAudioType> + { + private readonly string _value; + private const string InputAudioValue = "input_audio"; + + public InternalChatCompletionRequestMessageContentPartAudioType(string value) + { + Argument.AssertNotNull(value, nameof(value)); + + _value = value; + } + + public static InternalChatCompletionRequestMessageContentPartAudioType InputAudio { get; } = new InternalChatCompletionRequestMessageContentPartAudioType(InputAudioValue); + + public static bool operator ==(InternalChatCompletionRequestMessageContentPartAudioType left, InternalChatCompletionRequestMessageContentPartAudioType right) => left.Equals(right); + + public static bool operator !=(InternalChatCompletionRequestMessageContentPartAudioType left, InternalChatCompletionRequestMessageContentPartAudioType right) => !left.Equals(right); + + public static implicit operator InternalChatCompletionRequestMessageContentPartAudioType(string value) => new InternalChatCompletionRequestMessageContentPartAudioType(value); + + [EditorBrowsable(EditorBrowsableState.Never)] + public override bool Equals(object obj) => obj is InternalChatCompletionRequestMessageContentPartAudioType other && Equals(other); + + public bool Equals(InternalChatCompletionRequestMessageContentPartAudioType other) => string.Equals(_value, other._value, StringComparison.InvariantCultureIgnoreCase); + + [EditorBrowsable(EditorBrowsableState.Never)] + public override int GetHashCode() => _value != null ? 
StringComparer.InvariantCultureIgnoreCase.GetHashCode(_value) : 0; + + public override string ToString() => _value; + } +} diff --git a/.dotnet/src/Generated/Models/InternalChatCompletionResponseMessage.Serialization.cs b/.dotnet/src/Generated/Models/InternalChatCompletionResponseMessage.Serialization.cs index 9f3af8c10..d9e55f613 100644 --- a/.dotnet/src/Generated/Models/InternalChatCompletionResponseMessage.Serialization.cs +++ b/.dotnet/src/Generated/Models/InternalChatCompletionResponseMessage.Serialization.cs @@ -53,6 +53,18 @@ protected virtual void JsonModelWriteCore(Utf8JsonWriter writer, ModelReaderWrit } writer.WriteEndArray(); } + if (Optional.IsDefined(Audio) && _additionalBinaryDataProperties?.ContainsKey("audio") != true) + { + if (Audio != null) + { + writer.WritePropertyName("audio"u8); + writer.WriteObjectValue(Audio, options); + } + else + { + writer.WriteNull("audio"u8); + } + } if (_additionalBinaryDataProperties?.ContainsKey("role") != true) { writer.WritePropertyName("role"u8); @@ -117,6 +129,7 @@ internal static InternalChatCompletionResponseMessage DeserializeInternalChatCom } string refusal = default; IReadOnlyList toolCalls = default; + ChatOutputAudio audio = default; Chat.ChatMessageRole role = default; ChatMessageContent content = default; ChatFunctionCall functionCall = default; @@ -147,6 +160,16 @@ internal static InternalChatCompletionResponseMessage DeserializeInternalChatCom toolCalls = array; continue; } + if (prop.NameEquals("audio"u8)) + { + if (prop.Value.ValueKind == JsonValueKind.Null) + { + audio = null; + continue; + } + audio = ChatOutputAudio.DeserializeChatOutputAudio(prop.Value, options); + continue; + } if (prop.NameEquals("role"u8)) { role = prop.Value.GetString().ToChatMessageRole(); @@ -175,6 +198,7 @@ internal static InternalChatCompletionResponseMessage DeserializeInternalChatCom return new InternalChatCompletionResponseMessage( refusal, toolCalls ?? new ChangeTrackingList(), + audio, role, content ?? 
new ChatMessageContent(), functionCall, diff --git a/.dotnet/src/Generated/Models/InternalChatCompletionResponseMessage.cs b/.dotnet/src/Generated/Models/InternalChatCompletionResponseMessage.cs index 97b362414..7b8f87e68 100644 --- a/.dotnet/src/Generated/Models/InternalChatCompletionResponseMessage.cs +++ b/.dotnet/src/Generated/Models/InternalChatCompletionResponseMessage.cs @@ -19,10 +19,11 @@ internal InternalChatCompletionResponseMessage(string refusal, ChatMessageConten Content = content; } - internal InternalChatCompletionResponseMessage(string refusal, IReadOnlyList toolCalls, Chat.ChatMessageRole role, ChatMessageContent content, ChatFunctionCall functionCall, IDictionary additionalBinaryDataProperties) + internal InternalChatCompletionResponseMessage(string refusal, IReadOnlyList toolCalls, ChatOutputAudio audio, Chat.ChatMessageRole role, ChatMessageContent content, ChatFunctionCall functionCall, IDictionary additionalBinaryDataProperties) { Refusal = refusal; ToolCalls = toolCalls; + Audio = audio; Role = role; Content = content; FunctionCall = functionCall; @@ -33,6 +34,8 @@ internal InternalChatCompletionResponseMessage(string refusal, IReadOnlyList ToolCalls { get; } + public ChatOutputAudio Audio { get; } + internal IDictionary SerializedAdditionalRawData { get => _additionalBinaryDataProperties; diff --git a/.dotnet/src/Generated/Models/InternalChatCompletionStreamResponseDelta.Serialization.cs b/.dotnet/src/Generated/Models/InternalChatCompletionStreamResponseDelta.Serialization.cs index e67571f75..de9f8afcb 100644 --- a/.dotnet/src/Generated/Models/InternalChatCompletionStreamResponseDelta.Serialization.cs +++ b/.dotnet/src/Generated/Models/InternalChatCompletionStreamResponseDelta.Serialization.cs @@ -27,6 +27,11 @@ protected virtual void JsonModelWriteCore(Utf8JsonWriter writer, ModelReaderWrit { throw new FormatException($"The model {nameof(InternalChatCompletionStreamResponseDelta)} does not support writing '{format}' format."); } + if 
(Optional.IsDefined(Audio) && _additionalBinaryDataProperties?.ContainsKey("audio") != true) + { + writer.WritePropertyName("audio"u8); + writer.WriteObjectValue(Audio, options); + } if (Optional.IsDefined(FunctionCall) && _additionalBinaryDataProperties?.ContainsKey("function_call") != true) { writer.WritePropertyName("function_call"u8); @@ -112,6 +117,7 @@ internal static InternalChatCompletionStreamResponseDelta DeserializeInternalCha { return null; } + StreamingChatOutputAudioUpdate audio = default; StreamingChatFunctionCallUpdate functionCall = default; IReadOnlyList toolCalls = default; string refusal = default; @@ -120,6 +126,15 @@ internal static InternalChatCompletionStreamResponseDelta DeserializeInternalCha IDictionary additionalBinaryDataProperties = new ChangeTrackingDictionary(); foreach (var prop in element.EnumerateObject()) { + if (prop.NameEquals("audio"u8)) + { + if (prop.Value.ValueKind == JsonValueKind.Null) + { + continue; + } + audio = StreamingChatOutputAudioUpdate.DeserializeStreamingChatOutputAudioUpdate(prop.Value, options); + continue; + } if (prop.NameEquals("function_call"u8)) { if (prop.Value.ValueKind == JsonValueKind.Null) @@ -174,6 +189,7 @@ internal static InternalChatCompletionStreamResponseDelta DeserializeInternalCha } // CUSTOM: Initialize Content collection property. return new InternalChatCompletionStreamResponseDelta( + audio, functionCall, toolCalls ?? 
new ChangeTrackingList(), refusal, diff --git a/.dotnet/src/Generated/Models/InternalChatCompletionStreamResponseDelta.cs b/.dotnet/src/Generated/Models/InternalChatCompletionStreamResponseDelta.cs index dba391eaf..1cdd4603e 100644 --- a/.dotnet/src/Generated/Models/InternalChatCompletionStreamResponseDelta.cs +++ b/.dotnet/src/Generated/Models/InternalChatCompletionStreamResponseDelta.cs @@ -11,8 +11,9 @@ internal partial class InternalChatCompletionStreamResponseDelta { private protected IDictionary _additionalBinaryDataProperties; - internal InternalChatCompletionStreamResponseDelta(StreamingChatFunctionCallUpdate functionCall, IReadOnlyList toolCalls, string refusal, Chat.ChatMessageRole? role, ChatMessageContent content, IDictionary additionalBinaryDataProperties) + internal InternalChatCompletionStreamResponseDelta(StreamingChatOutputAudioUpdate audio, StreamingChatFunctionCallUpdate functionCall, IReadOnlyList toolCalls, string refusal, Chat.ChatMessageRole? role, ChatMessageContent content, IDictionary additionalBinaryDataProperties) { + Audio = audio; FunctionCall = functionCall; ToolCalls = toolCalls; Refusal = refusal; @@ -21,6 +22,8 @@ internal InternalChatCompletionStreamResponseDelta(StreamingChatFunctionCallUpda _additionalBinaryDataProperties = additionalBinaryDataProperties; } + public StreamingChatOutputAudioUpdate Audio { get; } + public StreamingChatFunctionCallUpdate FunctionCall { get; } public IReadOnlyList ToolCalls { get; } diff --git a/.dotnet/src/Generated/Models/InternalCreateChatCompletionRequestModality.cs b/.dotnet/src/Generated/Models/InternalCreateChatCompletionRequestModality.cs new file mode 100644 index 000000000..a0cb97dcc --- /dev/null +++ b/.dotnet/src/Generated/Models/InternalCreateChatCompletionRequestModality.cs @@ -0,0 +1,44 @@ +// + +#nullable disable + +using System; +using System.ComponentModel; +using OpenAI; + +namespace OpenAI.Chat +{ + internal readonly partial struct InternalCreateChatCompletionRequestModality : 
IEquatable<InternalCreateChatCompletionRequestModality> + { + private readonly string _value; + private const string TextValue = "text"; + private const string AudioValue = "audio"; + + public InternalCreateChatCompletionRequestModality(string value) + { + Argument.AssertNotNull(value, nameof(value)); + + _value = value; + } + + public static InternalCreateChatCompletionRequestModality Text { get; } = new InternalCreateChatCompletionRequestModality(TextValue); + + public static InternalCreateChatCompletionRequestModality Audio { get; } = new InternalCreateChatCompletionRequestModality(AudioValue); + + public static bool operator ==(InternalCreateChatCompletionRequestModality left, InternalCreateChatCompletionRequestModality right) => left.Equals(right); + + public static bool operator !=(InternalCreateChatCompletionRequestModality left, InternalCreateChatCompletionRequestModality right) => !left.Equals(right); + + public static implicit operator InternalCreateChatCompletionRequestModality(string value) => new InternalCreateChatCompletionRequestModality(value); + + [EditorBrowsable(EditorBrowsableState.Never)] + public override bool Equals(object obj) => obj is InternalCreateChatCompletionRequestModality other && Equals(other); + + public bool Equals(InternalCreateChatCompletionRequestModality other) => string.Equals(_value, other._value, StringComparison.InvariantCultureIgnoreCase); + + [EditorBrowsable(EditorBrowsableState.Never)] + public override int GetHashCode() => _value != null ? 
StringComparer.InvariantCultureIgnoreCase.GetHashCode(_value) : 0; + + public override string ToString() => _value; + } +} diff --git a/.dotnet/src/Generated/Models/InternalCreateChatCompletionRequestModel.cs b/.dotnet/src/Generated/Models/InternalCreateChatCompletionRequestModel.cs index f8e62d274..4afaa705b 100644 --- a/.dotnet/src/Generated/Models/InternalCreateChatCompletionRequestModel.cs +++ b/.dotnet/src/Generated/Models/InternalCreateChatCompletionRequestModel.cs @@ -20,6 +20,8 @@ namespace OpenAI.Chat private const string Gpt4o20240513Value = "gpt-4o-2024-05-13"; private const string Gpt4oRealtimePreviewValue = "gpt-4o-realtime-preview"; private const string Gpt4oRealtimePreview20241001Value = "gpt-4o-realtime-preview-2024-10-01"; + private const string Gpt4oAudioPreviewValue = "gpt-4o-audio-preview"; + private const string Gpt4oAudioPreview20241001Value = "gpt-4o-audio-preview-2024-10-01"; private const string Chatgpt4oLatestValue = "chatgpt-4o-latest"; private const string Gpt4oMiniValue = "gpt-4o-mini"; private const string Gpt4oMini20240718Value = "gpt-4o-mini-2024-07-18"; @@ -68,6 +70,10 @@ public InternalCreateChatCompletionRequestModel(string value) public static InternalCreateChatCompletionRequestModel Gpt4oRealtimePreview20241001 { get; } = new InternalCreateChatCompletionRequestModel(Gpt4oRealtimePreview20241001Value); + public static InternalCreateChatCompletionRequestModel Gpt4oAudioPreview { get; } = new InternalCreateChatCompletionRequestModel(Gpt4oAudioPreviewValue); + + public static InternalCreateChatCompletionRequestModel Gpt4oAudioPreview20241001 { get; } = new InternalCreateChatCompletionRequestModel(Gpt4oAudioPreview20241001Value); + public static InternalCreateChatCompletionRequestModel Chatgpt4oLatest { get; } = new InternalCreateChatCompletionRequestModel(Chatgpt4oLatestValue); public static InternalCreateChatCompletionRequestModel Gpt4oMini { get; } = new InternalCreateChatCompletionRequestModel(Gpt4oMiniValue); diff --git 
a/.dotnet/src/Generated/Models/InternalFineTuneChatCompletionRequestAssistantMessage.Serialization.cs b/.dotnet/src/Generated/Models/InternalFineTuneChatCompletionRequestAssistantMessage.Serialization.cs index 98cf117e0..0003c9a54 100644 --- a/.dotnet/src/Generated/Models/InternalFineTuneChatCompletionRequestAssistantMessage.Serialization.cs +++ b/.dotnet/src/Generated/Models/InternalFineTuneChatCompletionRequestAssistantMessage.Serialization.cs @@ -50,23 +50,24 @@ internal static InternalFineTuneChatCompletionRequestAssistantMessage Deserializ { return null; } - Chat.ChatMessageRole role = default; ChatMessageContent content = default; + Chat.ChatMessageRole role = default; IDictionary additionalBinaryDataProperties = new ChangeTrackingDictionary(); string refusal = default; string participantName = default; IList toolCalls = default; ChatFunctionCall functionCall = default; + ChatOutputAudioReference outputAudioReference = default; foreach (var prop in element.EnumerateObject()) { - if (prop.NameEquals("role"u8)) + if (prop.NameEquals("content"u8)) { - role = prop.Value.GetString().ToChatMessageRole(); + DeserializeContentValue(prop, ref content); continue; } - if (prop.NameEquals("content"u8)) + if (prop.NameEquals("role"u8)) { - DeserializeContentValue(prop, ref content); + role = prop.Value.GetString().ToChatMessageRole(); continue; } if (prop.NameEquals("refusal"u8)) @@ -108,6 +109,16 @@ internal static InternalFineTuneChatCompletionRequestAssistantMessage Deserializ functionCall = ChatFunctionCall.DeserializeChatFunctionCall(prop.Value, options); continue; } + if (prop.NameEquals("audio"u8)) + { + if (prop.Value.ValueKind == JsonValueKind.Null) + { + outputAudioReference = null; + continue; + } + outputAudioReference = ChatOutputAudioReference.DeserializeChatOutputAudioReference(prop.Value, options); + continue; + } if (true) { additionalBinaryDataProperties.Add(prop.Name, BinaryData.FromString(prop.Value.GetRawText())); @@ -115,13 +126,14 @@ internal static 
InternalFineTuneChatCompletionRequestAssistantMessage Deserializ } // CUSTOM: Initialize Content collection property. return new InternalFineTuneChatCompletionRequestAssistantMessage( - role, content ?? new ChatMessageContent(), + role, additionalBinaryDataProperties, refusal, participantName, toolCalls ?? new ChangeTrackingList(), - functionCall); + functionCall, + outputAudioReference); } BinaryData IPersistableModel.Write(ModelReaderWriterOptions options) => PersistableModelWriteCore(options); diff --git a/.dotnet/src/Generated/Models/InternalFineTuneChatCompletionRequestAssistantMessage.cs b/.dotnet/src/Generated/Models/InternalFineTuneChatCompletionRequestAssistantMessage.cs index 868aa7ef7..0df80d87f 100644 --- a/.dotnet/src/Generated/Models/InternalFineTuneChatCompletionRequestAssistantMessage.cs +++ b/.dotnet/src/Generated/Models/InternalFineTuneChatCompletionRequestAssistantMessage.cs @@ -14,7 +14,7 @@ public InternalFineTuneChatCompletionRequestAssistantMessage() { } - internal InternalFineTuneChatCompletionRequestAssistantMessage(Chat.ChatMessageRole role, ChatMessageContent content, IDictionary additionalBinaryDataProperties, string refusal, string participantName, IList toolCalls, ChatFunctionCall functionCall) : base(role, content, additionalBinaryDataProperties, refusal, participantName, toolCalls, functionCall) + internal InternalFineTuneChatCompletionRequestAssistantMessage(ChatMessageContent content, Chat.ChatMessageRole role, IDictionary additionalBinaryDataProperties, string refusal, string participantName, IList toolCalls, ChatFunctionCall functionCall, ChatOutputAudioReference outputAudioReference) : base(content, role, additionalBinaryDataProperties, refusal, participantName, toolCalls, functionCall, outputAudioReference) { } } diff --git a/.dotnet/src/Generated/Models/InternalRealtimeClientEventResponseCreate.Serialization.cs b/.dotnet/src/Generated/Models/InternalRealtimeClientEventResponseCreate.Serialization.cs index f1c98f3c9..862ee53db 
100644 --- a/.dotnet/src/Generated/Models/InternalRealtimeClientEventResponseCreate.Serialization.cs +++ b/.dotnet/src/Generated/Models/InternalRealtimeClientEventResponseCreate.Serialization.cs @@ -61,7 +61,7 @@ internal static InternalRealtimeClientEventResponseCreate DeserializeInternalRea InternalRealtimeClientEventType kind = default; string eventId = default; IDictionary additionalBinaryDataProperties = new ChangeTrackingDictionary(); - InternalRealtimeClientEventResponseCreateResponse response = default; + InternalRealtimeResponseOptions response = default; foreach (var prop in element.EnumerateObject()) { if (prop.NameEquals("type"u8)) @@ -76,7 +76,7 @@ internal static InternalRealtimeClientEventResponseCreate DeserializeInternalRea } if (prop.NameEquals("response"u8)) { - response = InternalRealtimeClientEventResponseCreateResponse.DeserializeInternalRealtimeClientEventResponseCreateResponse(prop.Value, options); + response = InternalRealtimeResponseOptions.DeserializeInternalRealtimeResponseOptions(prop.Value, options); continue; } if (true) diff --git a/.dotnet/src/Generated/Models/InternalRealtimeClientEventResponseCreate.cs b/.dotnet/src/Generated/Models/InternalRealtimeClientEventResponseCreate.cs index 349d8e180..abdc82c41 100644 --- a/.dotnet/src/Generated/Models/InternalRealtimeClientEventResponseCreate.cs +++ b/.dotnet/src/Generated/Models/InternalRealtimeClientEventResponseCreate.cs @@ -10,18 +10,18 @@ namespace OpenAI.RealtimeConversation { internal partial class InternalRealtimeClientEventResponseCreate : InternalRealtimeClientEvent { - public InternalRealtimeClientEventResponseCreate(InternalRealtimeClientEventResponseCreateResponse response) : base(InternalRealtimeClientEventType.ResponseCreate) + public InternalRealtimeClientEventResponseCreate(InternalRealtimeResponseOptions response) : base(InternalRealtimeClientEventType.ResponseCreate) { Argument.AssertNotNull(response, nameof(response)); Response = response; } - internal 
InternalRealtimeClientEventResponseCreate(InternalRealtimeClientEventType kind, string eventId, IDictionary additionalBinaryDataProperties, InternalRealtimeClientEventResponseCreateResponse response) : base(kind, eventId, additionalBinaryDataProperties) + internal InternalRealtimeClientEventResponseCreate(InternalRealtimeClientEventType kind, string eventId, IDictionary additionalBinaryDataProperties, InternalRealtimeResponseOptions response) : base(kind, eventId, additionalBinaryDataProperties) { Response = response; } - public InternalRealtimeClientEventResponseCreateResponse Response { get; } + public InternalRealtimeResponseOptions Response { get; } } } diff --git a/.dotnet/src/Generated/Models/InternalRealtimeResponseItem.Serialization.cs b/.dotnet/src/Generated/Models/InternalRealtimeConversationResponseItem.Serialization.cs similarity index 61% rename from .dotnet/src/Generated/Models/InternalRealtimeResponseItem.Serialization.cs rename to .dotnet/src/Generated/Models/InternalRealtimeConversationResponseItem.Serialization.cs index 8511388a3..214e631d8 100644 --- a/.dotnet/src/Generated/Models/InternalRealtimeResponseItem.Serialization.cs +++ b/.dotnet/src/Generated/Models/InternalRealtimeConversationResponseItem.Serialization.cs @@ -11,13 +11,13 @@ namespace OpenAI.RealtimeConversation { [PersistableModelProxy(typeof(UnknownRealtimeResponseItem))] - internal abstract partial class InternalRealtimeResponseItem : IJsonModel + internal abstract partial class InternalRealtimeConversationResponseItem : IJsonModel { - internal InternalRealtimeResponseItem() + internal InternalRealtimeConversationResponseItem() { } - void IJsonModel.Write(Utf8JsonWriter writer, ModelReaderWriterOptions options) + void IJsonModel.Write(Utf8JsonWriter writer, ModelReaderWriterOptions options) { writer.WriteStartObject(); JsonModelWriteCore(writer, options); @@ -26,10 +26,10 @@ void IJsonModel.Write(Utf8JsonWriter writer, Model protected virtual void JsonModelWriteCore(Utf8JsonWriter 
writer, ModelReaderWriterOptions options) { - string format = options.Format == "W" ? ((IPersistableModel)this).GetFormatFromOptions(options) : options.Format; + string format = options.Format == "W" ? ((IPersistableModel)this).GetFormatFromOptions(options) : options.Format; if (format != "J") { - throw new FormatException($"The model {nameof(InternalRealtimeResponseItem)} does not support writing '{format}' format."); + throw new FormatException($"The model {nameof(InternalRealtimeConversationResponseItem)} does not support writing '{format}' format."); } if (_additionalBinaryDataProperties?.ContainsKey("object") != true) { @@ -74,20 +74,20 @@ protected virtual void JsonModelWriteCore(Utf8JsonWriter writer, ModelReaderWrit } } - InternalRealtimeResponseItem IJsonModel.Create(ref Utf8JsonReader reader, ModelReaderWriterOptions options) => JsonModelCreateCore(ref reader, options); + InternalRealtimeConversationResponseItem IJsonModel.Create(ref Utf8JsonReader reader, ModelReaderWriterOptions options) => JsonModelCreateCore(ref reader, options); - protected virtual InternalRealtimeResponseItem JsonModelCreateCore(ref Utf8JsonReader reader, ModelReaderWriterOptions options) + protected virtual InternalRealtimeConversationResponseItem JsonModelCreateCore(ref Utf8JsonReader reader, ModelReaderWriterOptions options) { - string format = options.Format == "W" ? ((IPersistableModel)this).GetFormatFromOptions(options) : options.Format; + string format = options.Format == "W" ? 
((IPersistableModel)this).GetFormatFromOptions(options) : options.Format; if (format != "J") { - throw new FormatException($"The model {nameof(InternalRealtimeResponseItem)} does not support reading '{format}' format."); + throw new FormatException($"The model {nameof(InternalRealtimeConversationResponseItem)} does not support reading '{format}' format."); } using JsonDocument document = JsonDocument.ParseValue(ref reader); - return DeserializeInternalRealtimeResponseItem(document.RootElement, options); + return DeserializeInternalRealtimeConversationResponseItem(document.RootElement, options); } - internal static InternalRealtimeResponseItem DeserializeInternalRealtimeResponseItem(JsonElement element, ModelReaderWriterOptions options) + internal static InternalRealtimeConversationResponseItem DeserializeInternalRealtimeConversationResponseItem(JsonElement element, ModelReaderWriterOptions options) { if (element.ValueKind == JsonValueKind.Null) { @@ -108,53 +108,53 @@ internal static InternalRealtimeResponseItem DeserializeInternalRealtimeResponse return UnknownRealtimeResponseItem.DeserializeUnknownRealtimeResponseItem(element, options); } - BinaryData IPersistableModel.Write(ModelReaderWriterOptions options) => PersistableModelWriteCore(options); + BinaryData IPersistableModel.Write(ModelReaderWriterOptions options) => PersistableModelWriteCore(options); protected virtual BinaryData PersistableModelWriteCore(ModelReaderWriterOptions options) { - string format = options.Format == "W" ? ((IPersistableModel)this).GetFormatFromOptions(options) : options.Format; + string format = options.Format == "W" ? 
((IPersistableModel)this).GetFormatFromOptions(options) : options.Format; switch (format) { case "J": return ModelReaderWriter.Write(this, options); default: - throw new FormatException($"The model {nameof(InternalRealtimeResponseItem)} does not support writing '{options.Format}' format."); + throw new FormatException($"The model {nameof(InternalRealtimeConversationResponseItem)} does not support writing '{options.Format}' format."); } } - InternalRealtimeResponseItem IPersistableModel.Create(BinaryData data, ModelReaderWriterOptions options) => PersistableModelCreateCore(data, options); + InternalRealtimeConversationResponseItem IPersistableModel.Create(BinaryData data, ModelReaderWriterOptions options) => PersistableModelCreateCore(data, options); - protected virtual InternalRealtimeResponseItem PersistableModelCreateCore(BinaryData data, ModelReaderWriterOptions options) + protected virtual InternalRealtimeConversationResponseItem PersistableModelCreateCore(BinaryData data, ModelReaderWriterOptions options) { - string format = options.Format == "W" ? ((IPersistableModel)this).GetFormatFromOptions(options) : options.Format; + string format = options.Format == "W" ? 
((IPersistableModel)this).GetFormatFromOptions(options) : options.Format; switch (format) { case "J": using (JsonDocument document = JsonDocument.Parse(data)) { - return DeserializeInternalRealtimeResponseItem(document.RootElement, options); + return DeserializeInternalRealtimeConversationResponseItem(document.RootElement, options); } default: - throw new FormatException($"The model {nameof(InternalRealtimeResponseItem)} does not support reading '{options.Format}' format."); + throw new FormatException($"The model {nameof(InternalRealtimeConversationResponseItem)} does not support reading '{options.Format}' format."); } } - string IPersistableModel.GetFormatFromOptions(ModelReaderWriterOptions options) => "J"; + string IPersistableModel.GetFormatFromOptions(ModelReaderWriterOptions options) => "J"; - public static implicit operator BinaryContent(InternalRealtimeResponseItem internalRealtimeResponseItem) + public static implicit operator BinaryContent(InternalRealtimeConversationResponseItem internalRealtimeConversationResponseItem) { - if (internalRealtimeResponseItem == null) + if (internalRealtimeConversationResponseItem == null) { return null; } - return BinaryContent.Create(internalRealtimeResponseItem, ModelSerializationExtensions.WireOptions); + return BinaryContent.Create(internalRealtimeConversationResponseItem, ModelSerializationExtensions.WireOptions); } - public static explicit operator InternalRealtimeResponseItem(ClientResult result) + public static explicit operator InternalRealtimeConversationResponseItem(ClientResult result) { using PipelineResponse response = result.GetRawResponse(); using JsonDocument document = JsonDocument.Parse(response.Content); - return DeserializeInternalRealtimeResponseItem(document.RootElement, ModelSerializationExtensions.WireOptions); + return DeserializeInternalRealtimeConversationResponseItem(document.RootElement, ModelSerializationExtensions.WireOptions); } } } diff --git 
a/.dotnet/src/Generated/Models/InternalRealtimeResponseItem.cs b/.dotnet/src/Generated/Models/InternalRealtimeConversationResponseItem.cs similarity index 60% rename from .dotnet/src/Generated/Models/InternalRealtimeResponseItem.cs rename to .dotnet/src/Generated/Models/InternalRealtimeConversationResponseItem.cs index a8ed46d55..c9ba6aa6e 100644 --- a/.dotnet/src/Generated/Models/InternalRealtimeResponseItem.cs +++ b/.dotnet/src/Generated/Models/InternalRealtimeConversationResponseItem.cs @@ -7,17 +7,17 @@ namespace OpenAI.RealtimeConversation { - internal abstract partial class InternalRealtimeResponseItem + internal abstract partial class InternalRealtimeConversationResponseItem { private protected IDictionary _additionalBinaryDataProperties; - private protected InternalRealtimeResponseItem(InternalRealtimeItemType @type, string id) + private protected InternalRealtimeConversationResponseItem(InternalRealtimeItemType @type, string id) { Type = @type; Id = id; } - internal InternalRealtimeResponseItem(InternalRealtimeResponseItemObject @object, InternalRealtimeItemType @type, string id, IDictionary additionalBinaryDataProperties) + internal InternalRealtimeConversationResponseItem(InternalRealtimeConversationResponseItemObject @object, InternalRealtimeItemType @type, string id, IDictionary additionalBinaryDataProperties) { Object = @object; Type = @type; @@ -25,7 +25,7 @@ internal InternalRealtimeResponseItem(InternalRealtimeResponseItemObject @object _additionalBinaryDataProperties = additionalBinaryDataProperties; } - public InternalRealtimeResponseItemObject Object { get; } = "realtime.item"; + public InternalRealtimeConversationResponseItemObject Object { get; } = "realtime.item"; internal InternalRealtimeItemType Type { get; set; } diff --git a/.dotnet/src/Generated/Models/InternalRealtimeConversationResponseItemObject.cs b/.dotnet/src/Generated/Models/InternalRealtimeConversationResponseItemObject.cs new file mode 100644 index 000000000..1d2497d7c --- 
/dev/null +++ b/.dotnet/src/Generated/Models/InternalRealtimeConversationResponseItemObject.cs @@ -0,0 +1,41 @@ +// + +#nullable disable + +using System; +using System.ComponentModel; +using OpenAI; + +namespace OpenAI.RealtimeConversation +{ + internal readonly partial struct InternalRealtimeConversationResponseItemObject : IEquatable<InternalRealtimeConversationResponseItemObject> + { + private readonly string _value; + private const string RealtimeItemValue = "realtime.item"; + + public InternalRealtimeConversationResponseItemObject(string value) + { + Argument.AssertNotNull(value, nameof(value)); + + _value = value; + } + + public static InternalRealtimeConversationResponseItemObject RealtimeItem { get; } = new InternalRealtimeConversationResponseItemObject(RealtimeItemValue); + + public static bool operator ==(InternalRealtimeConversationResponseItemObject left, InternalRealtimeConversationResponseItemObject right) => left.Equals(right); + + public static bool operator !=(InternalRealtimeConversationResponseItemObject left, InternalRealtimeConversationResponseItemObject right) => !left.Equals(right); + + public static implicit operator InternalRealtimeConversationResponseItemObject(string value) => new InternalRealtimeConversationResponseItemObject(value); + + [EditorBrowsable(EditorBrowsableState.Never)] + public override bool Equals(object obj) => obj is InternalRealtimeConversationResponseItemObject other && Equals(other); + + public bool Equals(InternalRealtimeConversationResponseItemObject other) => string.Equals(_value, other._value, StringComparison.InvariantCultureIgnoreCase); + + [EditorBrowsable(EditorBrowsableState.Never)] + public override int GetHashCode() => _value != null ? 
StringComparer.InvariantCultureIgnoreCase.GetHashCode(_value) : 0;
+
+        public override string ToString() => _value;
+    }
+}
diff --git a/.dotnet/src/Generated/Models/InternalRealtimeResponseFunctionCallItem.Serialization.cs b/.dotnet/src/Generated/Models/InternalRealtimeResponseFunctionCallItem.Serialization.cs
index 14c3d3388..5af5037d7 100644
--- a/.dotnet/src/Generated/Models/InternalRealtimeResponseFunctionCallItem.Serialization.cs
+++ b/.dotnet/src/Generated/Models/InternalRealtimeResponseFunctionCallItem.Serialization.cs
@@ -56,7 +56,7 @@ protected override void JsonModelWriteCore(Utf8JsonWriter writer, ModelReaderWri
         InternalRealtimeResponseFunctionCallItem IJsonModel.Create(ref Utf8JsonReader reader, ModelReaderWriterOptions options) => (InternalRealtimeResponseFunctionCallItem)JsonModelCreateCore(ref reader, options);
-        protected override InternalRealtimeResponseItem JsonModelCreateCore(ref Utf8JsonReader reader, ModelReaderWriterOptions options)
+        protected override InternalRealtimeConversationResponseItem JsonModelCreateCore(ref Utf8JsonReader reader, ModelReaderWriterOptions options)
         {
             string format = options.Format == "W" ? 
((IPersistableModel)this).GetFormatFromOptions(options) : options.Format;
             if (format != "J")
@@ -73,7 +73,7 @@ internal static InternalRealtimeResponseFunctionCallItem DeserializeInternalReal
             {
                 return null;
             }
-            InternalRealtimeResponseItemObject @object = default;
+            InternalRealtimeConversationResponseItemObject @object = default;
             InternalRealtimeItemType @type = default;
             string id = default;
             IDictionary additionalBinaryDataProperties = new ChangeTrackingDictionary();
@@ -85,7 +85,7 @@ internal static InternalRealtimeResponseFunctionCallItem DeserializeInternalReal
             {
                 if (prop.NameEquals("object"u8))
                 {
-                    @object = new InternalRealtimeResponseItemObject(prop.Value.GetString());
+                    @object = new InternalRealtimeConversationResponseItemObject(prop.Value.GetString());
                     continue;
                 }
                 if (prop.NameEquals("type"u8))
@@ -155,7 +155,7 @@ protected override BinaryData PersistableModelWriteCore(ModelReaderWriterOptions
         InternalRealtimeResponseFunctionCallItem IPersistableModel.Create(BinaryData data, ModelReaderWriterOptions options) => (InternalRealtimeResponseFunctionCallItem)PersistableModelCreateCore(data, options);
-        protected override InternalRealtimeResponseItem PersistableModelCreateCore(BinaryData data, ModelReaderWriterOptions options)
+        protected override InternalRealtimeConversationResponseItem PersistableModelCreateCore(BinaryData data, ModelReaderWriterOptions options)
         {
             string format = options.Format == "W" ? 
((IPersistableModel)this).GetFormatFromOptions(options) : options.Format;
             switch (format)
diff --git a/.dotnet/src/Generated/Models/InternalRealtimeResponseFunctionCallItem.cs b/.dotnet/src/Generated/Models/InternalRealtimeResponseFunctionCallItem.cs
index e1d10fa80..f89f19c94 100644
--- a/.dotnet/src/Generated/Models/InternalRealtimeResponseFunctionCallItem.cs
+++ b/.dotnet/src/Generated/Models/InternalRealtimeResponseFunctionCallItem.cs
@@ -7,7 +7,7 @@
 namespace OpenAI.RealtimeConversation
 {
-    internal partial class InternalRealtimeResponseFunctionCallItem : InternalRealtimeResponseItem
+    internal partial class InternalRealtimeResponseFunctionCallItem : InternalRealtimeConversationResponseItem
     {
         internal InternalRealtimeResponseFunctionCallItem(string id, string name, string callId, string arguments, ConversationItemStatus status) : base(InternalRealtimeItemType.FunctionCall, id)
         {
@@ -17,7 +17,7 @@ internal InternalRealtimeResponseFunctionCallItem(string id, string name, string
             Status = status;
         }
-        internal InternalRealtimeResponseFunctionCallItem(InternalRealtimeResponseItemObject @object, InternalRealtimeItemType @type, string id, IDictionary additionalBinaryDataProperties, string name, string callId, string arguments, ConversationItemStatus status) : base(@object, @type, id, additionalBinaryDataProperties)
+        internal InternalRealtimeResponseFunctionCallItem(InternalRealtimeConversationResponseItemObject @object, InternalRealtimeItemType @type, string id, IDictionary additionalBinaryDataProperties, string name, string callId, string arguments, ConversationItemStatus status) : base(@object, @type, id, additionalBinaryDataProperties)
         {
             Name = name;
             CallId = callId;
diff --git a/.dotnet/src/Generated/Models/InternalRealtimeResponseFunctionCallOutputItem.Serialization.cs b/.dotnet/src/Generated/Models/InternalRealtimeResponseFunctionCallOutputItem.Serialization.cs
index c5b563a3d..d7e539a5d 100644
--- 
a/.dotnet/src/Generated/Models/InternalRealtimeResponseFunctionCallOutputItem.Serialization.cs
+++ b/.dotnet/src/Generated/Models/InternalRealtimeResponseFunctionCallOutputItem.Serialization.cs
@@ -46,7 +46,7 @@ protected override void JsonModelWriteCore(Utf8JsonWriter writer, ModelReaderWri
         InternalRealtimeResponseFunctionCallOutputItem IJsonModel.Create(ref Utf8JsonReader reader, ModelReaderWriterOptions options) => (InternalRealtimeResponseFunctionCallOutputItem)JsonModelCreateCore(ref reader, options);
-        protected override InternalRealtimeResponseItem JsonModelCreateCore(ref Utf8JsonReader reader, ModelReaderWriterOptions options)
+        protected override InternalRealtimeConversationResponseItem JsonModelCreateCore(ref Utf8JsonReader reader, ModelReaderWriterOptions options)
         {
             string format = options.Format == "W" ? ((IPersistableModel)this).GetFormatFromOptions(options) : options.Format;
             if (format != "J")
@@ -63,7 +63,7 @@ internal static InternalRealtimeResponseFunctionCallOutputItem DeserializeIntern
             {
                 return null;
             }
-            InternalRealtimeResponseItemObject @object = default;
+            InternalRealtimeConversationResponseItemObject @object = default;
             InternalRealtimeItemType @type = default;
             string id = default;
             IDictionary additionalBinaryDataProperties = new ChangeTrackingDictionary();
@@ -73,7 +73,7 @@ internal static InternalRealtimeResponseFunctionCallOutputItem DeserializeIntern
             {
                 if (prop.NameEquals("object"u8))
                 {
-                    @object = new InternalRealtimeResponseItemObject(prop.Value.GetString());
+                    @object = new InternalRealtimeConversationResponseItemObject(prop.Value.GetString());
                     continue;
                 }
                 if (prop.NameEquals("type"u8))
@@ -131,7 +131,7 @@ protected override BinaryData PersistableModelWriteCore(ModelReaderWriterOptions
         InternalRealtimeResponseFunctionCallOutputItem IPersistableModel.Create(BinaryData data, ModelReaderWriterOptions options) => (InternalRealtimeResponseFunctionCallOutputItem)PersistableModelCreateCore(data, options);
-        protected override 
InternalRealtimeResponseItem PersistableModelCreateCore(BinaryData data, ModelReaderWriterOptions options)
+        protected override InternalRealtimeConversationResponseItem PersistableModelCreateCore(BinaryData data, ModelReaderWriterOptions options)
         {
             string format = options.Format == "W" ? ((IPersistableModel)this).GetFormatFromOptions(options) : options.Format;
             switch (format)
diff --git a/.dotnet/src/Generated/Models/InternalRealtimeResponseFunctionCallOutputItem.cs b/.dotnet/src/Generated/Models/InternalRealtimeResponseFunctionCallOutputItem.cs
index 7295ba6f7..babadf19c 100644
--- a/.dotnet/src/Generated/Models/InternalRealtimeResponseFunctionCallOutputItem.cs
+++ b/.dotnet/src/Generated/Models/InternalRealtimeResponseFunctionCallOutputItem.cs
@@ -7,7 +7,7 @@
 namespace OpenAI.RealtimeConversation
 {
-    internal partial class InternalRealtimeResponseFunctionCallOutputItem : InternalRealtimeResponseItem
+    internal partial class InternalRealtimeResponseFunctionCallOutputItem : InternalRealtimeConversationResponseItem
     {
         internal InternalRealtimeResponseFunctionCallOutputItem(string id, string callId, string output) : base(InternalRealtimeItemType.FunctionCallOutput, id)
         {
@@ -15,7 +15,7 @@ internal InternalRealtimeResponseFunctionCallOutputItem(string id, string callId
             Output = output;
         }
-        internal InternalRealtimeResponseFunctionCallOutputItem(InternalRealtimeResponseItemObject @object, InternalRealtimeItemType @type, string id, IDictionary additionalBinaryDataProperties, string callId, string output) : base(@object, @type, id, additionalBinaryDataProperties)
+        internal InternalRealtimeResponseFunctionCallOutputItem(InternalRealtimeConversationResponseItemObject @object, InternalRealtimeItemType @type, string id, IDictionary additionalBinaryDataProperties, string callId, string output) : base(@object, @type, id, additionalBinaryDataProperties)
         {
             CallId = callId;
             Output = output;
diff --git a/.dotnet/src/Generated/Models/InternalRealtimeResponseItemObject.cs 
b/.dotnet/src/Generated/Models/InternalRealtimeResponseItemObject.cs
deleted file mode 100644
index d5c5d6c55..000000000
--- a/.dotnet/src/Generated/Models/InternalRealtimeResponseItemObject.cs
+++ /dev/null
@@ -1,41 +0,0 @@
-//
-
-#nullable disable
-
-using System;
-using System.ComponentModel;
-using OpenAI;
-
-namespace OpenAI.RealtimeConversation
-{
-    internal readonly partial struct InternalRealtimeResponseItemObject : IEquatable
-    {
-        private readonly string _value;
-        private const string RealtimeItemValue = "realtime.item";
-
-        public InternalRealtimeResponseItemObject(string value)
-        {
-            Argument.AssertNotNull(value, nameof(value));
-
-            _value = value;
-        }
-
-        public static InternalRealtimeResponseItemObject RealtimeItem { get; } = new InternalRealtimeResponseItemObject(RealtimeItemValue);
-
-        public static bool operator ==(InternalRealtimeResponseItemObject left, InternalRealtimeResponseItemObject right) => left.Equals(right);
-
-        public static bool operator !=(InternalRealtimeResponseItemObject left, InternalRealtimeResponseItemObject right) => !left.Equals(right);
-
-        public static implicit operator InternalRealtimeResponseItemObject(string value) => new InternalRealtimeResponseItemObject(value);
-
-        [EditorBrowsable(EditorBrowsableState.Never)]
-        public override bool Equals(object obj) => obj is InternalRealtimeResponseItemObject other && Equals(other);
-
-        public bool Equals(InternalRealtimeResponseItemObject other) => string.Equals(_value, other._value, StringComparison.InvariantCultureIgnoreCase);
-
-        [EditorBrowsable(EditorBrowsableState.Never)]
-        public override int GetHashCode() => _value != null ? 
StringComparer.InvariantCultureIgnoreCase.GetHashCode(_value) : 0;
-
-        public override string ToString() => _value;
-    }
-}
diff --git a/.dotnet/src/Generated/Models/InternalRealtimeResponseMessageItem.Serialization.cs b/.dotnet/src/Generated/Models/InternalRealtimeResponseMessageItem.Serialization.cs
index fc7df83e3..d0fa51478 100644
--- a/.dotnet/src/Generated/Models/InternalRealtimeResponseMessageItem.Serialization.cs
+++ b/.dotnet/src/Generated/Models/InternalRealtimeResponseMessageItem.Serialization.cs
@@ -32,16 +32,6 @@ protected override void JsonModelWriteCore(Utf8JsonWriter writer, ModelReaderWri
                 throw new FormatException($"The model {nameof(InternalRealtimeResponseMessageItem)} does not support writing '{format}' format.");
             }
             base.JsonModelWriteCore(writer, options);
-            if (true && _additionalBinaryDataProperties?.ContainsKey("content") != true)
-            {
-                writer.WritePropertyName("content"u8);
-                writer.WriteStartArray();
-                foreach (ConversationContentPart item in Content)
-                {
-                    writer.WriteObjectValue(item, options);
-                }
-                writer.WriteEndArray();
-            }
             if (_additionalBinaryDataProperties?.ContainsKey("status") != true)
             {
                 writer.WritePropertyName("status"u8);
@@ -52,11 +42,21 @@ protected override void JsonModelWriteCore(Utf8JsonWriter writer, ModelReaderWri
                 writer.WritePropertyName("role"u8);
                 writer.WriteStringValue(Role.ToString());
             }
+            if (true && _additionalBinaryDataProperties?.ContainsKey("content") != true)
+            {
+                writer.WritePropertyName("content"u8);
+                writer.WriteStartArray();
+                foreach (ConversationContentPart item in Content)
+                {
+                    writer.WriteObjectValue(item, options);
+                }
+                writer.WriteEndArray();
+            }
         }
         InternalRealtimeResponseMessageItem IJsonModel.Create(ref Utf8JsonReader reader, ModelReaderWriterOptions options) => (InternalRealtimeResponseMessageItem)JsonModelCreateCore(ref reader, options);
-        protected override InternalRealtimeResponseItem JsonModelCreateCore(ref Utf8JsonReader reader, ModelReaderWriterOptions options)
+        protected override 
InternalRealtimeConversationResponseItem JsonModelCreateCore(ref Utf8JsonReader reader, ModelReaderWriterOptions options)
         {
             string format = options.Format == "W" ? ((IPersistableModel)this).GetFormatFromOptions(options) : options.Format;
             if (format != "J")
@@ -73,18 +73,18 @@ internal static InternalRealtimeResponseMessageItem DeserializeInternalRealtimeR
             {
                 return null;
             }
-            InternalRealtimeResponseItemObject @object = default;
+            InternalRealtimeConversationResponseItemObject @object = default;
             InternalRealtimeItemType @type = default;
             string id = default;
             IDictionary additionalBinaryDataProperties = new ChangeTrackingDictionary();
-            IReadOnlyList content = default;
             ConversationItemStatus status = default;
             ConversationMessageRole role = default;
+            IReadOnlyList content = default;
             foreach (var prop in element.EnumerateObject())
             {
                 if (prop.NameEquals("object"u8))
                 {
-                    @object = new InternalRealtimeResponseItemObject(prop.Value.GetString());
+                    @object = new InternalRealtimeConversationResponseItemObject(prop.Value.GetString());
                     continue;
                 }
                 if (prop.NameEquals("type"u8))
@@ -102,16 +102,6 @@ internal static InternalRealtimeResponseMessageItem DeserializeInternalRealtimeR
                     id = prop.Value.GetString();
                     continue;
                 }
-                if (prop.NameEquals("content"u8))
-                {
-                    List array = new List();
-                    foreach (var item in prop.Value.EnumerateArray())
-                    {
-                        array.Add(ConversationContentPart.DeserializeConversationContentPart(item, options));
-                    }
-                    content = array;
-                    continue;
-                }
                 if (prop.NameEquals("status"u8))
                 {
                     status = new ConversationItemStatus(prop.Value.GetString());
@@ -122,6 +112,16 @@ internal static InternalRealtimeResponseMessageItem DeserializeInternalRealtimeR
                     role = new ConversationMessageRole(prop.Value.GetString());
                     continue;
                 }
+                if (prop.NameEquals("content"u8))
+                {
+                    List array = new List();
+                    foreach (var item in prop.Value.EnumerateArray())
+                    {
+                        array.Add(ConversationContentPart.DeserializeConversationContentPart(item, options));
+                    }
+                    content = array;
+                    continue;
+                }
                 if 
(true)
                 {
                     additionalBinaryDataProperties.Add(prop.Name, BinaryData.FromString(prop.Value.GetRawText()));
@@ -132,9 +132,9 @@ internal static InternalRealtimeResponseMessageItem DeserializeInternalRealtimeR
                 @type,
                 id,
                 additionalBinaryDataProperties,
-                content,
                 status,
-                role);
+                role,
+                content);
         }
         BinaryData IPersistableModel.Write(ModelReaderWriterOptions options) => PersistableModelWriteCore(options);
@@ -153,7 +153,7 @@ protected override BinaryData PersistableModelWriteCore(ModelReaderWriterOptions
         InternalRealtimeResponseMessageItem IPersistableModel.Create(BinaryData data, ModelReaderWriterOptions options) => (InternalRealtimeResponseMessageItem)PersistableModelCreateCore(data, options);
-        protected override InternalRealtimeResponseItem PersistableModelCreateCore(BinaryData data, ModelReaderWriterOptions options)
+        protected override InternalRealtimeConversationResponseItem PersistableModelCreateCore(BinaryData data, ModelReaderWriterOptions options)
         {
             string format = options.Format == "W" ? 
((IPersistableModel)this).GetFormatFromOptions(options) : options.Format;
             switch (format)
diff --git a/.dotnet/src/Generated/Models/InternalRealtimeResponseMessageItem.cs b/.dotnet/src/Generated/Models/InternalRealtimeResponseMessageItem.cs
index 827580df0..87bf3c1ad 100644
--- a/.dotnet/src/Generated/Models/InternalRealtimeResponseMessageItem.cs
+++ b/.dotnet/src/Generated/Models/InternalRealtimeResponseMessageItem.cs
@@ -8,24 +8,22 @@
 namespace OpenAI.RealtimeConversation
 {
-    internal partial class InternalRealtimeResponseMessageItem : InternalRealtimeResponseItem
+    internal partial class InternalRealtimeResponseMessageItem : InternalRealtimeConversationResponseItem
     {
         internal InternalRealtimeResponseMessageItem(string id, ConversationItemStatus status, ConversationMessageRole role) : base(InternalRealtimeItemType.Message, id)
         {
-            Content = new ChangeTrackingList();
             Status = status;
             Role = role;
+            Content = new ChangeTrackingList();
         }
-        internal InternalRealtimeResponseMessageItem(InternalRealtimeResponseItemObject @object, InternalRealtimeItemType @type, string id, IDictionary additionalBinaryDataProperties, IReadOnlyList content, ConversationItemStatus status, ConversationMessageRole role) : base(@object, @type, id, additionalBinaryDataProperties)
+        internal InternalRealtimeResponseMessageItem(InternalRealtimeConversationResponseItemObject @object, InternalRealtimeItemType @type, string id, IDictionary additionalBinaryDataProperties, ConversationItemStatus status, ConversationMessageRole role, IReadOnlyList content) : base(@object, @type, id, additionalBinaryDataProperties)
         {
-            Content = content;
             Status = status;
             Role = role;
+            Content = content;
         }
-        public IReadOnlyList Content { get; }
-
         public ConversationItemStatus Status { get; }
     }
 }
diff --git a/.dotnet/src/Generated/Models/InternalRealtimeClientEventResponseCreateResponse.Serialization.cs b/.dotnet/src/Generated/Models/InternalRealtimeResponseOptions.Serialization.cs
similarity index 68%
rename from 
.dotnet/src/Generated/Models/InternalRealtimeClientEventResponseCreateResponse.Serialization.cs
rename to .dotnet/src/Generated/Models/InternalRealtimeResponseOptions.Serialization.cs
index e5feb25ba..7689a5e57 100644
--- a/.dotnet/src/Generated/Models/InternalRealtimeClientEventResponseCreateResponse.Serialization.cs
+++ b/.dotnet/src/Generated/Models/InternalRealtimeResponseOptions.Serialization.cs
@@ -11,9 +11,9 @@
 namespace OpenAI.RealtimeConversation
 {
-    internal partial class InternalRealtimeClientEventResponseCreateResponse : IJsonModel
+    internal partial class InternalRealtimeResponseOptions : IJsonModel
     {
-        void IJsonModel.Write(Utf8JsonWriter writer, ModelReaderWriterOptions options)
+        void IJsonModel.Write(Utf8JsonWriter writer, ModelReaderWriterOptions options)
         {
             writer.WriteStartObject();
             JsonModelWriteCore(writer, options);
@@ -22,23 +22,18 @@ void IJsonModel.Write(Utf8Jso
         protected virtual void JsonModelWriteCore(Utf8JsonWriter writer, ModelReaderWriterOptions options)
         {
-            string format = options.Format == "W" ? ((IPersistableModel)this).GetFormatFromOptions(options) : options.Format;
+            string format = options.Format == "W" ? 
((IPersistableModel)this).GetFormatFromOptions(options) : options.Format;
             if (format != "J")
             {
-                throw new FormatException($"The model {nameof(InternalRealtimeClientEventResponseCreateResponse)} does not support writing '{format}' format.");
+                throw new FormatException($"The model {nameof(InternalRealtimeResponseOptions)} does not support writing '{format}' format.");
             }
             if (Optional.IsCollectionDefined(Modalities) && _additionalBinaryDataProperties?.ContainsKey("modalities") != true)
             {
                 writer.WritePropertyName("modalities"u8);
                 writer.WriteStartArray();
-                foreach (string item in Modalities)
+                foreach (InternalRealtimeRequestSessionModality item in Modalities)
                 {
-                    if (item == null)
-                    {
-                        writer.WriteNullValue();
-                        continue;
-                    }
-                    writer.WriteStringValue(item);
+                    writer.WriteStringValue(item.ToString());
                 }
                 writer.WriteEndArray();
             }
@@ -50,12 +45,12 @@ protected virtual void JsonModelWriteCore(Utf8JsonWriter writer, ModelReaderWrit
             if (Optional.IsDefined(Voice) && _additionalBinaryDataProperties?.ContainsKey("voice") != true)
             {
                 writer.WritePropertyName("voice"u8);
-                writer.WriteStringValue(Voice);
+                writer.WriteStringValue(Voice.Value.ToString());
             }
             if (Optional.IsDefined(OutputAudioFormat) && _additionalBinaryDataProperties?.ContainsKey("output_audio_format") != true)
             {
                 writer.WritePropertyName("output_audio_format"u8);
-                writer.WriteStringValue(OutputAudioFormat);
+                writer.WriteStringValue(OutputAudioFormat.Value.ToString());
             }
             if (Optional.IsCollectionDefined(Tools) && _additionalBinaryDataProperties?.ContainsKey("tools") != true)
             {
@@ -117,29 +112,29 @@ protected virtual void JsonModelWriteCore(Utf8JsonWriter writer, ModelReaderWrit
             }
         }
-        InternalRealtimeClientEventResponseCreateResponse IJsonModel.Create(ref Utf8JsonReader reader, ModelReaderWriterOptions options) => JsonModelCreateCore(ref reader, options);
+        InternalRealtimeResponseOptions IJsonModel.Create(ref Utf8JsonReader reader, ModelReaderWriterOptions options) => JsonModelCreateCore(ref reader, options);
-        protected virtual InternalRealtimeClientEventResponseCreateResponse JsonModelCreateCore(ref Utf8JsonReader reader, ModelReaderWriterOptions options)
+        protected virtual InternalRealtimeResponseOptions JsonModelCreateCore(ref Utf8JsonReader reader, ModelReaderWriterOptions options)
         {
-            string format = options.Format == "W" ? ((IPersistableModel)this).GetFormatFromOptions(options) : options.Format;
+            string format = options.Format == "W" ? ((IPersistableModel)this).GetFormatFromOptions(options) : options.Format;
             if (format != "J")
             {
-                throw new FormatException($"The model {nameof(InternalRealtimeClientEventResponseCreateResponse)} does not support reading '{format}' format.");
+                throw new FormatException($"The model {nameof(InternalRealtimeResponseOptions)} does not support reading '{format}' format.");
             }
             using JsonDocument document = JsonDocument.ParseValue(ref reader);
-            return DeserializeInternalRealtimeClientEventResponseCreateResponse(document.RootElement, options);
+            return DeserializeInternalRealtimeResponseOptions(document.RootElement, options);
         }
-        internal static InternalRealtimeClientEventResponseCreateResponse DeserializeInternalRealtimeClientEventResponseCreateResponse(JsonElement element, ModelReaderWriterOptions options)
+        internal static InternalRealtimeResponseOptions DeserializeInternalRealtimeResponseOptions(JsonElement element, ModelReaderWriterOptions options)
         {
             if (element.ValueKind == JsonValueKind.Null)
             {
                 return null;
             }
-            IList modalities = default;
+            IList modalities = default;
             string instructions = default;
-            string voice = default;
-            string outputAudioFormat = default;
+            ConversationVoice? voice = default;
+            ConversationAudioFormat? outputAudioFormat = default;
             IList tools = default;
             float? 
temperature = default;
             BinaryData maxOutputTokens = default;
@@ -153,17 +148,10 @@ internal static InternalRealtimeClientEventResponseCreateResponse DeserializeInt
                     {
                         continue;
                     }
-                    List array = new List();
+                    List array = new List();
                     foreach (var item in prop.Value.EnumerateArray())
                     {
-                        if (item.ValueKind == JsonValueKind.Null)
-                        {
-                            array.Add(null);
-                        }
-                        else
-                        {
-                            array.Add(item.GetString());
-                        }
+                        array.Add(new InternalRealtimeRequestSessionModality(item.GetString()));
                     }
                     modalities = array;
                     continue;
@@ -175,12 +163,20 @@ internal static InternalRealtimeClientEventResponseCreateResponse DeserializeInt
                 }
                 if (prop.NameEquals("voice"u8))
                 {
-                    voice = prop.Value.GetString();
+                    if (prop.Value.ValueKind == JsonValueKind.Null)
+                    {
+                        continue;
+                    }
+                    voice = new ConversationVoice(prop.Value.GetString());
                     continue;
                 }
                 if (prop.NameEquals("output_audio_format"u8))
                 {
-                    outputAudioFormat = prop.Value.GetString();
+                    if (prop.Value.ValueKind == JsonValueKind.Null)
+                    {
+                        continue;
+                    }
+                    outputAudioFormat = new ConversationAudioFormat(prop.Value.GetString());
                     continue;
                 }
                 if (prop.NameEquals("tools"u8))
@@ -229,8 +225,8 @@ internal static InternalRealtimeClientEventResponseCreateResponse DeserializeInt
                     additionalBinaryDataProperties.Add(prop.Name, BinaryData.FromString(prop.Value.GetRawText()));
                 }
             }
-            return new InternalRealtimeClientEventResponseCreateResponse(
-                modalities ?? new ChangeTrackingList(),
+            return new InternalRealtimeResponseOptions(
+                modalities ?? 
new ChangeTrackingList(),
                 instructions,
                 voice,
                 outputAudioFormat,
@@ -241,53 +237,53 @@ internal static InternalRealtimeClientEventResponseCreateResponse DeserializeInt
                 additionalBinaryDataProperties);
         }
-        BinaryData IPersistableModel.Write(ModelReaderWriterOptions options) => PersistableModelWriteCore(options);
+        BinaryData IPersistableModel.Write(ModelReaderWriterOptions options) => PersistableModelWriteCore(options);
         protected virtual BinaryData PersistableModelWriteCore(ModelReaderWriterOptions options)
         {
-            string format = options.Format == "W" ? ((IPersistableModel)this).GetFormatFromOptions(options) : options.Format;
+            string format = options.Format == "W" ? ((IPersistableModel)this).GetFormatFromOptions(options) : options.Format;
             switch (format)
             {
                 case "J":
                     return ModelReaderWriter.Write(this, options);
                 default:
-                    throw new FormatException($"The model {nameof(InternalRealtimeClientEventResponseCreateResponse)} does not support writing '{options.Format}' format.");
+                    throw new FormatException($"The model {nameof(InternalRealtimeResponseOptions)} does not support writing '{options.Format}' format.");
             }
         }
-        InternalRealtimeClientEventResponseCreateResponse IPersistableModel.Create(BinaryData data, ModelReaderWriterOptions options) => PersistableModelCreateCore(data, options);
+        InternalRealtimeResponseOptions IPersistableModel.Create(BinaryData data, ModelReaderWriterOptions options) => PersistableModelCreateCore(data, options);
-        protected virtual InternalRealtimeClientEventResponseCreateResponse PersistableModelCreateCore(BinaryData data, ModelReaderWriterOptions options)
+        protected virtual InternalRealtimeResponseOptions PersistableModelCreateCore(BinaryData data, ModelReaderWriterOptions options)
         {
-            string format = options.Format == "W" ? ((IPersistableModel)this).GetFormatFromOptions(options) : options.Format;
+            string format = options.Format == "W" ? 
((IPersistableModel)this).GetFormatFromOptions(options) : options.Format;
             switch (format)
             {
                 case "J":
                     using (JsonDocument document = JsonDocument.Parse(data))
                     {
-                        return DeserializeInternalRealtimeClientEventResponseCreateResponse(document.RootElement, options);
+                        return DeserializeInternalRealtimeResponseOptions(document.RootElement, options);
                     }
                 default:
-                    throw new FormatException($"The model {nameof(InternalRealtimeClientEventResponseCreateResponse)} does not support reading '{options.Format}' format.");
+                    throw new FormatException($"The model {nameof(InternalRealtimeResponseOptions)} does not support reading '{options.Format}' format.");
             }
         }
-        string IPersistableModel.GetFormatFromOptions(ModelReaderWriterOptions options) => "J";
+        string IPersistableModel.GetFormatFromOptions(ModelReaderWriterOptions options) => "J";
-        public static implicit operator BinaryContent(InternalRealtimeClientEventResponseCreateResponse internalRealtimeClientEventResponseCreateResponse)
+        public static implicit operator BinaryContent(InternalRealtimeResponseOptions internalRealtimeResponseOptions)
         {
-            if (internalRealtimeClientEventResponseCreateResponse == null)
+            if (internalRealtimeResponseOptions == null)
             {
                 return null;
             }
-            return BinaryContent.Create(internalRealtimeClientEventResponseCreateResponse, ModelSerializationExtensions.WireOptions);
+            return BinaryContent.Create(internalRealtimeResponseOptions, ModelSerializationExtensions.WireOptions);
         }
-        public static explicit operator InternalRealtimeClientEventResponseCreateResponse(ClientResult result)
+        public static explicit operator InternalRealtimeResponseOptions(ClientResult result)
         {
             using PipelineResponse response = result.GetRawResponse();
             using JsonDocument document = JsonDocument.Parse(response.Content);
-            return DeserializeInternalRealtimeClientEventResponseCreateResponse(document.RootElement, ModelSerializationExtensions.WireOptions);
+            return DeserializeInternalRealtimeResponseOptions(document.RootElement, 
ModelSerializationExtensions.WireOptions);
         }
     }
 }
diff --git a/.dotnet/src/Generated/Models/InternalRealtimeClientEventResponseCreateResponse.cs b/.dotnet/src/Generated/Models/InternalRealtimeResponseOptions.cs
similarity index 60%
rename from .dotnet/src/Generated/Models/InternalRealtimeClientEventResponseCreateResponse.cs
rename to .dotnet/src/Generated/Models/InternalRealtimeResponseOptions.cs
index 652404768..9d27614fb 100644
--- a/.dotnet/src/Generated/Models/InternalRealtimeClientEventResponseCreateResponse.cs
+++ b/.dotnet/src/Generated/Models/InternalRealtimeResponseOptions.cs
@@ -8,17 +8,17 @@
 namespace OpenAI.RealtimeConversation
 {
-    internal partial class InternalRealtimeClientEventResponseCreateResponse
+    internal partial class InternalRealtimeResponseOptions
     {
         private protected IDictionary _additionalBinaryDataProperties;
-        public InternalRealtimeClientEventResponseCreateResponse()
+        public InternalRealtimeResponseOptions()
         {
-            Modalities = new ChangeTrackingList();
+            Modalities = new ChangeTrackingList();
             Tools = new ChangeTrackingList();
         }
-        internal InternalRealtimeClientEventResponseCreateResponse(IList modalities, string instructions, string voice, string outputAudioFormat, IList tools, float? temperature, BinaryData maxOutputTokens, BinaryData toolChoice, IDictionary additionalBinaryDataProperties)
+        internal InternalRealtimeResponseOptions(IList modalities, string instructions, ConversationVoice? voice, ConversationAudioFormat? outputAudioFormat, IList tools, float? 
temperature, BinaryData maxOutputTokens, BinaryData toolChoice, IDictionary additionalBinaryDataProperties)
         {
             Modalities = modalities;
             Instructions = instructions;
@@ -31,13 +31,13 @@ internal InternalRealtimeClientEventResponseCreateResponse(IList modalit
             _additionalBinaryDataProperties = additionalBinaryDataProperties;
         }
-        public IList Modalities { get; }
+        public IList Modalities { get; }
         public string Instructions { get; set; }
-        public string Voice { get; set; }
+        public ConversationVoice? Voice { get; set; }
-        public string OutputAudioFormat { get; set; }
+        public ConversationAudioFormat? OutputAudioFormat { get; set; }
         public IList Tools { get; }
diff --git a/.dotnet/src/Generated/Models/InternalUnknownChatMessage.Serialization.cs b/.dotnet/src/Generated/Models/InternalUnknownChatMessage.Serialization.cs
index 1dbb31bef..70a0deb09 100644
--- a/.dotnet/src/Generated/Models/InternalUnknownChatMessage.Serialization.cs
+++ b/.dotnet/src/Generated/Models/InternalUnknownChatMessage.Serialization.cs
@@ -45,19 +45,19 @@ internal static InternalUnknownChatMessage DeserializeInternalUnknownChatMessage
             {
                 return null;
             }
-            Chat.ChatMessageRole role = default;
             ChatMessageContent content = default;
+            Chat.ChatMessageRole role = default;
             IDictionary additionalBinaryDataProperties = new ChangeTrackingDictionary();
             foreach (var prop in element.EnumerateObject())
             {
-                if (prop.NameEquals("role"u8))
+                if (prop.NameEquals("content"u8))
                 {
-                    role = prop.Value.GetString().ToChatMessageRole();
+                    DeserializeContentValue(prop, ref content);
                     continue;
                 }
-                if (prop.NameEquals("content"u8))
+                if (prop.NameEquals("role"u8))
                 {
-                    DeserializeContentValue(prop, ref content);
+                    role = prop.Value.GetString().ToChatMessageRole();
                     continue;
                 }
                 if (true)
@@ -66,7 +66,7 @@ internal static InternalUnknownChatMessage DeserializeInternalUnknownChatMessage
                 }
             }
             // CUSTOM: Initialize Content collection property.
-            return new InternalUnknownChatMessage(role, content ?? 
new ChatMessageContent(), additionalBinaryDataProperties);
+            return new InternalUnknownChatMessage(content ?? new ChatMessageContent(), role, additionalBinaryDataProperties);
         }
         BinaryData IPersistableModel.Write(ModelReaderWriterOptions options) => PersistableModelWriteCore(options);
diff --git a/.dotnet/src/Generated/Models/InternalUnknownChatMessage.cs b/.dotnet/src/Generated/Models/InternalUnknownChatMessage.cs
index 994ab4d56..82bb59b2c 100644
--- a/.dotnet/src/Generated/Models/InternalUnknownChatMessage.cs
+++ b/.dotnet/src/Generated/Models/InternalUnknownChatMessage.cs
@@ -9,7 +9,7 @@ namespace OpenAI.Chat
 {
     internal partial class InternalUnknownChatMessage : ChatMessage
     {
-        internal InternalUnknownChatMessage(Chat.ChatMessageRole role, ChatMessageContent content, IDictionary additionalBinaryDataProperties) : base(role, content, additionalBinaryDataProperties)
+        internal InternalUnknownChatMessage(ChatMessageContent content, Chat.ChatMessageRole role, IDictionary additionalBinaryDataProperties) : base(content, role, additionalBinaryDataProperties)
         {
         }
     }
 }
diff --git a/.dotnet/src/Generated/Models/StreamingChatCompletionUpdate.Serialization.cs b/.dotnet/src/Generated/Models/StreamingChatCompletionUpdate.Serialization.cs
index ccb709a79..db571acc7 100644
--- a/.dotnet/src/Generated/Models/StreamingChatCompletionUpdate.Serialization.cs
+++ b/.dotnet/src/Generated/Models/StreamingChatCompletionUpdate.Serialization.cs
@@ -80,8 +80,15 @@ protected virtual void JsonModelWriteCore(Utf8JsonWriter writer, ModelReaderWrit
             }
             if (Optional.IsDefined(Usage) && _additionalBinaryDataProperties?.ContainsKey("usage") != true)
             {
-                writer.WritePropertyName("usage"u8);
-                writer.WriteObjectValue(Usage, options);
+                if (Usage != null)
+                {
+                    writer.WritePropertyName("usage"u8);
+                    writer.WriteObjectValue(Usage, options);
+                }
+                else
+                {
+                    writer.WriteNull("usage"u8);
+                }
             }
             if (true && _additionalBinaryDataProperties != null)
             {
@@ -183,6 +190,7 @@ internal static 
StreamingChatCompletionUpdate DeserializeStreamingChatCompletion { if (prop.Value.ValueKind == JsonValueKind.Null) { + usage = null; continue; } usage = ChatTokenUsage.DeserializeChatTokenUsage(prop.Value, options); diff --git a/.dotnet/src/Generated/Models/StreamingChatOutputAudioUpdate.Serialization.cs b/.dotnet/src/Generated/Models/StreamingChatOutputAudioUpdate.Serialization.cs new file mode 100644 index 000000000..080cfa4c5 --- /dev/null +++ b/.dotnet/src/Generated/Models/StreamingChatOutputAudioUpdate.Serialization.cs @@ -0,0 +1,182 @@ +// <auto-generated/> + +#nullable disable + +using System; +using System.ClientModel; +using System.ClientModel.Primitives; +using System.Collections.Generic; +using System.Text.Json; +using OpenAI; + +namespace OpenAI.Chat +{ + public partial class StreamingChatOutputAudioUpdate : IJsonModel<StreamingChatOutputAudioUpdate> + { + void IJsonModel<StreamingChatOutputAudioUpdate>.Write(Utf8JsonWriter writer, ModelReaderWriterOptions options) + { + writer.WriteStartObject(); + JsonModelWriteCore(writer, options); + writer.WriteEndObject(); + } + + protected virtual void JsonModelWriteCore(Utf8JsonWriter writer, ModelReaderWriterOptions options) + { + string format = options.Format == "W" ? 
((IPersistableModel)this).GetFormatFromOptions(options) : options.Format; + if (format != "J") + { + throw new FormatException($"The model {nameof(StreamingChatOutputAudioUpdate)} does not support writing '{format}' format."); + } + if (Optional.IsDefined(Id) && _additionalBinaryDataProperties?.ContainsKey("id") != true) + { + writer.WritePropertyName("id"u8); + writer.WriteStringValue(Id); + } + if (Optional.IsDefined(ExpiresAt) && _additionalBinaryDataProperties?.ContainsKey("expires_at") != true) + { + writer.WritePropertyName("expires_at"u8); + writer.WriteNumberValue(ExpiresAt.Value, "U"); + } + if (Optional.IsDefined(TranscriptUpdate) && _additionalBinaryDataProperties?.ContainsKey("transcript") != true) + { + writer.WritePropertyName("transcript"u8); + writer.WriteStringValue(TranscriptUpdate); + } + if (Optional.IsDefined(AudioBytesUpdate) && _additionalBinaryDataProperties?.ContainsKey("data") != true) + { + writer.WritePropertyName("data"u8); + writer.WriteBase64StringValue(AudioBytesUpdate.ToArray(), "D"); + } + if (true && _additionalBinaryDataProperties != null) + { + foreach (var item in _additionalBinaryDataProperties) + { + if (ModelSerializationExtensions.IsSentinelValue(item.Value)) + { + continue; + } + writer.WritePropertyName(item.Key); +#if NET6_0_OR_GREATER + writer.WriteRawValue(item.Value); +#else + using (JsonDocument document = JsonDocument.Parse(item.Value)) + { + JsonSerializer.Serialize(writer, document.RootElement); + } +#endif + } + } + } + + StreamingChatOutputAudioUpdate IJsonModel.Create(ref Utf8JsonReader reader, ModelReaderWriterOptions options) => JsonModelCreateCore(ref reader, options); + + protected virtual StreamingChatOutputAudioUpdate JsonModelCreateCore(ref Utf8JsonReader reader, ModelReaderWriterOptions options) + { + string format = options.Format == "W" ? 
((IPersistableModel)this).GetFormatFromOptions(options) : options.Format; + if (format != "J") + { + throw new FormatException($"The model {nameof(StreamingChatOutputAudioUpdate)} does not support reading '{format}' format."); + } + using JsonDocument document = JsonDocument.ParseValue(ref reader); + return DeserializeStreamingChatOutputAudioUpdate(document.RootElement, options); + } + + internal static StreamingChatOutputAudioUpdate DeserializeStreamingChatOutputAudioUpdate(JsonElement element, ModelReaderWriterOptions options) + { + if (element.ValueKind == JsonValueKind.Null) + { + return null; + } + string id = default; + DateTimeOffset? expiresAt = default; + string transcriptUpdate = default; + BinaryData audioBytesUpdate = default; + IDictionary additionalBinaryDataProperties = new ChangeTrackingDictionary(); + foreach (var prop in element.EnumerateObject()) + { + if (prop.NameEquals("id"u8)) + { + id = prop.Value.GetString(); + continue; + } + if (prop.NameEquals("expires_at"u8)) + { + if (prop.Value.ValueKind == JsonValueKind.Null) + { + continue; + } + expiresAt = DateTimeOffset.FromUnixTimeSeconds(prop.Value.GetInt64()); + continue; + } + if (prop.NameEquals("transcript"u8)) + { + transcriptUpdate = prop.Value.GetString(); + continue; + } + if (prop.NameEquals("data"u8)) + { + if (prop.Value.ValueKind == JsonValueKind.Null) + { + continue; + } + audioBytesUpdate = BinaryData.FromBytes(prop.Value.GetBytesFromBase64("D")); + continue; + } + if (true) + { + additionalBinaryDataProperties.Add(prop.Name, BinaryData.FromString(prop.Value.GetRawText())); + } + } + return new StreamingChatOutputAudioUpdate(id, expiresAt, transcriptUpdate, audioBytesUpdate, additionalBinaryDataProperties); + } + + BinaryData IPersistableModel.Write(ModelReaderWriterOptions options) => PersistableModelWriteCore(options); + + protected virtual BinaryData PersistableModelWriteCore(ModelReaderWriterOptions options) + { + string format = options.Format == "W" ? 
((IPersistableModel)this).GetFormatFromOptions(options) : options.Format; + switch (format) + { + case "J": + return ModelReaderWriter.Write(this, options); + default: + throw new FormatException($"The model {nameof(StreamingChatOutputAudioUpdate)} does not support writing '{options.Format}' format."); + } + } + + StreamingChatOutputAudioUpdate IPersistableModel.Create(BinaryData data, ModelReaderWriterOptions options) => PersistableModelCreateCore(data, options); + + protected virtual StreamingChatOutputAudioUpdate PersistableModelCreateCore(BinaryData data, ModelReaderWriterOptions options) + { + string format = options.Format == "W" ? ((IPersistableModel)this).GetFormatFromOptions(options) : options.Format; + switch (format) + { + case "J": + using (JsonDocument document = JsonDocument.Parse(data)) + { + return DeserializeStreamingChatOutputAudioUpdate(document.RootElement, options); + } + default: + throw new FormatException($"The model {nameof(StreamingChatOutputAudioUpdate)} does not support reading '{options.Format}' format."); + } + } + + string IPersistableModel.GetFormatFromOptions(ModelReaderWriterOptions options) => "J"; + + public static implicit operator BinaryContent(StreamingChatOutputAudioUpdate streamingChatOutputAudioUpdate) + { + if (streamingChatOutputAudioUpdate == null) + { + return null; + } + return BinaryContent.Create(streamingChatOutputAudioUpdate, ModelSerializationExtensions.WireOptions); + } + + public static explicit operator StreamingChatOutputAudioUpdate(ClientResult result) + { + using PipelineResponse response = result.GetRawResponse(); + using JsonDocument document = JsonDocument.Parse(response.Content); + return DeserializeStreamingChatOutputAudioUpdate(document.RootElement, ModelSerializationExtensions.WireOptions); + } + } +} diff --git a/.dotnet/src/Generated/Models/StreamingChatOutputAudioUpdate.cs b/.dotnet/src/Generated/Models/StreamingChatOutputAudioUpdate.cs new file mode 100644 index 000000000..9abebcebd --- /dev/null 
+++ b/.dotnet/src/Generated/Models/StreamingChatOutputAudioUpdate.cs @@ -0,0 +1,37 @@ +// <auto-generated/> + +#nullable disable + +using System; +using System.Collections.Generic; + +namespace OpenAI.Chat +{ + public partial class StreamingChatOutputAudioUpdate + { + private protected IDictionary<string, BinaryData> _additionalBinaryDataProperties; + + internal StreamingChatOutputAudioUpdate() + { + } + + internal StreamingChatOutputAudioUpdate(string id, DateTimeOffset? expiresAt, string transcriptUpdate, BinaryData audioBytesUpdate, IDictionary<string, BinaryData> additionalBinaryDataProperties) + { + Id = id; + ExpiresAt = expiresAt; + TranscriptUpdate = transcriptUpdate; + AudioBytesUpdate = audioBytesUpdate; + _additionalBinaryDataProperties = additionalBinaryDataProperties; + } + + public string Id { get; } + + public DateTimeOffset? ExpiresAt { get; } + + internal IDictionary<string, BinaryData> SerializedAdditionalRawData + { + get => _additionalBinaryDataProperties; + set => _additionalBinaryDataProperties = value; + } + } +} diff --git a/.dotnet/src/Generated/Models/SystemChatMessage.Serialization.cs b/.dotnet/src/Generated/Models/SystemChatMessage.Serialization.cs index 620f4d85d..9f11bed16 100644 --- a/.dotnet/src/Generated/Models/SystemChatMessage.Serialization.cs +++ b/.dotnet/src/Generated/Models/SystemChatMessage.Serialization.cs @@ -47,20 +47,20 @@ internal static SystemChatMessage DeserializeSystemChatMessage(JsonElement eleme { return null; } - Chat.ChatMessageRole role = default; ChatMessageContent content = default; + Chat.ChatMessageRole role = default; IDictionary<string, BinaryData> additionalBinaryDataProperties = new ChangeTrackingDictionary<string, BinaryData>(); string participantName = default; foreach (var prop in element.EnumerateObject()) { - if (prop.NameEquals("role"u8)) + if (prop.NameEquals("content"u8)) { - role = prop.Value.GetString().ToChatMessageRole(); + DeserializeContentValue(prop, ref content); continue; } - if (prop.NameEquals("content"u8)) + if (prop.NameEquals("role"u8)) { - DeserializeContentValue(prop, ref content); + role = 
prop.Value.GetString().ToChatMessageRole(); continue; } if (prop.NameEquals("name"u8)) @@ -74,7 +74,7 @@ internal static SystemChatMessage DeserializeSystemChatMessage(JsonElement eleme } } // CUSTOM: Initialize Content collection property. - return new SystemChatMessage(role, content ?? new ChatMessageContent(), additionalBinaryDataProperties, participantName); + return new SystemChatMessage(content ?? new ChatMessageContent(), role, additionalBinaryDataProperties, participantName); } BinaryData IPersistableModel.Write(ModelReaderWriterOptions options) => PersistableModelWriteCore(options); diff --git a/.dotnet/src/Generated/Models/SystemChatMessage.cs b/.dotnet/src/Generated/Models/SystemChatMessage.cs index 0785ec94e..0f8453e0e 100644 --- a/.dotnet/src/Generated/Models/SystemChatMessage.cs +++ b/.dotnet/src/Generated/Models/SystemChatMessage.cs @@ -9,7 +9,7 @@ namespace OpenAI.Chat { public partial class SystemChatMessage : ChatMessage { - internal SystemChatMessage(Chat.ChatMessageRole role, ChatMessageContent content, IDictionary additionalBinaryDataProperties, string participantName) : base(role, content, additionalBinaryDataProperties) + internal SystemChatMessage(ChatMessageContent content, Chat.ChatMessageRole role, IDictionary additionalBinaryDataProperties, string participantName) : base(content, role, additionalBinaryDataProperties) { ParticipantName = participantName; } diff --git a/.dotnet/src/Generated/Models/ToolChatMessage.Serialization.cs b/.dotnet/src/Generated/Models/ToolChatMessage.Serialization.cs index c3072ce6b..6762d530b 100644 --- a/.dotnet/src/Generated/Models/ToolChatMessage.Serialization.cs +++ b/.dotnet/src/Generated/Models/ToolChatMessage.Serialization.cs @@ -51,20 +51,20 @@ internal static ToolChatMessage DeserializeToolChatMessage(JsonElement element, { return null; } - Chat.ChatMessageRole role = default; ChatMessageContent content = default; + Chat.ChatMessageRole role = default; IDictionary additionalBinaryDataProperties = new 
ChangeTrackingDictionary(); string toolCallId = default; foreach (var prop in element.EnumerateObject()) { - if (prop.NameEquals("role"u8)) + if (prop.NameEquals("content"u8)) { - role = prop.Value.GetString().ToChatMessageRole(); + DeserializeContentValue(prop, ref content); continue; } - if (prop.NameEquals("content"u8)) + if (prop.NameEquals("role"u8)) { - DeserializeContentValue(prop, ref content); + role = prop.Value.GetString().ToChatMessageRole(); continue; } if (prop.NameEquals("tool_call_id"u8)) @@ -78,7 +78,7 @@ internal static ToolChatMessage DeserializeToolChatMessage(JsonElement element, } } // CUSTOM: Initialize Content collection property. - return new ToolChatMessage(role, content ?? new ChatMessageContent(), additionalBinaryDataProperties, toolCallId); + return new ToolChatMessage(content ?? new ChatMessageContent(), role, additionalBinaryDataProperties, toolCallId); } BinaryData IPersistableModel.Write(ModelReaderWriterOptions options) => PersistableModelWriteCore(options); diff --git a/.dotnet/src/Generated/Models/ToolChatMessage.cs b/.dotnet/src/Generated/Models/ToolChatMessage.cs index 6e671b9c4..c0c786560 100644 --- a/.dotnet/src/Generated/Models/ToolChatMessage.cs +++ b/.dotnet/src/Generated/Models/ToolChatMessage.cs @@ -9,7 +9,7 @@ namespace OpenAI.Chat { public partial class ToolChatMessage : ChatMessage { - internal ToolChatMessage(Chat.ChatMessageRole role, ChatMessageContent content, IDictionary additionalBinaryDataProperties, string toolCallId) : base(role, content, additionalBinaryDataProperties) + internal ToolChatMessage(ChatMessageContent content, Chat.ChatMessageRole role, IDictionary additionalBinaryDataProperties, string toolCallId) : base(content, role, additionalBinaryDataProperties) { ToolCallId = toolCallId; } diff --git a/.dotnet/src/Generated/Models/UnknownRealtimeResponseItem.Serialization.cs b/.dotnet/src/Generated/Models/UnknownRealtimeResponseItem.Serialization.cs index f618f94fd..5a0d718b7 100644 --- 
a/.dotnet/src/Generated/Models/UnknownRealtimeResponseItem.Serialization.cs +++ b/.dotnet/src/Generated/Models/UnknownRealtimeResponseItem.Serialization.cs @@ -10,13 +10,13 @@ namespace OpenAI.RealtimeConversation { - internal partial class UnknownRealtimeResponseItem : IJsonModel + internal partial class UnknownRealtimeResponseItem : IJsonModel { internal UnknownRealtimeResponseItem() { } - void IJsonModel.Write(Utf8JsonWriter writer, ModelReaderWriterOptions options) + void IJsonModel.Write(Utf8JsonWriter writer, ModelReaderWriterOptions options) { writer.WriteStartObject(); JsonModelWriteCore(writer, options); @@ -25,25 +25,25 @@ void IJsonModel.Write(Utf8JsonWriter writer, Model protected override void JsonModelWriteCore(Utf8JsonWriter writer, ModelReaderWriterOptions options) { - string format = options.Format == "W" ? ((IPersistableModel)this).GetFormatFromOptions(options) : options.Format; + string format = options.Format == "W" ? ((IPersistableModel)this).GetFormatFromOptions(options) : options.Format; if (format != "J") { - throw new FormatException($"The model {nameof(InternalRealtimeResponseItem)} does not support writing '{format}' format."); + throw new FormatException($"The model {nameof(InternalRealtimeConversationResponseItem)} does not support writing '{format}' format."); } base.JsonModelWriteCore(writer, options); } - InternalRealtimeResponseItem IJsonModel.Create(ref Utf8JsonReader reader, ModelReaderWriterOptions options) => JsonModelCreateCore(ref reader, options); + InternalRealtimeConversationResponseItem IJsonModel.Create(ref Utf8JsonReader reader, ModelReaderWriterOptions options) => JsonModelCreateCore(ref reader, options); - protected override InternalRealtimeResponseItem JsonModelCreateCore(ref Utf8JsonReader reader, ModelReaderWriterOptions options) + protected override InternalRealtimeConversationResponseItem JsonModelCreateCore(ref Utf8JsonReader reader, ModelReaderWriterOptions options) { - string format = options.Format == "W" ? 
((IPersistableModel)this).GetFormatFromOptions(options) : options.Format; + string format = options.Format == "W" ? ((IPersistableModel)this).GetFormatFromOptions(options) : options.Format; if (format != "J") { - throw new FormatException($"The model {nameof(InternalRealtimeResponseItem)} does not support reading '{format}' format."); + throw new FormatException($"The model {nameof(InternalRealtimeConversationResponseItem)} does not support reading '{format}' format."); } using JsonDocument document = JsonDocument.ParseValue(ref reader); - return DeserializeInternalRealtimeResponseItem(document.RootElement, options); + return DeserializeInternalRealtimeConversationResponseItem(document.RootElement, options); } internal static UnknownRealtimeResponseItem DeserializeUnknownRealtimeResponseItem(JsonElement element, ModelReaderWriterOptions options) @@ -52,7 +52,7 @@ internal static UnknownRealtimeResponseItem DeserializeUnknownRealtimeResponseIt { return null; } - InternalRealtimeResponseItemObject @object = default; + InternalRealtimeConversationResponseItemObject @object = default; InternalRealtimeItemType @type = default; string id = default; IDictionary additionalBinaryDataProperties = new ChangeTrackingDictionary(); @@ -60,7 +60,7 @@ internal static UnknownRealtimeResponseItem DeserializeUnknownRealtimeResponseIt { if (prop.NameEquals("object"u8)) { - @object = new InternalRealtimeResponseItemObject(prop.Value.GetString()); + @object = new InternalRealtimeConversationResponseItemObject(prop.Value.GetString()); continue; } if (prop.NameEquals("type"u8)) @@ -86,37 +86,37 @@ internal static UnknownRealtimeResponseItem DeserializeUnknownRealtimeResponseIt return new UnknownRealtimeResponseItem(@object, @type, id, additionalBinaryDataProperties); } - BinaryData IPersistableModel.Write(ModelReaderWriterOptions options) => PersistableModelWriteCore(options); + BinaryData IPersistableModel.Write(ModelReaderWriterOptions options) => PersistableModelWriteCore(options); 
protected override BinaryData PersistableModelWriteCore(ModelReaderWriterOptions options) { - string format = options.Format == "W" ? ((IPersistableModel)this).GetFormatFromOptions(options) : options.Format; + string format = options.Format == "W" ? ((IPersistableModel)this).GetFormatFromOptions(options) : options.Format; switch (format) { case "J": return ModelReaderWriter.Write(this, options); default: - throw new FormatException($"The model {nameof(InternalRealtimeResponseItem)} does not support writing '{options.Format}' format."); + throw new FormatException($"The model {nameof(InternalRealtimeConversationResponseItem)} does not support writing '{options.Format}' format."); } } - InternalRealtimeResponseItem IPersistableModel.Create(BinaryData data, ModelReaderWriterOptions options) => PersistableModelCreateCore(data, options); + InternalRealtimeConversationResponseItem IPersistableModel.Create(BinaryData data, ModelReaderWriterOptions options) => PersistableModelCreateCore(data, options); - protected override InternalRealtimeResponseItem PersistableModelCreateCore(BinaryData data, ModelReaderWriterOptions options) + protected override InternalRealtimeConversationResponseItem PersistableModelCreateCore(BinaryData data, ModelReaderWriterOptions options) { - string format = options.Format == "W" ? ((IPersistableModel)this).GetFormatFromOptions(options) : options.Format; + string format = options.Format == "W" ? 
((IPersistableModel)this).GetFormatFromOptions(options) : options.Format; switch (format) { case "J": using (JsonDocument document = JsonDocument.Parse(data)) { - return DeserializeInternalRealtimeResponseItem(document.RootElement, options); + return DeserializeInternalRealtimeConversationResponseItem(document.RootElement, options); } default: - throw new FormatException($"The model {nameof(InternalRealtimeResponseItem)} does not support reading '{options.Format}' format."); + throw new FormatException($"The model {nameof(InternalRealtimeConversationResponseItem)} does not support reading '{options.Format}' format."); } } - string IPersistableModel.GetFormatFromOptions(ModelReaderWriterOptions options) => "J"; + string IPersistableModel.GetFormatFromOptions(ModelReaderWriterOptions options) => "J"; } } diff --git a/.dotnet/src/Generated/Models/UnknownRealtimeResponseItem.cs b/.dotnet/src/Generated/Models/UnknownRealtimeResponseItem.cs index bb08f7ea4..1f77d4c21 100644 --- a/.dotnet/src/Generated/Models/UnknownRealtimeResponseItem.cs +++ b/.dotnet/src/Generated/Models/UnknownRealtimeResponseItem.cs @@ -7,9 +7,9 @@ namespace OpenAI.RealtimeConversation { - internal partial class UnknownRealtimeResponseItem : InternalRealtimeResponseItem + internal partial class UnknownRealtimeResponseItem : InternalRealtimeConversationResponseItem { - internal UnknownRealtimeResponseItem(InternalRealtimeResponseItemObject @object, InternalRealtimeItemType @type, string id, IDictionary additionalBinaryDataProperties) : base(@object, @type != default ? @type : "unknown", id, additionalBinaryDataProperties) + internal UnknownRealtimeResponseItem(InternalRealtimeConversationResponseItemObject @object, InternalRealtimeItemType @type, string id, IDictionary additionalBinaryDataProperties) : base(@object, @type != default ? 
@type : "unknown", id, additionalBinaryDataProperties) { } } diff --git a/.dotnet/src/Generated/Models/UserChatMessage.Serialization.cs b/.dotnet/src/Generated/Models/UserChatMessage.Serialization.cs index 300edf218..c53bf0f9d 100644 --- a/.dotnet/src/Generated/Models/UserChatMessage.Serialization.cs +++ b/.dotnet/src/Generated/Models/UserChatMessage.Serialization.cs @@ -47,20 +47,20 @@ internal static UserChatMessage DeserializeUserChatMessage(JsonElement element, { return null; } - Chat.ChatMessageRole role = default; ChatMessageContent content = default; + Chat.ChatMessageRole role = default; IDictionary additionalBinaryDataProperties = new ChangeTrackingDictionary(); string participantName = default; foreach (var prop in element.EnumerateObject()) { - if (prop.NameEquals("role"u8)) + if (prop.NameEquals("content"u8)) { - role = prop.Value.GetString().ToChatMessageRole(); + DeserializeContentValue(prop, ref content); continue; } - if (prop.NameEquals("content"u8)) + if (prop.NameEquals("role"u8)) { - DeserializeContentValue(prop, ref content); + role = prop.Value.GetString().ToChatMessageRole(); continue; } if (prop.NameEquals("name"u8)) @@ -74,7 +74,7 @@ internal static UserChatMessage DeserializeUserChatMessage(JsonElement element, } } // CUSTOM: Initialize Content collection property. - return new UserChatMessage(role, content ?? new ChatMessageContent(), additionalBinaryDataProperties, participantName); + return new UserChatMessage(content ?? 
new ChatMessageContent(), role, additionalBinaryDataProperties, participantName); } BinaryData IPersistableModel.Write(ModelReaderWriterOptions options) => PersistableModelWriteCore(options); diff --git a/.dotnet/src/Generated/Models/UserChatMessage.cs b/.dotnet/src/Generated/Models/UserChatMessage.cs index 259270b7f..4d7bda7a0 100644 --- a/.dotnet/src/Generated/Models/UserChatMessage.cs +++ b/.dotnet/src/Generated/Models/UserChatMessage.cs @@ -9,7 +9,7 @@ namespace OpenAI.Chat { public partial class UserChatMessage : ChatMessage { - internal UserChatMessage(Chat.ChatMessageRole role, ChatMessageContent content, IDictionary additionalBinaryDataProperties, string participantName) : base(role, content, additionalBinaryDataProperties) + internal UserChatMessage(ChatMessageContent content, Chat.ChatMessageRole role, IDictionary additionalBinaryDataProperties, string participantName) : base(content, role, additionalBinaryDataProperties) { ParticipantName = participantName; } diff --git a/.dotnet/src/Generated/OpenAIModelFactory.cs b/.dotnet/src/Generated/OpenAIModelFactory.cs index 10419b84a..bb173a37f 100644 --- a/.dotnet/src/Generated/OpenAIModelFactory.cs +++ b/.dotnet/src/Generated/OpenAIModelFactory.cs @@ -523,7 +523,7 @@ public static ConversationInputSpeechFinishedUpdate ConversationInputSpeechFinis return new ConversationInputSpeechFinishedUpdate(eventId, RealtimeConversation.ConversationUpdateKind.InputSpeechStopped, additionalBinaryDataProperties: null, itemId, audioEndMs); } - public static ConversationItemCreatedUpdate ConversationItemCreatedUpdate(string eventId = default, string previousItemId = default, InternalRealtimeResponseItem internalItem = default) + public static ConversationItemCreatedUpdate ConversationItemCreatedUpdate(string eventId = default, string previousItemId = default, InternalRealtimeConversationResponseItem internalItem = default) { return new ConversationItemCreatedUpdate(eventId, 
RealtimeConversation.ConversationUpdateKind.ItemCreated, additionalBinaryDataProperties: null, previousItemId, internalItem); @@ -613,7 +613,7 @@ public static ConversationResponseFinishedUpdate ConversationResponseFinishedUpd return new ConversationResponseFinishedUpdate(eventId, RealtimeConversation.ConversationUpdateKind.ResponseFinished, additionalBinaryDataProperties: null, internalResponse); } - public static ConversationItemStreamingStartedUpdate ConversationItemStreamingStartedUpdate(string eventId = default, string responseId = default, int itemIndex = default, InternalRealtimeResponseItem internalItem = default) + public static ConversationItemStreamingStartedUpdate ConversationItemStreamingStartedUpdate(string eventId = default, string responseId = default, int itemIndex = default, InternalRealtimeConversationResponseItem internalItem = default) { return new ConversationItemStreamingStartedUpdate( @@ -625,7 +625,7 @@ public static ConversationItemStreamingStartedUpdate ConversationItemStreamingSt internalItem); } - public static ConversationItemStreamingFinishedUpdate ConversationItemStreamingFinishedUpdate(string eventId = default, string responseId = default, int outputIndex = default, InternalRealtimeResponseItem internalItem = default) + public static ConversationItemStreamingFinishedUpdate ConversationItemStreamingFinishedUpdate(string eventId = default, string responseId = default, int outputIndex = default, InternalRealtimeConversationResponseItem internalItem = default) { return new ConversationItemStreamingFinishedUpdate( @@ -890,7 +890,7 @@ public static ChatInputTokenUsageDetails ChatInputTokenUsageDetails(int audioTok return new ChatInputTokenUsageDetails(audioTokenCount, cachedTokenCount, additionalBinaryDataProperties: null); } - public static ChatCompletionOptions ChatCompletionOptions(float? frequencyPenalty = default, float? presencePenalty = default, ChatResponseFormat responseFormat = default, float? temperature = default, float? 
topP = default, IEnumerable tools = default, IEnumerable messages = default, InternalCreateChatCompletionRequestModel model = default, int? n = default, bool? stream = default, InternalChatCompletionStreamOptions streamOptions = default, bool? includeLogProbabilities = default, int? topLogProbabilityCount = default, IEnumerable stopSequences = default, IDictionary logitBiases = default, ChatToolChoice toolChoice = default, ChatFunctionChoice functionChoice = default, bool? allowParallelToolCalls = default, string endUserId = default, long? seed = default, int? deprecatedMaxTokens = default, int? maxOutputTokenCount = default, IEnumerable functions = default, IDictionary metadata = default, bool? storedOutputEnabled = default, InternalCreateChatCompletionRequestServiceTier? serviceTier = default) + public static ChatCompletionOptions ChatCompletionOptions(float? frequencyPenalty = default, float? presencePenalty = default, ChatResponseFormat responseFormat = default, float? temperature = default, float? topP = default, IEnumerable tools = default, IEnumerable messages = default, InternalCreateChatCompletionRequestModel model = default, int? n = default, bool? stream = default, InternalChatCompletionStreamOptions streamOptions = default, bool? includeLogProbabilities = default, int? topLogProbabilityCount = default, IEnumerable stopSequences = default, IDictionary logitBiases = default, ChatToolChoice toolChoice = default, ChatFunctionChoice functionChoice = default, bool? allowParallelToolCalls = default, string endUserId = default, long? seed = default, int? deprecatedMaxTokens = default, int? maxOutputTokenCount = default, IEnumerable functions = default, IDictionary metadata = default, bool? storedOutputEnabled = default, InternalCreateChatCompletionRequestServiceTier? 
serviceTier = default, IEnumerable internalModalities = default, ChatAudioOptions audioOptions = default) { tools ??= new ChangeTrackingList(); messages ??= new ChangeTrackingList(); @@ -898,6 +898,7 @@ public static ChatCompletionOptions ChatCompletionOptions(float? frequencyPenalt logitBiases ??= new ChangeTrackingDictionary(); functions ??= new ChangeTrackingList(); metadata ??= new ChangeTrackingDictionary(); + internalModalities ??= new ChangeTrackingList(); return new ChatCompletionOptions( frequencyPenalty, @@ -926,39 +927,48 @@ public static ChatCompletionOptions ChatCompletionOptions(float? frequencyPenalt metadata, storedOutputEnabled, serviceTier, + internalModalities?.ToList(), + audioOptions, additionalBinaryDataProperties: null); } - public static ChatMessage ChatMessage(string role = default, ChatMessageContent content = default) + public static ChatMessage ChatMessage(ChatMessageContent content = default, string role = default) { - return new InternalUnknownChatMessage(role.ToChatMessageRole(), content, additionalBinaryDataProperties: null); + return new InternalUnknownChatMessage(content, role.ToChatMessageRole(), additionalBinaryDataProperties: null); } public static SystemChatMessage SystemChatMessage(ChatMessageContent content = default, string participantName = default) { - return new SystemChatMessage(Chat.ChatMessageRole.System, content, additionalBinaryDataProperties: null, participantName); + return new SystemChatMessage(content, Chat.ChatMessageRole.System, additionalBinaryDataProperties: null, participantName); } public static UserChatMessage UserChatMessage(ChatMessageContent content = default, string participantName = default) { - return new UserChatMessage(Chat.ChatMessageRole.User, content, additionalBinaryDataProperties: null, participantName); + return new UserChatMessage(content, Chat.ChatMessageRole.User, additionalBinaryDataProperties: null, participantName); } - public static AssistantChatMessage 
AssistantChatMessage(ChatMessageContent content = default, string refusal = default, string participantName = default, IEnumerable toolCalls = default, ChatFunctionCall functionCall = default) + public static AssistantChatMessage AssistantChatMessage(ChatMessageContent content = default, string refusal = default, string participantName = default, IEnumerable toolCalls = default, ChatFunctionCall functionCall = default, ChatOutputAudioReference outputAudioReference = default) { toolCalls ??= new ChangeTrackingList(); return new AssistantChatMessage( - Chat.ChatMessageRole.Assistant, content, + Chat.ChatMessageRole.Assistant, additionalBinaryDataProperties: null, refusal, participantName, toolCalls?.ToList(), - functionCall); + functionCall, + outputAudioReference); + } + + public static ChatOutputAudioReference ChatOutputAudioReference(string id = default) + { + + return new ChatOutputAudioReference(id, additionalBinaryDataProperties: null); } public static ChatToolCall ChatToolCall(string id = default, InternalChatCompletionMessageToolCallFunction function = default, Chat.ChatToolCallKind kind = default) @@ -976,13 +986,19 @@ public static ChatFunctionCall ChatFunctionCall(string functionName = default, B public static ToolChatMessage ToolChatMessage(ChatMessageContent content = default, string toolCallId = default) { - return new ToolChatMessage(Chat.ChatMessageRole.Tool, content, additionalBinaryDataProperties: null, toolCallId); + return new ToolChatMessage(content, Chat.ChatMessageRole.Tool, additionalBinaryDataProperties: null, toolCallId); } public static FunctionChatMessage FunctionChatMessage(ChatMessageContent content = default, string functionName = default) { - return new FunctionChatMessage(Chat.ChatMessageRole.Function, content, additionalBinaryDataProperties: null, functionName); + return new FunctionChatMessage(content, Chat.ChatMessageRole.Function, additionalBinaryDataProperties: null, functionName); + } + + public static ChatAudioOptions 
ChatAudioOptions(ChatOutputAudioVoice outputAudioVoice = default, ChatOutputAudioFormat outputAudioFormat = default) + { + + return new ChatAudioOptions(outputAudioVoice, outputAudioFormat, additionalBinaryDataProperties: null); } public static ChatResponseFormat ChatResponseFormat(string @type = default) @@ -1019,6 +1035,12 @@ public static ChatCompletion ChatCompletion(string id = default, string model = additionalBinaryDataProperties: null); } + public static ChatOutputAudio ChatOutputAudio(string id = default, DateTimeOffset expiresAt = default, string transcript = default, BinaryData audioBytes = default) + { + + return new ChatOutputAudio(id, expiresAt, transcript, audioBytes, additionalBinaryDataProperties: null); + } + public static ChatTokenLogProbabilityDetails ChatTokenLogProbabilityDetails(string token = default, float logProbability = default, ReadOnlyMemory? utf8Bytes = default, IEnumerable topLogProbabilities = default) { topLogProbabilities ??= new ChangeTrackingList(); @@ -1225,10 +1247,22 @@ public static ChatMessageContent ChatMessageContent() return new ChatMessageContent(additionalBinaryDataProperties: null); } - public static ChatMessageContentPart ChatMessageContentPart(Chat.ChatMessageContentPartKind kind = default, string text = default, InternalChatCompletionRequestMessageContentPartImageImageUrl imageUri = default, string refusal = default) + public static ChatMessageContentPart ChatMessageContentPart(Chat.ChatMessageContentPartKind kind = default, string text = default, InternalChatCompletionRequestMessageContentPartImageImageUrl imageUri = default, string refusal = default, InternalChatCompletionRequestMessageContentPartAudioInputAudio inputAudio = default) + { + + return new ChatMessageContentPart( + kind, + text, + imageUri, + refusal, + inputAudio, + serializedAdditionalRawData: null); + } + + public static StreamingChatOutputAudioUpdate StreamingChatOutputAudioUpdate(string id = default, DateTimeOffset? 
expiresAt = default, string transcriptUpdate = default, BinaryData audioBytesUpdate = default) { - return new ChatMessageContentPart(kind, text, imageUri, refusal, serializedAdditionalRawData: null); + return new StreamingChatOutputAudioUpdate(id, expiresAt, transcriptUpdate, audioBytesUpdate, additionalBinaryDataProperties: null); } public static StreamingChatFunctionCallUpdate StreamingChatFunctionCallUpdate(string functionName = default, BinaryData functionArgumentsUpdate = default) diff --git a/.dotnet/tests/Chat/ChatSmokeTests.cs b/.dotnet/tests/Chat/ChatSmokeTests.cs index 9d1678080..c59e7dbc2 100644 --- a/.dotnet/tests/Chat/ChatSmokeTests.cs +++ b/.dotnet/tests/Chat/ChatSmokeTests.cs @@ -533,6 +533,119 @@ public void SerializeRefusalMessages() Assert.That(serialized, Does.Not.Contain("content")); } + [Test] + public void SerializeAudioThings() + { + // User audio input: wire-correlated ("real") content parts should cleanly serialize/deserialize + ChatMessageContentPart inputAudioContentPart = ChatMessageContentPart.CreateInputAudioPart( + BinaryData.FromBytes([0x4, 0x2]), + ChatInputAudioFormat.Mp3); + Assert.That(inputAudioContentPart, Is.Not.Null); + BinaryData serializedInputAudioContentPart = ModelReaderWriter.Write(inputAudioContentPart); + Assert.That(serializedInputAudioContentPart.ToString(), Does.Contain(@"""format"":""mp3""")); + ChatMessageContentPart deserializedInputAudioContentPart = ModelReaderWriter.Read(serializedInputAudioContentPart); + Assert.That(deserializedInputAudioContentPart.InputAudioBytes.ToArray()[1], Is.EqualTo(0x2)); + + AssistantChatMessage message = ModelReaderWriter.Read(BinaryData.FromBytes(""" + { + "role": "assistant", + "audio": { + "id": "audio_correlated_id_1234" + } + } + """u8.ToArray())); + Assert.That(message.Content, Has.Count.EqualTo(0)); + Assert.That(message.OutputAudioReference, Is.Not.Null); + Assert.That(message.OutputAudioReference.Id, Is.EqualTo("audio_correlated_id_1234")); + string serializedMessage = 
ModelReaderWriter.Write(message).ToString(); + Assert.That(serializedMessage, Does.Contain(@"""audio"":{""id"":""audio_correlated_id_1234""}")); + + AssistantChatMessage ordinaryTextAssistantMessage = new(["This was a message from the assistant"]); + ordinaryTextAssistantMessage.OutputAudioReference = new("extra-audio-id"); + BinaryData serializedLateAudioMessage = ModelReaderWriter.Write(ordinaryTextAssistantMessage); + Assert.That(serializedLateAudioMessage.ToString(), Does.Contain("was a message")); + Assert.That(serializedLateAudioMessage.ToString(), Does.Contain("extra-audio-id")); + + BinaryData rawAudioResponse = BinaryData.FromBytes(""" + { + "id": "chatcmpl-AOqyHuhjVDeGVbCZXJZ8mCLyl5nBq", + "object": "chat.completion", + "created": 1730486857, + "model": "gpt-4o-audio-preview-2024-10-01", + "choices": [ + { + "index": 0, + "message": { + "role": "assistant", + "content": null, + "refusal": null, + "audio": { + "id": "audio_6725224ac62481908ab55dc283289d87", + "data": "dHJ1bmNhdGVk", + "expires_at": 1730490458, + "transcript": "Hello there! How can I assist you with your test today?" 
+ } + }, + "finish_reason": "stop" + } + ], + "usage": { + "prompt_tokens": 28, + "completion_tokens": 97, + "total_tokens": 125, + "prompt_tokens_details": { + "cached_tokens": 0, + "text_tokens": 11, + "image_tokens": 0, + "audio_tokens": 17 + }, + "completion_tokens_details": { + "reasoning_tokens": 0, + "text_tokens": 23, + "audio_tokens": 74, + "accepted_prediction_tokens": 0, + "rejected_prediction_tokens": 0 + } + }, + "system_fingerprint": "fp_49254d0e9b" + } + """u8.ToArray()); + ChatCompletion audioCompletion = ModelReaderWriter.Read(rawAudioResponse); + Assert.That(audioCompletion, Is.Not.Null); + Assert.That(audioCompletion.Content, Has.Count.EqualTo(0)); + Assert.That(audioCompletion.OutputAudio, Is.Not.Null); + Assert.That(audioCompletion.OutputAudio.Id, Is.EqualTo("audio_6725224ac62481908ab55dc283289d87")); + Assert.That(audioCompletion.OutputAudio.AudioBytes, Is.Not.Null); + Assert.That(audioCompletion.OutputAudio.Transcript, Is.Not.Null.And.Not.Empty); + + AssistantChatMessage audioHistoryMessage = new(audioCompletion); + Assert.That(audioHistoryMessage.OutputAudioReference?.Id, Is.EqualTo(audioCompletion.OutputAudio.Id)); + + foreach (KeyValuePair modalitiesValueToKeyTextAndAudioPresenceItem + in new List>() + { + new(ChatResponseModalities.Default, (false, false, false)), + new(ChatResponseModalities.Default | ChatResponseModalities.Text, (true, true, false)), + new(ChatResponseModalities.Default | ChatResponseModalities.Audio, (true, false, true)), + new(ChatResponseModalities.Default | ChatResponseModalities.Text | ChatResponseModalities.Audio, (true, true, true)), + new(ChatResponseModalities.Text, (true, true, false)), + new(ChatResponseModalities.Audio, (true, false, true)), + new(ChatResponseModalities.Text | ChatResponseModalities.Audio, (true, true, true)), + }) + { + ChatResponseModalities modalitiesValue = modalitiesValueToKeyTextAndAudioPresenceItem.Key; + (bool keyExpected, bool textExpected, bool audioExpected) = 
modalitiesValueToKeyTextAndAudioPresenceItem.Value; + ChatCompletionOptions testOptions = new() + { + ResponseModalities = modalitiesValue, + }; + string serializedOptions = ModelReaderWriter.Write(testOptions).ToString().ToLower(); + Assert.That(serializedOptions.Contains("modalities"), Is.EqualTo(keyExpected)); + Assert.That(serializedOptions.Contains("text"), Is.EqualTo(textExpected)); + Assert.That(serializedOptions.Contains("audio"), Is.EqualTo(audioExpected)); + } + } + [Test] [TestCase(true)] [TestCase(false)] diff --git a/.dotnet/tests/Chat/ChatTests.cs b/.dotnet/tests/Chat/ChatTests.cs index b49439678..6a4120237 100644 --- a/.dotnet/tests/Chat/ChatTests.cs +++ b/.dotnet/tests/Chat/ChatTests.cs @@ -93,8 +93,6 @@ public void StreamingChat() latestTokenReceiptTime = stopwatch.Elapsed; usage ??= chatUpdate.Usage; updateCount++; - - Console.WriteLine(stopwatch.Elapsed.TotalMilliseconds); } stopwatch.Stop(); @@ -368,6 +366,88 @@ public async Task ChatWithVision() Assert.That(result.Value.Content[0].Text.ToLowerInvariant(), Does.Contain("dog").Or.Contain("cat").IgnoreCase); } + [Test] + public async Task ChatWithAudio() + { + ChatClient client = GetTestClient(TestScenario.Chat, "gpt-4o-audio-preview"); + + string helloWorldAudioPath = Path.Join("Assets", "audio_hello_world.mp3"); + BinaryData helloWorldAudioBytes = BinaryData.FromBytes(File.ReadAllBytes(helloWorldAudioPath)); + ChatMessageContentPart helloWorldAudioContentPart = ChatMessageContentPart.CreateInputAudioPart( + helloWorldAudioBytes, + ChatInputAudioFormat.Mp3); + string whatsTheWeatherAudioPath = Path.Join("Assets", "realtime_whats_the_weather_pcm16_24khz_mono.wav"); + BinaryData whatsTheWeatherAudioBytes = BinaryData.FromBytes(File.ReadAllBytes(whatsTheWeatherAudioPath)); + ChatMessageContentPart whatsTheWeatherAudioContentPart = ChatMessageContentPart.CreateInputAudioPart( + whatsTheWeatherAudioBytes, + ChatInputAudioFormat.Wav); + + List messages = [new 
UserChatMessage([helloWorldAudioContentPart])]; + + ChatCompletionOptions options = new() + { + ResponseModalities = ChatResponseModalities.Text | ChatResponseModalities.Audio, + AudioOptions = new(ChatOutputAudioVoice.Alloy, ChatOutputAudioFormat.Pcm16) + }; + + ChatCompletion completion = await client.CompleteChatAsync(messages, options); + Assert.That(completion, Is.Not.Null); + Assert.That(completion.Content, Has.Count.EqualTo(0)); + + ChatOutputAudio outputAudio = completion.OutputAudio; + Assert.That(outputAudio, Is.Not.Null); + Assert.That(outputAudio.Id, Is.Not.Null.And.Not.Empty); + Assert.That(outputAudio.AudioBytes, Is.Not.Null); + Assert.That(outputAudio.Transcript, Is.Not.Null.And.Not.Empty); + + AssistantChatMessage audioHistoryMessage = ChatMessage.CreateAssistantMessage(completion); + Assert.That(audioHistoryMessage, Is.InstanceOf<AssistantChatMessage>()); + Assert.That(audioHistoryMessage.Content, Has.Count.EqualTo(0)); + + Assert.That(audioHistoryMessage.OutputAudioReference?.Id, Is.EqualTo(completion.OutputAudio.Id)); + messages.Add(audioHistoryMessage); + + messages.Add( + new UserChatMessage( + [ + "Please answer the following spoken question:", + ChatMessageContentPart.CreateInputAudioPart(whatsTheWeatherAudioBytes, ChatInputAudioFormat.Wav), + ])); + + string streamedCorrelationId = null; + DateTimeOffset? 
streamedExpiresAt = null; + StringBuilder streamedTranscriptBuilder = new(); + using MemoryStream outputAudioStream = new(); + await foreach (StreamingChatCompletionUpdate update in client.CompleteChatStreamingAsync(messages, options)) + { + Assert.That(update.ContentUpdate, Has.Count.EqualTo(0)); + StreamingChatOutputAudioUpdate outputAudioUpdate = update.OutputAudioUpdate; + + if (outputAudioUpdate is not null) + { + string serializedOutputAudioUpdate = ModelReaderWriter.Write(outputAudioUpdate).ToString(); + Assert.That(serializedOutputAudioUpdate, Is.Not.Null.And.Not.Empty); + + if (outputAudioUpdate.Id is not null) + { + Assert.That(streamedCorrelationId, Is.Null.Or.EqualTo(outputAudioUpdate.Id)); + streamedCorrelationId ??= outputAudioUpdate.Id; + } + if (outputAudioUpdate.ExpiresAt.HasValue) + { + Assert.That(streamedExpiresAt.HasValue, Is.False); + streamedExpiresAt = outputAudioUpdate.ExpiresAt; + } + streamedTranscriptBuilder.Append(outputAudioUpdate.TranscriptUpdate); + outputAudioStream.Write(outputAudioUpdate.AudioBytesUpdate); + } + } + Assert.That(streamedCorrelationId, Is.Not.Null.And.Not.Empty); + Assert.That(streamedExpiresAt.HasValue, Is.True); + Assert.That(streamedTranscriptBuilder.ToString(), Is.Not.Null.And.Not.Empty); + Assert.That(outputAudioStream.Length, Is.GreaterThan(9000)); + } + [Test] public async Task AuthFailure() { diff --git a/.dotnet/tests/Utility/TestHelpers.cs b/.dotnet/tests/Utility/TestHelpers.cs index a781a91a2..2bd540023 100644 --- a/.dotnet/tests/Utility/TestHelpers.cs +++ b/.dotnet/tests/Utility/TestHelpers.cs @@ -17,6 +17,7 @@ using System.Collections.Generic; using System.IO; using System.Linq; +using System.Text.RegularExpressions; [assembly: LevelOfParallelism(8)] @@ -49,7 +50,7 @@ public static T GetTestClient(TestScenario scenario, string overrideModel = n { options ??= new(); ApiKeyCredential credential = new(Environment.GetEnvironmentVariable("OPENAI_API_KEY")); - options.AddPolicy(GetDumpPolicy(), 
PipelinePosition.PerTry); + options.AddPolicy(GetDumpPolicy(), PipelinePosition.BeforeTransport); object clientObject = scenario switch { #pragma warning disable OPENAI001 @@ -81,36 +82,48 @@ private static PipelinePolicy GetDumpPolicy() { return new TestPipelinePolicy((message) => { - Console.WriteLine($"--- New request ---"); - IEnumerable headerPairs = message?.Request?.Headers?.Select(header => $"{header.Key}={(header.Key.ToLower().Contains("auth") ? "***" : header.Value)}"); - string headers = string.Join(',', headerPairs); - Console.WriteLine($"Headers: {headers}"); - Console.WriteLine($"{message?.Request?.Method} URI: {message?.Request?.Uri}"); - if (message.Request?.Content != null) + if (message.Request is not null && message.Response is null) { - string contentType = "Unknown Content Type"; - if (message.Request.Headers?.TryGetValue("Content-Type", out contentType) == true - && contentType == "application/json") + Console.WriteLine($"--- New request ---"); + IEnumerable headerPairs = message?.Request?.Headers?.Select(header => $"{header.Key}={(header.Key.ToLower().Contains("auth") ? "***" : header.Value)}"); + string headers = string.Join(',', headerPairs); + Console.WriteLine($"Headers: {headers}"); + Console.WriteLine($"{message?.Request?.Method} URI: {message?.Request?.Uri}"); + if (message.Request?.Content != null) { - using MemoryStream stream = new(); - message.Request.Content.WriteTo(stream, default); - stream.Position = 0; - using StreamReader reader = new(stream); - Console.WriteLine(reader.ReadToEnd()); - } - else - { - string length = message.Request.Content.TryComputeLength(out long numberLength) - ? 
$"{numberLength} bytes" - : "unknown length"; - Console.WriteLine($"<< Non-JSON content: {contentType} >> {length}"); + string contentType = "Unknown Content Type"; + if (message.Request.Headers?.TryGetValue("Content-Type", out contentType) == true + && contentType == "application/json") + { + using MemoryStream stream = new(); + message.Request.Content.WriteTo(stream, default); + stream.Position = 0; + using StreamReader reader = new(stream); + string requestDump = reader.ReadToEnd(); + requestDump = Regex.Replace(requestDump, @"""data"":[\\w\\r\\n]*""[^""]*""", @"""data"":""..."""); + Console.WriteLine(requestDump); + } + else + { + string length = message.Request.Content.TryComputeLength(out long numberLength) + ? $"{numberLength} bytes" + : "unknown length"; + Console.WriteLine($"<< Non-JSON content: {contentType} >> {length}"); + } } } if (message.Response != null) { - Console.WriteLine("--- Begin response content ---"); - Console.WriteLine(message.Response.Content?.ToString()); - Console.WriteLine("--- End of response content ---"); + if (message.BufferResponse) + { + Console.WriteLine("--- Begin response content ---"); + Console.WriteLine(message.Response.Content?.ToString()); + Console.WriteLine("--- End of response content ---"); + } + else + { + Console.WriteLine("--- Response (unbuffered, content not rendered) ---"); + } } }); } diff --git a/.dotnet/tests/Utility/TestPipelinePolicy.cs b/.dotnet/tests/Utility/TestPipelinePolicy.cs index 688c407ed..c596295f4 100644 --- a/.dotnet/tests/Utility/TestPipelinePolicy.cs +++ b/.dotnet/tests/Utility/TestPipelinePolicy.cs @@ -17,19 +17,15 @@ public TestPipelinePolicy(Action processMessageAction) public override void Process(PipelineMessage message, IReadOnlyList pipeline, int currentIndex) { - _processMessageAction(message); - if (currentIndex < pipeline.Count - 1) - { - pipeline[currentIndex + 1].Process(message, pipeline, currentIndex + 1); - } + _processMessageAction(message); // for request + 
ProcessNext(message, pipeline, currentIndex); + _processMessageAction(message); // for response } public override async ValueTask ProcessAsync(PipelineMessage message, IReadOnlyList pipeline, int currentIndex) { - _processMessageAction(message); - if (currentIndex < pipeline.Count - 1) - { - await pipeline[currentIndex + 1].ProcessAsync(message, pipeline, currentIndex + 1); - } + _processMessageAction(message); // for request + await ProcessNextAsync(message, pipeline, currentIndex); + _processMessageAction(message); // for response } } \ No newline at end of file diff --git a/.openapi3/openapi3-openai.yaml b/.openapi3/openapi3-openai.yaml index faf6a1a53..2f373d995 100644 --- a/.openapi3/openapi3-openai.yaml +++ b/.openapi3/openapi3-openai.yaml @@ -3820,6 +3820,19 @@ components: parameters: $ref: '#/components/schemas/FunctionParameters' deprecated: true + ChatCompletionMessageAudioChunk: + type: object + properties: + id: + type: string + transcript: + type: string + data: + type: string + format: base64 + expires_at: + type: integer + format: unixtime ChatCompletionMessageToolCall: type: object required: @@ -3878,6 +3891,24 @@ components: items: $ref: '#/components/schemas/ChatCompletionMessageToolCall' description: The tool calls generated by the model, such as function calls. + ChatCompletionModalities: + type: array + items: + type: string + enum: + - text + - audio + description: |- + Output types that you would like the model to generate for this request. + Most models are capable of generating text, which is the default: + + `["text"]` + + The `gpt-4o-audio-preview` model can also be used to [generate audio](/docs/guides/audio). To + request that this model generate both text and audio responses, you can + use: + + `["text", "audio"]` ChatCompletionNamedToolChoice: type: object required: @@ -3924,6 +3955,19 @@ components: name: type: string description: An optional name for the participant. 
Provides the model information to differentiate between participants of the same role. + audio: + type: object + properties: + id: + type: string + description: Unique identifier for a previous audio response from the model. + required: + - id + nullable: true + description: |- + Data about a previous audio response from the model. + [Learn more](/docs/guides/audio). + x-oaiExpandable: true tool_calls: $ref: '#/components/schemas/ChatCompletionMessageToolCallsItem' function_call: @@ -3985,6 +4029,34 @@ components: tool: '#/components/schemas/ChatCompletionRequestToolMessage' function: '#/components/schemas/ChatCompletionRequestFunctionMessage' x-oaiExpandable: true + ChatCompletionRequestMessageContentPartAudio: + type: object + required: + - type + - input_audio + properties: + type: + type: string + enum: + - input_audio + description: The type of the content part. Always `input_audio`. + input_audio: + type: object + properties: + data: + type: string + format: base64 + description: Base64 encoded audio data. + format: + type: string + enum: + - wav + - mp3 + description: The format of the encoded audio data. Currently supports "wav" and "mp3". + required: + - data + - format + description: Learn about [audio inputs](/docs/guides/audio). ChatCompletionRequestMessageContentPartImage: type: object required: @@ -4013,6 +4085,7 @@ components: default: auto required: - url + description: Learn about [image inputs](/docs/guides/vision). ChatCompletionRequestMessageContentPartRefusal: type: object required: @@ -4041,6 +4114,7 @@ components: text: type: string description: The text content. + description: Learn about [text inputs](/docs/guides/text-generation). 
ChatCompletionRequestSystemMessage: type: object required: @@ -4126,6 +4200,7 @@ components: anyOf: - $ref: '#/components/schemas/ChatCompletionRequestMessageContentPartText' - $ref: '#/components/schemas/ChatCompletionRequestMessageContentPartImage' + - $ref: '#/components/schemas/ChatCompletionRequestMessageContentPartAudio' x-oaiExpandable: true ChatCompletionResponseMessage: type: object @@ -4162,6 +4237,38 @@ components: - arguments description: Deprecated and replaced by `tool_calls`. The name and arguments of a function that should be called, as generated by the model. deprecated: true + audio: + type: object + properties: + id: + type: string + description: Unique identifier for this audio response. + expires_at: + type: integer + format: unixtime + description: |- + The Unix timestamp (in seconds) for when this audio response will + no longer be accessible on the server for use in multi-turn + conversations. + data: + type: string + format: base64 + description: |- + Base64 encoded audio bytes generated by the model, in the format + specified in the request. + transcript: + type: string + description: Transcript of the audio generated by the model. + required: + - id + - expires_at + - data + - transcript + nullable: true + description: |- + If the audio output modality is requested, this object contains data + about the audio response from the model. [Learn more](/docs/guides/audio). + x-oaiExpandable: true description: A chat completion message generated by the model. ChatCompletionRole: type: string @@ -4182,6 +4289,10 @@ components: ChatCompletionStreamResponseDelta: type: object properties: + audio: + allOf: + - $ref: '#/components/schemas/ChatCompletionMessageAudioChunk' + description: Response audio associated with the streaming chat delta payload. 
content: type: string nullable: true @@ -4609,7 +4720,11 @@ components: items: $ref: '#/components/schemas/ChatCompletionRequestMessage' minItems: 1 - description: A list of messages comprising the conversation so far. Depending on the [model](/docs/models) you use, different message types (modalities) are supported, like [text](/docs/guides/text-generation), [images](/docs/guides/vision), and audio. + description: |- + A list of messages comprising the conversation so far. Depending on the + [model](/docs/models) you use, different message types (modalities) are + supported, like [text](/docs/guides/text-generation), + [images](/docs/guides/vision), and [audio](/docs/guides/audio). model: anyOf: - type: string @@ -4624,6 +4739,8 @@ components: - gpt-4o-2024-05-13 - gpt-4o-realtime-preview - gpt-4o-realtime-preview-2024-10-01 + - gpt-4o-audio-preview + - gpt-4o-audio-preview-2024-10-01 - chatgpt-4o-latest - gpt-4o-mini - gpt-4o-mini-2024-07-18 @@ -4720,6 +4837,45 @@ components: maximum: 128 description: How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep `n` as `1` to minimize costs. default: 1 + modalities: + type: object + allOf: + - $ref: '#/components/schemas/ChatCompletionModalities' + nullable: true + audio: + type: object + properties: + voice: + type: string + enum: + - alloy + - echo + - fable + - onyx + - nova + - shimmer + description: |- + Specifies the voice type. Supported voices are `alloy`, `echo`, + `fable`, `onyx`, `nova`, and `shimmer`. + format: + type: string + enum: + - wav + - mp3 + - flac + - opus + - pcm16 + description: |- + Specifies the output audio format. Must be one of `wav`, `mp3`, `flac`, + `opus`, or `pcm16`. + required: + - voice + - format + nullable: true + description: |- + Parameters for audio output. Required when audio output is requested with + `modalities: ["audio"]`. [Learn more](/docs/guides/audio). 
+ x-oaiExpandable: true presence_penalty: type: number format: float @@ -5053,6 +5209,7 @@ components: - completion_tokens - prompt_tokens - total_tokens + nullable: true description: |- An optional field that will only be present when you set `stream_options: {"include_usage": true}` in your request. When present, it contains a null value except for the last chunk which contains the token usage statistics for the entire request. @@ -9061,17 +9218,17 @@ components: type: string enum: - conversation.item.create - description: The event type, must be "conversation.item.create". + description: The event type, must be `conversation.item.create`. previous_item_id: type: string - description: The ID of the preceding item after which the new item will be inserted. + description: The ID of the preceding item after which the new item will be inserted. If not set, the new item will be appended to the end of the conversation. If set, it allows an item to be inserted mid-conversation. If the ID cannot be found, an error will be returned and the item will not be added. item: - allOf: - - $ref: '#/components/schemas/RealtimeRequestItem' - description: The item to add to the conversation. + $ref: '#/components/schemas/RealtimeConversationRequestItem' allOf: - $ref: '#/components/schemas/RealtimeClientEvent' - description: Send this event when adding an item to the conversation. + description: |- + Add a new Item to the Conversation's context, including messages, function calls, and function call responses. This event can be used both to populate a "history" of the conversation and to add new items mid-stream, but has the current limitation that it cannot populate assistant audio messages. + If successful, the server will respond with a `conversation.item.created` event, otherwise an `error` event will be sent. RealtimeClientEventConversationItemDelete: type: object required: @@ -9088,7 +9245,7 @@ components: description: The ID of the item to delete. 
allOf: - $ref: '#/components/schemas/RealtimeClientEvent' - description: Send this event when you want to remove any item from the conversation history. + description: Send this event when you want to remove any item from the conversation history. The server will respond with a `conversation.item.deleted` event, unless the item does not exist in the conversation history, in which case the server will respond with an error. RealtimeClientEventConversationItemTruncate: type: object required: @@ -9104,18 +9261,21 @@ components: description: The event type, must be "conversation.item.truncate". item_id: type: string - description: The ID of the assistant message item to truncate. + description: The ID of the assistant message item to truncate. Only assistant message items can be truncated. content_index: type: integer format: int32 - description: The index of the content part to truncate. + description: The index of the content part to truncate. Set this to 0. audio_end_ms: type: integer format: int32 - description: Inclusive duration up to which audio is truncated, in milliseconds. + description: Inclusive duration up to which audio is truncated, in milliseconds. If the audio_end_ms is greater than the actual audio duration, the server will respond with an error. allOf: - $ref: '#/components/schemas/RealtimeClientEvent' - description: Send this event when you want to truncate a previous assistant message’s audio. + description: |- + Send this event to truncate a previous assistant message’s audio. The server will produce audio faster than realtime, so this event is useful when the user interrupts to truncate audio that has already been sent to the client but not yet played. This will synchronize the server's understanding of the audio with the client's playback. + Truncating audio will delete the server-side text transcript to ensure there is not text in the context that hasn't been heard by the user. 
+ If successful, the server will respond with a `conversation.item.truncated` event. RealtimeClientEventInputAudioBufferAppend: type: object required: @@ -9130,10 +9290,12 @@ components: audio: type: string format: base64 - description: Base64-encoded audio bytes. + description: Base64-encoded audio bytes. This must be in the format specified by the `input_audio_format` field in the session configuration. allOf: - $ref: '#/components/schemas/RealtimeClientEvent' - description: Send this event to append audio bytes to the input audio buffer. + description: |- + Send this event to append audio bytes to the input audio buffer. The audio buffer is temporary storage you can write to and later commit. In Server VAD mode, the audio buffer is used to detect speech and the server will decide when to commit. When Server VAD is disabled, you must commit the audio buffer manually. + The client may choose how much audio to place in each event up to a maximum of 15 MiB; for example, streaming smaller chunks from the client may allow the VAD to be more responsive. Unlike most other client events, the server will not send a confirmation response to this event. RealtimeClientEventInputAudioBufferClear: type: object required: @@ -9146,7 +9308,7 @@ components: description: The event type, must be "input_audio_buffer.clear". allOf: - $ref: '#/components/schemas/RealtimeClientEvent' - description: Send this event to clear the audio bytes in the buffer. + description: Send this event to clear the audio bytes in the buffer. The server will respond with an `input_audio_buffer.cleared` event. RealtimeClientEventInputAudioBufferCommit: type: object required: @@ -9159,7 +9321,9 @@ components: description: The event type, must be "input_audio_buffer.commit". allOf: - $ref: '#/components/schemas/RealtimeClientEvent' - description: Send this event to commit audio bytes to a user message. 
+ description: |- + Send this event to commit the user input audio buffer, which will create a new user message item in the conversation. This event will produce an error if the input audio buffer is empty. When in Server VAD mode, the client does not need to send this event, the server will commit the audio buffer automatically. + Committing the input audio buffer will trigger input audio transcription (if enabled in session configuration), but it will not create a response from the model. The server will respond with an `input_audio_buffer.committed` event. RealtimeClientEventResponseCancel: type: object required: @@ -9169,10 +9333,10 @@ components: type: string enum: - response.cancel - description: The event type, must be "response.cancel". + description: The event type, must be `response.cancel`. allOf: - $ref: '#/components/schemas/RealtimeClientEvent' - description: Send this event to cancel an in-progress response. + description: Send this event to cancel an in-progress response. The server will respond with a `response.cancelled` event or an error if there is no response to cancel. RealtimeClientEventResponseCreate: type: object required: @@ -9183,48 +9347,16 @@ components: type: string enum: - response.create - description: The event type, must be "response.create". + description: The event type, must be `response.create`. response: - type: object - properties: - modalities: - type: array - items: - type: string - description: The modalities for the response. - instructions: - type: string - description: Instructions for the model. - voice: - type: string - description: The voice the model uses to respond - one of `alloy`, `echo`, or `shimmer`. - output_audio_format: - type: string - description: The format of output audio. - tools: - type: array - items: - $ref: '#/components/schemas/RealtimeTool' - description: Tools (functions) available to the model. - tool_choice: - type: string - description: How the model chooses tools. 
- temperature: - type: number - format: float - description: Sampling temperature. - max_output_tokens: - anyOf: - - type: integer - format: int32 - - type: string - enum: - - inf - description: Maximum number of output tokens for a single assistant response, inclusive of tool calls. Provide an integer between 1 and 4096 to limit output tokens, or "inf" for the maximum available tokens for a given model. Defaults to "inf". - description: Configuration for the response. + $ref: '#/components/schemas/RealtimeResponseOptions' allOf: - $ref: '#/components/schemas/RealtimeClientEvent' - description: Send this event to trigger a response generation. + description: |- + This event instructs the server to create a Response, which means triggering model inference. When in Server VAD mode, the server will create Responses automatically. + A Response will include at least one Item, and may have two, in which case the second will be a function call. These Items will be appended to the conversation history. + The server will respond with a `response.created` event, events for Items and content created, and finally a `response.done` event to indicate the Response is complete. + The `response.create` event includes inference configuration like `instructions`, and `temperature`. These fields will override the Session's configuration for this Response only. RealtimeClientEventSessionUpdate: type: object required: @@ -9237,12 +9369,10 @@ components: - session.update description: The event type, must be "session.update". session: - allOf: - - $ref: '#/components/schemas/RealtimeRequestSession' - description: Session configuration to update. + $ref: '#/components/schemas/RealtimeRequestSession' allOf: - $ref: '#/components/schemas/RealtimeClientEvent' - description: Send this event to update the session’s default configuration. + description: Send this event to update the session’s default configuration. 
The client may send this event at any time to update the session configuration, and any field may be updated at any time, except for "voice". The server will respond with a `session.updated` event that shows the full effective configuration. Only fields that are present are updated, thus the correct way to clear a field like "instructions" is to pass an empty string. RealtimeClientEventType: anyOf: - type: string @@ -9280,6 +9410,46 @@ components: - input_audio - text - audio + RealtimeConversationItemBase: + type: object + description: The item to add to the conversation. + RealtimeConversationRequestItem: + type: object + required: + - type + properties: + type: + $ref: '#/components/schemas/RealtimeItemType' + id: + type: string + discriminator: + propertyName: type + mapping: + message: '#/components/schemas/RealtimeRequestMessageItem' + function_call: '#/components/schemas/RealtimeRequestFunctionCallItem' + function_call_output: '#/components/schemas/RealtimeRequestFunctionCallOutputItem' + RealtimeConversationResponseItem: + type: object + required: + - object + - type + - id + properties: + object: + type: string + enum: + - realtime.item + type: + $ref: '#/components/schemas/RealtimeItemType' + id: + type: string + nullable: true + discriminator: + propertyName: type + mapping: + message: '#/components/schemas/RealtimeResponseMessageItem' + function_call: '#/components/schemas/RealtimeResponseFunctionCallItem' + function_call_output: '#/components/schemas/RealtimeResponseFunctionCallOutputItem' RealtimeFunctionTool: type: object required: @@ -9372,7 +9542,7 @@ components: status: $ref: '#/components/schemas/RealtimeItemStatus' allOf: - - $ref: '#/components/schemas/RealtimeRequestItem' + - $ref: '#/components/schemas/RealtimeConversationRequestItem' RealtimeRequestFunctionCallOutputItem: type: object required: @@ -9389,22 +9559,7 @@ components: output: type: string allOf: - - $ref: '#/components/schemas/RealtimeRequestItem' - RealtimeRequestItem: - type: 
object - required: - - type - properties: - type: - $ref: '#/components/schemas/RealtimeItemType' - id: - type: string - discriminator: - propertyName: type - mapping: - message: '#/components/schemas/RealtimeRequestMessageItem' - function_call: '#/components/schemas/RealtimeRequestFunctionCallItem' - function_call_output: '#/components/schemas/RealtimeRequestFunctionCallOutputItem' + - $ref: '#/components/schemas/RealtimeConversationRequestItem' RealtimeRequestMessageItem: type: object required: @@ -9426,7 +9581,7 @@ components: user: '#/components/schemas/RealtimeRequestUserMessageItem' assistant: '#/components/schemas/RealtimeRequestAssistantMessageItem' allOf: - - $ref: '#/components/schemas/RealtimeRequestItem' + - $ref: '#/components/schemas/RealtimeConversationRequestItem' RealtimeRequestMessageReferenceItem: type: object required: @@ -9559,7 +9714,7 @@ components: output: type: array items: - $ref: '#/components/schemas/RealtimeResponseItem' + $ref: '#/components/schemas/RealtimeConversationResponseItem' usage: type: object properties: @@ -9621,6 +9776,9 @@ components: nullable: true allOf: - $ref: '#/components/schemas/RealtimeContentPart' + RealtimeResponseBase: + type: object + description: The response resource. 
RealtimeResponseFunctionCallItem: type: object required: @@ -9643,7 +9801,7 @@ components: status: $ref: '#/components/schemas/RealtimeItemStatus' allOf: - - $ref: '#/components/schemas/RealtimeResponseItem' + - $ref: '#/components/schemas/RealtimeConversationResponseItem' RealtimeResponseFunctionCallOutputItem: type: object required: @@ -9660,29 +9818,7 @@ components: output: type: string allOf: - - $ref: '#/components/schemas/RealtimeResponseItem' - RealtimeResponseItem: - type: object - required: - - object - - type - - id - properties: - object: - type: string - enum: - - realtime.item - type: - $ref: '#/components/schemas/RealtimeItemType' - id: - type: string - nullable: true - discriminator: - propertyName: type - mapping: - message: '#/components/schemas/RealtimeResponseMessageItem' - function_call: '#/components/schemas/RealtimeResponseFunctionCallItem' - function_call_output: '#/components/schemas/RealtimeResponseFunctionCallOutputItem' + - $ref: '#/components/schemas/RealtimeConversationResponseItem' RealtimeResponseMessageItem: type: object required: @@ -9705,7 +9841,50 @@ components: status: $ref: '#/components/schemas/RealtimeItemStatus' allOf: - - $ref: '#/components/schemas/RealtimeResponseItem' + - $ref: '#/components/schemas/RealtimeConversationResponseItem' + RealtimeResponseOptions: + type: object + properties: + modalities: + type: array + items: + type: string + enum: + - text + - audio + description: The modalities for the response. + instructions: + type: string + description: Instructions for the model. + voice: + allOf: + - $ref: '#/components/schemas/RealtimeVoice' + description: The voice the model uses to respond - one of `alloy`, `echo`, or `shimmer`. + output_audio_format: + allOf: + - $ref: '#/components/schemas/RealtimeAudioFormat' + description: The format of output audio. + tools: + type: array + items: + $ref: '#/components/schemas/RealtimeTool' + description: Tools (functions) available to the model. 
+ tool_choice: + allOf: + - $ref: '#/components/schemas/RealtimeToolChoice' + description: How the model chooses tools. + temperature: + type: number + format: float + description: Sampling temperature. + max_output_tokens: + anyOf: + - type: integer + format: int32 + - type: string + enum: + - inf + description: Maximum number of output tokens for a single assistant response, inclusive of tool calls. Provide an integer between 1 and 4096 to limit output tokens, or "inf" for the maximum available tokens for a given model. Defaults to "inf". RealtimeResponseSession: type: object required: @@ -9881,17 +10060,19 @@ components: type: string enum: - conversation.item.created - description: The event type, must be "conversation.item.created". + description: The event type, must be `conversation.item.created`. previous_item_id: type: string - description: The ID of the preceding item. + description: The ID of the preceding item in the Conversation context, allows the client to understand the order of the conversation. item: - allOf: - - $ref: '#/components/schemas/RealtimeResponseItem' - description: The item that was created. + $ref: '#/components/schemas/RealtimeConversationResponseItem' allOf: - $ref: '#/components/schemas/RealtimeServerEvent' - description: Returned when a conversation item is created. + description: |- + Returned when a conversation item is created. There are several scenarios that produce this event: + - The server is generating a Response, which if successful will produce either one or two Items, which will be of type `message` (role `assistant`) or type `function_call`. + - The input audio buffer has been committed, either by the client or the server (in `server_vad` mode). The server will take the content of the input audio buffer and add it to a new user message Item. + - The client has sent a `conversation.item.create` event to add a new Item to the Conversation. 
RealtimeServerEventConversationItemDeleted: type: object required: @@ -9902,13 +10083,13 @@ components: type: string enum: - conversation.item.deleted - description: The event type, must be "conversation.item.deleted". + description: The event type, must be `conversation.item.deleted`. item_id: type: string description: The ID of the item that was deleted. allOf: - $ref: '#/components/schemas/RealtimeServerEvent' - description: Returned when an item in the conversation is deleted. + description: Returned when an item in the conversation is deleted by the client with a `conversation.item.delete` event. This event is used to synchronize the server's understanding of the conversation history with the client's view. RealtimeServerEventConversationItemInputAudioTranscriptionCompleted: type: object required: @@ -9921,10 +10102,10 @@ components: type: string enum: - conversation.item.input_audio_transcription.completed - description: The event type, must be "conversation.item.input_audio_transcription.completed". + description: The event type, must be `conversation.item.input_audio_transcription.completed`. item_id: type: string - description: The ID of the user message item. + description: The ID of the user message item containing the audio. content_index: type: integer format: int32 @@ -9934,7 +10115,9 @@ components: description: The transcribed text. allOf: - $ref: '#/components/schemas/RealtimeServerEvent' - description: Returned when input audio transcription is enabled and a transcription succeeds. + description: |- + This event is the output of audio transcription for user audio written to the user audio buffer. Transcription begins when the input audio buffer is committed by the client or server (in `server_vad` mode). Transcription runs asynchronously with Response creation, so this event may come before or after the Response events. 
+ Realtime API models accept audio natively, and thus input transcription is a separate process run on a separate ASR (Automatic Speech Recognition) model, currently always `whisper-1`. Thus the transcript may diverge somewhat from the model's interpretation, and should be treated as a rough guide. RealtimeServerEventConversationItemInputAudioTranscriptionFailed: type: object required: @@ -9947,7 +10130,7 @@ components: type: string enum: - conversation.item.input_audio_transcription.failed - description: The event type, must be "conversation.item.input_audio_transcription.failed". + description: The event type, must be `conversation.item.input_audio_transcription.failed`. item_id: type: string description: The ID of the user message item. @@ -9973,7 +10156,7 @@ components: description: Details of the transcription error. allOf: - $ref: '#/components/schemas/RealtimeServerEvent' - description: Returned when input audio transcription is configured, and a transcription request for a user message failed. + description: Returned when input audio transcription is configured, and a transcription request for a user message failed. These events are separate from other `error` events so that the client can identify the related Item. RealtimeServerEventConversationItemTruncated: type: object required: @@ -9986,7 +10169,7 @@ components: type: string enum: - conversation.item.truncated - description: The event type, must be "conversation.item.truncated". + description: The event type, must be `conversation.item.truncated`. item_id: type: string description: The ID of the assistant message item that was truncated. @@ -10000,7 +10183,9 @@ components: description: The duration up to which the audio was truncated, in milliseconds. allOf: - $ref: '#/components/schemas/RealtimeServerEvent' - description: Returned when an earlier assistant audio message item is truncated by the client. 
+ description: |- + Returned when an earlier assistant audio message item is truncated by the client with a `conversation.item.truncate` event. This event is used to synchronize the server's understanding of the audio with the client's playback. + This action will truncate the audio and remove the server-side text transcript to ensure there is no text in the context that hasn't been heard by the user. RealtimeServerEventError: type: object required: @@ -10033,7 +10218,7 @@ components: description: Details of the error. allOf: - $ref: '#/components/schemas/RealtimeServerEvent' - description: Returned when an error occurs. + description: Returned when an error occurs, which could be a client problem or a server problem. Most errors are recoverable and the session will stay open, we recommend to implementors to monitor and log error messages by default. RealtimeServerEventInputAudioBufferCleared: type: object required: @@ -10043,10 +10228,10 @@ components: type: string enum: - input_audio_buffer.cleared - description: The event type, must be "input_audio_buffer.cleared". + description: The event type, must be `input_audio_buffer.cleared`. allOf: - $ref: '#/components/schemas/RealtimeServerEvent' - description: Returned when the input audio buffer is cleared by the client. + description: Returned when the input audio buffer is cleared by the client with a `input_audio_buffer.clear` event. RealtimeServerEventInputAudioBufferCommitted: type: object required: @@ -10058,7 +10243,7 @@ components: type: string enum: - input_audio_buffer.committed - description: The event type, must be "input_audio_buffer.committed". + description: The event type, must be `input_audio_buffer.committed`. previous_item_id: type: string description: The ID of the preceding item after which the new item will be inserted. @@ -10067,7 +10252,7 @@ components: description: The ID of the user message item that will be created. 
allOf: - $ref: '#/components/schemas/RealtimeServerEvent' - description: Returned when an input audio buffer is committed, either by the client or automatically in server VAD mode. + description: Returned when an input audio buffer is committed, either by the client or automatically in server VAD mode. The `item_id` property is the ID of the user message item that will be created, thus a `conversation.item.created` event will also be sent to the client. RealtimeServerEventInputAudioBufferSpeechStarted: type: object required: @@ -10079,17 +10264,17 @@ components: type: string enum: - input_audio_buffer.speech_started - description: The event type, must be "input_audio_buffer.speech_started". + description: The event type, must be `input_audio_buffer.speech_started`. audio_start_ms: type: integer format: int32 - description: Milliseconds since the session started when speech was detected. + description: Milliseconds from the start of all audio written to the buffer during the session when speech was first detected. This will correspond to the beginning of audio sent to the model, and thus includes the `prefix_padding_ms` configured in the Session. item_id: type: string description: The ID of the user message item that will be created when speech stops. allOf: - $ref: '#/components/schemas/RealtimeServerEvent' - description: Returned in server turn detection mode when speech is detected. + description: Sent by the server when in `server_vad` mode to indicate that speech has been detected in the audio buffer. This can happen any time audio is added to the buffer (unless speech is already detected). The client may want to use this event to interrupt audio playback or provide visual feedback to the user. The client should expect to receive a `input_audio_buffer.speech_stopped` event when speech stops. 
The `item_id` property is the ID of the user message item that will be created when speech stops and will also be included in the `input_audio_buffer.speech_stopped` event (unless the client manually commits the audio buffer during VAD activation). RealtimeServerEventInputAudioBufferSpeechStopped: type: object required: @@ -10101,17 +10286,17 @@ components: type: string enum: - input_audio_buffer.speech_stopped - description: The event type, must be "input_audio_buffer.speech_stopped". + description: The event type, must be `input_audio_buffer.speech_stopped`. audio_end_ms: type: integer format: int32 - description: Milliseconds since the session started when speech stopped. + description: Milliseconds since the session started when speech stopped. This will correspond to the end of audio sent to the model, and thus includes the `min_silence_duration_ms` configured in the Session. item_id: type: string description: The ID of the user message item that will be created. allOf: - $ref: '#/components/schemas/RealtimeServerEvent' - description: Returned in server turn detection mode when speech stops. + description: Returned in `server_vad` mode when the server detects the end of speech in the audio buffer. The server will also send an `conversation.item.created` event with the user message item that is created from the audio buffer. RealtimeServerEventRateLimitsUpdated: type: object required: @@ -10122,7 +10307,7 @@ components: type: string enum: - rate_limits.updated - description: The event type, must be "rate_limits.updated". + description: The event type, must be `rate_limits.updated`. rate_limits: type: array items: @@ -10130,7 +10315,7 @@ components: description: List of rate limit information. allOf: - $ref: '#/components/schemas/RealtimeServerEvent' - description: Emitted after every "response.done" event to indicate the updated rate limits. + description: Emitted at the beginning of a Response to indicate the updated rate limits. 
When a Response is created some tokens will be "reserved" for the output tokens, the rate limits shown here reflect that reservation, which is then adjusted accordingly once the Response is completed. RealtimeServerEventRateLimitsUpdatedRateLimitsItem: type: object required: @@ -10373,14 +10558,12 @@ components: type: string enum: - response.created - description: The event type, must be "response.created". + description: The event type, must be `response.created`. response: - allOf: - - $ref: '#/components/schemas/RealtimeResponse' - description: The response resource. + $ref: '#/components/schemas/RealtimeResponse' allOf: - $ref: '#/components/schemas/RealtimeServerEvent' - description: Returned when a new Response is created. The first event of response creation, where the response is in an initial state of "in_progress". + description: Returned when a new Response is created. The first event of response creation, where the response is in an initial state of `in_progress`. RealtimeServerEventResponseDone: type: object required: @@ -10393,12 +10576,10 @@ components: - response.done description: The event type, must be "response.done". response: - allOf: - - $ref: '#/components/schemas/RealtimeResponse' - description: The response resource. + $ref: '#/components/schemas/RealtimeResponse' allOf: - $ref: '#/components/schemas/RealtimeServerEvent' - description: Returned when a Response is done streaming. Always emitted, no matter the final state. + description: Returned when a Response is done streaming. Always emitted, no matter the final state. The Response object included in the `response.done` event will include all output Items in the Response but will omit the raw audio data. RealtimeServerEventResponseFunctionCallArgumentsDelta: type: object required: @@ -10479,21 +10660,19 @@ components: type: string enum: - response.output_item.added - description: The event type, must be "response.output_item.added". 
+ description: The event type, must be `response.output_item.added`. response_id: type: string - description: The ID of the response to which the item belongs. + description: The ID of the Response to which the item belongs. output_index: type: integer format: int32 - description: The index of the output item in the response. + description: The index of the output item in the Response. item: - allOf: - - $ref: '#/components/schemas/RealtimeResponseItem' - description: The item that was added. + $ref: '#/components/schemas/RealtimeConversationResponseItem' allOf: - $ref: '#/components/schemas/RealtimeServerEvent' - description: Returned when a new Item is created during response generation. + description: Returned when a new Item is created during Response generation. RealtimeServerEventResponseOutputItemDone: type: object required: @@ -10506,18 +10685,16 @@ components: type: string enum: - response.output_item.done - description: The event type, must be "response.output_item.done". + description: The event type, must be `response.output_item.done`. response_id: type: string - description: The ID of the response to which the item belongs. + description: The ID of the Response to which the item belongs. output_index: type: integer format: int32 - description: The index of the output item in the response. + description: The index of the output item in the Response. item: - allOf: - - $ref: '#/components/schemas/RealtimeResponseItem' - description: The completed item. + $ref: '#/components/schemas/RealtimeConversationResponseItem' allOf: - $ref: '#/components/schemas/RealtimeServerEvent' description: Returned when an Item is done streaming. Also emitted when a Response is interrupted, incomplete, or cancelled. @@ -10601,14 +10778,12 @@ components: type: string enum: - session.created - description: The event type, must be "session.created". + description: The event type, must be `session.created`. 
session: - allOf: - - $ref: '#/components/schemas/RealtimeResponseSession' - description: The session resource. + $ref: '#/components/schemas/RealtimeResponseSession' allOf: - $ref: '#/components/schemas/RealtimeServerEvent' - description: Returned when a session is created. Emitted automatically when a new connection is established. + description: Returned when a Session is created. Emitted automatically when a new connection is established as the first server event. This event will contain the default Session configuration. RealtimeServerEventSessionUpdated: type: object required: @@ -10621,12 +10796,10 @@ components: - session.updated description: The event type, must be "session.updated". session: - allOf: - - $ref: '#/components/schemas/RealtimeResponseSession' - description: The updated session resource. + $ref: '#/components/schemas/RealtimeResponseSession' allOf: - $ref: '#/components/schemas/RealtimeServerEvent' - description: Returned when a session is updated. + description: Returned when a session is updated with a `session.update` event, unless there is an error. RealtimeServerEventType: anyOf: - type: string @@ -10681,6 +10854,9 @@ components: format: duration allOf: - $ref: '#/components/schemas/RealtimeTurnDetection' + RealtimeSessionBase: + type: object + description: Realtime session object configuration. RealtimeTool: type: object required: diff --git a/.scripts/Edit-Serialization.ps1 b/.scripts/Edit-Serialization.ps1 index 3a74cbeaa..0d1c06ffb 100644 --- a/.scripts/Edit-Serialization.ps1 +++ b/.scripts/Edit-Serialization.ps1 @@ -9,6 +9,7 @@ Update-In-File-With-Retry ` "return new InternalChatCompletionResponseMessage\(" " refusal," " toolCalls \?\? new ChangeTrackingList\(\)," + " audio," " role," " content," " functionCall," @@ -19,6 +20,7 @@ Update-In-File-With-Retry ` "return new InternalChatCompletionResponseMessage(" " refusal," " toolCalls ?? new ChangeTrackingList()," + " audio," " role," " content ?? 
new ChatMessageContent()," " functionCall," @@ -43,6 +45,7 @@ Update-In-File-With-Retry ` -FilePath "$directory\InternalChatCompletionStreamResponseDelta.Serialization.cs" ` -SearchPatternLines @( "return new InternalChatCompletionStreamResponseDelta\(" + " audio," " functionCall," " toolCalls \?\? new ChangeTrackingList\(\)," " refusal," @@ -53,6 +56,7 @@ Update-In-File-With-Retry ` -ReplacePatternLines @( "// CUSTOM: Initialize Content collection property." "return new InternalChatCompletionStreamResponseDelta(" + " audio," " functionCall," " toolCalls ?? new ChangeTrackingList()," " refusal," @@ -79,24 +83,26 @@ Update-In-File-With-Retry ` -FilePath "$directory\AssistantChatMessage.Serialization.cs" ` -SearchPatternLines @( "return new AssistantChatMessage\(" - " role," " content," + " role," " additionalBinaryDataProperties," " refusal," " participantName," " toolCalls \?\? new ChangeTrackingList\(\)," - " functionCall\);" + " functionCall," + " outputAudioReference\);" ) ` -ReplacePatternLines @( "// CUSTOM: Initialize Content collection property." "return new AssistantChatMessage(" - " role," " content ?? new ChatMessageContent()," + " role," " additionalBinaryDataProperties," " refusal," " participantName," " toolCalls ?? new ChangeTrackingList()," - " functionCall);" + " functionCall," + " outputAudioReference);" ) ` -OutputIndentation 12 ` -RequirePresence @@ -104,11 +110,11 @@ Update-In-File-With-Retry ` Update-In-File-With-Retry ` -FilePath "$directory\FunctionChatMessage.Serialization.cs" ` -SearchPatternLines @( - "return new FunctionChatMessage\(role, content, additionalBinaryDataProperties, functionName\);" + "return new FunctionChatMessage\(content, role, additionalBinaryDataProperties, functionName\);" ) ` -ReplacePatternLines @( "// CUSTOM: Initialize Content collection property." - "return new FunctionChatMessage(role, content ?? new ChatMessageContent(), additionalBinaryDataProperties, functionName);" + "return new FunctionChatMessage(content ?? 
new ChatMessageContent(), role, additionalBinaryDataProperties, functionName);" ) ` -OutputIndentation 12 ` -RequirePresence @@ -116,11 +122,11 @@ Update-In-File-With-Retry ` Update-In-File-With-Retry ` -FilePath "$directory\SystemChatMessage.Serialization.cs" ` -SearchPatternLines @( - "return new SystemChatMessage\(role, content, additionalBinaryDataProperties, participantName\);" + "return new SystemChatMessage\(content, role, additionalBinaryDataProperties, participantName\);" ) ` -ReplacePatternLines @( "// CUSTOM: Initialize Content collection property." - "return new SystemChatMessage(role, content ?? new ChatMessageContent(), additionalBinaryDataProperties, participantName);" + "return new SystemChatMessage(content ?? new ChatMessageContent(), role, additionalBinaryDataProperties, participantName);" ) ` -OutputIndentation 12 ` -RequirePresence @@ -128,11 +134,11 @@ Update-In-File-With-Retry ` Update-In-File-With-Retry ` -FilePath "$directory\ToolChatMessage.Serialization.cs" ` -SearchPatternLines @( - "return new ToolChatMessage\(role, content, additionalBinaryDataProperties, toolCallId\);" + "return new ToolChatMessage\(content, role, additionalBinaryDataProperties, toolCallId\);" ) ` -ReplacePatternLines @( "// CUSTOM: Initialize Content collection property." - "return new ToolChatMessage(role, content ?? new ChatMessageContent(), additionalBinaryDataProperties, toolCallId);" + "return new ToolChatMessage(content ?? 
new ChatMessageContent(), role, additionalBinaryDataProperties, toolCallId);" ) ` -OutputIndentation 12 ` -RequirePresence @@ -140,11 +146,11 @@ Update-In-File-With-Retry ` Update-In-File-With-Retry ` -FilePath "$directory\UserChatMessage.Serialization.cs" ` -SearchPatternLines @( - "return new UserChatMessage\(role, content, additionalBinaryDataProperties, participantName\);" + "return new UserChatMessage\(content, role, additionalBinaryDataProperties, participantName\);" ) ` -ReplacePatternLines @( "// CUSTOM: Initialize Content collection property." - "return new UserChatMessage(role, content ?? new ChatMessageContent(), additionalBinaryDataProperties, participantName);" + "return new UserChatMessage(content ?? new ChatMessageContent(), role, additionalBinaryDataProperties, participantName);" ) ` -OutputIndentation 12 ` -RequirePresence @@ -152,11 +158,11 @@ Update-In-File-With-Retry ` Update-In-File-With-Retry ` -FilePath "$directory\InternalUnknownChatMessage.Serialization.cs" ` -SearchPatternLines @( - "return new InternalUnknownChatMessage\(role, content, additionalBinaryDataProperties\);" + "return new InternalUnknownChatMessage\(content, role, additionalBinaryDataProperties\);" ) ` -ReplacePatternLines @( "// CUSTOM: Initialize Content collection property." - "return new InternalUnknownChatMessage(role, content ?? new ChatMessageContent(), additionalBinaryDataProperties);" + "return new InternalUnknownChatMessage(content ?? new ChatMessageContent(), role, additionalBinaryDataProperties);" ) ` -OutputIndentation 12 ` -RequirePresence @@ -165,24 +171,26 @@ Update-In-File-With-Retry ` -FilePath "$directory\InternalFineTuneChatCompletionRequestAssistantMessage.Serialization.cs" ` -SearchPatternLines @( "return new InternalFineTuneChatCompletionRequestAssistantMessage\(" - " role," " content," + " role," " additionalBinaryDataProperties," " refusal," " participantName," " toolCalls \?\? 
new ChangeTrackingList\(\)," - " functionCall\);" + " functionCall," + " outputAudioReference\);" ) ` -ReplacePatternLines @( "// CUSTOM: Initialize Content collection property." "return new InternalFineTuneChatCompletionRequestAssistantMessage(" - " role," " content ?? new ChatMessageContent()," + " role," " additionalBinaryDataProperties," " refusal," " participantName," " toolCalls ?? new ChangeTrackingList()," - " functionCall);" + " functionCall," + " outputAudioReference);" ) ` -OutputIndentation 12 ` - -RequirePresence \ No newline at end of file + -RequirePresence diff --git a/.typespec/chat/custom.tsp b/.typespec/chat/custom.tsp index b3db01297..ea6cc0684 100644 --- a/.typespec/chat/custom.tsp +++ b/.typespec/chat/custom.tsp @@ -29,3 +29,14 @@ model ChatCompletionToolChoice {} model ChatMessageContent {} model ChatMessageContentPart {} + +model ChatCompletionMessageAudioChunk { + id?: string; + transcript?: string; + + @encode("base64") + data?: bytes; + + @encode("unixTimestamp", int32) + expires_at?: utcDateTime; +} diff --git a/.typespec/chat/models.tsp b/.typespec/chat/models.tsp index 058a341e5..7563d5fe5 100644 --- a/.typespec/chat/models.tsp +++ b/.typespec/chat/models.tsp @@ -14,7 +14,12 @@ namespace OpenAI; model ChatCompletionTokenLogprobBytes is int32[]; model CreateChatCompletionRequest { - /** A list of messages comprising the conversation so far. Depending on the [model](/docs/models) you use, different message types (modalities) are supported, like [text](/docs/guides/text-generation), [images](/docs/guides/vision), and audio. */ + /** + * A list of messages comprising the conversation so far. Depending on the + * [model](/docs/models) you use, different message types (modalities) are + * supported, like [text](/docs/guides/text-generation), + * [images](/docs/guides/vision), and [audio](/docs/guides/audio). 
+ */ @minItems(1) messages: ChatCompletionRequestMessage[]; @@ -31,6 +36,8 @@ model CreateChatCompletionRequest { | "gpt-4o-2024-05-13" | "gpt-4o-realtime-preview" | "gpt-4o-realtime-preview-2024-10-01" + | "gpt-4o-audio-preview" + | "gpt-4o-audio-preview-2024-10-01" | "chatgpt-4o-latest" | "gpt-4o-mini" | "gpt-4o-mini-2024-07-18" @@ -113,6 +120,27 @@ model CreateChatCompletionRequest { @maxValue(128) n?: int32 | null = 1; + modalities?: ChatCompletionModalities | null; + + @doc(""" + Parameters for audio output. Required when audio output is requested with + `modalities: ["audio"]`. [Learn more](/docs/guides/audio). + """) + @extension("x-oaiExpandable", true) + audio?: { + @doc(""" + Specifies the voice type. Supported voices are `alloy`, `echo`, + `fable`, `onyx`, `nova`, and `shimmer`. + """) + voice: "alloy" | "echo" | "fable" | "onyx" | "nova" | "shimmer"; + + @doc(""" + Specifies the output audio format. Must be one of `wav`, `mp3`, `flac`, + `opus`, or `pcm16`. + """) + format: "wav" | "mp3" | "flac" | "opus" | "pcm16"; + } | null; + /** * Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics. * @@ -322,6 +350,7 @@ union ChatCompletionToolChoiceOption { ChatCompletionNamedToolChoice, } +/** Learn about [text inputs](/docs/guides/text-generation). */ model ChatCompletionRequestMessageContentPartText { /** The type of the content part. */ type: "text"; @@ -330,6 +359,7 @@ model ChatCompletionRequestMessageContentPartText { text: string; } +/** Learn about [image inputs](/docs/guides/vision). */ model ChatCompletionRequestMessageContentPartImage { /** The type of the content part. */ type: "image_url"; @@ -379,6 +409,24 @@ model ChatCompletionRequestMessage { role: string; } +/** Learn about [audio inputs](/docs/guides/audio). */ +model ChatCompletionRequestMessageContentPartAudio { + @doc(""" + The type of the content part. 
Always `input_audio`. + """) + type: "input_audio"; + + input_audio: { + // Tool customization: use encoded type for audio data + /** Base64 encoded audio data. */ + @encode("base64") + data: bytes; + + /** The format of the encoded audio data. Currently supports "wav" and "mp3". */ + format: "wav" | "mp3"; + }; +} + @extension("x-oaiExpandable", true) union ChatCompletionRequestSystemMessageContentPart { ChatCompletionRequestMessageContentPartText, @@ -388,6 +436,7 @@ union ChatCompletionRequestSystemMessageContentPart { union ChatCompletionRequestUserMessageContentPart { ChatCompletionRequestMessageContentPartText, ChatCompletionRequestMessageContentPartImage, + ChatCompletionRequestMessageContentPartAudio, } @extension("x-oaiExpandable", true) @@ -450,6 +499,16 @@ model ChatCompletionRequestAssistantMessage /** An optional name for the participant. Provides the model information to differentiate between participants of the same role. */ name?: string; + /** + * Data about a previous audio response from the model. + * [Learn more](/docs/guides/audio). + */ + @extension("x-oaiExpandable", true) + audio?: { + /** Unique identifier for a previous audio response from the model. */ + id: string; + } | null; + tool_calls?: ChatCompletionMessageToolCalls; // Tool customization: preserve earlier, more intuitive field order @@ -496,6 +555,20 @@ model ChatCompletionRequestFunctionMessage /** The tool calls generated by the model, such as function calls. */ model ChatCompletionMessageToolCalls is ChatCompletionMessageToolCall[]; +@doc(""" + Output types that you would like the model to generate for this request. + Most models are capable of generating text, which is the default: + + `["text"]` + + The `gpt-4o-audio-preview` model can also be used to [generate audio](/docs/guides/audio). 
To + request that this model generate both text and audio responses, you can + use: + + `["text", "audio"]` + """) +model ChatCompletionModalities is ("text" | "audio")[]; + // Tool customization: convert to enum /** The role of the author of a message */ enum ChatCompletionRole { @@ -571,6 +644,36 @@ model ChatCompletionResponseMessage { name: string; arguments: string; }; + + /** + * If the audio output modality is requested, this object contains data + * about the audio response from the model. [Learn more](/docs/guides/audio). + */ + @extension("x-oaiExpandable", true) + audio?: { + /** Unique identifier for this audio response. */ + id: string; + + // Tool customization: 'created' and fields ending in '_at' are Unix encoded utcDateTime + /** + * The Unix timestamp (in seconds) for when this audio response will + * no longer be accessible on the server for use in multi-turn + * conversations. + */ + @encode("unixTimestamp", int32) + expires_at: utcDateTime; + + // Tool customization: use encoded type for audio data + /** + * Base64 encoded audio bytes generated by the model, in the format + * specified in the request. + */ + @encode("base64") + data: bytes; + + /** Transcript of the audio generated by the model. */ + transcript: string; + } | null; } model ChatCompletionTokenLogprob { @@ -625,8 +728,12 @@ model ChatCompletionFunctions { parameters?: FunctionParameters; } +// Tool customization: Add a missing "audio" to the chat streaming delta definition /** A chat completion delta generated by streamed model responses. */ model ChatCompletionStreamResponseDelta { + /** Response audio associated with the streaming chat delta payload. */ + audio?: ChatCompletionMessageAudioChunk; + /** The contents of the chunk message. */ content?: string | null; @@ -748,7 +855,7 @@ model CreateChatCompletionStreamResponse { /** Total number of tokens used in the request (prompt + completion). 
*/ total_tokens: int32; - }; + } | null; } /** Represents a streamed chunk of a chat completion response returned by model, based on the provided input. */ diff --git a/.typespec/realtime/custom.tsp b/.typespec/realtime/custom.tsp index cb2588bea..3ce8eff8d 100644 --- a/.typespec/realtime/custom.tsp +++ b/.typespec/realtime/custom.tsp @@ -7,6 +7,7 @@ using TypeSpec.OpenAPI; namespace OpenAI; model RealtimeRequestSession { + ...RealtimeSessionBase; modalities?: RealtimeModalities; instructions?: string; voice?: RealtimeVoice; @@ -17,17 +18,65 @@ model RealtimeRequestSession { tools?: RealtimeTool[]; tool_choice?: RealtimeToolChoice; temperature?: float32; - - // Note: spec errata for 'max_output_tokens' max_response_output_tokens?: int32 | "inf"; } +model RealtimeResponseSession { + ...RealtimeSessionBase; + object: "realtime.session"; + id: string; + `model`: string; + modalities: RealtimeModalities; + instructions: string; + voice: RealtimeVoice; + input_audio_format: RealtimeAudioFormat; + output_audio_format: RealtimeAudioFormat; + input_audio_transcription: RealtimeAudioInputTranscriptionSettings | null; + turn_detection: RealtimeTurnDetection; + tools: RealtimeTool[]; + tool_choice: RealtimeToolChoice; + temperature: float32; + max_response_output_tokens: int32 | "inf" | null; +} + +model RealtimeResponseOptions { + ...RealtimeResponseBase; + + /** The modalities for the response. */ + modalities?: RealtimeModalities; + + /** Instructions for the model. */ + instructions?: string; + + @doc(""" + The voice the model uses to respond - one of `alloy`, `echo`, or `shimmer`. + """) + voice?: RealtimeVoice; + + /** The format of output audio. */ + output_audio_format?: RealtimeAudioFormat; + + // Tool customization: apply enriched tool definition hierarchy + /** Tools (functions) available to the model. */ + tools?: RealtimeTool[]; + + /** How the model chooses tools. */ + tool_choice?: RealtimeToolChoice; + + /** Sampling temperature. 
*/ + temperature?: float32; + + /** Maximum number of output tokens for a single assistant response, inclusive of tool calls. Provide an integer between 1 and 4096 to limit output tokens, or "inf" for the maximum available tokens for a given model. Defaults to "inf". */ + max_output_tokens?: int32 | "inf"; +} + model RealtimeResponse { + ...RealtimeResponseBase; object: "realtime.response"; id: string; status: RealtimeResponseStatus = RealtimeResponseStatus.in_progress; status_details: RealtimeResponseStatusDetails | null; - output: RealtimeResponseItem[]; + output: RealtimeConversationResponseItem[]; usage: { total_tokens: int32; input_tokens: int32; @@ -44,23 +93,6 @@ model RealtimeResponse { }; } -model RealtimeResponseSession { - object: "realtime.session"; - id: string; - `model`: string; - modalities: RealtimeModalities; - instructions: string; - voice: RealtimeVoice; - input_audio_format: RealtimeAudioFormat; - output_audio_format: RealtimeAudioFormat; - input_audio_transcription: RealtimeAudioInputTranscriptionSettings | null; - turn_detection: RealtimeTurnDetection; - tools: RealtimeTool[]; - tool_choice: RealtimeToolChoice; - temperature: float32; - max_response_output_tokens: int32 | "inf" | null; -} - union RealtimeVoice { string, alloy: "alloy", diff --git a/.typespec/realtime/custom/items.tsp b/.typespec/realtime/custom/items.tsp index 8e939b412..43ae09ef2 100644 --- a/.typespec/realtime/custom/items.tsp +++ b/.typespec/realtime/custom/items.tsp @@ -5,13 +5,14 @@ using TypeSpec.OpenAPI; namespace OpenAI; @discriminator("type") -model RealtimeRequestItem { +model RealtimeConversationRequestItem { + ...RealtimeConversationItemBase; type: RealtimeItemType; id?: string; } @discriminator("role") -model RealtimeRequestMessageItem extends RealtimeRequestItem { +model RealtimeRequestMessageItem extends RealtimeConversationRequestItem { type: RealtimeItemType.message; role: RealtimeMessageRole; status?: RealtimeItemStatus; @@ -32,7 +33,7 @@ model 
RealtimeRequestAssistantMessageItem extends RealtimeRequestMessageItem { content: RealtimeRequestTextContentPart[]; } -model RealtimeRequestFunctionCallItem extends RealtimeRequestItem { +model RealtimeRequestFunctionCallItem extends RealtimeConversationRequestItem { type: RealtimeItemType.function_call; name: string; call_id: string; @@ -40,7 +41,8 @@ model RealtimeRequestFunctionCallItem extends RealtimeRequestItem { status?: RealtimeItemStatus; } -model RealtimeRequestFunctionCallOutputItem extends RealtimeRequestItem { +model RealtimeRequestFunctionCallOutputItem + extends RealtimeConversationRequestItem { type: RealtimeItemType.function_call_output; call_id: string; output: string; @@ -49,26 +51,28 @@ model RealtimeRequestFunctionCallOutputItem extends RealtimeRequestItem { // TODO: representation of a doubly-discriminated type with an absent second discriminator // (first discriminator: type = message; second discriminator: no role present) -model RealtimeRequestMessageReferenceItem { // extends RealtimeRequestItem { +model RealtimeRequestMessageReferenceItem { // extends RealtimeConversationRequestItem { type: RealtimeItemType.message; id: string; } @discriminator("type") -model RealtimeResponseItem { +model RealtimeConversationResponseItem { + ...RealtimeConversationItemBase; object: "realtime.item"; type: RealtimeItemType; id: string | null; } -model RealtimeResponseMessageItem extends RealtimeResponseItem { +model RealtimeResponseMessageItem extends RealtimeConversationResponseItem { type: RealtimeItemType.message; role: RealtimeMessageRole; content: RealtimeContentPart[]; status: RealtimeItemStatus; } -model RealtimeResponseFunctionCallItem extends RealtimeResponseItem { +model RealtimeResponseFunctionCallItem + extends RealtimeConversationResponseItem { type: RealtimeItemType.function_call; name: string; call_id: string; @@ -76,7 +80,8 @@ model RealtimeResponseFunctionCallItem extends RealtimeResponseItem { status: RealtimeItemStatus; } -model 
RealtimeResponseFunctionCallOutputItem extends RealtimeResponseItem { +model RealtimeResponseFunctionCallOutputItem + extends RealtimeConversationResponseItem { type: RealtimeItemType.function_call_output; call_id: string; output: string; diff --git a/.typespec/realtime/models.tsp b/.typespec/realtime/models.tsp index 047c48f9c..0826558f8 100644 --- a/.typespec/realtime/models.tsp +++ b/.typespec/realtime/models.tsp @@ -10,32 +10,53 @@ using TypeSpec.OpenAPI; namespace OpenAI; // Tool customization: apply discriminated type base -/** Send this event to update the session’s default configuration. */ +@doc(""" + Send this event to update the session’s default configuration. The client may send this event at any time to update the session configuration, and any field may be updated at any time, except for "voice". The server will respond with a `session.updated` event that shows the full effective configuration. Only fields that are present are updated, thus the correct way to clear a field like "instructions" is to pass an empty string. + """) model RealtimeClientEventSessionUpdate extends RealtimeClientEvent { // Tool customization: apply discriminated type base /** The event type, must be "session.update". */ type: RealtimeClientEventType.session_update; - // Tool customization: apply shared session type - /** Session configuration to update. */ + // Tool customization: apply enriched request-specific model session: RealtimeRequestSession; } +// Tool customization: establish custom, enriched discriminated type hierarchy +/** The item to add to the conversation. */ +model RealtimeConversationItemBase { + /** Customized to enriched RealtimeConversation{Request,Response}Item models */ +} + +// Tool customization: apply enriched response type +/** The response resource. 
*/ +model RealtimeResponseBase { + /** applied in enriched RealtimeResponse */ +} + // Tool customization: apply discriminated type base -/** Send this event to append audio bytes to the input audio buffer. */ +/** + * Send this event to append audio bytes to the input audio buffer. The audio buffer is temporary storage you can write to and later commit. In Server VAD mode, the audio buffer is used to detect speech and the server will decide when to commit. When Server VAD is disabled, you must commit the audio buffer manually. + * The client may choose how much audio to place in each event up to a maximum of 15 MiB, for example streaming smaller chunks from the client may allow the VAD to be more responsive. Unlike most other client events, the server will not send a confirmation response to this event. + */ model RealtimeClientEventInputAudioBufferAppend extends RealtimeClientEvent { // Tool customization: apply discriminated type base /** The event type, must be "input_audio_buffer.append". */ type: RealtimeClientEventType.input_audio_buffer_append; // Tool customization: use encoded type for audio data - /** Base64-encoded audio bytes. */ + @doc(""" + Base64-encoded audio bytes. This must be in the format specified by the `input_audio_format` field in the session configuration. + """) @encode("base64") audio: bytes; } // Tool customization: apply discriminated type base -/** Send this event to commit audio bytes to a user message. */ +@doc(""" + Send this event to commit the user input audio buffer, which will create a new user message item in the conversation. This event will produce an error if the input audio buffer is empty. When in Server VAD mode, the client does not need to send this event, the server will commit the audio buffer automatically. + Committing the input audio buffer will trigger input audio transcription (if enabled in session configuration), but it will not create a response from the model. 
The server will respond with an `input_audio_buffer.committed` event. + """) model RealtimeClientEventInputAudioBufferCommit extends RealtimeClientEvent { // Tool customization: apply discriminated type base /** The event type, must be "input_audio_buffer.commit". */ @@ -43,7 +64,9 @@ model RealtimeClientEventInputAudioBufferCommit extends RealtimeClientEvent { } // Tool customization: apply discriminated type base -/** Send this event to clear the audio bytes in the buffer. */ +@doc(""" + Send this event to clear the audio bytes in the buffer. The server will respond with an `input_audio_buffer.cleared` event. + """) model RealtimeClientEventInputAudioBufferClear extends RealtimeClientEvent { // Tool customization: apply discriminated type base /** The event type, must be "input_audio_buffer.clear". */ @@ -51,39 +74,49 @@ model RealtimeClientEventInputAudioBufferClear extends RealtimeClientEvent { } // Tool customization: apply discriminated type base -/** Send this event when adding an item to the conversation. */ +@doc(""" + Add a new Item to the Conversation's context, including messages, function calls, and function call responses. This event can be used both to populate a "history" of the conversation and to add new items mid-stream, but has the current limitation that it cannot populate assistant audio messages. + If successful, the server will respond with a `conversation.item.created` event, otherwise an `error` event will be sent. + """) model RealtimeClientEventConversationItemCreate extends RealtimeClientEvent { // Tool customization: apply discriminated type base - /** The event type, must be "conversation.item.create". */ + @doc(""" + The event type, must be `conversation.item.create`. + """) type: RealtimeClientEventType.conversation_item_create; - /** The ID of the preceding item after which the new item will be inserted. */ + /** The ID of the preceding item after which the new item will be inserted. 
If not set, the new item will be appended to the end of the conversation. If set, it allows an item to be inserted mid-conversation. If the ID cannot be found, an error will be returned and the item will not be added. */ previous_item_id?: string; // Tool customization: apply enriched item definition hierarchy - /** The item to add to the conversation. */ - item: RealtimeRequestItem; + item: RealtimeConversationRequestItem; } // Tool customization: apply discriminated type base -/** Send this event when you want to truncate a previous assistant message’s audio. */ +@doc(""" + Send this event to truncate a previous assistant message’s audio. The server will produce audio faster than realtime, so this event is useful when the user interrupts to truncate audio that has already been sent to the client but not yet played. This will synchronize the server's understanding of the audio with the client's playback. + Truncating audio will delete the server-side text transcript to ensure there is not text in the context that hasn't been heard by the user. + If successful, the server will respond with a `conversation.item.truncated` event. + """) model RealtimeClientEventConversationItemTruncate extends RealtimeClientEvent { // Tool customization: apply discriminated type base /** The event type, must be "conversation.item.truncate". */ type: RealtimeClientEventType.conversation_item_truncate; - /** The ID of the assistant message item to truncate. */ + /** The ID of the assistant message item to truncate. Only assistant message items can be truncated. */ item_id: string; - /** The index of the content part to truncate. */ + /** The index of the content part to truncate. Set this to 0. */ content_index: int32; - /** Inclusive duration up to which audio is truncated, in milliseconds. */ + /** Inclusive duration up to which audio is truncated, in milliseconds. If the audio_end_ms is greater than the actual audio duration, the server will respond with an error. 
*/ audio_end_ms: int32; } // Tool customization: apply discriminated type base -/** Send this event when you want to remove any item from the conversation history. */ +@doc(""" + Send this event when you want to remove any item from the conversation history. The server will respond with a `conversation.item.deleted` event, unless the item does not exist in the conversation history, in which case the server will respond with an error. + """) model RealtimeClientEventConversationItemDelete extends RealtimeClientEvent { // Tool customization: apply discriminated type base /** The event type, must be "conversation.item.delete". */ @@ -94,53 +127,37 @@ model RealtimeClientEventConversationItemDelete extends RealtimeClientEvent { } // Tool customization: apply discriminated type base -/** Send this event to trigger a response generation. */ +@doc(""" + This event instructs the server to create a Response, which means triggering model inference. When in Server VAD mode, the server will create Responses automatically. + A Response will include at least one Item, and may have two, in which case the second will be a function call. These Items will be appended to the conversation history. + The server will respond with a `response.created` event, events for Items and content created, and finally a `response.done` event to indicate the Response is complete. + The `response.create` event includes inference configuration like `instructions`, and `temperature`. These fields will override the Session's configuration for this Response only. + """) model RealtimeClientEventResponseCreate extends RealtimeClientEvent { // Tool customization: apply discriminated type base - /** The event type, must be "response.create". */ + @doc(""" + The event type, must be `response.create`. + """) type: RealtimeClientEventType.response_create; - /** Configuration for the response. */ - response: { - /** The modalities for the response. */ - modalities?: string[]; - - /** Instructions for the model. 
*/ - instructions?: string; - - @doc(""" - The voice the model uses to respond - one of `alloy`, `echo`, or `shimmer`. - """) - voice?: string; - - /** The format of output audio. */ - output_audio_format?: string; - - // Tool customization: apply enriched tool definition hierarchy - /** Tools (functions) available to the model. */ - tools?: RealtimeTool[]; - - /** How the model chooses tools. */ - tool_choice?: string; - - /** Sampling temperature. */ - temperature?: float32; - - /** Maximum number of output tokens for a single assistant response, inclusive of tool calls. Provide an integer between 1 and 4096 to limit output tokens, or "inf" for the maximum available tokens for a given model. Defaults to "inf". */ - max_output_tokens?: int32 | "inf"; - }; + // Tool customization: apply custom, distinct type for request-side response options + response: RealtimeResponseOptions; } // Tool customization: apply discriminated type base -/** Send this event to cancel an in-progress response. */ +@doc(""" + Send this event to cancel an in-progress response. The server will respond with a `response.cancelled` event or an error if there is no response to cancel. + """) model RealtimeClientEventResponseCancel extends RealtimeClientEvent { // Tool customization: apply discriminated type base - /** The event type, must be "response.cancel". */ + @doc(""" + The event type, must be `response.cancel`. + """) type: RealtimeClientEventType.response_cancel; } // Tool customization: apply discriminated type base -/** Returned when an error occurs. */ +/** Returned when an error occurs, which could be a client problem or a server problem. Most errors are recoverable and the session will stay open, we recommend to implementors to monitor and log error messages by default. */ model RealtimeServerEventError extends RealtimeServerEvent { // Tool customization: apply discriminated type /** The event type, must be "error". 
*/ @@ -166,29 +183,35 @@ model RealtimeServerEventError extends RealtimeServerEvent { } // Tool customization: apply discriminated type base -/** Returned when a session is created. Emitted automatically when a new connection is established. */ +/** Returned when a Session is created. Emitted automatically when a new connection is established as the first server event. This event will contain the default Session configuration. */ model RealtimeServerEventSessionCreated extends RealtimeServerEvent { // Tool customization: apply discriminated type - /** The event type, must be "session.created". */ + @doc(""" + The event type, must be `session.created`. + """) type: RealtimeServerEventType.session_created; - // Tool customization: apply shared session type - /** The session resource. */ + // Tool customization: apply enriched response-specific model session: RealtimeResponseSession; } // Tool customization: apply discriminated type base -/** Returned when a session is updated. */ +@doc(""" + Returned when a session is updated with a `session.update` event, unless there is an error. + """) model RealtimeServerEventSessionUpdated extends RealtimeServerEvent { // Tool customization: apply discriminated type /** The event type, must be "session.updated". */ type: RealtimeServerEventType.session_updated; - // Tool customization: apply shared session type - /** The updated session resource. */ + // Tool customization: apply enriched response-specific model session: RealtimeResponseSession; } +// Tool customization: establish base for enriched request/response split models +/** Realtime session object configuration. */ +model RealtimeSessionBase {} + // Tool customization: apply discriminated type base /** Returned when a conversation is created. Emitted right after session creation. 
*/ model RealtimeServerEventConversationCreated extends RealtimeServerEvent { @@ -207,10 +230,14 @@ model RealtimeServerEventConversationCreated extends RealtimeServerEvent { } // Tool customization: apply discriminated type base -/** Returned when an input audio buffer is committed, either by the client or automatically in server VAD mode. */ +@doc(""" + Returned when an input audio buffer is committed, either by the client or automatically in server VAD mode. The `item_id` property is the ID of the user message item that will be created, thus a `conversation.item.created` event will also be sent to the client. + """) model RealtimeServerEventInputAudioBufferCommitted extends RealtimeServerEvent { // Tool customization: apply discriminated type - /** The event type, must be "input_audio_buffer.committed". */ + @doc(""" + The event type, must be `input_audio_buffer.committed`. + """) type: RealtimeServerEventType.input_audio_buffer_committed; /** The ID of the preceding item after which the new item will be inserted. */ @@ -221,22 +248,32 @@ model RealtimeServerEventInputAudioBufferCommitted extends RealtimeServerEvent { } // Tool customization: apply discriminated type base -/** Returned when the input audio buffer is cleared by the client. */ +@doc(""" + Returned when the input audio buffer is cleared by the client with a `input_audio_buffer.clear` event. + """) model RealtimeServerEventInputAudioBufferCleared extends RealtimeServerEvent { // Tool customization: apply discriminated type - /** The event type, must be "input_audio_buffer.cleared". */ + @doc(""" + The event type, must be `input_audio_buffer.cleared`. + """) type: RealtimeServerEventType.input_audio_buffer_cleared; } // Tool customization: apply discriminated type base -/** Returned in server turn detection mode when speech is detected. */ +@doc(""" + Sent by the server when in `server_vad` mode to indicate that speech has been detected in the audio buffer. 
This can happen any time audio is added to the buffer (unless speech is already detected). The client may want to use this event to interrupt audio playback or provide visual feedback to the user. The client should expect to receive an `input_audio_buffer.speech_stopped` event when speech stops. The `item_id` property is the ID of the user message item that will be created when speech stops and will also be included in the `input_audio_buffer.speech_stopped` event (unless the client manually commits the audio buffer during VAD activation). + """) model RealtimeServerEventInputAudioBufferSpeechStarted extends RealtimeServerEvent { // Tool customization: apply discriminated type - /** The event type, must be "input_audio_buffer.speech_started". */ + @doc(""" + The event type, must be `input_audio_buffer.speech_started`. + """) type: RealtimeServerEventType.input_audio_buffer_speech_started; - /** Milliseconds since the session started when speech was detected. */ + @doc(""" + Milliseconds from the start of all audio written to the buffer during the session when speech was first detected. This will correspond to the beginning of audio sent to the model, and thus includes the `prefix_padding_ms` configured in the Session. + """) audio_start_ms: int32; /** The ID of the user message item that will be created when speech stops. */ @@ -244,14 +281,20 @@ model RealtimeServerEventInputAudioBufferSpeechStarted } // Tool customization: apply discriminated type base -/** Returned in server turn detection mode when speech stops. */ +@doc(""" + Returned in `server_vad` mode when the server detects the end of speech in the audio buffer. The server will also send a `conversation.item.created` event with the user message item that is created from the audio buffer. + """) model RealtimeServerEventInputAudioBufferSpeechStopped extends RealtimeServerEvent { // Tool customization: apply discriminated type - /** The event type, must be "input_audio_buffer.speech_stopped". 
*/ + @doc(""" + The event type, must be `input_audio_buffer.speech_stopped`. + """) type: RealtimeServerEventType.input_audio_buffer_speech_stopped; - /** Milliseconds since the session started when speech stopped. */ + @doc(""" + Milliseconds since the session started when speech stopped. This will correspond to the end of audio sent to the model, and thus includes the `min_silence_duration_ms` configured in the Session. + """) audio_end_ms: int32; /** The ID of the user message item that will be created. */ @@ -259,29 +302,40 @@ model RealtimeServerEventInputAudioBufferSpeechStopped } // Tool customization: apply discriminated type base -/** Returned when a conversation item is created. */ +@doc(""" + Returned when a conversation item is created. There are several scenarios that produce this event: + - The server is generating a Response, which if successful will produce either one or two Items, which will be of type `message` (role `assistant`) or type `function_call`. + - The input audio buffer has been committed, either by the client or the server (in `server_vad` mode). The server will take the content of the input audio buffer and add it to a new user message Item. + - The client has sent a `conversation.item.create` event to add a new Item to the Conversation. + """) model RealtimeServerEventConversationItemCreated extends RealtimeServerEvent { // Tool customization: apply discriminated type - /** The event type, must be "conversation.item.created". */ + @doc(""" + The event type, must be `conversation.item.created`. + """) type: RealtimeServerEventType.conversation_item_created; - /** The ID of the preceding item. */ + /** The ID of the preceding item in the Conversation context, allows the client to understand the order of the conversation. */ previous_item_id: string; // Tool customization: apply enriched item definition hierarchy - /** The item that was created. 
*/ - item: RealtimeResponseItem; + item: RealtimeConversationResponseItem; } // Tool customization: apply discriminated type base -/** Returned when input audio transcription is enabled and a transcription succeeds. */ +@doc(""" + This event is the output of audio transcription for user audio written to the user audio buffer. Transcription begins when the input audio buffer is committed by the client or server (in `server_vad` mode). Transcription runs asynchronously with Response creation, so this event may come before or after the Response events. + Realtime API models accept audio natively, and thus input transcription is a separate process run on a separate ASR (Automatic Speech Recognition) model, currently always `whisper-1`. Thus the transcript may diverge somewhat from the model's interpretation, and should be treated as a rough guide. + """) model RealtimeServerEventConversationItemInputAudioTranscriptionCompleted extends RealtimeServerEvent { // Tool customization: apply discriminated type - /** The event type, must be "conversation.item.input_audio_transcription.completed". */ + @doc(""" + The event type, must be `conversation.item.input_audio_transcription.completed`. + """) type: RealtimeServerEventType.conversation_item_input_audio_transcription_completed; - /** The ID of the user message item. */ + /** The ID of the user message item containing the audio. */ item_id: string; /** The index of the content part containing the audio. */ @@ -292,11 +346,15 @@ model RealtimeServerEventConversationItemInputAudioTranscriptionCompleted } // Tool customization: apply discriminated type base -/** Returned when input audio transcription is configured, and a transcription request for a user message failed. */ +@doc(""" + Returned when input audio transcription is configured, and a transcription request for a user message failed. These events are separate from other `error` events so that the client can identify the related Item. 
+ """) model RealtimeServerEventConversationItemInputAudioTranscriptionFailed extends RealtimeServerEvent { // Tool customization: apply discriminated type - /** The event type, must be "conversation.item.input_audio_transcription.failed". */ + @doc(""" + The event type, must be `conversation.item.input_audio_transcription.failed`. + """) type: RealtimeServerEventType.conversation_item_input_audio_transcription_failed; /** The ID of the user message item. */ @@ -322,10 +380,15 @@ model RealtimeServerEventConversationItemInputAudioTranscriptionFailed } // Tool customization: apply discriminated type base -/** Returned when an earlier assistant audio message item is truncated by the client. */ +@doc(""" + Returned when an earlier assistant audio message item is truncated by the client with a `conversation.item.truncate` event. This event is used to synchronize the server's understanding of the audio with the client's playback. + This action will truncate the audio and remove the server-side text transcript to ensure there is no text in the context that hasn't been heard by the user. + """) model RealtimeServerEventConversationItemTruncated extends RealtimeServerEvent { // Tool customization: apply discriminated type - /** The event type, must be "conversation.item.truncated". */ + @doc(""" + The event type, must be `conversation.item.truncated`. + """) type: RealtimeServerEventType.conversation_item_truncated; /** The ID of the assistant message item that was truncated. */ @@ -339,10 +402,14 @@ model RealtimeServerEventConversationItemTruncated extends RealtimeServerEvent { } // Tool customization: apply discriminated type base -/** Returned when an item in the conversation is deleted. */ +@doc(""" + Returned when an item in the conversation is deleted by the client with a `conversation.item.delete` event. This event is used to synchronize the server's understanding of the conversation history with the client's view. 
+ """) model RealtimeServerEventConversationItemDeleted extends RealtimeServerEvent { // Tool customization: apply discriminated type - /** The event type, must be "conversation.item.deleted". */ + @doc(""" + The event type, must be `conversation.item.deleted`. + """) type: RealtimeServerEventType.conversation_item_deleted; /** The ID of the item that was deleted. */ @@ -350,63 +417,67 @@ model RealtimeServerEventConversationItemDeleted extends RealtimeServerEvent { } // Tool customization: apply discriminated type base -/** Returned when a new Response is created. The first event of response creation, where the response is in an initial state of "in_progress". */ +@doc(""" + Returned when a new Response is created. The first event of response creation, where the response is in an initial state of `in_progress`. + """) model RealtimeServerEventResponseCreated extends RealtimeServerEvent { // Tool customization: apply discriminated type - /** The event type, must be "response.created". */ + @doc(""" + The event type, must be `response.created`. + """) type: RealtimeServerEventType.response_created; - // Tool customization: apply shared response type - /** The response resource. */ response: RealtimeResponse; } // Tool customization: apply discriminated type base -/** Returned when a Response is done streaming. Always emitted, no matter the final state. */ +@doc(""" + Returned when a Response is done streaming. Always emitted, no matter the final state. The Response object included in the `response.done` event will include all output Items in the Response but will omit the raw audio data. + """) model RealtimeServerEventResponseDone extends RealtimeServerEvent { // Tool customization: apply discriminated type /** The event type, must be "response.done". */ type: RealtimeServerEventType.response_done; - // Tool customization: apply shared response type - /** The response resource. 
*/ response: RealtimeResponse; } // Tool customization: apply discriminated type base -/** Returned when a new Item is created during response generation. */ +/** Returned when a new Item is created during Response generation. */ model RealtimeServerEventResponseOutputItemAdded extends RealtimeServerEvent { // Tool customization: apply discriminated type - /** The event type, must be "response.output_item.added". */ + @doc(""" + The event type, must be `response.output_item.added`. + """) type: RealtimeServerEventType.response_output_item_added; - /** The ID of the response to which the item belongs. */ + /** The ID of the Response to which the item belongs. */ response_id: string; - /** The index of the output item in the response. */ + /** The index of the output item in the Response. */ output_index: int32; // Tool customization: apply enriched item definition hierarchy - /** The item that was added. */ - item: RealtimeResponseItem; + item: RealtimeConversationResponseItem; } // Tool customization: apply discriminated type base /** Returned when an Item is done streaming. Also emitted when a Response is interrupted, incomplete, or cancelled. */ model RealtimeServerEventResponseOutputItemDone extends RealtimeServerEvent { // Tool customization: apply discriminated type - /** The event type, must be "response.output_item.done". */ + @doc(""" + The event type, must be `response.output_item.done`. + """) type: RealtimeServerEventType.response_output_item_done; - /** The ID of the response to which the item belongs. */ + /** The ID of the Response to which the item belongs. */ response_id: string; - /** The index of the output item in the response. */ + /** The index of the output item in the Response. */ output_index: int32; // Tool customization: apply enriched item definition hierarchy - /** The completed item. 
*/ - item: RealtimeResponseItem; + item: RealtimeConversationResponseItem; } // Tool customization: apply discriminated type base @@ -645,10 +716,12 @@ model RealtimeServerEventResponseFunctionCallArgumentsDone } // Tool customization: apply discriminated type base -/** Emitted after every "response.done" event to indicate the updated rate limits. */ +/** Emitted at the beginning of a Response to indicate the updated rate limits. When a Response is created some tokens will be "reserved" for the output tokens, the rate limits shown here reflect that reservation, which is then adjusted accordingly once the Response is completed. */ model RealtimeServerEventRateLimitsUpdated extends RealtimeServerEvent { // Tool customization: apply discriminated type - /** The event type, must be "rate_limits.updated". */ + @doc(""" + The event type, must be `rate_limits.updated`. + """) type: RealtimeServerEventType.rate_limits_updated; // Tool customization: use custom type for rate limit items (applying encoded duration) diff --git a/README.md b/README.md index fac2ff08e..ed5c9897a 100644 --- a/README.md +++ b/README.md @@ -1,7 +1,10 @@ # A conversion of the OpenAI OpenAPI to TypeSpec -Snapshot: 9da44b1e126916bbd4ab0bd62accf5622a3ec6ba -Ingestion tool: https://github.com/trrwilson/OpenApiToTsp@fa8e27d +For information on spec ingestion, see the Sorento wiki page: +https://dev.azure.com/project-argos/Sorento/_wiki/wikis/Sorento.wiki/3021/Generate-OpenAI's-YAML-Spec + +Snapshot: https://project-argos@dev.azure.com/project-argos/Sorento/_git/export-api@54593e37 +Ingestion tool: https://project-argos@dev.azure.com/project-argos/Sorento/_git/sdk@da3aa64 There are some deltas: diff --git a/openapi3-original.yaml b/openapi3-original.yaml index 3dd6e4681..c229c604d 100644 --- a/openapi3-original.yaml +++ b/openapi3-original.yaml @@ -1330,7 +1330,14 @@ paths: operationId: createChatCompletion tags: - Chat - summary: Creates a model response for the given chat conversation. 
+ summary: > + Creates a model response for the given chat conversation. Learn more in + the + + [text generation](/docs/guides/text-generation), + [vision](/docs/guides/vision), + + and [audio](/docs/guides/audio) guides. requestBody: required: true content: @@ -10770,6 +10777,32 @@ components: description: The tool calls generated by the model, such as function calls. items: $ref: "#/components/schemas/ChatCompletionMessageToolCall" + ChatCompletionModalities: + type: array + nullable: true + description: > + Output types that you would like the model to generate for this request. + + Most models are capable of generating text, which is the default: + + + `["text"]` + + + The `gpt-4o-audio-preview` model can also be used to [generate + audio](/docs/guides/audio). To + + request that this model generate both text and audio responses, you can + + use: + + + `["text", "audio"]` + items: + type: string + enum: + - text + - audio ChatCompletionNamedToolChoice: type: object description: Specifies a tool the model should use. Use to force the model to @@ -10825,6 +10858,20 @@ components: type: string description: An optional name for the participant. Provides the model information to differentiate between participants of the same role. + audio: + type: object + nullable: true + x-oaiExpandable: true + description: | + Data about a previous audio response from the model. + [Learn more](/docs/guides/audio). + required: + - id + properties: + id: + type: string + description: | + Unique identifier for a previous audio response from the model. tool_calls: $ref: "#/components/schemas/ChatCompletionMessageToolCalls" function_call: @@ -10883,9 +10930,42 @@ components: - $ref: "#/components/schemas/ChatCompletionRequestToolMessage" - $ref: "#/components/schemas/ChatCompletionRequestFunctionMessage" x-oaiExpandable: true + ChatCompletionRequestMessageContentPartAudio: + type: object + title: Audio content part + description: | + Learn about [audio inputs](/docs/guides/audio). 
+ properties: + type: + type: string + enum: + - input_audio + description: The type of the content part. Always `input_audio`. + input_audio: + type: object + properties: + data: + type: string + description: Base64 encoded audio data. + format: + type: string + enum: + - wav + - mp3 + description: > + The format of the encoded audio data. Currently supports "wav" + and "mp3". + required: + - data + - format + required: + - type + - input_audio ChatCompletionRequestMessageContentPartImage: type: object title: Image content part + description: | + Learn about [image inputs](/docs/guides/vision). properties: type: type: string @@ -10931,6 +11011,8 @@ components: ChatCompletionRequestMessageContentPartText: type: object title: Text content part + description: | + Learn about [text inputs](/docs/guides/text-generation). properties: type: type: string @@ -11021,10 +11103,9 @@ components: description: The text contents of the message. title: Text content - type: array - description: An array of content parts with a defined type, each can be of type - `text` or `image_url` when passing in images. You can pass - multiple images by adding multiple `image_url` content parts. - Image input is only supported when using the `gpt-4o` model. + description: An array of content parts with a defined type. Supported options + differ based on the [model](/docs/models) being used to generate + the response. Can contain text, image, or audio inputs. 
title: Array of content parts items: $ref: "#/components/schemas/ChatCompletionRequestUserMessageContentPart" @@ -11046,6 +11127,7 @@ components: oneOf: - $ref: "#/components/schemas/ChatCompletionRequestMessageContentPartText" - $ref: "#/components/schemas/ChatCompletionRequestMessageContentPartImage" + - $ref: "#/components/schemas/ChatCompletionRequestMessageContentPartAudio" x-oaiExpandable: true ChatCompletionResponseMessage: type: object @@ -11085,6 +11167,41 @@ components: required: - name - arguments + audio: + type: object + nullable: true + description: > + If the audio output modality is requested, this object contains data + + about the audio response from the model. [Learn + more](/docs/guides/audio). + x-oaiExpandable: true + required: + - id + - expires_at + - data + - transcript + properties: + id: + type: string + description: Unique identifier for this audio response. + expires_at: + type: integer + description: > + The Unix timestamp (in seconds) for when this audio response + will + + no longer be accessible on the server for use in multi-turn + + conversations. + data: + type: string + description: | + Base64 encoded audio bytes generated by the model, in the format + specified in the request. + transcript: + type: string + description: Transcript of the audio generated by the model. required: - role - content @@ -11685,10 +11802,14 @@ components: messages: description: > A list of messages comprising the conversation so far. Depending on - the [model](/docs/models) you use, different message types - (modalities) are supported, like - [text](/docs/guides/text-generation), [images](/docs/guides/vision), - and audio. + the + + [model](/docs/models) you use, different message types (modalities) + are + + supported, like [text](/docs/guides/text-generation), + + [images](/docs/guides/vision), and [audio](/docs/guides/audio). 
type: array minItems: 1 items: @@ -11712,6 +11833,8 @@ components: - gpt-4o-2024-08-06 - gpt-4o-realtime-preview - gpt-4o-realtime-preview-2024-10-01 + - gpt-4o-audio-preview + - gpt-4o-audio-preview-2024-10-01 - chatgpt-4o-latest - gpt-4o-mini - gpt-4o-mini-2024-07-18 @@ -11832,6 +11955,46 @@ components: message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep `n` as `1` to minimize costs. + modalities: + $ref: "#/components/schemas/ChatCompletionModalities" + audio: + type: object + nullable: true + description: > + Parameters for audio output. Required when audio output is requested + with + + `modalities: ["audio"]`. [Learn more](/docs/guides/audio). + required: + - voice + - format + x-oaiExpandable: true + properties: + voice: + type: string + enum: + - alloy + - echo + - fable + - onyx + - nova + - shimmer + description: | + Specifies the voice type. Supported voices are `alloy`, `echo`, + `fable`, `onyx`, `nova`, and `shimmer`. + format: + type: string + enum: + - wav + - mp3 + - flac + - opus + - pcm16 + description: > + Specifies the output audio format. Must be one of `wav`, `mp3`, + `flac`, + + `opus`, or `pcm16`. presence_penalty: type: number default: 0 @@ -12142,28 +12305,34 @@ components: group: chat example: | { - "id": "chatcmpl-123", + "id": "chatcmpl-123456", "object": "chat.completion", - "created": 1677652288, - "model": "gpt-4o-mini", - "system_fingerprint": "fp_44709d6fcb", - "choices": [{ - "index": 0, - "message": { - "role": "assistant", - "content": "\n\nHello there, how may I assist you today?", - }, - "logprobs": null, - "finish_reason": "stop" - }], + "created": 1728933352, + "model": "gpt-4o-2024-08-06", + "choices": [ + { + "index": 0, + "message": { + "role": "assistant", + "content": "Hi there! 
How can I assist you today?", + "refusal": null + }, + "logprobs": null, + "finish_reason": "stop" + } + ], "usage": { - "prompt_tokens": 9, - "completion_tokens": 12, - "total_tokens": 21, + "prompt_tokens": 19, + "completion_tokens": 10, + "total_tokens": 29, + "prompt_tokens_details": { + "cached_tokens": 0 + }, "completion_tokens_details": { "reasoning_tokens": 0 } - } + }, + "system_fingerprint": "fp_6b68a8204b" } CreateChatCompletionStreamResponse: type: object @@ -12268,6 +12437,7 @@ components: - chat.completion.chunk usage: type: object + nullable: true description: > An optional field that will only be present when you set `stream_options: {"include_usage": true}` in your request. @@ -17273,65 +17443,32 @@ components: - role RealtimeClientEventConversationItemCreate: type: object - description: Send this event when adding an item to the conversation. + description: >- + Add a new Item to the Conversation's context, including messages, + function calls, and function call responses. This event can be used both + to populate a "history" of the conversation and to add new items + mid-stream, but has the current limitation that it cannot populate + assistant audio messages. + + If successful, the server will respond with a + `conversation.item.created` event, otherwise an `error` event will be + sent. properties: event_id: type: string description: Optional client-generated ID used to identify this event. type: type: string - description: The event type, must be "conversation.item.create". + description: The event type, must be `conversation.item.create`. previous_item_id: type: string description: The ID of the preceding item after which the new item will be - inserted. + inserted. If not set, the new item will be appended to the end of + the conversation. If set, it allows an item to be inserted + mid-conversation. If the ID cannot be found, an error will be + returned and the item will not be added. 
item: - type: object - description: The item to add to the conversation. - properties: - id: - type: string - description: The unique ID of the item. - type: - type: string - description: The type of the item ("message", "function_call", - "function_call_output"). - status: - type: string - description: The status of the item ("completed", "in_progress", "incomplete"). - role: - type: string - description: The role of the message sender ("user", "assistant", "system"). - content: - type: array - description: The content of the message. - items: - type: object - properties: - type: - type: string - description: The content type ("input_text", "input_audio", "text", "audio"). - text: - type: string - description: The text content. - audio: - type: string - description: Base64-encoded audio bytes. - transcript: - type: string - description: The transcript of the audio. - call_id: - type: string - description: The ID of the function call (for "function_call" items). - name: - type: string - description: The name of the function being called (for "function_call" items). - arguments: - type: string - description: The arguments of the function call (for "function_call" items). - output: - type: string - description: The output of the function call (for "function_call_output" items). + $ref: "#/components/schemas/RealtimeConversationItem" required: - type - item @@ -17346,7 +17483,6 @@ components: "item": { "id": "msg_001", "type": "message", - "status": "completed", "role": "user", "content": [ { @@ -17359,7 +17495,10 @@ components: RealtimeClientEventConversationItemDelete: type: object description: Send this event when you want to remove any item from the - conversation history. + conversation history. The server will respond with a + `conversation.item.deleted` event, unless the item does not exist in the + conversation history, in which case the server will respond with an + error. 
properties: event_id: type: string @@ -17384,8 +17523,18 @@ components: } RealtimeClientEventConversationItemTruncate: type: object - description: Send this event when you want to truncate a previous assistant - message’s audio. + description: >- + Send this event to truncate a previous assistant message’s audio. The + server will produce audio faster than realtime, so this event is useful + when the user interrupts to truncate audio that has already been sent to + the client but not yet played. This will synchronize the server's + understanding of the audio with the client's playback. + + Truncating audio will delete the server-side text transcript to ensure + there is not text in the context that hasn't been heard by the user. + + If successful, the server will respond with a + `conversation.item.truncated` event. properties: event_id: type: string @@ -17395,13 +17544,16 @@ components: description: The event type, must be "conversation.item.truncate". item_id: type: string - description: The ID of the assistant message item to truncate. + description: The ID of the assistant message item to truncate. Only assistant + message items can be truncated. content_index: type: integer - description: The index of the content part to truncate. + description: The index of the content part to truncate. Set this to 0. audio_end_ms: type: integer description: Inclusive duration up to which audio is truncated, in milliseconds. + If the audio_end_ms is greater than the actual audio duration, the + server will respond with an error. required: - type - item_id @@ -17420,7 +17572,17 @@ components: } RealtimeClientEventInputAudioBufferAppend: type: object - description: Send this event to append audio bytes to the input audio buffer. + description: >- + Send this event to append audio bytes to the input audio buffer. The + audio buffer is temporary storage you can write to and later commit. 
In + Server VAD mode, the audio buffer is used to detect speech and the + server will decide when to commit. When Server VAD is disabled, you must + commit the audio buffer manually. + + The client may choose how much audio to place in each event up to a + maximum of 15 MiB, for example streaming smaller chunks from the client + may allow the VAD to be more responsive. Unlike most other client + events, the server will not send a confirmation response to this event. properties: event_id: type: string @@ -17430,7 +17592,8 @@ components: description: The event type, must be "input_audio_buffer.append". audio: type: string - description: Base64-encoded audio bytes. + description: Base64-encoded audio bytes. This must be in the format specified by + the `input_audio_format` field in the session configuration. required: - type - audio @@ -17445,7 +17608,8 @@ components: } RealtimeClientEventInputAudioBufferClear: type: object - description: Send this event to clear the audio bytes in the buffer. + description: Send this event to clear the audio bytes in the buffer. The server + will respond with an `input_audio_buffer.cleared` event. properties: event_id: type: string @@ -17465,7 +17629,17 @@ components: } RealtimeClientEventInputAudioBufferCommit: type: object - description: Send this event to commit audio bytes to a user message. + description: >- + Send this event to commit the user input audio buffer, which will create + a new user message item in the conversation. This event will produce an + error if the input audio buffer is empty. When in Server VAD mode, the + client does not need to send this event, the server will commit the + audio buffer automatically. + + Committing the input audio buffer will trigger input audio transcription + (if enabled in session configuration), but it will not create a response + from the model. The server will respond with an + `input_audio_buffer.committed` event.
properties: event_id: type: string @@ -17485,14 +17659,16 @@ components: } RealtimeClientEventResponseCancel: type: object - description: Send this event to cancel an in-progress response. + description: Send this event to cancel an in-progress response. The server will + respond with a `response.cancelled` event or an error if there is no + response to cancel. properties: event_id: type: string description: Optional client-generated ID used to identify this event. type: type: string - description: The event type, must be "response.cancel". + description: The event type, must be `response.cancel`. required: - type x-oaiMeta: @@ -17505,67 +17681,31 @@ components: } RealtimeClientEventResponseCreate: type: object - description: Send this event to trigger a response generation. + description: >- + This event instructs the server to create a Response, which means + triggering model inference. When in Server VAD mode, the server will + create Responses automatically. + + A Response will include at least one Item, and may have two, in which + case the second will be a function call. These Items will be appended to + the conversation history. + + The server will respond with a `response.created` event, events for + Items and content created, and finally a `response.done` event to + indicate the Response is complete. + + The `response.create` event includes inference configuration like + `instructions`, and `temperature`. These fields will override the + Session's configuration for this Response only. properties: event_id: type: string description: Optional client-generated ID used to identify this event. type: type: string - description: The event type, must be "response.create". + description: The event type, must be `response.create`. response: - type: object - description: Configuration for the response. - properties: - modalities: - type: array - items: - type: string - description: The modalities for the response. 
- instructions: - type: string - description: Instructions for the model. - voice: - type: string - description: The voice the model uses to respond - one of `alloy`, `echo`, or - `shimmer`. - output_audio_format: - type: string - description: The format of output audio. - tools: - type: array - description: Tools (functions) available to the model. - items: - type: object - properties: - type: - type: string - description: The type of the tool. - name: - type: string - description: The name of the function. - description: - type: string - description: The description of the function. - parameters: - type: object - description: Parameters of the function in JSON Schema. - tool_choice: - type: string - description: How the model chooses tools. - temperature: - type: number - description: Sampling temperature. - max_output_tokens: - oneOf: - - type: integer - - type: string - enum: - - inf - description: Maximum number of output tokens for a single assistant response, - inclusive of tool calls. Provide an integer between 1 and 4096 - to limit output tokens, or "inf" for the maximum available - tokens for a given model. Defaults to "inf". + $ref: "#/components/schemas/RealtimeResponse" required: - type - response @@ -17603,7 +17743,13 @@ components: } RealtimeClientEventSessionUpdate: type: object - description: Send this event to update the session’s default configuration. + description: Send this event to update the session’s default configuration. The + client may send this event at any time to update the session + configuration, and any field may be updated at any time, except for + "voice". The server will respond with a `session.updated` event that + shows the full effective configuration. Only fields that are present are + updated, thus the correct way to clear a field like "instructions" is to + pass an empty string. properties: event_id: type: string @@ -17612,90 +17758,7 @@ components: type: string description: The event type, must be "session.update". 
session: - type: object - description: Session configuration to update. - properties: - modalities: - type: array - items: - type: string - description: The set of modalities the model can respond with. To disable audio, - set this to ["text"]. - instructions: - type: string - description: The default system instructions prepended to model calls. - voice: - type: string - description: The voice the model uses to respond - one of `alloy`, `echo`, - or `shimmer`. Cannot be changed once the model has responded - with audio at least once. - input_audio_format: - type: string - description: The format of input audio. Options are "pcm16", "g711_ulaw", or - "g711_alaw". - output_audio_format: - type: string - description: The format of output audio. Options are "pcm16", "g711_ulaw", or - "g711_alaw". - input_audio_transcription: - type: object - description: Configuration for input audio transcription. Can be set to `null` - to turn off. - properties: - model: - type: string - description: The model to use for transcription (e.g., "whisper-1"). - turn_detection: - type: object - description: Configuration for turn detection. Can be set to `null` to turn off. - properties: - type: - type: string - description: Type of turn detection, only "server_vad" is currently supported. - threshold: - type: number - description: Activation threshold for VAD (0.0 to 1.0). - prefix_padding_ms: - type: integer - description: Amount of audio to include before speech starts (in milliseconds). - silence_duration_ms: - type: integer - description: Duration of silence to detect speech stop (in milliseconds). - tools: - type: array - description: Tools (functions) available to the model. - items: - type: object - properties: - type: - type: string - description: The type of the tool, e.g., "function". - name: - type: string - description: The name of the function. - description: - type: string - description: The description of the function. 
- parameters: - type: object - description: Parameters of the function in JSON Schema. - tool_choice: - type: string - description: How the model chooses tools. Options are "auto", "none", - "required", or specify a function. - temperature: - type: number - description: Sampling temperature for the model. - max_output_tokens: - oneOf: - - type: integer - - type: string - enum: - - inf - description: Maximum number of output tokens for a single assistant response, - inclusive of tool calls. Provide an integer between 1 and 4096 - to limit output tokens, or "inf" for the maximum available - tokens for a given model. Defaults to "inf". + $ref: "#/components/schemas/RealtimeSession" required: - type - session @@ -17719,13 +17782,13 @@ components: "type": "server_vad", "threshold": 0.5, "prefix_padding_ms": 300, - "silence_duration_ms": 200 + "silence_duration_ms": 500 }, "tools": [ { "type": "function", "name": "get_weather", - "description": "Get the current weather for a location.", + "description": "Get the current weather for a location, tell the user you are fetching the weather.", "parameters": { "type": "object", "properties": { @@ -17737,9 +17800,157 @@ components: ], "tool_choice": "auto", "temperature": 0.8, - "max_output_tokens": null + "max_response_output_tokens": "inf" } } + RealtimeConversationItem: + type: object + description: The item to add to the conversation. + properties: + id: + type: string + description: The unique ID of the item, this can be generated by the client to + help manage server-side context, but is not required because the + server will generate one if not provided. + type: + type: string + description: The type of the item (`message`, `function_call`, + `function_call_output`). + status: + type: string + description: The status of the item (`completed`, `incomplete`). These have no + effect on the conversation, but are accepted for consistency with + the `conversation.item.created` event. 
+ role: + type: string + description: The role of the message sender (`user`, `assistant`, `system`), + only applicable for `message` items. + content: + type: array + description: The content of the message, applicable for `message` items. Message + items with a role of `system` support only `input_text` content, + message items of role `user` support `input_text` and `input_audio` + content, and message items of role `assistant` support `text` + content. + items: + type: object + properties: + type: + type: string + description: The content type (`input_text`, `input_audio`, `text`). + text: + type: string + description: The text content, used for `input_text` and `text` content types. + audio: + type: string + description: Base64-encoded audio bytes, used for `input_audio` content type. + transcript: + type: string + description: The transcript of the audio, used for `input_audio` content type. + call_id: + type: string + description: The ID of the function call (for `function_call` and + `function_call_output` items). If passed on a `function_call_output` + item, the server will check that a `function_call` item with the + same ID exists in the conversation history. + name: + type: string + description: The name of the function being called (for `function_call` items). + arguments: + type: string + description: The arguments of the function call (for `function_call` items). + output: + type: string + description: The output of the function call (for `function_call_output` items). + RealtimeResponse: + type: object + description: The response resource. + properties: + id: + type: string + description: The unique ID of the response. + object: + type: string + description: The object type, must be `realtime.response`. + status: + type: string + description: The final status of the response (`completed`, `cancelled`, + `failed`, `incomplete`). + status_details: + type: object + description: Additional details about the status. 
+ properties: + type: + type: string + description: The type of error that caused the response to fail, corresponding + with the `status` field (`cancelled`, `incomplete`, `failed`). + reason: + type: string + description: The reason the Response did not complete. For a `cancelled` + Response, one of `turn_detected` (the server VAD detected a new + start of speech) or `client_cancelled` (the client sent a cancel + event). For an `incomplete` Response, one of `max_output_tokens` + or `content_filter` (the server-side safety filter activated and + cut off the response). + error: + type: object + description: A description of the error that caused the response to fail, + populated when the `status` is `failed`. + properties: + type: + type: string + description: The type of error. + code: + type: string + description: Error code, if any. + output: + type: array + description: The list of output items generated by the response. + items: + type: object + description: An item in the response output. + usage: + type: object + description: Usage statistics for the Response, this will correspond to billing. + A Realtime API session will maintain a conversation context and + append new Items to the Conversation, thus output from previous + turns (text and audio tokens) will become the input for later turns. + properties: + total_tokens: + type: integer + description: The total number of tokens in the Response including input and + output text and audio tokens. + input_tokens: + type: integer + description: The number of input tokens used in the Response, including text and + audio tokens. + output_tokens: + type: integer + description: The number of output tokens sent in the Response, including text + and audio tokens. + input_token_details: + type: object + description: Details about the input tokens used in the Response. + properties: + cached_tokens: + type: integer + description: The number of cached tokens used in the Response. 
+ text_tokens: + type: integer + description: The number of text tokens used in the Response. + audio_tokens: + type: integer + description: The number of audio tokens used in the Response. + output_token_details: + type: object + description: Details about the output tokens used in the Response. + properties: + text_tokens: + type: integer + description: The number of text tokens used in the Response. + audio_tokens: + type: integer + description: The number of audio tokens used in the Response. RealtimeServerEventConversationCreated: type: object description: Returned when a conversation is created. Emitted right after @@ -17779,67 +17990,25 @@ components: } RealtimeServerEventConversationItemCreated: type: object - description: Returned when a conversation item is created. + description: >- + Returned when a conversation item is created. There are several + scenarios that produce this event: + - The server is generating a Response, which if successful will produce either one or two Items, which will be of type `message` (role `assistant`) or type `function_call`. + - The input audio buffer has been committed, either by the client or the server (in `server_vad` mode). The server will take the content of the input audio buffer and add it to a new user message Item. + - The client has sent a `conversation.item.create` event to add a new Item to the Conversation. properties: event_id: type: string description: The unique ID of the server event. type: type: string - description: The event type, must be "conversation.item.created". + description: The event type, must be `conversation.item.created`. previous_item_id: type: string - description: The ID of the preceding item. + description: The ID of the preceding item in the Conversation context, allows + the client to understand the order of the conversation. item: - type: object - description: The item that was created. - properties: - id: - type: string - description: The unique ID of the item. 
- object: - type: string - description: The object type, must be "realtime.item". - type: - type: string - description: The type of the item ("message", "function_call", - "function_call_output"). - status: - type: string - description: The status of the item ("completed", "in_progress", "incomplete"). - role: - type: string - description: The role associated with the item ("user", "assistant", "system"). - content: - type: array - description: The content of the item. - items: - type: object - properties: - type: - type: string - description: The content type ("text", "audio", "input_text", "input_audio"). - text: - type: string - description: The text content. - audio: - type: string - description: Base64-encoded audio data. - transcript: - type: string - description: The transcript of the audio. - call_id: - type: string - description: The ID of the function call (for "function_call" items). - name: - type: string - description: The name of the function being called. - arguments: - type: string - description: The arguments of the function call. - output: - type: string - description: The output of the function call (for "function_call_output" items). + $ref: "#/components/schemas/RealtimeConversationItem" required: - event_id - type @@ -17862,21 +18031,25 @@ components: "content": [ { "type": "input_audio", - "transcript": null + "transcript": "hello how are you", + "audio": "base64encodedaudio==" } ] } } RealtimeServerEventConversationItemDeleted: type: object - description: Returned when an item in the conversation is deleted. + description: Returned when an item in the conversation is deleted by the client + with a `conversation.item.delete` event. This event is used to + synchronize the server's understanding of the conversation history with + the client's view. properties: event_id: type: string description: The unique ID of the server event. type: type: string - description: The event type, must be "conversation.item.deleted". 
+ description: The event type, must be `conversation.item.deleted`. item_id: type: string description: The ID of the item that was deleted. @@ -17895,8 +18068,18 @@ components: } RealtimeServerEventConversationItemInputAudioTranscriptionCompleted: type: object - description: Returned when input audio transcription is enabled and a - transcription succeeds. + description: >- + This event is the output of audio transcription for user audio written + to the user audio buffer. Transcription begins when the input audio + buffer is committed by the client or server (in `server_vad` mode). + Transcription runs asynchronously with Response creation, so this event + may come before or after the Response events. + + Realtime API models accept audio natively, and thus input transcription + is a separate process run on a separate ASR (Automatic Speech + Recognition) model, currently always `whisper-1`. Thus the transcript + may diverge somewhat from the model's interpretation, and should be + treated as a rough guide. properties: event_id: type: string @@ -17904,10 +18087,10 @@ components: type: type: string description: The event type, must be - "conversation.item.input_audio_transcription.completed". + `conversation.item.input_audio_transcription.completed`. item_id: type: string - description: The ID of the user message item. + description: The ID of the user message item containing the audio. content_index: type: integer description: The index of the content part containing the audio. @@ -17934,7 +18117,9 @@ components: RealtimeServerEventConversationItemInputAudioTranscriptionFailed: type: object description: Returned when input audio transcription is configured, and a - transcription request for a user message failed. + transcription request for a user message failed. These events are + separate from other `error` events so that the client can identify the + related Item. 
properties: event_id: type: string @@ -17942,7 +18127,7 @@ components: type: type: string description: The event type, must be - "conversation.item.input_audio_transcription.failed". + `conversation.item.input_audio_transcription.failed`. item_id: type: string description: The ID of the user message item. @@ -17989,15 +18174,22 @@ components: } RealtimeServerEventConversationItemTruncated: type: object - description: Returned when an earlier assistant audio message item is truncated - by the client. + description: >- + Returned when an earlier assistant audio message item is truncated by + the client with a `conversation.item.truncate` event. This event is used + to synchronize the server's understanding of the audio with the client's + playback. + + This action will truncate the audio and remove the server-side text + transcript to ensure there is no text in the context that hasn't been + heard by the user. properties: event_id: type: string description: The unique ID of the server event. type: type: string - description: The event type, must be "conversation.item.truncated". + description: The event type, must be `conversation.item.truncated`. item_id: type: string description: The ID of the assistant message item that was truncated. @@ -18026,7 +18218,10 @@ components: } RealtimeServerEventError: type: object - description: Returned when an error occurs. + description: Returned when an error occurs, which could be a client problem or a + server problem. Most errors are recoverable and the session will stay + open, we recommend to implementors to monitor and log error messages by + default. properties: event_id: type: string @@ -18075,14 +18270,15 @@ components: } RealtimeServerEventInputAudioBufferCleared: type: object - description: Returned when the input audio buffer is cleared by the client. + description: Returned when the input audio buffer is cleared by the client with + a `input_audio_buffer.clear` event. 
properties: event_id: type: string description: The unique ID of the server event. type: type: string - description: The event type, must be "input_audio_buffer.cleared". + description: The event type, must be `input_audio_buffer.cleared`. required: - event_id - type @@ -18097,14 +18293,16 @@ components: RealtimeServerEventInputAudioBufferCommitted: type: object description: Returned when an input audio buffer is committed, either by the - client or automatically in server VAD mode. + client or automatically in server VAD mode. The `item_id` property is + the ID of the user message item that will be created, thus a + `conversation.item.created` event will also be sent to the client. properties: event_id: type: string description: The unique ID of the server event. type: type: string - description: The event type, must be "input_audio_buffer.committed". + description: The event type, must be `input_audio_buffer.committed`. previous_item_id: type: string description: The ID of the preceding item after which the new item will be @@ -18129,17 +18327,29 @@ components: } RealtimeServerEventInputAudioBufferSpeechStarted: type: object - description: Returned in server turn detection mode when speech is detected. + description: Sent by the server when in `server_vad` mode to indicate that + speech has been detected in the audio buffer. This can happen any time + audio is added to the buffer (unless speech is already detected). The + client may want to use this event to interrupt audio playback or provide + visual feedback to the user. The client should expect to receive a + `input_audio_buffer.speech_stopped` event when speech stops. The + `item_id` property is the ID of the user message item that will be + created when speech stops and will also be included in the + `input_audio_buffer.speech_stopped` event (unless the client manually + commits the audio buffer during VAD activation). properties: event_id: type: string description: The unique ID of the server event. 
type: type: string - description: The event type, must be "input_audio_buffer.speech_started". + description: The event type, must be `input_audio_buffer.speech_started`. audio_start_ms: type: integer - description: Milliseconds since the session started when speech was detected. + description: Milliseconds from the start of all audio written to the buffer + during the session when speech was first detected. This will + correspond to the beginning of audio sent to the model, and thus + includes the `prefix_padding_ms` configured in the Session. item_id: type: string description: The ID of the user message item that will be created when speech @@ -18161,17 +18371,22 @@ components: } RealtimeServerEventInputAudioBufferSpeechStopped: type: object - description: Returned in server turn detection mode when speech stops. + description: Returned in `server_vad` mode when the server detects the end of + speech in the audio buffer. The server will also send an + `conversation.item.created` event with the user message item that is + created from the audio buffer. properties: event_id: type: string description: The unique ID of the server event. type: type: string - description: The event type, must be "input_audio_buffer.speech_stopped". + description: The event type, must be `input_audio_buffer.speech_stopped`. audio_end_ms: type: integer - description: Milliseconds since the session started when speech stopped. + description: Milliseconds since the session started when speech stopped. This + will correspond to the end of audio sent to the model, and thus + includes the `min_silence_duration_ms` configured in the Session. item_id: type: string description: The ID of the user message item that will be created. @@ -18192,15 +18407,17 @@ components: } RealtimeServerEventRateLimitsUpdated: type: object - description: Emitted after every "response.done" event to indicate the updated - rate limits. 
+ description: Emitted at the beginning of a Response to indicate the updated rate + limits. When a Response is created some tokens will be "reserved" for + the output tokens, the rate limits shown here reflect that reservation, + which is then adjusted accordingly once the Response is completed. properties: event_id: type: string description: The unique ID of the server event. type: type: string - description: The event type, must be "rate_limits.updated". + description: The event type, must be `rate_limits.updated`. rate_limits: type: array description: List of rate limit information. @@ -18209,8 +18426,7 @@ components: properties: name: type: string - description: The name of the rate limit ("requests", "tokens", "input_tokens", - "output_tokens"). + description: The name of the rate limit (`requests`, `tokens`). limit: type: integer description: The maximum allowed value for the rate limit. @@ -18246,7 +18462,7 @@ components: } ] } - RealtimeServerEventResponseAudioDelta: + RealtimeServerEventOutputAudioDelta: type: object description: Returned when the model-generated audio is updated. properties: @@ -18292,7 +18508,7 @@ components: "content_index": 0, "delta": "Base64EncodedAudioDelta" } - RealtimeServerEventResponseAudioDone: + RealtimeServerEventOutputAudioDone: type: object description: Returned when the model-generated audio is done. Also emitted when a Response is interrupted, incomplete, or cancelled. @@ -18334,7 +18550,7 @@ components: "output_index": 0, "content_index": 0 } - RealtimeServerEventResponseAudioTranscriptDelta: + RealtimeServerEventOutputAudioTranscriptDelta: type: object description: Returned when the model-generated transcription of audio output is updated. 
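The audio and transcript delta events in the hunks above each carry one `delta` fragment per event: base64 audio in one case, plain text in the other. A minimal sketch of how a client might reassemble them, assuming the fragments have already been collected in arrival order for a single `output_index`/`content_index` pair. The helper names are hypothetical and not part of the spec or any SDK.

```python
import base64


def join_audio_deltas(deltas):
    """Decode every base64 fragment on its own, then concatenate the bytes.

    Decoding fragment by fragment matters: each fragment may carry its own
    '=' padding, so concatenating the base64 strings first could corrupt
    the result.
    """
    return b"".join(base64.b64decode(fragment) for fragment in deltas)


def join_transcript_deltas(deltas):
    """Transcript deltas are plain text and can be joined directly."""
    return "".join(deltas)


# Illustrative data only; real fragments would come from the server events.
audio_fragments = [
    base64.b64encode(b"\x00\x01").decode("ascii"),
    base64.b64encode(b"\x02\x03\x04").decode("ascii"),
]
audio_bytes = join_audio_deltas(audio_fragments)
transcript = join_transcript_deltas(["Hello, how can I a", "ssist you?"])
```

The second transcript fragment is made up for illustration. Only the first appears in the spec's example payload.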
@@ -18381,7 +18597,7 @@ components: "content_index": 0, "delta": "Hello, how can I a" } - RealtimeServerEventResponseAudioTranscriptDone: + RealtimeServerEventOutputAudioTranscriptDone: type: object description: Returned when the model-generated transcription of audio output is done streaming. Also emitted when a Response is interrupted, incomplete, @@ -18560,39 +18776,16 @@ components: type: object description: Returned when a new Response is created. The first event of response creation, where the response is in an initial state of - "in_progress". + `in_progress`. properties: event_id: type: string description: The unique ID of the server event. type: type: string - description: The event type, must be "response.created". + description: The event type, must be `response.created`. response: - type: object - description: The response resource. - properties: - id: - type: string - description: The unique ID of the response. - object: - type: string - description: The object type, must be "realtime.response". - status: - type: string - description: The status of the response ("in_progress"). - status_details: - type: object - description: Additional details about the status. - output: - type: array - description: The list of output items generated by the response. - items: - type: object - description: An item in the response output. - usage: - type: object - description: Usage statistics for the response. + $ref: "#/components/schemas/RealtimeResponse" required: - event_id - type @@ -18616,7 +18809,9 @@ components: RealtimeServerEventResponseDone: type: object description: Returned when a Response is done streaming. Always emitted, no - matter the final state. + matter the final state. The Response object included in the + `response.done` event will include all output Items in the Response but + will omit the raw audio data. properties: event_id: type: string @@ -18625,31 +18820,7 @@ components: type: string description: The event type, must be "response.done". 
response: - type: object - description: The response resource. - properties: - id: - type: string - description: The unique ID of the response. - object: - type: string - description: The object type, must be "realtime.response". - status: - type: string - description: The final status of the response ("completed", "cancelled", - "failed", "incomplete"). - status_details: - type: object - description: Additional details about the status. - output: - type: array - description: The list of output items generated by the response. - items: - type: object - description: An item in the response output. - usage: - type: object - description: Usage statistics for the response. + $ref: "#/components/schemas/RealtimeResponse" required: - event_id - type @@ -18682,9 +18853,18 @@ components: } ], "usage": { - "total_tokens": 50, - "input_tokens": 20, - "output_tokens": 30 + "total_tokens":275, + "input_tokens":127, + "output_tokens":148, + "input_token_details": { + "cached_tokens":0, + "text_tokens":119, + "audio_tokens":8 + }, + "output_token_details": { + "text_tokens":36, + "audio_tokens":112 + } } } } @@ -18784,58 +18964,22 @@ components: } RealtimeServerEventResponseOutputItemAdded: type: object - description: Returned when a new Item is created during response generation. + description: Returned when a new Item is created during Response generation. properties: event_id: type: string description: The unique ID of the server event. type: type: string - description: The event type, must be "response.output_item.added". + description: The event type, must be `response.output_item.added`. response_id: type: string - description: The ID of the response to which the item belongs. + description: The ID of the Response to which the item belongs. output_index: type: integer - description: The index of the output item in the response. + description: The index of the output item in the Response. item: - type: object - description: The item that was added. 
- properties: - id: - type: string - description: The unique ID of the item. - object: - type: string - description: The object type, must be "realtime.item". - type: - type: string - description: The type of the item ("message", "function_call", - "function_call_output"). - status: - type: string - description: The status of the item ("in_progress", "completed"). - role: - type: string - description: The role associated with the item ("assistant"). - content: - type: array - description: The content of the item. - items: - type: object - properties: - type: - type: string - description: The content type ("text", "audio"). - text: - type: string - description: The text content. - audio: - type: string - description: Base64-encoded audio data. - transcript: - type: string - description: The transcript of the audio. + $ref: "#/components/schemas/RealtimeConversationItem" required: - event_id - type @@ -18870,51 +19014,15 @@ components: description: The unique ID of the server event. type: type: string - description: The event type, must be "response.output_item.done". + description: The event type, must be `response.output_item.done`. response_id: type: string - description: The ID of the response to which the item belongs. + description: The ID of the Response to which the item belongs. output_index: type: integer - description: The index of the output item in the response. + description: The index of the output item in the Response. item: - type: object - description: The completed item. - properties: - id: - type: string - description: The unique ID of the item. - object: - type: string - description: The object type, must be "realtime.item". - type: - type: string - description: The type of the item ("message", "function_call", - "function_call_output"). - status: - type: string - description: The final status of the item ("completed", "incomplete"). - role: - type: string - description: The role associated with the item ("assistant"). 
- content: - type: array - description: The content of the item. - items: - type: object - properties: - type: - type: string - description: The content type ("text", "audio"). - text: - type: string - description: The text content. - audio: - type: string - description: Base64-encoded audio data. - transcript: - type: string - description: The transcript of the audio. + $ref: "#/components/schemas/RealtimeConversationItem" required: - event_id - type @@ -19040,103 +19148,18 @@ components: } RealtimeServerEventSessionCreated: type: object - description: Returned when a session is created. Emitted automatically when a - new connection is established. + description: Returned when a Session is created. Emitted automatically when a + new connection is established as the first server event. This event will + contain the default Session configuration. properties: event_id: type: string description: The unique ID of the server event. type: type: string - description: The event type, must be "session.created". + description: The event type, must be `session.created`. session: - type: object - description: The session resource. - properties: - id: - type: string - description: The unique ID of the session. - object: - type: string - description: The object type, must be "realtime.session". - model: - type: string - description: The default model used for this session. - modalities: - type: array - items: - type: string - description: The set of modalities the model can respond with. - instructions: - type: string - description: The default system instructions. - voice: - type: string - description: The voice the model uses to respond - one of `alloy`, `echo`, or - `shimmer`. - input_audio_format: - type: string - description: The format of input audio. - output_audio_format: - type: string - description: The format of output audio. - input_audio_transcription: - type: object - description: Configuration for input audio transcription. 
- properties: - enabled: - type: boolean - description: Whether input audio transcription is enabled. - model: - type: string - description: The model used for transcription. - turn_detection: - type: object - description: Configuration for turn detection. - properties: - type: - type: string - description: The type of turn detection ("server_vad" or "none"). - threshold: - type: number - description: Activation threshold for VAD. - prefix_padding_ms: - type: integer - description: Audio included before speech starts (in milliseconds). - silence_duration_ms: - type: integer - description: Duration of silence to detect speech stop (in milliseconds). - tools: - type: array - description: Tools (functions) available to the model. - items: - type: object - properties: - type: - type: string - description: The type of the tool. - name: - type: string - description: The name of the function. - description: - type: string - description: The description of the function. - parameters: - type: object - description: Parameters of the function in JSON Schema. - tool_choice: - type: string - description: How the model chooses tools. - temperature: - type: number - description: Sampling temperature. - max_output_tokens: - oneOf: - - type: integer - - type: string - enum: - - inf - description: Maximum number of output tokens. + $ref: "#/components/schemas/RealtimeSession" required: - event_id - type @@ -19167,12 +19190,13 @@ components: "tools": [], "tool_choice": "auto", "temperature": 0.8, - "max_output_tokens": null + "max_response_output_tokens": null } } RealtimeServerEventSessionUpdated: type: object - description: Returned when a session is updated. + description: Returned when a session is updated with a `session.update` event, + unless there is an error. properties: event_id: type: string @@ -19181,93 +19205,7 @@ components: type: string description: The event type, must be "session.updated". session: - type: object - description: The updated session resource. 
- properties: - id: - type: string - description: The unique ID of the session. - object: - type: string - description: The object type, must be "realtime.session". - model: - type: string - description: The default model used for this session. - modalities: - type: array - items: - type: string - description: The set of modalities the model can respond with. - instructions: - type: string - description: The default system instructions. - voice: - type: string - description: The voice the model uses to respond - one of `alloy`, `echo`, or - `shimmer`. - input_audio_format: - type: string - description: The format of input audio. - output_audio_format: - type: string - description: The format of output audio. - input_audio_transcription: - type: object - description: Configuration for input audio transcription. - properties: - enabled: - type: boolean - description: Whether input audio transcription is enabled. - model: - type: string - description: The model used for transcription. - turn_detection: - type: object - description: Configuration for turn detection. - properties: - type: - type: string - description: The type of turn detection ("server_vad" or "none"). - threshold: - type: number - description: Activation threshold for VAD. - prefix_padding_ms: - type: integer - description: Audio included before speech starts (in milliseconds). - silence_duration_ms: - type: integer - description: Duration of silence to detect speech stop (in milliseconds). - tools: - type: array - description: Tools (functions) available to the model. - items: - type: object - properties: - type: - type: string - description: The type of the tool. - name: - type: string - description: The name of the function. - description: - type: string - description: The description of the function. - parameters: - type: object - description: Parameters of the function in JSON Schema. - tool_choice: - type: string - description: How the model chooses tools. 
- temperature: - type: number - description: Sampling temperature. - max_output_tokens: - oneOf: - - type: integer - - type: string - enum: - - inf - description: Maximum number of output tokens. + $ref: "#/components/schemas/RealtimeSession" required: - event_id - type @@ -19289,18 +19227,127 @@ components: "input_audio_format": "pcm16", "output_audio_format": "pcm16", "input_audio_transcription": { - "enabled": true, "model": "whisper-1" }, - "turn_detection": { - "type": "none" - }, + "turn_detection": null, "tools": [], "tool_choice": "none", "temperature": 0.7, - "max_output_tokens": 200 + "max_response_output_tokens": 200 } } + RealtimeSession: + type: object + description: Realtime session object configuration. + properties: + modalities: + type: array + items: + type: string + description: The set of modalities the model can respond with. To disable audio, + set this to ["text"]. + instructions: + type: string + description: >- + The default system instructions (i.e. system message) prepended to + model calls. This field allows the client to guide the model on + desired responses. The model can be instructed on response content + and format, (e.g. "be extremely succinct", "act friendly", "here are + examples of good responses") and on audio behavior (e.g. "talk + quickly", "inject emotion into your voice", "laugh frequently"). The + instructions are not guaranteed to be followed by the model, but + they provide guidance to the model on the desired behavior. + + Note that the server sets default instructions which will be used if + this field is not set and are visible in the `session.created` event + at the start of the session. + voice: + type: string + description: The voice the model uses to respond - one of `alloy`, `echo`, + or `shimmer`. Cannot be changed once the model has responded with + audio at least once. + input_audio_format: + type: string + description: The format of input audio. Options are `pcm16`, `g711_ulaw`, or + `g711_alaw`. 
+ output_audio_format: + type: string + description: The format of output audio. Options are `pcm16`, `g711_ulaw`, or + `g711_alaw`. + input_audio_transcription: + type: object + description: Configuration for input audio transcription, defaults to off and + can be set to `null` to turn off once on. Input audio transcription + is not native to the model, since the model consumes audio directly. + Transcription runs asynchronously through Whisper and should be + treated as rough guidance rather than the representation understood + by the model. + properties: + model: + type: string + description: The model to use for transcription, `whisper-1` is the only + currently supported model. + turn_detection: + type: object + description: Configuration for turn detection. Can be set to `null` to turn off. + Server VAD means that the model will detect the start and end of + speech based on audio volume and respond at the end of user speech. + properties: + type: + type: string + description: Type of turn detection, only `server_vad` is currently supported. + threshold: + type: number + description: Activation threshold for VAD (0.0 to 1.0), this defaults to 0.5. A + higher threshold will require louder audio to activate the + model, and thus might perform better in noisy environments. + prefix_padding_ms: + type: integer + description: Amount of audio to include before the VAD detected speech (in + milliseconds). Defaults to 300ms. + silence_duration_ms: + type: integer + description: Duration of silence to detect speech stop (in milliseconds). + Defaults to 500ms. With shorter values the model will respond + more quickly, but may jump in on short pauses from the user. + tools: + type: array + description: Tools (functions) available to the model. + items: + type: object + properties: + type: + type: string + description: The type of the tool, i.e. `function`. + name: + type: string + description: The name of the function. 
+ description: + type: string + description: The description of the function, including guidance on when and how + to call it, and guidance about what to tell the user when + calling (if anything). + parameters: + type: object + description: Parameters of the function in JSON Schema. + tool_choice: + type: string + description: How the model chooses tools. Options are `auto`, `none`, + `required`, or specify a function. + temperature: + type: number + description: Sampling temperature for the model, limited to [0.6, 1.2]. Defaults + to 0.8. + max_response_output_tokens: + oneOf: + - type: integer + - type: string + enum: + - inf + description: Maximum number of output tokens for a single assistant response, + inclusive of tool calls. Provide an integer between 1 and 4096 to + limit output tokens, or `inf` for the maximum available tokens for a + given model. Defaults to `inf`. ResponseFormatJsonObject: type: object properties: @@ -22145,16 +22192,16 @@ x-oaiMeta: key: RealtimeServerEventResponseTextDone path: - type: object - key: RealtimeServerEventResponseAudioTranscriptDelta + key: RealtimeServerEventOutputAudioTranscriptDelta path: - type: object - key: RealtimeServerEventResponseAudioTranscriptDone + key: RealtimeServerEventOutputAudioTranscriptDone path: - type: object - key: RealtimeServerEventResponseAudioDelta + key: RealtimeServerEventOutputAudioDelta path: - type: object - key: RealtimeServerEventResponseAudioDone + key: RealtimeServerEventOutputAudioDone path: - type: object key: RealtimeServerEventResponseFunctionCallArgumentsDelta
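The new `RealtimeSession` schema states most of its limits only in description text (temperature range, token cap, allowed formats and voices), so a generated client would not enforce them. A rough sketch of a client-side check, assuming the session is held as a plain dict shaped like the `session.updated` example. `validate_session` is a hypothetical helper, and the limits are copied from the descriptions in this diff, not from any published validator.

```python
# Hypothetical validator: the limits below are copied from the RealtimeSession
# descriptions in this diff; the function itself is not part of any SDK.
AUDIO_FORMATS = {"pcm16", "g711_ulaw", "g711_alaw"}
VOICES = {"alloy", "echo", "shimmer"}


def validate_session(session):
    """Return a list of problems found in a session dict (empty if none)."""
    problems = []

    voice = session.get("voice")
    if voice is not None and voice not in VOICES:
        problems.append("voice must be one of alloy, echo, shimmer")

    for key in ("input_audio_format", "output_audio_format"):
        fmt = session.get(key)
        if fmt is not None and fmt not in AUDIO_FORMATS:
            problems.append(key + " must be pcm16, g711_ulaw, or g711_alaw")

    temperature = session.get("temperature")
    if temperature is not None and not 0.6 <= temperature <= 1.2:
        problems.append("temperature must be within [0.6, 1.2]")

    max_tokens = session.get("max_response_output_tokens")
    if max_tokens is not None and max_tokens != "inf":
        is_int = isinstance(max_tokens, int) and not isinstance(max_tokens, bool)
        if not is_int or not 1 <= max_tokens <= 4096:
            problems.append("max_response_output_tokens must be 1..4096 or 'inf'")

    turn_detection = session.get("turn_detection")  # None turns detection off
    if turn_detection is not None:
        if turn_detection.get("type") != "server_vad":
            problems.append("turn_detection.type must be server_vad")
        threshold = turn_detection.get("threshold", 0.5)  # documented default
        if not 0.0 <= threshold <= 1.0:
            problems.append("turn_detection.threshold must be within 0.0..1.0")

    return problems
```

Run against the updated `session.updated` example (`"turn_detection": null`, `"temperature": 0.7`, `"max_response_output_tokens": 200`), this sketch reports no problems. The old example's `"type": "none"` would be flagged, which matches the schema's statement that only `server_vad` is currently supported.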