
Conversation


@drch- drch- commented Oct 19, 2025

towards #13815 as discussed with @amanda-tarafa

updated after implementing feedback from stephentoub

Summary: a small fix to correctly handle a null FinishReason in streamed updates, and a larger one to bridge the gap between our Part.ThoughtSignature and Microsoft.Extensions.AI's TextReasoningContent.

First the small one:

FinishReason

When we send streaming updates, VertexAI sets FinishReason to Unspecified on the interim results and the overall reason (typically Stop unless otherwise filtered) on the final update. I've updated the mapping of our FinishReason.Unspecified to a null M.E.AI.ChatFinishReason, which is how the OpenAI client handles it.
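As a sketch, the mapping looks something like this (using an illustrative stand-in enum and plain strings in place of M.E.AI's ChatFinishReason, which is a string-wrapping struct; the member names here are assumptions, not the SDK's actual surface):

```csharp
// Illustrative stand-in for the Vertex AI FinishReason enum.
public enum GeminiFinishReason { Unspecified, Stop, MaxTokens, Safety, Recitation }

public static class FinishReasonMapper
{
    // Unspecified (sent on interim streaming updates) maps to null so the
    // update carries no finish reason, matching the OpenAI client's behavior.
    public static string? ToChatFinishReason(GeminiFinishReason reason) => reason switch
    {
        GeminiFinishReason.Unspecified => null,
        GeminiFinishReason.MaxTokens => "length",
        GeminiFinishReason.Safety or GeminiFinishReason.Recitation => "content_filter",
        _ => "stop", // catch-all for any future FinishReason values
    };
}
```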

Question: How should we handle new FinishReasons in the catch-all? Or alternatively, do we expect future FinishReasons to be ContentFilter or Stop? M.E.AI has only Stop, Length, ToolCalls, ContentFilter. I've left it as is (Stop) but I'm leaning towards ContentFilter.

Thought Summaries and Thought Signatures

Gemini 2.5 models can think. Callers can specify a thinkingBudget. 2.5 Flash variants allow a minimum of 0 (disabled); 2.5 Pro has a minimum of 128 (i.e., thinking is always enabled).

Callers can specify "includeThoughts": true in a request to receive thought summaries. These are purely informational summaries of the raw thoughts, and can give the developer insight into the model's reasoning, or give an end-user a responsive experience during the thinking phase. They are returned as text Parts with "thought": true.

ThoughtSignatures are particularly effective for function calling. If the thinking budget is > 0, AND function calling is enabled, Gemini will include a ThoughtSignature on the first non-thinking Part in the response. A ThoughtSignature is an encrypted representation of the model's internal thought process. (Note, the thought signature appearing on the first Part is an observation, not specified in the API).

Example response where includeThoughts is true, and thinkingBudget > 0

  {
    "candidates": [
        {
            "content": {
                "role": "model",
                "parts": [
                    {
                        // thought
                        "text": "Okay, here's what I'm thinking:\n\nFirst, this user wants to plan a day trip, and they've got some weather preferences. Great. I see that I can use `get_current_weather` – that seems like a relevant tool.\n\nNow, I need...(snip)...",
                        "thought": true
                    },
                    {
                        // text part
                        "text": "I can certainly look up the weather for you in those cities. That way you can decide which city has the best weather for you to walk around in.",
                        "thoughtSignature": "CrMNAePx/16T5mM0WQgdAtYLJ...(snip)..."
                    },
                    {   
                        // functionCall part
                        "functionCall": {"name": "get_current_weather", "args": { "location": "Prague" } } 
                    }
                ]
            }
        }
    ]
    // ...
}

Gemini is stateless, so in a multi-turn conversation, the caller has to include all the parts of the conversation so far. ThoughtSignatures should also be included to provide the model's raw thoughts back to itself, which are otherwise lost. In 1P libraries, this is straightforward - append the response's candidates[].content to the next request.

M.E.AI uses a single type, TextReasoningContent, for both thought summaries and thought signatures. It has a Text property and a ProtectedData property. The above snippet is converted to the following AIContent subclasses:

// parts[0]
new TextReasoningContent("Okay, here's what I'm thinking...")

// parts[1]
new TextReasoningContent(null) { ProtectedData = "CrMNa..."} // extract the ThoughtSignature from the part
new TextContent("I can certainly look up the weather for you in those cities...")

// parts[2]
new FunctionCallContent(callId: "get_current_weather", name: "get_current_weather", arguments: new Dictionary<string, object?> { { "location", "Prague" } })

Users of the library will also simply copy these contents into the next request, so we have to rebuild the new request from these objects. We ignore any thought-summary Text (i.e., the first TextReasoningContent above). If there is a TextReasoningContent with non-null ProtectedData, we set it as the ThoughtSignature on the first Part in the list. There will only ever be 0 or 1 ProtectedData values in a given turn.
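A minimal sketch of that rebuilding step, with simplified stand-in types (the real types are the SDK's Part and M.E.AI's AIContent hierarchy; only text parts are modeled here):

```csharp
using System.Collections.Generic;

// Simplified stand-ins for the request Part and the M.E.AI content types.
public record Part(string? Text = null, string? ThoughtSignature = null);
public abstract record AIContent;
public record TextContent(string Text) : AIContent;
public record TextReasoningContent(string? Text) : AIContent
{
    public string? ProtectedData { get; init; }
}

public static class RequestBuilder
{
    // Thought-summary text is dropped; the single ProtectedData value (if any)
    // becomes the ThoughtSignature on the first Part of the rebuilt request.
    public static List<Part> ToParts(IEnumerable<AIContent> contents)
    {
        var parts = new List<Part>();
        string? signature = null;
        foreach (var content in contents)
        {
            switch (content)
            {
                case TextReasoningContent trc when trc.ProtectedData is not null:
                    signature ??= trc.ProtectedData; // at most one per turn
                    break;
                case TextReasoningContent:
                    break; // informational thought summary; not round-tripped
                case TextContent tc:
                    parts.Add(new Part(Text: tc.Text));
                    break;
            }
        }
        if (signature is not null && parts.Count > 0)
            parts[0] = parts[0] with { ThoughtSignature = signature };
        return parts;
    }
}
```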

Streaming and non-streaming

Responses can be streamed to provide the caller with content while the model generates it, in particular for text content (both thoughts and model text output). These partial ChatResponseUpdates are coalesced once the final response has been received. In our case above, the two TextReasoningContents in a streaming response will be merged into a single one with both Text and ProtectedData set. Note that ThoughtSignatures are not meant to be merged, and M.E.AI will also keep multiple TextReasoningContent instances during the merge if they have ProtectedData set. (A fix will go out shortly.)


fix: correct FinishReason for streaming updates
  VertexAI returns FinishReason.Unspecified on partial updates.  This was
  incorrectly mapped to ChatFinishReason.Stop and should be null.

fix: add ModelArmor to support FinishReason, mapping it to
  ChatFinishReason.ContentFilter

feat: support Gemini Thought Signatures
  use M.E.AI's TextReasoningContent for thought summaries (Part.Thought == true)
  store ThoughtSignature for individual Parts in the
  AdditionalProperties dictionary of the corresponding AIContent.
@drch- drch- force-pushed the vertexai-extensions branch from 38cc036 to 1aaf075 Compare October 19, 2025 20:22
Contributor

stephentoub commented Oct 20, 2025

Just saw this, thanks.

I didn't follow why TextReasoningContent.ProtectedData is insufficient. Is the issue fundamentally what you wrote: "In Gemini, reasoning isn't a separate Part, but rather a property on another valid Part (such as Text or FunctionCall)"? Did you consider just translating any part that has a thought signature instead into multiple AIContent instances, one that's a TextReasoningContent with the thought signature as ProtectedData and one for the other content (e.g. the text, the function call, etc.)? A TextReasoningContent without Text is only good for roundtripping back to the original model, so a consumer is just going to ignore it, and the IChatClient implementation can reassemble things when sending back to the model so that a TextReasoningContent just contributes its ProtectedData as the ThoughtSignature for the subsequent part.
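A sketch of that suggested translation, with simplified stand-in types (the real mapping would live in the Vertex AI IChatClient implementation; only text parts are modeled):

```csharp
using System.Collections.Generic;

// Simplified stand-ins for the SDK's Part and M.E.AI's content types.
public record Part(string? Text = null, bool Thought = false, string? ThoughtSignature = null);
public abstract record AIContent;
public record TextContent(string Text) : AIContent;
public record TextReasoningContent(string? Text) : AIContent
{
    public string? ProtectedData { get; init; }
}

public static class PartTranslator
{
    // A Part carrying a thoughtSignature yields an extra text-less
    // TextReasoningContent holding the signature, alongside the normal content.
    public static IEnumerable<AIContent> Translate(Part part)
    {
        if (part.ThoughtSignature is not null)
            yield return new TextReasoningContent(null) { ProtectedData = part.ThoughtSignature };

        if (part.Thought)
            yield return new TextReasoningContent(part.Text);
        else if (part.Text is not null)
            yield return new TextContent(part.Text);
        // (functionCall and other part kinds would translate similarly)
    }
}
```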

@stephentoub
Contributor

M.E.AI has only Stop, Length, ToolCalls, ContentFilter

Those are the only pre-defined values (static properties on ChatFinishReason), but ChatFinishReason is a string-wrapping struct, so technically you could store whatever you want.

Author

drch- commented Oct 20, 2025

I didn't follow why TextReasoningContent.ProtectedData is insufficient. Is the issue fundamentally what you wrote: "In Gemini, reasoning isn't a separate Part, but rather a property on another valid Part (such as Text or FunctionCall)"? Did you consider just translating any part that has a thought signature instead into multiple AIContent instances, one that's a TextReasoningContent with the thought signature as ProtectedData and one for the other content

No I didn't. I also mistakenly assumed that the thought signature could appear on any part in the non-streaming case. It is indeed on the first content part so this seems doable and cleaner.

  {
    "candidates": [
        {
            "content": {
                "role": "model",
                "parts": [
                    {
                        "text": "Okay, here's what I'm thinking:\n\nFirst, this user wants to plan a day trip, and they've got some weather preferences. Great. I see that I can use `get_current_weather` – that seems like a relevant tool.\n\nNow, I need...(snip)...",
                        "thought": true
                    },
                    {
                        "text": "I can certainly look up the weather for you in those cities. That way you can decide which city has the best weather for you to walk around in.",
                        "thoughtSignature": "CrMNAePx/16T5mM0WQgdAtYLJ...(snip)..."
                    },
                    { "functionCall": {"name": "get_current_weather", "args": { "location": "Salzburg"} } },
                    { "functionCall": {"name": "get_current_weather", "args": { "location": "Paris" } } },
                    { "functionCall": {"name": "get_current_weather", "args": { "location": "Prague" } } }
                ]
            },
            "finishReason": "STOP",
            "avgLogprobs": -0.75160148962220152
        }
    ]
    ...
}

I'll update the PR with that approach. Thanks!

… signatures

WIP: broken on streaming results
Author

drch- commented Oct 20, 2025

I've updated with the suggested approach and it's cleaner and works well for the non-streaming use case.

The integration test for the streaming case is failing. The individual AIContents are all there but ToChatResponse flattens everything down to a single TextContent and a single TextReasoningContent and loses the ProtectedData.

I haven't debugged it yet extensively and will pick it up again in a few hours.

Contributor

stephentoub commented Oct 20, 2025

It shouldn't be losing the ProtectedData, but if it is, or if the logic of ToChatResponse needs to be tweaked, we can do that (ideally today if possible, to catch our next build). I will take a look when I'm at my desk. The intent was that a TextReasoningContent A could be combined with a following TextReasoningContent B if the former didn't have ProtectedData, even if the latter did, and then the combined text plus any ProtectedData from B would be on the combined instance. It's possible that's not happening.

Streaming + IncludeThoughts
Streaming + not IncludeThoughts
Non-Streaming + IncludeThoughts
Non-Streaming + not IncludeThoughts
Author

drch- commented Oct 20, 2025

To clarify the implementation, given the following response...

[{
    "text": "Okay, here's what I'm thinking:\n\nFirst, this user wants to plan a day trip, and they've got some weather preferences. Great. I see that I can use `get_current_weather` – that seems like a relevant tool.\n\nNow, I need...(snip)...",
    "thought": true
},
{
    "text": "I can certainly look up the weather for you in those cities. That way you can decide which city has the best weather for you to walk around in.",
    "thoughtSignature": "CrMNAePx/16T5mM0WQgdAtYLJ...(snip)..."
}]

... I'm parsing it into:

new TextReasoningContent(parts[0].text),
new TextReasoningContent(null) { ProtectedData = parts[1].thoughtSignature },
new TextContent(part[1].text)

When we don't specify IncludeThoughts=true in the request, everything is fine since we only have a single TextReasoningContent.

Contributor

stephentoub commented Oct 20, 2025

My suggestion had been to put the reasoning content with the ProtectedData first. Does that fix it, or no?

EDIT: I misread what you wrote. I see, the first text part is noted as a thought but then the second text part has a thought signature.

Author

drch- commented Oct 20, 2025

Here's a simple example without involving the PredictionServiceChatClient:

    [Fact]
    public void ToChatResponse_SimpleText_WithThoughtsAndThoughtSignature()
    {
        // Response #1 [{ "text" : "This is my thought.", "thought": true }]
        var textReasoningContent1 = new TextReasoningContent("This is my thought.");

        // Response #2 [{ "text", "This is my response.", "thoughtSignature" : "someThoughtSignatureBase64" }]
        var textSignatureContent = new TextReasoningContent(null) { ProtectedData = "someThoughtSignatureBase64" };
        var textContent1 = new TextContent("This is my response.");

        List<ChatResponseUpdate> responses = [
            new ChatResponseUpdate(ChatRole.Assistant, [textReasoningContent1]), //1
            new ChatResponseUpdate(ChatRole.Assistant, [textSignatureContent, textContent1]), //2
        ];

        var chatResponse = responses.ToChatResponse();
        var contents = chatResponse.Messages[0].Contents;

        Assert.Equal("This is my response.", chatResponse.Text);
        Assert.Equal("This is my thought.", contents.OfType<TextReasoningContent>().First(trc => trc.ProtectedData is null).Text);
        Assert.Equal(1, contents.OfType<TextContent>().Count());
        Assert.Equal(1, contents.OfType<TextReasoningContent>().Count(trc => trc.ProtectedData is null));
        Assert.Equal(1, contents.OfType<TextReasoningContent>().Count(trc => trc.ProtectedData is not null)); // <-- Fails.
    }

Is it reasonable for ToChatResponse not to merge TextReasoningContent when ProtectedData is set? We only expect to have exactly 1 TextReasoningContent with ProtectedData set per turn, but there may be 0 or multiple TextReasoningContent("thought")'s.

@stephentoub
Contributor

// <-- Fails.

There's a bug in ToChatResponse I'm fixing right now where the ProtectedData is getting dropped.

Is it reasonable for ToChatResponse not to merge TextReasoningContent when ProtectedData is set?

The intent of the current scheme was that you could have a sequence of TextReasoningContent and they'd be mergeable up to and including the first one that has ProtectedData set, e.g.

TRC { Text = "Hello ", ProtectedData = null }
TRC { Text = "World", ProtectedData = <bytes> }
TRC { Text = "Howdy", ProtectedData = <bytes> }
TRC { Text = "How ", ProtectedData = null }
TRC { Text = "are ", ProtectedData = null }
TRC { Text = "you?", ProtectedData = <bytes> }

would become:

TRC { Text = "Hello World", ProtectedData = <bytes> }
TRC { Text = "Howdy", ProtectedData = <bytes> }
TRC { Text = "How are you?", ProtectedData = <bytes> }

The bug is that right now it's becoming:

TRC { Text = "Hello World", ProtectedData = null }
TRC { Text = "Howdy", ProtectedData = null }
TRC { Text = "How are you?", ProtectedData = null }

Once that's fixed, does that work for your needs, or any combining with something that has ProtectedData set is problematic?
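The intended rule can be sketched like this (a simplified stand-in type; the real logic is M.E.AI's ToChatResponse coalescing):

```csharp
using System.Collections.Generic;

// Stand-in for TextReasoningContent.
public record Trc(string Text, string? ProtectedData = null);

public static class ReasoningCoalescer
{
    // Consecutive reasoning contents merge up to and including the first one
    // that carries ProtectedData; that signature closes the group.
    public static List<Trc> Coalesce(IEnumerable<Trc> items)
    {
        var result = new List<Trc>();
        var text = "";
        foreach (var item in items)
        {
            text += item.Text;
            if (item.ProtectedData is not null)
            {
                result.Add(new Trc(text, item.ProtectedData));
                text = "";
            }
        }
        if (text.Length > 0) result.Add(new Trc(text));
        return result;
    }
}
```

Applied to the six-item sequence above, this yields the three merged instances shown.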

@amanda-tarafa amanda-tarafa self-assigned this Oct 20, 2025
@amanda-tarafa amanda-tarafa self-requested a review October 20, 2025 16:55
Author

drch- commented Oct 20, 2025

In theory it's workable. Per model turn, we expect 0 or 1 ThoughtSignatures and 0 or many Thoughts (max 1 if non-streaming). It doesn't quite map to our API, because a Thought is just a flag on a text Part, and a ThoughtSignature is an attribute on Part. But I think we could just attach it to the appropriate first Part in Content.Parts in the subsequent request to the model (which is what I have already).

From the user perspective, I'm not so sure though. This means we have two uses for TextReasoningContent. One is throw-away and informational and the other is critical and needs to be persisted. And the user needs to inspect the contents of the object to determine what they need to do with it. Even if these are correctly merged, we still have an object that has one throw-away property (Text - which can be quite large) and one critical one (ProtectedData).

Could this warrant a new type for informational Thoughts or a type explicitly for ThoughtSignature/ProtectedData? If we had something like InformationalTextContent and we put Thoughts there, I think it would be more intuitive for the user.

Edit: I must add, when I say it's workable I mean getting the code to work and the tests to pass. I would have to defer to @amanda-tarafa to answer if it works for our needs.

Contributor

stephentoub commented Oct 20, 2025

In theory it's workable

Great

Could this warrant a new type for informational Thoughts or a type explicitly for ThoughtSignature/ProtectedData?

I don't think so? We arrived at this based on looking at how various services represent this, e.g. other services actually expect them to be merged, such that for example streaming reasoning sends individual text deltas then a final signature and expects both the coalesced text and the signature to be roundtripped later. (Also, at this point, adding something that duplicates what's here would, I expect, lead to confusion.)

Even if these are correctly merged, we still have an object that has one throw-away property (Text - which can be quite large) and one critical one (ProtectedData).

While it may be for Google, Text is not throw away for all services, though.

I think it would be more intuitive for the user.

Users generally don't need to look at ProtectedData; it's there just to roundtrip.

Contributor

stephentoub commented Oct 20, 2025

when I say it's workable I mean getting the code to work and the tests to pass

What else is there? That's a genuine question. Thought signatures / encrypted content / protected data / etc. are not user facing; they exist purely to roundtrip data back to the service/model. If everything holds together technically, such that the data can be roundtripped, what other concerns do we have?

Author

drch- commented Oct 20, 2025

Roger.

I'll need to update the tests to match the expected merged content, but other than that it should work once the fix is there. It already works fine without IncludeThoughts for both non-streaming and streaming, and with IncludeThoughts for non-streaming. I'll add a few more tests around the thought handling.

@stephentoub
Contributor

Ok, please let me know if you run into any issues. I'm certainly open to revising the coalescing strategy if it's not workable as is (after the bug fix dotnet/extensions#6936).

Contributor

amanda-tarafa commented Oct 20, 2025

From https://ai.google.dev/gemini-api/docs/thinking#signatures

Other usage limitations to consider with function calling include:

  • Signatures are returned from the model within other parts in the response, for example function calling or text parts. Return the entire response with all parts back to the model in subsequent turns.
  • Don't concatenate parts with signatures together.
  • Don't merge one part with a signature with another part without a signature.

And from https://ai.google.dev/gemini-api/docs/function-calling?example=meeting#thinking

The standard pattern for multi-turn tool use is to append the model's complete previous response to the conversation history.
...
If you modify the conversation history manually ...

  • Always send the thought_signature back to the model inside its original Part.
  • Don't merge a Part containing a signature with one that does not. This breaks the positional context of the thought.
  • Don't combine two Parts that both contain signatures, as the signature strings cannot be merged.

Recapping, and let me know if there's something wrong here:

  • We need to be able to recreate the whole original response content as it was, including the original (unmerged) parts and the Thought flag.
  • We shouldn't ignore any thought summaries, etc. Users of the library can do that on their own if they want by cleaning up the AIContents before using them for a new request.
  • The signature may be included on any Vertex AI part type, but it's only supported in M.E.AI on TextReasoningContent, through the ProtectedData property.
  • Thought may be true for any Vertex AI part type, though examples assume it's only on text parts, which does not seem a bad assumption, given that Thought == true is what's being used to identify thought summaries. But strictly speaking, there's no M.E.AI equivalent property. Instead, M.E.AI.TextReasoningContent is considered to always be a thought, and it's the only type of thought there is.
  • A part with a thought signature is not necessarily a thought.
  • The current M.E.AI coalescing strategy does not work for Gemini models.

Proposal:

  • Represent text parts that are thoughts as TextReasoningContent.
  • Represent text parts that are not thoughts as TextContent.
  • Other part types have corresponding M.E.AI types but:
    • if they are thoughts (unlikely?), we add an AIContent.AdditionalProperties entry to signal this.
    • alternative: we rely on RawRepresentation to figure out if a non-text part is a thought.
  • Any part (including text parts that are thoughts) that has a signature should be followed by a null-text TextReasoningContent that contains the signature only.
  • @stephentoub Could you support a way for us to say "do not merge these"? Maybe an AIContent.AdditionalProperty?
    • Alternative, we rely on RawRepresentation to figure out that a single TextReasoningContent is really several coalesced parts?

I like the alternatives that rely on RawRepresentation, as that implies we don't need to know about any of M.E.AI's coalescing strategies or add special fields, but RawRepresentation is ignored for serialization (as it probably should be).

What do you think? I can work on this myself but it'll likely be Wednesday.

Author

drch- commented Oct 21, 2025

If you modify the conversation history manually ...

  • Always send the thought_signature back to the model inside its original Part.
  • Don't merge a Part containing a signature with one that does not. This breaks the positional context of the thought.
  • Don't combine two Parts that both contain signatures, as the signature strings cannot be merged.

Recapping, and let me know if there's something wrong here:

  • We need be able to recreate the whole original response content as it was, including the original (unmerged) parts and the Thought flag.

One point to clarify: the request payloads for both the streaming and non-streaming endpoints are identical. The streaming endpoint streams the partial responses as they are generated. IEnumerable.ToChatResponse() is used to flatten these (as opposed to, say, compacting multiple turns into fewer to manage context length). ADK does this too. We don't need to preserve the unflattened form when we round-trip to the model; i.e., the flattened version we recreate is what the response would have been if the caller had hit the non-streaming endpoint.

In practice, this just means flattening multiple thoughts or multiple text parts into one each, and leaving everything else as is.

@stephentoub
Contributor

Thanks for all the details, @drch- !

What I'm hearing (and please correct me if I've misunderstood) is that you can make it work with what's there today, but doing so requires you to effectively bake in knowledge of the heuristics ToChatResponse uses, thereby potentially making the code brittle if we were to tweak the heuristics in an incompatible way. And thus, while we can make forward progress now (bits containing my fix for ProtectedData should be out today I believe), we're looking for a better longer-term solution. Is that a fair assessment, or is it more critical than that?

Based on what you've said, I think all of these are potential solutions, in my order of preference (not promising we'd do any of these yet, just trying to understand the field and then we'll need to vet against other providers):

  1. Add a public string? PartId { get; set; } property to AIContent. Providers could choose to set that to whatever they want and use it however they want, but ToChatResponse would avoid coalescing any otherwise-coalescable AIContent instances if they had different non-null PartID values.
  2. Push public string? ProtectedData { get; set; } down to the base AIContent. A concrete instance of the base AIContent can never be coalesced with anything. A provider that wanted content of any kind associated with protected data would instantiate the derived type, but such derived types might still be coalescable, e.g. a stream of TextReasoningContent with text and no protected data followed by the signature for the whole part in the form of a TextReasoningContent with no text and with protected data. A provider could instantiate the base AIContent directly with protected data, and that would never coalesce with anything.
  3. Add a public bool Coalescable { get; set; } = true; flag to AIContent. AIContent that should never be merged would have that property explicitly set to false, and ToChatResponse would avoid merging them. This is the same as your AdditionalProperties suggestion, just strongly-typed.

?

@amanda-tarafa
Contributor

@drch-

We don't need to preserve the unflattened form when we round trip to the model. IE the flattened version we recreate is what the response would have been if they hit the non-streaming endpoints.

In practice, this just means flattening any multiple thoughts or multiple text parts into 1 each, and leaving everything else as is.

Yes, I think this makes sense and does not contradict any of the documentation.
We still cannot merge the signature, though. And we are losing the thought flag if it ever comes in something other than a text part, but that's on us, so we can use additional properties for it.

@stephentoub

is that you can make it work with what's there today

Not really; we shouldn't be merging the signature with the previous thoughts. Because you are coalescing, we would have to drop all mergeable thoughts on our side before returning, and I really believe we shouldn't be in the business of dropping bits.
Note that it "probably" will work to some extent even with the merged signature, but we would be going hard against documentation: "Don't merge a Part containing a signature with one that does not. This breaks the positional context of the thought." And we really can't do that. Also, @drch-'s current solution seems to work because he's attaching the signature to a specific part (the first non-thought, I think), but that's based on observation, and we can't do that either.

It seems we need 1 or 3; 2 won't work because we cannot guarantee that the signature will end up in its original part. I like 3 better because 1 (and 2, if made to work) are mixing data with behaviour, and those relations may not hold in the future. What about a version of 3 where the property is named AsIs, meaning you don't apply any transforms to the content at all; it's just coalescing now, but it may be something else in the future.

@stephentoub
Contributor

stephentoub commented Oct 21, 2025

Not really, we shouldn't be merging the signature with the previous thoughts

But that's avoidable by inserting an AIContent between them that will prevent them from being coalesced (literally new AIContent()), right? I'm missing why that doesn't work? I realize that's not ideal and a better longer term solution is desirable, but does it not work?
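As a sketch of that workaround (stand-in types; in M.E.AI the boundary would literally be a base AIContent instance, which the coalescer can't merge with anything):

```csharp
using System.Collections.Generic;

// Simplified stand-ins for the M.E.AI types.
public class AIContent { }
public class TextReasoningContent : AIContent
{
    public TextReasoningContent(string? text) => Text = text;
    public string? Text { get; }
    public string? ProtectedData { get; set; }
}

public static class BoundaryExample
{
    // A bare AIContent between two TextReasoningContents acts as a coalescing
    // boundary, keeping the signature-bearing instance out of the merged thought text.
    public static List<AIContent> Build(string thoughtText, string signature) =>
        new()
        {
            new TextReasoningContent(thoughtText),
            new AIContent(), // boundary: never coalesces with its neighbors
            new TextReasoningContent(null) { ProtectedData = signature },
        };
}
```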

@amanda-tarafa
Contributor

But that's avoidable by inserting an AIContent between them that will prevent them from being coalesced (literally new AIContent()), right?

Yes, I hadn't either read (this is not exactly 2, I think) or thought about this, but that would work; I'd still insert one before and one after, etc. But yes, using an empty AIContent as a boundary as a (very) temporary solution would work.

@amanda-tarafa
Contributor

@stephentoub The preferred solution would be a combination of 2 and 3, with AsIs or something more general than Coalescable. From 2, we'd take the ProtectedData on AIContent, but not use empty AIContents to represent a signature; instead, we'd be able to attach a signature to each type of content (something @drch- proposed at the start). From 3, we get the ability to mark any type of content as non-transformable (coalescing or any other transformation). With these changes, the current behaviour for the other providers is not affected, and we decouple the (protected) data from the transformability.

@stephentoub
Contributor

The preferred solution would be a combination of 2 and 3 with AsIs or something more general than Coalescable

(1) gives you the same capability as (3)... anything you don't want coalesced is just given its own part id. But it's also in my mind much cleaner than (3) because it's describing what the thing is rather than behaviors that should be applied to it by unrelated components that may or may not be used later on. It's why (3) is my least favorite. And various providers actually have that concept, whether it be "part" or "index" or such terminology.

@stephentoub
Contributor

Also, for (2), does gemini/vertex ever today actually send a thought signature with non-text, or is that a hypothetical future thing?

@drch-
Author

drch- commented Oct 22, 2025

Yes, it does. For example it's common that a response contains only function call(s). The thought signature appears on this part.

@amanda-tarafa
Contributor

amanda-tarafa commented Oct 22, 2025

does gemini/vertex ever today actually send a thought signature with non-text

As @drch- said, yes, and it's even documented in https://ai.google.dev/gemini-api/docs/thinking#signatures: "Signatures are returned from the model within other parts in the response, for example function calling or text parts." If we don't have (2), we have to use empty TRCs to put these signatures before or after the actual content.

(1) gives you the same capability as (3)

Yes, agreed, in terms of capability.

because it's describing what the thing is rather than behaviors that should be applied to it by unrelated components that may or may not be used later on

But M.E.AI would actually be inferring a desired behaviour just from the description of the thing. That's precisely what I don't like. I prefer the description of the thing and the expectations of behaviour to be separate. RawRepresentation is a similar description of the thing, and yet you are not inferring behaviour from it (I know it's not serializable, and it would be breaking to use it now anyway). AdditionalProperties is also a description of the thing, and you are not inferring any behaviour from it, even if losing info on the way, because you only attach the first AdditionalProperties to the merged result. PartId is fundamentally no different than those, even if it's "an ID".
There's another argument to make about future-proofing and how this is new and changing fast. I think the more intentional the API is, the easier it will be to accommodate future changes without breaking folks. In that sense, I think I now like your original Coalescable more than my more general AsIs.

@stephentoub
Contributor

But M.E.AI would be actually infering a desired beahaviour just from the description of the thing.

Yes, it would be inferring that it's ok, within a very narrow heuristic, to coalesce all text (or, separately, text reasoning) that's part of the same part. As @drch- pointed out, ADK does that too, merging viable updates within the same part.

Adding a "coalescable" onto AIContent is putting external concerns onto it. This is not specific to the content, but rather a directive to an external algorithm about how that algorithm should behave. It's declaring a "how" for one possible consumer of the data, rather than declaring a "what", which is directly tied to the data. It happens that we ship an implementation of such an algorithm in the same assembly, but that's just packaging; anyone could write such a thing. And that doesn't scale. What if someone wants to write a reducer that removes content in order to reduce the size of chat history... should AIContent also be imbued with a strongly-typed property for priority on removal? What if someone wants to write a translator that transforms a history into a different language... should AIContent be given a strongly-typed property that dictates whether it's intended to be translated or whether that would result in issues? And so on.

I would much rather enable the AIContent (or, probably ChatResponseUpdate) to express the part that it's a part of, reflecting the actual data provided by the provider, which is about the what rather than the how for some unrelated algorithm. And then that algorithm's heuristics can take that into account.
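One way to read that suggestion: if each streamed update carried an identifier for the provider part it came from, a coalescing heuristic could key on it instead of on a behavioural flag. A minimal sketch of that idea, in Python for illustration only; Update, part_id, and coalesce_by_part are hypothetical names, not an M.E.AI API:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Update:
    text: str
    # Hypothetical: the provider part this update belongs to ("what"),
    # which a consumer's heuristic may use to decide merging ("how").
    part_id: Optional[str] = None


def coalesce_by_part(updates: List[Update]) -> List[Update]:
    """Merge adjacent text updates only when both report the same non-None
    part_id, so the heuristic follows the provider's own part structure."""
    out: List[Update] = []
    for u in updates:
        if out and out[-1].part_id is not None and out[-1].part_id == u.part_id:
            out[-1] = Update(out[-1].text + u.text, u.part_id)
        else:
            out.append(Update(u.text, u.part_id))
    return out
```

With this shape, updates without a part id (or from different parts) pass through untouched, and the merging decision stays in the consuming algorithm rather than on the content type.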

@stephentoub
Contributor

after the bug fix dotnet/extensions#6936

This is in the 9.10.1 release on nuget.

@amanda-tarafa
Contributor

amanda-tarafa commented Oct 23, 2025

First, one note: we will almost certainly make this work, with whatever surface M.E.AI ends up having. We can make it work now with AIContent as boundary, and any of the options being discussed here are already better than that. I'll continue the discussion to try to get to a better surface, but not because this is blocking.

My thoughts below, but my proposal is this: add public IList<AIContent> OriginalContent { get; set; } to AIContent. Continue to coalesce as you have, but include all the original unmerged content on the OriginalContent property of the AIContent (TRC, etc.) that contains the merged content. This is general enough that any algorithm can do the same, and anything fundamental for roundtripping can be preserved.

ADK does that, too, merging viable updates within the same part

Not exactly; what ADK does is wrap, and return, each streamed response in an LlmResponse that preserves the original content, including the original parts with signatures, etc. And then, when possible, it returns extra synthetic LlmResponses that contain the merged text or thoughts. The important bit being, merging viable content does not lose information and the original parts can all be recovered, which is what we need for Gemini, at least for the signed parts.

It happens that we ship an implementation of such an algorithm in the same assembly, but that's just packaging; anyone could write such a thing. And that doesn't scale. What if someone wants to write a reducer that removes content in order to reduce the size of chat history...
... which is about the what rather than the how for some unrelated algorithm...

I don't agree that what MEAI is shipping just happens to be a specific implementation. If someone writes a custom reducer and I use it, I do so at my own risk and it's my responsibility to understand the roundtripping implications of using that custom reducer for the set of models I'm working with.

But I think it's a fair expectation to put on M.E.AI's own ToChatResponse that it preserves all the information for roundtripping without user intervention, right? (As far as I've seen, ToChatResponse is the only way to obtain ChatMessages from the streaming results, and those messages are part of the input for the next turn.) I did wonder at some point why M.E.AI was in the business of manipulating the data, but you said earlier on that some models actually expect the coalesced data back.

I don't agree either with this being about instructions to some random algorithm. What we are trying to do is allow each provider to specify how the data needs to be preserved for roundtripping, which is a fundamental aspect; whether we call that "describing the data in terms of preservation" or "giving instructions for preservation of the data" is, if I may say so, somewhat irrelevant. And of course, algorithms may decide not to respect those instructions, or they may not need to, but that's on them.

This is not specific to the content, but rather a directive to an external algorithm about how that algorithm should behave. ... And that doesn't scale.

Adding PartId won't scale either. That's just a proxy for passing on the preservation instructions by convention, but they continue to be incomplete instructions (as compared to the domain of possible transformations, etc.). M.E.AI was using ProtectedData in the same way, and that didn't scale just now; that's why we are talking about this. In three months, ProtectedData and PartId won't be enough and there'll need to be a third hint somewhere.

Last night I was thinking about pushing the merging decision down to the providers through callbacks or similar, but that won't scale either (there are plenty more transformations than just merging) and seems messy in terms of API surface.

So, what about AIContent having a public IList<AIContent> OriginalContent { get; set; } so anyone can synthesize new content by transforming original content, but the surface allows for that original content to be preserved? M.E.AI would continue to synthesize as it has, but would include all the unmerged elements on the OriginalContent list of the merged element. This is very similar to what ADK does.
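As a sketch of that proposed shape (Python used here only to model the C# surface; the class and function names are hypothetical): a merged element carries its unmerged originals, so any algorithm can synthesize new content without discarding what it replaced.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AIContent:
    """Minimal stand-in for illustration; original_content models the
    proposed `IList<AIContent> OriginalContent { get; set; }` property."""
    text: str = ""
    original_content: List["AIContent"] = field(default_factory=list)


def merge_preserving_originals(parts: List[AIContent]) -> AIContent:
    """Synthesize one merged element from a run of mergeable parts, keeping
    every unmerged part on original_content so it can be recovered later
    (e.g. to rebuild signed parts when roundtripping to the provider)."""
    merged = AIContent(text="".join(p.text for p in parts))
    merged.original_content = list(parts)
    return merged
```

A consumer that needs full fidelity (signatures, per-part metadata) walks original_content; a consumer that only wants display text reads the merged element directly.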

@amanda-tarafa
Contributor

amanda-tarafa commented Oct 23, 2025

after the bug fix dotnet/extensions#6936

This is in the 9.10.1 release on nuget.

With this in, we can work on a workaround for preserving the signature. Given the following Vertex parts:

text { text = "Hello ", thought = true }
text { text = "World", thought = true }
text { text = "We said hello", thought_signature = <bytes>}
text { text = "Howdy", thought = true, thought_signature = <bytes> }
function_call { function_call = { }, thought_signature = <bytes> }
text { text = "I'm done" }

We yield the following AIContents:

TRC { Text = "Hello " }
TRC { Text = "World" }
AIC { } // Begin AIContent boundary so that the signature part is not merged with anything
TC { Text = "We said hello" } // TextContent only, this was not a thought
TRC { ProtectedData = <bytes> }
AIC { } // End AIContent boundary so that the signature part is not merged with anything
AIC { } // Begin AIContent boundary
TRC { Text = "Howdy", ProtectedData = <bytes> } // TextReasoningContent, this was a thought. 
AIC { } // End AIContent boundary
AIC { } // Begin AIContent boundary
FCC { ... }
TRC { ProtectedData = <bytes> }
AIC { } // End AIContent boundary
TC { Text = "I'm done" }

So that they are coalesced as follows:

TRC { Text = "Hello World" } // Merged!
AIC { } // Boundaries are preserved. Begin AIContent boundary
TC { Text = "We said hello" } 
TRC { ProtectedData = <bytes> }
AIC { } // End AIContent boundary
AIC { } // Begin AIContent boundary
TRC { Text = "Howdy" ProtectedData = <bytes>}
AIC { } // End AIContent boundary
AIC { } // Begin AIContent boundary
FCC { ... }
TRC { ProtectedData = <bytes> }
AIC { } // End AIContent boundary
TC { Text = "I'm done" }

@stephentoub, @drch-, do you think this is right, assuming MEAI's current coalesce strategy?
@drch- Feel free to make the changes on the PR if you have time, else, I'll do that tomorrow.

@drch-
Author

drch- commented Oct 24, 2025

Yes that coalescing is accurate. I tested that specific scenario with the latest M.E.AI and the output was exactly as described.

I'd be happy if you would make the changes, @amanda-tarafa . I won't likely have a chance today.

@stephentoub
Contributor

First, one note: we will almost certainly make this work, with whatever surface M.E.AI ends up having.

Great. I think we're going to need to agree to disagree on the specifics ;-)

@amanda-tarafa
Contributor

I'd be happy if you would make the changes, @amanda-tarafa . I won't likely have a chance today.

I was out for part of this week, but I'll get to this as soon as I have a chance.

@stephentoub
Contributor

I'd be happy if you would make the changes, @amanda-tarafa . I won't likely have a chance today.

I was out for part of this week, but I'll get to this as soon as I have a chance.

Thanks :)

Are we blocking an initial release of the package for this? Or could this just be incremental bug fixes?

@amanda-tarafa
Contributor

Are we blocking an initial release of the package for this? Or could this just be incremental bug fixes?

No, just waiting for me to get some time to patch the current implementation and add a couple of headers to track usage. It won't be later than Wednesday. (@jskeet already fixed project generation, so we are good on that also.)

@stephentoub
Contributor

Cool, sounds good. Please let me know if I can help at all.

(Also, it shouldn't affect anything, but there's a 9.10.2 for M.E.AI.Abstractions if you want to bump to that.)

@verdie-g

@amanda-tarafa any news about that release? :)

@amanda-tarafa
Contributor

No news yet, but I'm on it; please bear with me.

@amanda-tarafa
Contributor

Closing, replacement PR coming shortly.

@stephentoub
Contributor

Closing, replacement PR coming shortly.

@amanda-tarafa, any updates here on Google.Cloud.VertexAI.Extensions getting published? Can I help in some way?

cc: @rogerbarreto

@CharlieDigital

Any news on the followup PR for this?

@amanda-tarafa
Contributor

Please see #13815 (comment) and in general follow #13815 for updates. Thanks.
