Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
64 changes: 28 additions & 36 deletions docs/decisions/0022-chat-history-persistence-consistency.md
Original file line number Diff line number Diff line change
Expand Up @@ -31,8 +31,6 @@ The persistence timing and `FunctionResultContent` trimming behaviors are interr

- **Per-run persistence**: When messages are batched and persisted at the end of the full run, trailing `FunctionResultContent` trimming becomes necessary to match the service's behavior. Without trimming, the stored history contains `FunctionResultContent` that the service would never have stored.

This means the trimming feature (introduced in [PR #4792](https://github.com/microsoft/agent-framework/pull/4792)) is primarily needed as a complement to per-run persistence. The `PersistChatHistoryAtEndOfRun` setting (introduced in [PR #4762](https://github.com/microsoft/agent-framework/pull/4762)) inverts the default so that per-service-call persistence is the standard behavior, and per-run persistence is opt-in.

## Decision Drivers

- **A. Consistency**: The default behavior of `ChatHistoryProvider` should produce stored history that closely matches what the underlying AI service would store, minimizing surprise when switching between framework-managed and service-managed chat history.
Expand All @@ -43,33 +41,30 @@ This means the trimming feature (introduced in [PR #4792](https://github.com/mic

## Considered Options

- Option 1: Default to per-run persistence with `FunctionResultContent` trimming (opt-in to per-service-call)
- Option 2: Default to per-service-call persistence (opt-in to per-run)
- Option 1: Per-run persistence with opt-in FRC (FunctionResultContent) trimming
- Option 2: Opt-in per-service-call persistence (via `SimulateServiceStoredChatHistory`)

## Pros and Cons of the Options

### Option 1: Default to per-run persistence with `FunctionResultContent` trimming

Keep the current default behavior of persisting chat history only at the end of the full agent run. Add `FunctionResultContent` trimming as the default to improve consistency with service storage. Provide an opt-in setting for users who want per-service-call persistence.
### Option 1: Per-run persistence with opt-in FRC trimming

Settings:
- `PersistChatHistoryAtEndOfRun` = `true`
Keep the current default behavior of persisting chat history only at the end of the full agent run. Add `FunctionResultContent` trimming as an opt-in behavior to improve consistency with service storage.

- Good, because runs are atomic — chat history is only updated when the full run succeeds, satisfying driver B.
- Good, because the mental model is simple: one run = one history update, satisfying driver D.
- Good, because trimming trailing `FunctionResultContent` improves consistency with service storage, partially satisfying driver A.
- Good, because users can opt in to per-service-call persistence for checkpointing/recovery scenarios, satisfying drivers C and E.
- Bad, because the default persistence timing still differs from the service's behavior (per-run vs. per-service-call), only partially satisfying driver A.
- Bad, because if the process crashes mid-loop, all intermediate progress from the current run is lost, not satisfying driver C by default.
- Bad, because if the process crashes mid-loop, all intermediate progress from the current run is lost, not satisfying driver C.
- Bad, because this option alone does not provide a way for users to opt into per-service-call persistence, not satisfying driver E.

### Option 2: Default to per-service-call persistence
### Option 2: Opt-in per-service-call persistence (via `SimulateServiceStoredChatHistory`)

Change the default to persist chat history after each individual service call within the FIC loop, matching the AI service's behavior. Trailing `FunctionResultContent` trimming is unnecessary with this approach (it is naturally handled). Provide an opt-in setting for users who want per-run atomicity with trimming.
Introduce an optional SimulateServiceStoredChatHistory setting to persist chat history after each individual service call within the FIC loop, matching the AI service's behavior. Trailing `FunctionResultContent` trimming is unnecessary with this approach (it is naturally handled).

Settings:
- `PersistChatHistoryAtEndOfRun` = `false` (default)
- `SimulateServiceStoredChatHistory` = `true`
Comment thread
westey-m marked this conversation as resolved.

- Good, because the stored history matches the service's behavior by default for both timing and content, fully satisfying driver A.
- Good, because the stored history matches the service's behavior when opting in for both timing and content, fully satisfying driver A.
- Good, because intermediate progress is preserved if the process is interrupted, satisfying driver C.
- Good, because no separate `FunctionResultContent` trimming logic is needed, reducing complexity.
- Bad, because chat history may be left in an incomplete state if the run fails mid-loop (e.g., `FunctionCallContent` stored without corresponding `FunctionResultContent`), not satisfying driver B. A subsequent run cannot proceed without manually providing the missing `FunctionResultContent`.
Expand All @@ -78,39 +73,36 @@ Settings:

## Decision Outcome

Chosen option: **Option 2 — Default to per-service-call persistence**, because it fully satisfies the consistency driver (A), naturally handles `FunctionResultContent` trimming without additional logic, and provides better recoverability for long-running tool-calling loops. Per-run persistence remains available via the `PersistChatHistoryAtEndOfRun` setting for users who prefer atomic run semantics.
Chosen option: **Option 2: Opt-in per-service-call persistence (via `SimulateServiceStoredChatHistory`)**. The existing per-run persistence behavior is retained as-is, requiring no changes from users. Per-service-call persistence is available as an opt-in feature via the `SimulateServiceStoredChatHistory` setting. This satisfies drivers B (atomicity) and D (simplicity) for the common case, while fully satisfying driver A (consistency) for users who opt into simulated service-stored behavior. Users who need per-service-call persistence for recoverability (driver C) can enable it explicitly.

### Configuration Matrix

The behavior depends on the combination of `UseProvidedChatClientAsIs` and `PersistChatHistoryAtEndOfRun`:
The behavior depends on the combination of `UseProvidedChatClientAsIs` and `SimulateServiceStoredChatHistory`:

| `UseProvidedChatClientAsIs` | `PersistChatHistoryAtEndOfRun` | Behavior |
| `UseProvidedChatClientAsIs` | `SimulateServiceStoredChatHistory` | Behavior |
|---|---|---|
| `false` (default) | `false` (default) | **Per-service-call persistence.** A `ChatHistoryPersistingChatClient` middleware is automatically injected into the chat client pipeline between `FunctionInvokingChatClient` and the leaf `IChatClient`. Messages are persisted after each service call. |
| `true` | `false` | **User responsibility.** No middleware is injected because the user has provided a custom chat client stack. The user is responsible for ensuring correct persistence behavior (e.g., by including their own persisting middleware). |
| `false` | `true` | **Per-run persistence with marking.** A `ChatHistoryPersistingChatClient` middleware is injected, but configured to *mark* messages with metadata rather than store them immediately. At the end of the run, marked messages are stored. Trailing `FunctionResultContent` is trimmed. |
| `true` | `true` | **Per-run persistence with warning.** The system checks whether the custom chat client stack includes a `ChatHistoryPersistingChatClient`. If not, a warning is emitted (particularly relevant for workflow handoff scenarios where trimming cannot be guaranteed). If no `ChatHistoryPersistingChatClient` is preset, all messages are stored at the end of the run, otherwise marked messages are stored. |
| `false` (default) | `false` (default) | **Per-run persistence.** Messages are persisted at the end of the full agent run via the `ChatHistoryProvider`. |
| `false` | `true` | **Per-service-call persistence (simulated).** A `ServiceStoredSimulatingChatClient` middleware is automatically injected into the chat client pipeline between `FunctionInvokingChatClient` and the leaf `IChatClient`. Messages are persisted after each service call. A sentinel `ConversationId` causes FIC to treat the conversation as service-managed. |
| `true` | `false` | **Per-run persistence.** No middleware is injected because the user has provided a custom chat client stack. Messages are persisted at the end of the run. |
| `true` | `true` | **User responsibility.** The system checks whether the custom chat client stack includes a `ServiceStoredSimulatingChatClient`. If not, a warning is emitted — the user is expected to have added their own per-service-call persistence mechanism. End-of-run persistence is skipped. |

### Consequences

- Good, because the stored history matches the service's behavior by default for both timing and content, fully satisfying consistency (driver A).
- Good, because intermediate progress is preserved if the process is interrupted, satisfying recoverability (driver C).
- Good, because no separate `FunctionResultContent` trimming logic is needed in the default path, reducing complexity.
- Good, because marking persisted messages with metadata enables deduplication and aids debugging.
- Good, because warnings for custom chat client configurations without the persisting middleware help prevent silent failures in workflow handoff scenarios.
- Bad, because chat history may be left in an incomplete state if the run fails mid-loop (e.g., `FunctionCallContent` stored without corresponding `FunctionResultContent`), requiring manual recovery in rare cases.
- Bad, because the mental model is more complex for the default path: a single run may produce multiple history updates.
- Neutral, because users who prefer atomic run semantics can opt in to per-run persistence via `PersistChatHistoryAtEndOfRun = true`.
- Good, because per-run persistence is atomic by default — chat history is only updated when the full run succeeds, satisfying driver B.
- Good, because the default mental model is simple: one run = one history update, satisfying driver D.
- Good, because users who opt into `SimulateServiceStoredChatHistory` get stored history that matches the service's behavior for both timing and content, fully satisfying driver A.
- Good, because per-service-call persistence preserves intermediate progress if the process is interrupted, satisfying driver C when opted in.
- Good, because no separate `FunctionResultContent` trimming logic is needed when per-service-call persistence is active — it is naturally handled.
- Good, because conflict detection (configurable via `ThrowOnChatHistoryProviderConflict`, `WarnOnChatHistoryProviderConflict`, `ClearOnChatHistoryProviderConflict`) prevents misconfiguration when a service returns a `ConversationId` alongside a configured `ChatHistoryProvider`.
- Bad, because per-service-call persistence (when opted in) may leave chat history in an incomplete state if the run fails mid-loop (e.g., `FunctionCallContent` stored without corresponding `FunctionResultContent`), requiring manual recovery in rare cases.
- Neutral, because users who want per-service-call consistency can opt in via `SimulateServiceStoredChatHistory = true`, satisfying driver E.
- Neutral, because increased write frequency from per-service-call persistence may impact performance for some storage backends; this can be mitigated with a caching decorator.

### Implementation Notes

#### Conversation ID Consistency

The `ChatHistoryPersistingChatClient` middleware must also update the session's `ConversationId` consistently for both response-based and conversation-based service interactions, ensuring the session always reflects the latest service-provided identifier.

## More Information
We should introduce a separate `ConversationIdPersistingChatClient`, middleware which allows us to
persist response `ConversationIds` during the FICC loop. This could be used with or without
`ServiceStoredSimulatingChatClient`.

- [PR #4762: Persist messages during function call loop](https://github.com/microsoft/agent-framework/pull/4762) — introduces `PersistChatHistoryAfterEachServiceCall` option and `ChatHistoryPersistingChatClient` decorator
- [PR #4792: Trim final FRC to match service storage](https://github.com/microsoft/agent-framework/pull/4792) — introduces `StoreFinalFunctionResultContent` option and `FilterFinalFunctionResultContent` logic
- [Issue #2889](https://github.com/microsoft/agent-framework/issues/2889) — original issue tracking chat history persistence during function call loops
Original file line number Diff line number Diff line change
@@ -1,15 +1,16 @@
// Copyright (c) Microsoft. All rights reserved.

// This sample demonstrates how the ChatClientAgent persists chat history after each individual
// call to the AI service.
// call to the AI service, using the SimulateServiceStoredChatHistory option.
// When an agent uses tools, FunctionInvokingChatClient may loop multiple times
// (service call → tool execution → service call), and intermediate messages (tool calls and
// results) are persisted after each service call. This allows you to inspect or recover them
// even if the process is interrupted mid-loop, but may also result in chat history that is not
// yet finalized (e.g., tool calls without results) being persisted, which may be undesirable in some cases.
//
// To opt into end-of-run persistence instead (atomic run semantics), set
// PersistChatHistoryAtEndOfRun = true on ChatClientAgentOptions.
// To use end-of-run persistence instead (atomic run semantics), remove the
// SimulateServiceStoredChatHistory = true setting (or set it to false). End-of-run
// persistence is the default behavior.
//
// The sample runs two multi-turn conversations: one using non-streaming (RunAsync) and one
// using streaming (RunStreamingAsync), to demonstrate correct behavior in both modes.
Expand Down Expand Up @@ -53,7 +54,7 @@ static string GetTime([Description("The city name.")] string city) =>
_ => $"{city}: time data not available."
};

// Create the agent — per-service-call persistence is the default behavior.
// Create the agent — per-service-call persistence is enabled via SimulateServiceStoredChatHistory.
// The in-memory ChatHistoryProvider is used by default when the service does not require service stored chat
// history, so for those cases, we can inspect the chat history via session.TryGetInMemoryChatHistory().
IChatClient chatClient = string.Equals(store, "TRUE", StringComparison.OrdinalIgnoreCase) ?
Expand All @@ -63,6 +64,7 @@ static string GetTime([Description("The city name.")] string city) =>
new ChatClientAgentOptions
{
Name = "WeatherAssistant",
SimulateServiceStoredChatHistory = true,
ChatOptions = new()
{
Instructions = "You are a helpful assistant. When asked about multiple cities, call the appropriate tool for each city.",
Expand Down
Original file line number Diff line number Diff line change
@@ -1,16 +1,19 @@
# In-Function-Loop Checkpointing

This sample demonstrates how `ChatClientAgent` persists chat history after each individual call to the AI service by default. This per-service-call persistence ensures intermediate progress is saved during the function invocation loop.
This sample demonstrates how `ChatClientAgent` can persist chat history after each individual call to the AI service using the `SimulateServiceStoredChatHistory` option. This per-service-call persistence ensures intermediate progress is saved during the function invocation loop.

## What This Sample Shows

When an agent uses tools, the `FunctionInvokingChatClient` loops multiple times (service call → tool execution → service call → …). By default, chat history is persisted after each service call via the `ChatHistoryPersistingChatClient` decorator:
When an agent uses tools, the `FunctionInvokingChatClient` loops multiple times (service call → tool execution → service call → …). By enabling `SimulateServiceStoredChatHistory = true`, chat history is persisted after each service call via the `ServiceStoredSimulatingChatClient` decorator:

- A `ChatHistoryPersistingChatClient` decorator is automatically inserted into the chat client pipeline
- A `ServiceStoredSimulatingChatClient` decorator is inserted into the chat client pipeline
- Before each service call, the decorator loads history from the `ChatHistoryProvider` and prepends it to the request
- After each service call, the decorator notifies the `ChatHistoryProvider` (and any `AIContextProvider` instances) with the new messages
- Only **new** messages are sent to providers on each notification — messages that were already persisted in an earlier call within the same run are deduplicated automatically

To opt into end-of-run persistence instead (atomic run semantics), set `PersistChatHistoryAtEndOfRun = true` on `ChatClientAgentOptions`. In that mode, the decorator marks messages with metadata rather than persisting them immediately, and `ChatClientAgent` persists only the marked messages at the end of the run.
By default (without `SimulateServiceStoredChatHistory`), chat history is persisted at the end of the full agent run instead. To use per-service-call persistence, set `SimulateServiceStoredChatHistory = true` on `ChatClientAgentOptions`.

With `SimulateServiceStoredChatHistory` = true, the behavior matches that of chat history stored in the underlying AI service exactly.

Per-service-call persistence is useful for:
- **Crash recovery** — if the process is interrupted mid-loop, the intermediate tool calls and results are already persisted
Expand All @@ -26,7 +29,7 @@ The sample asks the agent about the weather and time in three cities. The model
```
ChatClientAgent
└─ FunctionInvokingChatClient (handles tool call loop)
└─ ChatHistoryPersistingChatClient (persists after each service call)
└─ ServiceStoredSimulatingChatClient (persists after each service call)
└─ Leaf IChatClient (Azure OpenAI)
```

Expand Down
Loading
Loading