Python: Skip get_final_response in OTel _finalize_stream when stream errored#5232
Merged
moonbox3 merged 7 commits into microsoft:main on Apr 14, 2026
Conversation
…errored

When a streaming error occurs, `_finalize_stream` (a cleanup hook registered by `AgentTelemetryLayer`) was unconditionally calling `get_final_response()`, which triggers all registered result hooks, including after_run context providers. This caused providers to fire incorrectly on error paths. Guard against this by checking `result_stream._consumed`: True only after `StopAsyncIteration` (normal completion), False when an exception was raised. The fix applies to both the chat client and agent telemetry layers.

Closes microsoft#5231
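The guard described in the commit message can be sketched as follows. This is a simplified, hypothetical stand-in for the SDK's stream type, not the real implementation; only the `_consumed` flag and cleanup-hook behavior follow the description above.

```python
import asyncio

class ResponseStream:
    """Simplified stand-in for the SDK's stream type (internals are hypothetical)."""

    def __init__(self, chunks, fail_after=None):
        self._chunks = iter(chunks)
        self._fail_after = fail_after  # raise after this many chunks, if set
        self._emitted = 0
        self._consumed = False  # True only after a normal StopAsyncIteration
        self._cleanup_hooks = []

    def on_cleanup(self, hook):
        self._cleanup_hooks.append(hook)

    def __aiter__(self):
        return self

    async def __anext__(self):
        if self._fail_after is not None and self._emitted >= self._fail_after:
            await self._run_cleanup_hooks()  # cleanup hooks still run on the error path
            raise RuntimeError("stream error")
        try:
            chunk = next(self._chunks)
        except StopIteration:
            self._consumed = True  # normal completion
            await self._run_cleanup_hooks()
            raise StopAsyncIteration
        self._emitted += 1
        return chunk

    async def _run_cleanup_hooks(self):
        for hook in self._cleanup_hooks:
            await hook()

def make_finalize_stream(result_stream, calls):
    async def _finalize_stream():
        if not result_stream._consumed:
            return  # stream errored: skip get_final_response() and its result hooks
        calls.append("get_final_response")
    return _finalize_stream

async def demo():
    calls = []
    ok = ResponseStream(["a", "b"])
    ok.on_cleanup(make_finalize_stream(ok, calls))
    async for _ in ok:
        pass

    bad = ResponseStream(["a", "b"], fail_after=1)
    bad.on_cleanup(make_finalize_stream(bad, calls))
    try:
        async for _ in bad:
            pass
    except RuntimeError:
        pass
    return calls

print(asyncio.run(demo()))  # only the normally-completed stream triggers get_final_response
```

With the guard in place, the errored stream's cleanup hook returns early, so nothing downstream of `get_final_response()` fires.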
ec7ad25 to b345761
Contributor
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Fixes OTel streaming instrumentation so result hooks (e.g., after_run context providers) don’t fire when a stream errors, by skipping get_final_response() unless the stream completed normally.
Changes:
- Add a `_consumed` guard in the chat stream `_finalize_stream` to avoid calling `get_final_response()` on error paths.
- Add the same guard in the agent stream `_finalize_stream`.
…ror in OTel span

Address Copilot review feedback on microsoft#5232:
- Add `_stream_error: Exception | None` to `ResponseStream`, set in `__anext__`'s except branch so cleanup hooks can inspect the failure.
- Expose public `consumed` and `stream_error` properties to avoid coupling `observability.py` to private stream internals.
- Update both `_finalize_stream` closures (chat and agent layers) to use the public properties and call `capture_exception()` with the stream error before returning early, ensuring the OTel span records the failure rather than closing silently.
- Use `stream_error is not None` as the guard in `_finalize_stream` instead of `not consumed`, so the early-return path is keyed precisely to actual errors rather than any non-normal completion state.
- Clear `_stream_error` after `_run_cleanup_hooks()` completes to avoid retaining the exception traceback (and any large object graphs it references) on the stream instance beyond the cleanup phase.
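The error-recording behavior those commits describe can be sketched with a simplified, hypothetical `ResponseStream` (the real class has much more machinery; only the set-in-except, clear-after-cleanup flow follows the commit text):

```python
import asyncio

class ResponseStream:
    """Hypothetical sketch: record a streaming error for cleanup hooks, then drop it."""

    def __init__(self, chunks):
        self._chunks = iter(chunks)
        self._consumed = False
        self._stream_error = None  # set in __anext__'s except branch, cleared after cleanup
        self._cleanup_hooks = []

    def __aiter__(self):
        return self

    async def __anext__(self):
        try:
            return next(self._chunks)
        except StopIteration:
            self._consumed = True
            await self._run_cleanup_hooks()
            raise StopAsyncIteration
        except Exception as exc:
            self._stream_error = exc  # let cleanup hooks inspect the failure
            await self._run_cleanup_hooks()
            self._stream_error = None  # don't retain the traceback past the cleanup phase
            raise

    async def _run_cleanup_hooks(self):
        for hook in self._cleanup_hooks:
            await hook()

async def demo():
    def broken():
        yield "a"
        raise ValueError("boom")

    stream = ResponseStream(broken())
    seen = []

    async def hook():
        # During cleanup the error is visible; afterwards it is cleared.
        seen.append(type(stream._stream_error).__name__)

    stream._cleanup_hooks.append(hook)
    try:
        async for _ in stream:
            pass
    except ValueError:
        pass
    return seen, stream._stream_error

print(asyncio.run(demo()))
```

The hook observes the `ValueError` while cleanup runs, but `_stream_error` is `None` again once iteration has re-raised.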
Contributor
Author
@moonbox3 @eavanvalkenburg @TaoChenOSU - Would appreciate a look when you get a chance.
Member
eavanvalkenburg
left a comment
small comment, otherwise looks good and thanks
…rectly

Per review feedback: since `observability.py` and `_types.py` are in the same package, accessing `_stream_error` directly is fine and the public properties are unnecessary.
eavanvalkenburg
approved these changes
Apr 14, 2026
Contributor
Python Test Coverage Report •
Python Unit Test Overview
Keep `_stream_error` private (consistent with the rest of `ResponseStream`), and suppress `reportPrivateUsage` at the call sites in `observability.py` with inline `pyright: ignore` comments; the access is intentional within the package.
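The call-site pattern that comment describes might look like the sketch below. The class and helper are simplified, hypothetical stand-ins; `# pyright: ignore[reportPrivateUsage]` is pyright's standard inline suppression syntax.

```python
class ResponseStream:
    """Minimal stand-in; _stream_error stays private like the rest of the class."""

    def __init__(self, error=None):
        self._stream_error = error

def finalize(result_stream, span):
    # Same-package access to the private attribute is intentional here,
    # so the type-checker warning is suppressed inline rather than by
    # widening the class's public surface.
    error = result_stream._stream_error  # pyright: ignore[reportPrivateUsage]
    if error is not None:
        span["error"] = repr(error)  # stand-in for capture_exception()
        return "skipped"
    return "finalized"

span = {}
print(finalize(ResponseStream(RuntimeError("boom")), span))  # skipped
print(finalize(ResponseStream(), span))  # finalized
```

This keeps the attribute's privacy convention intact while documenting, at each call site, that the access is deliberate.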
moonbox3
approved these changes
Apr 14, 2026
Contributor
Author
@moonbox3 - seems like a rate limit error in MS Learn MCP.
Closes #5231
Problem
`AgentTelemetryLayer` registers `_finalize_stream` as a cleanup hook on the `ResponseStream`. Cleanup hooks run on both normal completion and streaming errors. Inside `_finalize_stream`, `get_final_response()` was called unconditionally, and `get_final_response()` always runs all registered result hooks, including `_post_hook` in `_agents.py`, which calls `_run_after_providers`. This caused after_run context providers to fire on error paths, which is incorrect.
The call chain on streaming error:

`__anext__` raises → stream cleanup hooks run → `_finalize_stream` → `get_final_response()` → result hooks → `_post_hook` (`_agents.py`) → `_run_after_providers` → after_run providers fire on an error path.
Fix
`_types.py` (`ResponseStream`):
- Store the exception in `_stream_error` when `__anext__` catches one, then clear it after `_run_cleanup_hooks()` completes to avoid retaining the traceback and any large object graphs it references.
- Expose `consumed: bool` (True only after a normal `StopAsyncIteration`) and `stream_error: Exception | None`.

`observability.py` (both `_finalize_stream` closures, chat and agent layers):
- Guard on `result_stream.stream_error is not None`; this is precise to actual errors rather than any non-normal completion state, so edge cases where `consumed` is False but no error is set still proceed to `get_final_response()`.
- Call `capture_exception(span, result_stream.stream_error, ...)` before returning so the OTel span records the failure rather than closing silently.

Notes
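The resulting guard in the two `_finalize_stream` closures can be sketched as follows. The span object, `capture_exception` helper, and `FakeStream` here are simplified stand-ins, not the real APIs; only the guard condition and early-return flow follow the fix description.

```python
import asyncio

def capture_exception(span, error):
    # Stand-in for the real capture_exception helper: record the failure on the span.
    span["error"] = repr(error)

class FakeStream:
    """Stand-in exposing only what the guard reads."""

    def __init__(self, stream_error=None):
        self.stream_error = stream_error
        self.consumed = stream_error is None

def make_finalize_stream(result_stream, span, events):
    async def _finalize_stream():
        # Guard on the actual error, not on `not consumed`: an unconsumed
        # stream with no recorded error still proceeds to get_final_response().
        if result_stream.stream_error is not None:
            capture_exception(span, result_stream.stream_error)
            return  # skip get_final_response() so result hooks don't fire
        events.append("get_final_response")
    return _finalize_stream

async def demo():
    events, ok_span, bad_span = [], {}, {}
    await make_finalize_stream(FakeStream(), ok_span, events)()
    await make_finalize_stream(FakeStream(RuntimeError("boom")), bad_span, events)()
    return events, bad_span

print(asyncio.run(demo()))
```

On the error path the span records the failure instead of closing silently, and the result hooks never run.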
Affects both `_trace_chat_client_invocation` (chat layer) and `_trace_agent_invocation` (agent layer); both had the same pattern and are fixed here.