fix: add structured outputs schema logging for Anthropic and Gemini #3454
Merged
Changes from 3 commits (of 11). All commits by nirga:

- `1de9ffa` feat: add structured outputs schema logging for Anthropic and Gemini
- `d6360b2` fix: remove unused imports in sample apps
- `ca5f423` test: add structured outputs tests for Anthropic with VCR
- `6f12631` Merge branch 'main' into feat/structured-outputs-logging
- `13f1e33` chore: bump anthropic SDK to 0.74.1 for structured outputs support
- `0f2d11a` fix: update Event API for OpenTelemetry SDK 1.38.0 compatibility
- `f5ca45f` chore: remove skip decorator from structured outputs tests
- `c91f7c6` fix: update Anthropic SDK minimum version to 0.74.0 for structured ou…
- `8aa83f7` Merge branch 'main' into feat/structured-outputs-logging
- `cd7f14d` revert: undo Event API changes to event_emitter.py and test_messages.py
- `d8c8b35` fix(anthropic): correct structured outputs API format and update tests
`packages/opentelemetry-instrumentation-anthropic/tests/test_structured_outputs.py` (new file, 197 additions, 0 deletions)
```python
import json

import pytest
from opentelemetry.semconv._incubating.attributes import (
    gen_ai_attributes as GenAIAttributes,
)
from opentelemetry.semconv_ai import SpanAttributes

from .utils import verify_metrics

# NOTE: These tests require anthropic SDK >= 0.50.0, which supports structured outputs.
# The feature was announced in November 2025, but the installed SDK version (0.49.0)
# does not yet support the output_format parameter.
# The tests are kept here for when the SDK is updated.


JOKE_SCHEMA = {
    "type": "object",
    "properties": {
        "joke": {
            "type": "string",
            "description": "A joke about OpenTelemetry"
        },
        "rating": {
            "type": "integer",
            "description": "Rating of the joke from 1 to 10"
        }
    },
    "required": ["joke", "rating"],
    "additionalProperties": False
}

# The structured-outputs beta takes {"type": "json_schema", "schema": ...},
# the same shape used by the Anthropic sample app.
OUTPUT_FORMAT = {
    "type": "json_schema",
    "schema": JOKE_SCHEMA
}


@pytest.mark.skip(reason="Requires anthropic SDK >= 0.50.0 with structured outputs support")
@pytest.mark.vcr
def test_anthropic_structured_outputs_legacy(
    instrument_legacy, anthropic_client, span_exporter, log_exporter, reader
):
    response = anthropic_client.beta.messages.create(
        model="claude-sonnet-4-5-20250929",
        max_tokens=1024,
        betas=["structured-outputs-2025-11-13"],
        messages=[
            {
                "role": "user",
                "content": "Tell me a joke about OpenTelemetry and rate it from 1 to 10"
            }
        ],
        output_format=OUTPUT_FORMAT
    )

    spans = span_exporter.get_finished_spans()
    assert len(spans) == 1
    assert spans[0].name == "anthropic.chat"

    anthropic_span = spans[0]
    assert (
        anthropic_span.attributes[f"{GenAIAttributes.GEN_AI_PROMPT}.0.content"]
        == "Tell me a joke about OpenTelemetry and rate it from 1 to 10"
    )
    assert anthropic_span.attributes[f"{GenAIAttributes.GEN_AI_PROMPT}.0.role"] == "user"
    assert (
        anthropic_span.attributes.get(f"{GenAIAttributes.GEN_AI_COMPLETION}.0.content")
        == response.content[0].text
    )
    assert (
        anthropic_span.attributes.get(f"{GenAIAttributes.GEN_AI_COMPLETION}.0.role")
        == "assistant"
    )

    assert SpanAttributes.LLM_REQUEST_STRUCTURED_OUTPUT_SCHEMA in anthropic_span.attributes
    schema_attr = json.loads(
        anthropic_span.attributes[SpanAttributes.LLM_REQUEST_STRUCTURED_OUTPUT_SCHEMA]
    )
    assert "properties" in schema_attr
    assert "joke" in schema_attr["properties"]
    assert "rating" in schema_attr["properties"]

    assert anthropic_span.attributes.get(GenAIAttributes.GEN_AI_REQUEST_MODEL) == "claude-sonnet-4-5-20250929"
    assert anthropic_span.attributes.get(GenAIAttributes.GEN_AI_RESPONSE_MODEL) == "claude-sonnet-4-5-20250929"

    response_json = json.loads(response.content[0].text)
    assert "joke" in response_json
    assert "rating" in response_json

    metrics_data = reader.get_metrics_data()
    resource_metrics = metrics_data.resource_metrics
    verify_metrics(resource_metrics, "claude-sonnet-4-5-20250929")

    logs = log_exporter.get_finished_logs()
    assert len(logs) == 0, (
        "Assert that it doesn't emit logs when use_legacy_attributes is True"
    )


@pytest.mark.skip(reason="Requires anthropic SDK >= 0.50.0 with structured outputs support")
@pytest.mark.vcr
def test_anthropic_structured_outputs_with_events_with_content(
    instrument_with_content, anthropic_client, span_exporter, log_exporter, reader
):
    response = anthropic_client.beta.messages.create(
        model="claude-sonnet-4-5-20250929",
        max_tokens=1024,
        betas=["structured-outputs-2025-11-13"],
        messages=[
            {
                "role": "user",
                "content": "Tell me a joke about OpenTelemetry and rate it from 1 to 10"
            }
        ],
        output_format=OUTPUT_FORMAT
    )

    spans = span_exporter.get_finished_spans()
    assert len(spans) == 1
    assert spans[0].name == "anthropic.chat"

    anthropic_span = spans[0]

    assert SpanAttributes.LLM_REQUEST_STRUCTURED_OUTPUT_SCHEMA in anthropic_span.attributes
    schema_attr = json.loads(
        anthropic_span.attributes[SpanAttributes.LLM_REQUEST_STRUCTURED_OUTPUT_SCHEMA]
    )
    assert "properties" in schema_attr
    assert "joke" in schema_attr["properties"]
    assert "rating" in schema_attr["properties"]

    assert anthropic_span.attributes.get(GenAIAttributes.GEN_AI_REQUEST_MODEL) == "claude-sonnet-4-5-20250929"
    assert anthropic_span.attributes.get(GenAIAttributes.GEN_AI_RESPONSE_MODEL) == "claude-sonnet-4-5-20250929"

    response_json = json.loads(response.content[0].text)
    assert "joke" in response_json
    assert "rating" in response_json

    metrics_data = reader.get_metrics_data()
    resource_metrics = metrics_data.resource_metrics
    verify_metrics(resource_metrics, "claude-sonnet-4-5-20250929")

    logs = log_exporter.get_finished_logs()
    assert len(logs) == 2


@pytest.mark.skip(reason="Requires anthropic SDK >= 0.50.0 with structured outputs support")
@pytest.mark.vcr
def test_anthropic_structured_outputs_with_events_with_no_content(
    instrument_with_no_content, anthropic_client, span_exporter, log_exporter, reader
):
    response = anthropic_client.beta.messages.create(
        model="claude-sonnet-4-5-20250929",
        max_tokens=1024,
        betas=["structured-outputs-2025-11-13"],
        messages=[
            {
                "role": "user",
                "content": "Tell me a joke about OpenTelemetry and rate it from 1 to 10"
            }
        ],
        output_format=OUTPUT_FORMAT
    )

    spans = span_exporter.get_finished_spans()
    assert len(spans) == 1
    assert spans[0].name == "anthropic.chat"

    anthropic_span = spans[0]

    assert SpanAttributes.LLM_REQUEST_STRUCTURED_OUTPUT_SCHEMA in anthropic_span.attributes
    schema_attr = json.loads(
        anthropic_span.attributes[SpanAttributes.LLM_REQUEST_STRUCTURED_OUTPUT_SCHEMA]
    )
    assert "properties" in schema_attr
    assert "joke" in schema_attr["properties"]
    assert "rating" in schema_attr["properties"]

    assert anthropic_span.attributes.get(GenAIAttributes.GEN_AI_REQUEST_MODEL) == "claude-sonnet-4-5-20250929"
    assert anthropic_span.attributes.get(GenAIAttributes.GEN_AI_RESPONSE_MODEL) == "claude-sonnet-4-5-20250929"

    response_json = json.loads(response.content[0].text)
    assert "joke" in response_json
    assert "rating" in response_json

    metrics_data = reader.get_metrics_data()
    resource_metrics = metrics_data.resource_metrics
    verify_metrics(resource_metrics, "claude-sonnet-4-5-20250929")

    logs = log_exporter.get_finished_logs()
    assert len(logs) == 2
```
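The tests above check the span attribute but not how it is produced, and the instrumentation code is not part of this diff. The following is a minimal, hypothetical sketch of the extraction step: read the schema from the `output_format` keyword argument, serialize it with `json.dumps` (span attributes must be primitive values), and return it under the name the sample apps print, `gen_ai.request.structured_output_schema`. That this string equals `SpanAttributes.LLM_REQUEST_STRUCTURED_OUTPUT_SCHEMA` is an assumption, and `anthropic_schema_attributes` is an illustrative name, not the package's API.

```python
import json
from typing import Any, Dict, Mapping

# Assumed value of SpanAttributes.LLM_REQUEST_STRUCTURED_OUTPUT_SCHEMA; the sample
# apps print this name, but check opentelemetry-semconv-ai for the real constant.
STRUCTURED_OUTPUT_SCHEMA_ATTR = "gen_ai.request.structured_output_schema"


def anthropic_schema_attributes(kwargs: Mapping[str, Any]) -> Dict[str, str]:
    """Hypothetical helper: map beta.messages.create kwargs to span attributes."""
    output_format = kwargs.get("output_format")
    if not isinstance(output_format, Mapping):
        return {}

    # Beta shape used by the sample app: {"type": "json_schema", "schema": {...}}
    schema = output_format.get("schema")

    # Assumption: also tolerate an OpenAI-style wrapper
    # {"json_schema": {"name": ..., "schema": {...}}} in case a caller passes one.
    if schema is None:
        wrapper = output_format.get("json_schema")
        if isinstance(wrapper, Mapping):
            schema = wrapper.get("schema")

    if schema is None:
        return {}
    # Store the schema as a JSON string; the tests decode it with json.loads.
    return {STRUCTURED_OUTPUT_SCHEMA_ATTR: json.dumps(schema)}
```

An instrumentation wrapper could then call `span.set_attribute(key, value)` for each returned pair. Keeping the helper free of OpenTelemetry imports makes it easy to unit-test on its own.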
`packages/sample-app/sample_app/anthropic_structured_outputs_demo.py` (new file, 55 additions, 0 deletions)
```python
from anthropic import Anthropic
from traceloop.sdk import Traceloop
from dotenv import load_dotenv

load_dotenv()

client = Anthropic()

Traceloop.init(
    app_name="anthropic_structured_outputs_demo",
)


def main():
    print("Making request with structured outputs...")

    joke_schema = {
        "type": "object",
        "properties": {
            "joke": {
                "type": "string",
                "description": "A joke about OpenTelemetry"
            },
            "rating": {
                "type": "integer",
                "description": "Rating of the joke from 1 to 10"
            }
        },
        "required": ["joke", "rating"],
        "additionalProperties": False
    }

    response = client.beta.messages.create(
        model="claude-sonnet-4-5-20250929",
        max_tokens=1024,
        betas=["structured-outputs-2025-11-13"],
        messages=[
            {
                "role": "user",
                "content": "Tell me a joke about OpenTelemetry and rate it from 1 to 10"
            }
        ],
        output_format={
            "type": "json_schema",
            "schema": joke_schema
        }
    )

    print("\n=== Response ===")
    print(response.content[0].text)
    print("\n=== The 'gen_ai.request.structured_output_schema' attribute should be logged ===")


if __name__ == "__main__":
    main()
```
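The demo only prints the raw text block. Because the request constrains the output to `joke_schema`, that text should be a JSON object with both required keys, which is what the tests check with `json.loads`. A small sketch of turning it into a typed value, assuming (as the demo and tests do) that the first content block holds the JSON; the `Joke` dataclass and `parse_joke` function are illustrative, not part of the PR.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Joke:
    joke: str
    rating: int


def parse_joke(text: str) -> Joke:
    """Parse the JSON text returned for joke_schema into a Joke.

    Every failure raises ValueError (json.JSONDecodeError is a subclass).
    The field checks are defensive; the schema constraint should already
    make them pass.
    """
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = {"joke", "rating"} - set(data)
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")
    if not isinstance(data["joke"], str):
        raise ValueError("'joke' must be a string")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(data["rating"], int) or isinstance(data["rating"], bool):
        raise ValueError("'rating' must be an integer")
    return Joke(joke=data["joke"], rating=data["rating"])
```

In `main()`, `print(parse_joke(response.content[0].text))` would replace the raw print.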
`packages/sample-app/sample_app/gemini_structured_outputs_demo.py` (new file, 49 additions, 0 deletions)
```python
import os
import google.generativeai as genai
from traceloop.sdk import Traceloop
from dotenv import load_dotenv

load_dotenv()

genai.configure(api_key=os.environ.get("GOOGLE_API_KEY"))

Traceloop.init(
    app_name="gemini_structured_outputs_demo",
)


def main():
    print("Making request with structured outputs...")

    response_schema = {
        "type": "object",
        "properties": {
            "joke": {
                "type": "string",
                "description": "A joke about OpenTelemetry"
            },
            "rating": {
                "type": "integer",
                "description": "Rating of the joke from 1 to 10"
            }
        },
        "required": ["joke", "rating"]
    }

    model = genai.GenerativeModel("gemini-1.5-flash")

    result = model.generate_content(
        "Tell me a joke about OpenTelemetry and rate it",
        generation_config=genai.GenerationConfig(
            response_mime_type="application/json",
            response_schema=response_schema
        )
    )

    print("\n=== Response ===")
    print(result.text)
    print("\n=== The 'gen_ai.request.structured_output_schema' attribute should be logged ===")


if __name__ == "__main__":
    main()
```
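On the Gemini side the schema travels inside `generation_config` rather than as a top-level argument. Another hypothetical sketch, assuming `generation_config` arrives either as a plain dict or as an object exposing a `response_schema` attribute (such as the `genai.GenerationConfig` built in the demo). It only serializes dict schemas; other schema forms the SDK may accept are out of scope here, and the attribute name is again assumed from what the demo prints.

```python
import json
from typing import Any, Dict, Mapping, Optional

# Assumed attribute name, matching what the demo script prints.
STRUCTURED_OUTPUT_SCHEMA_ATTR = "gen_ai.request.structured_output_schema"


def _response_schema(generation_config: Any) -> Optional[Any]:
    """Read response_schema from a dict or an attribute-style config object."""
    if generation_config is None:
        return None
    if isinstance(generation_config, Mapping):
        return generation_config.get("response_schema")
    return getattr(generation_config, "response_schema", None)


def gemini_schema_attributes(kwargs: Mapping[str, Any]) -> Dict[str, str]:
    """Hypothetical helper: map generate_content kwargs to span attributes."""
    schema = _response_schema(kwargs.get("generation_config"))
    # Only plain dict schemas are serialized in this sketch.
    if not isinstance(schema, Mapping):
        return {}
    return {STRUCTURED_OUTPUT_SCHEMA_ATTR: json.dumps(dict(schema))}
```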
`packages/sample-app/sample_app/openai_structured_outputs_demo.py` (new file, 38 additions, 0 deletions)
```python
import os
from openai import OpenAI
from pydantic import BaseModel
from traceloop.sdk import Traceloop
from dotenv import load_dotenv

load_dotenv()

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

Traceloop.init(
    app_name="structured_outputs_demo",
)


class Joke(BaseModel):
    joke: str
    rating: int


def main():
    print("Making request with structured outputs...")
    response = client.beta.chat.completions.parse(
        model="gpt-4o-2024-08-06",
        messages=[{"role": "user", "content": "Tell me a joke about OpenTelemetry"}],
        response_format=Joke,
    )

    print("\n=== Response ===")
    print(f"Joke: {response.choices[0].message.parsed.joke}")
    print(f"Rating: {response.choices[0].message.parsed.rating}")
    print(
        "\n=== Check the span output above for 'gen_ai.request.structured_output_schema' attribute ==="
    )


if __name__ == "__main__":
    main()
```
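The OpenAI demo passes a Pydantic class as `response_format`, so there is no schema dict in the call. A hedged sketch of how an instrumentation could still recover one: if the value exposes Pydantic v2's `model_json_schema()` classmethod, call it; if it is a raw dict of the form `{"type": "json_schema", "json_schema": {"name": ..., "schema": ...}}`, read the nested schema. The helper names are hypothetical, and the code duck-types rather than importing Pydantic so it runs without that dependency.

```python
import json
from typing import Any, Dict, Mapping, Optional

STRUCTURED_OUTPUT_SCHEMA_ATTR = "gen_ai.request.structured_output_schema"  # assumed name


def _openai_schema(response_format: Any) -> Optional[Any]:
    # Pydantic v2 models (like Joke in the demo) expose model_json_schema().
    to_schema = getattr(response_format, "model_json_schema", None)
    if callable(to_schema):
        return to_schema()
    # Raw JSON-schema response_format dicts nest the schema under "json_schema".
    if isinstance(response_format, Mapping) and response_format.get("type") == "json_schema":
        wrapper = response_format.get("json_schema")
        if isinstance(wrapper, Mapping):
            return wrapper.get("schema")
    # {"type": "json_object"}, {"type": "text"}, or no response_format at all.
    return None


def openai_schema_attributes(kwargs: Mapping[str, Any]) -> Dict[str, str]:
    """Hypothetical helper: map chat.completions kwargs to span attributes."""
    schema = _openai_schema(kwargs.get("response_format"))
    if schema is None:
        return {}
    return {STRUCTURED_OUTPUT_SCHEMA_ATTR: json.dumps(schema)}
```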
Apply skip decorator consistently across all structured output tests.

Only the first test has the skip decorator for SDK version >= 0.50.0, but all three tests use the same `beta.messages.create` API with `output_format` and `betas=["structured-outputs-2025-11-13"]`. If the SDK version requirement applies to the first test, it should apply to all three tests that exercise the same structured outputs feature.

Apply this diff to add the skip decorator to the remaining tests:

```diff
+@pytest.mark.skip(reason="Requires anthropic SDK >= 0.50.0 with structured outputs support")
 @pytest.mark.vcr
 def test_anthropic_structured_outputs_with_events_with_content(
     instrument_with_content, anthropic_client, span_exporter, log_exporter, reader
 ):
```

```diff
+@pytest.mark.skip(reason="Requires anthropic SDK >= 0.50.0 with structured outputs support")
 @pytest.mark.vcr
 def test_anthropic_structured_outputs_with_events_with_no_content(
     instrument_with_no_content, anthropic_client, span_exporter, log_exporter, reader
 ):
```

Also applies to: 100-100, 146-146
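Rather than an unconditional `@pytest.mark.skip`, the tests could be gated on the installed SDK version with `pytest.mark.skipif`, so they start running automatically once the dependency is bumped. A minimal stdlib-only sketch; `parse_release`, `release_at_least`, and `package_at_least` are illustrative names. The release parser ignores pre-release and local suffixes instead of implementing full PEP 440 ordering, so `packaging.version.Version` is the more robust choice when that package is available.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple


def parse_release(ver: str) -> Tuple[int, ...]:
    """Return the leading numeric release segment, e.g. "0.74.1" -> (0, 74, 1).

    Stops at the first non-numeric suffix, so "1.2.3rc1" -> (1, 2, 3). This is a
    simplification of PEP 440, adequate for a minimum-version check.
    """
    parts = []
    for piece in ver.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
        if match.end() != len(piece):  # e.g. "3rc1": keep 3, drop the suffix
            break
    return tuple(parts)


def release_at_least(release: Tuple[int, ...], minimum: Tuple[int, ...]) -> bool:
    """Compare release tuples after zero-padding, so (0, 74) == (0, 74, 0)."""
    width = max(len(release), len(minimum))
    padded = release + (0,) * (width - len(release))
    floor = minimum + (0,) * (width - len(minimum))
    return padded >= floor


def package_at_least(package: str, minimum: Tuple[int, ...]) -> bool:
    """True if `package` is installed at `minimum` or newer; False if absent."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        return False
    return release_at_least(parse_release(installed), minimum)
```

Each test could then carry `@pytest.mark.skipif(not package_at_least("anthropic", (0, 74, 0)), reason="Requires anthropic SDK with structured outputs support")`, with the threshold set to whichever minimum version the PR settles on (the commit list mentions 0.74.0).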