-
Notifications
You must be signed in to change notification settings - Fork 31
Observability: OTel semantic conventions for agentic systems #33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
5aececc
0ddaf61
fa884b8
ef98f98
a583c4a
acc7621
5d61225
41c28bf
80107f6
c7505fa
bfc70f7
db8bda8
c5fd604
d60d9d3
d8d7572
cc31050
dc70015
0d13644
34366d4
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,300 @@ | ||
| Date: 19th December 2025<br/> | ||
| Authors: david-martin, evaline-ju<br/> | ||
| Status: Provisional<br/> | ||
|
|
||
| # Observability in Agentic Networking | ||
|
|
||
| This proposal defines tracing schemas for agentic systems, specifically the structure of traces emitted at runtime. It focuses on proxy-like workloads in Kubernetes environments, including sidecars and gateways. APIs for configuring observability (e.g., Kubernetes CRDs) will be addressed in subsequent proposals. | ||
|
|
||
| ## OpenTelemetry Semantic Conventions | ||
|
|
||
| Follow these OpenTelemetry semantic conventions: | ||
|
|
||
| - [GenAI Agent spans](https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-agent-spans/) for agent runtime spans | ||
| - [GenAI LLM spans](https://opentelemetry.io/docs/specs/semconv/gen-ai/llm-spans/) for LLM provider spans | ||
| - [Model Context Protocol (MCP)](https://opentelemetry.io/docs/specs/semconv/gen-ai/mcp/) for tool/MCP server spans | ||
| - [Security rule attributes](https://opentelemetry.io/docs/specs/semconv/registry/attributes/security-rule/) for policy enforcement and guardrails | ||
| - [Error attributes](https://opentelemetry.io/docs/specs/semconv/registry/attributes/error/) for error handling | ||
|
|
||
| ### Alternate Conventions and Interoperability | ||
|
|
||
| The agentic observability ecosystem includes earlier conventions like [OpenInference](https://github.com/Arize-ai/openinference) (used by [Phoenix](https://phoenix.arize.com/) from Arize) that predate the official OpenTelemetry GenAI conventions, as well as OpenTelemetry-compatible extensions like [OpenLLMetry](https://github.com/traceloop/openllmetry) (used by [Traceloop](https://www.traceloop.com/)/[Langfuse](https://langfuse.com/)). | ||
|
|
||
| For alternate conventions, use [span processors](https://opentelemetry.io/docs/specs/otel/trace/sdk/#span-processor) or the OpenTelemetry Collector's [transform processor](https://opentelemetry.io/docs/collector/configuration/#processors) with [OTTL](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/transformprocessor) to remap attributes to the GenAI semantic conventions. For non-OpenTelemetry sources, a custom SDK-level bridge is needed to convert to OpenTelemetry spans first. | ||
|
|
||
| ## Context Propagation | ||
|
|
||
| Use [W3C Trace Context](https://www.w3.org/TR/trace-context/) for distributed tracing and [W3C Baggage](https://www.w3.org/TR/baggage/) to propagate: | ||
|
|
||
| - `user.id`: Identifier for the user who delegated authority (obfuscated if needed) | ||
| - `agent.id`: Identifier for the agent executing operations | ||
|
|
||
| W3C Baggage is used instead of `tracestate` because these are application-level identifiers needed for runtime decisions (authorization, rate limiting), not tracing vendor metadata. | ||
|
|
||
| ## Proposed Extensions | ||
|
|
||
| Where OpenTelemetry does not define attributes for permission enforcement outcomes, we propose: | ||
|
|
||
| | Attribute | Values | Description | | ||
| |-----------|--------|-------------| | ||
| | `event.action` | `allow`, `deny` | Action taken due to the policy check | | ||
| | `event.outcome` | `success`, `failure`, `unknown` | Outcome of the check itself | | ||
|
|
||
| These can be considered for inclusion in an existing or new OpenTelemetry semantic convention registry. | ||
|
|
||
| ### Per-Rule Evaluation Spans | ||
|
|
||
| For authorization and guardrail checks, emit an `mcp.authorization.rule` or `mcp.guardrail.rule` child span for each rule evaluated within the parent span. Each child span records: | ||
| - `security_rule.name` — the rule identifier | ||
| - `security_rule.match` — whether the rule matched (`true`/`false`) | ||
| - `event.action` — the action the rule would take (`allow`/`deny`) | ||
|
|
||
| Evaluation stops at the determining rule. For authorization, if no allow rule matches, emit an explicit `default-deny` rule span (`security_rule.name: default-deny`, `security_rule.match: true`, `event.action: deny`) to make the implicit default-deny behavior visible in traces. | ||
|
david-martin marked this conversation as resolved.
|
||
|
|
||
| ## Retries | ||
|
|
||
| Agentic retries often involve changed parameters (different tool arguments, altered prompts, or alternate tools). Use a common trace ID to link retry attempts. Reference prompts by hash to avoid full logging. | ||
|
|
||
| ## Examples | ||
|
|
||
| These examples illustrate how the conventions apply. Span attributes shown are not comprehensive; see the linked OpenTelemetry specifications for complete attribute definitions. | ||
|
|
||
| ### Access policy enforcement | ||
|
|
||
| This shows a trace example of a permission rule checked prior to tool access through a gateway. | ||
|
|
||
| ``` | ||
| trace_id: f5a9d214e6b8c7a9d1e2f3a4b5c6d7e8 | ||
|
|
||
| Claude AI Agent | ||
| │ | ||
| │ LLM call | ||
| │ | ||
| ▼ | ||
| Span: gen_ai.agent.chat [span_id: 3d4e5f6a] | ||
| ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 342ms ━━━━━━━━━━━━ | ||
| ├─ trace_id: f5a9d214e6b8c7a9d1e2f3a4b5c6d7e8 | ||
| ├─ span.kind: INTERNAL | ||
| │ | ||
| ├─ gen_ai.agent.id: claude-agent-prod-001 | ||
| ├─ gen_ai.agent.name: claude-sonnet-4.5 | ||
| ├─ gen_ai.operation.name: chat | ||
| ├─ gen_ai.system: anthropic | ||
| ├─ gen_ai.request.model: claude-sonnet-4-5-20250929 | ||
| │ | ||
| ├─ gen_ai.usage.input_tokens: 1247 | ||
| ├─ gen_ai.usage.output_tokens: 89 | ||
| │ | ||
| └─ status: OK | ||
| │ | ||
| │ | ||
| └─► Span: gen_ai.tool.call [span_id: 4e5f6a7b] | ||
| ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12ms ━━━━━━━━━━━━━ | ||
| ├─ trace_id: f5a9d214e6b8c7a9d1e2f3a4b5c6d7e8 ← Same trace_id | ||
| ├─ span.kind: INTERNAL | ||
| ├─ parent_span_id: 3d4e5f6a | ||
| │ | ||
| ├─ gen_ai.tool.name: delete_customer_data | ||
| └─ status: OK | ||
| │ | ||
| │ POST /mcp | ||
| │ Headers: | ||
| │ traceparent: 00-f5a9d214e6b8c7a9d1e2f3a4b5c6d7e8-4e5f6a7b-01 | ||
| │ Authorization: Bearer eyJhbGc... | ||
| │ ... | ||
| │ Body (JSON-RPC): | ||
| │ { | ||
| │ "jsonrpc": "2.0", | ||
| │ "method": "tools/call", | ||
| │ "params": { | ||
| │ "name": "delete_customer_data", | ||
| │ "arguments": { "customer_id": "12345" } | ||
| │ }, | ||
| │ "id": 1 | ||
| │ } | ||
| ▼ | ||
|
|
||
| Span: mcp.gateway.request [span_id: 5e6f7a8b] | ||
| ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17ms ━━━━━━━━━━━━ | ||
| ├─ trace_id: f5a9d214e6b8c7a9d1e2f3a4b5c6d7e8 ← Same trace_id | ||
| ├─ span.kind: SERVER | ||
| ├─ parent_span_id: 4e5f6a7b | ||
| ├─ http.response.status_code: 403 | ||
| │ | ||
| ├─ gen_ai.operation.name: execute_tool | ||
| ├─ gen_ai.tool.name: delete_customer_data | ||
| │ | ||
| ├─ mcp.method.name: tools/call | ||
| ├─ mcp.session.id: sess_agent_2p7k4m | ||
| │ | ||
| ├─ error.type: PermissionDeniedError | ||
| └─ status: ERROR | ||
| │ | ||
| │ | ||
| ├─► Span: mcp.authorization [span_id: 7a8b9c1d] | ||
| │ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 15ms ━━━━━━━━━━━━ | ||
| │ ├─ trace_id: f5a9d214e6b8c7a9d1e2f3a4b5c6d7e8 ← Same trace_id | ||
| │ ├─ span.kind: INTERNAL | ||
| │ ├─ parent_span_id: 5e6f7a8b | ||
| │ │ | ||
| │ ├─ security_rule.ruleset.name: crm_data_access_policy | ||
| │ ├─ event.action: deny | ||
| │ ├─ event.outcome: success | ||
| │ │ | ||
| │ ├─ error.type: PermissionDeniedError | ||
| │ ├─ error.message: "User role 'support_agent' lacks privileges for customer_data.delete" | ||
| │ ├─ status: ERROR | ||
| │ │ | ||
| │ ├─► Span: mcp.authorization.rule [span_id: 9c1d2e3f] | ||
| │ │ ├─ security_rule.name: read_only_support | ||
| │ │ ├─ security_rule.match: false | ||
| │ │ ├─ event.action: allow | ||
| │ │ └─ event.outcome: success | ||
| │ │ | ||
| │ ├─► Span: mcp.authorization.rule [span_id: 1d2e3f4a] | ||
| │ │ ├─ security_rule.name: admin_only_delete | ||
| │ │ ├─ security_rule.match: false | ||
| │ │ ├─ event.action: allow | ||
| │ │ └─ event.outcome: success | ||
| │ │ | ||
| │ └─► Span: mcp.authorization.rule [span_id: 2e3f4a5b] | ||
| │ ├─ security_rule.name: default-deny | ||
| │ ├─ security_rule.match: true | ||
| │ ├─ event.action: allow | ||
| │ └─ event.outcome: success | ||
| │ | ||
| │ | ||
| └─► Span: mcp.audit.log [span_id: 8b9c1d2e] | ||
| ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2ms ━━━━━━━━━━━ | ||
| ├─ trace_id: f5a9d214e6b8c7a9d1e2f3a4b5c6d7e8 ← Same trace_id | ||
| ├─ span.kind: INTERNAL | ||
| ├─ parent_span_id: 5e6f7a8b | ||
| ├─ audit.event.type: authorization_failure | ||
| ├─ audit.event.category: security | ||
| ├─ audit.event.outcome: failure | ||
| ├─ ... | ||
| └─ status: OK | ||
|
|
||
| │ | ||
| ▼ | ||
| ⛔ Error: 403 - Forbidden | ||
| 🔒 Permission denied: insufficient privileges for customer_data.delete | ||
| ``` | ||
|
|
||
| ### Guardrailing | ||
|
|
||
| This shows a trace example of a guardrail blocking a request at a gateway. | ||
|
Comment on lines
+184
to
+186
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 👋 from OpenTelemetry GenAI SIG I don't have a ton of context on this proposal, but I wanted to share a proposal for Guardrails open-telemetry/semantic-conventions#3233 |
||
|
|
||
| ``` | ||
| trace_id: c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8 | ||
|
|
||
| Claude AI Agent | ||
| │ | ||
| │ LLM call | ||
| │ | ||
| ▼ | ||
| Span: gen_ai.agent.chat [span_id: 3d4e5f6a] | ||
| ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 342ms ━━━━━━━━━━━━ | ||
| ├─ trace_id: c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8 | ||
| ├─ span.kind: INTERNAL | ||
| │ | ||
| ├─ gen_ai.agent.id: claude-agent-prod-001 | ||
| ├─ gen_ai.agent.name: claude-sonnet-4.5 | ||
| ├─ gen_ai.operation.name: chat | ||
| ├─ gen_ai.system: anthropic | ||
| ├─ gen_ai.request.model: claude-sonnet-4-5-20250929 | ||
| │ | ||
| ├─ gen_ai.usage.input_tokens: 427 | ||
| ├─ gen_ai.usage.output_tokens: 89 | ||
| │ | ||
| └─ status: OK | ||
| │ | ||
| │ | ||
| └─► Span: gen_ai.tool.call [span_id: 4e5f6a7b] | ||
| ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12ms ━━━━━━━━━━━━━ | ||
| ├─ trace_id: c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8 ← Same trace_id | ||
| ├─ span.kind: INTERNAL | ||
| ├─ parent_span_id: 3d4e5f6a | ||
| │ | ||
| ├─ gen_ai.tool.name: send_email | ||
| └─ status: OK | ||
| │ | ||
| │ POST /mcp | ||
| │ Headers: | ||
| │ traceparent: 00-c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8-4e5f6a7b-01 | ||
| │ ... | ||
| │ Body (JSON-RPC): | ||
| │ { | ||
| │ "jsonrpc": "2.0", | ||
| │ "method": "tools/call", | ||
| │ "params": { | ||
| │ "name": "send_email", | ||
| │ "arguments": { | ||
| │ "to": "bob@example.com", | ||
| │ "body": "SSN: 123-45-6789, CC: 4532-1234-5678-9010" | ||
| │ } | ||
| │ }, | ||
| │ "id": 2 | ||
| │ } | ||
| ▼ | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is this the point where the traffic hits a k8s cluster?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would say the request is inside the MCP aware router/gateway, or in the MCP Server at this point. |
||
|
|
||
| Span: mcp.gateway.request [span_id: 5e6f7a8b] | ||
| ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 30ms ━━━━━━━━━━━━ | ||
| ├─ trace_id: c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8 ← Same trace_id | ||
| ├─ span.kind: SERVER | ||
| ├─ parent_span_id: 4e5f6a7b | ||
| ├─ http.response.status_code: 400 | ||
| │ | ||
| ├─ gen_ai.operation.name: execute_tool | ||
| ├─ gen_ai.tool.name: send_email | ||
| │ | ||
| ├─ mcp.method.name: tools/call | ||
| ├─ mcp.session.id: sess_agent_5k9m2n | ||
| │ | ||
| ├─ error.type: GuardrailViolationError | ||
| └─ status: ERROR | ||
| │ | ||
| ├─► Span: mcp.guardrail.evaluate [span_id: 7a8b9c1d] | ||
| │ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 25ms ━━━━━━━━━━━━ | ||
| │ ├─ trace_id: c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8 ← Same trace_id | ||
| │ ├─ span.kind: INTERNAL | ||
| │ ├─ parent_span_id: 5e6f7a8b | ||
| │ │ | ||
| │ ├─ security_rule.ruleset.name: pii_detection_policy | ||
| │ ├─ event.action: deny | ||
| │ ├─ event.outcome: success | ||
| │ │ | ||
| │ ├─ error.type: GuardrailViolationError | ||
| │ ├─ error.message: "PII detected: SSN, Credit Card" | ||
| │ ├─ status: ERROR | ||
| │ │ | ||
| │ ├─► Span: mcp.guardrail.rule [span_id: 9c1d2e3f] | ||
| │ │ ├─ security_rule.name: block_profanity | ||
| │ │ ├─ security_rule.match: false | ||
| │ │ ├─ event.action: deny | ||
| │ │ └─ event.outcome: success | ||
| │ │ | ||
| │ └─► Span: mcp.guardrail.rule [span_id: 1d2e3f4a] | ||
| │ ├─ security_rule.name: block_sensitive_pii | ||
| │ ├─ security_rule.match: true | ||
| │ ├─ event.action: deny | ||
| │ ├─ event.outcome: success | ||
| │ ├─ guardrail.pii.types_detected: [ssn, credit_card] | ||
| │ ├─ guardrail.pii.confidence: high | ||
| │ └─ guardrail.pii.field: arguments.body | ||
| │ | ||
| └─► Span: mcp.audit.log [span_id: 8b9c1d2e] | ||
| ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5ms ━━━━━━━━━━━ | ||
| ├─ trace_id: c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8 ← Same trace_id | ||
| ├─ span.kind: INTERNAL | ||
| ├─ audit.event.type: guardrail_violation | ||
| ├─ ... | ||
| ├─ audit.pii.types: [ssn, credit_card] | ||
| ├─ audit.severity: critical | ||
| └─ status: OK | ||
|
|
||
| │ | ||
| ▼ | ||
| ⛔ 400 Bad Request - Guardrail violation: PII detected | ||
| 🔒 Blocked: SSN and Credit Card found in request | ||
| ``` | ||
Uh oh!
There was an error while loading. Please reload this page.