300 changes: 300 additions & 0 deletions docs/proposals/0033-Observability.md
@@ -0,0 +1,300 @@
Date: 19th December 2025<br/>
Authors: david-martin, evaline-ju<br/>
Status: Provisional<br/>

# Observability in Agentic Networking

This proposal defines tracing schemas for agentic systems, specifically the structure of traces emitted at runtime. It focuses on proxy-like workloads in Kubernetes environments, including sidecars and gateways. APIs for configuring observability (e.g., Kubernetes CRDs) will be addressed in subsequent proposals.

## OpenTelemetry Semantic Conventions

Follow these OpenTelemetry semantic conventions:

- [GenAI Agent spans](https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-agent-spans/) for agent runtime spans
- [GenAI LLM spans](https://opentelemetry.io/docs/specs/semconv/gen-ai/llm-spans/) for LLM provider spans
- [Model Context Protocol (MCP)](https://opentelemetry.io/docs/specs/semconv/gen-ai/mcp/) for tool/MCP server spans
- [Security rule attributes](https://opentelemetry.io/docs/specs/semconv/registry/attributes/security-rule/) for policy enforcement and guardrails
- [Error attributes](https://opentelemetry.io/docs/specs/semconv/registry/attributes/error/) for error handling

### Alternate Conventions and Interoperability

The agentic observability ecosystem includes earlier conventions like [OpenInference](https://github.com/Arize-ai/openinference) (used by [Phoenix](https://phoenix.arize.com/) from Arize) that predate the official OpenTelemetry GenAI conventions, as well as OpenTelemetry-compatible extensions like [OpenLLMetry](https://github.com/traceloop/openllmetry) (used by [Traceloop](https://www.traceloop.com/)/[Langfuse](https://langfuse.com/)).

For alternate conventions, use [span processors](https://opentelemetry.io/docs/specs/otel/trace/sdk/#span-processor) or the OpenTelemetry Collector's [transform processor](https://opentelemetry.io/docs/collector/configuration/#processors) with [OTTL](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/transformprocessor) to remap attributes to the GenAI semantic conventions. For non-OpenTelemetry sources, a custom SDK-level bridge is needed to convert to OpenTelemetry spans first.
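As a rough illustration of the remapping step, the sketch below renames attributes on a plain dictionary. The source attribute names are assumptions chosen to resemble OpenInference-style keys and should be checked against the convention actually being ingested. In practice this logic would live in a span processor or a Collector transform processor, not in a standalone function.

```python
# Illustrative mapping from OpenInference-style attribute names to the
# OpenTelemetry GenAI names used in this proposal. The source keys are
# assumptions for this sketch; verify them against the convention you ingest.
ATTRIBUTE_MAP = {
    "llm.model_name": "gen_ai.request.model",
    "llm.token_count.prompt": "gen_ai.usage.input_tokens",
    "llm.token_count.completion": "gen_ai.usage.output_tokens",
}


def remap_attributes(attributes: dict) -> dict:
    """Return a copy of `attributes` with known keys renamed.

    Unknown keys pass through unchanged. If the target key is already
    present, the existing GenAI value wins and the legacy key is dropped.
    """
    remapped = {}
    for key, value in attributes.items():
        target = ATTRIBUTE_MAP.get(key, key)
        if target != key and target in attributes:
            continue  # the span already carries the GenAI attribute
        remapped[target] = value
    return remapped
```

Letting an existing GenAI attribute take precedence is a design choice for this sketch; a pipeline could equally prefer the legacy value or keep both.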

## Context Propagation

Use [W3C Trace Context](https://www.w3.org/TR/trace-context/) for distributed tracing and [W3C Baggage](https://www.w3.org/TR/baggage/) to propagate:

- `user.id`: Identifier for the user who delegated authority (obfuscated if needed)
- `agent.id`: Identifier for the agent executing operations

W3C Baggage is used instead of `tracestate` because these are application-level identifiers needed for runtime decisions (authorization, rate limiting), not tracing vendor metadata.
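The header handling described above might look roughly like the following. This is a minimal standard-library sketch, not a replacement for an OpenTelemetry propagator; the function names are hypothetical. It assumes the W3C formats: a `traceparent` of `version-traceid-parentid-flags` with a 32-character trace ID and a 16-character parent ID, and a comma-separated `baggage` list with percent-encoded values.

```python
import re
from urllib.parse import quote, unquote

# version 00, 32-hex trace ID, 16-hex parent (span) ID, 2-hex flags
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def build_headers(trace_id: str, span_id: str, user_id: str, agent_id: str) -> dict:
    """Build outbound `traceparent` and `baggage` headers (sampled flag set)."""
    traceparent = f"00-{trace_id}-{span_id}-01"
    if not TRACEPARENT_RE.match(traceparent):
        raise ValueError("trace_id must be 32 and span_id 16 lowercase hex characters")
    baggage = ",".join(
        f"{key}={quote(value, safe='')}"
        for key, value in (("user.id", user_id), ("agent.id", agent_id))
    )
    return {"traceparent": traceparent, "baggage": baggage}


def parse_baggage(header: str) -> dict:
    """Parse a `baggage` header into a dict, ignoring member properties."""
    entries = {}
    for member in header.split(","):
        key_value = member.split(";", 1)[0]  # drop any ;property suffix
        if "=" not in key_value:
            continue
        key, value = key_value.split("=", 1)
        entries[key.strip()] = unquote(value.strip())
    return entries
```

A gateway could call `parse_baggage` on the inbound request and use `user.id` and `agent.id` for authorization or rate-limiting decisions before creating its own spans.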

## Proposed Extensions

Where OpenTelemetry does not define attributes for permission enforcement outcomes, we propose:

| Attribute | Values | Description |
|-----------|--------|-------------|
| `event.action` | `allow`, `deny` | Action taken due to the policy check |
| `event.outcome` | `success`, `failure`, `unknown` | Outcome of the check itself |

These can be considered for inclusion in an existing or new OpenTelemetry semantic convention registry.
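To make the difference between the two attributes concrete, here is a small sketch. `event.action` records the decision, while `event.outcome` records whether the check itself ran successfully. The function name is hypothetical, and the fail-closed behavior (deny when the check raises) is an assumption of this example, not something the proposal mandates.

```python
from typing import Callable


def policy_check_attributes(check: Callable[[], bool]) -> dict:
    """Run a policy check and return the proposed span attributes.

    `check` returns True to allow and False to deny. If it raises, the check
    itself failed; this sketch assumes a fail-closed gateway, so it denies.
    """
    try:
        allowed = check()
    except Exception:
        return {"event.action": "deny", "event.outcome": "failure"}
    return {
        "event.action": "allow" if allowed else "deny",
        "event.outcome": "success",
    }
```

A denied request therefore still has `event.outcome: success`, because the policy engine worked as intended.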

### Per-Rule Evaluation Spans

For authorization and guardrail checks, emit an `mcp.authorization.rule` or `mcp.guardrail.rule` child span for each rule evaluated within the parent span. Each child span records:
- `security_rule.name` — the rule identifier
- `security_rule.match` — whether the rule matched (`true`/`false`)
- `event.action` — the action the rule would take (`allow`/`deny`)

Evaluation stops at the determining rule. For authorization, if no allow rule matches, emit an explicit `default-deny` rule span (`security_rule.name: default-deny`, `security_rule.match: true`, `event.action: deny`) to make the implicit default-deny behavior visible in traces.
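The evaluation order described above can be sketched as follows. The `AllowRule` type, its predicate, and the `name` key are hypothetical; the function returns plain dictionaries standing in for the child spans a real implementation would emit through a tracer.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AllowRule:
    name: str
    matches: Callable[[dict], bool]  # hypothetical predicate over the request


def evaluate_authorization(rules: list, request: dict) -> list:
    """Return one span-attribute dict per rule evaluated, in order."""
    spans = []
    for rule in rules:
        matched = rule.matches(request)
        spans.append({
            "name": "mcp.authorization.rule",
            "security_rule.name": rule.name,
            "security_rule.match": matched,
            "event.action": "allow",  # the action this rule would take
        })
        if matched:
            return spans  # determining rule reached; stop evaluating
    # No allow rule matched: make the implicit default visible in the trace.
    spans.append({
        "name": "mcp.authorization.rule",
        "security_rule.name": "default-deny",
        "security_rule.match": True,
        "event.action": "deny",
    })
    return spans
```

With two allow rules that do not match, the result contains three entries, the last being the explicit `default-deny` span.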

## Retries

Agentic retries often change parameters between attempts (different tool arguments, altered prompts, or alternate tools). Use a common trace ID to link the attempts, and record a hash of each prompt rather than its full text so that attempts can be compared without logging prompt content.
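One possible way to reference prompts by hash is shown below. The attribute names `retry.attempt` and `gen_ai.prompt.hash` are hypothetical placeholders, not part of any OpenTelemetry convention, and SHA-256 is an assumption; the proposal does not prescribe a hash function.

```python
import hashlib


def prompt_hash(prompt: str) -> str:
    """Return a SHA-256 hex digest that identifies a prompt without storing it."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


def retry_span_attributes(trace_id: str, attempt: int, tool: str, prompt: str) -> dict:
    """Attributes for one attempt; every attempt reuses the same trace_id."""
    return {
        "trace_id": trace_id,
        "retry.attempt": attempt,  # hypothetical attribute name
        "gen_ai.tool.name": tool,
        "gen_ai.prompt.hash": prompt_hash(prompt),  # hypothetical attribute name
    }
```

Two attempts with different prompts share a trace ID but carry different hashes, which makes a changed prompt visible without exposing its content.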

## Examples

These examples illustrate how the conventions apply. Span attributes shown are not comprehensive; see the linked OpenTelemetry specifications for complete attribute definitions.

### Access policy enforcement

This trace shows a permission rule being checked at a gateway before tool access is granted.

```
trace_id: f5a9d214e6b8c7a9d1e2f3a4b5c6d7e8

Claude AI Agent
│ LLM call
Span: gen_ai.agent.chat [span_id: 3d4e5f6a]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 342ms ━━━━━━━━━━━━
├─ trace_id: f5a9d214e6b8c7a9d1e2f3a4b5c6d7e8
├─ span.kind: INTERNAL
├─ gen_ai.agent.id: claude-agent-prod-001
├─ gen_ai.agent.name: claude-sonnet-4.5
├─ gen_ai.operation.name: chat
├─ gen_ai.system: anthropic
├─ gen_ai.request.model: claude-sonnet-4-5-20250929
├─ gen_ai.usage.input_tokens: 1247
├─ gen_ai.usage.output_tokens: 89
└─ status: OK
└─► Span: gen_ai.tool.call [span_id: 4e5f6a7b]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12ms ━━━━━━━━━━━━━
├─ trace_id: f5a9d214e6b8c7a9d1e2f3a4b5c6d7e8 ← Same trace_id
├─ span.kind: INTERNAL
├─ parent_span_id: 3d4e5f6a
├─ gen_ai.tool.name: delete_customer_data
└─ status: OK
│ POST /mcp
│ Headers:
│ traceparent: 00-f5a9d214e6b8c7a9d1e2f3a4b5c6d7e8-4e5f6a7b-01
│ Authorization: Bearer eyJhbGc...
│ ...
│ Body (JSON-RPC):
│ {
│ "jsonrpc": "2.0",
│ "method": "tools/call",
│ "params": {
│ "name": "delete_customer_data",
│ "arguments": { "customer_id": "12345" }
│ },
│ "id": 1
│ }

Span: mcp.gateway.request [span_id: 5e6f7a8b]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17ms ━━━━━━━━━━━━
├─ trace_id: f5a9d214e6b8c7a9d1e2f3a4b5c6d7e8 ← Same trace_id
├─ span.kind: SERVER
├─ parent_span_id: 4e5f6a7b
├─ http.response.status_code: 403
├─ gen_ai.operation.name: execute_tool
├─ gen_ai.tool.name: delete_customer_data
├─ mcp.method.name: tools/call
├─ mcp.session.id: sess_agent_2p7k4m
├─ error.type: PermissionDeniedError
└─ status: ERROR
├─► Span: mcp.authorization [span_id: 7a8b9c1d]
│ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 15ms ━━━━━━━━━━━━
│ ├─ trace_id: f5a9d214e6b8c7a9d1e2f3a4b5c6d7e8 ← Same trace_id
│ ├─ span.kind: INTERNAL
│ ├─ parent_span_id: 5e6f7a8b
│ │
│ ├─ security_rule.ruleset.name: crm_data_access_policy
│ ├─ event.action: deny
│ ├─ event.outcome: success
│ │
│ ├─ error.type: PermissionDeniedError
│ ├─ error.message: "User role 'support_agent' lacks privileges for customer_data.delete"
│ ├─ status: ERROR
│ │
│ ├─► Span: mcp.authorization.rule [span_id: 9c1d2e3f]
│ │ ├─ security_rule.name: read_only_support
│ │ ├─ security_rule.match: false
│ │ ├─ event.action: allow
│ │ └─ event.outcome: success
│ │
│ ├─► Span: mcp.authorization.rule [span_id: 1d2e3f4a]
│ │ ├─ security_rule.name: admin_only_delete
│ │ ├─ security_rule.match: false
│ │ ├─ event.action: allow
│ │ └─ event.outcome: success
│ │
│ └─► Span: mcp.authorization.rule [span_id: 2e3f4a5b]
│ ├─ security_rule.name: default-deny
│ ├─ security_rule.match: true
│ ├─ event.action: deny
│ └─ event.outcome: success
└─► Span: mcp.audit.log [span_id: 8b9c1d2e]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2ms ━━━━━━━━━━━
├─ trace_id: f5a9d214e6b8c7a9d1e2f3a4b5c6d7e8 ← Same trace_id
├─ span.kind: INTERNAL
├─ parent_span_id: 5e6f7a8b
├─ audit.event.type: authorization_failure
├─ audit.event.category: security
├─ audit.event.outcome: failure
├─ ...
└─ status: OK

⛔ Error: 403 - Forbidden
🔒 Permission denied: insufficient privileges for customer_data.delete
```

### Guardrailing

This trace shows a guardrail blocking a request at a gateway.
Comment on lines +184 to +186
👋 from OpenTelemetry GenAI SIG

I don't have a ton of context on this proposal, but I wanted to share a proposal for Guardrails open-telemetry/semantic-conventions#3233


```
trace_id: c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8

Claude AI Agent
│ LLM call
Span: gen_ai.agent.chat [span_id: 3d4e5f6a]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 342ms ━━━━━━━━━━━━
├─ trace_id: c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8
├─ span.kind: INTERNAL
├─ gen_ai.agent.id: claude-agent-prod-001
├─ gen_ai.agent.name: claude-sonnet-4.5
├─ gen_ai.operation.name: chat
├─ gen_ai.system: anthropic
├─ gen_ai.request.model: claude-sonnet-4-5-20250929
├─ gen_ai.usage.input_tokens: 427
├─ gen_ai.usage.output_tokens: 89
└─ status: OK
└─► Span: gen_ai.tool.call [span_id: 4e5f6a7b]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12ms ━━━━━━━━━━━━━
├─ trace_id: c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8 ← Same trace_id
├─ span.kind: INTERNAL
├─ parent_span_id: 3d4e5f6a
├─ gen_ai.tool.name: send_email
└─ status: OK
│ POST /mcp
│ Headers:
│ traceparent: 00-c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8-4e5f6a7b-01
│ ...
│ Body (JSON-RPC):
│ {
│ "jsonrpc": "2.0",
│ "method": "tools/call",
│ "params": {
│ "name": "send_email",
│ "arguments": {
│ "to": "bob@example.com",
│ "body": "SSN: 123-45-6789, CC: 4532-1234-5678-9010"
│ }
│ },
│ "id": 2
│ }
Review comment (Contributor):
Is this the point where the traffic hits a k8s cluster?

Reply (Contributor Author):
I would say the request is inside the MCP aware router/gateway, or in the MCP Server at this point.
Either of those could be in a k8s cluster, yes.
After this point, it's up to the MCP server where that traffic goes (if anywhere), before the response is returned.


Span: mcp.gateway.request [span_id: 5e6f7a8b]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 30ms ━━━━━━━━━━━━
├─ trace_id: c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8 ← Same trace_id
├─ span.kind: SERVER
├─ parent_span_id: 4e5f6a7b
├─ http.response.status_code: 400
├─ gen_ai.operation.name: execute_tool
├─ gen_ai.tool.name: send_email
├─ mcp.method.name: tools/call
├─ mcp.session.id: sess_agent_5k9m2n
├─ error.type: GuardrailViolationError
└─ status: ERROR
├─► Span: mcp.guardrail.evaluate [span_id: 7a8b9c1d]
│ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 25ms ━━━━━━━━━━━━
│ ├─ trace_id: c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8 ← Same trace_id
│ ├─ span.kind: INTERNAL
│ ├─ parent_span_id: 5e6f7a8b
│ │
│ ├─ security_rule.ruleset.name: pii_detection_policy
│ ├─ event.action: deny
│ ├─ event.outcome: success
│ │
│ ├─ error.type: GuardrailViolationError
│ ├─ error.message: "PII detected: SSN, Credit Card"
│ ├─ status: ERROR
│ │
│ ├─► Span: mcp.guardrail.rule [span_id: 9c1d2e3f]
│ │ ├─ security_rule.name: block_profanity
│ │ ├─ security_rule.match: false
│ │ ├─ event.action: deny
│ │ └─ event.outcome: success
│ │
│ └─► Span: mcp.guardrail.rule [span_id: 1d2e3f4a]
│ ├─ security_rule.name: block_sensitive_pii
│ ├─ security_rule.match: true
│ ├─ event.action: deny
│ ├─ event.outcome: success
│ ├─ guardrail.pii.types_detected: [ssn, credit_card]
│ ├─ guardrail.pii.confidence: high
│ └─ guardrail.pii.field: arguments.body
└─► Span: mcp.audit.log [span_id: 8b9c1d2e]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5ms ━━━━━━━━━━━
├─ trace_id: c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8 ← Same trace_id
├─ span.kind: INTERNAL
├─ parent_span_id: 5e6f7a8b
├─ audit.event.type: guardrail_violation
├─ ...
├─ audit.pii.types: [ssn, credit_card]
├─ audit.severity: critical
└─ status: OK

⛔ 400 Bad Request - Guardrail violation: PII detected
🔒 Blocked: SSN and Credit Card found in request
```
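The `block_sensitive_pii` rule span in the trace above could be produced by logic along these lines. This is a deliberately simplified sketch: the regular expressions and the function name are assumptions for illustration, and a real detector would need validation (for example checksum checks) and far broader coverage.

```python
import re

# Deliberately simple patterns for illustration only.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b"),
}


def evaluate_pii_rule(arguments: dict, field: str = "body") -> dict:
    """Return span attributes for a `block_sensitive_pii` rule evaluation."""
    text = str(arguments.get(field, ""))
    detected = [name for name, pattern in PII_PATTERNS.items() if pattern.search(text)]
    attributes = {
        "security_rule.name": "block_sensitive_pii",
        "security_rule.match": bool(detected),
        "event.action": "deny",  # the action this rule would take on a match
        "event.outcome": "success",
    }
    if detected:
        # Record which kinds of PII were found, never the values themselves.
        attributes["guardrail.pii.types_detected"] = detected
        attributes["guardrail.pii.field"] = f"arguments.{field}"
    return attributes
```

Only the detected types and the field path are recorded, so the span does not leak the sensitive values it is meant to block.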
2 changes: 2 additions & 0 deletions docs/user-journeys.md
@@ -38,6 +38,8 @@ As an AI Engineer I want to:

* Understand why my agent is getting denied when calling a certain tool

* Audit agent actions in the context of the user who delegated authority, so that I can attribute outcomes to both human intent and agent behaviour.

As an AI Platform Engineer I want to:

* Have an aggregated way of seeing failures/denials across the platform