gen-ai: add security guardian (apply_guardrail) span + finding event #3233
nagkumar91 wants to merge 36 commits into open-telemetry:main
Conversation
@nagkumar91 We have a similar use-case: when a security incident happens for a chat span, we create a new span as a chat span and add an attribute. Here's the sample of how we add it as of now - https://github.com/signalfx/splunk-otel-python-contrib/tree/main/instrumentation-genai/opentelemetry-instrumentation-aidefense#trace-integration
Also, it would be great if we could have another entry in the
`span.gen_ai.inference.client` or `span.gen_ai.execute_tool.internal`).

Multiple guardian spans MAY exist under a single operation span if multiple guardians are chained.
attributes:
Can we add an attribute called gen_ai.security.event_id?
The event being proposed would be a gen_ai.security.finding
Apply guardrail span will have these for IDs:
- gen_ai.guardian.id
- gen_ai.security.target.id
- gen_ai.security.policy.id
Would any of those fit your need?
Would this be better as an event (security finding event as proposed in this span)? Wondering why its a chat span?
Currently, the domain team is using event_id, as per their requirement. Also, in our case the event is generated elsewhere, not on the instrumentation side. We used a chat span to reuse existing types in the genai-utils for now and to manage the span life cycle with it.
| <a id="gen-ai-operation-name" href="#gen-ai-operation-name">`gen_ai.operation.name`</a> |  | string | The name of the operation being performed. [4] | `chat`; `generate_content`; `text_completion` |
| <a id="gen-ai-output-messages" href="#gen-ai-output-messages">`gen_ai.output.messages`</a> |  | any | Messages returned by the model where each message represents a specific model response (choice, candidate). [5] | [<br> {<br> "role": "assistant",<br> "parts": [<br> {<br> "type": "text",<br> "content": "The weather in Paris is currently rainy with a temperature of 57°F."<br> }<br> ],<br> "finish_reason": "stop"<br> }<br>] |
| <a id="gen-ai-output-type" href="#gen-ai-output-type">`gen_ai.output.type`</a> |  | string | Represents the content type requested by the client. [6] | `text`; `json`; `image` |
| <a id="gen-ai-guardian-id" href="#gen-ai-guardian-id">`gen_ai.guardian.id`</a> |  | string | The unique identifier of the security guardian or guardrail. [3] | `guard_abc123`; `sgi5gkybzqak`; `content-filter-v2` |
One nit, the draft currently mixes guardian (evaluator/service) vs guardrail (policy/config) identity (e.g., mapping provider guardrail IDs into gen_ai.guardian.id). I suggest keeping gen_ai.guardian.* for the evaluating component, and mapping guardrail/config identifiers (like aws.bedrock.guardrail.id) to gen_ai.security.policy.id (and/or on gen_ai.security.finding when policy-triggered). Otherwise it becomes hard/impossible to differentiate a guardian from a guardrail in traces, and we end up with confusing cardinality/semantics. This keeps cross provider correlation clean.
Thanks @habibam! I've addressed this in the commit (88c269b):

- Updated `gen_ai.guardian.*` attribute documentation to clarify these identify the evaluating service/component, not the policy
- Added explicit mapping guidance:
  - Guardian = service doing evaluation (e.g., "Azure Content Safety", "Bedrock Guardrails")
  - Policy = configuration being applied (use `gen_ai.security.policy.id` for ARNs, blocklist IDs, etc.)
- Updated examples to be clearer

This should make cross-provider correlation cleaner by keeping guardian and guardrail/policy semantics distinct. Let me know if the updated documentation captures your concern!
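To make the guardian-vs-policy split concrete, here is a minimal sketch using the attribute names proposed in this PR. The helper function and the example ARN are illustrative assumptions, not part of the conventions:

```python
def guardrail_span_attributes(guardian_name: str, provider: str,
                              policy_id: str) -> dict:
    """Map one guardrail invocation to the proposed span attributes."""
    return {
        # Guardian = the service/component doing the evaluation
        "gen_ai.guardian.name": guardian_name,
        "gen_ai.guardian.provider.name": provider,
        # Policy = the configuration being applied (ARN, blocklist ID, ...)
        "gen_ai.security.policy.id": policy_id,
    }

# An AWS Bedrock guardrail invocation keeps the guardrail's ARN on the
# policy attribute, not on the guardian identity.
attrs = guardrail_span_attributes(
    guardian_name="Bedrock Guardrails",
    provider="aws.bedrock",
    policy_id="arn:aws:bedrock:us-east-1:123456789012:guardrail/abc123",
)
```

This keeps two Bedrock guardrail configs distinguishable in traces while still correlating them to the same guardian.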
# MCP Guardian Adapter
# ============================================================================

class MCPGuardianAdapter(BaseGuardianAdapter[MCPContext]):
We should add an example of generating a span from the response of an elicitation.
Added in commit 88c269b!

New elicitation guard methods:

- `guard_elicitation_request()` - Guards outbound requests to the user (prevents info leakage, excessive elicitation)
- `guard_elicitation_response()` - Guards the user's response (detects PII, injection attempts)

Both map to `target_type=message` since they're user-facing interactions.

See the updated module docstring and main examples (tests 6-9) for usage patterns.
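A hypothetical sketch of the two elicitation guard hooks described above; the function signatures and return shape are assumptions for illustration, not the prototype's actual API:

```python
def guard_elicitation_request(message: str) -> dict:
    # Outbound request to the user: screen for info leakage or
    # excessive elicitation before it reaches the user.
    return {"gen_ai.security.target.type": "message",
            "direction": "request"}

def guard_elicitation_response(message: str) -> dict:
    # The user's reply: screen for PII or injection attempts
    # before it flows back into the agent.
    return {"gen_ai.security.target.type": "message",
            "direction": "response"}

req = guard_elicitation_request("Please provide your order number")
resp = guard_elicitation_response("My order is 12345")
```

Both directions land on the same `target_type=message`, with direction kept separate, matching the comment above.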
…ools

- Remove redundant files from git (kept locally): genai_guardrail_instrumentation_prototype.py, demo_chat.py, demo_tools.py
- Reduce framework adapters from 6 to 2 (keep LangChain + MCP, others preserved locally)
- Move trace viewer utilities to tools/ directory
- Consolidate README from 750+ lines to ~165 lines
- Update .gitignore to ignore archived files
…CP elicitation

- Clarify guardian vs guardrail semantics in registry.yaml:
  - gen_ai.guardian.* is for the evaluating service/component
  - gen_ai.security.policy.* is for configuration/policy identifiers
  - Added mapping guidance for AWS Bedrock, Azure Content Safety
- Add gen_ai.security.external_event_id attribute for SIEM correlation
- Add MCP elicitation guard methods:
  - guard_elicitation_request: guard outbound requests to user
  - guard_elicitation_response: guard user's input responses
- Regenerate markdown files from updated YAML

Addresses feedback from @habibam and @adityamehra
81f1c73 to 1cb3e68
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The markdownlint CI gate requires doc files and folders to use hyphens, not underscores. Renamed security_implementation_gen_ai_spec.md to security-implementation-gen-ai-spec.md and updated the reference link. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
alexgarden-mnemom left a comment
Great work on this, Nagkumar. We've been shipping OTel exporters for AI agent safety checks (@mnemom/aip-otel-exporter on npm, aip-otel-exporter on PyPI) with production traces in Grafana Cloud Tempo — 50+ span attributes across six span types.
A few suggestions based on what we've learned from production:
1. Dedicated span type: Making safety checks a dedicated safety_check span operation (child of invoke_agent or chat) gives you independent observability — you can measure safety check latency, build verdict distribution dashboards, and alert without parsing parent spans.
2. Three-value verdict: Boolean triggered misses the middle ground. In production, we've found every major provider has intermediate states — Azure severity 2-5, Google LOW/MEDIUM/HIGH, OpenAI varying confidence. A pass/review/fail enum maps cleanly to all of them.
3. Structured concern events: When a guardrail fires, operators need structured detail. We emit gen_ai.safety.concern events (one per finding) with category, severity, and description. Gives you queryable data vs a boolean.
4. Policy model: Beyond guardrail.id — a full policy model (id/name/version/mode) covers enterprise deployments where teams roll out governance gradually. policy.mode (monitor/warn/enforce/off) is critical for staged rollouts.
5. Continuous score: A safety.score (0.0-1.0) alongside the verdict gives you a continuous signal for trending and thresholding. Every provider has a numeric score underneath their categorical output.
6. Metrics: A duration histogram and verdict counter would let teams build standard dashboards for safety check performance without custom queries.
We have a full provider mapping showing how these conventions apply across the ecosystem — AWS Bedrock Guardrails, Azure Content Safety, OpenAI Moderation, Google Safety Ratings, and our own AIP/AAP protocols. The proposed attributes map cleanly to all of them.
I think there's a path to combining your guardrail-specific work with a broader safety_check framework that covers guardrails, content filtering, integrity analysis, alignment verification, and policy evaluation as check types under a unified convention.
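Points 2 and 5 above can be combined: keep the provider's continuous score and derive the categorical verdict from it. The sketch below is illustrative; the `pass`/`review`/`fail` enum comes from the comment above, and the thresholds are arbitrary examples, not normative values:

```python
def verdict_from_score(score: float,
                       review_threshold: float = 0.3,
                       fail_threshold: float = 0.7) -> str:
    """Collapse a continuous 0.0-1.0 risk score into a three-value verdict.

    Every major provider exposes some numeric score underneath its
    categorical output; thresholds would be guardian- or policy-specific.
    """
    if score >= fail_threshold:
        return "fail"
    if score >= review_threshold:
        return "review"
    return "pass"
```

Recording both the raw score and the derived verdict gives dashboards a trending signal while keeping alerting on the small enum.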
aabmass left a comment
I got some feedback from the GCP Model Armor team, which I'm sharing here
- id: span.gen_ai.apply_guardrail.internal
  type: span
  stability: development
  span_kind: internal
  brief: Describes a security guardian evaluation.
What do you think of aligning on guardrail throughout this proposal? The GCP Model Armor team found "guardian" a bit esoteric and I agree. At least we should be consistent throughout.
Lmk if this has already been discussed
Good point — we're discussing this internally with the team. "Guardrail" is clearly the more established industry term (AWS Bedrock Guardrails, NVIDIA NeMo Guardrails, Guardrails AI). The operation name apply_guardrail already uses it, so renaming gen_ai.guardian.* → gen_ai.guardrail.* would make everything consistent.
Will follow up once we have internal alignment — likely agreeing with your suggestion.
Agree that guardian is interesting :) We're moving towards "controls" as the canonical name, instead of the "guardrails"/"content filters" we used back in the day.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Aligns the boolean attribute name with the gen_ai.security.decision.type enum value 'modify'. The broader term 'modified' correctly covers redaction, sanitization, and rewriting. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Supporting Materials

The non-normative implementation spec, prototype code, and story scripts have been moved to gists to keep this PR focused on the YAML model + generated docs.

Previously these were in draft PR #3434, which can now be closed.
The non-normative implementation spec is now hosted as a gist: https://gist.github.com/nagkumar91/95efa05449c72b4c95958190f26f13ba This keeps the PR focused on the YAML model + generated docs, matching the pattern used by PR open-telemetry#3250 (memory operations). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
`gen_ai.operation.name` SHOULD be `apply_guardrail`.

**Span name** SHOULD be `apply_guardrail {gen_ai.guardian.name}`.
When `gen_ai.guardian.name` is not available, it SHOULD be `apply_guardrail {gen_ai.security.target.type}`.
It's better to add: "Semantic conventions for individual guardians and frameworks MAY specify a different span name format."
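The naming rule quoted above, including its fallback, can be sketched as a small helper. The function name is hypothetical; the format strings come from the draft:

```python
def guardrail_span_name(guardian_name, target_type):
    """Build the apply_guardrail span name per the draft rule.

    Falls back to the target type when the guardian name is unavailable.
    """
    if guardian_name:
        return f"apply_guardrail {guardian_name}"
    return f"apply_guardrail {target_type}"

# With a known guardian:
name_a = guardrail_span_name("PII Filter", "llm_input")
# Guardian name unavailable -> fall back to the target type:
name_b = guardrail_span_name(None, "llm_input")
```

Per-guardian conventions could override this default, as the comment suggests.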
When `gen_ai.guardian.name` is not available, it SHOULD be `apply_guardrail {gen_ai.security.target.type}`.

Guardian spans SHOULD be children of the operation span they are protecting (for example,
`span.gen_ai.inference.client` or `span.gen_ai.execute_tool.internal`).
I believe it's better to leave a reference to the markdown sections here rather than the name of the registry.
Suggested change:
- `span.gen_ai.inference.client` or `span.gen_ai.execute_tool.internal`).
+ [inference](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/gen-ai/gen-ai-spans.md#inference) or [execute tool span](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/gen-ai/gen-ai-spans.md#execute-tool-span)).
Multiple guardian spans MAY exist under a single operation span if multiple guardians are chained.

**Span kind** SHOULD be `INTERNAL`.
Span kind `CLIENT` is suitable when the guardian runs in another process.
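The in-process vs remote distinction raised here can be sketched as follows. The enum mirrors the `INTERNAL`/`CLIENT` names from the OpenTelemetry trace API rather than importing the SDK, and the helper is an illustrative assumption:

```python
from enum import Enum

class SpanKind(Enum):
    # Mirrors opentelemetry.trace.SpanKind members relevant here.
    INTERNAL = "internal"
    CLIENT = "client"

def guardrail_span_kind(guardian_is_remote: bool) -> SpanKind:
    # CLIENT when the guardian is a remote service (e.g., a hosted
    # content-safety API); INTERNAL for an in-process library check.
    return SpanKind.CLIENT if guardian_is_remote else SpanKind.INTERNAL
```

Instrumentations that call a hosted guardrail API would pick `CLIENT`, while a local regex/classifier check stays `INTERNAL`.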
- id: gen_ai.security.target.type
  stability: development
  type: string
Why wouldn't member work here?
1. Span name: apply_guardrail {guardian.name} {target.type}
Makes chained guardrail spans unique (e.g., 'apply_guardrail PII Filter input'
vs 'apply_guardrail PII Filter output'). Addresses feedback from @aabmass
and @habibam.
2. guardian.provider.name: Changed from free-form string to open enum with
well-known values (azure.ai.content_safety, aws.bedrock, gcp.model_armor).
Keeps guardrail provider distinct from inference provider (gen_ai.provider.name).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This looks great, modeling guardrail enforcement as a separate concern makes a lot of sense. One thing I noticed while reading through this: there seems to be a gap between evaluation and guardrail in terms of what actually gets captured in telemetry. Right now:

But there's no clear way to represent the discrete result an evaluator produces when it isn't tied to any enforcement decision. For example:

In that situation, the output isn't a continuous score, and it isn't an enforcement action either; it's a categorical evaluation result that isn't directly captured today. There may be room for a small, vendor-neutral way to capture this kind of discrete evaluation outcome, sitting between score and guardrail decision.
members:
  - id: allow
    value: "allow"
    brief: Request permitted to proceed without modification.
    stability: development
  - id: deny
    value: "deny"
    brief: Request denied and operation halted.
    stability: development
  - id: modify
    value: "modify"
    brief: Request permitted with modifications (redaction, sanitization).
    stability: development
  - id: warn
    value: "warn"
    brief: Request permitted but flagged for review.
    stability: development
  - id: audit
    value: "audit"
    brief: Request logged for audit purposes only, no enforcement.
    stability: development
Guardrail services return a verdict describing what should happen, but don't enforce it. We should tighten the language here to make clear that these are verdicts, which are not the same as enforcements.
E.g. AACS can return a warn verdict, but the caller may choose to handle it as a block (a typical use case in production). In this case the verdict is warn, but the enforcement is block.
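A minimal sketch of recording the verdict and the enforcement separately, as suggested here. Note that `gen_ai.security.enforcement.type` is a hypothetical attribute name used only for illustration; it is not in the current draft:

```python
def decision_attributes(verdict: str, enforcement: str) -> dict:
    """Keep the guardian's verdict distinct from what the caller did."""
    return {
        # What the guardrail service returned
        "gen_ai.security.decision.type": verdict,
        # Hypothetical attribute: the action the caller actually took
        "gen_ai.security.enforcement.type": enforcement,
    }

# AACS returns "warn", but the application chooses to block anyway:
attrs = decision_attributes(verdict="warn", enforcement="deny")
```

Separating the two makes the common "warn verdict, block enforcement" pattern queryable instead of invisible.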
- id: gen_ai.security.decision.type
  stability: development
  type:
    members:
Missing an "escalate" verdict for human-in-the-loop/approval workflows.
- id: gen_ai.guardian.provider.name
  stability: development
  type:
    members:
Should this be a string instead? We should support cases where someone is running custom guardrails implementation outside of the major providers.
- `memory_store`: Data being written to memory.
- `memory_retrieve`: Data being read from memory.
- `knowledge_query`: Knowledge/RAG query being sent.
- `knowledge_result`: Knowledge/RAG results being returned.
Aren't these tool calls also? `tool_call` should also be split into input and output.
  stability: development
  type:
    members:
      - id: none
"No risk" is a non-existent state. Suggest changing it to unspecified/unknown or something along those lines.
- 'jailbreak'
- 'custom:financial_advice_violation'
- 'azure:hate_speech'
- id: gen_ai.security.risk.severity
I would remove severity and keep score. Severity means different things for different applications/use cases, while a score is more independent/factual.
  type: double
  brief: Numeric risk/confidence score (0.0 to 1.0).
  examples: [0.85, 0.95, 0.42]
- id: gen_ai.security.risk.metadata
What's the intended use for this?
- ['field:bcc', 'pattern:email']
- ['count:2', 'position:output.content']

- id: gen_ai.security.policy.id
In the case of a detailed policy under the same identifier, this policy id might be underspecified since a section/part of the larger policy triggered the verdict. I think it's useful to have this metadata, not sure how to make it "tight enough".
This PR has been labeled as stale due to lack of activity. It will be automatically closed if there is no further activity over the next 7 days.
Summary
Add semantic conventions for GenAI security guardrail operations — spans, attributes, events, and metrics for security evaluation of GenAI inputs/outputs.
New Span

Operation name (`gen_ai.operation.name`): `apply_guardrail`

Span name: `apply_guardrail {gen_ai.guardian.name} {gen_ai.security.target.type}` (e.g., `apply_guardrail "PII Filter" llm_input`). When `gen_ai.guardian.name` is unavailable, the fallback is `apply_guardrail {gen_ai.security.target.type}`.

Guardrail spans are children of the operation they protect (e.g., `chat`, `execute_tool`). Multiple guardrail spans MAY exist under a single parent if guardrails are chained.

New Attributes
Guardian (evaluating service):

- `gen_ai.guardian.id`
- `gen_ai.guardian.name`
- `gen_ai.guardian.version`
- `gen_ai.guardian.provider.name` (`azure.ai.content_safety`, `aws.bedrock`, `gcp.model_armor`)

Security decision:

- `gen_ai.security.decision.type`: `allow`, `deny`, `modify`, `warn`, `audit`
- `gen_ai.security.decision.reason`
- `gen_ai.security.decision.code`

Security target:

- `gen_ai.security.target.type`: `llm_input`, `llm_output`, `tool_call`, `tool_definition`, `memory_store`, `memory_retrieve`, `knowledge_query`, `knowledge_result`, `message`
- `gen_ai.security.target.id`

Policy:

- `gen_ai.security.policy.id`
- `gen_ai.security.policy.name`
- `gen_ai.security.policy.version`

Content inspection (opt-in, PII-sensitive):

- `gen_ai.security.content.input.value`
- `gen_ai.security.content.output.value`
- `gen_ai.security.content.input.hash`
- `gen_ai.security.content.modified`

External correlation:
- `gen_ai.security.external_event_id`

New Event
`gen_ai.security.finding`

Event attributes (risk assessment):
- `gen_ai.security.risk.category` (e.g., `prompt_injection`, `pii`, `toxicity`)
- `gen_ai.security.risk.severity`: `none`, `low`, `medium`, `high`, `critical`
- `gen_ai.security.risk.score`
- `gen_ai.security.risk.metadata`
- `gen_ai.security.policy.id`, `gen_ai.security.policy.name`, `gen_ai.security.policy.version`
Terminology Note
The semantic conventions use guardian (`gen_ai.guardian.*`) to describe the evaluating service/component, and guardrail for the operation name (`apply_guardrail`) and policy configuration. This distinction was deliberate: `apply_guardrail` aligns with industry terminology (AWS Bedrock Guardrails, NVIDIA NeMo Guardrails).

Example Traces
Single guardrail — input filtering
Chained guardrails — input + output pipeline
Tool call guardrail
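The chained-guardrails example above can be sketched as a trace shape. This is a non-normative illustration: the nested-dict layout, the sample content, and the specific values are assumptions, while the span/event/attribute names come from this proposal. Content is recorded as a SHA-256 hash rather than raw text, matching the opt-in content-inspection attributes:

```python
import hashlib

def content_hash(text: str) -> str:
    # Hash instead of raw capture: correlation without storing PII.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Two apply_guardrail child spans under one chat span; the second
# denies the output and emits a gen_ai.security.finding event.
trace = {
    "name": "chat gpt-4",
    "children": [
        {
            "name": "apply_guardrail PII Filter llm_input",
            "attributes": {
                "gen_ai.operation.name": "apply_guardrail",
                "gen_ai.security.target.type": "llm_input",
                "gen_ai.security.decision.type": "allow",
            },
            "events": [],
        },
        {
            "name": "apply_guardrail Toxicity Filter llm_output",
            "attributes": {
                "gen_ai.operation.name": "apply_guardrail",
                "gen_ai.security.target.type": "llm_output",
                "gen_ai.security.decision.type": "deny",
                "gen_ai.security.content.input.hash": content_hash("model output"),
            },
            "events": [
                {
                    "name": "gen_ai.security.finding",
                    "attributes": {
                        "gen_ai.security.risk.category": "toxicity",
                        "gen_ai.security.risk.severity": "high",
                        "gen_ai.security.risk.score": 0.95,
                    },
                },
            ],
        },
    ],
}
```

Real instrumentation would create these as OpenTelemetry spans and span events; the dict form just shows the parent/child and attribute layout.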