gen-ai: add security guardian (apply_guardrail) span + finding event #3233

Draft
nagkumar91 wants to merge 36 commits into open-telemetry:main from nagkumar91:gen-ai-security-guardian
Conversation

@nagkumar91 (Contributor) commented on Dec 22, 2025:

Summary

Add semantic conventions for GenAI security guardrail operations — spans, attributes, events, and metrics for security evaluation of GenAI inputs/outputs.

New Span

| Span (`gen_ai.operation.name`) | Kind | Description |
|---|---|---|
| `apply_guardrail` | internal | Security guardrail evaluation of content or actions |

Span name: `apply_guardrail {gen_ai.guardian.name} {gen_ai.security.target.type}` (e.g., `apply_guardrail "PII Filter" llm_input`). When `gen_ai.guardian.name` is unavailable, the fallback is `apply_guardrail {gen_ai.security.target.type}`.

Guardrail spans are children of the operation they protect (e.g., chat, execute_tool). Multiple guardrail spans MAY exist under a single parent if guardrails are chained.
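
To make the span structure concrete, here is a minimal sketch using the plain OpenTelemetry Python API. The attribute names are the ones proposed above; `evaluate_pii` and `call_model` are hypothetical placeholders for a guardian client and an LLM call, not part of the proposal.

```python
from opentelemetry import trace

tracer = trace.get_tracer("example.guardrails")


def guarded_chat(prompt: str) -> str:
    # Parent inference span, following the existing gen_ai chat conventions.
    with tracer.start_as_current_span('chat "gpt-4"') as chat_span:
        chat_span.set_attribute("gen_ai.operation.name", "chat")

        # Guardrail evaluation as a child INTERNAL span, named
        # "apply_guardrail {gen_ai.guardian.name} {gen_ai.security.target.type}".
        with tracer.start_as_current_span(
            'apply_guardrail "PII Filter" llm_input'
        ) as guard_span:
            guard_span.set_attribute("gen_ai.operation.name", "apply_guardrail")
            guard_span.set_attribute("gen_ai.guardian.id", "pii-filter-v3")
            guard_span.set_attribute("gen_ai.guardian.name", "PII Filter")
            guard_span.set_attribute("gen_ai.security.target.type", "llm_input")

            verdict = evaluate_pii(prompt)  # hypothetical guardian client call
            guard_span.set_attribute("gen_ai.security.decision.type", verdict.decision)
            if verdict.decision != "allow":
                guard_span.set_attribute(
                    "gen_ai.security.decision.reason", verdict.reason
                )

        return call_model(prompt)  # hypothetical LLM call
```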

New Attributes

Guardian (evaluating service):

| Attribute | Type | Requirement Level | Description |
|---|---|---|---|
| `gen_ai.guardian.id` | string | conditionally_required | Machine-readable identifier of the guardrail service |
| `gen_ai.guardian.name` | string | recommended | Human-readable name of the guardrail service |
| `gen_ai.guardian.version` | string | recommended | Version of the guardrail service |
| `gen_ai.guardian.provider.name` | enum | recommended | Provider/vendor of the guardrail service. Open enum with well-known values: `azure.ai.content_safety`, `aws.bedrock`, `gcp.model_armor` |

Security decision:

| Attribute | Type | Requirement Level | Description |
|---|---|---|---|
| `gen_ai.security.decision.type` | enum | required | Decision made: `allow`, `deny`, `modify`, `warn`, `audit` |
| `gen_ai.security.decision.reason` | string | conditionally_required | Human-readable reason (when not `allow`) |
| `gen_ai.security.decision.code` | int | recommended | Numeric code for the security decision |

Security target:

| Attribute | Type | Requirement Level | Description |
|---|---|---|---|
| `gen_ai.security.target.type` | string | required | Free-form string identifying what was evaluated. Suggested values: `llm_input`, `llm_output`, `tool_call`, `tool_definition`, `memory_store`, `memory_retrieve`, `knowledge_query`, `knowledge_result`, `message` |
| `gen_ai.security.target.id` | string | conditionally_required | Identifier of the specific target |

Policy:

| Attribute | Type | Requirement Level | Description |
|---|---|---|---|
| `gen_ai.security.policy.id` | string | conditionally_required | Policy that triggered the decision |
| `gen_ai.security.policy.name` | string | recommended | Human-readable policy name |
| `gen_ai.security.policy.version` | string | recommended | Policy version |

Content inspection (opt-in, PII-sensitive):

| Attribute | Type | Requirement Level | Description |
|---|---|---|---|
| `gen_ai.security.content.input.value` | any | opt_in | Input content before guardrail processing |
| `gen_ai.security.content.output.value` | any | opt_in | Output content after guardrail processing |
| `gen_ai.security.content.input.hash` | string | conditionally_required | Hash of input for forensic correlation |
| `gen_ai.security.content.modified` | boolean | conditionally_required | Whether content was modified by the guardrail |

External correlation:

| Attribute | Type | Requirement Level | Description |
|---|---|---|---|
| `gen_ai.security.external_event_id` | string | conditionally_required | External correlation identifier for security events |

New Event

| Event | Description |
|---|---|
| `gen_ai.security.finding` | Emitted when a guardrail produces a finding (deny, modify, warn) |

Event attributes (risk assessment):

| Attribute | Type | Requirement Level | Description |
|---|---|---|---|
| `gen_ai.security.risk.category` | string | required | Category of security risk (e.g., `prompt_injection`, `pii`, `toxicity`) |
| `gen_ai.security.risk.severity` | enum | required | Severity: `none`, `low`, `medium`, `high`, `critical` |
| `gen_ai.security.risk.score` | double | recommended | Numeric risk/confidence score (0.0–1.0) |
| `gen_ai.security.risk.metadata` | string[] | recommended | Non-content metadata about the risk |
| `gen_ai.security.policy.id` | string | conditionally_required | Policy that triggered the finding |
| `gen_ai.security.policy.name` | string | recommended | Human-readable policy name |
| `gen_ai.security.policy.version` | string | recommended | Policy version |
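
For illustration, the finding could be attached to the apply_guardrail span roughly as below; this is a sketch, assuming `guard_span` is the active guardrail span and the attribute values come from a hypothetical guardian result rather than any specific provider API.

```python
guard_span.add_event(
    "gen_ai.security.finding",
    attributes={
        "gen_ai.security.risk.category": "prompt_injection",
        "gen_ai.security.risk.severity": "high",
        "gen_ai.security.risk.score": 0.92,
        "gen_ai.security.policy.id": "default-policy",
    },
)
```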

Supporting Materials

| Resource | Description |
|---|---|
| 📄 Non-Normative Implementation Spec | Detailed spec with framework mappings and trace examples |
| 🔧 Prototype Framework Code | OTel bootstrap, guardian utils, and LangChain adapter |
| 📖 Story Scripts & Trace Coverage | 5 end-to-end scenarios with trace coverage matrix |

Terminology Note

The semantic conventions use guardian (gen_ai.guardian.*) to describe the evaluating service/component and guardrail for the operation name (apply_guardrail) and policy configuration. This distinction was deliberate:

  • A single guardian (e.g., Azure Content Safety) can enforce multiple policies/guardrails
  • The operation is called apply_guardrail to align with industry terminology (AWS Bedrock Guardrails, NVIDIA NeMo Guardrails)

Example Traces

Single guardrail — input filtering

chat "gpt-4"                                                         (1200ms)
├── apply_guardrail "Azure Content Safety" llm_input                 (45ms)
│   ├── gen_ai.operation.name: "apply_guardrail"
│   ├── gen_ai.guardian.id: "content-filter-v2"
│   ├── gen_ai.guardian.name: "Azure Content Safety"
│   ├── gen_ai.guardian.provider.name: "azure.ai.content_safety"
│   ├── gen_ai.security.decision.type: "allow"
│   ├── gen_ai.security.target.type: "llm_input"
│   └── gen_ai.security.policy.id: "default-policy"
│
└── (LLM inference)                                                  (1100ms)

Chained guardrails — input + output pipeline

chat "gpt-4"                                                         (1500ms)
│
├── apply_guardrail "PII Filter" llm_input                           (30ms)
│   ├── gen_ai.guardian.id: "pii-filter-v3"
│   ├── gen_ai.guardian.name: "Custom PII Filter"
│   ├── gen_ai.guardian.provider.name: "custom"
│   ├── gen_ai.security.decision.type: "modify"
│   ├── gen_ai.security.content.modified: true
│   └── gen_ai.security.target.type: "llm_input"
│
├── (LLM inference)                                                  (1100ms)
│
├── apply_guardrail "Toxicity Filter" llm_output                     (40ms)
│   ├── gen_ai.guardian.id: "toxicity-v2"
│   ├── gen_ai.guardian.name: "Azure Content Safety"
│   ├── gen_ai.guardian.provider.name: "azure.ai.content_safety"
│   ├── gen_ai.security.decision.type: "deny"
│   ├── gen_ai.security.decision.reason: "toxicity_detected"
│   └── gen_ai.security.target.type: "llm_output"
│
└── apply_guardrail "Prompt Shield" llm_input                        (25ms)
    ├── gen_ai.guardian.id: "prompt-shield-v1"
    ├── gen_ai.guardian.name: "Prompt Shield"
    ├── gen_ai.guardian.provider.name: "azure.ai.content_safety"
    ├── gen_ai.security.decision.type: "allow"
    └── gen_ai.security.target.type: "llm_input"

Tool call guardrail

invoke_agent "ResearchBot"                                           (3000ms)
├── chat "gpt-4"                                                     (800ms)
│
├── apply_guardrail "Tool Policy" tool_call                          (15ms)
│   ├── gen_ai.guardian.id: "tool-policy-v1"
│   ├── gen_ai.security.decision.type: "deny"
│   ├── gen_ai.security.decision.reason: "unauthorized_tool"
│   ├── gen_ai.security.target.type: "tool_call"
│   └── gen_ai.security.target.id: "delete_database"
│
└── execute_tool "web_search"                                        (500ms)

@github-actions bot added the enhancement (New feature or request) and area:gen-ai labels on Dec 23, 2025
@adityamehra commented:

@nagkumar91 We have a similar use-case: when a security incident happens for a chat span, we create a new span (also as a chat span) and add an attribute called gen_ai.security.event_id. The value for this attribute can come either from the response body of the inspection call (which is separate from the actual LLM call) or from the response header when inspection happens along with the LLM call. Is it possible to add support for this attribute here? Thanks!

Here's the sample of how we add it as of now - https://github.com/signalfx/splunk-otel-python-contrib/tree/main/instrumentation-genai/opentelemetry-instrumentation-aidefense#trace-integration

@adityamehra commented:

Also, it would be great to have another entry in otel-genai-util for this new span type, like we have for the chat span - https://github.com/open-telemetry/opentelemetry-python-contrib/blob/main/util/opentelemetry-util-genai/src/opentelemetry/util/genai/types.py#L96. Or this new type could probably extend the LLMInvocation type.
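
Purely as an illustration of that suggestion, a standalone dataclass along these lines could carry the guardrail fields; this is a hypothetical sketch and does not reproduce or extend the actual opentelemetry-util-genai LLMInvocation type.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class GuardrailInvocation:
    """Hypothetical invocation type for an apply_guardrail operation."""

    guardian_id: str
    target_type: str                       # e.g. "llm_input", "llm_output"
    decision_type: str = "allow"           # allow | deny | modify | warn | audit
    decision_reason: Optional[str] = None
    policy_id: Optional[str] = None
    attributes: Dict[str, str] = field(default_factory=dict)
```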

Comment thread model/gen-ai/spans.yaml
`span.gen_ai.inference.client` or `span.gen_ai.execute_tool.internal`).

Multiple guardian spans MAY exist under a single operation span if multiple guardians are chained.
attributes:
@adityamehra commented Jan 8, 2026:

Can we add an attribute called gen_ai.security.event_id?

@nagkumar91 (Contributor, Author) replied:

The event being proposed would be a gen_ai.security.finding

Apply guardrail span will have these for IDs:

  • gen_ai.guardian.id
  • gen_ai.security.target.id
  • gen_ai.security.policy.id

Would any of those fit your need?

@adityamehra replied:

> Would this be better as an event (security finding event as proposed in this span)? Wondering why it's a chat span?

Currently, the domain team is using event_id and it's as per their requirement. Also, in our case the event is generated elsewhere and not instrumentation side. We used chat span to use existing types in the genai-utils for now and manage span life cycle using it.

@nagkumar91 (Contributor, Author) replied:

> @nagkumar91 We have a similar use-case: when a security incident happens for a chat span, we create a new span (also as a chat span) and add an attribute called gen_ai.security.event_id. The value for this attribute can come either from the response body of the inspection call (which is separate from the actual LLM call) or from the response header when inspection happens along with the LLM call. Is it possible to add support for this attribute here? Thanks!
>
> Here's the sample of how we add it as of now - https://github.com/signalfx/splunk-otel-python-contrib/tree/main/instrumentation-genai/opentelemetry-instrumentation-aidefense#trace-integration

Would this be better as an event (security finding event as proposed in this span)? Wondering why its a chat span?

Comment thread docs/registry/attributes/gen-ai.md Outdated
| <a id="gen-ai-operation-name" href="#gen-ai-operation-name">`gen_ai.operation.name`</a> | ![Development](https://img.shields.io/badge/-development-blue) | string | The name of the operation being performed. [4] | `chat`; `generate_content`; `text_completion` |
| <a id="gen-ai-output-messages" href="#gen-ai-output-messages">`gen_ai.output.messages`</a> | ![Development](https://img.shields.io/badge/-development-blue) | any | Messages returned by the model where each message represents a specific model response (choice, candidate). [5] | [<br>&nbsp;&nbsp;{<br>&nbsp;&nbsp;&nbsp;&nbsp;"role": "assistant",<br>&nbsp;&nbsp;&nbsp;&nbsp;"parts": [<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"type": "text",<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"content": "The weather in Paris is currently rainy with a temperature of 57°F."<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}<br>&nbsp;&nbsp;&nbsp;&nbsp;],<br>&nbsp;&nbsp;&nbsp;&nbsp;"finish_reason": "stop"<br>&nbsp;&nbsp;}<br>] |
| <a id="gen-ai-output-type" href="#gen-ai-output-type">`gen_ai.output.type`</a> | ![Development](https://img.shields.io/badge/-development-blue) | string | Represents the content type requested by the client. [6] | `text`; `json`; `image` |
| <a id="gen-ai-guardian-id" href="#gen-ai-guardian-id">`gen_ai.guardian.id`</a> | ![Development](https://img.shields.io/badge/-development-blue) | string | The unique identifier of the security guardian or guardrail. [3] | `guard_abc123`; `sgi5gkybzqak`; `content-filter-v2` |
@habibam commented:

One nit: the draft currently mixes guardian (evaluator/service) vs guardrail (policy/config) identity (e.g., mapping provider guardrail IDs into gen_ai.guardian.id). I suggest keeping gen_ai.guardian.* for the evaluating component and mapping guardrail/config identifiers (like aws.bedrock.guardrail.id) to gen_ai.security.policy.id (and/or onto gen_ai.security.finding when policy-triggered). Otherwise it becomes hard or impossible to differentiate a guardian from a guardrail in traces, and we end up with confusing cardinality/semantics. Keeping them distinct keeps cross-provider correlation clean.

@nagkumar91 (Contributor, Author) replied Jan 26, 2026:

Thanks @habibam! I've addressed this in the commit (88c269b):

  • Updated gen_ai.guardian.* attribute documentation to clarify these identify the evaluating service/component, not the policy
  • Added explicit mapping guidance:
    • Guardian = service doing evaluation (e.g., "Azure Content Safety", "Bedrock Guardrails")
    • Policy = configuration being applied (use gen_ai.security.policy.id for ARNs, blocklist IDs, etc.)
  • Updated examples to be clearer

This should make cross-provider correlation cleaner by keeping guardian and guardrail/policy semantics distinct. Let me know if the updated documentation captures your concern!
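
As a hedged illustration of that split, the same span could carry both sets of attributes; the ARN below is a placeholder, not a real resource.

```python
# Guardian = the evaluating service.
guard_span.set_attribute("gen_ai.guardian.name", "Bedrock Guardrails")
guard_span.set_attribute("gen_ai.guardian.provider.name", "aws.bedrock")

# Policy = the configuration being applied (ARN, blocklist ID, etc.).
guard_span.set_attribute(
    "gen_ai.security.policy.id",
    "arn:aws:bedrock:us-east-1:123456789012:guardrail/example-guardrail-id",
)
guard_span.set_attribute("gen_ai.security.policy.version", "1")
```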

# MCP Guardian Adapter
# ============================================================================

class MCPGuardianAdapter(BaseGuardianAdapter[MCPContext]):

We should add in an example of generating a span from the response of elicitation

@nagkumar91 (Contributor, Author) replied:

Added in commit 88c269b!

New elicitation guard methods:

  • guard_elicitation_request() - Guards outbound requests to user (prevents info leakage, excessive elicitation)
  • guard_elicitation_response() - Guards user's response (detects PII, injection attempts)

Both map to target_type=message since they're user-facing interactions.

See the updated module docstring and main examples (tests 6-9) for usage patterns.
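
A rough usage sketch of those methods follows; the adapter constructor and the verdict objects are assumptions here, since the actual code lives in the prototype gist.

```python
adapter = MCPGuardianAdapter(guardian_client=my_guardian)  # assumed constructor

# Outbound elicitation: check the question before it is shown to the user.
request_verdict = adapter.guard_elicitation_request(
    "Please provide your account email to continue."
)

# Inbound elicitation: check the user's reply before the agent consumes it.
response_verdict = adapter.guard_elicitation_response(user_reply)

# Both produce apply_guardrail spans with gen_ai.security.target.type = "message".
```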

…ools

- Remove redundant files from git (kept locally): genai_guardrail_instrumentation_prototype.py, demo_chat.py, demo_tools.py
- Reduce framework adapters from 6 to 2 (keep LangChain + MCP, others preserved locally)
- Move trace viewer utilities to tools/ directory
- Consolidate README from 750+ lines to ~165 lines
- Update .gitignore to ignore archived files
…CP elicitation

- Clarify guardian vs guardrail semantics in registry.yaml:
  - gen_ai.guardian.* is for the evaluating service/component
  - gen_ai.security.policy.* is for configuration/policy identifiers
  - Added mapping guidance for AWS Bedrock, Azure Content Safety
- Add gen_ai.security.external_event_id attribute for SIEM correlation
- Add MCP elicitation guard methods:
  - guard_elicitation_request: guard outbound requests to user
  - guard_elicitation_response: guard user's input responses
- Regenerate markdown files from updated YAML

Addresses feedback from @habibam and @adityamehra
@nagkumar91 requested a review from a team as a code owner on February 6, 2026 18:13
@nagkumar91 force-pushed the gen-ai-security-guardian branch from 81f1c73 to 1cb3e68 on February 6, 2026 18:24
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The markdownlint CI gate requires doc files and folders to use hyphens,
not underscores. Renamed security_implementation_gen_ai_spec.md to
security-implementation-gen-ai-spec.md and updated the reference link.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@alexgarden-mnemom left a comment:

Great work on this, Nagkumar. We've been shipping OTel exporters for AI agent safety checks (@mnemom/aip-otel-exporter on npm, aip-otel-exporter on PyPI) with production traces in Grafana Cloud Tempo — 50+ span attributes across six span types.

A few suggestions based on what we've learned from production:

1. Dedicated span type: Making safety checks a dedicated safety_check span operation (child of invoke_agent or chat) gives you independent observability — you can measure safety check latency, build verdict distribution dashboards, and alert without parsing parent spans.

2. Three-value verdict: A boolean triggered misses the middle ground. In production, we've found every major provider has intermediate states — Azure severity 2-5, Google LOW/MEDIUM/HIGH, OpenAI varying confidence. A pass/review/fail enum maps cleanly to all of them.

3. Structured concern events: When a guardrail fires, operators need structured detail. We emit gen_ai.safety.concern events (one per finding) with category, severity, and description. Gives you queryable data vs a boolean.

4. Policy model: Beyond guardrail.id — a full policy model (id/name/version/mode) covers enterprise deployments where teams roll out governance gradually. policy.mode (monitor/warn/enforce/off) is critical for staged rollouts.

5. Continuous score: A safety.score (0.0-1.0) alongside the verdict gives you a continuous signal for trending and thresholding. Every provider has a numeric score underneath their categorical output.

6. Metrics: A duration histogram and verdict counter would let teams build standard dashboards for safety check performance without custom queries.

We have a full provider mapping showing how these conventions apply across the ecosystem — AWS Bedrock Guardrails, Azure Content Safety, OpenAI Moderation, Google Safety Ratings, and our own AIP/AAP protocols. The proposed attributes map cleanly to all of them.

I think there's a path to combining your guardrail-specific work with a broader safety_check framework that covers guardrails, content filtering, integrity analysis, alignment verification, and policy evaluation as check types under a unified convention.
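
For illustration, the metrics suggested in point 6 could be created with the OpenTelemetry Python metrics API roughly as follows; the instrument names are illustrative only and not part of the proposal.

```python
from opentelemetry import metrics

meter = metrics.get_meter("example.guardrails")

# Duration histogram for guardrail evaluations (illustrative name).
guardrail_duration = meter.create_histogram(
    name="gen_ai.guardrail.duration",
    unit="s",
    description="Duration of guardrail evaluations",
)

# Verdict counter, dimensioned by decision type (illustrative name).
guardrail_verdicts = meter.create_counter(
    name="gen_ai.guardrail.verdicts",
    description="Count of guardrail verdicts by decision type",
)

guardrail_duration.record(0.045, attributes={"gen_ai.guardian.name": "PII Filter"})
guardrail_verdicts.add(1, attributes={"gen_ai.security.decision.type": "allow"})
```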

@aabmass (Member) left a comment:

I got some feedback from the GCP Model Armor team, which I'm sharing here

Comment thread model/gen-ai/spans.yaml
Comment on lines +433 to +437
- id: span.gen_ai.apply_guardrail.internal
type: span
stability: development
span_kind: internal
brief: Describes a security guardian evaluation.
@aabmass (Member) commented:

What do you think of aligning on guardrail throughout this proposal? The GCP Model Armor team found "guardian" a bit esoteric and I agree. At least we should be consistent throughout.

Lmk if this has already been discussed

@nagkumar91 (Contributor, Author) replied:

Good point — we're discussing this internally with the team. "Guardrail" is clearly the more established industry term (AWS Bedrock Guardrails, NVIDIA NeMo Guardrails, Guardrails AI). The operation name apply_guardrail already uses it, so renaming gen_ai.guardian.* to gen_ai.guardrail.* would make everything consistent.

Will follow up once we have internal alignment — likely agreeing with your suggestion.


Agree that guardian is interesting :) We're moving towards "controls" as the canonical name, instead of the "guardrails"/"content filters" terms we used back in the day.

nagkumar91 and others added 2 commits March 17, 2026 09:59
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Aligns the boolean attribute name with the gen_ai.security.decision.type
enum value 'modify'. The broader term 'modified' correctly covers
redaction, sanitization, and rewriting.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@nagkumar91 (Contributor, Author) commented:

Supporting Materials

The non-normative implementation spec, prototype code, and story scripts have been moved to gists to keep this PR focused on the YAML model + generated docs.

| Gist | Description |
|---|---|
| 📄 Non-Normative Implementation Spec | Detailed spec with framework mappings, attribute definitions, and trace examples |
| 🔧 Prototype Framework Code | OTel bootstrap, guardian utils, base adapter, and LangChain guardian adapter |
| 📖 Story Scripts & Trace Coverage | 5 end-to-end scenarios (enterprise RAG, multi-tenant, multi-agent, progressive jailbreak, error handling) with trace coverage matrix |

Previously these were in draft PR #3434, which can now be closed.

nagkumar91 and others added 2 commits March 17, 2026 11:19
The non-normative implementation spec is now hosted as a gist:
https://gist.github.com/nagkumar91/95efa05449c72b4c95958190f26f13ba

This keeps the PR focused on the YAML model + generated docs,
matching the pattern used by PR open-telemetry#3250 (memory operations).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
`gen_ai.operation.name` SHOULD be `apply_guardrail`.

**Span name** SHOULD be `apply_guardrail {gen_ai.guardian.name}`.
When `gen_ai.guardian.name` is not available, it SHOULD be `apply_guardrail {gen_ai.security.target.type}`.
A Member commented:

It's better to add: "Semantic conventions for individual guardians and frameworks MAY specify a different span name format."

When `gen_ai.guardian.name` is not available, it SHOULD be `apply_guardrail {gen_ai.security.target.type}`.

Guardian spans SHOULD be children of the operation span they are protecting (for example,
`span.gen_ai.inference.client` or `span.gen_ai.execute_tool.internal`).
A Member commented:

I believe it's better to leave a reference to the markdown sections here rather than the name of the registry.

Suggested change
`span.gen_ai.inference.client` or `span.gen_ai.execute_tool.internal`).
[inference](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/gen-ai/gen-ai-spans.md#inference) or [execute tool span](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/gen-ai/gen-ai-spans.md#execute-tool-span)).


Multiple guardian spans MAY exist under a single operation span if multiple guardians are chained.

**Span kind** SHOULD be `INTERNAL`.
A Member commented:

CLIENT is also a suitable span kind when the guardian runs in another process.

Comment thread docs/gen-ai/gen-ai-security.md

- id: gen_ai.security.target.type
stability: development
type: string
A Member commented:

Why wouldn't an enum with members work here?

1. Span name: apply_guardrail {guardian.name} {target.type}
   Makes chained guardrail spans unique (e.g., 'apply_guardrail PII Filter input'
   vs 'apply_guardrail PII Filter output'). Addresses feedback from @aabmass
   and @habibam.

2. guardian.provider.name: Changed from free-form string to open enum with
   well-known values (azure.ai.content_safety, aws.bedrock, gcp.model_armor).
   Keeps guardrail provider distinct from inference provider (gen_ai.provider.name).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@Nick-heo-eg commented:

This looks great; modeling guardrail enforcement as a separate concern makes a lot of sense.

One thing I noticed while reading through this: there seems to be a gap between evaluation and guardrail in terms of what actually gets captured in telemetry.

Right now:

  • evaluation handles measurement signals (score, label)
  • guardrail handles enforcement actions (allow, deny, modify)

But there's no clear way to represent the discrete result an evaluator produces when it isn't tied to any enforcement decision.

For example:

  • an evaluator classifies content as "safe/unsafe" or "pass/fail"
  • that result exists on its own, without triggering a guardrail
  • and there's no numeric score involved

In that situation, the output isn't a continuous score, and it isn't an enforcement action either; it's a categorical evaluation result that isn't directly captured today.

There may be room for a small, vendor-neutral way to capture this kind of discrete evaluation outcome, sitting between score and guardrail decision.

Comment on lines +426 to +446
members:
- id: allow
value: "allow"
brief: Request permitted to proceed without modification.
stability: development
- id: deny
value: "deny"
brief: Request denied and operation halted.
stability: development
- id: modify
value: "modify"
brief: Request permitted with modifications (redaction, sanitization).
stability: development
- id: warn
value: "warn"
brief: Request permitted but flagged for review.
stability: development
- id: audit
value: "audit"
brief: Request logged for audit purposes only, no enforcement.
stability: development

Guardrail services return a verdict describing what should happen, but they don't enforce it themselves. We should tighten the language here to make clear that these are verdicts, which are not the same as enforcements.

E.g. AACS can return a warn verdict, but the caller may choose to handle it as a block (a typical use case in production). In this case the verdict is warn, but the enforcement is block.

- id: gen_ai.security.decision.type
stability: development
type:
members:

Missing an "escalate" verdict for human-in-the-loop/approval workflows.

- id: gen_ai.guardian.provider.name
stability: development
type:
members:

Should this be a string instead? We should support cases where someone is running a custom guardrails implementation outside of the major providers.

Comment on lines +480 to +483
- `memory_store`: Data being written to memory.
- `memory_retrieve`: Data being read from memory.
- `knowledge_query`: Knowledge/RAG query being sent.
- `knowledge_result`: Knowledge/RAG results being returned.

Aren't these tool calls as well? tool_call should also be split into input and output.

stability: development
type:
members:
- id: none

"No risk" is a non-existent state. Suggest changing it to unspecified/unknown or something along those lines.

- 'jailbreak'
- 'custom:financial_advice_violation'
- 'azure:hate_speech'
- id: gen_ai.security.risk.severity

I would remove severity and keep score. Severity means different things for different applications/use cases, while a score is more independent/factual.

type: double
brief: Numeric risk/confidence score (0.0 to 1.0).
examples: [0.85, 0.95, 0.42]
- id: gen_ai.security.risk.metadata

What's the intended use for this?

- ['field:bcc', 'pattern:email']
- ['count:2', 'position:output.content']

- id: gen_ai.security.policy.id

In the case of a detailed policy under the same identifier, this policy id might be underspecified, since only a section/part of the larger policy triggered the verdict. I think it's useful to have this metadata; I'm just not sure how to make it "tight enough".


@lmolkova marked this pull request as draft on April 14, 2026 16:06
@github-actions bot commented:

This PR has been labeled as stale due to lack of activity. It will be automatically closed if there is no further activity over the next 7 days.

@github-actions bot added the Stale label on Apr 29, 2026