4 changes: 4 additions & 0 deletions .chloggen/genai-judgment-boundary.yaml
@@ -0,0 +1,4 @@
change_type: enhancement
component: gen_ai
note: Add `gen_ai.evaluation.outcome` and `gen_ai.evaluation.multiple_outcomes` attributes to capture minimal evaluation outcome metadata.
issues: [3336]
1 change: 1 addition & 0 deletions .github/workflows/check-changes-ownership.yml
@@ -31,6 +31,7 @@ jobs:
dir_names: "true"
files: model/**
separator: ","
base_sha: ${{ github.event.pull_request.base.sha }}

validate-area-ownership:
runs-on: ubuntu-latest
23 changes: 23 additions & 0 deletions docs/gen-ai/gen-ai-events.md
@@ -10,6 +10,8 @@ linkTitle: Events

- [Event: `gen_ai.client.inference.operation.details`](#event-gen_aiclientinferenceoperationdetails)
- [Event: `gen_ai.evaluation.result`](#event-gen_aievaluationresult)
- [Evaluation Outcome Attributes](#evaluation-outcome-attributes)
- [Example](#example)

<!-- tocstop -->

@@ -254,6 +256,8 @@ This event captures the result of evaluating GenAI output for quality, accuracy,
| [`gen_ai.evaluation.score.label`](/docs/registry/attributes/gen-ai.md) | ![Development](https://img.shields.io/badge/-development-blue) | `Conditionally Required` if applicable | string | Human-readable label for evaluation. [2] | `relevant`; `not_relevant`; `correct`; `incorrect`; `pass`; `fail` |
| [`gen_ai.evaluation.score.value`](/docs/registry/attributes/gen-ai.md) | ![Development](https://img.shields.io/badge/-development-blue) | `Conditionally Required` if applicable | double | The evaluation score returned by the evaluator. | `4.0` |
| [`gen_ai.evaluation.explanation`](/docs/registry/attributes/gen-ai.md) | ![Development](https://img.shields.io/badge/-development-blue) | `Recommended` | string | A free-form explanation for the assigned score provided by the evaluator. | `The response is factually accurate but lacks sufficient detail to fully address the question.` |
| [`gen_ai.evaluation.multiple_outcomes`](/docs/registry/attributes/gen-ai.md) | ![Development](https://img.shields.io/badge/-development-blue) | `Recommended` | boolean | Indicates whether the evaluation process assessed multiple outcome categories or labels. | `true` |
| [`gen_ai.evaluation.outcome`](/docs/registry/attributes/gen-ai.md) | ![Development](https://img.shields.io/badge/-development-blue) | `Recommended` | string | The evaluation outcome label assigned by the evaluator. | `pass`; `fail`; `allow`; `block` |
| [`gen_ai.response.id`](/docs/registry/attributes/gen-ai.md) | ![Development](https://img.shields.io/badge/-development-blue) | `Recommended` when available | string | The unique identifier for the completion. [3] | `chatcmpl-123` |

**[1] `error.type`:** The `error.type` SHOULD match the error code returned by the Generative AI Evaluation provider or the client library,
@@ -278,4 +282,23 @@ event with the corresponding operation when span id is not available.
<!-- END AUTOGENERATED TEXT -->
<!-- endsemconv -->

### Evaluation Outcome Attributes

The evaluation outcome attributes provide metadata for evaluations in which the implementation may have assessed multiple outcome categories or labels.

#### Example

```json
{
"name": "gen_ai.evaluation.result",
"attributes": {
"gen_ai.evaluation.name": "ContentSafety",
"gen_ai.evaluation.score.value": 0.85,
"gen_ai.evaluation.score.label": "pass",
"gen_ai.evaluation.multiple_outcomes": true,
"gen_ai.evaluation.outcome": "pass"
}
}
```
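The JSON example above could be produced by instrumentation along these lines — a minimal sketch, where the `build_evaluation_attributes` helper and its pass/fail aggregation policy are hypothetical illustrations, not part of the conventions:

```python
def build_evaluation_attributes(name, score, label, category_outcomes):
    """Build attributes for a gen_ai.evaluation.result event.

    Hypothetical helper: assumes an evaluator that scores several outcome
    categories and reports a single overall outcome. The aggregation policy
    (overall "pass" only if every category passed) is illustrative only.
    """
    overall = "pass" if all(o == "pass" for o in category_outcomes) else "fail"
    return {
        "gen_ai.evaluation.name": name,
        "gen_ai.evaluation.score.value": score,
        "gen_ai.evaluation.score.label": label,
        # True when more than one outcome category was assessed.
        "gen_ai.evaluation.multiple_outcomes": len(category_outcomes) > 1,
        # The single label reported for the evaluation as a whole.
        "gen_ai.evaluation.outcome": overall,
    }


attrs = build_evaluation_attributes("ContentSafety", 0.85, "pass",
                                    ["pass", "pass", "pass"])
print(attrs["gen_ai.evaluation.multiple_outcomes"],
      attrs["gen_ai.evaluation.outcome"])  # → True pass
```

The dictionary would then be attached to a `gen_ai.evaluation.result` event through whatever event-emission API the instrumentation uses.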

[DocumentStatus]: https://opentelemetry.io/docs/specs/otel/document-status
2 changes: 2 additions & 0 deletions docs/registry/attributes/gen-ai.md
@@ -23,7 +23,9 @@ This document defines the attributes used to describe telemetry in the context o
| <a id="gen-ai-data-source-id" href="#gen-ai-data-source-id">`gen_ai.data_source.id`</a> | ![Development](https://img.shields.io/badge/-development-blue) | string | The data source identifier. [1] | `H7STPQYOND` |
| <a id="gen-ai-embeddings-dimension-count" href="#gen-ai-embeddings-dimension-count">`gen_ai.embeddings.dimension.count`</a> | ![Development](https://img.shields.io/badge/-development-blue) | int | The number of dimensions the resulting output embeddings should have. | `512`; `1024` |
| <a id="gen-ai-evaluation-explanation" href="#gen-ai-evaluation-explanation">`gen_ai.evaluation.explanation`</a> | ![Development](https://img.shields.io/badge/-development-blue) | string | A free-form explanation for the assigned score provided by the evaluator. | `The response is factually accurate but lacks sufficient detail to fully address the question.` |
| <a id="gen-ai-evaluation-multiple-outcomes" href="#gen-ai-evaluation-multiple-outcomes">`gen_ai.evaluation.multiple_outcomes`</a> | ![Development](https://img.shields.io/badge/-development-blue) | boolean | Indicates whether the evaluation process assessed multiple outcome categories or labels. | `true` |
| <a id="gen-ai-evaluation-name" href="#gen-ai-evaluation-name">`gen_ai.evaluation.name`</a> | ![Development](https://img.shields.io/badge/-development-blue) | string | The name of the evaluation metric used for the GenAI response. | `Relevance`; `IntentResolution` |
| <a id="gen-ai-evaluation-outcome" href="#gen-ai-evaluation-outcome">`gen_ai.evaluation.outcome`</a> | ![Development](https://img.shields.io/badge/-development-blue) | string | The evaluation outcome label assigned by the evaluator. | `pass`; `fail`; `allow`; `block` |
| <a id="gen-ai-evaluation-score-label" href="#gen-ai-evaluation-score-label">`gen_ai.evaluation.score.label`</a> | ![Development](https://img.shields.io/badge/-development-blue) | string | Human-readable label for evaluation. [2] | `relevant`; `not_relevant`; `correct`; `incorrect`; `pass`; `fail` |
| <a id="gen-ai-evaluation-score-value" href="#gen-ai-evaluation-score-value">`gen_ai.evaluation.score.value`</a> | ![Development](https://img.shields.io/badge/-development-blue) | double | The evaluation score returned by the evaluator. | `4.0` |
| <a id="gen-ai-input-messages" href="#gen-ai-input-messages">`gen_ai.input.messages`</a> | ![Development](https://img.shields.io/badge/-development-blue) | any | The chat history provided to the model as an input. [3] | [<br>&nbsp;&nbsp;{<br>&nbsp;&nbsp;&nbsp;&nbsp;"role": "user",<br>&nbsp;&nbsp;&nbsp;&nbsp;"parts": [<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"type": "text",<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"content": "Weather in Paris?"<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}<br>&nbsp;&nbsp;&nbsp;&nbsp;]<br>&nbsp;&nbsp;},<br>&nbsp;&nbsp;{<br>&nbsp;&nbsp;&nbsp;&nbsp;"role": "assistant",<br>&nbsp;&nbsp;&nbsp;&nbsp;"parts": [<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"type": "tool_call",<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"id": "call_VSPygqKTWdrhaFErNvMV18Yl",<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"name": "get_weather",<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"arguments": {<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"location": "Paris"<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}<br>&nbsp;&nbsp;&nbsp;&nbsp;]<br>&nbsp;&nbsp;},<br>&nbsp;&nbsp;{<br>&nbsp;&nbsp;&nbsp;&nbsp;"role": "tool",<br>&nbsp;&nbsp;&nbsp;&nbsp;"parts": [<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"type": "tool_call_response",<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"id": "call_VSPygqKTWdrhaFErNvMV18Yl",<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"result": "rainy, 57°F"<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}<br>&nbsp;&nbsp;&nbsp;&nbsp;]<br>&nbsp;&nbsp;}<br>] |
4 changes: 4 additions & 0 deletions model/gen-ai/events.yaml
@@ -44,6 +44,10 @@ groups:
The `error.type` SHOULD match the error code returned by the Generative AI Evaluation provider or the client library,
the canonical name of exception that occurred, or another low-cardinality error identifier.
Instrumentations SHOULD document the list of errors they report.
- ref: gen_ai.evaluation.multiple_outcomes
requirement_level: recommended
- ref: gen_ai.evaluation.outcome
requirement_level: recommended

- id: event.gen_ai.client.operation.exception
name: gen_ai.client.operation.exception
10 changes: 10 additions & 0 deletions model/gen-ai/registry.yaml
@@ -659,6 +659,16 @@ groups:
type: string
brief: A free-form explanation for the assigned score provided by the evaluator.
examples: ["The response is factually accurate but lacks sufficient detail to fully address the question."]
- id: gen_ai.evaluation.multiple_outcomes
stability: development
type: boolean
brief: Indicates whether the evaluation process assessed multiple outcome categories or labels.
examples: [true]
- id: gen_ai.evaluation.outcome
stability: development
type: string
brief: The evaluation outcome label assigned by the evaluator.
examples: ["pass", "fail", "allow", "block"]
- id: gen_ai.prompt.name
type: string
brief: The name of the prompt that uniquely identifies it.