Commit 955fef2

update readme for metric override
1 parent 222f347 commit 955fef2

2 files changed

+34
-17
lines changed

README.md

Lines changed: 33 additions & 16 deletions
@@ -115,16 +115,18 @@ api:
115115
no_tools: null # Whether to bypass tools (optional)
116116
system_prompt: null # Custom system prompt (optional)
117117

118-
# Metrics Configuration with thresholds
118+
# Metrics Configuration with thresholds and defaults
119119
metrics_metadata:
120120
turn_level:
121-
"ragas:faithfulness":
122-
threshold: 0.8
123-
description: "How faithful the response is to the provided context"
124-
125121
"ragas:response_relevancy":
126122
threshold: 0.8
127123
description: "How relevant the response is to the question"
124+
default: true # Used by default when turn_metrics is null
125+
126+
"ragas:faithfulness":
127+
threshold: 0.8
128+
description: "How faithful the response is to the provided context"
129+
default: false # Only used when explicitly specified
128130

129131
"custom:tool_eval":
130132
description: "Tool call evaluation comparing expected vs actual tool calls (regex for arguments)"
@@ -160,16 +162,6 @@ visualization:
160162
- conversation_group_id: "test_conversation"
161163
description: "Sample evaluation"
162164
163-
# Turn-level metrics to evaluate
164-
turn_metrics:
165-
- "ragas:faithfulness"
166-
- "custom:answer_correctness"
167-
168-
# Metric-specific configuration
169-
turn_metrics_metadata:
170-
"ragas:faithfulness":
171-
threshold: 0.8
172-
173165
# Conversation-level metrics
174166
conversation_metrics:
175167
- "deepeval:conversation_completeness"
@@ -186,8 +178,23 @@ visualization:
186178
- OpenShift Virtualization is an extension of the OpenShift ...
187179
attachments: [] # Attachments (Optional)
188180
expected_response: OpenShift Virtualization is an extension of the OpenShift Container Platform that allows running virtual machines alongside containers
181+
182+
# Per-turn metrics (overrides system defaults)
183+
turn_metrics:
184+
- "ragas:faithfulness"
185+
- "custom:answer_correctness"
186+
187+
# Per-turn metric configuration
188+
turn_metrics_metadata:
189+
"ragas:faithfulness":
190+
threshold: 0.9 # Override system default
191+
# turn_metrics: null (omitted) → Use system defaults (metrics with default=true)
192+
193+
- turn_id: id2
194+
query: Skip this turn evaluation
195+
turn_metrics: [] # Skip evaluation for this turn
189196
190-
- turn_id: id2
197+
- turn_id: id3
191198
query: How do I create a virtual machine in OpenShift Virtualization?
192199
response: null # Populated by API if enabled, otherwise provide
193200
contexts:
@@ -223,11 +230,21 @@ visualization:
223230
| `expected_response` | string | 📋 | Expected response for comparison | ❌ |
224231
| `expected_tool_calls` | list[list[dict]] | 📋 | Expected tool call sequences | ❌ |
225232
| `tool_calls` | list[list[dict]] | ❌ | Actual tool calls from API | ✅ (if API enabled) |
233+
| `turn_metrics` | list[string] | ❌ | Turn-specific metrics to evaluate | ❌ |
234+
| `turn_metrics_metadata` | dict | ❌ | Turn-specific metric configuration | ❌ |
226235

227236
Note: Context will be collected automatically in the future.
228237

229238
> 📋 **Required based on metrics**: Some fields are required only when using specific metrics
230239

240+
#### Metrics override behavior
241+
242+
| Override Value | Behavior |
243+
|---------------------|----------|
244+
| `null` (or omitted) | Use system defaults (metrics with `default: true`) |
245+
| `[]` (empty list) | Skip evaluation for this turn |
246+
| `["metric1", ...]` | Use specified metrics only |
247+
231248
Examples
232249
> - `expected_response`: Required for `custom:answer_correctness`
233250
> - `expected_tool_calls`: Required for `custom:tool_eval`

config/evaluation_data.yaml

Lines changed: 1 addition & 1 deletion
@@ -60,4 +60,4 @@
6060
- turn_id: "2"
6161
query: "User Query 2"
6262
response: "API Response 2"
63-
# Use default metric from system config (Ex: response_relevancy)
63+
# turn_metrics: null (omitted) → Use system defaults (metrics with default=true)
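
The "Metrics override behavior" table added to README.md can be read as a small resolution rule. The following is a minimal sketch of that rule, not the project's actual implementation: the names `SYSTEM_TURN_METRICS`, `resolve_turn_metrics`, and `effective_metadata` are hypothetical, and the key-by-key merge of per-turn metadata over system metadata is an assumption (the diff only states that a per-turn `threshold` overrides the system default).

```python
from __future__ import annotations

# Hypothetical in-memory copy of the README's metrics_metadata.turn_level block.
SYSTEM_TURN_METRICS = {
    "ragas:response_relevancy": {"threshold": 0.8, "default": True},
    "ragas:faithfulness": {"threshold": 0.8, "default": False},
}


def resolve_turn_metrics(
    turn_metrics: list[str] | None,
    system_metadata: dict[str, dict],
) -> list[str]:
    """Pick the metrics to evaluate for one turn.

    None (null or omitted) -> system metrics marked default: true
    []                     -> no metrics, the turn is skipped
    ["metric1", ...]       -> exactly the listed metrics
    """
    if turn_metrics is None:
        return [
            name
            for name, meta in system_metadata.items()
            if meta.get("default", False)
        ]
    return list(turn_metrics)


def effective_metadata(
    metric: str,
    system_metadata: dict[str, dict],
    turn_metadata: dict[str, dict] | None = None,
) -> dict:
    """Overlay per-turn settings on the system settings for one metric.

    Assumption: keys are merged one by one, so a per-turn threshold
    replaces the system threshold while other system keys are kept.
    """
    merged = dict(system_metadata.get(metric, {}))
    merged.update((turn_metadata or {}).get(metric, {}))
    return merged
```

With the values from the diff, an omitted `turn_metrics` resolves to `ragas:response_relevancy` only, an empty list resolves to nothing, and a per-turn `threshold: 0.9` for `ragas:faithfulness` takes precedence over the system value of 0.8. Treating `None` and `[]` as distinct cases is the key point: a plain truthiness check would collapse "use defaults" and "skip this turn" into one branch.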
