
.NET: Add Foundry Evaluation samples (Safety + Quality) #3697

Merged

rogerbarreto merged 11 commits into main from copilot/support-foundry-observability on Feb 18, 2026
Conversation

Contributor

Copilot AI commented Feb 5, 2026

Motivation and Context

Adds .NET evaluation samples for Foundry Agents, achieving parity with existing Python evaluation samples. Part of #3675 / #3440.

Description

This PR adds two new evaluation samples demonstrating how to assess agent safety and quality using Microsoft.Extensions.AI.Evaluation packages with Azure AI Foundry.

New Samples

  • FoundryAgents_Evaluations_Step01_RedTeaming: Safety evaluation using ContentHarmEvaluator, ViolenceEvaluator, HateAndUnfairnessEvaluator, ProtectedMaterialEvaluator, IndirectAttackEvaluator
  • FoundryAgents_Evaluations_Step02_SelfReflection: Quality evaluation with a self-reflection loop using GroundednessEvaluator, RelevanceEvaluator, CoherenceEvaluator
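As a rough sketch of how the Step02 quality evaluators are typically wired up with Microsoft.Extensions.AI.Evaluation (the `chatClient`, `question`, and `answer` values are placeholders, and exact overloads may differ between package versions):

```csharp
using Microsoft.Extensions.AI;
using Microsoft.Extensions.AI.Evaluation;
using Microsoft.Extensions.AI.Evaluation.Quality;

// Placeholder inputs; the real sample evaluates the Foundry agent's response.
IChatClient chatClient = /* an IChatClient for the evaluator model */ null!;
string question = "What is the capital of France?";
string answer = "Paris is the capital of France.";

// Combine several LLM-based quality evaluators into one pass.
IEvaluator evaluator = new CompositeEvaluator(
    new RelevanceEvaluator(),
    new CoherenceEvaluator());

EvaluationResult result = await evaluator.EvaluateAsync(
    new ChatMessage(ChatRole.User, question),
    new ChatResponse(new ChatMessage(ChatRole.Assistant, answer)),
    new ChatConfiguration(chatClient));

// Each evaluator contributes a named NumericMetric on a 1-5 scale.
NumericMetric coherence =
    result.Get<NumericMetric>(CoherenceEvaluator.CoherenceMetricName);
Console.WriteLine($"Coherence: {coherence.Value}");
```

GroundednessEvaluator follows the same shape but additionally needs grounding context supplied alongside the conversation.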

Key Changes

  • New evaluation samples following the FoundryAgents_Evaluations_StepXX_* naming convention (same level as other FoundryAgents samples)
  • Added evaluation NuGet packages to Directory.Packages.props:
    • Microsoft.Extensions.AI.Evaluation 10.3.0
    • Microsoft.Extensions.AI.Evaluation.Quality 10.3.0
    • Microsoft.Extensions.AI.Evaluation.Safety 10.3.0-preview
  • Added projects to slnx solution file
  • Code conventions applied: explicit types (no var), collection expressions, DefaultAzureCredential, dotnet format clean
  • Robustness: try/finally blocks ensure agent cleanup even on evaluation failures
  • Step02 env var separation: AZURE_OPENAI_DEPLOYMENT_NAME for evaluator model (may differ from Foundry agent model)
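The cleanup pattern from the robustness bullet can be sketched as follows; `CreateAgentAsync`, `RunEvaluationAsync`, and `DeleteAgentAsync` are hypothetical helpers standing in for the actual Foundry agent SDK calls, not the sample's method names:

```csharp
// Hypothetical helpers standing in for the Foundry agent SDK calls.
Agent agent = await CreateAgentAsync();
try
{
    await RunEvaluationAsync(agent);
}
finally
{
    // Runs even when an evaluator throws, so no orphaned agents
    // are left behind in the Foundry project.
    await DeleteAgentAsync(agent);
}
```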

Environment Variables

  • AZURE_FOUNDRY_PROJECT_ENDPOINT (both): Foundry project endpoint
  • AZURE_FOUNDRY_PROJECT_DEPLOYMENT_NAME (both): Model for agent creation (default: gpt-4o-mini)
  • AZURE_OPENAI_ENDPOINT (Step02): Azure OpenAI endpoint for the quality evaluators
  • AZURE_OPENAI_DEPLOYMENT_NAME (Step02): Model for the evaluator LLM (falls back to the agent model)
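The fallback behavior described for AZURE_OPENAI_DEPLOYMENT_NAME amounts to something like the following (a sketch; the sample's actual variable names may differ):

```csharp
using System;

// Agent model defaults to gpt-4o-mini when the variable is unset.
string agentModel =
    Environment.GetEnvironmentVariable("AZURE_FOUNDRY_PROJECT_DEPLOYMENT_NAME")
    ?? "gpt-4o-mini";

// Evaluator model falls back to the agent model when not set separately.
string evaluatorModel =
    Environment.GetEnvironmentVariable("AZURE_OPENAI_DEPLOYMENT_NAME")
    ?? agentModel;
```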

Regional Requirements

  • Step01 (Safety): Requires a region that supports content harm evaluation (East US 2, Sweden Central, France Central, US North Central, Switzerland West)
  • Step02 (Quality): Works in any region with Azure OpenAI deployment

Contribution Checklist

  • The code builds clean without any errors or warnings
  • The PR follows the Contribution Guidelines
  • dotnet format passes with no changes
  • Step02 verified end-to-end (Groundedness 5.0/5, Relevance 5.0, Coherence 4.0)
  • Step01 safety evaluators require a supported region for full testing
  • Is this a breaking change? No

Copilot AI changed the title from "[WIP] Add .NET samples for Foundry observability and evaluations" to ".NET: Add Foundry Evaluation samples for Red Teaming and Self-Reflection" on Feb 5, 2026
Copilot AI requested a review from rogerbarreto on February 5, 2026 13:59
Copilot AI changed the title from ".NET: Add Foundry Evaluation samples for Red Teaming and Self-Reflection" to "Refactor evaluation samples: Replace instructional Console.WriteLine with real implementations" on Feb 5, 2026
Copilot AI changed the title from "Refactor evaluation samples: Replace instructional Console.WriteLine with real implementations" to "Uncomment evaluation sample function definitions, keep only invocations commented" on Feb 5, 2026
@markwallace-microsoft added the documentation (Improvements or additions to documentation) and .NET labels on Feb 10, 2026
@github-actions bot changed the title from "Uncomment evaluation sample function definitions, keep only invocations commented" to ".NET: Uncomment evaluation sample function definitions, keep only invocations commented" on Feb 10, 2026
Copilot AI and others added 6 commits February 16, 2026 20:47
Co-authored-by: rogerbarreto <19890735+rogerbarreto@users.noreply.github.com>
Co-authored-by: rogerbarreto <19890735+rogerbarreto@users.noreply.github.com>
…nted

Co-authored-by: rogerbarreto <19890735+rogerbarreto@users.noreply.github.com>
- Rename Evaluation/Evaluation_StepXX to FoundryAgents_Evaluations_StepXX
- Add evaluation projects to slnx
- Fix var usage, apply dotnet format, use DefaultAzureCredential
- Add try/finally for agent cleanup
- Fix evaluator deployment name separation in Step02
- Update README references

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@rogerbarreto force-pushed the copilot/support-foundry-observability branch from 343b0df to af12ba1 on February 16, 2026 20:56
@rogerbarreto changed the title from ".NET: Uncomment evaluation sample function definitions, keep only invocations commented" to ".NET: Add Foundry Evaluation samples (Safety + Quality)" on Feb 16, 2026
@rogerbarreto rogerbarreto marked this pull request as ready for review February 16, 2026 20:59
Copilot AI review requested due to automatic review settings February 16, 2026 20:59
The .NET RedTeam API currently only supports model deployment targets
via AzureOpenAIModelConfiguration. Agent-targeted red teaming with
AzureAIAgentTarget is documented in concept docs but not yet available
in the SDK's RedTeam constructor. Results appear in classic portal view.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Clarify that this sample uses the classic Azure AI Foundry red teaming
API (/redTeams/runs). The new Foundry portal uses a separate evaluation-
based API not yet available in the .NET SDK. AzureAIAgentTarget exists
in the SDK but is consumed by the Evaluation Taxonomy API, not RedTeam.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
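Based only on the type names this PR mentions (AIProjectClient.RedTeams, AttackStrategy, RiskCategory, AzureOpenAIModelConfiguration), a classic red-team run might be wired up roughly as below. Every member name and overload here is an unverified assumption drawn from the PR description, not confirmed SDK API:

```csharp
using Azure.AI.Projects;
using Azure.Identity;

// All signatures below are illustrative assumptions, not verified API.
AIProjectClient projectClient = new(
    new Uri(Environment.GetEnvironmentVariable("AZURE_FOUNDRY_PROJECT_ENDPOINT")!),
    new DefaultAzureCredential());

// The classic API targets a model deployment, not an agent:
// AzureAIAgentTarget is not accepted by the RedTeam constructor.
RedTeam redTeam = new(
    target: new AzureOpenAIModelConfiguration(modelDeploymentName: "gpt-4o-mini"))
{
    AttackStrategies = { AttackStrategy.Easy, AttackStrategy.Jailbreak },
    RiskCategories = { RiskCategory.Violence },
};

await projectClient.RedTeams.CreateRunAsync(redTeam);
```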
- Pass full prompt (with context) to evaluator messages instead of just
  the question, so evaluator input matches what the agent received
- Include previous response text in self-reflection refinement prompt
  so the LLM can meaningfully improve its answer across iterations
- Inline CreateKnowledgeAgent helper (single use, single statement)
- Add comment clarifying why RunCombinedQualityAndSafetyEvaluation
  intentionally passes only the question (no context)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
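The self-reflection loop these review changes refine can be sketched as follows; the threshold value and the `AskAgentAsync`/`EvaluateQualityAsync` helpers are illustrative, not the sample's actual identifiers:

```csharp
const double Threshold = 4.0;
const int MaxIterations = 3;

string answer = await AskAgentAsync(prompt);            // hypothetical helper

for (int i = 0; i < MaxIterations; i++)
{
    double score = await EvaluateQualityAsync(prompt, answer); // hypothetical helper
    if (score >= Threshold)
    {
        break;
    }

    // Include the previous answer in the refinement prompt, mirroring the
    // review change above, so the model can meaningfully improve on it.
    string refinement =
        $"Your previous answer was:\n{answer}\n" +
        "Improve its groundedness, relevance, and coherence.";
    answer = await AskAgentAsync($"{prompt}\n\n{refinement}");
}
```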
@rogerbarreto rogerbarreto added this pull request to the merge queue Feb 18, 2026
github-merge-queue bot pushed a commit that referenced this pull request Feb 18, 2026
* Initial plan

* Add Foundry evaluation samples for Red Teaming and Self-Reflection

Co-authored-by: rogerbarreto <19890735+rogerbarreto@users.noreply.github.com>

* Refactor evaluation samples with real implementations in local functions

Co-authored-by: rogerbarreto <19890735+rogerbarreto@users.noreply.github.com>

* Uncomment function signatures and bodies, keep only invocations commented

Co-authored-by: rogerbarreto <19890735+rogerbarreto@users.noreply.github.com>

* Update Foundry evaluation samples with observability support

* Restructure evaluation samples to follow FoundryAgents naming convention

- Rename Evaluation/Evaluation_StepXX to FoundryAgents_Evaluations_StepXX
- Add evaluation projects to slnx
- Fix var usage, apply dotnet format, use DefaultAzureCredential
- Add try/finally for agent cleanup
- Fix evaluator deployment name separation in Step02
- Update README references

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Rewrite Step01 to use Azure.AI.Projects RedTeam API and address review comments

- Replace safety evaluator sample with actual Red Teaming using AIProjectClient.RedTeams
- Use AttackStrategy (Easy, Moderate, Jailbreak) and RiskCategory from Azure.AI.Projects
- Remove Microsoft.Extensions.AI.Evaluation.Safety dependency from Step01
- Add DefaultAzureCredential warning comments to Step02
- Remove unused bestResponse variable in Step02
- Add session isolation comments in self-reflection loop
- Fix stale directory references in READMEs
- Fix misleading evaluation overview link in main README

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add note about agent-targeted red teaming limitations in README

The .NET RedTeam API currently only supports model deployment targets
via AzureOpenAIModelConfiguration. Agent-targeted red teaming with
AzureAIAgentTarget is documented in concept docs but not yet available
in the SDK's RedTeam constructor. Results appear in classic portal view.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add classic Foundry disclaimer to red teaming sample README

Clarify that this sample uses the classic Azure AI Foundry red teaming
API (/redTeams/runs). The new Foundry portal uses a separate evaluation-
based API not yet available in the .NET SDK. AzureAIAgentTarget exists
in the SDK but is consumed by the Evaluation Taxonomy API, not RedTeam.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address PR review comments on Step02 SelfReflection

- Pass full prompt (with context) to evaluator messages instead of just
  the question, so evaluator input matches what the agent received
- Include previous response text in self-reflection refinement prompt
  so the LLM can meaningfully improve its answer across iterations
- Inline CreateKnowledgeAgent helper (single use, single statement)
- Add comment clarifying why RunCombinedQualityAndSafetyEvaluation
  intentionally passes only the question (no context)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: rogerbarreto <19890735+rogerbarreto@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Feb 18, 2026

Labels

documentation (Improvements or additions to documentation), .NET

Projects

None yet

Development

Successfully merging this pull request may close these issues.

.NET: Support for Foundry Observability and Evaluations

6 participants