
.NET: Add Foundry Evaluation samples (Safety + Quality) #3697

Merged

rogerbarreto merged 11 commits into main from copilot/support-foundry-observability on Feb 18, 2026
Conversation

Contributor

Copilot AI commented Feb 5, 2026

Motivation and Context

Adds .NET evaluation samples for Foundry Agents, achieving parity with existing Python evaluation samples. Part of #3675 / #3440.

Description

This PR adds two new evaluation samples demonstrating how to assess agent safety and quality using Microsoft.Extensions.AI.Evaluation packages with Azure AI Foundry.

New Samples

  • FoundryAgents_Evaluations_Step01_RedTeaming: Safety evaluation using ContentHarmEvaluator, ViolenceEvaluator, HateAndUnfairnessEvaluator, ProtectedMaterialEvaluator, IndirectAttackEvaluator
  • FoundryAgents_Evaluations_Step02_SelfReflection: Quality evaluation with a self-reflection loop using GroundednessEvaluator, RelevanceEvaluator, CoherenceEvaluator
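As a rough sketch of how the Step02 quality evaluators are typically wired up with Microsoft.Extensions.AI.Evaluation (the `chatClient`, `question`, and `answer` values are placeholders, and exact overloads may differ between package versions):

```csharp
using Microsoft.Extensions.AI;
using Microsoft.Extensions.AI.Evaluation;
using Microsoft.Extensions.AI.Evaluation.Quality;

// Placeholder inputs; the real sample evaluates the Foundry agent's response.
IChatClient chatClient = /* an IChatClient for the evaluator model */ null!;
string question = "What is the capital of France?";
string answer = "Paris is the capital of France.";

// Combine several LLM-based quality evaluators into one pass.
IEvaluator evaluator = new CompositeEvaluator(
    new RelevanceEvaluator(),
    new CoherenceEvaluator());

EvaluationResult result = await evaluator.EvaluateAsync(
    new ChatMessage(ChatRole.User, question),
    new ChatResponse(new ChatMessage(ChatRole.Assistant, answer)),
    new ChatConfiguration(chatClient));

// Each evaluator contributes a named NumericMetric on a 1-5 scale.
NumericMetric coherence =
    result.Get<NumericMetric>(CoherenceEvaluator.CoherenceMetricName);
Console.WriteLine($"Coherence: {coherence.Value}");
```

GroundednessEvaluator follows the same shape but additionally needs grounding context supplied alongside the conversation.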

Key Changes

  • New evaluation samples following the FoundryAgents_Evaluations_StepXX_* naming convention (same level as other FoundryAgents samples)
  • Added evaluation NuGet packages to Directory.Packages.props:
    • Microsoft.Extensions.AI.Evaluation 10.3.0
    • Microsoft.Extensions.AI.Evaluation.Quality 10.3.0
    • Microsoft.Extensions.AI.Evaluation.Safety 10.3.0-preview
  • Added projects to slnx solution file
  • Code conventions applied: explicit types (no var), collection expressions, DefaultAzureCredential, dotnet format clean
  • Robustness: try/finally blocks ensure agent cleanup even on evaluation failures
  • Step02 env var separation: AZURE_OPENAI_DEPLOYMENT_NAME for evaluator model (may differ from Foundry agent model)
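The cleanup pattern from the robustness bullet can be sketched as follows; `CreateAgentAsync`, `RunEvaluationAsync`, and `DeleteAgentAsync` are hypothetical helpers standing in for the actual Foundry agent SDK calls, not the sample's method names:

```csharp
// Hypothetical helpers standing in for the Foundry agent SDK calls.
Agent agent = await CreateAgentAsync();
try
{
    await RunEvaluationAsync(agent);
}
finally
{
    // Runs even when an evaluator throws, so no orphaned agents
    // are left behind in the Foundry project.
    await DeleteAgentAsync(agent);
}
```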

Environment Variables

  • AZURE_FOUNDRY_PROJECT_ENDPOINT (both): Foundry project endpoint
  • AZURE_FOUNDRY_PROJECT_DEPLOYMENT_NAME (both): Model for agent creation (default: gpt-4o-mini)
  • AZURE_OPENAI_ENDPOINT (Step02): Azure OpenAI endpoint for the quality evaluators
  • AZURE_OPENAI_DEPLOYMENT_NAME (Step02): Model for the evaluator LLM (falls back to the agent model)
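The fallback behavior described for AZURE_OPENAI_DEPLOYMENT_NAME amounts to something like the following (a sketch; the sample's actual variable names may differ):

```csharp
using System;

// Agent model defaults to gpt-4o-mini when the variable is unset.
string agentModel =
    Environment.GetEnvironmentVariable("AZURE_FOUNDRY_PROJECT_DEPLOYMENT_NAME")
    ?? "gpt-4o-mini";

// Evaluator model falls back to the agent model when not set separately.
string evaluatorModel =
    Environment.GetEnvironmentVariable("AZURE_OPENAI_DEPLOYMENT_NAME")
    ?? agentModel;
```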

Regional Requirements

  • Step01 (Safety): Requires a region that supports content harm evaluation (East US 2, Sweden Central, France Central, US North Central, Switzerland West)
  • Step02 (Quality): Works in any region with Azure OpenAI deployment

Contribution Checklist

  • The code builds clean without any errors or warnings
  • The PR follows the Contribution Guidelines
  • dotnet format passes with no changes
  • Step02 verified end-to-end (Groundedness 5.0/5, Relevance 5.0, Coherence 4.0)
  • Step01 safety evaluators require a supported region for full testing
  • Is this a breaking change? No

Copilot AI changed the title from "[WIP] Add .NET samples for Foundry observability and evaluations" to ".NET: Add Foundry Evaluation samples for Red Teaming and Self-Reflection" on Feb 5, 2026
Copilot AI requested a review from rogerbarreto on February 5, 2026 13:59
Copilot AI changed the title from ".NET: Add Foundry Evaluation samples for Red Teaming and Self-Reflection" to "Refactor evaluation samples: Replace instructional Console.WriteLine with real implementations" on Feb 5, 2026
Copilot AI changed the title from "Refactor evaluation samples: Replace instructional Console.WriteLine with real implementations" to "Uncomment evaluation sample function definitions, keep only invocations commented" on Feb 5, 2026
@markwallace-microsoft added the documentation (Improvements or additions to documentation) and .NET labels on Feb 10, 2026
@github-actions bot changed the title from "Uncomment evaluation sample function definitions, keep only invocations commented" to ".NET: Uncomment evaluation sample function definitions, keep only invocations commented" on Feb 10, 2026
Copilot AI and others added 6 commits February 16, 2026 20:47
Co-authored-by: rogerbarreto <19890735+rogerbarreto@users.noreply.github.com>
Co-authored-by: rogerbarreto <19890735+rogerbarreto@users.noreply.github.com>
…nted

Co-authored-by: rogerbarreto <19890735+rogerbarreto@users.noreply.github.com>
- Rename Evaluation/Evaluation_StepXX to FoundryAgents_Evaluations_StepXX
- Add evaluation projects to slnx
- Fix var usage, apply dotnet format, use DefaultAzureCredential
- Add try/finally for agent cleanup
- Fix evaluator deployment name separation in Step02
- Update README references

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@rogerbarreto force-pushed the copilot/support-foundry-observability branch from 343b0df to af12ba1 on February 16, 2026 20:56
@rogerbarreto changed the title from ".NET: Uncomment evaluation sample function definitions, keep only invocations commented" to ".NET: Add Foundry Evaluation samples (Safety + Quality)" on Feb 16, 2026
@rogerbarreto rogerbarreto marked this pull request as ready for review February 16, 2026 20:59
Copilot AI review requested due to automatic review settings February 16, 2026 20:59
The .NET RedTeam API currently only supports model deployment targets
via AzureOpenAIModelConfiguration. Agent-targeted red teaming with
AzureAIAgentTarget is documented in concept docs but not yet available
in the SDK's RedTeam constructor. Results appear in classic portal view.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Clarify that this sample uses the classic Azure AI Foundry red teaming
API (/redTeams/runs). The new Foundry portal uses a separate evaluation-
based API not yet available in the .NET SDK. AzureAIAgentTarget exists
in the SDK but is consumed by the Evaluation Taxonomy API, not RedTeam.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
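Based only on the type names this PR mentions (AIProjectClient.RedTeams, AttackStrategy, RiskCategory, AzureOpenAIModelConfiguration), a classic red-team run might be wired up roughly as below. Every member name and overload here is an unverified assumption drawn from the PR description, not confirmed SDK API:

```csharp
using Azure.AI.Projects;
using Azure.Identity;

// All signatures below are illustrative assumptions, not verified API.
AIProjectClient projectClient = new(
    new Uri(Environment.GetEnvironmentVariable("AZURE_FOUNDRY_PROJECT_ENDPOINT")!),
    new DefaultAzureCredential());

// The classic API targets a model deployment, not an agent:
// AzureAIAgentTarget is not accepted by the RedTeam constructor.
RedTeam redTeam = new(
    target: new AzureOpenAIModelConfiguration(modelDeploymentName: "gpt-4o-mini"))
{
    AttackStrategies = { AttackStrategy.Easy, AttackStrategy.Jailbreak },
    RiskCategories = { RiskCategory.Violence },
};

await projectClient.RedTeams.CreateRunAsync(redTeam);
```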
- Pass full prompt (with context) to evaluator messages instead of just
  the question, so evaluator input matches what the agent received
- Include previous response text in self-reflection refinement prompt
  so the LLM can meaningfully improve its answer across iterations
- Inline CreateKnowledgeAgent helper (single use, single statement)
- Add comment clarifying why RunCombinedQualityAndSafetyEvaluation
  intentionally passes only the question (no context)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
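The self-reflection loop these review changes refine can be sketched as follows; the threshold value and the `AskAgentAsync`/`EvaluateQualityAsync` helpers are illustrative, not the sample's actual identifiers:

```csharp
const double Threshold = 4.0;
const int MaxIterations = 3;

string answer = await AskAgentAsync(prompt);            // hypothetical helper

for (int i = 0; i < MaxIterations; i++)
{
    double score = await EvaluateQualityAsync(prompt, answer); // hypothetical helper
    if (score >= Threshold)
    {
        break;
    }

    // Include the previous answer in the refinement prompt, mirroring the
    // review change above, so the model can meaningfully improve on it.
    string refinement =
        $"Your previous answer was:\n{answer}\n" +
        "Improve its groundedness, relevance, and coherence.";
    answer = await AskAgentAsync($"{prompt}\n\n{refinement}");
}
```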
@rogerbarreto rogerbarreto added this pull request to the merge queue Feb 18, 2026
github-merge-queue bot pushed a commit that referenced this pull request Feb 18, 2026
* Initial plan

* Add Foundry evaluation samples for Red Teaming and Self-Reflection

Co-authored-by: rogerbarreto <19890735+rogerbarreto@users.noreply.github.com>

* Refactor evaluation samples with real implementations in local functions

Co-authored-by: rogerbarreto <19890735+rogerbarreto@users.noreply.github.com>

* Uncomment function signatures and bodies, keep only invocations commented

Co-authored-by: rogerbarreto <19890735+rogerbarreto@users.noreply.github.com>

* Update Foundry evaluation samples with observability support

* Restructure evaluation samples to follow FoundryAgents naming convention

- Rename Evaluation/Evaluation_StepXX to FoundryAgents_Evaluations_StepXX
- Add evaluation projects to slnx
- Fix var usage, apply dotnet format, use DefaultAzureCredential
- Add try/finally for agent cleanup
- Fix evaluator deployment name separation in Step02
- Update README references

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Rewrite Step01 to use Azure.AI.Projects RedTeam API and address review comments

- Replace safety evaluator sample with actual Red Teaming using AIProjectClient.RedTeams
- Use AttackStrategy (Easy, Moderate, Jailbreak) and RiskCategory from Azure.AI.Projects
- Remove Microsoft.Extensions.AI.Evaluation.Safety dependency from Step01
- Add DefaultAzureCredential warning comments to Step02
- Remove unused bestResponse variable in Step02
- Add session isolation comments in self-reflection loop
- Fix stale directory references in READMEs
- Fix misleading evaluation overview link in main README

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add note about agent-targeted red teaming limitations in README

The .NET RedTeam API currently only supports model deployment targets
via AzureOpenAIModelConfiguration. Agent-targeted red teaming with
AzureAIAgentTarget is documented in concept docs but not yet available
in the SDK's RedTeam constructor. Results appear in classic portal view.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add classic Foundry disclaimer to red teaming sample README

Clarify that this sample uses the classic Azure AI Foundry red teaming
API (/redTeams/runs). The new Foundry portal uses a separate evaluation-
based API not yet available in the .NET SDK. AzureAIAgentTarget exists
in the SDK but is consumed by the Evaluation Taxonomy API, not RedTeam.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address PR review comments on Step02 SelfReflection

- Pass full prompt (with context) to evaluator messages instead of just
  the question, so evaluator input matches what the agent received
- Include previous response text in self-reflection refinement prompt
  so the LLM can meaningfully improve its answer across iterations
- Inline CreateKnowledgeAgent helper (single use, single statement)
- Add comment clarifying why RunCombinedQualityAndSafetyEvaluation
  intentionally passes only the question (no context)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: rogerbarreto <19890735+rogerbarreto@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Feb 18, 2026

Labels

documentation (Improvements or additions to documentation), .NET

Projects

None yet

Development

Successfully merging this pull request may close these issues.

.NET: Support for Foundry Observability and Evaluations

6 participants