MilesHolland (Contributor) commented Jan 8, 2025

Adds a new enum, an aggregation type, along with a utility class that converts each enum value to its associated function.

The base eval class then uses this enum to control how the per-turn results of a multi-turn conversation are aggregated into a single value. Also adds private functions for injecting custom functions directly, plus tests for all of this.

In the future, this will likely be used to control how evaluation results across multiple evals are aggregated in the evaluate() function.
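
For readers skimming the thread, here is a minimal sketch of the pattern described above: an enum of aggregation types, a utility that maps each member to a function, and an evaluator that reduces per-turn scores to one value. Every name here (`AggregationType`, `AggregationConverter`, `ConversationEvaluator`, `_set_aggregator`) is hypothetical and chosen for illustration; this is not the code merged in this PR, and only the mean and max behaviors are taken from the discussion.

```python
from enum import Enum
from statistics import mean
from typing import Callable, Dict, List


class AggregationType(Enum):
    """Hypothetical stand-in for the enum added in this PR."""
    MEAN = "mean"
    MAX = "max"


class AggregationConverter:
    """Hypothetical utility that maps an enum member to its function."""

    _FUNCTIONS: Dict[AggregationType, Callable[[List[float]], float]] = {
        AggregationType.MEAN: mean,
        AggregationType.MAX: max,
    }

    @classmethod
    def to_function(cls, agg_type: AggregationType) -> Callable[[List[float]], float]:
        return cls._FUNCTIONS[agg_type]


class ConversationEvaluator:
    """Illustrative base class: reduces per-turn scores to a single value."""

    def __init__(self, aggregation_type: AggregationType = AggregationType.MEAN):
        self._aggregator = AggregationConverter.to_function(aggregation_type)

    def _set_aggregator(self, func: Callable[[List[float]], float]) -> None:
        # Private hook for injecting a custom function directly (an assumption
        # about how such a hook might look, not the PR's actual signature).
        self._aggregator = func

    def aggregate(self, per_turn_scores: List[float]) -> float:
        return self._aggregator(per_turn_scores)


scores = [1.0, 6.0, 2.0]
mean_result = ConversationEvaluator().aggregate(scores)
max_result = ConversationEvaluator(AggregationType.MAX).aggregate(scores)
print(mean_result, max_result)
```

With max aggregation, a single high-severity turn determines the conversation-level score, whereas the mean can dilute it across benign turns. That is presumably why the harm evaluators switch to max.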

@MilesHolland MilesHolland requested a review from a team as a code owner January 8, 2025 20:50
@MilesHolland MilesHolland changed the title from "Jan25/eval/improvement/cs convo takes max" to "Content safety evals aggregate max from conversations" Jan 8, 2025
@github-actions github-actions bot added the Evaluation label (Issues related to the client library for Azure AI Evaluation) Jan 8, 2025
azure-sdk (Collaborator) commented Jan 9, 2025

API change check

APIView has identified API-level changes in this PR and created the following API reviews.

azure-ai-evaluation

@nagkumar91 nagkumar91 requested a review from Copilot January 9, 2025 21:18
Copilot AI (Contributor) left a comment

Copilot reviewed 5 out of 10 changed files in this pull request and generated no comments.

Files not reviewed (5)
  • sdk/evaluation/azure-ai-evaluation/tests/unittests/data/evaluate_test_data_conversation.jsonl: Language not supported
  • sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_content_safety/_violence.py: Evaluated as low risk
  • sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_content_safety/_hate_unfairness.py: Evaluated as low risk
  • sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_content_safety/_sexual.py: Evaluated as low risk
  • sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_content_safety/_self_harm.py: Evaluated as low risk
Comments suppressed due to low confidence (3)

sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_common/_base_eval.py:77

  • [nitpick] Update the docstring to reflect the correct class name if it is renamed to ConversationNumericAggregationType.
:type conversation_aggregation_type: ~azure.ai.evaluation._constants._ConversationNumericAggregationType

sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_common/_base_rai_svc_eval.py:41

  • Missing period at the end of the docstring.
Default is ~azure.ai.evaluation._constants.ConversationNumericAggregationType.MEAN.

sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_common/_base_rai_svc_eval.py:42

  • The conversation_aggregation_type parameter should be explicitly mentioned in the constructor's docstring.
:type conversation_aggregation_type: ~azure.ai.evaluation._constants.ConversationNumericAggregationType

@MilesHolland MilesHolland dismissed diondrapeck’s stale review January 14, 2025 19:39

Added change, but GH is still annoyed about it.

@MilesHolland MilesHolland merged commit 7f904a3 into Azure:main Jan 22, 2025
20 checks passed
w-javed pushed a commit that referenced this pull request Jan 23, 2025
* add convo agg type, and have harm evals use max

* analysis

* correct enum name in docs

* refactor checked enum into function field

* cl and analysis

* change enum name and update CL

* change function names to private, allow agg type retrieval

* PR comments

* test serialization

* CL

* CI adjustment

* try again

* perf

* skip perf

* remove skip
w-javed pushed a commit to w-javed/azure-sdk-for-python that referenced this pull request Jan 23, 2025
* add convo agg type, and have harm evals use max

w-javed added a commit that referenced this pull request Jan 27, 2025
* Azure AI Evaluation Release 1.2.0

* Azure AI Evaluation Release 1.2.0

* fix the intersphinx references for a new reference methodology (#39332)

* handle only deleted files in a <language> - pullrequest build (#39266)

Co-authored-by: Scott Beddall <[email protected]>

* fix tests weekly (#39338)

* [Storage] update perf tests core baseline (#39336)

* [Storage] update perf tests core baseline

* update storage file baselien

* [AutoRelease] t2-computeschedule-2025-01-10-50036(can only be merged by SDK owner) (#39105)

* code and test

* Update CHANGELOG.md for new model properties

---------

Co-authored-by: azure-sdk <PythonSdkPipelines>
Co-authored-by: ChenxiJiang333 <[email protected]>

* [AutoRelease] t2-quota-2025-01-16-93059(can only be merged by SDK owner) (#39215)

* code and test

* update testcases

* Update CHANGELOG.md to remove method details

* Update changelog for quota operations changes

* Update release date in CHANGELOG.md

---------

Co-authored-by: azure-sdk <PythonSdkPipelines>
Co-authored-by: ChenxiJiang333 <[email protected]>
Co-authored-by: ChenxiJiang333 <[email protected]>

* [AutoRelease] t2-servicenetworking-2025-01-21-47646(can only be merged by SDK owner) (#39322)

* code and test

* Remove duplicate method overloads from changelog

* Update CHANGELOG.md to remove instance variables

* Fix typo in changelog entry

* Update CHANGELOG for version 2.0.0

---------

Co-authored-by: azure-sdk <PythonSdkPipelines>
Co-authored-by: ChenxiJiang333 <[email protected]>

* Fix urls (#39259)

* Fix urls (#39251)

* fix url (#39255)

* Fix urls (#39248)

* Fix urls (#39246)

* Content safety evals aggregate max from conversations (#39083)

* add convo agg type, and have harm evals use max

* analysis

* correct enum name in docs

* refactor checked enum into function field

* cl and analysis

* change enum name and update CL

* change function names to private, allow agg type retrieval

* PR comments

* test serialization

* CL

* CI adjustment

* try again

* perf

* skip perf

* remove skip

* Fix urls (#39129)

* Fix urls (#39262)

* Sync eng/common directory with azure-sdk-tools for PR 9668 (#39347)

* Support incrementing semver prereleases with 'zero' versions

* Make tests more explicit

---------

Co-authored-by: Patrick Hallisey <[email protected]>

* [ServiceBus/EventHub] lock pending deliveries on send (#38067)

* [ServiceBus/EventHub] lock pending deliveries on send

* remove misc logging

* changelog + test

* fix tests, remove session lock

* remove logging from test

* sync with sb

* add todo in sender.py tfor temporary fix

* bumped versions after jan 22 patch release (#39355)

* Sync eng/common directory with azure-sdk-tools for PR 9656 (#39356)

* Added label handle sdk-gen pipeline template

Added common script to delete label from a PR

* Update eng/common/scripts/Invoke-GitHubAPI.ps1

Co-authored-by: Ben Broderick Phillips <[email protected]>

---------

Co-authored-by: ray chen <[email protected]>
Co-authored-by: Ben Broderick Phillips <[email protected]>

* Update package_utils.py (#39361)

* [AutoRelease] t2-web-2024-11-15-26155(can only be merged by SDK owner) (#38561)

* code and test

* Update app_service_environments_create_or_update_multi_role_pool.py

* udpate version

* update-testcase

* update testcases

* update format

* Update CHANGELOG.md

* Update CHANGELOG.md

* update version

---------

Co-authored-by: azure-sdk <PythonSdkPipelines>
Co-authored-by: Yuchao Yan <[email protected]>
Co-authored-by: msyyc <[email protected]>
Co-authored-by: ChenxiJiang333 <[email protected]>
Co-authored-by: ChenxiJiang333 <[email protected]>

* fix: Loosen psutil version requirement (#39354)

* Enable sample type checking for cosmos (#39334)

This is already passing so enabling in CI so we can continue to validate samples with mypy

* update change log

* change date format

* change date format

* change date format

---------

Co-authored-by: Scott Beddall <[email protected]>
Co-authored-by: Azure SDK Bot <[email protected]>
Co-authored-by: Scott Beddall <[email protected]>
Co-authored-by: Krista Pratico <[email protected]>
Co-authored-by: swathipil <[email protected]>
Co-authored-by: ChenxiJiang333 <[email protected]>
Co-authored-by: ChenxiJiang333 <[email protected]>
Co-authored-by: Xiang Yan <[email protected]>
Co-authored-by: MilesHolland <[email protected]>
Co-authored-by: Patrick Hallisey <[email protected]>
Co-authored-by: Peter Wu <[email protected]>
Co-authored-by: ray chen <[email protected]>
Co-authored-by: Ben Broderick Phillips <[email protected]>
Co-authored-by: Yuchao Yan <[email protected]>
Co-authored-by: msyyc <[email protected]>
Co-authored-by: kdestin <[email protected]>
allenkim0129 pushed a commit to allenkim0129/azure-sdk-for-python that referenced this pull request Jan 27, 2025
* add convo agg type, and have harm evals use max

allenkim0129 pushed a commit to allenkim0129/azure-sdk-for-python that referenced this pull request Jan 27, 2025
* Azure AI Evaluation Release 1.2.0

l0lawrence pushed a commit to l0lawrence/azure-sdk-for-python that referenced this pull request Feb 19, 2025
* add convo agg type, and have harm evals use max

l0lawrence pushed a commit to l0lawrence/azure-sdk-for-python that referenced this pull request Feb 19, 2025
* Azure AI Evaluation Release 1.2.0

