Adding first version of AgentEval -- a framework for assessing task utility for LLM-powered applications #681

Merged: 109 commits merged on Nov 21, 2023


@julianakiseleva (Contributor) commented Nov 15, 2023

  • We introduce AgentEval, the first version of a framework that automatically assesses task utility for an arbitrary LLM-powered application. It proposes criteria that explain task utility and then quantifies those criteria on your system's logs. AgentEval consists of two key components:

    • CriticAgent: This is an LLM-based agent that generates criteria to evaluate a given task.
    • QuantifierAgent: This agent quantifies the performance of any sample task based on the criteria designed by the CriticAgent.
  • We demonstrate the framework on the Math Problems dataset in a notebook that can assess a single log as well as plot the quantified performance estimates for a set of problems.

  • The model logs are available in the agenteval branch.
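To make the two-component flow concrete, here is a minimal, self-contained sketch of a critic step followed by a quantifier step. It is not the notebook's code or the autogen agent API: the model is assumed to be any plain function from prompt text to completion text, and the names `Criterion`, `critic_generate_criteria`, `quantifier_assess`, and `fake_llm`, as well as the JSON shape with `name`, `description`, and `accepted_values`, are hypothetical choices for illustration. `fake_llm` returns canned replies so the sketch runs offline.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, Dict, List

# Hypothetical model interface: any function from prompt text to completion text.
# This stands in for an LLM call; it is NOT the autogen agent API.
LLM = Callable[[str], str]


@dataclass
class Criterion:
    name: str
    description: str
    accepted_values: List[str]  # the discrete values the quantifier may assign


def critic_generate_criteria(llm: LLM, task_description: str) -> List[Criterion]:
    """CriticAgent role (sketch): ask the model to propose criteria for a task."""
    prompt = (
        "Propose criteria for judging how well a system solves the task below. "
        'Reply with a JSON list of objects with keys "name", "description", '
        'and "accepted_values".\n'
        f"Task: {task_description}"
    )
    return [
        Criterion(c["name"], c["description"], list(c["accepted_values"]))
        for c in json.loads(llm(prompt))
    ]


def quantifier_assess(llm: LLM, criteria: List[Criterion], log: str) -> Dict[str, str]:
    """QuantifierAgent role (sketch): assign one accepted value per criterion for a log."""
    prompt = (
        "For each criterion, choose one of its accepted_values for the log below. "
        "Reply with a JSON object mapping criterion name to value.\n"
        f"Criteria: {json.dumps([asdict(c) for c in criteria])}\n"
        f"Log: {log}"
    )
    scores = json.loads(llm(prompt))
    result = {}
    for c in criteria:
        value = scores.get(c.name)
        # Reject missing values or values outside the criterion's accepted set.
        if value not in c.accepted_values:
            raise ValueError(f"invalid value for {c.name!r}: {value!r}")
        result[c.name] = value
    return result


def fake_llm(prompt: str) -> str:
    """Canned replies so the sketch runs offline; a real setup would query a model."""
    if prompt.startswith("Propose criteria"):
        return json.dumps([
            {"name": "accuracy", "description": "Is the final answer correct?",
             "accepted_values": ["incorrect", "partially correct", "correct"]},
            {"name": "clarity", "description": "Is the explanation easy to follow?",
             "accepted_values": ["poor", "fair", "good"]},
        ])
    return json.dumps({"accuracy": "correct", "clarity": "good"})


criteria = critic_generate_criteria(fake_llm, "Solve a prealgebra math problem.")
print([c.name for c in criteria])  # ['accuracy', 'clarity']
print(quantifier_assess(fake_llm, criteria, "User: What is 2 + 3? Assistant: 5"))
# {'accuracy': 'correct', 'clarity': 'good'}
```

One design choice in this sketch is that the quantifier may only return values from each criterion's accepted set, so per-criterion results stay comparable across logs and can be aggregated or plotted over a set of problems.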

This PR has added:

  • notebook/agenteval_cq_math.ipynb, which demonstrates AgentEval on the math problems
  • sample files required by the notebook, in test/test_files/agenteval-in-out
  • website/blog/2023-11-11-AgentEval, a blog post explaining AgentEval


julianakiseleva and others added 30 commits November 14, 2023 01:15
@Narabzad (Contributor) commented:

e.g., Previous Work is capitalized but other section titles use lower case

@gagb done.

@sonichi enabled auto-merge November 21, 2023 01:40
auto-merge was automatically disabled November 21, 2023 01:47

Head branch was pushed to by a user without write access

@sonichi added this pull request to the merge queue Nov 21, 2023
@github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 21, 2023
@sonichi added this pull request to the merge queue Nov 21, 2023
@github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 21, 2023
@sonichi enabled auto-merge November 21, 2023 04:05
@sonichi added this pull request to the merge queue Nov 21, 2023
Merged via the queue into microsoft:main with commit 19c7da2 Nov 21, 2023
16 of 19 checks passed
whiskyboy pushed a commit to whiskyboy/autogen that referenced this pull request Apr 17, 2024
…tility for LLM-powered applications (microsoft#681)

* add agenteval-notebook for math problems and the blog post about it

* update gitignore

* updates to notebook

* adding folder for the logs

* adding math problems logs

* adding folder for alfworld logs

* added limitation and future work to blog post

* minor edits blog post

* adding changes

* reorg

* modify the main notebook

* modification of the main notebook

* remove wrong notebook

* uploading new notebook

* update agenteval notebook

* change the sample

* Update agenteval_cq_math.ipynb

* adding final changes to notebook

* updated framework picture

* Update index.mdx

* Update index.md

* Add files via upload

* updates to notebook

* revise the blog

* revise the blog

* update the agent img

* revise the blog

* revise the blog

* Excluded model logs from the main branch, you can find them in agenteval branch

* Fixed pre-commit formatting.

* Update website/blog/2023-11-11-AgentEval/index.mdx

Co-authored-by: Chi Wang <[email protected]>

* update gitignore

* update index.mdx

* update authors.yml by adding Negar and Julia

* remove md file

* remove md file

* update gitignore

* update authors file

* pre-commit checks

* pre-commit checks on authors.yml

* pre-commit checks on authors.yml

* update index.mdx

* update authors.yml by adding Negar and Julia

* updated the blog-post version 1

* updated the blog-post: TL;DR is ready

* updated the blog-post: first part of introduction is ready

* updated figures: typos on fig 1, changed terminology on the fig 2

* updated the Framework part

* fixed rendering issues

* upload zip file instead of single samples

* update prealgebra.zip

* update

* upload

* update z

* update naming

* update zip

* update the agenteval notebook

* update the notebook - removing unnecessary logs

* updated fig 1 and references to it

* updated fig 1

* incorporated PR comments

* merged agenteval branch

* final changes to the blog

* updated taxonomy

* update notebook

* minor changes to the blog

* Fixed formatting

* Update the link in agenteval_cq_math.ipynb

* update the blog and link in notebook

* Update index.mdx

* change folder name

* Changes to be committed:
	modified:    OAI_CONFIG_LIST_sample.txt

* add sample OAI file

* fix the url link to colab and typos

* fix the url link to colab and typos

* add authors

* update profile pic

* "update authors"

* fixing the problem in test_groupchat.py

* update the title lower case

* reverting changes in setup.py

* rerun pre-commit

---------

Co-authored-by: Negar Arabzadeh <[email protected]>
Co-authored-by: Julia Kiseleva <[email protected]>
Co-authored-by: afourney <[email protected]>
Co-authored-by: Chi Wang <[email protected]>
Co-authored-by: Qingyun Wu <[email protected]>
10 participants