Adding first version of AgentEval -- a framework for assessing task utility for LLM-powered applications #681
Merged
…genteval merging with agenteval branch'
…genteval merging with agenteval
…genteval merging on agenteval
@gagb done.
auto-merge was automatically disabled (November 21, 2023 01:47): head branch was pushed to by a user without write access
julianakiseleva had a problem deploying to openai1 with GitHub Actions three times (November 21, 2023 03:23); each attempt ended in Failure
sonichi approved these changes (Nov 21, 2023)
github-merge-queue bot removed this pull request from the merge queue twice due to failed status checks (Nov 21, 2023)
whiskyboy pushed a commit to whiskyboy/autogen that referenced this pull request (Apr 17, 2024):
…tility for LLM-powered applications (microsoft#681)

* add agenteval-notebook for math problems and the blog post about it
* update gitignore
* updates to notebook
* adding folder for the logs
* adding math problems logs
* adding folder for alfworld logs
* added limitiation and future work to blog post
* minor edits blog post
* adding changes
* reorg
* modify the main notebook
* modification of the main notebook
* remove wrong notebook
* uploading new notebook
* update agenteval notebook
* change the sample
* Update agenteval_cq_math.ipynb
* adding final changes to notebook
* updated framework picture
* Update index.mdx
* Update index.md
* Add files via upload
* updates to notebool
* revise the blog
* revise the blog
* update the agent img
* revise the blog
* revise the blog
* Excluded model logs from the main branch, you can find them in agenteval branch
* Fixed pre-commit formatting.
* Update website/blog/2023-11-11-AgentEval/index.mdx
  Co-authored-by: Chi Wang
* update gitignore
* update index.mdx
* update authors.yml by adding Negar and Julia
* remove md file
* remove md file
* update gitignore
* update authors file
* pre-commit checks
* pre-commit checks on authors.yml
* pre-commit checks on authors.yml
* update index.mdx
* update authors.yml by adding Negar and Julia
* updated the blog-post version 1
* updated the blog-post: TL;DR is ready
* updated the blog-post: first part of introduction is ready
* updated figures: typos on fig 1, changed terminology on the fig 2
* upadated the Framework part
* fixed redering issues
* upload zip file instead of single samples
* update prealgebra.zip
* update
* upload
* update z
* update naming
* update zip
* update the agenteval notebook
* update the notebook - removing unmercenary logs
* updated fig 1 and references to it
* updated fig 1
* incorporated PR comments
* merged agenteval branch
* final changes to the blog
* updated taxonomy
* update notebook
* minor changes to the blog
* Fixed formatting
* Update the link in agenteval_cq_math.ipynb
* update the blog and link in notebook
* Update index.mdx
* change folder name
* Changes to be committed: modified: OAI_CONFIG_LIST_sample.txt
* add sample OAI file
* fix the url link to colab and typos
* fix the url link to colab and typos
* add authors
* update profile pic
* "update authors"
* fixing the problem in test_groupchat.py
* update the title lower case
* reverting changes in setup.py
* rerun pre-commit

---------

Co-authored-by: Negar Arabzadeh
Co-authored-by: Julia Kiseleva
Co-authored-by: afourney
Co-authored-by: Chi Wang
Co-authored-by: Qingyun Wu
We introduce AgentEval, the first version of a framework that automatically assesses task utility for an arbitrary application. It suggests criteria that explain task utility and then quantifies those criteria on your system's logs. AgentEval consists of two key components: one that suggests the criteria and one that quantifies them.
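The two-stage flow described above (suggest criteria for a task, then quantify them for each log) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not AgentEval's implementation or API: the `Criterion` schema, the `suggest_criteria` and `quantify` helpers, the prompts, and the JSON reply format are all hypothetical, and `llm` stands for any function that sends a prompt to a model and returns its text reply.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, Dict, List

# Hypothetical: any function that sends a prompt to an LLM and returns its text reply.
LLM = Callable[[str], str]


@dataclass
class Criterion:
    name: str
    description: str
    accepted_values: List[str]  # assumption of this sketch: ordered from worst to best


def suggest_criteria(task: str, llm: LLM) -> List[Criterion]:
    """Stage 1 (hypothetical helper): ask the model to propose task-utility criteria as JSON."""
    prompt = (
        f"Task: {task}\n"
        "Propose criteria for judging how useful a solution to this task is. "
        'Reply with a JSON list of objects with keys "name", "description", "accepted_values".'
    )
    return [Criterion(**item) for item in json.loads(llm(prompt))]


def quantify(criteria: List[Criterion], log: str, llm: LLM) -> Dict[str, str]:
    """Stage 2 (hypothetical helper): ask the model to rate one log against every criterion."""
    spec = json.dumps([asdict(c) for c in criteria])
    prompt = (
        f"Criteria: {spec}\nLog: {log}\n"
        "For each criterion, pick one of its accepted_values. "
        "Reply with a JSON object mapping criterion name to value."
    )
    scores = json.loads(llm(prompt))
    # Reject labels outside each criterion's accepted set so malformed replies surface early.
    for c in criteria:
        if scores.get(c.name) not in c.accepted_values:
            raise ValueError(f"invalid value for {c.name!r}: {scores.get(c.name)!r}")
    return scores


# Usage with a canned stand-in for a real model call.
def fake_llm(prompt: str) -> str:
    if prompt.startswith("Task:"):
        return json.dumps([{"name": "accuracy",
                            "description": "Is the final answer correct?",
                            "accepted_values": ["incorrect", "correct"]}])
    return json.dumps({"accuracy": "correct"})


criteria = suggest_criteria("Solve a prealgebra problem", fake_llm)
print(quantify(criteria, "assistant: the answer is 12", fake_llm))  # {'accuracy': 'correct'}
```

Validating each label against `accepted_values` is a design choice of this sketch; a real deployment would also need to handle model replies that are not valid JSON, for example by retrying.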
We demonstrate the framework on the Math Problems dataset in a notebook that can be run on a single log and can also plot the quantified performance estimates for a set of problems.
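To summarize quantified results over a set of problems, for example before plotting them, one option is to map each criterion's ordinal labels to their positions and average them. The sketch below assumes each log has already been quantified into a `{criterion: label}` dict and that every criterion's accepted labels are ordered from worst to best; the `summarize` helper and the example criteria are hypothetical and are not taken from the notebook.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


def summarize(results: List[Dict[str, str]], accepted: Dict[str, List[str]]) -> Dict[str, float]:
    """Average each criterion over many logs; a label's score is its index (0 = worst)."""
    per_criterion: Dict[str, List[int]] = defaultdict(list)
    for scores in results:
        for name, label in scores.items():
            per_criterion[name].append(accepted[name].index(label))
    return {name: mean(values) for name, values in per_criterion.items()}


# Hypothetical criteria and two quantified logs.
accepted = {"accuracy": ["incorrect", "correct"], "clarity": ["poor", "fair", "good"]}
results = [
    {"accuracy": "correct", "clarity": "good"},
    {"accuracy": "incorrect", "clarity": "fair"},
]
print(summarize(results, accepted))  # {'accuracy': 0.5, 'clarity': 1.5}
```

Averaging label positions treats each scale as evenly spaced, which is a simplifying assumption; reporting the full label distribution per criterion avoids it.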
The model logs can be found in the agenteval branch.
This PR has added:
Related issue number
Checks