Adds the GAIA benchmark to the Testbed. This PR depends on #792 #810

Merged 22 commits on Dec 6, 2023

afourney
Member

Why are these changes needed?

This PR adds initial support for the GAIA benchmark to the Testbed.

Note: This PR depends on #792. Merge that one first.

Related issue number

N/A

Checks

@qingyun-wu
Contributor

Nice PR! I am reviewing 792 rn, and will move on to this PR later today or tmr.

@codecov-commenter

codecov-commenter commented Nov 30, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (73d7e92) 26.63% compared to head (08eec2b) 26.44%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #810      +/-   ##
==========================================
- Coverage   26.63%   26.44%   -0.19%     
==========================================
  Files          28       28              
  Lines        3725     3725              
  Branches      847      847              
==========================================
- Hits          992      985       -7     
- Misses       2660     2666       +6     
- Partials       73       74       +1     
Flag        Coverage Δ
unittests   26.44% <ø> (-0.14%) ⬇️

Flags with carried forward coverage won't be shown.

@sonichi sonichi requested a review from LeoLjl December 1, 2023 15:23
@sonichi sonichi requested a review from corbyrosset December 1, 2023 15:24
Contributor

@qingyun-wu qingyun-wu left a comment

LGTM! Following the practice here, should @LeoLjl move utils for AutoGPT to the utils folder? if so, @LeoLjl could you do that in a separate PR? Thank you!

Collaborator

@LeoLjl LeoLjl left a comment

LGTM

@LeoLjl
Collaborator

LeoLjl commented Dec 6, 2023

> LGTM! Following the practice here, should @LeoLjl move utils for AutoGPT to the utils folder? if so, @LeoLjl could you do that in a separate PR? Thank you!

@qingyun-wu I'll copy the utils to the utils folder. Will do it in a separate PR.

@qingyun-wu qingyun-wu enabled auto-merge December 6, 2023 01:41
@qingyun-wu qingyun-wu disabled auto-merge December 6, 2023 01:41
@qingyun-wu qingyun-wu enabled auto-merge December 6, 2023 01:42
@qingyun-wu qingyun-wu added this pull request to the merge queue Dec 6, 2023
Merged via the queue into main with commit f8b4b42 Dec 6, 2023
16 checks passed
@sonichi sonichi deleted the testbed_gaia branch December 7, 2023 00:28
rlam3 pushed a commit to rlam3/autogen that referenced this pull request Dec 19, 2023
… (microsoft#810)

* Re-added completion logging when using older versions of autogen.

* Extended scenario definitions and templating to include folders.

* Prepare collate_human_eval.py for working with group chat scenarios.

* Converted HumanEval to the folder-based approach, and added GroupChat scenarios.

* Fixed the default termination message.

* Fixed another termination condition.

* Updated compatible autogen versions.

* Added initial support for GAIA benchmark.

* Fixed a bug in executing the finalize scripts.

* Generalized the template further to support multiple folder copy operations.

* Refined GAIA support, and broke scenarios down by difficulty.

* Added some experimental scripts for computing metrics over GAIA. This is a first version, and will likely need refinement.

* Added instructions for cloning GAIA

* Updated README to fix some typos.

* Added a script to format GAIA results for the leaderboard.

* Update samples/tools/testbed/scenarios/GAIA/Templates/BasicTwoAgents/scenario.py

Co-authored-by: LeoLjl <[email protected]>

---------

Co-authored-by: Qingyun Wu <[email protected]>
Co-authored-by: LeoLjl <[email protected]>
@afourney afourney added the gaia label Jan 10, 2024
whiskyboy pushed a commit to whiskyboy/autogen that referenced this pull request Apr 17, 2024
… (microsoft#810)

Labels
proj-autogenbench Issues related to AutoGenBench.

4 participants