You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: website/blog/2024-01-25-AutoGenBench/index.mdx
+4-4
Original file line number
Diff line number
Diff line change
@@ -15,8 +15,8 @@ Today we are releasing AutoGenBench – a tool for evaluating AutoGen agents and
15
15
16
16
AutoGenBench is a standalone command line tool, installable from PyPI, which handles downloading, configuring, running, and reporting supported benchmarks. AutoGenBench works best when run alongside Docker, since it uses Docker to isolate tests from one another.
17
17
18
-
* See the [AutoGenBench README](https://github.com/microsoft/autogen/blob/main/samples/tools/testbed/README.md) for information on installation and running benchmarks.
19
-
* See the [AutoGenBench CONTRIBUTING guide](https://github.com/microsoft/autogen/blob/main/samples/tools/testbed/CONTRIBUTING.md) for information on developing or contributing benchmark datasets.
18
+
* See the [AutoGenBench README](https://github.com/microsoft/autogen/blob/main/samples/tools/autogenbench/README.md) for information on installation and running benchmarks.
19
+
* See the [AutoGenBench CONTRIBUTING guide](https://github.com/microsoft/autogen/blob/main/samples/tools/autogenbench/CONTRIBUTING.md) for information on developing or contributing benchmark datasets.
20
20
21
21
22
22
### Quick Start
@@ -110,7 +110,7 @@ Please do not cite these values in academic work without first inspecting and ve
110
110
111
111
From this output we can see the results of the three separate repetitions of each task, and final summary statistics of each run. In this case, the results were generated via GPT-4 (as defined in the OAI_CONFIG_LIST that was provided), and used the `TwoAgents` template. **It is important to remember that AutoGenBench evaluates *specific* end-to-end configurations of agents (as opposed to evaluating a model or cognitive framework more generally).**
112
112
113
-
Finally, complete execution traces and logs can be found in the `Results` folder. See the [AutoGenBench README](https://github.com/microsoft/autogen/blob/main/samples/tools/testbed/README.md) for more details about command-line options and output formats. Each of these commands also offers extensive in-line help via:
113
+
Finally, complete execution traces and logs can be found in the `Results` folder. See the [AutoGenBench README](https://github.com/microsoft/autogen/blob/main/samples/tools/autogenbench/README.md) for more details about command-line options and output formats. Each of these commands also offers extensive in-line help via:
114
114
115
115
-`autogenbench --help`
116
116
-`autogenbench clone --help`
@@ -128,4 +128,4 @@ While we are announcing AutoGenBench, we note that it is very much an evolving p
128
128
For an up to date tracking of our work items on this project, please see [AutoGenBench Work Items](https://github.com/microsoft/autogen/issues/973)
129
129
130
130
## Call for Participation
131
-
Finally, we want to end this blog post with an open call for contributions. AutoGenBench is still nascent, and has much opportunity for improvement. New benchmarks are constantly being published, and will need to be added. Everyone may have their own distinct set of metrics that they care most about optimizing, and these metrics should be onboarded. To this end, we welcome any and all contributions to this corner of the AutoGen project. If contributing is something that interests you, please see the [contributor’s guide](https://github.com/microsoft/autogen/blob/main/samples/tools/testbed/CONTRIBUTING.md) and join our [Discord](https://discord.gg/pAbnFJrkgZ) discussion in the [#autogenbench](https://discord.com/channels/1153072414184452236/1199851779328847902) channel!
131
+
Finally, we want to end this blog post with an open call for contributions. AutoGenBench is still nascent, and has much opportunity for improvement. New benchmarks are constantly being published, and will need to be added. Everyone may have their own distinct set of metrics that they care most about optimizing, and these metrics should be onboarded. To this end, we welcome any and all contributions to this corner of the AutoGen project. If contributing is something that interests you, please see the [contributor’s guide](https://github.com/microsoft/autogen/blob/main/samples/tools/autogenbench/CONTRIBUTING.md) and join our [Discord](https://discord.gg/pAbnFJrkgZ) discussion in the [#autogenbench](https://discord.com/channels/1153072414184452236/1199851779328847902) channel!
0 commit comments