Commit 45c2a78

afourney, LeoLjl, and qingyun-wu authored
Testbed folders (#792)
* Re-added completion logging when using older versions of autogen.
* Extended scenario definitions and templating to include folders.
* Prepare collate_human_eval.py for working with group chat scenarios.
* Converted HumanEval to the folder-based approach, and added GroupChat scenarios.
* Fixed the default termination message.
* Fixed another termination condition.
* Updated compatible autogen versions.
* Fixed a bug in executing the finalize scripts.
* Generalized the template further to support multiple folder copy operations.
* Add tests from AutoGPT.
* Update README.md
* Fix typo
* Update samples/tools/testbed/README.md

---------

Co-authored-by: LeoLjl <[email protected]>
Co-authored-by: Qingyun Wu <[email protected]>
1 parent ae7066b commit 45c2a78

37 files changed, +1048 -165 lines changed

samples/tools/testbed/README.md

+87-47
Original file line numberDiff line numberDiff line change
@@ -2,7 +2,7 @@
22

33
The Autogen Testbed environment is a tool for repeatedly running a set of pre-defined Autogen scenarios in a setting with tightly-controlled initial conditions. With each run, Autogen will start from a blank slate, working out what code needs to be written, and what libraries or dependencies to install. The results of each run are logged, and can be ingested by analysis or metrics scripts (see the HumanEval example later in this README). By default, all runs are conducted in freshly-initialized docker containers, providing the recommended level of consistency and safety.
44

5-
This Testbed sample has been tested in, and is known to work with, Autogen versions 0.1.14 and 0.2.0b5
5+
This Testbed sample has been tested in, and is known to work with, Autogen versions 0.1.14 and 0.2.0
66

77
## Setup
88

@@ -17,11 +17,10 @@ The Testbed also requires Docker (Desktop or Engine) AND the __python docker__ l
1717
## Running the Testbed
1818

1919
To run the Testbed, simply execute
20-
``python run_scenarios.py``
21-
22-
The default it to repeat this scenario 10 times. This can be costly. To run each scenario only once, use:
23-
``python run_scenarios.py --repeat 1``
20+
``python run_scenarios.py scenarios/Examples``
2421

22+
The default is to run each scenario once. To run each scenario 10 times, use:
23+
``python run_scenarios.py --repeat 10 scenarios/Examples``
2524

2625
The run_scenarios.py script also allows a number of command-line arguments to control various parameters of execution. Type ``python run_scenarios.py -h`` to explore these options:
2726

@@ -58,36 +57,37 @@ By default, the Testbed stores results in a folder hierarchy with the following
5857

5958
For example, consider the following folders:
6059

61-
``./results/default_two_agents/two_agent_stocks_gpt4/0``
62-
``./results/default_two_agents/two_agent_stocks_gpt4/1``
60+
``./results/default_two_agents_gpt35/two_agent_stocks/0``
61+
``./results/default_two_agents_gpt35/two_agent_stocks/1``
6362

6463
...
6564

66-
``./results/default_two_agents/two_agent_stocks_gpt4/9``
65+
``./results/default_two_agents_gpt35/two_agent_stocks/9``
6766

68-
This folder holds the results for the ``two_agent_stocks_gpt4`` instance of the ``default_two_agents`` scenario. The ``0`` folder contains the results of the first run. The ``1`` folder contains the results of the second run, and so on. You can think of the _instance_ as mapping to a prompt, or a unique set of parameters, while the _scenario_ defines the template in which those parameters are input.
67+
These folders hold the results for the ``two_agent_stocks`` instance of the ``default_two_agents_gpt35`` scenario. The ``0`` folder contains the results of the first run, the ``1`` folder contains the results of the second run, and so on. You can think of the _instance_ as mapping to a prompt, or a unique set of parameters, while the _scenario_ defines the template into which those parameters are inserted.
6968

7069
Within each folder, you will find the following files:
7170

7271
- *timestamp.txt*: records the date and time of the run, along with the version of the pyautogen library installed
7372
- *console_log.txt*: all console output produced by Docker when running autogen. Read this like you would a regular console.
74-
- *chat_completions.json*: a log of all OpenAI ChatCompletions, as logged by ``autogen.ChatCompletion.start_logging(compact=False)``
73+
- *chat_completions.json*: a log of all OpenAI ChatCompletions, as logged by `autogen.ChatCompletion.start_logging(compact=False)`
7574
- *[agent]_messages.json*: for each Agent, a log of its message dictionaries
7675
- *./coding*: A directory containing all code written by Autogen, and all artifacts produced by that code.
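
The folder layout described above can be walked with a short script. The following is a minimal sketch and not part of the Testbed: `iter_runs` is a hypothetical helper, and it assumes that repetition folders are named with integers, as in the example paths.

```python
from pathlib import Path


def iter_runs(results_dir):
    """Yield (scenario, instance_id, repetition, path) for each run folder.

    Assumes the layout results/[scenario]/[instance_id]/[repetition],
    with repetition folders named by integers (0, 1, 2, ...).
    """
    root = Path(results_dir)
    for scenario_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for instance_dir in sorted(p for p in scenario_dir.iterdir() if p.is_dir()):
            # Hypothetical filter: skip anything that is not an integer-named folder.
            reps = [p for p in instance_dir.iterdir() if p.is_dir() and p.name.isdigit()]
            # Sort numerically so that run 10 comes after run 9, not after run 1.
            for rep_dir in sorted(reps, key=lambda p: int(p.name)):
                yield scenario_dir.name, instance_dir.name, int(rep_dir.name), rep_dir
```

For instance, `for scenario, instance, rep, path in iter_runs("./results"): print(scenario, instance, rep)` would list every recorded run in order.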
7776

7877
## Scenario Templating
7978

80-
All scenarios are stored in JSONL files in the ``./scenarios'' directory. Each line of a scenario file is a JSON object with the following schema:
79+
All scenarios are stored in JSONL files (in subdirectories under `./scenarios`). Each line of a scenario file is a JSON object. The schema varies slightly depending on whether "template" specifies a _file_ or a _directory_.
8180

81+
If "template" points to a _file_, the format is:
8282
```
8383
{
8484
"id": string,
8585
"template": filename,
86-
"values" {
87-
"field_name1": string,
88-
"field_name2": string,
86+
"substitutions": {
87+
"find_string1": replace_string1,
88+
"find_string2": replace_string2,
8989
...
90-
"field_nameN": string
90+
"find_stringN": replace_stringN
9191
}
9292
}
9393
```
@@ -98,48 +98,88 @@ For example:
9898
{
9999
"id": "two_agent_stocks_gpt4",
100100
"template": "default_two_agents.py",
101-
"values": {
101+
"substitutions": {
102102
"__MODEL__": "gpt-4",
103103
"__PROMPT__": "Plot and save to disk a chart of NVDA and TESLA stock price YTD."
104104
}
105105
}
106106
```
107107

108-
Where the ``id`` is the instance id used when saving results, ``template`` points to a python file that contains the scenario logic, and ``values`` contains a set of strings to find and replace when expanding the template.
109108

110-
An example templated python file is:
109+
If "template" points to a _directory_, the format is:
111110

112111
```
113-
from autogen import AssistantAgent, UserProxyAgent, config_list_from_json
114-
import os
115-
import json
116-
import testbed_utils
117-
118-
testbed_utils.init()
119-
##############################
120-
121-
config_list = config_list_from_json(
122-
"OAI_CONFIG_LIST", filter_dict={"model": ["\__MODEL\__"]},
123-
)
124-
125-
assistant = AssistantAgent("assistant", llm_config={
126-
"request_timeout": 180,
127-
"config_list": config_list}
128-
)
129-
user_proxy = UserProxyAgent("user_proxy",
130-
human_input_mode="NEVER",
131-
code_execution_config={
132-
"work_dir": "coding",
133-
"use_docker": False,
134-
},
135-
max_consecutive_auto_reply=10)
136-
user_proxy.initiate_chat(assistant, message="\__PROMPT\__")
137-
138-
139-
##############################
140-
testbed_utils.finalize(assistant, user_proxy)
112+
{
113+
"id": string,
114+
"template": dirname,
115+
"substitutions": {
116+
"filename1": {
117+
"find_string1_1": replace_string1_1,
118+
"find_string1_2": replace_string1_2,
119+
...
120+
"find_string1_M": replace_string1_M
121+
},
122+
"filename2": {
123+
"find_string2_1": replace_string2_1,
124+
"find_string2_2": replace_string2_2,
125+
...
126+
"find_string2_N": replace_string2_N
127+
}
128+
}
129+
}
130+
```
131+
132+
For example:
133+
134+
```
135+
{
136+
"id": "two_agent_stocks_gpt4",
137+
"template": "default_two_agents",
138+
"substitutions": {
139+
"scenario.py": {
140+
"__MODEL__": "gpt-4"
141+
},
142+
"prompt.txt": {
143+
"__PROMPT__": "Plot and save to disk a chart of NVDA and TESLA stock price YTD."
144+
}
145+
}
146+
}
141147
```
142148

149+
In this example, the string `__MODEL__` will be replaced in the file `scenario.py`, while the string `__PROMPT__` will be replaced in the `prompt.txt` file.
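
To make the two schemas concrete, here is a minimal sketch of how such substitutions could be applied once the template has been copied. It is an illustration, not the code in `run_scenarios.py`: `apply_substitutions` and its parameters are hypothetical names, and plain string replacement is an assumption. For a file template, the sketch relies on the file having been renamed to `scenario.py`, as described in the expansion steps of this README.

```python
from pathlib import Path


def apply_substitutions(dest_folder, template_is_dir, substitutions):
    """Find-and-replace strings in already-copied scenario files.

    File template:      substitutions maps find_string -> replace_string,
                        applied to scenario.py.
    Directory template: substitutions maps filename -> {find: replace}.
    """
    dest = Path(dest_folder)
    if not template_is_dir:
        # Normalize the file form to the directory form.
        substitutions = {"scenario.py": substitutions}
    for filename, pairs in substitutions.items():
        path = dest / filename
        text = path.read_text()
        for find, replace in pairs.items():
            text = text.replace(find, replace)
        path.write_text(text)
```

With the directory example above, the call would be `apply_substitutions(dest, True, {"scenario.py": {"__MODEL__": "gpt-4"}, "prompt.txt": {"__PROMPT__": "..."}})`.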
150+
151+
152+
## Scenario Expansion Algorithm
153+
154+
When the Testbed runs a scenario, it creates a local folder to share with Docker. As noted above, each instance and repetition gets its own folder along the path: ``./results/[scenario]/[instance_id]/[repetition]``
155+
156+
For the sake of brevity we will refer to this folder as the `DEST_FOLDER`.
157+
158+
The algorithm for populating the `DEST_FOLDER` is as follows:
159+
160+
1. Recursively copy the contents of `./includes` to DEST_FOLDER. This folder contains all the basic starter files for running a scenario, including an ENV file which will set the Docker environment variables.
161+
2. Append the OAI_CONFIG_LIST to the ENV file so that autogen may access these secrets.
162+
3. Recursively copy the scenario folder (if `template` in the JSON scenario definition points to a folder) to DEST_FOLDER. If `template` instead points to a file, copy the file, but rename it to `scenario.py`.
163+
4. Apply any templating, as outlined in the prior section.
164+
5. Write a run.sh file to DEST_FOLDER that will be executed by Docker when it is loaded.
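
The copy steps (1 and 3) can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the Testbed's implementation: `copy_scenario_files` is a hypothetical helper, and steps 2, 4, and 5 are deliberately omitted because their exact formats are not described here.

```python
import shutil
from pathlib import Path


def copy_scenario_files(includes_dir, template, dest_folder):
    """Steps 1 and 3 only: copy the includes, then the scenario template.

    Requires Python 3.8+ for the dirs_exist_ok argument of shutil.copytree.
    """
    dest = Path(dest_folder)
    template = Path(template)
    # Step 1: recursively copy the starter files (creates dest if needed).
    shutil.copytree(includes_dir, dest, dirs_exist_ok=True)
    # Step 3: a folder template is copied recursively on top of the includes;
    # a file template is copied and renamed to scenario.py.
    if template.is_dir():
        shutil.copytree(template, dest, dirs_exist_ok=True)
    else:
        shutil.copyfile(template, dest / "scenario.py")
    return dest
```

Copying the template after the includes means that a scenario file with the same name as a starter file replaces it in this sketch; whether the Testbed relies on that ordering is not stated here.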
165+
166+
167+
## Scenario Execution Algorithm
168+
169+
Once the scenario has been expanded, it is run (via run.sh). This script executes the following steps:
170+
171+
1. Read and set the ENV environment variables
172+
2. If a file named `global_init.sh` is present, run it.
173+
3. If a file named `scenario_init.sh` is present, run it.
174+
4. Install the requirements file (if running in Docker)
175+
5. Run the Autogen scenario via `python scenario.py`
176+
6. Clean up (delete cache, etc.)
177+
7. If a file named `scenario_finalize.sh` is present, run it.
178+
8. If a file named `global_finalize.sh` is present, run it.
179+
9. echo "SCENARIO COMPLETE !#!#", signaling that all steps completed.
180+
181+
Notably, this means that scenarios can add custom init and teardown logic by including `scenario_init.sh` and `scenario_finalize.sh` files.
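
Because the final step echoes a fixed marker, a results script can use it to tell complete runs from interrupted ones. The sketch below is hypothetical and not part of the Testbed: `run_completed` is an illustrative name, and it assumes the echoed marker is captured in `console_log.txt` alongside the rest of the console output.

```python
from pathlib import Path

# Marker echoed by the last step of run.sh, as listed above.
COMPLETION_MARKER = "SCENARIO COMPLETE !#!#"


def run_completed(run_folder):
    """Return True if console_log.txt in run_folder contains the marker.

    Assumption: the marker reaches console_log.txt with the other output.
    """
    log_path = Path(run_folder) / "console_log.txt"
    if not log_path.is_file():
        return False
    # Tolerate stray bytes in the log rather than failing on decode errors.
    return COMPLETION_MARKER in log_path.read_text(errors="replace")
```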
182+
143183

144184
## (Example) Running HumanEval
145185

@@ -149,7 +189,7 @@ Accessing this scenario-type requires downloading and converting the HumanEval d
149189

150190
```
151191
python utils/download_humaneval.py
152-
python ./run_scenarios.py --repeat 3 scenarios/human_eval_two_agents_gpt35.jsonl
192+
python ./run_scenarios.py scenarios/HumanEval/human_eval_two_agents_gpt35.jsonl
153193
python utils/collate_human_eval.py ./results/human_eval_two_agents_gpt35 | python utils/metrics_human_eval.py > human_eval_results_gpt35.csv
154194
cat human_eval_results_gpt35.csv
155195
```
@@ -0,0 +1 @@
1+
# Global finalize.
@@ -0,0 +1 @@
1+
echo AUTOGEN_TESTBED_SETTING: [$AUTOGEN_TESTBED_SETTING]
@@ -1 +1 @@
1-
pyautogen
1+
git+https://github.com/microsoft/autogen.git
