* Re-added completion logging when using older versions of autogen.
* Extended scenario definitions and templating to include folders.
* Prepare collate_human_eval.py for working with group chat scenarios.
* Converted HumanEval to the folder-based approach, and added GroupChat scenarios.
* Fixed the default termination message.
* Fixed another termination condition.
* Updated compatible autogen versions.
* Fixed a bug in executing the finalize scripts.
* Generalized the template further to support multiple folder copy operations.
* Add tests from AutoGPT.
* Update README.md
* Fix typo
* Update samples/tools/testbed/README.md
---------
Co-authored-by: LeoLjl <[email protected]>
Co-authored-by: Qingyun Wu <[email protected]>
samples/tools/testbed/README.md (+87 −47)
The Autogen Testbed environment is a tool for repeatedly running a set of pre-defined Autogen scenarios in a setting with tightly-controlled initial conditions. With each run, Autogen will start from a blank slate, working out what code needs to be written, and what libraries or dependencies to install. The results of each run are logged, and can be ingested by analysis or metrics scripts (see the HumanEval example later in this README). By default, all runs are conducted in freshly-initialized docker containers, providing the recommended level of consistency and safety.

This Testbed sample has been tested in, and is known to work with, Autogen versions 0.1.14 and 0.2.0

## Setup
The Testbed also requires Docker (Desktop or Engine) AND the __python docker__ library.

## Running the Testbed

To run the Testbed, simply execute

``python run_scenarios.py scenarios/Examples``

The default is to run each scenario once. To run each scenario 10 times, use:

``python run_scenarios.py --repeat 10 scenarios/Examples``

The run_scenarios.py script also allows a number of command-line arguments to control various parameters of execution. Type ``python run_scenarios.py -h`` to explore these options:
By default, the Testbed stores results in a folder hierarchy of the form ``./results/[scenario]/[instance_id]/[repetition]``.

For example, the folder ``./results/default_two_agents_gpt35/two_agent_stocks/0`` holds the results for the ``two_agent_stocks`` instance of the ``default_two_agents_gpt35`` scenario. The ``0`` folder contains the results of the first run. The ``1`` folder contains the results of the second run, and so on. You can think of the _instance_ as mapping to a prompt, or a unique set of parameters, while the _scenario_ defines the template in which those parameters are input.

Within each folder, you will find the following files:

- *timestamp.txt*: records the date and time of the run, along with the version of the pyautogen library installed
- *console_log.txt*: all console output produced by Docker when running autogen. Read this like you would a regular console.
- *chat_completions.json*: a log of all OpenAI ChatCompletions, as logged by `autogen.ChatCompletion.start_logging(compact=False)`
- *[agent]_messages.json*: for each Agent, a log of their messages dictionaries
- *./coding*: A directory containing all code written by Autogen, and all artifacts produced by that code.
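For instance, a small script along the following lines could walk this results tree and summarize each run. This is only a minimal sketch, not a utility shipped with the Testbed; the `summarize_results` helper is hypothetical, and it reports only a rough count of the top-level entries logged in `chat_completions.json`.

```
import json
import os


def summarize_results(results_dir="./results"):
    """Walk ./results/[scenario]/[instance_id]/[repetition] and print basic info per run.

    Illustrative sketch only; not part of the Testbed itself.
    """
    for scenario in sorted(os.listdir(results_dir)):
        scenario_dir = os.path.join(results_dir, scenario)
        if not os.path.isdir(scenario_dir):
            continue
        for instance in sorted(os.listdir(scenario_dir)):
            instance_dir = os.path.join(scenario_dir, instance)
            if not os.path.isdir(instance_dir):
                continue
            for repetition in sorted(os.listdir(instance_dir)):
                run_dir = os.path.join(instance_dir, repetition)
                if not os.path.isdir(run_dir):
                    continue

                # timestamp.txt records when the run happened and the pyautogen version
                timestamp = ""
                timestamp_file = os.path.join(run_dir, "timestamp.txt")
                if os.path.isfile(timestamp_file):
                    with open(timestamp_file) as f:
                        timestamp = f.read().strip()

                # chat_completions.json holds the logged OpenAI ChatCompletions;
                # here we only report a rough count of top-level entries
                n_entries = 0
                completions_file = os.path.join(run_dir, "chat_completions.json")
                if os.path.isfile(completions_file):
                    with open(completions_file) as f:
                        n_entries = len(json.load(f))

                print(f"{scenario}/{instance}/{repetition}: {timestamp} ({n_entries} logged entries)")


if __name__ == "__main__":
    summarize_results()
```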
## Scenario Templating

All scenarios are stored in JSONL files (in subdirectories under `./scenarios`). Each line of a scenario file is a JSON object. The schema varies slightly based on whether "template" specifies a _file_ or a _directory_.

If "template" points to a _file_, the format is:

```
{
    "id": string,
    "template": filename,
    "substitutions": {
        "find_string1": replace_string1,
        "find_string2": replace_string2,
        ...
        "find_stringN": replace_stringN
    }
}
```
For example:

```
{
    "id": "two_agent_stocks_gpt4",
    "template": "default_two_agents.py",
    "substitutions": {
        "__MODEL__": "gpt-4",
        "__PROMPT__": "Plot and save to disk a chart of NVDA and TESLA stock price YTD."
    }
}
```
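Because each scenario is a single JSON object per line, scenario files are easy to generate programmatically. The sketch below is illustrative only: the output filename `my_scenarios.jsonl` and the prompt variations are made up, but each emitted line follows the file-based schema above.

```
import json

# Hypothetical prompt variations for which we want scenario instances
prompts = {
    "two_agent_stocks_gpt4": "Plot and save to disk a chart of NVDA and TESLA stock price YTD.",
    "two_agent_weather_gpt4": "Plot and save to disk a chart of this week's forecast high temperatures.",
}

# Append one JSON object per line, following the file-based template schema above
with open("my_scenarios.jsonl", "w") as f:
    for instance_id, prompt in prompts.items():
        line = {
            "id": instance_id,
            "template": "default_two_agents.py",
            "substitutions": {
                "__MODEL__": "gpt-4",
                "__PROMPT__": prompt,
            },
        }
        f.write(json.dumps(line) + "\n")
```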
If "template" points to a _directory_, the format is:

```
{
    "id": string,
    "template": dirname,
    "substitutions": {
        "filename1": {
            "find_string1_1": replace_string1_1,
            ...
            "find_string1_N": replace_string1_N
        },
        "filename2": {
            "find_string2_1": replace_string2_1,
            ...
            "find_string2_N": replace_string2_N
        }
    }
}
```

For example:

```
{
    "id": "two_agent_stocks",
    "template": "default_two_agents",
    "substitutions": {
        "scenarios.py": {
            "__MODEL__": "gpt-4"
        },
        "prompt.txt": {
            "__PROMPT__": "Plot and save to disk a chart of NVDA and TESLA stock price YTD."
        }
    }
}
```

In this example, the string `__MODEL__` will be replaced in the file `scenarios.py`, while the string `__PROMPT__` will be replaced in the `prompt.txt` file.
## Scenario Expansion Algorithm

When the Testbed runs a scenario, it creates a local folder to share with Docker. As noted above, each instance and repetition gets its own folder along the path: ``./results/[scenario]/[instance_id]/[repetition]``

For the sake of brevity we will refer to this folder as the `DEST_FOLDER`.

The algorithm for populating the `DEST_FOLDER` is as follows:

1. Recursively copy the contents of `./includes` to DEST_FOLDER. This folder contains all the basic starter files for running a scenario, including an ENV file which will set the Docker environment variables.
2. Append the OAI_CONFIG_LIST to the ENV file so that autogen may access these secrets.
3. Recursively copy the scenario folder (if `template` in the json scenario definition points to a folder) to DEST_FOLDER. If the `template` instead points to a file, copy the file, but rename it to `scenario.py`.
4. Apply any templating, as outlined in the prior section.
5. Write a run.sh file to DEST_FOLDER that will be executed by Docker when it is loaded.
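The sketch below illustrates these five steps in rough Python. It is not the actual run_scenarios.py code: the `expand_scenario` helper, the ENV line format, and the file-style-only substitution handling are assumptions made for clarity.

```
import os
import shutil


def expand_scenario(scenario, dest_folder, config_list_json):
    """Rough illustration of the expansion steps; not the actual run_scenarios.py internals."""
    # 1. Start from the shared boilerplate in ./includes (ENV file, helper scripts, etc.)
    shutil.copytree("./includes", dest_folder, dirs_exist_ok=True)

    # 2. Append the OAI_CONFIG_LIST to the ENV file so the scenario can read the secrets
    #    (the exact line format here is assumed, not taken from the Testbed)
    with open(os.path.join(dest_folder, "ENV"), "a") as env_file:
        env_file.write(f"export OAI_CONFIG_LIST='{config_list_json}'\n")

    # 3. Copy the template: a folder is copied recursively, a file becomes scenario.py
    template = scenario["template"]
    if os.path.isdir(template):
        shutil.copytree(template, dest_folder, dirs_exist_ok=True)
    else:
        shutil.copyfile(template, os.path.join(dest_folder, "scenario.py"))

    # 4. Apply the substitutions from the scenario definition (file-style case shown here)
    scenario_py = os.path.join(dest_folder, "scenario.py")
    if os.path.isfile(scenario_py):
        with open(scenario_py) as f:
            content = f.read()
        for find, replace in scenario.get("substitutions", {}).items():
            content = content.replace(find, replace)
        with open(scenario_py, "w") as f:
            f.write(content)

    # 5. Write the run.sh that Docker will execute (see the next section)
    with open(os.path.join(dest_folder, "run.sh"), "w") as f:
        f.write("# ... execution steps go here ...\n")
```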
## Scenario Execution Algorithm

Once the scenario has been expanded it is run (via run.sh). This script will execute the following steps:

1. Read and set the ENV environment variables
2. If a file named `global_init.sh` is present, run it.
3. If a file named `scenario_init.sh` is present, run it.
4. Install the requirements file (if running in Docker)
5. Run the Autogen scenario via `python scenario.py`
6. Clean up (delete cache, etc.)
7. If a file named `scenario_finalize.sh` is present, run it.
8. If a file named `global_finalize.sh` is present, run it.
9. echo "SCENARIO COMPLETE !#!#", signaling that all steps completed.

Notably, this means that scenarios can add custom init and teardown logic by including `scenario_init.sh` and `scenario_finalize.sh` files.
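To make the sequence concrete, the snippet below assembles a run.sh that mirrors the nine steps above. It is an illustrative sketch only, not the script the Testbed actually generates; the cleanup targets and the hook-invocation style are assumptions.

```
# Illustrative only: assemble a run.sh mirroring the execution steps above.
RUN_SH_TEMPLATE = """#!/bin/bash
# 1. Read and set the ENV environment variables
. ./ENV

# 2-3. Optional global and per-scenario init hooks
[ -f global_init.sh ] && . ./global_init.sh
[ -f scenario_init.sh ] && . ./scenario_init.sh

# 4. Install requirements (when running in Docker)
[ -f requirements.txt ] && pip install -r requirements.txt

# 5. Run the Autogen scenario
python scenario.py

# 6. Clean up caches and other transient files
rm -rf __pycache__ .cache

# 7-8. Optional per-scenario and global teardown hooks
[ -f scenario_finalize.sh ] && . ./scenario_finalize.sh
[ -f global_finalize.sh ] && . ./global_finalize.sh

# 9. Signal that every step completed
echo "SCENARIO COMPLETE !#!#"
"""

with open("run.sh", "w") as f:
    f.write(RUN_SH_TEMPLATE)
```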
## (Example) Running HumanEval
Accessing this scenario-type requires downloading and converting the HumanEval dataset...