
Added a simple Testbed tool for repeatedly running templated Autogen scenarios with tightly-controlled initial conditions. #455

Merged: 17 commits, Nov 4, 2023
134 changes: 134 additions & 0 deletions samples/tools/testbed/README.md
# Autogen Testbed Environment

The Autogen Testbed environment is a tool for repeatedly running a set of pre-defined Autogen scenarios in a setting with tightly-controlled initial conditions. With each run, Autogen will start from a blank slate, working out what code needs to be written, and what libraries or dependencies to install. The results of each run are logged, and can be ingested by analysis or metrics scripts. By default, all runs are conducted in freshly-initialized docker containers, providing the recommended level of consistency and safety.

## Setup

Before you begin, you must configure your API keys for use with the Testbed. These keys extend beyond those typically found in an OAI_CONFIG_LIST, and can include such things as keys for the Bing Search API or other services used by the scenarios. There is an example ENV file in ``includes/ENV.example``. To get started:

``cp includes/ENV.example includes/ENV``

Then edit ``includes/ENV`` as needed.

The Testbed also requires installation of the Python __docker__ library:

``pip install docker``

## Running the Testbed

To run the Testbed, simply execute
``python run_scenarios.py``

The default is to repeat each scenario 10 times, which can be costly. To run each scenario only once, use:
``python run_scenarios.py --repeat 1``


The run_scenarios.py script also allows a number of command-line arguments to control various parameters of execution. Type ``python run_scenarios.py -h`` to explore these options:

```
run_scenarios.py will run the specified autogen scenarios for a given number of repetitions and record all logs and trace information. When running in a Docker environment (default), each run will begin from a common, tightly controlled, environment. The resultant logs can then be further processed by other scripts to produce metrics.

positional arguments:
  scenario              The JSONL scenario file to run. If a directory is
                        specified, then all JSONL scenarios in the directory
                        are run. (default: ./scenarios)

options:
  -h, --help            show this help message and exit

  -r REPEAT, --repeat REPEAT
                        The number of repetitions to run for each scenario
                        (default: 10).

  --native              Run the scenarios natively rather than in docker.
                        NOTE: This is not advisable, and should be done with
                        great caution.
```
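
For example, to run every scenario in the default ``./scenarios`` directory exactly once, combining the options documented above:

``python run_scenarios.py --repeat 1 ./scenarios``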

## Results

By default, the Testbed stores results in a folder hierarchy with the following template:

``./results/[scenario]/[instance_id]/[repetition]``

For example, consider the following folders:

``./results/default_two_agents/two_agent_stocks_gpt4/0``
``./results/default_two_agents/two_agent_stocks_gpt4/1``

...

``./results/default_two_agents/two_agent_stocks_gpt4/9``

This folder holds the results for the ``two_agent_stocks_gpt4`` instance of the ``default_two_agents`` scenario. The ``0`` folder contains the results of the first run. The ``1`` folder contains the results of the second run, and so on. You can think of the _instance_ as mapping to a prompt, or a unique set of parameters, while the _scenario_ defines the template into which those parameters are substituted.
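
As a quick sanity check, a short script can tally how many repetitions completed for each instance. This is a minimal sketch, not part of the Testbed, assuming only the folder layout described above:

```
import os

# Walk ./results/[scenario]/[instance_id]/[repetition] and count runs.
results_dir = "results"
for scenario in sorted(os.listdir(results_dir)):
    scenario_dir = os.path.join(results_dir, scenario)
    if not os.path.isdir(scenario_dir):
        continue
    for instance in sorted(os.listdir(scenario_dir)):
        instance_dir = os.path.join(scenario_dir, instance)
        if not os.path.isdir(instance_dir):
            continue
        repetitions = [r for r in os.listdir(instance_dir) if os.path.isdir(os.path.join(instance_dir, r))]
        print(scenario + "/" + instance + ": " + str(len(repetitions)) + " runs")
```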

Within each folder, you will find the following files:

- *timestamp.txt*: records the date and time of the run, along with the version of the pyautogen library installed
- *console_log.txt*: all console output produced by Docker when running autogen. Read this like you would a regular console.
- *chat_completions.json*: a log of all OpenAI ChatCompletions, as logged by ``autogen.ChatCompletion.start_logging(compact=False)``
- *[agent]_messages.json*: for each agent, a log of its message dictionaries
- *./coding*: a directory containing all code written by Autogen, and all artifacts produced by that code.
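
As a starting point for analysis, a metrics script might load one run's ``chat_completions.json``. A hedged sketch follows; the path is one of the example folders above, and the log's internal structure is whatever ``autogen.ChatCompletion.logged_history`` serializes to:

```
import json

# Load the ChatCompletion log from a single repetition of a single instance.
log_path = "results/default_two_agents/two_agent_stocks_gpt4/0/chat_completions.json"
with open(log_path) as fh:
    completions = json.load(fh)

# The log is a JSON object; report how many top-level entries it contains.
print(str(len(completions)) + " logged entries")
```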

## Scenario Templating

All scenarios are stored in JSONL files in the ``./scenarios`` directory. Each line of a scenario file is a JSON object with the following schema:

```
{
    "id": string,
    "template": filename,
    "values": {
        "field_name1": string,
        "field_name2": string,
        ...
        "field_nameN": string
    }
}
```

For example:

```
{
    "id": "two_agent_stocks_gpt4",
    "template": "default_two_agents.py",
    "values": {
        "__MODEL__": "gpt-4",
        "__PROMPT__": "Plot and save to disk a chart of NVDA and TESLA stock price YTD."
    }
}
```

Here, ``id`` is the instance id used when saving results, ``template`` points to a Python file that contains the scenario logic, and ``values`` contains a set of strings to find and replace when expanding the template.
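
Conceptually, expansion is a verbatim find-and-replace over the template's source text. Below is a minimal sketch of that behavior; the actual ``run_scenarios.py`` may differ in its details, and the scenario filename used here is hypothetical:

```
import json

def expand_template(template_path, values):
    # Read the template and replace each field (e.g. __MODEL__) verbatim.
    with open(template_path) as fh:
        script = fh.read()
    for field, replacement in values.items():
        script = script.replace(field, replacement)
    return script

# Each line of a scenario JSONL file describes one instance.
with open("scenarios/default_two_agents.jsonl") as fh:
    for line in fh:
        instance = json.loads(line)
        expanded = expand_template(instance["template"], instance["values"])
```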

An example templated Python file is:

```
from autogen import AssistantAgent, UserProxyAgent, config_list_from_json
import os
import json
import testbed_utils

testbed_utils.init()
##############################

config_list = config_list_from_json(
    "OAI_CONFIG_LIST",
    filter_dict={"model": ["__MODEL__"]},
)

assistant = AssistantAgent(
    "assistant",
    llm_config={
        "request_timeout": 180,
        "config_list": config_list,
    },
)
user_proxy = UserProxyAgent(
    "user_proxy",
    human_input_mode="NEVER",
    code_execution_config={
        "work_dir": "coding",
        "use_docker": False,
    },
    max_consecutive_auto_reply=10,
)
user_proxy.initiate_chat(assistant, message="__PROMPT__")


##############################
testbed_utils.finalize(agents=[assistant, user_proxy])
```
15 changes: 15 additions & 0 deletions samples/tools/testbed/includes/ENV.example
export BING_API_KEY=
export OAI_CONFIG_LIST='
[
    {
        "model": "gpt-4",
        "api_key": "",
        "organization": ""
    },
    {
        "model": "gpt-3.5-turbo-16k",
        "api_key": "",
        "organization": ""
    }
]
'
56 changes: 56 additions & 0 deletions samples/tools/testbed/includes/testbed_utils.py
from importlib.metadata import version as lib_version
from datetime import datetime
import os
import autogen
import json


def init():
    """Helper function to initialize logging in a testbed scenario.
    Specifically, write timestamp and version information, then
    initialize autogen logging.

    Args:
        None

    Returns:
        None
    """

    autogen.ChatCompletion.start_logging(compact=False)

    # Print some information about the run
    with open("timestamp.txt", "wt") as f:
        f.write("Timestamp: " + datetime.now().isoformat() + "\n")
        f.write("pyautogen version: " + lib_version("pyautogen") + "\n")


def finalize(agents):
    """Helper function to finalize logging in a testbed scenario.
    Calling this function will save all the chat completions logged
    by Autogen to disk, and will save the messages dictionaries of
    all agents passed via the agents argument.

    Args:
        agents (list): a list of the agents whose messages will be logged to disk.

    Returns:
        None
    """

    script_dir = os.path.dirname(os.path.realpath(__file__))

    with open(os.path.join(script_dir, "chat_completions.json"), "wt") as fh:
        fh.write(json.dumps(autogen.ChatCompletion.logged_history, indent=4))
    autogen.ChatCompletion.stop_logging()

    def messages_to_json(agent):
        # Serialize an agent's chat_messages, keyed by each counterpart agent's name.
        messages = dict()
        for item in agent.chat_messages.items():
            messages[item[0].name] = item[1]
        return json.dumps(messages, indent=4)

    for agent in agents:
        fname = agent.name + "_messages.json"
        with open(os.path.join(script_dir, fname), "wt") as fh:
            fh.write(messages_to_json(agent))