
Added a simple Testbed tool for repeatedly running templated Autogen scenarios with tightly-controlled initial conditions. #455

Merged: 17 commits, Nov 4, 2023
134 changes: 134 additions & 0 deletions samples/tools/testbed/README.md
# Autogen Testbed Environment

The Autogen Testbed environment is a tool for repeatedly running a set of pre-defined Autogen scenarios in a setting with tightly-controlled initial conditions. With each run, Autogen will start from a blank slate, working out what code needs to be written, and what libraries or dependencies to install. The results of each run are logged, and can be ingested by analysis or metrics scripts. By default, all runs are conducted in freshly-initialized docker containers, providing the recommended level of consistency and safety.

## Setup

Before you begin, you must configure your API keys for use with the Testbed. These keys extend beyond those typically found in an OAI_CONFIG_LIST, and can include such things as keys for the Bing Search API or other services used by the scenarios. There is an example ENV file in ``includes/ENV.example``. To get started:

``cp includes/ENV.example includes/ENV``

Then edit ``includes/ENV`` as needed.

The Testbed also requires installation of the Python __docker__ library:

``pip install docker``

## Running the Testbed

To run the Testbed, simply execute
``python run_scenarios.py``

The default is to repeat each scenario 10 times, which can be costly. To run each scenario only once, use:
``python run_scenarios.py --repeat 1``


The run_scenarios.py script also allows a number of command-line arguments to control various parameters of execution. Type ``python run_scenarios.py -h`` to explore these options:

```
run_scenarios.py will run the specified autogen scenarios for a given number of repetitions and record all logs and trace information. When running in a Docker environment (default), each run will begin from a common, tightly controlled, environment. The resultant logs can then be further processed by other scripts to produce metrics.

positional arguments:
  scenario              The JSONL scenario file to run. If a directory is
                        specified, then all JSONL scenarios in the directory
                        are run. (default: ./scenarios)

options:
  -h, --help            show this help message and exit

  -r REPEAT, --repeat REPEAT
                        The number of repetitions to run for each scenario
                        (default: 10).

  --native              Run the scenarios natively rather than in docker.
                        NOTE: This is not advisable, and should be done with
                        great caution.
```
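
For example, to run every scenario in the default ``./scenarios`` directory exactly once, combining the options documented above:

``python run_scenarios.py --repeat 1 ./scenarios``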

## Results

By default, the Testbed stores results in a folder hierarchy with the following template:

``./results/[scenario]/[instance_id]/[repetition]``

For example, consider the following folders:

``./results/default_two_agents/two_agent_stocks_gpt4/0``
``./results/default_two_agents/two_agent_stocks_gpt4/1``

...

``./results/default_two_agents/two_agent_stocks_gpt4/9``

This folder holds the results for the ``two_agent_stocks_gpt4`` instance of the ``default_two_agents`` scenario. The ``0`` folder contains the results of the first run. The ``1`` folder contains the results of the second run, and so on. You can think of the _instance_ as mapping to a prompt, or a unique set of parameters, while the _scenario_ defines the template into which those parameters are substituted.
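
As a quick sanity check, a short script can tally how many repetitions completed for each instance. This is a minimal sketch, not part of the Testbed, assuming only the folder layout described above:

```
import os

# Walk ./results/[scenario]/[instance_id]/[repetition] and count runs.
results_dir = "results"
for scenario in sorted(os.listdir(results_dir)):
    scenario_dir = os.path.join(results_dir, scenario)
    if not os.path.isdir(scenario_dir):
        continue
    for instance in sorted(os.listdir(scenario_dir)):
        instance_dir = os.path.join(scenario_dir, instance)
        if not os.path.isdir(instance_dir):
            continue
        repetitions = [r for r in os.listdir(instance_dir) if os.path.isdir(os.path.join(instance_dir, r))]
        print(scenario + "/" + instance + ": " + str(len(repetitions)) + " runs")
```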

Within each folder, you will find the following files:

- *timestamp.txt*: records the date and time of the run, along with the version of the pyautogen library installed
- *console_log.txt*: all console output produced by Docker when running autogen. Read this like you would a regular console.
- *chat_completions.json*: a log of all OpenAI ChatCompletions, as logged by ``autogen.ChatCompletion.start_logging(compact=False)``
- *[agent]_messages.json*: for each agent, a log of its message dictionaries
- *./coding*: a directory containing all code written by Autogen, and all artifacts produced by that code.
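
As a starting point for analysis, a metrics script might load one run's ``chat_completions.json``. A hedged sketch follows; the path is one of the example folders above, and the log's internal structure is whatever ``autogen.ChatCompletion.logged_history`` serializes to:

```
import json

# Load the ChatCompletion log from a single repetition of a single instance.
log_path = "results/default_two_agents/two_agent_stocks_gpt4/0/chat_completions.json"
with open(log_path) as fh:
    completions = json.load(fh)

# The log is a JSON object; report how many top-level entries it contains.
print(str(len(completions)) + " logged entries")
```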

## Scenario Templating

All scenarios are stored in JSONL files in the ``./scenarios`` directory. Each line of a scenario file is a JSON object with the following schema:

```
{
    "id": string,
    "template": filename,
    "values": {
        "field_name1": string,
        "field_name2": string,
        ...
        "field_nameN": string
    }
}
```

For example:

```
{
    "id": "two_agent_stocks_gpt4",
    "template": "default_two_agents.py",
    "values": {
        "__MODEL__": "gpt-4",
        "__PROMPT__": "Plot and save to disk a chart of NVDA and TESLA stock price YTD."
    }
}
```

Here, ``id`` is the instance id used when saving results, ``template`` points to a Python file that contains the scenario logic, and ``values`` contains a set of strings to find and replace when expanding the template.
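
Conceptually, expansion is a verbatim find-and-replace over the template's source text. Below is a minimal sketch of that behavior; the actual ``run_scenarios.py`` may differ in its details, and the scenario filename used here is hypothetical:

```
import json

def expand_template(template_path, values):
    # Read the template and replace each field (e.g. __MODEL__) verbatim.
    with open(template_path) as fh:
        script = fh.read()
    for field, replacement in values.items():
        script = script.replace(field, replacement)
    return script

# Each line of a scenario JSONL file describes one instance.
with open("scenarios/default_two_agents.jsonl") as fh:
    for line in fh:
        instance = json.loads(line)
        expanded = expand_template(instance["template"], instance["values"])
```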

An example templated Python file is:

```
from autogen import AssistantAgent, UserProxyAgent, config_list_from_json
import os
import json
import testbed_utils

testbed_utils.init()
##############################

config_list = config_list_from_json(
    "OAI_CONFIG_LIST",
    filter_dict={"model": ["__MODEL__"]},
)

assistant = AssistantAgent(
    "assistant",
    llm_config={
        "request_timeout": 180,
        "config_list": config_list,
    },
)
user_proxy = UserProxyAgent(
    "user_proxy",
    human_input_mode="NEVER",
    code_execution_config={
        "work_dir": "coding",
        "use_docker": False,
    },
    max_consecutive_auto_reply=10,
)
user_proxy.initiate_chat(assistant, message="__PROMPT__")


##############################
testbed_utils.finalize(agents=[assistant, user_proxy])
```
15 changes: 15 additions & 0 deletions samples/tools/testbed/includes/ENV.example
export BING_API_KEY=
export OAI_CONFIG_LIST='
[
    {
        "model": "gpt-4",
        "api_key": "",
        "organization": ""
    },
    {
        "model": "gpt-3.5-turbo-16k",
        "api_key": "",
        "organization": ""
    }
]
'
56 changes: 56 additions & 0 deletions samples/tools/testbed/includes/testbed_utils.py
from importlib.metadata import version as lib_version
from datetime import datetime
import os
import autogen
import json


def init():
    """Helper function to initialize logging in a testbed scenario.
    Specifically, write timestamp and version information, then
    initialize autogen logging.

    Args:
        None

    Returns:
        None
    """

    autogen.ChatCompletion.start_logging(compact=False)

    # Print some information about the run
    with open("timestamp.txt", "wt") as f:
        f.write("Timestamp: " + datetime.now().isoformat() + "\n")
        f.write("pyautogen version: " + lib_version("pyautogen") + "\n")


def finalize(agents):
    """Helper function to finalize logging in a testbed scenario.
    Calling this function will save all the chat completions logged
    by Autogen to disk, and will save the messages dictionaries of
    all agents passed via the agents argument.

    Args:
        agents (list): a list of the agents whose messages will be logged to disk.

    Returns:
        None
    """

    script_dir = os.path.dirname(os.path.realpath(__file__))

    with open(os.path.join(script_dir, "chat_completions.json"), "wt") as fh:
        fh.write(json.dumps(autogen.ChatCompletion.logged_history, indent=4))
    autogen.ChatCompletion.stop_logging()

    def messages_to_json(agent):
        # Serialize an agent's chat_messages, keyed by each counterpart agent's name.
        messages = dict()
        for item in agent.chat_messages.items():
            messages[item[0].name] = item[1]
        return json.dumps(messages, indent=4)

    for agent in agents:
        fname = agent.name + "_messages.json"
        with open(os.path.join(script_dir, fname), "wt") as fh:
            fh.write(messages_to_json(agent))