Full prompt test cases #1354

Closed
50 commits
8525df6
Added initial prompt test
josephcmiller2 Apr 13, 2023
30f1ad5
Add prompt to weather test
josephcmiller2 Apr 13, 2023
61cf240
Updated test to produce a more consistent result
josephcmiller2 Apr 13, 2023
dc0339a
Add test prompt that includes filtering weather
josephcmiller2 Apr 13, 2023
1550d20
Most weather sites have the weather in pictures and not in text so it…
josephcmiller2 Apr 13, 2023
dccd193
updated test info
josephcmiller2 Apr 13, 2023
030955f
Added movie test case
josephcmiller2 Apr 13, 2023
4419b78
Added movie tests
josephcmiller2 Apr 13, 2023
7fb0dd4
Add a test that combines multiple instructions in a single goal
josephcmiller2 Apr 13, 2023
926812f
Merge branch 'master' into test-cases
josephcmiller2 Apr 13, 2023
274461a
Make the naming scheme more consistent
josephcmiller2 Apr 13, 2023
3b6d349
Add test configs for each case
josephcmiller2 Apr 13, 2023
6ae882c
Add test that must find movies and their ratings for additional compl…
josephcmiller2 Apr 13, 2023
ba3fc6c
Add more complexity by finding the times movies are playing on Friday
josephcmiller2 Apr 14, 2023
ae81698
Clarify the times are on Friday
josephcmiller2 Apr 14, 2023
f3940d1
Clarify the movies are in Denver
josephcmiller2 Apr 14, 2023
4d1740d
Removed test that is unreliable
josephcmiller2 Apr 14, 2023
520fbdf
Merge branch 'master' into test-cases
josephcmiller2 Apr 14, 2023
dcb1a5a
Initial test script
josephcmiller2 Apr 14, 2023
554adee
Add some helpful output
josephcmiller2 Apr 14, 2023
88d543c
Don't chdir before executing
josephcmiller2 Apr 14, 2023
c6ab18e
Fix invalid argument in test configs
josephcmiller2 Apr 14, 2023
a744d0a
Updated script to copy the ai_settings.yaml file and include some min…
josephcmiller2 Apr 14, 2023
7e89e16
Correct config typo in filename
josephcmiller2 Apr 14, 2023
28f7440
Adjust the argument requirements behavior
josephcmiller2 Apr 14, 2023
a9bf039
Add a README
josephcmiller2 Apr 14, 2023
63cb785
Update code formatting in README
josephcmiller2 Apr 14, 2023
4309fa1
Update code formatting in README
josephcmiller2 Apr 14, 2023
f501c4e
Update usage
josephcmiller2 Apr 14, 2023
235e77d
Add a check for the existence of the output files
josephcmiller2 Apr 14, 2023
c14e2f2
Add comments and check for empty output files
josephcmiller2 Apr 14, 2023
ce52712
Update test args to use --skip-reprompt and extend the number of runs…
josephcmiller2 Apr 14, 2023
d8b4cec
Merge branch 'master' into test-cases
josephcmiller2 Apr 14, 2023
85270d4
Only run tests on subdirectories
josephcmiller2 Apr 14, 2023
b4b0a27
Fix path
josephcmiller2 Apr 14, 2023
927ff7d
Merge branch 'master' into test-cases
josephcmiller2 Apr 14, 2023
d6ac959
Merge branch 'master' into test-cases
josephcmiller2 Apr 14, 2023
fc751d0
Only use subdirectories when listing tests
josephcmiller2 Apr 14, 2023
c361c56
Update test names so --list gives meaningful output
josephcmiller2 Apr 14, 2023
66c3440
Update execution method
josephcmiller2 Apr 14, 2023
89ec9a7
Add TODO for future enhancements
josephcmiller2 Apr 14, 2023
00b94dc
Merge branch 'master' into test-cases
josephcmiller2 Apr 14, 2023
751f5e3
Merge branch 'master' into test-cases
josephcmiller2 Apr 15, 2023
896d7bc
Merge branch 'master' into test-cases
josephcmiller2 Apr 19, 2023
82e48d1
Don't include example test outputs
josephcmiller2 Apr 19, 2023
7cf8d00
Use --ai-settings instead of copying the files
josephcmiller2 Apr 19, 2023
b81becc
Fix typo
josephcmiller2 Apr 19, 2023
2deb159
Removed unused import and sorted imports as required
josephcmiller2 Apr 19, 2023
1f7d694
Fix formatting as required
josephcmiller2 Apr 19, 2023
471ef84
Merge branch 'master' into test-cases
k-boikov May 21, 2023
20 changes: 20 additions & 0 deletions tests/prompt_tests/000001-weather-denver/test.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
{
"name": "Denver Weather Next Week",
"description": "Get the weather over the next week in Denver, CO. The information returned is not likely to be accurate, however, the task is completed.",
"yaml_prompt": "weather-denver-txt.yaml",
"output_files": [
"weather-denver.txt"
],
"exec": {
"command": "scripts/main.py",
"arguments": [
"--continuous",
"--continuous-limit",
"20",
"--skip-reprompt"
],
"env": {
"TEMPERATURE": "0"
}
}
}
@@ -0,0 +1,5 @@
ai_goals:
- Get the weather forecast in Denver, CO for each day over the next week.
- Output the information to a file called weather-denver.txt
ai_name: WeatherGPT
ai_role: an AI designed to get weather conditions in Denver, CO.
6 changes: 6 additions & 0 deletions tests/prompt_tests/000002-movie-denver/movie-denver.yaml
@@ -0,0 +1,6 @@
ai_goals:
- Select a movie theater in Denver, CO
- Select a movie currently playing in that theater
- Output the movie title and a short description to 000002-movie-denver.txt
ai_name: MovieGPT
ai_role: an AI to find a movie playing in a theater in Denver, CO.
21 changes: 21 additions & 0 deletions tests/prompt_tests/000002-movie-denver/test.json
@@ -0,0 +1,21 @@
{
"name": "Denver Movie Playing",
"description": "Get a movie playing at a theater and save to a text file.",
"yaml_prompt": "movie-denver.yaml",
"output_files": [
"000002-movie-denver.txt"
],
"exec": {
"command": "scripts/main.py",
"arguments": [
"--continuous",
"--continuous-limit",
"30",
"--skip-reprompt"
],
"env": {
"TEMPERATURE": "0",
"EXECUTE_LOCAL_COMMANDS": "True"
}
}
}
6 changes: 6 additions & 0 deletions tests/prompt_tests/000003-movies-denver/movies-denver.yaml
@@ -0,0 +1,6 @@
ai_goals:
- Select a movie theater in Denver, CO.
- Get a list of movies that are currently playing.
- Output each movie title to 000003-movie-denver.txt
ai_name: MovieGPT
ai_role: an AI to find a movie playing in a theater in Denver, CO.
21 changes: 21 additions & 0 deletions tests/prompt_tests/000003-movies-denver/test.json
@@ -0,0 +1,21 @@
{
"name": "List Movies in a Denver Theater",
"description": "Get a list of movies playing at a theater and save to a text file.",
"yaml_prompt": "movies-denver.yaml",
"output_files": [
"000003-movie-denver.txt"
],
"exec": {
"command": "scripts/main.py",
"arguments": [
"--continuous",
"--continuous-limit",
"30",
"--skip-reprompt"
],
"env": {
"TEMPERATURE": "0",
"EXECUTE_LOCAL_COMMANDS": "True"
}
}
}
7 changes: 7 additions & 0 deletions tests/prompt_tests/000004-movies-denver/movies-denver.yaml
@@ -0,0 +1,7 @@
ai_goals:
- Select a movie theater in Denver, CO.
- Get a list of movies that are currently playing.
- Output each movie title to 000004-movie-denver.txt
- After adding the movie titles, then add the name of the movie theater at the top of the file.
ai_name: MovieGPT
ai_role: an AI to find a movie playing in a theater in Denver, CO.
21 changes: 21 additions & 0 deletions tests/prompt_tests/000004-movies-denver/test.json
@@ -0,0 +1,21 @@
{
"name": "List Movies and Theater Name in Denver",
"description": "Get a list of movies playing at a theater and save to a text file along with the name of the theater.",
"yaml_prompt": "movies-denver.yaml",
"output_files": [
"000004-movie-denver.txt"
],
"exec": {
"command": "scripts/main.py",
"arguments": [
"--continuous",
"--continuous-limit",
"30",
"--skip-reprompt"
],
"env": {
"TEMPERATURE": "0",
"EXECUTE_LOCAL_COMMANDS": "True"
}
}
}
6 changes: 6 additions & 0 deletions tests/prompt_tests/000005-movies-denver/movies-denver.yaml
@@ -0,0 +1,6 @@
ai_goals:
- Select a movie theater in Denver, CO.
- Get a list of movies that are currently playing.
- Output each movie title to 000005-movie-denver.txt. After adding the movie titles, then add the name of the movie theater at the top of the file.
ai_name: MovieGPT
ai_role: an AI to find a movie playing in a theater in Denver, CO.
21 changes: 21 additions & 0 deletions tests/prompt_tests/000005-movies-denver/test.json
@@ -0,0 +1,21 @@
{
"name": "List Movies and Theater Name in Denver with Multiple Actions in One Goal",
"description": "Get the movies playing at a theater and save to a text file. Include multiple instructions in a single goal.",
"yaml_prompt": "movies-denver.yaml",
"output_files": [
"000005-movie-denver.txt"
],
"exec": {
"command": "scripts/main.py",
"arguments": [
"--continuous",
"--continuous-limit",
"30",
"--skip-reprompt"
],
"env": {
"TEMPERATURE": "0",
"EXECUTE_LOCAL_COMMANDS": "True"
}
}
}
7 changes: 7 additions & 0 deletions tests/prompt_tests/000006-movies-denver/movies-denver.yaml
@@ -0,0 +1,7 @@
ai_goals:
- Select a movie theater in Denver, CO.
- Get a list of movies that are currently playing.
- Find the ratings of each movie (e.g. G, PG, PG-13, R).
- Output each movie title to 000006-movie-denver.txt along with its rating. After adding the movie titles, then add the name of the movie theater at the top of the file.
ai_name: MovieGPT
ai_role: an AI to find a movie playing in a theater in Denver, CO.
21 changes: 21 additions & 0 deletions tests/prompt_tests/000006-movies-denver/test.json
@@ -0,0 +1,21 @@
{
"name": "List Movies in Denver Theater with Ratings",
"description": "Get the movies and their respective rating playing at a theater in Denver and save to a text file.",
"yaml_prompt": "movies-denver.yaml",
"output_files": [
"000006-movie-denver.txt"
],
"exec": {
"command": "scripts/main.py",
"arguments": [
"--continuous",
"--continuous-limit",
"30",
"--skip-reprompt"
],
"env": {
"TEMPERATURE": "0",
"EXECUTE_LOCAL_COMMANDS": "True"
}
}
}
65 changes: 65 additions & 0 deletions tests/prompt_tests/README.md
@@ -0,0 +1,65 @@
# Prompt Test Script

The prompt test script is a Python script that executes tests for a command line application. Each test is defined in a separate directory in the `tests/prompt_tests` directory, and is defined by a `test.json` configuration file.

## Usage

The script accepts the following arguments:

- `--test`: Specifies which test or tests to run. Required unless `--list` is given. Valid values are `all` (to run all tests) or the name of a specific test subdirectory in `tests/prompt_tests`.
- `--list`: Lists the available tests and their names in the format `subdirectory_name: test_name`, then exits without running any tests. Optional.

To run the script, use the following command in the root directory:

```
python tests/prompt_tests/prompt-test.py --test [test_name|all] [--list]
```
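
The `--list` format described above can be sketched as follows. This is a minimal illustration, not the script's own code: `discover_tests` is a hypothetical helper, and it runs against a temporary directory rather than the real `tests/prompt_tests` tree. It assumes that only subdirectories containing a `test.json` count as tests.

```python
import json
import os
import tempfile


def discover_tests(root):
    """Map each test subdirectory under root to the "name" in its test.json."""
    tests = {}
    for entry in sorted(os.listdir(root)):
        config_path = os.path.join(root, entry, "test.json")
        # Skip plain files and directories that have no test.json
        if not os.path.isfile(config_path):
            continue
        with open(config_path, "r") as config_file:
            tests[entry] = json.load(config_file)["name"]
    return tests


# Demonstrate on a throwaway directory tree (hypothetical contents)
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "000001-weather-denver"))
    with open(os.path.join(root, "000001-weather-denver", "test.json"), "w") as f:
        json.dump({"name": "Denver Weather Next Week"}, f)
    # A stray top-level file should not be listed as a test
    open(os.path.join(root, "README.md"), "w").close()

    listing = [f"{subdir}: {name}" for subdir, name in discover_tests(root).items()]
    print("\n".join(listing))
```

Filtering on the presence of `test.json` rather than on `os.path.isdir` alone also skips directories that are not tests, which would otherwise raise an error when the config is opened.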

## Creating Tests

To create your own tests, follow these steps:

1. Create a new directory in `tests/prompt_tests` with a descriptive name for your test.
2. Inside the test directory, create a `test.json` configuration file with the following fields:

   - `name`: The name of the test.
   - `description`: A description of what the test does.
   - `yaml_prompt`: The filename of the YAML prompt file to use for the test.
   - `output_files`: A list of filenames of the expected output files for the test.
   - `exec`: An object with the following fields:

     - `command`: The command to execute for the test. Should be the path to the executable file. Note that the current script does not read this field and always launches `python3 -m autogpt`.
     - `arguments`: A list of command line arguments to pass to the executable.
     - `env` (optional): A dictionary of environment variables to set for the test.

Example full `test.json` file:
```
{
"name": "Denver Weather Next Week",
"description": "Get the weather over the next week in Denver, CO. The information returned is not likely to be accurate, however, the task is completed.",
"yaml_prompt": "weather-denver-txt.yaml",
"output_files": [
"weather-denver.txt"
],
"exec": {
"command": "scripts/main.py",
"arguments": [
"--continuous",
"--continuous-limit",
"20",
"--skip-reprompt"
],
"env": {
"TEMPERATURE": "0"
}
}
}
```

3. Create the YAML prompt file for the test and save it in the test directory.
4. Create the expected output files for the test and save them in the test directory.
5. Run the test using the test runner script to ensure it is working as expected.
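
Before running a new test, it may help to check that its `test.json` has the fields listed in step 2. The following is a minimal sketch under that assumption; `validate_config` and the two field tuples are hypothetical names, not part of the test script.

```python
REQUIRED_FIELDS = ("name", "description", "yaml_prompt", "output_files", "exec")
REQUIRED_EXEC_FIELDS = ("command", "arguments")


def validate_config(config):
    """Return a list of problems found in a parsed test.json (empty if none)."""
    problems = [f"missing field: {key}" for key in REQUIRED_FIELDS if key not in config]

    exec_section = config.get("exec")
    if isinstance(exec_section, dict):
        problems += [
            f"missing field: exec.{key}"
            for key in REQUIRED_EXEC_FIELDS
            if key not in exec_section
        ]
        # Environment values are handed to the child process, so they must be strings
        for key, value in exec_section.get("env", {}).items():
            if not isinstance(value, str):
                problems.append(f"exec.env.{key} must be a string")

    if "output_files" in config and not isinstance(config["output_files"], list):
        problems.append("output_files must be a list")
    return problems


good = {
    "name": "Denver Weather Next Week",
    "description": "Get the weather over the next week in Denver, CO.",
    "yaml_prompt": "weather-denver-txt.yaml",
    "output_files": ["weather-denver.txt"],
    "exec": {
        "command": "scripts/main.py",
        "arguments": ["--continuous", "--continuous-limit", "20"],
        "env": {"TEMPERATURE": "0"},
    },
}
# Hypothetical broken config: no arguments, and a non-string env value
bad = {
    "name": "Broken",
    "description": "Example of an incomplete config.",
    "yaml_prompt": "broken.yaml",
    "output_files": ["broken.txt"],
    "exec": {"command": "scripts/main.py", "env": {"TEMPERATURE": 0}},
}

print(validate_config(good))
print(validate_config(bad))
```

The string check on `env` matters because the values are merged into the environment passed to `subprocess.run`, which rejects non-string values.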

## TODO
- Create robust method to test PASS/FAIL for tests
- Generate a test report
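
One possible starting point for both TODO items is sketched below. It is an assumption about how PASS/FAIL could be defined (every expected output file exists and is non-empty), not an implemented feature. `check_outputs` and `format_report` are hypothetical names, and the demo uses a temporary directory in place of the real workspace.

```python
import os
import tempfile


def check_outputs(workspace, output_files):
    """Return the output files that are missing or empty in workspace."""
    failed = []
    for filename in output_files:
        path = os.path.join(workspace, filename)
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            failed.append(filename)
    return failed


def format_report(results):
    """Render {test_name: [failed files]} as one PASS/FAIL line per test."""
    lines = []
    for name, failed in results.items():
        status = "PASS" if not failed else "FAIL (" + ", ".join(failed) + ")"
        lines.append(f"{name}: {status}")
    return "\n".join(lines)


# Demo: one test produced its output file, the other did not
with tempfile.TemporaryDirectory() as workspace:
    with open(os.path.join(workspace, "weather-denver.txt"), "w") as f:
        f.write("Monday: sunny\n")
    results = {
        "000001-weather-denver": check_outputs(workspace, ["weather-denver.txt"]),
        "000002-movie-denver": check_outputs(workspace, ["000002-movie-denver.txt"]),
    }
    report = format_report(results)
    print(report)
```

Returning the list of failed files, rather than only printing them, lets a caller collect results across tests and derive an exit code. Checking the content of the files would still need a separate, more robust method.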
128 changes: 128 additions & 0 deletions tests/prompt_tests/prompt-test.py
@@ -0,0 +1,128 @@
"""
ChatGPT Prompt
==============
Write a python script that executes tests.
- Each test is in a subdirectory in tests/prompt_tests
- The script accepts the arguments: "--test", "--list"
- The argument "--test" is required and specifies either "all" for all tests or a specific subdirectory name for a single test
- The argument "--list" is optional and asks the script to list the tests that are available using the format "subdirectory_name: test_name"
- When "--test all" is specified, the script executes all tests, e.g. loading the test from each subdirectory.
- Each test has a config file in the subdirectory called test.json
- A function called run_test(test_subdirectory) is called for each test
- The run_test() function loads "{test_subdirectory}/test.json" and executes the test without changing the current directory
- The run_test() function first copies the "yaml_prompt" file from the test subdirectory to the current directory and renames it to "ai_settings.yaml"
- The run_test() function outputs the name of the test and the command it is about to execute before running.
- The "exec" field in the test.json file specifies the command to execute and the arguments to pass to the command. The command is called with the python3 interpreter.
Example test.json:
{
"name": "Denver Weather Next Week",
"description": "Get the weather over the next week in Denver, CO. The information returned is not likely to be accurate, however, the task is completed.",
"yaml_prompt": "weather-denver-txt.yaml",
"output_files": [
"weather-denver.txt"
],
"exec": {
"command": "scripts/main.py",
"arguments": [
"--continuous",
"--continuous-limit",
"20",
"--use-yaml-file"
],
"env": {
"TEMPERATURE": "0"
}
}
}
"""


import argparse
import json
import os
import subprocess


# Run a single test
def run_test(test_subdirectory):
    test_path = os.path.join("tests", "prompt_tests", test_subdirectory)
    config_path = os.path.join(test_path, "test.json")
    with open(config_path, "r") as config_file:
        config = json.load(config_file)
    prompt_path = os.path.join(test_path, config["yaml_prompt"])

    # Build the command, passing the yaml prompt via --ai-settings instead of copying it
    command = ["python3", "-m", "autogpt", "--ai-settings", prompt_path] + config[
        "exec"
    ]["arguments"]

    # Start with our environment and update with any environment variables from the config
    env = os.environ.copy()
    env.update(config["exec"].get("env", {}))

    # Run the command
    print(f"Running test: {config['name']}")
    print(f"Command: {' '.join(command)}")
    subprocess.run(command, env=env)

    # Check the output files
    check_output_files(config["output_files"])


# Check that the output files exist and are not empty
def check_output_files(output_files_list):
    missing_files = []
    for filename in output_files_list:
        file_path = os.path.join("auto_gpt_workspace", filename)

        if not os.path.exists(file_path):
            missing_files.append(filename)
        elif os.stat(file_path).st_size == 0:
            # File exists but is empty
            missing_files.append(filename)

    if missing_files:
        print("Error: The following output files are missing or empty:")
        for filename in missing_files:
            print(f"- {filename}")


# List the available tests
def list_tests():
    for test_subdirectory in os.listdir(os.path.join("tests", "prompt_tests")):
        if os.path.isdir(os.path.join("tests", "prompt_tests", test_subdirectory)):
            config_path = os.path.join(
                "tests", "prompt_tests", test_subdirectory, "test.json"
            )
            with open(config_path, "r") as config_file:
                config = json.load(config_file)
            print(f"{test_subdirectory}: {config['name']}")


def main():
    tests_root = os.path.join("tests", "prompt_tests")
    # Offer only test subdirectories as choices, not files such as README.md
    test_names = sorted(
        name
        for name in os.listdir(tests_root)
        if os.path.isdir(os.path.join(tests_root, name))
    )

    parser = argparse.ArgumentParser()
    parser.add_argument("--test", choices=["all"] + test_names)
    parser.add_argument("--list", action="store_true")
    args = parser.parse_args()

    # Check the arguments
    if not args.test and not args.list:
        parser.error("--test TEST_NAME is required")

    if args.list:
        list_tests()
        return

    # Run the tests
    if args.test == "all":
        for test_subdirectory in test_names:
            run_test(test_subdirectory)
    else:
        run_test(args.test)


if __name__ == "__main__":
    main()