
Commit 5b14385

init
0 parents  commit 5b14385

71 files changed, +7401 -0 lines changed

.gitignore

+189
@@ -0,0 +1,189 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# Mac files
*.DS_Store

# Custom
keys.cfg

# iPython Notebooks
*.ipynb

# Evaluation folders
results/
testbed/
temp/

# Ignore all YAML files in data/
data/*/ic-*
data/*/single-issues

# Fine tuning data
fine_tune/*.ipynb
fine_tune/subtasks/*.jsonl
temp*.jsonl

# Inspector
inspector/*.json

# Ignore all files in the private folder
private/

### Website

# dependencies
website/frontend/node_modules
website/frontend/package-lock.json
website/frontend/.pnp
*.pnp.js

# testing
website/frontend/coverage

# production
website/frontend/build

# misc
*.env.local
*.env.development.local
*.env.test.local
*.env.production.local
.api_key
*npm-debug.log*
*yarn-debug.log*
*yarn-error.log*


# demo yamls (for editing)
*.demo.yaml

# trajectory files
trajectories/*

LICENSE

+21
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2024 John Yang, Carlos E. Jimenez, Alexander Wettig, Shunyu Yao, Karthik Narasimhan, Ofir Press

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

README.md

+98
@@ -0,0 +1,98 @@
<p align="center">
<a href="https://www.swe-agent.com/">
<img src="assets/swe-agent-banner.png" alt="swe-agent.com" />
</a>
</p>


<p align="center">
<a href="https://www.swe-agent.com"><strong>Website & Demo</strong></a>&nbsp; | &nbsp;
<a href="https://discord.gg/AVEFbBn2rH"><strong>Discord</strong></a>&nbsp; | &nbsp;
<strong>Paper [coming April 10th]</strong>
</p>


## 👋 Overview <a name="overview"></a>
SWE-agent turns LMs (e.g. GPT-4) into software engineering agents that can fix bugs and issues in real GitHub repositories.

On the full [SWE-bench](https://github.com/princeton-nlp/SWE-bench) test set, SWE-agent resolves **12.29%** of issues, achieving state-of-the-art performance.

### ✨ Agent-Computer Interface (ACI) <a name="aci"></a>
We achieve these results by designing simple LM-centric commands and feedback formats that make it easier for the LM to browse the repository and to view, edit, and execute code files. We call this an **Agent-Computer Interface** (ACI), and we built the SWE-agent repository to make it easy to iterate on ACI design for repository-level coding agents.

Just as typical language models require good prompt engineering, agents require good ACI design to achieve good results. As we show in our paper, a baseline agent without a well-tuned ACI does much worse than SWE-agent.

SWE-agent contains features that we discovered to be immensely helpful during the agent-computer interface design process:
1. We add a linter that runs when an edit command is issued, and do not let the edit command go through if the code isn't syntactically correct.
2. We supply the agent with a purpose-built file viewer, instead of having it just `cat` files. We found that this file viewer works best when displaying just 100 lines in each turn. The file editor that we built has commands for scrolling up and down and for performing a search within the file.
3. We supply the agent with a purpose-built full-directory string searching command. We found that it was important for this tool to list the matches succinctly: we simply list each file that had at least one match. Showing the model more context about each match proved to be too confusing for the model.
4. When commands have an empty output, we return a message saying "Your command ran successfully and did not produce any output."
31+
Read our paper for more details.
32+
33+
```
34+
@misc{yang2024sweagent,
35+
title={SWE-agent: Agent Computer Interfaces Enable Software Engineering Language Models},
36+
author={John Yang and Carlos E. Jimenez and Alexander Wettig and Shunyu Yao and Karthik Narasimhan and Ofir Press},
37+
year={2024},
38+
}
39+
```
40+
41+
## 🚀 Setup <a name="setup"></a>
42+
1. [Install Docker](https://docs.docker.com/engine/install/), then start Docker locally.
43+
2. [Install Miniconda](https://docs.anaconda.com/free/miniconda/miniconda-install/), then create the `swe-agent` environment with `conda env create -f environment.yml`
44+
3. Activate using `conda activate swe-agent`.
45+
4. Run `./setup.sh` to create the `swe-agent` docker image.
46+
5. Create a `keys.cfg` file at the root of this repository and fill in the following:
47+
```
48+
OPENAI_API_KEY: 'OpenAI API Key Here if using OpenAI Model (optional)'
49+
ANTHROPIC_API_KEY: 'Anthropic API Key Here if using Anthropic Model (optional)'
50+
GITHUB_TOKEN: 'GitHub Token Here (required)'
51+
```
52+
See the following links for tutorials on obtaining [Anthropic](https://docs.anthropic.com/claude/reference/getting-started-with-the-api), [OpenAI](https://platform.openai.com/docs/quickstart/step-2-set-up-your-api-key), and [Github]() tokens.
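
Before running anything, it can help to confirm that `keys.cfg` contains the required token. The sketch below is an assumption-laden illustration, not how SWE-agent itself reads the file: it treats each line as `NAME: 'value'`, and the helper names `parse_keys_cfg` and `missing_keys` are hypothetical.

```python
from typing import Dict, List

# Only GITHUB_TOKEN is marked "(required)" in the template above.
REQUIRED_KEYS = {"GITHUB_TOKEN"}


def parse_keys_cfg(text: str) -> Dict[str, str]:
    """Parse `NAME: 'value'` lines into a dict (assumed format, see lead-in)."""
    keys: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or ":" not in line:
            continue  # skip blank or malformed lines
        name, value = line.split(":", 1)
        keys[name.strip()] = value.strip().strip("'\"")
    return keys


def missing_keys(keys: Dict[str, str]) -> List[str]:
    """Return the required key names that are absent or empty."""
    return sorted(name for name in REQUIRED_KEYS if not keys.get(name))


if __name__ == "__main__":
    with open("keys.cfg", encoding="utf-8") as handle:
        problems = missing_keys(parse_keys_cfg(handle.read()))
    print("Missing keys:", problems if problems else "none")
```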

## 💽 Usage <a name="usage"></a>
There are two steps to the SWE-agent pipeline. First, SWE-agent takes an input GitHub issue and returns a pull request that attempts to fix it. We call this step *inference*. The second step (currently available only for issues in the SWE-bench benchmark) is to *evaluate* the pull request to verify that it has indeed fixed the issue.

### 👩‍💻 Inference <a name="inference"></a>
**Inference on *any* GitHub Issue**: Using this script, you can run SWE-agent on any GitHub issue!
```
python run.py --model_name gpt4 \
    --data_path https://github.com/pvlib/pvlib-python/issues/1603 --config_file config/default_from_url.yaml
```

**Inference on SWE-bench**: Run SWE-agent on [SWE-bench Lite](https://www.swebench.com/lite.html) and generate patches.
```
python run.py --model_name gpt4 \
    --per_instance_cost_limit 2.00 \
    --config_file ./config/default.yaml
```

If you'd like to run on a *single* issue from SWE-bench, use the `--instance_filter` option as follows:
```
python run.py --model_name gpt4 \
    --instance_filter marshmallow-code__marshmallow-1359
```
* See the [`scripts/`](scripts/) folder for other useful scripts and details.
* See the [`config/`](config/) folder for details about how you can define your own configuration!
* See the [`swe-agent/agent/`](agent/) folder for details about the logic behind configuration-based workflows.
* See the [`swe-agent/environment/`](swe-agent/environment/) folder for details about the `SWEEnv` environment (interface + implementation).
* See the [`trajectories/`](trajectories) folder for details about the output of `run.py`.

### 🧪 Evaluation <a name="evaluation"></a>
This step is only available for issues from the SWE-bench set. To evaluate generated pull requests:
```
cd evaluation/
./run_eval.sh <predictions_path>
```
Replace `<predictions_path>` with the path to the model's predictions, which are generated by the *Inference* step. The `<predictions_path>` argument should look like `../trajectories/<username>/<model>-<dataset>-<hyperparams>/all_preds.jsonl`.
* See the [`evaluation/`](evaluation/) folder for details about how evaluation works.
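
Before starting an evaluation run, a quick sanity check of the predictions file can catch truncated output. The sketch below assumes only what the `.jsonl` extension implies (one JSON object per line); the helper names are hypothetical, and no particular fields inside each record are assumed.

```python
import json
import sys
from typing import Iterable, List


def parse_jsonl(lines: Iterable[str]) -> List[dict]:
    """Parse JSON Lines input: one JSON value per non-empty line."""
    records = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as err:
            raise ValueError(f"line {number}: invalid JSON ({err.msg})") from err
    return records


def load_predictions(path: str) -> List[dict]:
    """Read and parse a predictions file such as all_preds.jsonl."""
    with open(path, encoding="utf-8") as handle:
        return parse_jsonl(handle)


if __name__ == "__main__":
    # Usage: python check_preds.py <predictions_path>
    predictions = load_predictions(sys.argv[1])
    print(f"{len(predictions)} predictions loaded")
```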


## 💫 Contributions <a name="contributions"></a>
- If you'd like to ask questions, learn about upcoming features, and participate in future development, join our [Discord community](https://discord.gg/AVEFbBn2rH)!
- If you'd like to contribute to the codebase, we welcome [issues](https://github.com/princeton-nlp/SWE-agent/issues) and [pull requests](https://github.com/princeton-nlp/SWE-agent/pulls)!
- If you'd like to see a post or tutorial about some topic, please let us know via an [issue](https://github.com/princeton-nlp/SWE-agent/issues).

## 🪪 License <a name="license"></a>
MIT. Check `LICENSE`.

assets/inspector.png

116 KB

assets/swe-agent-banner.png

52.5 KB

build_deploy.sh

+6
@@ -0,0 +1,6 @@
#!/bin/bash

python3 -m build

python3 -m twine upload --skip-existing --repository pypi dist/*
# python3 -m twine upload --skip-existing --repository testpypi dist/*

config/README.md

+78
@@ -0,0 +1,78 @@
# Configuration

This folder describes how to write your own configurations to control how agents can interact with the `SWEEnv` environment.
A configuration is represented as a single `.yaml` file, allowing you to...
* Define the **commands** that agents may use to traverse + modify a codebase.
* Write **prompts** that are deterministically or conditionally shown to the agent over the course of a single trajectory.
* Control the **input/output interface** that sits between the agent and `SWEEnv`.

## Configuration File Fields
The configuration is a `.yaml` file that consists of several fields, all of which appear in the following outline:

```yaml
# Prompt Templates: Control how observations of environment are shown to agent
system_template: | # .yaml syntax for multi-line string value
  First `system` message shown to agent
instance_template: |- # .yaml syntax for multi-line string value w/ no new line
  Instance prompt, contains task instance-specific content
next_step_template: |-
  Format template of per-turn observation (Contains standard output from agent's action)
next_step_no_output_template: |-
  Format template of observation when there is no standard output from the agent's action
format_error_template: |-
  Format template of error message (Used when agent's action causes an error)
demonstration_template: |
  Format template for showing a demonstration to the agent
demonstrations:
- `trajectories/<username>/<experiment folder>/*.traj`
- File is a demonstration of how to solve a task. This could be an agent-generated trajectory.
- You can include 1+ demonstrations

# Environment States: Define features of the SWEEnv environment
env_variables:
  # Default variables for SWEEnv at the beginning of each instance
  CURRENT_FILE: 0
  CURRENT_LINE:
  OVERLAP:
  SEARCH_FILES:
  SEARCH_INDEX:
  SEARCH_RESULTS:
  WINDOW_SIZE:
  START_INDEX:
  END_INDEX:
  START_CURSOR:
  END_CURSOR:
  START_CURSORS_MARK:
  END_CURSOR_MARK:
state_command:
  # `state_command` allows you to update state variables to reflect any aspect of the environment (e.g. current working directory)
  name: state
  code: |
    state() { echo '{"pwd": "'$PWD'"}'; }

# Action Interface: Define how an agent interacts with the SWEEnv environment
command_files:
- path/to/bash_file.sh
- Each file contains a list of commands implemented in bash
- You can include 1+ command files
parse_command: Reference to functionality for defining command documentation
history_processor: Reference to functionality for controlling agent's message history
parse_function: Parser run on agent output
```

We recommend looking at...
* `configs/` for examples of properly formatted configuration files. Each configuration differs in its set of commands, input/output format, demonstrations, etc.
* `commands/` for the bash implementations of the custom commands that SWE-agent uses to navigate + edit the codebase.

## How a Configuration File is Processed
Some notes on the processing applied to config fields when SWE-agent is run:
* Commands specified in `command_files` are parsed into a single block of documentation text that can be referenced as `{command_docs}`.
* `env_variables` are the default variables for the bash environment at the beginning of each instance.
* `state_command` extracts state information from the bash environment (formatted as JSON) for use in the templates given to the agent.
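
As an illustration of the `state_command` note, the sketch below parses the JSON text that a `state` function prints (for example `{"pwd": "/repo"}`) into a dictionary of template variables. The helper name `parse_state` and the choice to stringify every value are assumptions made for this example, not SWE-agent's actual code.

```python
import json
from typing import Dict


def parse_state(stdout: str) -> Dict[str, str]:
    """Turn the JSON printed by the state command into template variables."""
    state = json.loads(stdout)
    if not isinstance(state, dict):
        raise ValueError("state command must print a JSON object")
    # Stringify values so every variable can be substituted into a template.
    return {str(key): str(value) for key, value in state.items()}


if __name__ == "__main__":
    # Output in the shape produced by the `state` function in the outline above.
    print(parse_state('{"pwd": "/repo"}'))
```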

Possible variables that can be used in templates are:
- `{command_docs}` (an automatically compiled collection of available commands + their docstrings)
- any variable given in `env_variables` (same spelling), e.g., `{WINDOW_SIZE}`
- any variable extracted as JSON by the `state_command` function
- the last observation `{observation}`
- ... this list will grow as we implement more features!
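
The substitution described by this list can be pictured with Python's `str.format`. This is a sketch under stated assumptions: the function name `render_template`, the sample template text, and the precedence order (state values override `env_variables` defaults) are all hypothetical and may differ from what SWE-agent does.

```python
from typing import Dict


def render_template(
    template: str,
    command_docs: str,
    env_variables: Dict[str, str],
    state: Dict[str, str],
    observation: str,
) -> str:
    """Fill `{name}` placeholders from the four variable sources listed above."""
    variables = {
        **env_variables,  # defaults from the config file
        **state,          # values reported by the state command (assumed to win)
        "command_docs": command_docs,
        "observation": observation,
    }
    return template.format(**variables)


if __name__ == "__main__":
    template = "{observation}\n(Current directory: {pwd}, window: {WINDOW_SIZE})"
    print(
        render_template(
            template,
            command_docs="",
            env_variables={"WINDOW_SIZE": "100"},
            state={"pwd": "/repo"},
            observation="ok",
        )
    )
```

One caveat of this approach: `str.format` treats every brace as a placeholder, so a template containing literal braces would need them doubled (`{{` and `}}`).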
