A framework for AI-assisted program synthesis. Given a problem description and some input-output examples, the framework generates a program that solves the problem.
You can find an in-depth discussion of this tool, the philosophy it implements and its usage in our paper, Fully Autonomous Programming with Large Language Models. Consider citing it if you use SEIDR in your research.
from seidr import dev
help(dev)
The experiments are contained in the benchmark.py and benchmark_humaneval.py files. When you run either file, the AI-generated programs are committed to a dedicated GitHub repository, while the metrics (i.e. how many tests each program passes) are logged to your Weights and Biases account.
- Create an account on Weights and Biases
- Install the Weights and Biases library
- Run wandb login and follow the instructions
- Go to GitHub and log in to the account that will push the AI-generated code. Note the $username and $email for that account.
- Generate an access $token for that account
- Set GIT_USER to "Bot" or whatever name the committer should have
- Set GIT_EMAIL to $email
- Set GIT_REMOTE to https://$username:$token@github.com/$repo
Note that you can also use a Git host other than GitHub.
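The value of GIT_REMOTE can also be assembled programmatically. The sketch below is a minimal illustration: build_git_remote is a hypothetical helper that is not part of SEIDR, and percent-encoding the credentials is an extra precaution assumed here for tokens that contain special characters. The host argument reflects the note that hosts other than GitHub work as well.

```python
from urllib.parse import quote


def build_git_remote(username: str, token: str, repo: str, host: str = "github.com") -> str:
    """Hypothetical helper: assemble the value for GIT_REMOTE."""
    # Percent-encode the credentials so characters such as '@' or '/' cannot break the URL.
    user = quote(username, safe="")
    secret = quote(token, safe="")
    return f"https://{user}:{secret}@{host}/{repo}"


# Placeholder values only; substitute your own $username, $token and $repo.
print(build_git_remote("bot-account", "TOKEN", "bot-account/solutions"))
```

The printed string is what you would export as GIT_REMOTE.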
You need an OpenAI account with access to gpt-3.5-turbo and an OPENAI_API_KEY environment variable set to your OpenAI API access token.
Run Ollama with Llama 3-8B or another model, either locally or on a server.
In the latter case, start the Ollama server with the following commands and note the URL:PORT pair:
OLLAMA_HOST=URL:PORT ollama serve &
OLLAMA_HOST=URL:PORT ollama pull llama3 &
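Before launching an experiment, it may help to confirm that the server is reachable and that the model has been pulled. The sketch below queries Ollama's /api/tags endpoint, which lists the locally available models. The helper names and the default timeout are illustrative assumptions and are not part of SEIDR.

```python
import json
from urllib.request import urlopen


def ollama_base_url(host: str) -> str:
    """Turn a bare URL:PORT pair into the http:// form passed to --ollama_url."""
    return host if host.startswith(("http://", "https://")) else f"http://{host}"


def model_names(tags_payload: dict) -> list[str]:
    """Extract model names from a decoded /api/tags response."""
    return [entry["name"] for entry in tags_payload.get("models", [])]


def list_models(host: str, timeout: float = 5.0) -> list[str]:
    """Ask the Ollama server which models it has pulled (needs a running server)."""
    with urlopen(f"{ollama_base_url(host)}/api/tags", timeout=timeout) as response:
        return model_names(json.load(response))
```

Calling list_models("URL:PORT") after the pull has finished should return a list that includes a llama3 entry.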
Example .config file layout:
# Github
export GIT_REMOTE=https://USERNAME:TOKEN@github.com/SOLUTIONS_REPO
export GIT_USER=...
export GIT_EMAIL=...
# Data
export DATA_PATH=...
# OpenAI
export OPENAI_API_KEY=...
export OPENAI_ORG=...
# WandB
export WANDB_ENTITY=...
export WANDB_DIR=...
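A quick check that the variables from this layout are actually exported can save a failed run. The following is a minimal sketch: the variable names come from the layout above, while the assumption that every one of them is needed for every run is ours (a run that uses Ollama, for instance, may not need the OpenAI variables).

```python
import os

# Variable names taken from the example .config layout.
EXPECTED_VARS = [
    "GIT_REMOTE", "GIT_USER", "GIT_EMAIL",
    "DATA_PATH",
    "OPENAI_API_KEY", "OPENAI_ORG",
    "WANDB_ENTITY", "WANDB_DIR",
]


def missing_vars(env=os.environ):
    """Return the expected variables that are unset or empty in `env`."""
    return [name for name in EXPECTED_VARS if not env.get(name)]


missing = missing_vars()
if missing:
    print("Not set:", ", ".join(missing))
else:
    print("All variables from .config are set.")
```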
If you're using Slurm, write a run.sh file containing python benchmark.py and submit it with sbatch --array=1-500 run.sh (sbatch options must precede the script name).
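For illustration, the sketch below renders a minimal run.sh of the kind described above. It is not one of SEIDR's own scripts: render_run_sh is a hypothetical helper, the array range is placed in an #SBATCH directive as an alternative to the command-line flag, and sourcing .config before the run is an assumption about how the environment variables get exported.

```python
def render_run_sh(first: int = 1, last: int = 500) -> str:
    """Return the text of a minimal Slurm batch script (a sketch, not SEIDR's own)."""
    lines = [
        "#!/bin/bash",
        # An #SBATCH directive has the same effect as passing --array to sbatch.
        f"#SBATCH --array={first}-{last}",
        # Assumption: the variables from .config must be exported before the run.
        "source .config",
        "python benchmark.py",
    ]
    return "\n".join(lines) + "\n"


print(render_run_sh(), end="")
```

Saving the printed text as run.sh and submitting it with sbatch run.sh would then start one job per array index.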
If you're not using Slurm, run TASK_ID=n python benchmark.py to re-run one of our experiments exactly, or set the parameters yourself as below.
For example, for the bowling problem in PSB2, run SEIDR without lexicase selection as follows:
python3 benchmark.py \
--task_id 0 \
--problem bowling \
--language Python \
--branching_factor 2 \
--max_programs 100 \
--drafts_per_prompt 2 \
--explanations_per_program 2 \
--repairs_per_explanation 2 \
--beam_width 2 \
--log INFO \
--lexicase_selection False \
--dataset humaneval \
--model_name gpt-3.5-turbo \
--valid_examples 50 \
--experiment_id 0
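When sweeping over several settings, it can be convenient to assemble this command from a dictionary instead of editing it by hand. The sketch below does that with the flags and values of the command above; build_argv is a hypothetical helper and not part of SEIDR.

```python
import shlex

# Flag names and values copied from the example command.
params = {
    "task_id": 0,
    "problem": "bowling",
    "language": "Python",
    "branching_factor": 2,
    "max_programs": 100,
    "drafts_per_prompt": 2,
    "explanations_per_program": 2,
    "repairs_per_explanation": 2,
    "beam_width": 2,
    "log": "INFO",
    "lexicase_selection": False,
    "dataset": "humaneval",
    "model_name": "gpt-3.5-turbo",
    "valid_examples": 50,
    "experiment_id": 0,
}


def build_argv(script: str, params: dict) -> list[str]:
    """Expand a parameter dict into an argument list: {'a': 1} -> ['--a', '1']."""
    argv = ["python3", script]
    for name, value in params.items():
        argv += [f"--{name}", str(value)]
    return argv


argv = build_argv("benchmark.py", params)
print(shlex.join(argv))
# To launch the run instead of printing it: subprocess.run(argv, check=True)
```

Changing a single entry, for example params["branching_factor"] = 4, yields the command for the next configuration.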
To run SEIDR on HumanEval with lexicase selection, using Llama 3 served by Ollama at URL:PORT, run the following:
python3 benchmark_humaneval.py \
--task_id 0 \
--problem Python/0 \
--language Python \
--branching_factor 2 \
--max_programs 100 \
--drafts_per_prompt 2 \
--explanations_per_program 2 \
--repairs_per_explanation 2 \
--beam_width 2 \
--log INFO \
--lexicase_selection True \
--dataset humaneval \
--model_name llama3 \
--experiment_id 0 \
--ollama_url "http://URL:PORT"
Example Slurm scripts are stored in scripts/ and tables with hyperparameters in /config.