This repo contains the first tasks towards our meta-benchmark, Auto-Enhance. We measure the ability of "top-level" agents (i.e. the agents under test) to improve other "reference" agents, scoring the top-level agent by how much the reference agent's performance improves on existing "component" benchmarks. We build the tasks as METR tasks.
Our work was accepted to three NeurIPS 2024 workshops: SoLaR, SafeGenAI, and Towards Safe and Trustworthy Agents. Check out the write-up here.
We begin with four tasks, each of which measures different abilities of the top-level agent; a sketch of the shared task structure follows the list.
- Task: improve another agent's resilience to prompt-injection attacks. Based on the CyberSecEval2 benchmark.
- Task: unlearn cybersecurity knowledge from Llama 3 8B using the RMU algorithm. Based on the WMDP benchmark.
- Task: improve the scaffolding of the MLAgentBench research agent. Based on the MLAgentBench benchmark.
- Task: select the LLM that achieves the best performance when operating a given scaffold to solve GitHub issues. Based on the SWE-bench benchmark.
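
For context, the sketch below shows the general shape these tasks take under the METR Task Standard: a Python `TaskFamily` class defining the tasks, their instructions, environment setup, and scoring. This is a minimal, illustrative skeleton; the task name, instructions, and score values are placeholders invented for this example, not the actual Auto-Enhance task code (in the real tasks, scoring comes from re-running the reference agent on the relevant component benchmark).

```python
# Illustrative only: a minimal METR-style TaskFamily skeleton.
# The task name, instructions, and score values below are hypothetical
# placeholders, not the code of any actual Auto-Enhance task.
from typing import TypedDict


class Task(TypedDict):
    name: str
    instructions: str


class TaskFamily:
    # Version of the METR Task Standard this family targets (assumed here).
    standard_version = "0.3.0"

    @staticmethod
    def get_tasks() -> dict[str, Task]:
        # Each task asks the top-level agent to improve a reference agent.
        return {
            "prompt_injection": Task(
                name="prompt_injection",
                instructions=(
                    "Improve the reference agent's resilience to "
                    "prompt-injection attacks, then submit a summary of "
                    "your changes."
                ),
            ),
        }

    @staticmethod
    def get_instructions(t: Task) -> str:
        return t["instructions"]

    @staticmethod
    def start(t: Task) -> None:
        # Place the reference agent and component-benchmark assets into the
        # task environment (omitted in this sketch).
        pass

    @staticmethod
    def score(t: Task, submission: str) -> float:
        # The real tasks re-run the reference agent on the component
        # benchmark; the numbers here merely stand in for before/after scores.
        before, after = 0.40, 0.65
        return max(0.0, after - before)
```

In the METR workflow, a family like this is built into a Docker image and each task runs in its own container; the task implementation directories (see below) give the exact commands for each task.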
Make sure the Docker engine is running. Then run:

```sh
git clone https://github.com/samizdis/impact-academy
cd impact-academy
cd drivers && npm install
cd ../workbench && npm install
```
Instructions on running tasks are available in the task implementation directories.
If you find our work helpful, please use the following citation.
```bibtex
@inproceedings{brown2024autoenhance,
  title={Auto-{E}nhance: Towards a Meta-Benchmark to Evaluate {AI} Agents' Ability to Improve Other Agents},
  author={Samuel F. Brown and Basil Labib and Codruta Lugoj and Sai Sasank Y.},
  booktitle={Socially Responsible Language Modelling Research ({SoLaR}) Workshop @ NeurIPS 2024},
  year={2024},
  url={https://openreview.net/forum?id=8WM3sqWdQ4}
}
```