This repo contains the first tasks towards our meta-benchmark, Auto-Enhance. We measure the ability of "top-level" agents (i.e. the agents under test) to improve other "reference" agents, scoring the top-level agent by how much the reference agent's performance improves on existing "component" benchmarks. We build the tasks as METR tasks.
Our work was accepted to three NeurIPS 2024 workshops: SoLaR, SafeGenAI, and Towards Safe and Trustworthy Agents. Check out the write-up here.
We begin with four tasks, each of which measures different abilities of the top-level agent; a sketch of the shared task structure follows the list.
- Task: improve another agent's resilience to prompt-injection attacks. Based on the CyberSecEval2 benchmark.
- Task: unlearn cybersecurity knowledge from Llama 3 8B using the RMU algorithm. Based on the WMDP benchmark.
- Task: improve the scaffolding of the MLAgentBench research agent. Based on the MLAgentBench benchmark.
- Task: select the LLM that achieves the best performance when operating a given scaffold to solve GitHub issues. Based on the SWE-bench benchmark.
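
For context, the sketch below shows the general shape these tasks take under the METR Task Standard: a Python `TaskFamily` class defining the tasks, their instructions, environment setup, and scoring. This is a minimal, illustrative skeleton; the task name, instructions, and score values are placeholders invented for this example, not the actual Auto-Enhance task code (in the real tasks, scoring comes from re-running the reference agent on the relevant component benchmark).

```python
# Illustrative only: a minimal METR-style TaskFamily skeleton.
# The task name, instructions, and score values below are hypothetical
# placeholders, not the code of any actual Auto-Enhance task.
from typing import TypedDict


class Task(TypedDict):
    name: str
    instructions: str


class TaskFamily:
    # Version of the METR Task Standard this family targets (assumed here).
    standard_version = "0.3.0"

    @staticmethod
    def get_tasks() -> dict[str, Task]:
        # Each task asks the top-level agent to improve a reference agent.
        return {
            "prompt_injection": Task(
                name="prompt_injection",
                instructions=(
                    "Improve the reference agent's resilience to "
                    "prompt-injection attacks, then submit a summary of "
                    "your changes."
                ),
            ),
        }

    @staticmethod
    def get_instructions(t: Task) -> str:
        return t["instructions"]

    @staticmethod
    def start(t: Task) -> None:
        # Place the reference agent and component-benchmark assets into the
        # task environment (omitted in this sketch).
        pass

    @staticmethod
    def score(t: Task, submission: str) -> float:
        # The real tasks re-run the reference agent on the component
        # benchmark; the numbers here merely stand in for before/after scores.
        before, after = 0.40, 0.65
        return max(0.0, after - before)
```

In the METR workflow, a family like this is built into a Docker image and each task runs in its own container; the task implementation directories (see below) give the exact commands for each task.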
Make sure the Docker engine is running. Then run:

```sh
git clone https://github.com/samizdis/impact-academy
cd impact-academy
cd drivers && npm install
cd ../workbench && npm install
```
Instructions on running tasks are available in the task implementation directories.
If you find our work helpful, please use the following citation.
```bibtex
@inproceedings{brown2024autoenhance,
  title={Auto-{E}nhance: Towards a Meta-Benchmark to Evaluate {AI} Agents' Ability to Improve Other Agents},
  author={Samuel F. Brown and Basil Labib and Codruta Lugoj and Sai Sasank Y.},
  booktitle={Socially Responsible Language Modelling Research ({SoLaR}) Workshop @ NeurIPS 2024},
  year={2024},
  url={https://openreview.net/forum?id=8WM3sqWdQ4}
}
```