Auto-Enhance

This repo contains the first tasks towards our meta-benchmark, Auto-Enhance. We measure the capability of "top-level" agents (i.e. the agents under test) to improve other "reference" agents, quantified by the reference agents' improved performance on existing "component" benchmarks. The tasks are built as METR tasks.

Our work was accepted at three NeurIPS 2024 workshops: SoLaR, SafeGenAI, and Towards Safe and Trustworthy Agents. Check out the write-up here.

Tasks

We begin with four tasks, each of which measures a different ability of the top-level agent.

Prompt-Injection

Task: improve another agent's resilience to prompt injection attacks. Based on the CyberSecEval2 benchmark.

Task Implementation

WMDP

Task: unlearn cybersecurity knowledge from Llama 3 8B using the RMU algorithm. Based on the WMDP benchmark.

Task Implementation

MLAgentBench (MLAB)

Task: improve the scaffolding of the MLAgentBench research agent. Based on the MLAgentBench benchmark.

Task Implementation

SWE-Bench

Task: select the LLM that achieves the best performance when operating a given scaffold to solve GitHub issues. Based on the SWE-bench benchmark.

Task Implementation

Setup

Make sure the Docker engine is running, then run:

git clone https://github.com/samizdis/impact-academy
cd impact-academy
cd drivers && npm install
cd ../workbench && npm install

Instructions on running tasks are available in the task implementation directories.
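
Since the tasks follow the METR Task Standard, a single task can typically be started from the workbench directory with a command of the following shape. The task-family path and task name below are illustrative placeholders, not necessarily the names used in this repo; see each task's README for the exact values.

npm run task -- ../path/to/task_family task_name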

Citations

If you find our work helpful, please use the following citation.

@misc{brown2024autoenhance,
    title={Auto-{E}nhance: Towards a Meta-Benchmark to Evaluate {AI} Agents' Ability to Improve Other Agents},
    author={Samuel F. Brown and Basil Labib and Codruta Lugoj and Sai Sasank Y.},
    booktitle={Socially Responsible Language Modelling Research ({SoLaR}) Workshop @ NeurIPS 2024},
    year={2024},
    url={https://openreview.net/forum?id=8WM3sqWdQ4}
}
