Tasks for ML Research Benchmark, a benchmark designed to evaluate the capabilities of AI agents in accelerating AI research and development.

AlgorithmicResearchGroup/ML-Research-Agent-Tasks

ML Research Benchmark Tasks

This repository contains the tasks for ML Research Benchmark, a benchmark designed to evaluate the capabilities of AI agents in accelerating ML research and development. The benchmark consists of 9 competition-level tasks that span the range of activities typically undertaken by ML researchers.

Introduction

The ML Research Benchmark (MLRB) aims to measure how much AI agents can accelerate ML research and development. It focuses on competition-level tasks that reflect the current frontiers of ML research, providing a more nuanced and challenging evaluation environment than existing benchmarks.

arXiv

Installation

pip install mlrb-agent-tasks

Usage

The library exposes a single function, get_task, which takes the following parameters:

  • path: directory to copy the task into
  • benchmark: name of the benchmark (for example, "full_benchmark")
  • task: name of the task (for example, "llm_efficiency")

The function copies the task to the specified path and returns a dictionary with the task name and prompt:

{
    "name": str,    # name of the task
    "prompt": str,  # prompt for the task
}
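
The return shape described above can be written down as a typed dictionary with a small check. This is only a sketch: the names TaskInfo and validate_task_info are hypothetical and are not part of mlrb_agent_tasks. It assumes nothing beyond the two string fields listed above.

```python
from typing import Any, TypedDict


class TaskInfo(TypedDict):
    """Hypothetical type mirroring the dictionary documented above."""

    name: str    # name of the task
    prompt: str  # prompt for the task


def validate_task_info(result: Any) -> TaskInfo:
    """Return result unchanged if it has string 'name' and 'prompt' entries.

    Raises TypeError for a non-dict or a non-string value,
    and KeyError for a missing field.
    """
    if not isinstance(result, dict):
        raise TypeError(f"expected a dict, got {type(result).__name__}")
    for key in ("name", "prompt"):
        if key not in result:
            raise KeyError(key)
        if not isinstance(result[key], str):
            raise TypeError(f"{key!r} must be a str")
    return result  # type: ignore[return-value]
```

A caller could pass the dictionary returned by get_task through validate_task_info before handing the prompt to an agent, so that a malformed result fails early.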

Example Usage

from mlrb_agent_tasks import get_task

# Copy the "llm_efficiency" task from "full_benchmark" into the current directory
result = get_task("./", "full_benchmark", "llm_efficiency")
print(result['prompt'])
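
The example above copies the task into the current directory. A variation that keeps each task in its own working directory and saves the prompt alongside it might look like the sketch below. The helper fetch_task_into and the file name PROMPT.txt are hypothetical, and the sketch assumes get_task accepts an existing directory, as the "./" in the example suggests. The task function is passed in as an argument so the helper can be tried without the package installed.

```python
from pathlib import Path
from typing import Callable, Dict


def fetch_task_into(
    workdir: str,
    benchmark: str,
    task: str,
    get_task_fn: Callable[[str, str, str], Dict[str, str]],
) -> Dict[str, str]:
    """Copy a task into workdir and save its prompt next to the task files.

    get_task_fn is expected to behave like get_task(path, benchmark, task).
    """
    target = Path(workdir)
    # Create the directory first; assumes get_task can copy into an existing one.
    target.mkdir(parents=True, exist_ok=True)

    result = get_task_fn(str(target), benchmark, task)

    # PROMPT.txt is an arbitrary file name chosen for this sketch.
    (target / "PROMPT.txt").write_text(result["prompt"], encoding="utf-8")
    return result
```

With the real library, the call would be fetch_task_into("runs/llm_efficiency", "full_benchmark", "llm_efficiency", get_task), where the runs/ directory name is only an illustration.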

Contributing

We welcome contributions to the ML Research Benchmark! Please read our CONTRIBUTING.md file for guidelines on how to submit issues, feature requests, and pull requests.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contact

For questions or feedback, please open an issue in this repository or contact [email protected].
