This repository contains the tasks for the ML Research Benchmark (MLRB), a benchmark designed to evaluate the capabilities of AI agents in accelerating ML research and development. The benchmark consists of 9 competition-level tasks that span the spectrum of activities typically undertaken by ML researchers.
The MLRB aims to measure how effectively AI agents can accelerate ML research and development. It focuses on competition-level tasks that reflect the current frontiers of ML research, providing a more nuanced and challenging evaluation environment than existing benchmarks.
```bash
pip install mlrb-agent-tasks
```
The library exposes a single function, `get_task`, which takes three arguments:

- `path`: the directory to copy the task files into
- `benchmark`: the name of the benchmark
- `task`: the name of the task

The function copies the task files to the specified path and returns a dictionary containing the task name and prompt:
```python
{
    "name": str,    # name of the task
    "prompt": str,  # prompt for the task
}
```
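For readers who prefer type annotations, the return shape can be sketched as a `TypedDict`; this is illustrative only, not a type defined by the library:

```python
from typing import TypedDict

class TaskSpec(TypedDict):
    """Illustrative sketch of get_task's return shape (not a library type)."""
    name: str    # name of the task
    prompt: str  # prompt for the task
```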
```python
from mlrb_agent_tasks import get_task

# Example usage
result = get_task("./", "full_benchmark", "llm_efficiency")
print(result['prompt'])
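```

As a slightly fuller sketch, the snippet below copies a task into a temporary workspace and writes its prompt to a file for an agent to read; the `PROMPT.md` filename and the workspace layout are our own choices here, not conventions defined by the library:

```python
import tempfile
from pathlib import Path

from mlrb_agent_tasks import get_task

# Copy the task files into a temporary workspace so repeated runs stay isolated.
with tempfile.TemporaryDirectory() as workspace:
    task = get_task(workspace, "full_benchmark", "llm_efficiency")

    # Persist the prompt alongside the copied task files for the agent to read.
    prompt_path = Path(workspace) / "PROMPT.md"
    prompt_path.write_text(task["prompt"])

    print(f"Prepared task {task['name']!r} in {workspace}")
```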
We welcome contributions to the ML Research Benchmark! Please read our CONTRIBUTING.md file for guidelines on how to submit issues, feature requests, and pull requests.
This project is licensed under the MIT License - see the LICENSE file for details.
For questions or feedback, please open an issue in this repository or contact [email protected].