This is our public baseline research and development agent: a simple, single-agent system that uses a task planner and a set of tools to perform machine learning tasks. It provides a foundation for comparing and evaluating the AI/ML research and development tasks that agents can perform.
- Supports multiple AI/ML tasks
- Compatible with different LLM providers (OpenAI, Anthropic)
- Dockerized for easy deployment and reproducibility
The AI Research Benchmark Baseline Agent comes equipped with a variety of tools to assist in different AI and machine learning tasks:
- Bash Tool: Executes bash commands and scripts.
- Code Tool: Manages code operations, including writing, inserting, replacing, and deleting code.
- GitHub Tool: Interacts with GitHub repositories to fetch README files, list files, and retrieve file contents.
- Semantic Scholar Tool: Searches for academic papers, retrieves paper details and citations, and downloads papers.
- Python Tool: Executes Python code.
- Return Function Tool: Handles task completion.
- Scratchpad Tool: Provides a scratchpad for experiment note-taking and temporary storage.
- Thought Tool: Allows the agent to process and record thoughts.
- Long-Term Memory Tool: Manages long-term memory storage and retrieval.
These tools can be used individually or in combination to tackle a wide range of AI research and benchmark tasks. The agent can seamlessly switch between tools as needed for complex operations.
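Conceptually, the single-agent loop pairs each planner step with one of these tools. The sketch below is purely illustrative and assumes nothing about this repository's internals: every name in it (`bash_tool`, `TOOLS`, `run_plan`, and so on) is a hypothetical stand-in for how a planned step might be dispatched to a tool.

```python
# Hypothetical sketch of a single-agent loop with a tool registry.
# Names are illustrative only; the real implementation lives in run.py.
import contextlib
import io
import subprocess

def bash_tool(command: str) -> str:
    """Stand-in for the Bash tool: run a shell command, return its stdout."""
    return subprocess.run(command, shell=True, capture_output=True, text=True).stdout

def python_tool(code: str) -> str:
    """Stand-in for the Python tool: execute code, capture what it prints."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, {})
    return buf.getvalue()

SCRATCHPAD: list = []

def scratchpad_tool(note: str) -> str:
    """Stand-in for the Scratchpad tool: record a note for later steps."""
    SCRATCHPAD.append(note)
    return f"noted ({len(SCRATCHPAD)} entries)"

# The registry the planner would choose from.
TOOLS = {"bash": bash_tool, "python": python_tool, "scratchpad": scratchpad_tool}

def run_plan(plan):
    """Execute a list of (tool_name, argument) steps in order."""
    return [TOOLS[name](arg) for name, arg in plan]

# Example plan: record a note, then compute something with the Python tool.
outputs = run_plan([
    ("scratchpad", "baseline run on MNIST"),
    ("python", "print(28 * 28)"),
])
```

In the actual agent, an LLM-driven planner rather than a fixed list would choose the next `(tool, argument)` pair, but the dispatch pattern is the same.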
Requirements:
- Python 3.x
- Docker (for containerized execution)
1. Clone this repository:

```bash
git clone https://github.com/AlgorithmicResearchGroup/ML-Research-Agent-Public.git
cd ML-Research-Agent-Public
```

2. Install dependencies:

```bash
pip install -r requirements.txt
```
Step 1: Create a `.env` file with the following environment variables:

```
OPENAI = <your openai api key>
ANTHROPIC = <your anthropic api key>
YOU_API_KEY = <your you.com api key>
GITHUB_ACCESS_TOKEN = <your github access token>
```
Step 2a: To run the agent without Docker, use the following command:

```bash
python3 run.py --prompt "<your prompt>" --provider "<openai or anthropic>"
```
Step 2b: Run the agent with Docker.

Build for CPU:

```bash
docker build --build-arg BASE_IMAGE=ubuntu:22.04 -t <image_name> .
```

Build for GPU:

```bash
docker build --build-arg BASE_IMAGE=nvidia/cuda:12.2.2-cudnn8-runtime-ubuntu22.04 -t <image_name> .
```
Then run the container:

```bash
bash run.sh <image_name> \
  <prompt> \
  <provider> \
  <"cpu" or gpu_ids eg. 0> \
  <huggingface_token> \
  <env_file_path>
```
Example on CPU:

```bash
bash run.sh ghcr.io/algorithmicresearchgroup/ml-research-agent-public \
  "train an mlp on the mnist dataset" \
  openai \
  "cpu" \
  <huggingface_token> \
  /root/ML-Research-Agent-Public/.env
```

Example on GPU:

```bash
bash run.sh ghcr.io/algorithmicresearchgroup/ml-research-agent-public \
  "train an mlp on the mnist dataset" \
  openai \
  0 \
  <your huggingface token> \
  /path/to/.env
```
Contributions to improve the baseline agent or add new tasks are welcome. Please submit a pull request or open an issue to discuss proposed changes.
Licensed under AGPL-3.0.
For questions or support, please contact Algorithmic Research Group at [email protected]