This repository contains the official implementation of the paper "From Allies to Adversaries: Manipulating LLM Tool Scheduling through Adversarial Injection". The paper introduces ToolCommander, a novel framework that identifies and exploits vulnerabilities in the tool scheduling mechanisms of Large Language Model (LLM) agents. By leveraging adversarial tool injection, ToolCommander can enable privacy theft, denial-of-service (DoS) attacks, and manipulation of tool-calling behavior.
The dataset used in this project is located in the `data` directory. The files follow this naming convention:

```
g1_<train/eval>_<a/b/c>.json
```
Where:

- `g1` refers to the original category from the ToolBench dataset.
- `train` and `eval` denote the training and evaluation sets, respectively.
- `a`, `b`, and `c` represent different keywords used to generate the data:
  - `a`: YouTube
  - `b`: Email
  - `c`: Stock
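For example, `g1_train_a.json` is the training split generated with the YouTube keyword. Assuming the provided files are in place, a quick listing confirms they are present (a convenience check only, not part of the pipeline):

```bash
# List the six provided train/eval splits for the three keywords.
ls data/g1_{train,eval}_{a,b,c}.json
```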
In addition to the provided data, you will need to download the ToolBench dataset from its official repository. Specifically, you will need the following components:

- `corpus.tsv`
- the `tools` folder
Once downloaded, place the dataset in the `data/toolbench` directory. The final directory structure should look like this:
```
/data
├── toolbench
│   ├── corpus.tsv
│   └── tools
│       ├── ...
├── g1_train_a.json
├── g1_train_b.json
├── g1_train_c.json
├── g1_eval_a.json
├── g1_eval_b.json
├── g1_eval_c.json
└── ...
```
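Before running the attack scripts, it can be worth verifying that the ToolBench files ended up where the scripts expect them. A minimal sanity check, assuming the layout above (the exact contents of `tools` depend on the ToolBench release you downloaded):

```bash
# Check that the ToolBench corpus and tools folder are in place.
test -f data/toolbench/corpus.tsv && echo "corpus.tsv found" || echo "corpus.tsv missing"
test -d data/toolbench/tools && echo "tools/ found" || echo "tools/ missing"
```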
To set up the environment, first install the required dependencies:

```bash
pip install -r requirements.txt
```
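If you prefer to keep the dependencies isolated, a virtual environment is a standard (and optional) way to do so; this is general Python practice rather than a requirement of this repository:

```bash
# Optional: create and activate a virtual environment, then install as above.
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```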
For evaluation using OpenAI's models, you need to set the `OPENAI_API_KEY` environment variable with your OpenAI API key. Detailed instructions can be found in the OpenAI API documentation.
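For example, on Linux or macOS the key can be exported in the shell before running the evaluation scripts (replace the placeholder value with your own key):

```bash
# Make the API key available to the evaluation scripts in this shell session.
export OPENAI_API_KEY="sk-..."
```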
We provide several scripts to help reproduce the results presented in the paper.
To execute the adversarial injection attack and evaluate the results, use the following command:
```bash
bash attack_all.sh && bash eval_all.sh
```

- `attack_all.sh`: Executes the adversarial injection attack across all retrievers and datasets.
- `eval_all.sh`: Evaluates the performance of the retrievers after the attack.
The results will be printed directly to the console.
We compare ToolCommander against the PoisonedRAG baseline. For more details, visit the PoisonedRAG repository.

The attack results generated by PoisonedRAG have been provided in the `data` directory as:

```
g1_train_{a/b/c}_poisonedRAG_generated.pkl
```
To evaluate the baseline performance, run the following command:

```bash
python evaluate.py --data_path data/g1_train_{a/b/c}.json --attack_path data/g1_train_{a/b/c}_poisonedRAG_generated.pkl
```
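For instance, to evaluate the baseline on the YouTube keyword (`a`), substitute `a` for the placeholder in both paths:

```bash
# Evaluate the PoisonedRAG baseline on the YouTube (a) training split.
python evaluate.py --data_path data/g1_train_a.json --attack_path data/g1_train_a_poisonedRAG_generated.pkl
```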