This repository contains the official implementation of the paper "From Allies to Adversaries: Manipulating LLM Tool Scheduling through Adversarial Injection". The paper introduces ToolCommander, a novel framework that identifies and exploits vulnerabilities in the tool scheduling mechanisms of Large Language Model (LLM) agents. By leveraging adversarial tool injection, ToolCommander can enable privacy theft, denial-of-service (DoS) attacks, and manipulation of tool-calling behavior.
The dataset used in this project is located in the `data` directory. The files follow this naming convention:

```
g1_<train/eval>_<a/b/c>.json
```
Where:

- `g1` refers to the original category from the ToolBench dataset.
- `train` and `eval` denote the training and evaluation sets, respectively.
- `a`, `b`, and `c` represent different keywords used to generate the data:
  - `a`: YouTube
  - `b`: Email
  - `c`: Stock
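For example, `g1_train_a.json` is the training split generated with the YouTube keyword. Assuming the provided files are in place, a quick listing confirms they are present (a convenience check only, not part of the pipeline):

```bash
# List the six provided train/eval splits for the three keywords.
ls data/g1_{train,eval}_{a,b,c}.json
```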
In addition to the provided data, you will need to download the ToolBench dataset from its official repository. Specifically, you will need the following components:

- `corpus.tsv`
- the `tools` folder
Once downloaded, place the dataset in the `data/toolbench` directory. The final directory structure should look like this:
```
/data
├── toolbench
│   ├── corpus.tsv
│   └── tools
│       ├── ...
├── g1_train_a.json
├── g1_train_b.json
├── g1_train_c.json
├── g1_eval_a.json
├── g1_eval_b.json
├── g1_eval_c.json
└── ...
```
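Before running the attack scripts, it can be worth verifying that the ToolBench files ended up where the scripts expect them. A minimal sanity check, assuming the layout above (the exact contents of `tools` depend on the ToolBench release you downloaded):

```bash
# Check that the ToolBench corpus and tools folder are in place.
test -f data/toolbench/corpus.tsv && echo "corpus.tsv found" || echo "corpus.tsv missing"
test -d data/toolbench/tools && echo "tools/ found" || echo "tools/ missing"
```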
To set up the environment, first install the required dependencies:

```bash
pip install -r requirements.txt
```
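If you prefer to keep the dependencies isolated, a virtual environment is a standard (and optional) way to do so; this is general Python practice rather than a requirement of this repository:

```bash
# Optional: create and activate a virtual environment, then install as above.
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```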
For evaluation using OpenAI's models, you need to set the `OPENAI_API_KEY` environment variable with your OpenAI API key. Detailed instructions can be found in the OpenAI API documentation.
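For example, on Linux or macOS the key can be exported in the shell before running the evaluation scripts (replace the placeholder value with your own key):

```bash
# Make the API key available to the evaluation scripts in this shell session.
export OPENAI_API_KEY="sk-..."
```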
We provide several scripts to help reproduce the results presented in the paper.
To execute the adversarial injection attack and evaluate the results, use the following command:
```bash
bash attack_all.sh && bash eval_all.sh
```

- `attack_all.sh`: Executes the adversarial injection attack across all retrievers and datasets.
- `eval_all.sh`: Evaluates the performance of the retrievers after the attack.
The results will be printed directly to the console.
We compare ToolCommander against the PoisonedRAG baseline. For more details, visit the PoisonedRAG repository.

The attack results generated by PoisonedRAG have been provided in the `data` directory as:

```
g1_train_{a/b/c}_poisonedRAG_generated.pkl
```
To evaluate the baseline performance, run the following command:

```bash
python evaluate.py --data_path data/g1_train_{a/b/c}.json --attack_path data/g1_train_{a/b/c}_poisonedRAG_generated.pkl
```
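For instance, to evaluate the baseline on the YouTube keyword (`a`), substitute `a` for the placeholder in both paths:

```bash
# Evaluate the PoisonedRAG baseline on the YouTube (a) training split.
python evaluate.py --data_path data/g1_train_a.json --attack_path data/g1_train_a_poisonedRAG_generated.pkl
```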