ToolCommander: Adversarial Tool Scheduling Framework

Paper Here

This repository contains the official implementation of the paper "From Allies to Adversaries: Manipulating LLM Tool Scheduling through Adversarial Injection". The paper introduces ToolCommander, a novel framework that identifies and exploits vulnerabilities in the tool scheduling mechanisms of Large Language Model (LLM) agents. By injecting adversarial tools, ToolCommander enables privacy theft, denial-of-service (DoS) attacks, and manipulation of the agent's tool-calling behavior.

Table of Contents

  • Data
  • Prerequisites
  • Usage
  • Baselines

Data

The dataset used in this project is located in the data directory. The files follow this naming convention:

g1_<train/eval>_<a/b/c>.json

Where:

  • g1 refers to the original category from the ToolBench dataset.
  • train and eval denote the training and evaluation sets, respectively.
  • a, b, and c represent different keywords used to generate the data:
    • a: YouTube
    • b: Email
    • c: Stock
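
For a quick sanity check, each split can be loaded with standard JSON tooling. The sketch below is illustrative only; it assumes nothing about the record schema beyond the file being valid JSON:

import json

# Illustrative sketch: load one split and report its size. The record schema
# follows the ToolBench query format and is not assumed here beyond valid JSON.
with open("data/g1_train_a.json", "r", encoding="utf-8") as f:
    queries = json.load(f)

print(f"g1_train_a.json contains {len(queries)} entries")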

ToolBench Dataset

In addition to the provided data, you will need to download the ToolBench dataset from its official repository. Specifically, you will need the following components:

  • corpus.tsv
  • tools folder

Once downloaded, place the dataset in the data/toolbench directory. The final directory structure should look like this:

/data
├── toolbench
│   ├── corpus.tsv
│   └── tools
│       ├── ...
├── g1_train_a.json
├── g1_train_b.json
├── g1_train_c.json
├── g1_eval_a.json
├── g1_eval_b.json
├── g1_eval_c.json
└── ...
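
Before running the attacks, a quick check like the following (not part of the repository's scripts) can confirm that everything is in place:

from pathlib import Path

# Illustrative check that the expected files are present under data/.
data_dir = Path("data")
required = [data_dir / "toolbench" / "corpus.tsv", data_dir / "toolbench" / "tools"]
required += [
    data_dir / f"g1_{split}_{keyword}.json"
    for split in ("train", "eval")
    for keyword in ("a", "b", "c")
]

missing = [str(path) for path in required if not path.exists()]
print("All required files found." if not missing else f"Missing: {missing}")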

Prerequisites

To set up the environment, first install the required dependencies:

pip install -r requirements.txt

OpenAI API Setup

For evaluation using OpenAI's models, you need to set the OPENAI_API_KEY environment variable with your OpenAI API key. Detailed instructions can be found in the OpenAI API documentation.
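
For example, a script can fail fast if the key is missing. This guard is illustrative only; the key itself is typically exported in the shell before launching the evaluation scripts:

import os

# Illustrative guard: abort early if the API key has not been exported.
if not os.environ.get("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set; export it before running the evaluation.")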


Usage

We provide several scripts to help reproduce the results presented in the paper.

Running the Adversarial Attack

To execute the adversarial injection attack and evaluate the results, use the following command:

bash attack_all.sh && bash eval_all.sh

  • attack_all.sh: Executes the adversarial injection attack across all retrievers and datasets.
  • eval_all.sh: Evaluates the performance of the retrievers after the attack.

The results will be printed directly in the console.


Baselines

We compare ToolCommander against the PoisonedRAG baseline. For more details, visit the PoisonedRAG repository.

Baseline Data

The attack results generated by PoisonedRAG are provided in the data directory as:

g1_train_{a/b/c}_poisonedRAG_generated.pkl

Baseline Evaluation

To evaluate the baseline performance, run the following command:

python evaluate.py --data_path data/g1_train_{a/b/c}.json --attack_path data/g1_train_{a/b/c}_poisonedRAG_generated.pkl
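
If needed, the pickled attack artifacts can be inspected directly. The sketch below is illustrative and assumes only that the file unpickles into a standard Python container:

import pickle

# Illustrative sketch: inspect the PoisonedRAG-generated attack artifacts.
# The exact structure of the pickled object is defined by the generation
# pipeline; only basic container behavior is assumed here.
with open("data/g1_train_a_poisonedRAG_generated.pkl", "rb") as f:
    attacks = pickle.load(f)

print(type(attacks).__name__, len(attacks) if hasattr(attacks, "__len__") else "n/a")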
