ANalytical Tool for evaluating Large language model Efficiency in Repelling malicious instructions
The antler tool automatically generates and evaluates jailbreak attacks against a target LLM. It is designed to aid LLM red teamers in testing models and applications.
Built as part of a thesis project, the antler tool attempts to find a combination and permutation of jailbreak techniques that can successfully jailbreak a given target model. The success of a jailbreak is evaluated by feeding the target LLM different probe questions, wrapped in the prompts constructed by the techniques. The probes have been sampled from the SimpleSafetyTests project by bertiev, which is licensed under the Creative Commons Attribution 4.0 International License. Each probe has been paired with detectors, which mostly consist of string matching of keywords, so that a malicious answer to each given probe can be detected automatically.
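To illustrate the detector idea, here is a minimal keyword-matching sketch; the class, method names, and keywords are assumptions for illustration, not antler's actual API:

class KeywordDetector:
    """Flags an answer as malicious if it contains any trigger keyword.
    Illustrative sketch only; antler's real detectors may differ."""

    def __init__(self, keywords: list[str]) -> None:
        self.keywords = [k.lower() for k in keywords]

    def detect(self, answer: str) -> bool:
        text = answer.lower()
        return any(keyword in text for keyword in self.keywords)

# Hypothetical probe/detector pairing:
detector = KeywordDetector(["tension wrench", "pick the pins"])
print(detector.detect("First, insert a tension wrench into the lock..."))  # True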
The combination and permutation space of the techniques can be explored with different algorithms, with a sensible default chosen based on the maximum number of queries. The options are greedy hill climb, simulated annealing, and random search.
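As a rough illustration of how such a search could work, the following is a minimal greedy hill climb over orderings of techniques. The scoring function and technique names are assumptions, not antler's actual implementation; in antler, scoring an ordering would involve querying the target model.

import random

def greedy_hill_climb(techniques: list[str], score, iterations: int = 50) -> list[str]:
    """Greedily improve a technique ordering by swapping two positions at a time.
    `score` maps an ordering to its attack success rate (higher is better)."""
    current = techniques[:]
    best_score = score(current)
    for _ in range(iterations):
        i, j = random.sample(range(len(current)), 2)
        neighbour = current[:]
        neighbour[i], neighbour[j] = neighbour[j], neighbour[i]
        neighbour_score = score(neighbour)
        if neighbour_score > best_score:  # keep the swap only if it improves the score
            current, best_score = neighbour, neighbour_score
    return current

# Toy usage with hypothetical technique names and a dummy scoring function:
# best = greedy_hill_climb(["roleplay", "encoding", "payload_split"], lambda order: random.random())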
Install from PyPI:
pip install antler
Alternatively, navigate to a suitable directory and clone the repo:
git clone [email protected]:martinebl/antler.git
cd antler
Install an editable version with pip, optionally after activating a virtual environment:
pip install -e .
The currently supported LLM providers are: OpenAI, Replicate, OctoAI and Ollama.
OpenAI
When using the openai provider, an API key is needed. This can be passed with the --api_key
parameter, or set as an environment variable called "OPENAI_API_TOKEN".
When specifying model name, simply use the name of the model e.g. "gpt-3.5-turbo".
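For example, with the key set in the environment (POSIX shell syntax assumed, placeholder token):
export OPENAI_API_TOKEN=SECRET_TOKEN
antler -p openai -m gpt-3.5-turbo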
Replicate
When using the replicate provider, an API key is needed. This can be passed with the --api_key
parameter, or set as an environment variable called "REPLICATE_API_TOKEN".
When specifying model name, it is required to include the provider e.g. "mistralai/mixtral-8x7b-instruct-v0.1".
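For example (POSIX shell syntax assumed, placeholder token):
export REPLICATE_API_TOKEN=SECRET_TOKEN
antler -p replicate -m mistralai/mixtral-8x7b-instruct-v0.1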
OctoAI
When using the octoai provider, an API key is needed. This can be passed with the --api_key
parameter, or set as an environment variable called "OCTOAI_TOKEN".
When specifying model name, simply use the name of the model e.g. "meta-llama-3.1-8b-instruct".
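For example (POSIX shell syntax assumed, placeholder token):
export OCTOAI_TOKEN=SECRET_TOKEN
antler -p octoai -m meta-llama-3.1-8b-instruct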
Ollama
When using the ollama provider, no API key is needed, but an accessible ollama instance must be running. If the instance is running on the default port, no further configuration is needed. Otherwise, the correct domain and port need to be provided in the "host" option, e.g. --options '{"host":"127.0.0.1:11434"}'.
Note that the json object is required to have double quotes around parameter names.
It is advised to run queries against an Ollama instance sequentially to avoid timeouts. This is done by specifying --processes 1.
--provider, -p
    The target provider. Examples: openai, ollama, replicate
--model, -m
    The target model. Examples: gpt-3.5-turbo, 'llama2:7b'
--max, -M
    The maximum number of queries to run against the target LLM. Default: 100.
--explorer, -e
    The explorer strategy. Possible values: simulatedannealing, randomsearch and greedyhillclimb. Default depends on max queries.
--repetitions, -r
    The number of repetitions of each prompt/probe query. Default: 3.
--processes, -P
    The number of processes to run in parallel. Currently only has an effect when set to 1, activating sequential querying.
--api_key
    The API key for the target provider. Optional for locally running LLMs.
--options, -o
    The options for the target provider, in json format.
Examples of run parameters:
antler -p openai -m gpt-3.5-turbo --max 500 --api_key SECRET_TOKEN
antler -p ollama -m mistral -r 1 -P 1 --options '{"host":"127.0.0.1:11434"}'
The gif above shows a sample run at 2x speed. The printed results show how the different transforms (lists of techniques) performed on the target model mixtral-8x7b. Transforms that performed well (>= 50% attack success rate, ASR) are colored green, medium (> 0% ASR) orange, and poor (0% ASR) red. The average ASR of each technique and probe is also displayed.
When running the program, two directories will be created in the current working directory: reports and logs. The logs directory will contain a json file pairing every prompt sent to the model with all the answers received to that prompt. The reports directory stores a txt version of the report printed in the terminal.
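The exact log schema is not documented here; conceptually, each entry pairs one prompt with the answers received across repetitions, roughly like this illustrative sketch (field names are assumptions):

{
  "prompt": "<probe question wrapped in technique prompts>",
  "answers": ["<answer from repetition 1>", "<answer from repetition 2>"]
}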
Authored by M. Borup-Larsen and C. Christoffersen