⚠️ On February 22, 2024, a research paper was released on ASCII art-based jailbreak attacks against aligned (aka censored) LLMs. The paper highlights the vulnerability of large language models (LLMs) to ASCII art-based attacks, challenging existing safety measures. It introduces ArtPrompt, a practical jailbreak attack that exploits LLMs' poor recognition of ASCII art. The authors evaluated five top LLMs and demonstrated ArtPrompt's effectiveness in inducing undesired behaviors.
💡 NB: By the time you read this, commercial LLMs may have been patched and this method might no longer work! Even so, we thought it would be fun for you to try out the technique. We will look at two kinds of ASCII art attack, see how to generate ASCII art from text, and finally try it on both closed and open-source models.
⚠️ Remember, this is for educational purposes only and we trust you to use it responsibly.
This free ASCII art generator is one of the best out there, but feel free to use any online or command-line tool you want, for example PyFiglet or FIGlet.
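To make the text-to-ASCII-art step concrete, here is a minimal sketch in plain Python. It assumes a tiny hand-made 5-row font (`FONT`) that only covers the letters of the demo word; the font, the `render` helper, and the `gap` parameter are all hypothetical and exist only for illustration. In practice a tool such as PyFiglet (`pyfiglet.figlet_format(text)`) or the `figlet` command gives you complete fonts.

```python
# Hypothetical 5-row block font; only the letters used in the demo are defined.
FONT = {
    "H": ["#   #", "#   #", "#####", "#   #", "#   #"],
    "I": ["#####", "  #  ", "  #  ", "  #  ", "#####"],
}

def render(text: str, gap: str = "  ") -> str:
    """Render text as ASCII art by placing each glyph's rows side by side."""
    glyphs = [FONT[ch] for ch in text.upper()]  # KeyError for letters not in FONT
    rows = [gap.join(glyph[r] for glyph in glyphs) for r in range(5)]
    return "\n".join(rows)

art = render("HI")
print(art)
```

Each output row is built by joining the same row of every glyph, which is essentially how banner-style generators lay out text.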
💡 Since most LLMs, including commercial ones, are not very good at recognizing text rendered as ASCII art, you might need to prompt several times to get this kind of ASCII art to work. However, the more capable the model, the higher the chance of getting it to work.
As you can see, you have to be extra specific in directing the LLM to recognize the ASCII art. Below is an example from the paper.
Page 15 of the paper
This method is quite simple. You only need to follow the format below when preparing your prompt. Note the way the characters are separated with `|`.
This technique is credited to Daedalus
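The exact prompt template is not reproduced in this text, so the sketch below only illustrates the character-separation step mentioned above, using a harmless word. The `separate_chars` helper and its `sep` parameter are hypothetical names chosen for this example, not part of the credited technique.

```python
def separate_chars(word: str, sep: str = "|") -> str:
    """Insert a separator between every character of the word."""
    return sep.join(word)

separated = separate_chars("HELLO")
print(separated)  # H|E|L|L|O
```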
Try the above method on HuggingChat, a free, open-source alternative to ChatGPT.
Example:
- Mistral 7B on HuggingChat with a direct prompt
- Mistral 7B on HuggingChat with the ASCII art attack
⚠️ Remember: Use this for educational and research purposes only!