[notebook] Data Exfiltration from Slack AI via indirect prompt injection #19

mbalunovic · 2024-09-09T17:52:32Z

We should create a notebook motivated by a recent blog post on indirect prompt injection in Slack.
The notebook should consist of several parts:

We should create a toy Slack simulation. It's basically a class that simulates Slack workspace similar to what we did in AgentDojo (https://github.com/ethz-spylab/agentdojo/blob/main/src/agentdojo/default_suites/v1/tools/slack.py), but we don't really need all Slack features from there. Most important are reading messages from channels, ability to have private/public channels and RAG-based search that simulates Slack AI where user can search over all channels they have access to.
After we have a Slack simulation, we should implement the data exfiltration attack from the blog post. It's an indirect prompt injection where the attacker posts prompt injection into a public channel, which is then used to exfiltrate data from a private channel when RAG is performed.
We should discuss how Invariant can protect against this vulnerability. Basically the policy should be that in the output of RAG system there should not be a URL that is not verbatim present in one of the retrieved sources.

mbalunovic added good first issue Good for newcomers examples Examples of using Invariant labels Sep 9, 2024

feixieliz mentioned this issue Oct 3, 2024

slackai_attack_simulation jupyter notebook #20

Draft

3 tasks

Provide feedback