Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[notebook] Data Exfiltration from Slack AI via indirect prompt injection #19

Open
mbalunovic opened this issue Sep 9, 2024 · 0 comments
Labels
examples Examples of using Invariant good first issue Good for newcomers

Comments

@mbalunovic
Copy link
Contributor

We should create a notebook motivated by a recent blog post on indirect prompt injection in Slack.
The notebook should consist of several parts:

  1. We should create a toy Slack simulation. It's basically a class that simulates Slack workspace similar to what we did in AgentDojo (https://github.com/ethz-spylab/agentdojo/blob/main/src/agentdojo/default_suites/v1/tools/slack.py), but we don't really need all Slack features from there. Most important are reading messages from channels, ability to have private/public channels and RAG-based search that simulates Slack AI where user can search over all channels they have access to.

  2. After we have a Slack simulation, we should implement the data exfiltration attack from the blog post. It's an indirect prompt injection where the attacker posts prompt injection into a public channel, which is then used to exfiltrate data from a private channel when RAG is performed.

  3. We should discuss how Invariant can protect against this vulnerability. Basically the policy should be that in the output of RAG system there should not be a URL that is not verbatim present in one of the retrieved sources.

@mbalunovic mbalunovic added good first issue Good for newcomers examples Examples of using Invariant labels Sep 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
examples Examples of using Invariant good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

1 participant