You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
We should create a notebook motivated by a recent blog post on indirect prompt injection in Slack.
The notebook should consist of several parts:
We should create a toy Slack simulation. It's basically a class that simulates Slack workspace similar to what we did in AgentDojo (https://github.com/ethz-spylab/agentdojo/blob/main/src/agentdojo/default_suites/v1/tools/slack.py), but we don't really need all Slack features from there. Most important are reading messages from channels, ability to have private/public channels and RAG-based search that simulates Slack AI where user can search over all channels they have access to.
After we have a Slack simulation, we should implement the data exfiltration attack from the blog post. It's an indirect prompt injection where the attacker posts prompt injection into a public channel, which is then used to exfiltrate data from a private channel when RAG is performed.
We should discuss how Invariant can protect against this vulnerability. Basically the policy should be that in the output of RAG system there should not be a URL that is not verbatim present in one of the retrieved sources.
The text was updated successfully, but these errors were encountered:
We should create a notebook motivated by a recent blog post on indirect prompt injection in Slack.
The notebook should consist of several parts:
We should create a toy Slack simulation. It's basically a class that simulates Slack workspace similar to what we did in AgentDojo (https://github.com/ethz-spylab/agentdojo/blob/main/src/agentdojo/default_suites/v1/tools/slack.py), but we don't really need all Slack features from there. Most important are reading messages from channels, ability to have private/public channels and RAG-based search that simulates Slack AI where user can search over all channels they have access to.
After we have a Slack simulation, we should implement the data exfiltration attack from the blog post. It's an indirect prompt injection where the attacker posts prompt injection into a public channel, which is then used to exfiltrate data from a private channel when RAG is performed.
We should discuss how Invariant can protect against this vulnerability. Basically the policy should be that in the output of RAG system there should not be a URL that is not verbatim present in one of the retrieved sources.
The text was updated successfully, but these errors were encountered: