Generator enhancements and minor improvements #391

Merged · 5 commits · Jan 9, 2024
90 changes: 38 additions & 52 deletions README.md
@@ -114,30 +114,35 @@ Send PRs & open issues. Happy hunting!

## Intro to generators

### huggingface
### Hugging Face

* `--model_type huggingface` (for transformers models to run locally)
* `--model_name` - use the model name from the Hub. Only generative models will work. If it fails and shouldn't, please open an issue and paste in the command you tried + the exception!

* `--model_type huggingface.InferenceAPI` (for API-based model access)
* `--model_name` - the model name from the Hub, e.g. `"mosaicml/mpt-7b-instruct"`
* `--model_type huggingface.InferenceEndpoint` (for private endpoints)
* `--model_name` - the endpoint URL, e.g. `https://xxx.us-east-1.aws.endpoints.huggingface.cloud`

* (optional) set the `HF_INFERENCE_TOKEN` environment variable to a Hugging Face API token with the "read" role; see https://huggingface.co/settings/tokens when logged in
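
For illustration, a run against a Hub model via the Inference API could look like the sketch below. This is not part of the PR: the token value is a placeholder, and the model name is just the example from above.

```python
# Minimal sketch (assumptions: garak is installed in this environment; a real
# HF_INFERENCE_TOKEN is needed for the call to actually succeed).
import os
import subprocess

os.environ.setdefault("HF_INFERENCE_TOKEN", "hf_placeholder_token")

subprocess.run(
    [
        "python3", "-m", "garak",
        "--model_type", "huggingface.InferenceAPI",
        "--model_name", "mosaicml/mpt-7b-instruct",
    ],
    check=True,  # raise if garak exits with a non-zero status
)
```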

### openai
### OpenAI

* `--model_type openai`
* `--model_name` - the OpenAI model you'd like to use. `text-babbage-001` is fast and fine for testing; `gpt-4` seems weaker against many of the more subtle attacks.
* set the `OPENAI_API_KEY` environment variable to your OpenAI API key (e.g. "sk-19763ASDF87q6657"); see https://platform.openai.com/account/api-keys when logged in

Recognised model types are whitelisted, because the plugin needs to know which sub-API to use. Completion or ChatCompletion models are OK. If you'd like to use a model that isn't supported, you should get an informative error message; please send a PR or open an issue.
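
As a rough sketch of driving a whitelisted OpenAI model from Python rather than the CLI (the class and method names below reflect my reading of garak's generator API and should be treated as assumptions; the key is a placeholder):

```python
# Minimal sketch: call the OpenAI generator class directly. A real
# OPENAI_API_KEY must be set for generate() to return anything useful.
import os
from garak.generators.openai import OpenAIGenerator  # assumed class name

os.environ.setdefault("OPENAI_API_KEY", "sk-placeholder")

gen = OpenAIGenerator("text-babbage-001", generations=1)
print(gen.generate("Say hello in one short sentence."))  # returns a list of outputs
```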

### replicate
### Replicate

* `--model_type replicate`
* `--model_name` - the Replicate model name and hash, e.g. `"stability-ai/stablelm-tuned-alpha-7b:c49dae36"`
* `--model_type replicate.InferenceEndpoint` (for private endpoints)
* `--model_name` - username/model-name slug from the deployed endpoint, e.g. `elim/elims-llama2-7b`
* set the `REPLICATE_API_TOKEN` environment variable to your Replicate API token, e.g. "r8-123XXXXXXXXXXXX"; see https://replicate.com/account/api-tokens when logged in
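
A sketch showing the difference between the two Replicate name formats (not from the PR; the token and slugs are the placeholder examples from above):

```python
# Minimal sketch: public Replicate model vs. private deployment.
import os
import subprocess

os.environ["REPLICATE_API_TOKEN"] = "r8-123XXXXXXXXXXXX"  # placeholder

# Public model: owner/model:version-hash
subprocess.run(
    [
        "python3", "-m", "garak",
        "--model_type", "replicate",
        "--model_name", "stability-ai/stablelm-tuned-alpha-7b:c49dae36",
    ],
    check=True,
)

# Private deployment: username/deployed-model-name slug
subprocess.run(
    [
        "python3", "-m", "garak",
        "--model_type", "replicate.InferenceEndpoint",
        "--model_name", "elim/elims-llama2-7b",
    ],
    check=True,
)
```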

### cohere
### Cohere

* `--model_type cohere`
* `--model_name` (optional, `command` by default) - The specific Cohere model you'd like to test
@@ -149,7 +154,15 @@ Recognised model types are whitelisted, because the plugin needs to know which s
* `--model_name` - The path to the ggml model you'd like to load, e.g. `/home/leon/llama.cpp/models/7B/ggml-model-q4_0.bin`
* set the `GGML_MAIN_PATH` environment variable to the path to your ggml `main` executable
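
A quick sketch of wiring these two settings together (both paths are placeholders for wherever your llama.cpp build and model file live):

```python
# Minimal sketch: point garak at a local ggml model via llama.cpp's `main`.
import os
import subprocess

os.environ["GGML_MAIN_PATH"] = "/home/leon/llama.cpp/main"  # placeholder path

subprocess.run(
    [
        "python3", "-m", "garak",
        "--model_type", "ggml",
        "--model_name", "/home/leon/llama.cpp/models/7B/ggml-model-q4_0.bin",
    ],
    check=True,
)
```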

### test
### OctoAI

* `--model_type octo`
* `--model_name` - the OctoAI public endpoint for the model, e.g. `mistral-7b-instruct-fp16`
* `--model_type octo.InferenceEndpoint` (for private endpoints)
* `--model_name` - the deployed endpoint URL, e.g. `https://llama-2-70b-chat-xxx.octoai.run/v1/chat/completions`
* set the `OCTO_API_TOKEN` environment variable to your OctoAI API token, available from your OctoAI account settings when logged in
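
Since this PR adds the OctoAI generator (see `garak/generators/octo.py` below), a direct-use sketch might look like the following; the token is a placeholder and `generate()` comes from garak's base generator class, so treat the exact call as an assumption:

```python
# Minimal sketch: exercise the new OctoAI public-endpoint generator directly.
# Requires the `octoai` client package and a real OCTO_API_TOKEN to actually run.
import os
from garak.generators.octo import OctoGenerator

os.environ.setdefault("OCTO_API_TOKEN", "octoai_placeholder_token")

gen = OctoGenerator("mistral-7b-instruct-fp16", generations=1)
print(gen.generate("In one sentence, what does a vulnerability scanner do?"))
```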

### Test

* `--model_type test`
* (alternatively) `--model_name test.Blank`
@@ -160,53 +173,26 @@ For testing. This generator repeats back the prompt it received.

## Intro to probes

### blank

A simple probe that always sends an empty prompt.

### continuation

Probes that test if the model will continue a probably undesirable word

### dan

Various [DAN]() and DAN-like attacks

### encoding

Prompt injection through text encoding

### malwaregen

Attempts to have the model generate code for building malware

### knownbadsignatures

Probes that attempt to make the model output malicious content signatures

### lmrc

Subsample of the [Language Model Risk Cards](https://arxiv.org/abs/2303.18190) probes

### misleading

Attempts to make a model support misleading and false claims

### promptinject

Implementation of the Agency Enterprise [PromptInject](https://github.com/agencyenterprise/PromptInject/tree/main/promptinject) work (best paper awards @ NeurIPS ML Safety Workshop 2022)

### realtoxicityprompts

Subset of the RealToxicityPrompts work (data constrained because the full test will take so long to run)

### snowball

[Snowballed Hallucination](https://ofir.io/snowballed_hallucination.pdf) probes designed to make a model give a wrong answer to questions too complex for it to process

### art

Auto Red-Team. A red-teaming LLM probes the target and reacts to it in an attempt to get toxic output. Prototype, mostly stateless, for now uses a simple GPT-2 [fine-tuned](https://huggingface.co/leondz/artgpt2tox) on the subset of hhrlhf attempts that yielded detectable toxicity (the only target currently supported for now).
| Probe | Description |
| --- | --- |
| blank | A simple probe that always sends an empty prompt. |
| atkgen | Automated Attack Generation. A red-teaming LLM probes the target and reacts to it in an attempt to get toxic output. A prototype, mostly stateless; currently uses a simple GPT-2 model [fine-tuned](https://huggingface.co/leondz/artgpt2tox) on the subset of hhrlhf attempts that yielded detectable toxicity (the only target currently supported). |
| continuation | Probes that test if the model will continue a probably undesirable word |
| dan | Various [DAN](https://adguard.com/en/blog/chatgpt-dan-prompt-abuse.html) and DAN-like attacks |
| encoding | Prompt injection through text encoding |
| gcg | Disrupt a system prompt by appending an adversarial suffix. |
| glitch | Probe model for glitch tokens that provoke unusual behavior. |
| goodside | Implementations of Riley Goodside attacks. |
| knownbadsignatures | Probes that attempt to make the model output malicious content signatures |
| leakreplay | Evaluate if a model will replay training data. |
| lmrc | Subsample of the [Language Model Risk Cards](https://arxiv.org/abs/2303.18190) probes |
| malwaregen | Attempts to have the model generate code for building malware |
| misleading | Attempts to make a model support misleading and false claims |
| packagehallucination | Trying to get code generations that specify non-existent (and therefore insecure) packages. |
| promptinject | Implementation of the Agency Enterprise [PromptInject](https://github.com/agencyenterprise/PromptInject/tree/main/promptinject) work (best paper awards @ NeurIPS ML Safety Workshop 2022) |
| realtoxicityprompts | Subset of the RealToxicityPrompts work (data constrained because the full test will take so long to run) |
| snowball | [Snowballed Hallucination](https://ofir.io/snowballed_hallucination.pdf) probes designed to make a model give a wrong answer to questions too complex for it to process |
| xss | Look for vulnerabilities that permit or enact cross-site attacks, such as private data exfiltration. |
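
To try a single probe family from the table against the built-in test generator, something like the sketch below should work (the `--probes` flag isn't shown in this diff, so treat it as an assumption):

```python
# Minimal sketch: run only the encoding probes against the test generator.
import subprocess

subprocess.run(
    ["python3", "-m", "garak", "--model_type", "test", "--probes", "encoding"],
    check=True,
)
```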

## Logging

57 changes: 53 additions & 4 deletions garak/generators/huggingface.py
@@ -33,11 +33,9 @@
class HFRateLimitException(Exception):
pass


class HFLoadingException(Exception):
pass


class HFInternalServerError(Exception):
pass

@@ -106,7 +104,6 @@ def _call_model(self, prompt: str) -> List[str]:
else:
return [re.sub("^" + re.escape(prompt), "", i) for i in generations]


class OptimumPipeline(Pipeline):
"""Get text generations from a locally-run Hugging Face pipeline using NVIDIA Optimum"""

@@ -149,7 +146,6 @@ def __init__(self, name, do_sample=True, generations=10, device=0):
if _config.run.deprefix is True:
self.deprefix_prompt = True


class InferenceAPI(Generator):
"""Get text generations from Hugging Face Inference API"""

@@ -256,7 +252,60 @@ def _call_model(self, prompt: str) -> List[str]:
def _pre_generate_hook(self):
self.wait_for_model = False

class InferenceEndpoint(InferenceAPI):
    """Interface for Hugging Face private endpoints
    Pass the model URL as the name, e.g. https://xxx.aws.endpoints.huggingface.cloud
    """

    supports_multiple_generations = False
    import requests

    def __init__(self, name="", generations=10):
        super().__init__(name, generations=generations)
        self.api_url = name

    @backoff.on_exception(
        backoff.fibo,
        (
            HFRateLimitException,
            HFLoadingException,
            HFInternalServerError,
            requests.Timeout,
        ),
        max_value=125,
    )
    def _call_model(self, prompt: str) -> List[str]:
        import requests

        payload = {
            "inputs": prompt,
            "parameters": {
                "return_full_text": not self.deprefix_prompt,
                "max_time": self.max_time,
            },
            "options": {
                "wait_for_model": self.wait_for_model,
            },
        }
        if self.max_tokens:
            payload["parameters"]["max_new_tokens"] = self.max_tokens

        if self.generations > 1:
            payload["parameters"]["do_sample"] = True

        response = requests.post(
            self.api_url, headers=self.headers, json=payload
        ).json()
        try:
            output = response[0]["generated_text"]
        except (KeyError, IndexError, TypeError) as exc:
            raise IOError(
                "Hugging Face 🤗 endpoint didn't generate a response. Make sure the endpoint is active."
            ) from exc
        return output

class Model(Generator):
"""Get text generations from a locally-run Hugging Face model"""

70 changes: 53 additions & 17 deletions garak/generators/octo.py
@@ -16,18 +16,20 @@


class OctoGenerator(Generator):
"""Interface for OctoML models
"""Interface for OctoAI public endpoints

Pass the model URL as the name, e.g. https://llama-2-70b-chat-demo-kk0powt97tmb.octoai.run/v1/chat/completions

This module tries to guess the internal model name in self.octo_model.
We don't have access to private model so don't know the format.
If garak guesses wrong, please please open a ticket.
Pass the model name as `name`, e.g. llama-2-13b-chat-fp16.
For more details, see https://octoai.cloud/tools/text.
"""

generator_family_name = "OctoML"
generator_family_name = "OctoAI"
supports_multiple_generations = False

max_tokens = 128
presence_penalty = 0
temperature = 0.1
top_p = 1

def __init__(self, name, generations=10):
from octoai.client import Client

@@ -37,22 +39,57 @@ def __init__(self, name, generations=10):
if hasattr(_config.run, "seed"):
self.seed = _config.run.seed

self.octo_model = "-".join(
self.name.replace("-demo", "").replace("https://", "").split("-")[:-1]
)

super().__init__(name, generations=generations)

if os.getenv("OCTO_API_KEY", default=None) is None:
octoai_token = os.getenv("OCTO_API_TOKEN", default=None)
if octoai_token is None:
raise ValueError(
'Put the Replicate API token in the OCTOAI_TOKEN environment variable (this was empty)\n \
e.g.: export OCTOAI_TOKEN="kjhasdfuhasi8djgh"'
'🛑 Put the OctoAI API token in the OCTO_API_TOKEN environment variable (this was empty)\n \
e.g.: export OCTO_API_TOKEN="kjhasdfuhasi8djgh"'
)
self.octoml = Client(token=os.getenv("OCTO_API_KEY", default=None))
self.client = Client(token=octoai_token)

@backoff.on_exception(backoff.fibo, octoai.errors.OctoAIServerError, max_value=70)
def _call_model(self, prompt):
outputs = self.octoml.infer(
outputs = self.client.chat.completions.create(
messages=[
{
"role": "system",
"content": "You are a helpful assistant. Keep your responses limited to one short paragraph if possible."
},
{
"role": "user",
"content": prompt
}
],
model=self.name,
max_tokens=self.max_tokens,
presence_penalty=self.presence_penalty,
temperature=self.temperature,
top_p=self.top_p,
)

return outputs.choices[0].message.content

class InferenceEndpoint(OctoGenerator):
    """Interface for OctoAI private endpoints

    Pass the model URL as the name, e.g. https://llama-2-70b-chat-xxx.octoai.run/v1/chat/completions

    This module tries to guess the internal model name in self.octo_model.
    We don't have access to private models, so we don't know the format;
    if garak guesses wrong, please open a ticket.
    """

    def __init__(self, name, generations=10):
        super().__init__(name, generations=generations)
        self.octo_model = "-".join(
            self.name.replace("-demo", "").replace("https://", "").split("-")[:-1]
        )

    @backoff.on_exception(backoff.fibo, octoai.errors.OctoAIServerError, max_value=70)
    def _call_model(self, prompt):
        outputs = self.client.infer(
            endpoint_url=self.name,
            inputs={
                "model": self.octo_model,
@@ -68,5 +105,4 @@ def _call_model(self, prompt):
)
return outputs.get("choices")[0].get("message").get("content")


default_class = "OctoGenerator"
38 changes: 34 additions & 4 deletions garak/generators/replicate.py
@@ -21,7 +21,10 @@


class ReplicateGenerator(Generator):
"""Wrapper for the Replicate hosted models (replicate.com). Expects API key in REPLICATE_API_TOKEN environment variable."""
"""
Interface for public endpoints of models hosted in Replicate (replicate.com).
Expects API key in REPLICATE_API_TOKEN environment variable.
"""

generator_family_name = "Replicate"
temperature = 1
@@ -31,7 +34,7 @@ class ReplicateGenerator(Generator):

def __init__(self, name, generations=10):
self.name = name
self.fullname = f"Replicate {self.name}"
self.fullname = f"{self.generator_family_name} {self.name}"
self.seed = 9
if hasattr(_config.run, "seed"):
self.seed = _config.run.seed
@@ -40,7 +43,7 @@ def __init__(self, name, generations=10):

if os.getenv("REPLICATE_API_TOKEN", default=None) is None:
raise ValueError(
'Put the Replicate API token in the REPLICATE_API_TOKEN environment variable (this was empty)\n \
'🛑 Put the Replicate API token in the REPLICATE_API_TOKEN environment variable (this was empty)\n \
e.g.: export REPLICATE_API_TOKEN="r8-123XXXXXXXXXXXX"'
)
self.replicate = importlib.import_module("replicate")
@@ -62,5 +65,32 @@ def _call_model(self, prompt):
)
return "".join(response_iterator)


class InferenceEndpoint(ReplicateGenerator):
    """
    Interface for private Replicate endpoints.
    Expects `name` in the format of `username/deployed-model-name`.
    """

    @backoff.on_exception(
        backoff.fibo, replicate.exceptions.ReplicateError, max_value=70
    )
    def _call_model(self, prompt):
        deployment = self.replicate.deployments.get(self.name)
        prediction = deployment.predictions.create(
            input={
                "prompt": prompt,
                "max_length": self.max_tokens,
                "temperature": self.temperature,
                "top_p": self.top_p,
                "repetition_penalty": self.repetition_penalty,
            },
        )
        prediction.wait()
        try:
            response = "".join(prediction.output)
        except TypeError:
            raise IOError(
                "Replicate endpoint didn't generate a response. Make sure the endpoint is active."
            )
        return response

default_class = "ReplicateGenerator"