Generator enhancements and minor improvements #391

Merged · 5 commits · Jan 9, 2024
90 changes: 38 additions & 52 deletions README.md
@@ -114,30 +114,35 @@ Send PRs & open issues. Happy hunting!

## Intro to generators

### huggingface
### Hugging Face

* `--model_type huggingface` (for transformers models to run locally)
* `--model_name` - use the model name from the Hub. Only generative models will work. If it fails and shouldn't, please open an issue and paste in the command you tried + the exception!

* `--model_type huggingface.InferenceAPI` (for API-based model access)
* `--model_name` - the model name from the Hub, e.g. `"mosaicml/mpt-7b-instruct"`
* `--model_type huggingface.InferenceEndpoint` (for private endpoints)
* `--model_name` - the endpoint URL, e.g. `https://xxx.us-east-1.aws.endpoints.huggingface.cloud`

* (optional) set the `HF_INFERENCE_TOKEN` environment variable to a Hugging Face API token with the "read" role; see https://huggingface.co/settings/tokens when logged in
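
For illustration, a run against a Hub model via the Inference API could look like the sketch below. This is not part of the PR: the token value is a placeholder, and the model name is just the example from above.

```python
# Minimal sketch (assumptions: garak is installed in this environment; a real
# HF_INFERENCE_TOKEN is needed for the call to actually succeed).
import os
import subprocess

os.environ.setdefault("HF_INFERENCE_TOKEN", "hf_placeholder_token")

subprocess.run(
    [
        "python3", "-m", "garak",
        "--model_type", "huggingface.InferenceAPI",
        "--model_name", "mosaicml/mpt-7b-instruct",
    ],
    check=True,  # raise if garak exits with a non-zero status
)
```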

### openai
### OpenAI

* `--model_type openai`
* `--model_name` - the OpenAI model you'd like to use. `text-babbage-001` is fast and fine for testing; `gpt-4` seems weaker against many of the more subtle attacks.
* set the `OPENAI_API_KEY` environment variable to your OpenAI API key (e.g. "sk-19763ASDF87q6657"); see https://platform.openai.com/account/api-keys when logged in

Recognised model types are whitelisted, because the plugin needs to know which sub-API to use. Completion or ChatCompletion models are OK. If you'd like to use a model that isn't supported, you should get an informative error message; please send a PR or open an issue.
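
As a rough sketch of driving a whitelisted OpenAI model from Python rather than the CLI (the class and method names below reflect my reading of garak's generator API and should be treated as assumptions; the key is a placeholder):

```python
# Minimal sketch: call the OpenAI generator class directly. A real
# OPENAI_API_KEY must be set for generate() to return anything useful.
import os
from garak.generators.openai import OpenAIGenerator  # assumed class name

os.environ.setdefault("OPENAI_API_KEY", "sk-placeholder")

gen = OpenAIGenerator("text-babbage-001", generations=1)
print(gen.generate("Say hello in one short sentence."))  # returns a list of outputs
```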

### replicate
### Replicate

* `--model_type replicate`
* `--model_name` - the Replicate model name and hash, e.g. `"stability-ai/stablelm-tuned-alpha-7b:c49dae36"`
* `--model_type replicate.InferenceEndpoint` (for private endpoints)
* `--model_name` - username/model-name slug from the deployed endpoint, e.g. `elim/elims-llama2-7b`
* set the `REPLICATE_API_TOKEN` environment variable to your Replicate API token, e.g. "r8-123XXXXXXXXXXXX"; see https://replicate.com/account/api-tokens when logged in
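
A sketch showing the difference between the two Replicate name formats (not from the PR; the token and slugs are the placeholder examples from above):

```python
# Minimal sketch: public Replicate model vs. private deployment.
import os
import subprocess

os.environ["REPLICATE_API_TOKEN"] = "r8-123XXXXXXXXXXXX"  # placeholder

# Public model: owner/model:version-hash
subprocess.run(
    [
        "python3", "-m", "garak",
        "--model_type", "replicate",
        "--model_name", "stability-ai/stablelm-tuned-alpha-7b:c49dae36",
    ],
    check=True,
)

# Private deployment: username/deployed-model-name slug
subprocess.run(
    [
        "python3", "-m", "garak",
        "--model_type", "replicate.InferenceEndpoint",
        "--model_name", "elim/elims-llama2-7b",
    ],
    check=True,
)
```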

### cohere
### Cohere

* `--model_type cohere`
* `--model_name` (optional, `command` by default) - The specific Cohere model you'd like to test
@@ -149,7 +154,15 @@ Recognised model types are whitelisted, because the plugin needs to know which s
* `--model_name` - The path to the ggml model you'd like to load, e.g. `/home/leon/llama.cpp/models/7B/ggml-model-q4_0.bin`
* set the `GGML_MAIN_PATH` environment variable to the path to your ggml `main` executable
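
A quick sketch of wiring these two settings together (both paths are placeholders for wherever your llama.cpp build and model file live):

```python
# Minimal sketch: point garak at a local ggml model via llama.cpp's `main`.
import os
import subprocess

os.environ["GGML_MAIN_PATH"] = "/home/leon/llama.cpp/main"  # placeholder path

subprocess.run(
    [
        "python3", "-m", "garak",
        "--model_type", "ggml",
        "--model_name", "/home/leon/llama.cpp/models/7B/ggml-model-q4_0.bin",
    ],
    check=True,
)
```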

### test
### OctoAI

* `--model_type octo`
* `--model_name` - the OctoAI public endpoint for the model, e.g. `mistral-7b-instruct-fp16`
* `--model_type octo.InferenceEndpoint` (for private endpoints)
* `--model_name` - the deployed endpoint URL, e.g. `https://llama-2-70b-chat-xxx.octoai.run/v1/chat/completions`
* set the `OCTO_API_TOKEN` environment variable to your OctoAI API token, available from your OctoAI account settings when logged in
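
Since this PR adds the OctoAI generator (see `garak/generators/octo.py` below), a direct-use sketch might look like the following; the token is a placeholder and `generate()` comes from garak's base generator class, so treat the exact call as an assumption:

```python
# Minimal sketch: exercise the new OctoAI public-endpoint generator directly.
# Requires the `octoai` client package and a real OCTO_API_TOKEN to actually run.
import os
from garak.generators.octo import OctoGenerator

os.environ.setdefault("OCTO_API_TOKEN", "octoai_placeholder_token")

gen = OctoGenerator("mistral-7b-instruct-fp16", generations=1)
print(gen.generate("In one sentence, what does a vulnerability scanner do?"))
```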

### Test

* `--model_type test`
* (alternatively) `--model_name test.Blank`
@@ -160,53 +173,26 @@ For testing. This generator repeats back the prompt it received.

## Intro to probes

### blank

A simple probe that always sends an empty prompt.

### continuation

Probes that test if the model will continue a probably undesirable word

### dan

Various [DAN]() and DAN-like attacks

### encoding

Prompt injection through text encoding

### malwaregen

Attempts to have the model generate code for building malware

### knownbadsignatures

Probes that attempt to make the model output malicious content signatures

### lmrc

Subsample of the [Language Model Risk Cards](https://arxiv.org/abs/2303.18190) probes

### misleading

Attempts to make a model support misleading and false claims

### promptinject

Implementation of the Agency Enterprise [PromptInject](https://github.com/agencyenterprise/PromptInject/tree/main/promptinject) work (best paper awards @ NeurIPS ML Safety Workshop 2022)

### realtoxicityprompts

Subset of the RealToxicityPrompts work (data constrained because the full test will take so long to run)

### snowball

[Snowballed Hallucination](https://ofir.io/snowballed_hallucination.pdf) probes designed to make a model give a wrong answer to questions too complex for it to process

### art

Auto Red-Team. A red-teaming LLM probes the target and reacts to it in an attempt to get toxic output. Prototype, mostly stateless, for now uses a simple GPT-2 [fine-tuned](https://huggingface.co/leondz/artgpt2tox) on the subset of hhrlhf attempts that yielded detectable toxicity (the only target currently supported for now).
| Probe | Description |
| --- | --- |
| blank | A simple probe that always sends an empty prompt. |
| atkgen | Automated Attack Generation. A red-teaming LLM probes the target and reacts to it in an attempt to get toxic output. A prototype, mostly stateless; currently uses a simple GPT-2 model [fine-tuned](https://huggingface.co/leondz/artgpt2tox) on the subset of hhrlhf attempts that yielded detectable toxicity (the only target currently supported). |
| continuation | Probes that test if the model will continue a probably undesirable word |
| dan | Various [DAN](https://adguard.com/en/blog/chatgpt-dan-prompt-abuse.html) and DAN-like attacks |
| encoding | Prompt injection through text encoding |
| gcg | Disrupt a system prompt by appending an adversarial suffix. |
| glitch | Probe model for glitch tokens that provoke unusual behavior. |
| goodside | Implementations of Riley Goodside attacks. |
| knownbadsignatures | Probes that attempt to make the model output malicious content signatures |
| leakreplay | Evaluate if a model will replay training data. |
| lmrc | Subsample of the [Language Model Risk Cards](https://arxiv.org/abs/2303.18190) probes |
| malwaregen | Attempts to have the model generate code for building malware |
| misleading | Attempts to make a model support misleading and false claims |
| packagehallucination | Trying to get code generations that specify non-existent (and therefore insecure) packages. |
| promptinject | Implementation of the Agency Enterprise [PromptInject](https://github.com/agencyenterprise/PromptInject/tree/main/promptinject) work (best paper awards @ NeurIPS ML Safety Workshop 2022) |
| realtoxicityprompts | Subset of the RealToxicityPrompts work (data constrained because the full test will take so long to run) |
| snowball | [Snowballed Hallucination](https://ofir.io/snowballed_hallucination.pdf) probes designed to make a model give a wrong answer to questions too complex for it to process |
| xss | Look for vulnerabilities that permit or enact cross-site attacks, such as private data exfiltration. |
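
To try a single probe family from the table against the built-in test generator, something like the sketch below should work (the `--probes` flag isn't shown in this diff, so treat it as an assumption):

```python
# Minimal sketch: run only the encoding probes against the test generator.
import subprocess

subprocess.run(
    ["python3", "-m", "garak", "--model_type", "test", "--probes", "encoding"],
    check=True,
)
```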

## Logging

57 changes: 53 additions & 4 deletions garak/generators/huggingface.py
@@ -33,11 +33,9 @@
class HFRateLimitException(Exception):
pass


class HFLoadingException(Exception):
pass


class HFInternalServerError(Exception):
pass

@@ -106,7 +104,6 @@ def _call_model(self, prompt: str) -> List[str]:
else:
return [re.sub("^" + re.escape(prompt), "", i) for i in generations]


class OptimumPipeline(Pipeline):
"""Get text generations from a locally-run Hugging Face pipeline using NVIDIA Optimum"""

@@ -149,7 +146,6 @@ def __init__(self, name, do_sample=True, generations=10, device=0):
if _config.run.deprefix is True:
self.deprefix_prompt = True


class InferenceAPI(Generator):
"""Get text generations from Hugging Face Inference API"""

@@ -256,7 +252,60 @@ def _call_model(self, prompt: str) -> List[str]:
def _pre_generate_hook(self):
self.wait_for_model = False

class InferenceEndpoint(InferenceAPI):
    """Interface for Hugging Face private endpoints
    Pass the model URL as the name, e.g. https://xxx.aws.endpoints.huggingface.cloud
    """

    supports_multiple_generations = False
    import requests

    def __init__(self, name="", generations=10):
        super().__init__(name, generations=generations)
        self.api_url = name

    @backoff.on_exception(
        backoff.fibo,
        (
            HFRateLimitException,
            HFLoadingException,
            HFInternalServerError,
            requests.Timeout,
        ),
        max_value=125,
    )
    def _call_model(self, prompt: str) -> List[str]:
        import requests

        payload = {
            "inputs": prompt,
            "parameters": {
                "return_full_text": not self.deprefix_prompt,
                "max_time": self.max_time,
            },
            "options": {
                "wait_for_model": self.wait_for_model,
            },
        }
        if self.max_tokens:
            payload["parameters"]["max_new_tokens"] = self.max_tokens

        if self.generations > 1:
            payload["parameters"]["do_sample"] = True

        response = requests.post(
            self.api_url, headers=self.headers, json=payload
        ).json()
        try:
            output = response[0]["generated_text"]
        except (KeyError, IndexError, TypeError) as exc:
            raise IOError(
                "Hugging Face 🤗 endpoint didn't generate a response. Make sure the endpoint is active."
            ) from exc
        return output

class Model(Generator):
"""Get text generations from a locally-run Hugging Face model"""

70 changes: 53 additions & 17 deletions garak/generators/octo.py
@@ -16,18 +16,20 @@


class OctoGenerator(Generator):
"""Interface for OctoML models
"""Interface for OctoAI public endpoints

Pass the model URL as the name, e.g. https://llama-2-70b-chat-demo-kk0powt97tmb.octoai.run/v1/chat/completions

This module tries to guess the internal model name in self.octo_model.
We don't have access to private model so don't know the format.
If garak guesses wrong, please please open a ticket.
Pass the model name as `name`, e.g. llama-2-13b-chat-fp16.
For more details, see https://octoai.cloud/tools/text.
"""

generator_family_name = "OctoML"
generator_family_name = "OctoAI"
supports_multiple_generations = False

max_tokens = 128
presence_penalty = 0
temperature = 0.1
top_p = 1

def __init__(self, name, generations=10):
from octoai.client import Client

@@ -37,22 +39,57 @@ def __init__(self, name, generations=10):
if hasattr(_config.run, "seed"):
self.seed = _config.run.seed

self.octo_model = "-".join(
self.name.replace("-demo", "").replace("https://", "").split("-")[:-1]
)

super().__init__(name, generations=generations)

if os.getenv("OCTO_API_KEY", default=None) is None:
octoai_token = os.getenv("OCTO_API_TOKEN", default=None)
if octoai_token is None:
raise ValueError(
'Put the Replicate API token in the OCTOAI_TOKEN environment variable (this was empty)\n \
e.g.: export OCTOAI_TOKEN="kjhasdfuhasi8djgh"'
'🛑 Put the OctoAI API token in the OCTO_API_TOKEN environment variable (this was empty)\n \
e.g.: export OCTO_API_TOKEN="kjhasdfuhasi8djgh"'
)
self.octoml = Client(token=os.getenv("OCTO_API_KEY", default=None))
self.client = Client(token=octoai_token)

@backoff.on_exception(backoff.fibo, octoai.errors.OctoAIServerError, max_value=70)
def _call_model(self, prompt):
outputs = self.octoml.infer(
outputs = self.client.chat.completions.create(
messages=[
{
"role": "system",
"content": "You are a helpful assistant. Keep your responses limited to one short paragraph if possible."
},
{
"role": "user",
"content": prompt
}
],
model=self.name,
max_tokens=self.max_tokens,
presence_penalty=self.presence_penalty,
temperature=self.temperature,
top_p=self.top_p,
)

return outputs.choices[0].message.content

class InferenceEndpoint(OctoGenerator):
    """Interface for OctoAI private endpoints

    Pass the model URL as the name, e.g. https://llama-2-70b-chat-xxx.octoai.run/v1/chat/completions

    This module tries to guess the internal model name in self.octo_model.
    We don't have access to private models, so we don't know the format;
    if garak guesses wrong, please open a ticket.
    """

    def __init__(self, name, generations=10):
        super().__init__(name, generations=generations)
        self.octo_model = "-".join(
            self.name.replace("-demo", "").replace("https://", "").split("-")[:-1]
        )

    @backoff.on_exception(backoff.fibo, octoai.errors.OctoAIServerError, max_value=70)
    def _call_model(self, prompt):
        outputs = self.client.infer(
            endpoint_url=self.name,
            inputs={
                "model": self.octo_model,
@@ -68,5 +105,4 @@ def _call_model(self, prompt):
)
return outputs.get("choices")[0].get("message").get("content")


default_class = "OctoGenerator"
38 changes: 34 additions & 4 deletions garak/generators/replicate.py
@@ -21,7 +21,10 @@


class ReplicateGenerator(Generator):
"""Wrapper for the Replicate hosted models (replicate.com). Expects API key in REPLICATE_API_TOKEN environment variable."""
"""
Interface for public endpoints of models hosted in Replicate (replicate.com).
Expects API key in REPLICATE_API_TOKEN environment variable.
"""

generator_family_name = "Replicate"
temperature = 1
@@ -31,7 +34,7 @@ class ReplicateGenerator(Generator):

def __init__(self, name, generations=10):
self.name = name
self.fullname = f"Replicate {self.name}"
self.fullname = f"{self.generator_family_name} {self.name}"
self.seed = 9
if hasattr(_config.run, "seed"):
self.seed = _config.run.seed
@@ -40,7 +43,7 @@ def __init__(self, name, generations=10):

if os.getenv("REPLICATE_API_TOKEN", default=None) is None:
raise ValueError(
'Put the Replicate API token in the REPLICATE_API_TOKEN environment variable (this was empty)\n \
'🛑 Put the Replicate API token in the REPLICATE_API_TOKEN environment variable (this was empty)\n \
e.g.: export REPLICATE_API_TOKEN="r8-123XXXXXXXXXXXX"'
)
self.replicate = importlib.import_module("replicate")
@@ -62,5 +65,32 @@ def _call_model(self, prompt):
)
return "".join(response_iterator)


class InferenceEndpoint(ReplicateGenerator):
    """
    Interface for private Replicate endpoints.
    Expects `name` in the format of `username/deployed-model-name`.
    """

    @backoff.on_exception(
        backoff.fibo, replicate.exceptions.ReplicateError, max_value=70
    )
    def _call_model(self, prompt):
        deployment = self.replicate.deployments.get(self.name)
        prediction = deployment.predictions.create(
            input={
                "prompt": prompt,
                "max_length": self.max_tokens,
                "temperature": self.temperature,
                "top_p": self.top_p,
                "repetition_penalty": self.repetition_penalty,
            },
        )
        prediction.wait()
        try:
            response = "".join(prediction.output)
        except TypeError:
            raise IOError(
                "Replicate endpoint didn't generate a response. Make sure the endpoint is active."
            )
        return response

default_class = "ReplicateGenerator"