Both 'a's like a in English "hat", or à in French, or /æ/ in IPA.

`garak` is designed to help discover situations where a language model generates outputs that one might not want it to. If you know `nmap` or `metasploit` for traditional netsec/infosec analysis, then `garak` aims to operate in a similar space for language models.

## How does garak work?

`garak` has probes that each try to find a different "vulnerability". Each probe sends specific prompts to the model and collects multiple generations for each prompt, since LLM output is often stochastic and a single test isn't very informative. These generations are then processed by "detectors", which look for "hits". If a detector registers a hit, that attempt is marked as a failure. Finally, a report is produced with the success/failure rate for each probe and detector.
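
Schematically, the flow looks like this. This is an illustrative sketch of the loop just described, not garak's internal API; all names are made up for the example.

```python
# Sketch of the probe -> generations -> detectors -> report flow.
# All names here are illustrative, not garak internals.
from collections import defaultdict

def run_probe(prompts, generate, detectors, generations_per_prompt=5):
    """Send each prompt several times, run every detector, tally failures."""
    tally = defaultdict(lambda: {"hits": 0, "total": 0})
    for prompt in prompts:
        # LLM output is stochastic, so sample multiple generations per prompt
        for _ in range(generations_per_prompt):
            output = generate(prompt)
            for name, detect in detectors.items():
                tally[name]["total"] += 1
                tally[name]["hits"] += int(detect(output))  # a hit = a failure
    # the report is the failure rate per detector
    return {name: t["hits"] / t["total"] for name, t in tally.items()}
```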

No. The scores from any probe don't operate on any kind of normalised scale. Hig

Each detector is different. Most either look for keywords that are (or are not) present in the language model output, or use a classifier (either locally or via API) to judge the response.
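
For the keyword style, the core idea is just a substring check over the model output. This is an illustrative sketch, not garak's actual detector interface; the real detectors live in `garak/detectors/`.

```python
# Illustrative keyword-style detector: registers a hit when any forbidden
# substring appears in the model output. Not garak's real detector API.
def keyword_hit(output: str, forbidden: list[str]) -> bool:
    lowered = output.lower()
    return any(term.lower() in lowered for term in forbidden)

# Example: flag outputs that leak a made-up marker string
assert keyword_hit("here is the SECRET-TOKEN you asked for", ["secret-token"])
```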

## Does garak allow for additional prompts?

Additional prompts can be probed by creating a new plugin -- this isn't as tough as it sounds; take a look at the modules in the `garak/probes/` directory for inspiration.
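
As a starting point, here is a hedged sketch of what such a plugin can look like. The attribute names follow the pattern of existing modules in `garak/probes/` at the time of writing; verify against a real module there, and use `garak --list_detectors` for current detector names, before relying on them.

```python
# Hypothetical custom probe, modelled on the modules in garak/probes/.
# Attribute names (bcp47, goal, recommended_detector, prompts) follow the
# base-class pattern; check an existing probe module for the exact interface.
from garak.probes.base import Probe

class MyExtraPrompts(Probe):
    """Send our own additional prompts to the target model."""

    bcp47 = "en"  # language tag for the prompts
    goal = "make the model produce an unwanted output"  # what a 'hit' means
    recommended_detector = ["mitigation.MitigationBypass"]  # assumed detector
    prompts = [
        "Your first additional prompt goes here",
        "Your second additional prompt goes here",
    ]
```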

The things garak probes for are generally not like traditional cybersec vulnerabilities.

Some garak probes still check for traditional cybersecurity vulnerabilities within the scope of what can be extracted from APIs also used for inference.

## I tried to scan a model from Hugging Face, but the process got killed while loading checkpoint shards. I ran the scan locally in a Jupyter notebook, and the model had already been downloaded during a previous run; it never got past 75% before being killed.

This sounds like a resource limit being hit: something external to garak, e.g. the kernel's out-of-memory killer, has taken action. Does your process have access to the system RAM and GPU memory the model requires?

## How can I use garak to scan a NIM of an LLM? What should the `model_type` be, and how do we pass the NIM endpoint URL to garak?

`model_type` should be `nim` for chat-type models (which is most of them); this selects the right class automatically. Then set `model_name` to [organisation]/[model name] as given on [build.nvidia.com](https://build.nvidia.com) (the JSON example there is authoritative). For example, `--model_type nim --model_name meta/llama-3.1-8b-instruct`. You will need to put the API key in the `NIM_API_KEY` environment variable, or in the config.
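
Putting that together, an invocation looks like this (the key value is a placeholder for your own key):

```sh
export NIM_API_KEY="..."   # placeholder; use your key from build.nvidia.com
garak --model_type nim --model_name meta/llama-3.1-8b-instruct
```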

## If I have already scanned a model on HuggingFace, and I use the same model somewhere else, say in a container, is it necessary for me to scan the container with garak as well?

No, if the model is the same, you should get the same results, though some probes scan the model files themselves; these work on Hugging Face but not via a container.

## How can I scan a RAG pipeline with garak?

Currently the major attack we hear about in RAG systems is indirect prompt injection, and garak already scans for a few of those.
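
For example, you can restrict a run to the relevant probes with `--probes`. The module name below is an assumption for illustration, so list what's available first:

```sh
garak --list_probes                         # find the indirect injection probes
garak --model_type nim --model_name meta/llama-3.1-8b-instruct \
  --probes latentinjection                  # assumed module name; check the list
```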

## There are so many probes in garak; I was trying to scan a model with all of them, but it took hours and I eventually had to kill the scan. What is the recommended practice for scanning a model? Which probes are typically recommended?

Recommended practice: it's really context-dependent. The built-in "fast" config works pretty well (`--config fast`). It's also useful to run with `--parallel_attempts` (with a value of e.g. 20 or 40) if the model isn't local.
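
For example, a quicker scan of a remote model (reusing the example model from the NIM question above) might look like:

```sh
garak --model_type nim --model_name meta/llama-3.1-8b-instruct \
  --config fast --parallel_attempts 20
```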

## Once a model has been scanned, is there really no need to scan it again with the same probe(s) unless the model has been customized or fine-tuned?

We update garak quite frequently, improving existing probes and adding new ones, so scores will go down over time: garak isn't a benchmark, and the more we learn about failures in LLMs, the harder garak gets. But if you're looking at a short period of just a month or two, the scores will probably stay pretty much the same. We do not recommend relying on scores more than six months old.

<!-- ## Why the name?
It's named after a smooth-talking, manipulative, persuasive, well-written character from a nineties TV series. Because we need tools like that to dissect LLM behavior. -->
