generator: NIM #637
Conversation
See https://build.nvidia.com/explore/reasoning for items to test on
Signed-off-by: Leon Derczynski <[email protected]>
With #645 landed, we might refactor this initial support as an OpenAICompatible implementation and later extend it to add a NemoLLMCompatible implementation.
There is a little more work needed beyond my comments here to let context_len fall back to None for models unknown to the tool, since OpenAICompatible currently just rejects any model without a known context length in the garak.generators.openai.context_lengths dictionary.
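A minimal sketch of that fallback, assuming the lookup happens against the context_lengths dictionary at generator load time; the helper below is illustrative, not garak's actual code:
# Illustrative sketch only, not the actual garak implementation.
from garak.generators.openai import context_lengths

def resolve_context_len(model_name: str):
    # Unknown models fall back to None instead of being rejected outright;
    # callers can then skip any context-length-dependent checks.
    return context_lengths.get(model_name, None)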
Co-authored-by: Jeffrey Martin <[email protected]> Signed-off-by: Leon Derczynski <[email protected]>
… tests to catch handling behaviour
Co-authored-by: Jeffrey Martin <[email protected]> Signed-off-by: Leon Derczynski <[email protected]>
Updates look good; some nice-to-have items noted.
@pytest.mark.skipif(
    os.getenv(NVOpenAIChat.ENV_VAR, None) is None,
    reason=f"NIM API key is not set in {NVOpenAIChat.ENV_VAR}",
)
In a future update I would like to see these tests converted to substitute a mock response fixture when the ENV_VAR is not set.
Consider this test: it could assert that the correct type is created by injecting a mock key, similar to the test in test_ggml.py.
Integration-level tests that actually call out to a service could provide a mock key when none was in the env at launch and intercept the request call via requests-mock in this class, allowing the code to still validate changes even when a true endpoint is not available.
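A hedged sketch of what that could look like, using the requests-mock pytest plugin already present in the test environment; the import path, model name, endpoint URL, and response shape below are assumptions for illustration, not the project's actual tests:
# Sketch only: names marked as placeholders are assumptions.
import os
import pytest
from garak.generators.nim import NVOpenAIChat  # import path assumed

@pytest.fixture
def nim_api_key(monkeypatch):
    # Inject a fake key when no real one is set, so constructor/type checks
    # can run without network access (mirrors the approach in test_ggml.py).
    if os.getenv(NVOpenAIChat.ENV_VAR, None) is None:
        monkeypatch.setenv(NVOpenAIChat.ENV_VAR, "fake-api-key")

def test_nim_type(nim_api_key):
    gen = NVOpenAIChat(name="example/model")  # model name is a placeholder
    assert isinstance(gen, NVOpenAIChat)

def test_nim_generate_intercepted(nim_api_key, requests_mock):
    # requests-mock only intercepts calls made through `requests`; if the
    # generator goes through another HTTP client, a different mock layer
    # would be needed instead.
    requests_mock.post(
        "https://integrate.api.nvidia.com/v1/chat/completions",  # placeholder
        json={"choices": [{"message": {"content": "mocked reply"}}]},
    )
    gen = NVOpenAIChat(name="example/model")
    assert gen.generate("mock prompt")  # expects the mocked reply back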
That's a great idea, yes. I would also like to see this play better with generator-internal api_keys, after the generator has validated / loaded its config, rather than going straight for the environment variable or _config. Tagging #589 as a note that this can be considered & addressed when handling that issue.
garak/generators/test.py
Not something to address in this PR, but I notice this file is used for testing; I wonder if it would be possible to move it into the tests path as a fixture that is only loaded during testing.
Is there added value in having it available all the time?
I was thinking the same. No, there's no added value - the primary purpose of the plugin package test modules is to help integrators and users, not to ensure code integrity.
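A rough sketch of what "only loaded during testing" could look like if the module were relocated under tests/; the paths and class name here are hypothetical, and a real refactor would still need to satisfy garak's plugin loader:
# tests/conftest.py - hypothetical sketch; paths and names are illustrative.
import importlib.util
import pathlib
import pytest

@pytest.fixture
def blank_test_generator():
    # Load a test-only generator module shipped alongside the tests rather
    # than from the installed garak.generators package.
    module_path = pathlib.Path(__file__).parent / "fixtures" / "blank_generator.py"
    spec = importlib.util.spec_from_file_location("blank_generator", module_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module.Blank()  # hypothetical class in the relocated module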
Co-authored-by: Jeffrey Martin <[email protected]> Signed-off-by: Leon Derczynski <[email protected]>
…rators not supporting multiple generations gave a double-wrapped list; add checks
Functional testing passes; however, when executing the tests with a valid NIM_API_KEY in place, tests/test_attempt.py reports an attempt to write to a file that has already been closed.
======================================================================== FAILURES =========================================================================
_______________________________________________________________ test_attempt_sticky_params ________________________________________________________________
capsys = <_pytest.capture.CaptureFixture object at 0x33939e570>
def test_attempt_sticky_params(capsys):
> garak.cli.main(
"-m test.Blank -g 1 -p atkgen,dan.Dan_6_0 --report_prefix _garak_test_attempt_sticky_params".split()
)
tests/test_attempt.py:13:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
garak/cli.py:486: in main
command.probewise_run(generator, probe_names, evaluator, buff_names)
garak/command.py:212: in probewise_run
probewise_h.run(generator, probe_names, evaluator, buffs)
garak/harnesses/probewise.py:106: in run
h.run(model, [probe], detectors, evaluator, announce_probe=False)
garak/harnesses/base.py:124: in run
evaluator.evaluate(attempt_results)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <garak.evaluators.base.ThresholdEvaluator object at 0x3389ad370>, attempts = [<garak.attempt.Attempt object at 0x3394dbad0>]
def evaluate(self, attempts: List[garak.attempt.Attempt]) -> None:
"""
evaluate feedback from detectors
expects a list of attempts that correspond to one probe
outputs results once per detector
"""
if len(attempts) == 0:
logging.debug(
"evaluators.base.Evaluator.evaluate called with 0 attempts, expected 1+"
)
return
self.probename = attempts[0].probe_classname
detector_names = attempts[0].detector_results.keys()
for detector in detector_names:
all_passes = []
all_outputs = []
for attempt in attempts:
passes = [
1 if self.test(r) else 0
for r in map(float, attempt.detector_results[detector])
]
all_passes += passes
all_outputs += attempt.outputs
for idx, score in enumerate(attempt.detector_results[detector]):
if not self.test(score): # if we don't pass
if not _config.transient.hitlogfile:
if not _config.reporting.report_prefix:
hitlog_filename = f"{_config.reporting.report_dir}/garak.{_config.transient.run_id}.hitlog.jsonl"
else:
hitlog_filename = (
_config.reporting.report_prefix + ".hitlog.jsonl"
)
logging.info("hit log in %s", hitlog_filename)
_config.transient.hitlogfile = open(
hitlog_filename,
"w",
buffering=1,
encoding="utf-8",
)
trigger = None
if "trigger" in attempt.notes:
trigger = attempt.notes["trigger"]
elif "triggers" in attempt.notes:
if (
isinstance(attempt.notes["triggers"], list)
and len(attempt.notes["triggers"]) == 1
): # a list of one can be reported just as a string
trigger = attempt.notes["triggers"][0]
else:
trigger = attempt.notes["triggers"]
> _config.transient.hitlogfile.write(
json.dumps(
{
"goal": attempt.goal,
"prompt": attempt.prompt,
"output": attempt.outputs[idx],
"trigger": trigger,
"score": score,
"run_id": str(_config.transient.run_id),
"attempt_id": str(attempt.uuid),
"attempt_seq": attempt.seq,
"attempt_idx": idx,
"generator": f"{_config.plugins.model_type} {_config.plugins.model_name}",
"probe": self.probename,
"detector": detector,
"generations_per_prompt": _config.run.generations,
}
)
+ "\n" # generator,probe,prompt,trigger,result,detector,score,run id,attemptid,
)
E ValueError: I/O operation on closed file.
garak/evaluators/base.py:91: ValueError
------------------------------------------------------------------ Captured stdout call -------------------------------------------------------------------
garak LLM security probe v0.9.0.12.post1 ( https://github.com/leondz/garak ) at 2024-05-07T15:06:41.549252
📜 reporting to _garak_test_attempt_sticky_params.report.jsonl
🦜 loading generator: Test: Blank
🕵️ queue of probes: atkgen.Tox, dan.Dan_6_0
🔴🪖 🦜 loading generator: Hugging Face 🤗 pipeline: leondz/artgpt2tox
atkgen.Tox toxicity.ToxicCommentModel: PASS ok on 2/ 2
dan.Dan_6_0 dan.DAN: PASS ok on 1/ 1
------------------------------------------------------------------ Captured stderr call -------------------------------------------------------------------
================================================================= short test summary info =================================================================
FAILED tests/test_attempt.py::test_attempt_sticky_params - ValueError: I/O operation on closed file.
================================================== 1 failed, 908 passed, 5 skipped in 191.22s (0:03:11) ===================================================
This consistently fails in local testing on my end. However, executing the single test file in isolation passes:
% python -m pytest tests/test_attempt.py
=================================================================== test session starts ===================================================================
platform darwin -- Python 3.12.2, pytest-8.1.1, pluggy-1.4.0
rootdir: /Users/jemartin/Projects/nvidia/garak
configfile: pyproject.toml
plugins: anyio-4.3.0, requests-mock-1.12.1
collected 1 item
tests/test_attempt.py . [100%]
==================================================================== 1 passed in 4.77s ====================================================================
And it fails in the same manner when the skipped nim tests execute first:
python -m pytest tests/generators/test_nim.py tests/test_attempt.py
This suggests there is some sort of data leakage or stale state in global config introduced by the skipped tests.
It seems reasonable to file a known issue with the tests when NIM_API_KEY is in the env, to let this move forward. Other efforts are planned to adjust _config scoping that may address this issue in #646.
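One possible shape for a guard in the meantime, as an untested sketch: an autouse fixture that closes and clears the transient hitlog handle between tests, relying only on the _config.transient.hitlogfile attribute visible in the traceback above:
# tests/conftest.py - untested sketch of a state-isolation workaround.
import pytest
from garak import _config

@pytest.fixture(autouse=True)
def reset_transient_hitlog():
    # Make sure a hitlog handle opened (and later closed) by one test's run
    # is not reused by the next, avoiding "I/O operation on closed file".
    yield
    hitlog = getattr(_config.transient, "hitlogfile", None)
    if hitlog is not None:
        try:
            hitlog.close()
        except Exception:
            pass
        _config.transient.hitlogfile = None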
Interesting. Thanks for writing this up. Yes, agree with your hypothesis. The failure may be related to these - tagging for reference:
If we're lucky, addressing #646 may clear up a bunch of these.
Now getting the exact same test failure, also in
Add NIM generator support. This PR only covers NVIDIA-hosted NIMs.