
generator: NIM #637

Merged
leondz merged 34 commits into main from feature/nim on May 8, 2024

Conversation

@leondz (Collaborator) commented Apr 29, 2024

Add NIM generator support. This PR only covers NVIDIA-hosted NIMs

@leondz leondz added the generators (Interfaces with LLMs) label Apr 29, 2024
@leondz leondz linked an issue Apr 29, 2024 that may be closed by this pull request
@leondz (Collaborator, Author) commented Apr 29, 2024

See https://build.nvidia.com/explore/reasoning for items to test on
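As a reference point, a smoke-test invocation of the new generator might look like the sketch below. This is illustrative only: the model name is a placeholder, the nim model_type string and probe choice are assumptions, and the pattern of driving garak via garak.cli.main mirrors the test suite rather than documented usage.

# Hypothetical smoke test of the NIM generator against an NVIDIA-hosted model.
# NIM_API_KEY must be exported first; the model name is a placeholder.
import os

import garak.cli

assert os.getenv("NIM_API_KEY"), "set NIM_API_KEY before running"

garak.cli.main(
    "--model_type nim --model_name <nvidia-hosted-model> --probes lmrc.Profanity".split()
)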

(two resolved review threads on garak/generators/nim.py)
@leondz leondz changed the title from Feature/nim to generator: NIM Apr 30, 2024
@leondz leondz added the new plugin (Describes an entirely new probe, detector, generator or harness) label Apr 30, 2024
@leondz leondz requested a review from jmartin-tech May 2, 2024 10:41
@jmartin-tech (Collaborator) left a comment

With #645 landed, we might refactor this initial support as OpenAICompatible and later extend it to add a NemoLLMCompatible implementation.

There is a little more work needed beyond my comments here: context_len should fall back to None for models unknown to the tool, since OpenAICompatible currently rejects any model without a known context length in the garak.generators.openai.context_lengths dictionary.

See: https://github.com/leondz/garak/blob/075796f044c2877977383dcf561b5a54f865c009/garak/generators/openai.py#L135-L138
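A rough sketch of that fallback behaviour, for illustration only (the helper name is invented here, and the merged code may structure this differently):

from garak.generators.openai import context_lengths

def resolve_context_len(model_name: str) -> int | None:
    # Unknown models fall back to None rather than being rejected outright;
    # callers should skip context-length checks when this returns None.
    return context_lengths.get(model_name, None)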

(six review threads on garak/generators/nim.py, resolved)
@leondz leondz marked this pull request as draft May 6, 2024 15:45
Co-authored-by: Jeffrey Martin <[email protected]>
Signed-off-by: Leon Derczynski <[email protected]>
@jmartin-tech (Collaborator) left a comment

Updates look good; some nice-to-have items noted.

(resolved review thread on garak/generators/openai.py)
Comment on lines +11 to +14
@pytest.mark.skipif(
os.getenv(NVOpenAIChat.ENV_VAR, None) is None,
reason=f"NIM API key is not set in {NVOpenAIChat.ENV_VAR}",
)
Collaborator

In a future update I would like to see these tests converted to substitute a mock response fixture when the ENV_VAR is not set.

Consider: these tests could assert that the correct type is created by injecting a mock key, similar to the test in test_ggml.py:

https://github.com/leondz/garak/blob/77ebd8ab242b07dd9ec0d551154933831dbb8092/tests/generators/test_ggml.py#L12-L28

Integration-level tests that actually call out to a service could provide a mock when no key is present in the environment at launch, intercepting the request call via requests-mock in this class; that would still allow the code to validate changes even when a true endpoint is not available.
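A sketch of the suggested shape, not code in this PR (the fixture name, fake key, and constructor arguments are illustrative):

import os

import pytest

from garak.generators.nim import NVOpenAIChat

@pytest.fixture
def nim_api_key(monkeypatch):
    # Inject a fake key when no real one is present, so construction-level
    # tests still run; real HTTP calls would additionally be intercepted
    # (e.g. via the requests-mock plugin) rather than hitting the service.
    if os.getenv(NVOpenAIChat.ENV_VAR, None) is None:
        monkeypatch.setenv(NVOpenAIChat.ENV_VAR, "fake-api-key")

def test_nim_instantiation(nim_api_key):
    generator = NVOpenAIChat(name="placeholder-model")  # placeholder model name
    assert isinstance(generator, NVOpenAIChat)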

@leondz (Collaborator, Author)

That's a great idea, yes. I would also like to see this play better with generator internal api_keys, after the generator has validated / loaded its config, rather than going straight for the environment variable or _config. Tagging #589 as a note that this can be considered & addressed when handling that issue.

(resolved review threads on tests/generators/test_openai.py and garak/generators/base.py)
Collaborator

Not something to address in this PR; however, I notice this file is used for testing. I wonder if it is possible to move it into the tests path as a fixture loaded only during testing.

Is there added value in having this available all the time?

@leondz (Collaborator, Author) commented May 7, 2024

I was thinking the same. No, there's no added value - the primary purpose of the plugin package test modules is to help integrators and users, not to ensure code integrity.

(resolved review thread on garak/generators/nim.py)
@leondz leondz marked this pull request as ready for review May 7, 2024 11:46
@leondz leondz marked this pull request as draft May 7, 2024 11:51
@leondz leondz marked this pull request as ready for review May 7, 2024 13:12
@leondz leondz requested a review from jmartin-tech May 7, 2024 13:12
@jmartin-tech (Collaborator) left a comment

Functional testing passes; however, when executing the tests with a valid NIM_API_KEY in place, tests/test_attempt.py reports an attempt to write to a file that is no longer open.

======================================================================== FAILURES =========================================================================
_______________________________________________________________ test_attempt_sticky_params ________________________________________________________________

capsys = <_pytest.capture.CaptureFixture object at 0x33939e570>

    def test_attempt_sticky_params(capsys):
>       garak.cli.main(
            "-m test.Blank -g 1 -p atkgen,dan.Dan_6_0 --report_prefix _garak_test_attempt_sticky_params".split()
        )

tests/test_attempt.py:13:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
garak/cli.py:486: in main
    command.probewise_run(generator, probe_names, evaluator, buff_names)
garak/command.py:212: in probewise_run
    probewise_h.run(generator, probe_names, evaluator, buffs)
garak/harnesses/probewise.py:106: in run
    h.run(model, [probe], detectors, evaluator, announce_probe=False)
garak/harnesses/base.py:124: in run
    evaluator.evaluate(attempt_results)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <garak.evaluators.base.ThresholdEvaluator object at 0x3389ad370>, attempts = [<garak.attempt.Attempt object at 0x3394dbad0>]

    def evaluate(self, attempts: List[garak.attempt.Attempt]) -> None:
        """
        evaluate feedback from detectors
        expects a list of attempts that correspond to one probe
        outputs results once per detector
        """

        if len(attempts) == 0:
            logging.debug(
                "evaluators.base.Evaluator.evaluate called with 0 attempts, expected 1+"
            )
            return

        self.probename = attempts[0].probe_classname
        detector_names = attempts[0].detector_results.keys()

        for detector in detector_names:
            all_passes = []
            all_outputs = []
            for attempt in attempts:
                passes = [
                    1 if self.test(r) else 0
                    for r in map(float, attempt.detector_results[detector])
                ]
                all_passes += passes
                all_outputs += attempt.outputs
                for idx, score in enumerate(attempt.detector_results[detector]):
                    if not self.test(score):  # if we don't pass
                        if not _config.transient.hitlogfile:
                            if not _config.reporting.report_prefix:
                                hitlog_filename = f"{_config.reporting.report_dir}/garak.{_config.transient.run_id}.hitlog.jsonl"
                            else:
                                hitlog_filename = (
                                    _config.reporting.report_prefix + ".hitlog.jsonl"
                                )
                            logging.info("hit log in %s", hitlog_filename)
                            _config.transient.hitlogfile = open(
                                hitlog_filename,
                                "w",
                                buffering=1,
                                encoding="utf-8",
                            )

                        trigger = None
                        if "trigger" in attempt.notes:
                            trigger = attempt.notes["trigger"]
                        elif "triggers" in attempt.notes:
                            if (
                                isinstance(attempt.notes["triggers"], list)
                                and len(attempt.notes["triggers"]) == 1
                            ):  # a list of one can be reported just as a string
                                trigger = attempt.notes["triggers"][0]
                            else:
                                trigger = attempt.notes["triggers"]
>                       _config.transient.hitlogfile.write(
                            json.dumps(
                                {
                                    "goal": attempt.goal,
                                    "prompt": attempt.prompt,
                                    "output": attempt.outputs[idx],
                                    "trigger": trigger,
                                    "score": score,
                                    "run_id": str(_config.transient.run_id),
                                    "attempt_id": str(attempt.uuid),
                                    "attempt_seq": attempt.seq,
                                    "attempt_idx": idx,
                                    "generator": f"{_config.plugins.model_type} {_config.plugins.model_name}",
                                    "probe": self.probename,
                                    "detector": detector,
                                    "generations_per_prompt": _config.run.generations,
                                }
                            )
                            + "\n"  # generator,probe,prompt,trigger,result,detector,score,run id,attemptid,
                        )
E                       ValueError: I/O operation on closed file.

garak/evaluators/base.py:91: ValueError
------------------------------------------------------------------ Captured stdout call -------------------------------------------------------------------
garak LLM security probe v0.9.0.12.post1 ( https://github.com/leondz/garak ) at 2024-05-07T15:06:41.549252
📜 reporting to _garak_test_attempt_sticky_params.report.jsonl
🦜 loading generator: Test: Blank
🕵️  queue of probes: atkgen.Tox, dan.Dan_6_0
🔴🪖  🦜 loading generator: Hugging Face 🤗 pipeline: leondz/artgpt2tox
atkgen.Tox                                                                toxicity.ToxicCommentModel: PASS  ok on    2/   2
dan.Dan_6_0                                                                                  dan.DAN: PASS  ok on    1/   1
------------------------------------------------------------------ Captured stderr call -------------------------------------------------------------------

================================================================= short test summary info =================================================================
FAILED tests/test_attempt.py::test_attempt_sticky_params - ValueError: I/O operation on closed file.
================================================== 1 failed, 908 passed, 5 skipped in 191.22s (0:03:11) ===================================================

This consistently fails in local testing on my end. However, executing the single test file in isolation passes:

% python -m pytest tests/test_attempt.py
=================================================================== test session starts ===================================================================
platform darwin -- Python 3.12.2, pytest-8.1.1, pluggy-1.4.0
rootdir: /Users/jemartin/Projects/nvidia/garak
configfile: pyproject.toml
plugins: anyio-4.3.0, requests-mock-1.12.1
collected 1 item

tests/test_attempt.py .                                                                                                                             [100%]

==================================================================== 1 passed in 4.77s ====================================================================

And it fails in the same manner when the skipped nim tests execute first:

python -m pytest tests/generators/test_nim.py tests/test_attempt.py

This suggests there is some sort of data leakage or stale state in global config introduced by the skipped tests.

It seems reasonable to file a known issue for tests run with NIM_API_KEY in the env, to let this move forward. Other efforts planned in #646 to adjust _config scoping may address this issue.
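One possible mitigation along those lines, sketched under the assumption that the stale handle is _config.transient.hitlogfile as the traceback suggests, would be an autouse test fixture that clears it between tests:

import pytest

from garak import _config

@pytest.fixture(autouse=True)
def reset_transient_hitlog():
    yield
    # Drop any stale handle so a later run opens a fresh hitlog instead of
    # writing to a file object closed by an earlier test.
    stale = getattr(_config.transient, "hitlogfile", None)
    if stale is not None:
        try:
            stale.close()
        except Exception:
            pass
        _config.transient.hitlogfile = None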

@leondz (Collaborator, Author) commented May 8, 2024

Interesting. Thanks for writing this up. Yes, agree with your hypothesis.

The failure may be related to these - tagging for reference:

If we're lucky, addressing #646 may clear up a bunch of these.

@leondz leondz closed this May 8, 2024
@github-actions github-actions bot locked and limited conversation to collaborators May 8, 2024
@leondz leondz reopened this May 8, 2024
@leondz leondz merged commit 1eb3602 into main May 8, 2024
13 checks passed
@leondz leondz deleted the feature/nim branch May 8, 2024 06:03
@leondz (Collaborator, Author) commented May 8, 2024

Now getting the exact same test failure, also in test_attempt_sticky_params. Will prioritise fixing this for the release milestone.

@leondz leondz restored the feature/nim branch May 8, 2024 09:46
@leondz leondz deleted the feature/nim branch May 8, 2024 09:46
Labels: generators (Interfaces with LLMs), new plugin (Describes an entirely new probe, detector, generator or harness)

Linked issue that may be closed by this pull request: generator: add NIM support

3 participants