
Refactor huggingface config support #742

Merged

Conversation

jmartin-tech
Collaborator

@jmartin-tech jmartin-tech commented Jun 17, 2024

Fix #672
Fix #589

Provide huggingface Pipeline and LLaVA configuration values as hf_args that can be passed via the yaml or json generator configuration paths.

  • do not override config deprefix_prompt
  • improve code reuse
    • consolidate __init__ where possible
    • shift generator or model object creation to _load_client()
    • Wrap errors as GarakException vs Exception
  • crude implementation of a limitation on parallel generator calls
  • add torch mps support
    • detect cuda vs mps vs cpu in a common way
    • guard import of OptimumPipeline
  • enable hf model or pipeline config in hf_args
    • support all generic pipeline args at all times
    • adds do_sample when model is a parameter to the Callable
    • adds low_cpu_mem_usage and all pipeline arguments for Callables without model
    • consolidates optimal device selection & sets the device when not provided by config

Note: While huggingface.Model now extends Pipeline, due to how the model is loaded the only hf_arg currently passed is hf_args["device"]; more work is needed to support more.

Testing of each class type is needed and should likely be baselined prior to #711. A few recommended testing configurations to try on various hardware:

python -m garak -m huggingface.Model -n meta-llama/Llama-2-7b-chat-hf --config fast -g 1
python -m garak --model_type huggingface --model_name gpt2 --config fast -g 1
python -m garak --model_type huggingface.LLaVA --model_name llava-hf/llava-v1.6-mistral-7b-hf -p visual_jailbreak.FigStepTiny

Example configuration yaml:

plugins:
  generators:
    huggingface:
      hf_args:
        torch_dtype: float16
      Pipeline:
        hf_args:
          device: cuda
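
This YAML can then be exercised from the CLI; a hedged example (the config file name is hypothetical, and this assumes --config accepts a file path in addition to named configs like fast):

# config file name here is hypothetical
python -m garak --config hf_pipeline.yaml --model_type huggingface --model_name gpt2 -g 1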

When no device is specified by configuration, garak will now select cuda / mps / cpu depending on environment availability.
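
For illustration, a minimal sketch of that kind of selection logic — this is not the PR's `_select_hf_device()` implementation; the integer-index-means-cuda and negative-index-rejection behaviours are taken from this thread, everything else is an assumption:

import torch

def select_hf_device(requested=None):
    # honor an explicit request first; an integer-only selection is assumed to mean a cuda device index
    if requested is not None:
        if isinstance(requested, int):
            if requested < 0:
                raise ValueError("device index must be non-negative")
            return f"cuda:{requested}"
        return requested  # e.g. "cuda", "cuda:1", "mps", "cpu"
    # otherwise pick the most capable backend available
    if torch.cuda.is_available():
        return "cuda"
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"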

* consolidate `__init__` where possible
* shift generator or model object creation to `_load_client()`

Signed-off-by: Jeffrey Martin <[email protected]>
* detect cuda vs mps vs cpu in a common way
* guard import of OptimumPipeline

Signed-off-by: Jeffrey Martin <[email protected]>
* support all generic `pipeline` args at all times
* adds `do_sample` when `model` is a parameter to the `Callable`
* adds `low_cpu_mem_usage` and all `pipeline` args for `Callables` without `model`
* consolidates optimal device selection & set when not provided by config

Signed-off-by: Jeffrey Martin <[email protected]>
@jmartin-tech jmartin-tech force-pushed the refactor/huggingface-config-support branch from 667fcc9 to 1fca119 Compare June 18, 2024 14:16
@jmartin-tech jmartin-tech force-pushed the refactor/huggingface-config-support branch from c14fecd to f4d77b6 Compare June 18, 2024 14:48
Collaborator

@erickgalinkin erickgalinkin left a comment

Looks good overall -- left a few comments throughout. Great work!

@@ -50,53 +52,134 @@ def _set_hf_context_len(self, config):
        if isinstance(config.n_ctx, int):
            self.context_len = config.n_ctx

    def _gather_hf_params(self, hf_constructor: Callable):
        # this may be a bit too naive as it will pass any parameter valid for the pipeline signature
        # this falls over when passed `from_pretrained` methods as the callable model params are not explicit
Collaborator

Is it worth catching this and gathering those params differently?

Collaborator Author

This is mitigated by adding Pipeline params when model is not a parameter of the callable. Future iterations may suggest adding a supported set of params, so I left the notes here to explain why the code adds more parameters in some cases.
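
To make the trade-off concrete, a rough standalone sketch of the gathering idea — this is not the PR's `_gather_hf_params`; the signature filtering and the do_sample / low_cpu_mem_usage defaults follow the description above, the rest is simplified:

import inspect

def gather_hf_params(hf_constructor, hf_args):
    # naive filter: keep only hf_args named in the callable's signature
    params = inspect.signature(hf_constructor).parameters
    kwargs = {k: v for k, v in hf_args.items() if k in params}
    if "model" in params:
        # pipeline-style callables take a `model` argument; enable sampling by default
        kwargs.setdefault("do_sample", True)
    else:
        # `from_pretrained`-style callables hide their params behind **kwargs, so the
        # filter above misses them; pass pipeline-style defaults through as well
        kwargs.setdefault("low_cpu_mem_usage", True)
    return kwargs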

Signed-off-by: Jeffrey Martin <[email protected]>
Owner

@leondz leondz left a comment

thank you. looking good. bunch of questions & clarification requests, mostly about making me smarter

@@ -27,6 +27,7 @@ class Generator(Configurable):

    active = True
    generator_family_name = None
    parallel_capable = True
Owner

tagging #361

selected_device = None
if self.hf_args["device"] is not None:
    if isinstance(self.hf_args["device"], int):
        # this assumes that indexed only devices selections means `cuda`
Owner

tagging #671, #672

Collaborator

I believe this largely resolves #672.
#671 will require testing on an actual device, but specification of device="cuda" should be all that's necessary for ROCm with HF.

Comment on lines -8 to +22
g = garak.generators.huggingface.Pipeline("gpt2")
gen_config = {
    "huggingface": {
        "Pipeline": {
            "name": "gpt2",
            "hf_args": {
                "device": "cpu",
            },
        }
    }
}
config_root = GarakSubConfig()
setattr(config_root, "generators", gen_config)

g = garak.generators.huggingface.Pipeline("gpt2", config_root=config_root)
Owner

side note: this is a working but i guess relatively inelegant pattern for getting a generator up.

  1. What happens if the config setting is skipped?
  2. Is the name parameter in Pipeline's constructor redundant? How does it relate to the name in gen_config - which takes precedence?
  3. why device cpu and not auto?

Collaborator Author

@jmartin-tech jmartin-tech Jun 19, 2024

Good questions:

  1. If config is skipped the new auto detection will load the most efficient device available. On macOS this turns out to be mps and actually causes this class to fail construction due to incomplete support for MPS in torch at this time.
  2. name duplication is redundant; currently the name from the constructor takes precedence over the name in config_root, however either could be removed for this test.
  3. See answer to 1, cpu is a guaranteed fully supported value.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, that's good to know.

  1. I think we probably don't want to ship logic that breaks generators by default on macOS - did I read this right?
  2. Sounds like there is some tidying up to do. I would love for the default config to run reasonably well out of the box, especially with Pipeline+gpt2, which is used in so many examples because it has a high chance of fitting in memory and going quickly on any machine. Requiring a setattr() gives an impression of "hotwiring". I'll think on this more - I haven't had time to play with the #711 (Feature: configurable plugins) config updates in much depth.
  3. OK, that's cool :)

Collaborator Author

  1. mps does not break on macOS by default, as gpt2 does work; I have tested that it will load and complete a fast config. However, this test needs to be consistent for all runs, so forcing cpu was a reasonable solution. This test in particular results in the error below when asking for generation on "" on mps; the error seemed like a clear message to the user, since overriding the device is now a straightforward config setting.
ERROR    root:huggingface.py:203 Can't infer missing attention mask on `mps` device. Please provide an `attention_mask` or use a different device.
  2. The setattr() here could be replaced with a simple dictionary, as both are supported by config; I just followed more closely what the cli will generate, as that is the more specific object type.
config_root = {
    "generators": {
        "huggingface": {
            "Pipeline": {
                "name": "gpt2",
                "hf_args": {
                    "device": "cpu",
                },
            }
        }
    }
}
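
For completeness, the dict form would be consumed the same way as in the test diff above (a hedged one-liner; the constructor call is the one already shown earlier):

g = garak.generators.huggingface.Pipeline("gpt2", config_root=config_root)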

Collaborator

Generations on empty strings frequently throw errors. Do we have a test for this? (I could easily verify this myself)

Owner

We don't test all generators for empty string behavior by default because they each have their own unique special snowflake style (making it a schlep) and often require an API key (making it an expensive pain). I tried to write this one day and discovered it was slightly less feasible than imagined - the new config setup might make it easier now. There are a few tests in place, but it's incomplete.
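
A hedged sketch of what such a check might look like for this one generator — the dict-based config_root follows the pattern above; that generate("") returns a list is my assumption, not something confirmed in this thread:

import garak.generators.huggingface

def test_pipeline_empty_prompt():
    config_root = {
        "generators": {
            "huggingface": {
                "Pipeline": {
                    "name": "gpt2",
                    "hf_args": {"device": "cpu"},
                }
            }
        }
    }
    g = garak.generators.huggingface.Pipeline("gpt2", config_root=config_root)
    output = g.generate("")  # empty prompts are where several generators fall over
    assert isinstance(output, list)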

Comment on lines -38 to +42
gpu: 1
hf_args:
  torch_dtype: float16
  device: cuda:1
Owner

how come gpu: 1 changed to cuda:1 here...

Collaborator Author

These are simply example formats being documented in the test file; they are not values actually consumed by the tests.

Owner

Can we increase consistency to make the example more straightforward?

Collaborator Author

@jmartin-tech jmartin-tech Jun 19, 2024

I would suggest the goal of the example is to show what you might configure differently depending on context.

The example here would define a model under test that is a huggingface.Pipeline free to use any cuda device, while a huggingface generator created for use by the test.Blank detector would specify cuda:1 as its device, and a huggingface generator created for use by the test.Blank buff would use device cuda:0. This would mean the model under test can use any available device while the detector and buff would each use a specific, different GPU device.

I think a more concrete or viable example might be on the horizon. This level of detail in config is likely rare for users to need, and @erickgalinkin may be the only actual consumer at this level for the near term. 😄

Collaborator

This feels right -- I suspect the overwhelming majority of users will not need to configure things like this, but it's good to have. I'll likely need to refactor parts of AutoDAN and/or TAP as a result of this PR being merged and it's plausible that we'll need this level of granularity in configs, particularly for folks using resource-constrained systems.

Comment on lines -46 to +51
gpu: 1
hf_args:
  device: cuda:0
Owner

...and gpu: 1 changed to cuda:0 here?

Collaborator Author

gpu was a theoretical value; device aligns with actual huggingface expectations.

Owner

got it. demonstrating variation is good. otoh I think the example could be confusing with gpu: 1 referring to both cuda: 0 and cuda: 1 within sight of each other.

Collaborator Author

The final result of the edits is removal of all gpu: entries; only huggingface examples using the device key are provided.

* raise error when passed negative device integer
* rename parameter tracking var
* remove unused import
* add tests for `_select_hf_device()`

Signed-off-by: Jeffrey Martin <[email protected]>
@jmartin-tech jmartin-tech requested a review from leondz June 26, 2024 15:36
Owner

@leondz leondz left a comment

lgtm. thanks!

@jmartin-tech jmartin-tech merged commit 21dd343 into leondz:main Jun 27, 2024
4 checks passed
@jmartin-tech jmartin-tech deleted the refactor/huggingface-config-support branch June 27, 2024 13:22
@github-actions github-actions bot locked and limited conversation to collaborators Jun 27, 2024
Successfully merging this pull request may close these issues.

support for Metal/MPS
Configurable Generator parameters