
FIX Model with nested all-linear target modules #2391

Open
wants to merge 1 commit into
base: main

Conversation

BenjaminBossan
Member

Resolves #2390

There was a bug in PEFT when adding a LoRA adapter with target_modules='all-linear' (e.g. via add_adapter) to a model that already had LoRA adapters applied. The resolution of 'all-linear' would then target, for instance, lora_A and lora_B, leading to nested LoRA adapters. With this fix, those nested modules are excluded and the correct layers are targeted.
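To make the failure mode concrete, here is a minimal reproduction sketch based on the description above; the base model ("facebook/opt-125m") and the adapter name ("second") are illustrative choices, not taken from the linked issue:

    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    # First adapter: target every linear layer.
    model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
    peft_model = get_peft_model(model, LoraConfig(target_modules="all-linear"))

    # Second adapter: before this fix, resolving 'all-linear' here could also pick
    # up the nested lora_A / lora_B / base_layer submodules created by the first
    # adapter, nesting LoRA layers inside LoRA layers.
    peft_model.add_adapter("second", LoraConfig(target_modules="all-linear"))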

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

for suffix, child in module.named_modules():
    if suffix:
        module_names_to_exclude.add(f"{prefix}.{suffix}")

Collaborator

If we subtract all the base tuner layers anyway, why gather the linear ones explicitly in the first place and not just extract all the base tuner layers?

Member Author

I'm not sure if I fully understand your comment. Here is what happens:

1: Old code

    for name, module in model.named_modules():
        if isinstance(module, linear_classes):
            linear_module_names.add(name)

Adds all linear modules, which is mostly correct, except that if LoRA is already applied, it will also add lora_A, lora_B, base_layer, etc.
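As a toy illustration (my own minimal example, not code from PEFT), this is what the old check collects once LoRA has been applied to a small custom model:

    import torch.nn as nn
    from peft import LoraConfig, get_peft_model

    class Toy(nn.Module):
        def __init__(self):
            super().__init__()
            self.proj = nn.Linear(8, 8)

        def forward(self, x):
            return self.proj(x)

    peft_model = get_peft_model(Toy(), LoraConfig(target_modules=["proj"]))

    # The old check collects every nn.Linear, including the nested LoRA internals:
    linear_classes = (nn.Linear,)
    linear_module_names = {
        name for name, module in peft_model.named_modules() if isinstance(module, linear_classes)
    }
    # e.g. {'base_model.model.proj.base_layer',
    #       'base_model.model.proj.lora_A.default',
    #       'base_model.model.proj.lora_B.default'}
    # Note that the lora.Linear wrapper at 'base_model.model.proj' is *not* an
    # nn.Linear, so it is missing here, which is why step 2 below is needed.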

2: New code

        elif isinstance(module, BaseTunerLayer) and any(n in type(module).__name__ for n in linear_names):
            linear_module_names.add(name)

Here we check for lora.Linear etc., which would be present if LoRA is already applied, and add those too, since those are the targets we want (as they replace the original nn.Linear).
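Continuing the toy example from step 1 (again just an illustration; linear_names here is a simplified stand-in for the names PEFT actually checks), this picks up the LoRA wrapper itself:

    from peft.tuners.tuners_utils import BaseTunerLayer

    linear_names = ("Linear",)  # simplified assumption for illustration
    for name, module in peft_model.named_modules():
        if isinstance(module, BaseTunerLayer) and any(n in type(module).__name__ for n in linear_names):
            linear_module_names.add(name)
    # 'base_model.model.proj' (a peft.tuners.lora.layer.Linear, whose class name
    # is 'Linear') is now included as well.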

3: New code

    for prefix, module in model.named_modules():
        if isinstance(module, BaseTunerLayer):
            for suffix, child in module.named_modules():
                if suffix:
                    module_names_to_exclude.add(f"{prefix}.{suffix}")

Here we collect the nested lora_A, lora_B, base_layer, etc. submodules that may have been wrongly added in step 1, so they can be excluded from the targets.

In summary, step 3 is the fix for accidentally targeting nested nn.Linear layers, as originally reported in #2390. Step 2 was necessary because otherwise we would not actually update the lora.Linear (et al.) layers when adding a second adapter.
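Tying the three steps together on the same toy model, here is a simplified sketch of the resulting resolution (not the exact PEFT implementation):

    module_names_to_exclude = set()
    for prefix, module in peft_model.named_modules():
        if isinstance(module, BaseTunerLayer):
            for suffix, child in module.named_modules():
                if suffix:  # skip the empty name of the BaseTunerLayer itself
                    module_names_to_exclude.add(f"{prefix}.{suffix}")

    target_modules = linear_module_names - module_names_to_exclude
    # -> {'base_model.model.proj'}: the lora.Linear wrapper is targeted, while its
    #    nested lora_A / lora_B / base_layer submodules are excluded.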


Successfully merging this pull request may close these issues.

Bug: Using 2 LoRA configs with target_modules='all-linear' leads to nested LoRA layers