Consider top-level buffers when computing infer_auto_device_map
#792
Conversation
The documentation is not available anymore as the PR was closed or merged.
Title changed from "buffers support when computing `infer_auto_device_map`" to "`model._buffers` support when computing `infer_auto_device_map`".
sgugger left a comment:
Can you explain how some buffers don't end up in `model._buffers`? I don't fully understand that part.
So if I understood it correctly, it happens if you have some modules such as [...]. Here is an example that I have quickly tried:
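The example from the thread is not preserved in this capture. A minimal sketch of the situation under discussion (hypothetical module names `Inner`/`Outer`, not from the PR): `model._buffers` only contains buffers registered directly on the top-level module, while `named_buffers()` recurses into submodules by default.

```python
import torch
import torch.nn as nn

class Inner(nn.Module):
    def __init__(self):
        super().__init__()
        # Buffer registered on a submodule: visible to named_buffers(),
        # but NOT present in the parent's _buffers dict.
        self.register_buffer("running_stat", torch.zeros(4))

class Outer(nn.Module):
    def __init__(self):
        super().__init__()
        self.inner = Inner()
        # Buffer registered directly on the top-level module,
        # like BART's final_logits_bias.
        self.register_buffer("final_logits_bias", torch.zeros(1, 4))

model = Outer()
# _buffers only holds buffers attached to `model` itself.
print(list(model._buffers))                   # ['final_logits_bias']
# named_buffers() recurses and also finds the submodule's buffer.
print([n for n, _ in model.named_buffers()])  # ['final_logits_bias', 'inner.running_stat']
```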
I think in this case, it's just the difference between [...]
Ah yes, I see, you're probably right here! Let me dig a bit more and get back to you.
@sgugger I might have more of a clue on what is failing. Let me know what you think! I guess this failed for [...]
Ah, in this case it looks very much like the problem #747 fixed for top-level parameters, so the fix should be pretty similar here too!
- use `model.named_buffers(recurse=False)` instead

Co-authored-by: Sylvain Gugger <[email protected]>
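A minimal sketch of the approach (not accelerate's actual implementation; `top_level_names` is a hypothetical helper): mirror what #747 did for top-level parameters by also collecting buffers registered on the root module via `named_buffers(recurse=False)`, so they get a device assignment too.

```python
import torch
import torch.nn as nn

def top_level_names(model: nn.Module):
    """Collect direct children plus parameters/buffers attached to the root module."""
    names = [name for name, _ in model.named_children()]
    # Top-level parameters (what #747 addressed).
    names += [name for name, _ in model.named_parameters(recurse=False)]
    # Top-level buffers (what this PR adds), e.g. BART's final_logits_bias.
    names += [name for name, _ in model.named_buffers(recurse=False)]
    return names

class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        self.lm_head = nn.Linear(4, 4)
        self.register_buffer("final_logits_bias", torch.zeros(1, 4))

print(top_level_names(Toy()))  # ['lm_head', 'final_logits_bias']
```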
Title changed from "`model._buffers` support when computing `infer_auto_device_map`" to "Consider top-level buffers when computing `infer_auto_device_map`".
sgugger left a comment:
Perfect, thanks!
The whole test suite (including slow tests) is green! 🟢 Merging!
What does this PR do?
This PR adds `list(model._buffers)` inside `modules_to_treat` when computing the `auto_device_map`. This scenario occurred when I tried to add `accelerate` support for BART-like models, where `final_logits_bias` is registered as a buffer and is different from a `uint` type. It seems that we need to assign a device to this buffer.

The other solution is to "force-ignore" the buffer in `check_device_map`, since the tensors that are in `model._buffers` are stored in the state_dict.

cc @sgugger @muellerzr
Slow tests from `tests/test_bigmodeling.py` pass!
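To illustrate the state_dict point from the description (a standalone sketch with a hypothetical `BartLike` module, not the real BART implementation): a buffer registered on the root module shows up in the state_dict, so a device map must either assign it a device or explicitly ignore it.

```python
import torch
import torch.nn as nn

class BartLike(nn.Module):
    def __init__(self):
        super().__init__()
        self.lm_head = nn.Linear(8, 8)
        # Persistent buffer on the root module, analogous to BART's final_logits_bias.
        self.register_buffer("final_logits_bias", torch.zeros(1, 8))

model = BartLike()
sd = model.state_dict()
# The buffer is serialized alongside the parameters.
print("final_logits_bias" in sd)  # True
print("lm_head.weight" in sd)     # True
```

Note that buffers registered with `persistent=False` would not appear in the state_dict, which is why only persistent top-level buffers matter here.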