Support 4bit BNB layers meta-device materialization #19150
Conversation
Codecov Report

Additional details and impacted files:

```diff
@@            Coverage Diff            @@
##           master   #19150      +/-  ##
==========================================
- Coverage      83%      54%      -29%
==========================================
  Files         443      439        -4
  Lines       36859    36984      +125
==========================================
- Hits        30539    19940    -10599
- Misses       6320    17044    +10724
```
```diff
@@ -37,7 +39,8 @@

 log = logging.getLogger(__name__)

-_BITSANDBYTES_AVAILABLE = RequirementCache("bitsandbytes>=0.41.0")
+# TODO: unpin after resolving the `quant_state` format breaking changes
+_BITSANDBYTES_AVAILABLE = RequirementCache("bitsandbytes==0.41.0")
```
Still new to this repo, so maybe I misunderstood something, but I'm not sure how this can work. In `requirements/pytorch/extra.txt`, BNB is pinned to 0.41.1. The same goes for the tests: the `skipif` will always return `True`, which is why these tests are skipped. These are GPU tests.
Oh, great catch. I messed up the requirements, I'll open a PR.
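For context, a minimal sketch of the mismatch described above, assuming the usual `RequirementCache`-based availability flag and a hypothetical test name: with bitsandbytes 0.41.1 installed, the `==0.41.0` check fails and the skip condition always triggers.

```python
import pytest
from lightning_utilities.core.imports import RequirementCache

# The module pins the availability check to bitsandbytes==0.41.0 ...
_BITSANDBYTES_AVAILABLE = RequirementCache("bitsandbytes==0.41.0")

# ... while requirements/pytorch/extra.txt installs bitsandbytes 0.41.1,
# so the flag evaluates to False and the skip condition below is always True.
@pytest.mark.skipif(not _BITSANDBYTES_AVAILABLE, reason="bitsandbytes is required")
def test_bitsandbytes_linear_layers():  # hypothetical test name
    ...
```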
What does this PR do?
Adds support for converting Linear layers on the meta device to Bitsandbytes Linear layers, and for materializing and quantizing Bitsandbytes Linear layers that are on the meta device.
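For illustration, a rough sketch of the kind of user flow this targets, assuming Fabric's `BitsandbytesPrecision` plugin; the model class and checkpoint path are placeholders, and the exact sequence is my assumption rather than code taken from this PR:

```python
import torch
import torch.nn as nn
import lightning as L
from lightning.fabric.plugins import BitsandbytesPrecision


class TinyModel(nn.Module):  # placeholder model for illustration
    def __init__(self) -> None:
        super().__init__()
        self.linear = nn.Linear(1024, 1024)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear(x)


precision = BitsandbytesPrecision(mode="nf4", dtype=torch.bfloat16)
fabric = L.Fabric(accelerator="cuda", devices=1, plugins=precision)
fabric.launch()

# Instantiate on the meta device: no real memory is allocated yet.
with torch.device("meta"):
    model = TinyModel()

# Setup converts the meta-device nn.Linear layers to bitsandbytes 4-bit layers
# and materializes them on the target device.
model = fabric.setup_module(model)

# Loading real weights quantizes them into the 4-bit layers.
fabric.load_raw("checkpoint.pth", model)
```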
Scenario 1

Cons:
- `to_empty` and `reset_parameters` (called from `materialize_meta_tensors`) will create empty tensors and quantize. There's no way to avoid this unless we make `materialize_meta_tensors` aware of bitsandbytes (see the sketch after the scenario list).
- `load_checkpoint` will again quantize weights that were already quantized during materialization.

Scenario 2

I believe this is the ideal scenario.

Scenario 3

Cons:
- `fabric.setup` will need to recreate the layers, so the model hooks are lost.
- `to_empty` and `reset_parameters` (called from `materialize_meta_tensors`) will both create empty tensors and quantize. There's no way to avoid this unless we make `materialize_meta_tensors` aware of bitsandbytes.

Scenario 4

Cons:
- 8-bit layer materialization is not implemented. I only made the minimal changes required for it.
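To make the double-quantization concern concrete, here is an illustrative sketch of the generic materialization pattern the cons refer to; the function body is an assumption for illustration, not the actual `materialize_meta_tensors` implementation in Lightning:

```python
import torch
import torch.nn as nn


def materialize_meta_tensors(module: nn.Module, device: torch.device) -> None:
    # Illustrative sketch only: replace meta-device storage with real,
    # uninitialized storage, then re-initialize every layer that knows how.
    module.to_empty(device=device)
    for submodule in module.modules():
        reset = getattr(submodule, "reset_parameters", None)
        if callable(reset):
            # For a bitsandbytes 4-bit Linear, this re-initialization also
            # quantizes the placeholder weights ...
            reset()
    # ... so a subsequent load_checkpoint overwrites those weights and has to
    # quantize them again, which is the redundant work described in the cons.
```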
📚 Documentation preview 📚: https://pytorch-lightning--19150.org.readthedocs.build/en/19150/
cc @Borda @carmocca @justusschock @awaelchli