Making wrapper tensor subclasses work in serialization #2440
Conversation
Force-pushed from 40413f5 to d6c4256.
@SunMarc can you find someone to review this?
…on (non-safetensor)

Summary: huggingface_hub serialization relies on the storage_ptr of a tensor to implement its sharding logic, but wrapper tensor subclasses do not have storage, so we unflatten the tensor and derive the storage_id by combining the storage_ids of all internal plain tensors. This is a bit hacky; open to more robust ideas.

Test Plan: tested with the script in huggingface/transformers#32364
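A minimal sketch of the recursion described in this commit message, assuming PyTorch's traceable wrapper subclass protocol (`__tensor_flatten__`, torch >= 2.1); the function name matches the PR's terminology, but the body is illustrative rather than the merged implementation:

```python
import torch
from torch.utils._python_dispatch import is_traceable_wrapper_subclass


def get_storage_id(tensor: torch.Tensor):
    """Return a hashable storage id: an int for a plain tensor, a nested
    tuple of inner ids for a wrapper tensor subclass."""
    if is_traceable_wrapper_subclass(tensor):
        # Subclasses expose their inner tensors via __tensor_flatten__;
        # recurse until we reach plain tensors that actually own storage.
        inner_names, _ = tensor.__tensor_flatten__()
        return tuple(get_storage_id(getattr(tensor, name)) for name in inner_names)
    return tensor.untyped_storage().data_ptr()
```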
Force-pushed from d6c4256 to 40d9bda.
Force-pushed from 40d9bda to 4ebe8aa.
Summary: Similar to huggingface/huggingface_hub#2440, we want to allow safetensors to handle wrapper tensor subclasses. We mainly added:

1. tensor storage size: computed by flattening the wrapper tensor subclass and recursively summing the storage sizes of all sub-tensors.
2. storage_ptr: computed by constructing a tuple from the "storage_ptr" of each flattened tensor; this can be a nested tuple of ints, e.g. ((1, 2), 3, (4, (5, 6))).

Test Plan: added a test in test_pt_model.py; will also test manually.
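A hedged sketch of the two quantities described above, reusing the same recursive flattening idea (again assuming the traceable wrapper subclass protocol; this is not the exact safetensors code):

```python
import torch
from torch.utils._python_dispatch import is_traceable_wrapper_subclass


def get_storage_size(tensor: torch.Tensor) -> int:
    """Size in bytes: for subclasses, the sum over all inner plain tensors."""
    if is_traceable_wrapper_subclass(tensor):
        inner_names, _ = tensor.__tensor_flatten__()
        return sum(get_storage_size(getattr(tensor, name)) for name in inner_names)
    return tensor.untyped_storage().nbytes()


def storage_ptr(tensor: torch.Tensor):
    """Plain tensor -> int pointer; subclass -> nested tuple such as ((1, 2), 3)."""
    if is_traceable_wrapper_subclass(tensor):
        inner_names, _ = tensor.__tensor_flatten__()
        return tuple(storage_ptr(getattr(tensor, name)) for name in inner_names)
    return tensor.untyped_storage().data_ptr()
```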
Hi @jerryzh168, thanks for opening this PR! And thanks @SunMarc for pulling me into this convo. The general logic of this PR looks OK to me (even though I'm missing some broader context, I think). The tests look good as well. However, […] In the end, the only thing that I want […]
@Wauplin I can add this to pytorch, but it will only be available in nightlies or torch 2.5+, so don't we still need to add this in huggingface_hub or other places for now? What is the version requirement for huggingface_hub/safetensors etc.? Do they plan to work with all the different torch versions?
Summary: Currently [huggingface_hub](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/serialization/_torch.py), [safetensors](https://github.com/huggingface/safetensors/blob/main/bindings/python/py_src/safetensors/torch.py#L11), and [accelerate](https://github.com/huggingface/accelerate/blob/a452327e8e04b20779882dc491e00de602d554cb/src/accelerate/utils/modeling.py#L175) each have their own implementation of `get_storage_id`, `get_storage_size`, and `storage_ptr`, which make assumptions about internal implementation details of torch.Tensor and its storage, and do not work for wrapper tensor subclasses. Motivated by huggingface/huggingface_hub#2440 (comment), it may make more sense to add these as utils in pytorch so they can be maintained by us instead.

This PR adds:

`get_storage_id`: returns a unique identifier for the tensor storage; for tensor subclasses, it returns a nested tuple of unique ids from the underlying plain tensors.

`get_storage_size`: returns the size in bytes of the underlying storage; for tensor subclasses, it returns the sum of the sizes of all underlying plain tensors.

Test Plan: python test/test_utils.py TestStorageUtils

Pull Request resolved: #133524
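A small usage sketch (hypothetical, building on the `get_storage_id` sketch earlier) of the property these utils are meant to provide: tied or viewed tensors map to the same id, which is what lets the sharding logic avoid serializing shared weights twice.

```python
import torch

a = torch.zeros(4)
b = a.view(2, 2)  # a view shares its underlying storage with `a`

assert get_storage_id(a) == get_storage_id(b)               # same storage -> same id
assert get_storage_id(a) != get_storage_id(torch.zeros(4))  # fresh storage -> new id
```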
Hi @jerryzh168, thanks for the changes! The implementation looks good to me at first glance. I've added a comment regarding the version parsing thing. In the meantime, I'll run some tests locally on my side.
In a follow-up PR, we'll add CI tests for torch 2.0 and 2.5 (for instance) to be sure both versions are compatible. I can take care of that part.
@Wauplin please take a look again, thanks! |
Thanks for the changes @jerryzh168! I left some comments below
Taken from https://github.com/huggingface/transformers/blob/1ecf5f7c982d761b4daaa96719d162c324187c64/src/transformers/pytorch_utils.py#L278.
"""
return tensor.device, _get_unique_id(tensor), get_torch_storage_size(tensor)
If I understand correctly, two "meta" tensors can have the exact same _get_unique_id(tensor) and the exact same tensor.device but still be different, correct? If so, how can we be sure their storage sizes distinguish them? Can it happen that they randomly have the same storage size?
Yeah, it just means the current approach does not generalize to meta tensors. Did it work previously?

I think we'd need to reimplement the higher-level sharding logic in pytorch in the end. I added some PoC in the slack; let me make a quick intro there.
> Yeah, it just means the current approach does not generalize to meta tensors. Did it work previously?

I don't think so, since we never had to serialize meta tensors. The only use case that could benefit from this is in accelerate (finding tied parameters from the meta model). Right now, this is how we handle meta tensors: https://github.com/huggingface/accelerate/blob/726140cad2f2361d79da7786a7b96d0bee591c48/src/accelerate/utils/modeling.py#L677
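As a side illustration (my own sketch, not code from the PR) of why the pointer-based approach degenerates on the meta device: there is no real allocation, so the pointer carries no information, and only the device and storage size remain to tell tensors apart, which is exactly the collision concern raised above.

```python
import torch


def meta_safe_storage_ptr(t: torch.Tensor) -> int:
    # Mirrors the fallback style of the HF implementations: meta storages may
    # not expose a real pointer, so treat them as 0.
    try:
        return t.untyped_storage().data_ptr()
    except Exception:
        return 0


m1 = torch.empty(4, device="meta")
m2 = torch.empty(8, device="meta")
print(meta_safe_storage_ptr(m1), meta_safe_storage_ptr(m2))          # 0 0 -> identical
print(m1.untyped_storage().nbytes(), m2.untyped_storage().nbytes())  # 16 32 -> only size differs
```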
tests/test_serialization.py (outdated)
# TODO: need to fix safetensor support for tensor subclasses before we can add this
# to test
# shared_layer = TwoTensor(torch.tensor([4]), torch.tensor([4]))
@Wauplin this seems to fail because safetensors does not support wrapper tensor subclasses yet; we can enable this when we add similar support in safetensors.
Actually, we don't have to test save_torch_state_dict using safetensors. It is possible to simply test with split_torch_state_dict_into_shards (needs to be imported), since what we really care about is the grouping of the tensors, not necessarily the serialization (for now, at least). Could you update the fixture and test in that direction? (See the illustrative sketch below.)

Also, torch_state_dict_shared_layers is already taken as a name for a fixture, so code quality is complaining in the CI.
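A rough illustration of the suggested test direction: `split_torch_state_dict_into_shards` is the real huggingface_hub helper, but the fixture values and printed attributes are my own sketch, not the PR's test code.

```python
import torch
from huggingface_hub import split_torch_state_dict_into_shards

shared = torch.tensor([4.0])
state_dict = {
    "shared_a.weight": shared,
    "shared_b.weight": shared,  # tied: same underlying storage as shared_a
    "other.weight": torch.zeros(2),
}

# We only care about how tensors are grouped into shards, not about actually
# writing safetensors files to disk.
split = split_torch_state_dict_into_shards(state_dict)
print(split.is_sharded)           # False for such a tiny state dict
print(split.filename_to_tensors)  # mapping of shard filename -> tensor names
```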
I pushed a commit to fix some linting + merge from main, so we are now testing the pipeline on both torch versions. Last remaining thing is the test to complete, IMO (see #2440 (comment)). Otherwise, everything looks good!
Thanks @jerryzh168! Everything looks good to me now! I fixed a few formatting issues so we're now ready to merge this. Thanks again and thanks @SunMarc as well for the inputs :)
Looking forward to seeing deeper integration into pytorch directly!
Enable non-safetensor serialization and deserialization for TorchAoConfig quantized models (huggingface/transformers#33456)

Summary: After huggingface/huggingface_hub#2440 added non-safetensor serialization and deserialization in huggingface_hub, we can now add the corresponding support in transformers. Note that we don't plan to add safetensor serialization, due to the different goals of wrapper tensor subclasses and safetensors; see the README for more details.

Test Plan: tested locally
Summary:
huggingface_hub serialization relies on the storage_ptr of a tensor to implement its sharding logic, but wrapper tensor subclasses do not have storage, so we unflatten the tensor and compute the storage_id by returning a tuple constructed from the storage_ids of all internal plain (or nested subclass) tensors.

Note: this PR only adds non-safetensor serialization support for tensor subclasses.

Test Plan:
tested with the script in huggingface/transformers#32364
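As a closing illustration (my own hedged sketch, not part of the PR): with this change, a state dict containing a wrapper tensor subclass such as the TwoTensor test class can go through the non-safetensors (pickle-based) save path.

```python
import torch
from huggingface_hub import save_torch_state_dict
from torch.testing._internal.two_tensor import TwoTensor  # test-only subclass

state_dict = {"layer.weight": TwoTensor(torch.tensor([4.0]), torch.tensor([4.0]))}

# safe_serialization=False selects the pickle (.bin) path, which is the one
# this PR teaches to handle wrapper tensor subclasses.
save_torch_state_dict(state_dict, save_directory="checkpoint", safe_serialization=False)
```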