Add uint1 to uint7 dtypes #117208

Closed
wants to merge 6 commits

Conversation

jerryzh168 (Contributor) commented Jan 11, 2024

Stack from ghstack (oldest at bottom):

Summary:
These dtypes are added because we are seeing more demand for sub-byte dtypes, especially with the popularity of LLMs (https://pytorch.org/blog/accelerating-generative-ai-2/#step-4-reducing-the-size-of-the-weights-even-more-with-int4-quantization-and-gptq-2021-toks).

Note that these are just placeholders; operator support for these dtypes will be implemented via tensor subclasses.
E.g. torch.empty(..., dtype=torch.uint1) will return a tensor subclass of uint1 that supports operations like bitwise ops, add, mul, etc. (to be added later).

Also note that these are not quantized data types; we'll implement the quantization logic with tensor subclasses backed by these dtypes as well.
E.g. `Int4GroupedQuantization(torch.Tensor)` will be implemented with torch.uint4 tensors (see pytorch/ao#13 as an example).
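
As an illustration, a minimal sketch of what such a dtype-backed wrapper subclass could look like (hypothetical names and packing scheme, not the planned implementation):

```python
import torch

class UInt1Tensor(torch.Tensor):
    @staticmethod
    def __new__(cls, packed: torch.Tensor, numel: int):
        # Logical shape is (numel,) with dtype uint1; the real storage is
        # the packed uint8 tensor (8 logical elements per byte).
        return torch.Tensor._make_wrapper_subclass(cls, (numel,), dtype=torch.uint1)

    def __init__(self, packed: torch.Tensor, numel: int):
        self.packed = packed

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        # Bitwise ops, add, mul, etc. would be implemented here on the
        # packed representation (to be added later).
        raise NotImplementedError(f"{func} is not yet implemented for UInt1Tensor")
```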

Test Plan:
CIs
python test/test_quantization.py -k test_uint1_7_dtype

pytorch-bot bot commented Jan 11, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/117208

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit d382027 with merge base f70aeb4:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ezyang (Contributor) left a comment


Add a test that the Python bindings are working and you can make a wrapper subclass with it
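
Something along these lines, perhaps (a rough sketch of the requested test, assumed rather than quoted from the PR):

```python
import torch

def test_uint1_7_bindings_and_subclass():
    # The Python bindings exist for all seven new dtypes.
    for bits in range(1, 8):
        dt = getattr(torch, f"uint{bits}")
        assert isinstance(dt, torch.dtype)

    # A wrapper subclass can be created with one of them.
    class Wrapper(torch.Tensor):
        @staticmethod
        def __new__(cls, size, dtype):
            return torch.Tensor._make_wrapper_subclass(cls, size, dtype=dtype)

    t = Wrapper((4,), torch.uint4)
    assert t.dtype == torch.uint4 and t.shape == (4,)
```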

c10/core/ScalarType.h (inline review thread; outdated, resolved)
vadimkantorov (Contributor) commented Jan 11, 2024

Should there also be opaque dtypes for 2-bit/3-bit/4-bit sub-bytes? I think the quantization efforts now go even there... Regarding 1-bit, there also exists "One-bit Adam" (I think it's even implemented somewhere in Meta's experimental optimizer repos), which uses 1-bit values; it could be a good showcase for uint1 or for a related bitmap/bit-tensor dtype.

One nasty thing is that if you google uint1 or uint4, exotic namings where 1/4 stand for bytes and not bits come out: https://people.montefiore.uliege.be/boigelot/research/lash/man/uint.html, which is not very nice...

What are the presupposed indexing semantics for these sub-byte types (and is any standardized indexing supposed at all)? E.g., should uint1tensor[3] retrieve the 3rd bit of the first byte, or the third encompassing byte?

jerryzh168 (Contributor, Author):

> Should there also be opaque dtypes for 2-bit/3-bit/4-bit sub-bytes?

Yeah, we have these in this PR; it adds all dtypes from uint1 to uint7.

> One nasty thing is that if you google uint1 or uint4, exotic namings where 1/4 stand for bytes and not bits come out: people.montefiore.uliege.be/boigelot/research/lash/man/uint.html, which is not very nice...

Do you mean the LASH dtype naming?

> What are the presupposed indexing semantics for these sub-byte types (and is any standardized indexing supposed at all)? E.g., should uint1tensor[3] retrieve the 3rd bit of the first byte, or the third encompassing byte?

I don't think we want to support sub-byte or non-byte-aligned indexing, but we can support byte-aligned indexing by unpacking; see https://github.com/pytorch-labs/ao/pull/13/files#diff-109a7f01577eb57b0d9facb5e1c17c23158f544b7203cda513075487a389b2f6R160-R165
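
For concreteness, a minimal sketch of that unpack-based approach (assuming two 4-bit values per uint8 byte, low nibble first; the actual helper in the linked PR may differ):

```python
import torch

def unpack_uint4(packed: torch.Tensor) -> torch.Tensor:
    # Unpack a uint8 tensor holding two uint4 values per byte into a
    # uint8 tensor of 0..15 values, twice as long (low nibble first).
    low = packed & 0x0F
    high = (packed >> 4) & 0x0F
    return torch.stack([low, high], dim=-1).view(*packed.shape[:-1], -1)

packed = torch.tensor([0x21, 0x43], dtype=torch.uint8)
print(unpack_uint4(packed))  # tensor([1, 2, 3, 4], dtype=torch.uint8)
```

Byte-aligned indexing then reduces to unpacking the enclosing bytes and indexing the result.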

vadimkantorov (Contributor) commented Jan 11, 2024

Yeah, the indexing / virtual-vs-actual byte shape is important for use cases like BitTensor/BitMap, where semantically they want to behave like a compressed BoolTensor.

Regarding the naming, I mean that uint1/uint4 would probably google badly, as there currently exist some trash mentions in other contexts where 1/4 stand for bytes and not bits, and uint4_t does not exist in a C context.

Regarding uint1, I would also suggest having some alias or subclass in core like torch.bit, torch.bitmap, torch.bitset, or similar, which suggests a higher-level usage. The compressing BoolTensor might be a relatively frequent high-level use case, along with pack/unpack and RoaringBitmap-like ops, then maybe some classical morphological / binary image processing ops, and maybe some LSH/hashing ops.

@github-actions github-actions bot added the release notes: quantization release notes category label Jan 11, 2024
@jerryzh168 jerryzh168 requested review from ezyang and albanD January 11, 2024 22:20
jerryzh168 (Contributor, Author):

> Regarding the naming, I mean that uint1/uint4 would probably google badly, as there currently exist some trash mentions in other contexts where 1/4 stand for bytes and not bits, and uint4_t does not exist in a C context.

I think uint1 to uint7 are consistent with the C/C++ naming of the uint8, uint16, and uint32 dtypes.

> Regarding uint1, I would also suggest having some alias or subclass in core like torch.bit, torch.bitmap, torch.bitset, or similar, which suggests a higher-level usage. The compressing BoolTensor might be a relatively frequent high-level use case, along with pack/unpack and RoaringBitmap-like ops, then maybe some classical morphological / binary image processing ops, and maybe some LSH/hashing ops.

Can you write a quick code example of what you want to do (what the higher-level usage is)? I think these could be built as tensor subclasses in general.

ezyang (Contributor) commented Jan 12, 2024

> Yeah, the indexing / virtual-vs-actual byte shape is important for use cases like BitTensor/BitMap, where semantically they want to behave like a compressed BoolTensor.

For non-aligned indexing, I think the most plausible implementation strategy in this direction is the introduction of a 1/8 SymInt (similar to the SingletonSymNode we use to represent ragged dimensions). Then, if I have an 8-element uint1 tensor, the storage offset of the element at index 1 is 1/8, at index 2 it is 2/8, and so forth.

TBH, @jerryzh168 and co are not that interested in the packed bool tensor use case, so someone else is probably going to have to implement it.
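
In concrete integer arithmetic (a sketch of just the offset math; the fractional SymInt itself does not exist yet):

```python
# Element i of a bit-packed uint1 tensor would live at storage offset i/8:
# concretely byte i // 8, bit i % 8 within that byte (LSB-first assumed).
def uint1_element(storage: bytes, i: int) -> int:
    byte, bit = divmod(i, 8)
    return (storage[byte] >> bit) & 1

buf = bytes([0b10110010])
print([uint1_element(buf, i) for i in range(8)])  # [0, 1, 0, 0, 1, 1, 0, 1]
```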

> Regarding uint1, I would also suggest having some alias or subclass in core like torch.bit, torch.bitmap, torch.bitset, or similar, which suggests a higher-level usage. The compressing BoolTensor might be a relatively frequent high-level use case, along with pack/unpack and RoaringBitmap-like ops, then maybe some classical morphological / binary image processing ops, and maybe some LSH/hashing ops.

My vote is for torch.bit

vadimkantorov (Contributor) commented Jan 12, 2024

> what the higher-level usage is

E.g., as:

  • pack/unpack for compressing BoolTensor (see the sketch after this list)
  • set ops (RoaringBitmaps style)
  • binary image processing / local binary patterns
  • Hadamard matrices and Haar wavelet basis matrices: probably not worth saving space on the analysis matrices themselves, but in general representing 0/1, -1/1, or -1/0/1 matrices efficiently can be cool (although stretching to -1 is maybe taking it too far; e.g. b'1' could mean 1 and b'0' could mean -1, and for the 2-bit case b'01' could mean 1, b'00' could mean 0, and b'11' could mean -1) :) etc.
  • Further on, maybe for some binary neural nets in the future, intersecting/competing with 1-bit quantization in some extreme limit cases, I guess
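
For the first item, a minimal sketch of what BoolTensor pack/unpack could look like (hypothetical helpers, LSB-first packing assumed; not an existing PyTorch API):

```python
import torch

BYTE_WEIGHTS = torch.tensor([1, 2, 4, 8, 16, 32, 64, 128], dtype=torch.uint8)

def pack_bool(mask: torch.Tensor) -> torch.Tensor:
    # Pack a 1-D BoolTensor into uint8, 8 logical bits per byte (LSB first).
    pad = (-mask.numel()) % 8
    bits = torch.cat([mask, mask.new_zeros(pad)]).to(torch.uint8).view(-1, 8)
    return (bits * BYTE_WEIGHTS).sum(dim=1).to(torch.uint8)

def unpack_bool(packed: torch.Tensor, numel: int) -> torch.Tensor:
    # Inverse of pack_bool: recover the first `numel` bits as a BoolTensor.
    bits = packed.unsqueeze(-1).bitwise_and(BYTE_WEIGHTS).ne(0)
    return bits.view(-1)[:numel]

mask = torch.rand(20) > 0.5              # 20 bools -> 3 packed bytes
assert torch.equal(unpack_bool(pack_bool(mask), mask.numel()), mask)
```

The same dense packed layout would also support the set ops (union/intersection become bitwise or/and on the packed bytes).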

jerryzh168 added a commit that referenced this pull request Jan 16, 2024
Summary:
As a follow-up to #117208, this PR adds a UInt4Tensor in Python; it can be used to construct a uint4 tensor and supports some basic operations like view, slice, etc.

We can extend this to support different quantized tensors, as mentioned in the previous PR.

Later:
* tensor factory support for uint4 and other sub-byte dtypes
* other sub-byte tensor subclass support

Test Plan:
python test/test_tensors.py -k test_constructor

ghstack-source-id: 7461060077376810b3036adaa984ee01a5110705
Pull Request resolved: #117557
pytorchmergebot pushed a commit that referenced this pull request Sep 28, 2024
Summary:
Similar to #117208, we want to add int1 to int7 for edge use cases for weight quantization (https://www.internalfb.com/diff/D62464487)

Test Plan:
python test/test_quantization.py -k test_uint4_int4_dtype

Pull Request resolved: #136301
Approved by: https://github.com/ezyang
pytorchmergebot pushed a commit that referenced this pull request Oct 18, 2024
Summary:
Similar to #117208, we want to add int1 to int7 for edge use cases for weight quantization

Test Plan:
python test/test_quantization.py -k test_uint4_int4_dtype

Differential Revision: [D64344944](https://our.internmc.facebook.com/intern/diff/D64344944)
Pull Request resolved: #137928
Approved by: https://github.com/malfet
Labels: ciflow/trunk, Merged, release notes: quantization
5 participants