Add uint1 to uint7 dtypes #117208

Closed
wants to merge 6 commits

Conversation

jerryzh168 (Contributor) commented Jan 11, 2024

Stack from ghstack (oldest at bottom):

Summary:
These dtypes are added because we are seeing more demand for sub-byte dtypes, especially with the popularity of LLMs (https://pytorch.org/blog/accelerating-generative-ai-2/#step-4-reducing-the-size-of-the-weights-even-more-with-int4-quantization-and-gptq-2021-toks).

Note that these are just placeholders; operator support for these dtypes will be implemented via tensor subclasses.
E.g. torch.empty(..., dtype=torch.uint1) will return a tensor subclass of uint1 that supports operations like bitwise ops, add, mul, etc. (to be added later).

Also note that these are not quantized data types; we'll implement the quantization logic with tensor subclasses backed by these dtypes as well.
E.g. `Int4GroupedQuantization(torch.Tensor)` will be implemented with torch.uint4 tensors (see pytorch/ao#13 as an example).
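
As an illustration, a minimal sketch of what such a dtype-backed wrapper subclass could look like (hypothetical names and packing scheme, not the planned implementation):

```python
import torch

class UInt1Tensor(torch.Tensor):
    @staticmethod
    def __new__(cls, packed: torch.Tensor, numel: int):
        # Logical shape is (numel,) with dtype uint1; the real storage is
        # the packed uint8 tensor (8 logical elements per byte).
        return torch.Tensor._make_wrapper_subclass(cls, (numel,), dtype=torch.uint1)

    def __init__(self, packed: torch.Tensor, numel: int):
        self.packed = packed

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        # Bitwise ops, add, mul, etc. would be implemented here on the
        # packed representation (to be added later).
        raise NotImplementedError(f"{func} is not yet implemented for UInt1Tensor")
```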

Test Plan:
CIs
python test/test_quantization.py -k test_uint1_7_dtype

pytorch-bot bot commented Jan 11, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/117208

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit d382027 with merge base f70aeb4:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ezyang (Contributor) left a comment


Add a test that the Python bindings are working and you can make a wrapper subclass with it
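
Something along these lines, perhaps (a rough sketch of the requested test, assumed rather than quoted from the PR):

```python
import torch

def test_uint1_7_bindings_and_subclass():
    # The Python bindings exist for all seven new dtypes.
    for bits in range(1, 8):
        dt = getattr(torch, f"uint{bits}")
        assert isinstance(dt, torch.dtype)

    # A wrapper subclass can be created with one of them.
    class Wrapper(torch.Tensor):
        @staticmethod
        def __new__(cls, size, dtype):
            return torch.Tensor._make_wrapper_subclass(cls, size, dtype=dtype)

    t = Wrapper((4,), torch.uint4)
    assert t.dtype == torch.uint4 and t.shape == (4,)
```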

c10/core/ScalarType.h (inline review thread; outdated, resolved)
vadimkantorov (Contributor) commented Jan 11, 2024

Should there also be opaque dtypes for 2-bit/3-bit/4-bit sub-bytes? I think the quantization efforts now go even there... Regarding 1-bit, there also exists "One-bit Adam" (I think it's even implemented somewhere in Meta's experimental optimizer repos), which uses 1-bit values; it could be a good showcase for uint1 or for a related bitmap/bit-tensor dtype.

One nasty thing is that if you google uint1 or uint4, exotic namings where 1/4 stand for bytes and not bits come out: https://people.montefiore.uliege.be/boigelot/research/lash/man/uint.html, which is not very nice...

What are the presupposed indexing semantics for these sub-byte types (and is any standardized indexing supposed at all)? E.g., should uint1tensor[3] retrieve the 3rd bit of the first byte, or the third encompassing byte?

jerryzh168 (Contributor, Author):

> Should there also be opaque dtypes for 2-bit/3-bit/4-bit sub-bytes?

Yeah, we have these in this PR; it adds all dtypes from uint1 to uint7.

> One nasty thing is that if you google uint1 or uint4, exotic namings where 1/4 stand for bytes and not bits come out: people.montefiore.uliege.be/boigelot/research/lash/man/uint.html, which is not very nice...

Do you mean the LASH dtype naming?

> What are the presupposed indexing semantics for these sub-byte types (and is any standardized indexing supposed at all)? E.g., should uint1tensor[3] retrieve the 3rd bit of the first byte, or the third encompassing byte?

I don't think we want to support sub-byte or non-byte-aligned indexing, but we can support byte-aligned indexing by unpacking; see https://github.com/pytorch-labs/ao/pull/13/files#diff-109a7f01577eb57b0d9facb5e1c17c23158f544b7203cda513075487a389b2f6R160-R165
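
For concreteness, a minimal sketch of that unpack-based approach (assuming two 4-bit values per uint8 byte, low nibble first; the actual helper in the linked PR may differ):

```python
import torch

def unpack_uint4(packed: torch.Tensor) -> torch.Tensor:
    # Unpack a uint8 tensor holding two uint4 values per byte into a
    # uint8 tensor of 0..15 values, twice as long (low nibble first).
    low = packed & 0x0F
    high = (packed >> 4) & 0x0F
    return torch.stack([low, high], dim=-1).view(*packed.shape[:-1], -1)

packed = torch.tensor([0x21, 0x43], dtype=torch.uint8)
print(unpack_uint4(packed))  # tensor([1, 2, 3, 4], dtype=torch.uint8)
```

Byte-aligned indexing then reduces to unpacking the enclosing bytes and indexing the result.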

vadimkantorov (Contributor) commented Jan 11, 2024

Yeah, the indexing / virtual-vs-actual byte shape is important for use cases like BitTensor/BitMap, where semantically they want to behave like a compressed BoolTensor.

Regarding the naming, I mean that uint1/uint4 would probably google badly, as there currently exist some trash mentions in other contexts where 1/4 stand for bytes and not bits, and uint4_t does not exist in a C context.

Regarding uint1, I would also suggest having some alias or subclass in core like torch.bit, torch.bitmap, torch.bitset, or similar, which suggests a higher-level usage. The compressing BoolTensor might be a relatively frequent high-level use case, along with pack/unpack and RoaringBitmap-like ops, then maybe some classical morphological / binary image processing ops, and maybe some LSH/hashing ops.

@github-actions github-actions bot added the release notes: quantization release notes category label Jan 11, 2024
@jerryzh168 jerryzh168 requested review from ezyang and albanD January 11, 2024 22:20
jerryzh168 (Contributor, Author):

> Regarding the naming, I mean that uint1/uint4 would probably google badly, as there currently exist some trash mentions in other contexts where 1/4 stand for bytes and not bits, and uint4_t does not exist in a C context.

I think uint1 to uint7 are consistent with the C/C++ naming of the uint8, uint16, and uint32 dtypes.

> Regarding uint1, I would also suggest having some alias or subclass in core like torch.bit, torch.bitmap, torch.bitset, or similar, which suggests a higher-level usage. The compressing BoolTensor might be a relatively frequent high-level use case, along with pack/unpack and RoaringBitmap-like ops, then maybe some classical morphological / binary image processing ops, and maybe some LSH/hashing ops.

Can you write a quick code example of what you want to do (what the higher-level usage is)? I think these could be built as tensor subclasses in general.

ezyang (Contributor) commented Jan 12, 2024

> Yeah, the indexing / virtual-vs-actual byte shape is important for use cases like BitTensor/BitMap, where semantically they want to behave like a compressed BoolTensor.

For non-aligned indexing, I think the most plausible implementation strategy in this direction is the introduction of a 1/8 SymInt (similar to the SingletonSymNode we use to represent ragged dimensions). Then, if I have an 8-element uint1 tensor, the storage offset of the element at index 1 is 1/8, at index 2 it is 2/8, and so forth.

TBH, @jerryzh168 and co are not that interested in the packed bool tensor use case, so someone else is probably going to have to implement it.
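
In concrete integer arithmetic (a sketch of just the offset math; the fractional SymInt itself does not exist yet):

```python
# Element i of a bit-packed uint1 tensor would live at storage offset i/8:
# concretely byte i // 8, bit i % 8 within that byte (LSB-first assumed).
def uint1_element(storage: bytes, i: int) -> int:
    byte, bit = divmod(i, 8)
    return (storage[byte] >> bit) & 1

buf = bytes([0b10110010])
print([uint1_element(buf, i) for i in range(8)])  # [0, 1, 0, 0, 1, 1, 0, 1]
```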

> Regarding uint1, I would also suggest having some alias or subclass in core like torch.bit, torch.bitmap, torch.bitset, or similar, which suggests a higher-level usage. The compressing BoolTensor might be a relatively frequent high-level use case, along with pack/unpack and RoaringBitmap-like ops, then maybe some classical morphological / binary image processing ops, and maybe some LSH/hashing ops.

My vote is for torch.bit

vadimkantorov (Contributor) commented Jan 12, 2024

> what the higher-level usage is

E.g., as:

  • pack/unpack for compressing BoolTensor (see the sketch after this list)
  • set ops (RoaringBitmaps style)
  • binary image processing / local binary patterns
  • Hadamard matrices and Haar wavelet basis matrices: probably not worth saving space on the analysis matrices themselves, but in general representing 0/1, -1/1, or -1/0/1 matrices efficiently can be cool (although stretching to -1 is maybe taking it too far; e.g. b'1' could mean 1 and b'0' could mean -1, and for the 2-bit case b'01' could mean 1, b'00' could mean 0, and b'11' could mean -1) :) etc.
  • Further on, maybe for some binary neural nets in the future, intersecting/competing with 1-bit quantization in some extreme limit cases, I guess
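
For the first item, a minimal sketch of what BoolTensor pack/unpack could look like (hypothetical helpers, LSB-first packing assumed; not an existing PyTorch API):

```python
import torch

BYTE_WEIGHTS = torch.tensor([1, 2, 4, 8, 16, 32, 64, 128], dtype=torch.uint8)

def pack_bool(mask: torch.Tensor) -> torch.Tensor:
    # Pack a 1-D BoolTensor into uint8, 8 logical bits per byte (LSB first).
    pad = (-mask.numel()) % 8
    bits = torch.cat([mask, mask.new_zeros(pad)]).to(torch.uint8).view(-1, 8)
    return (bits * BYTE_WEIGHTS).sum(dim=1).to(torch.uint8)

def unpack_bool(packed: torch.Tensor, numel: int) -> torch.Tensor:
    # Inverse of pack_bool: recover the first `numel` bits as a BoolTensor.
    bits = packed.unsqueeze(-1).bitwise_and(BYTE_WEIGHTS).ne(0)
    return bits.view(-1)[:numel]

mask = torch.rand(20) > 0.5              # 20 bools -> 3 packed bytes
assert torch.equal(unpack_bool(pack_bool(mask), mask.numel()), mask)
```

The same dense packed layout would also support the set ops (union/intersection become bitwise or/and on the packed bytes).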

jerryzh168 added a commit that referenced this pull request Jan 16, 2024
Summary:
As a follow-up to #117208, this PR adds a UInt4Tensor in Python; it can be used to construct a uint4 tensor and supports some basic operations like view, slice, etc.

We can extend this to support different quantized tensors, as mentioned in the previous PR.

Later:
* tensor factory support for uint4 and other sub-byte dtypes
* other sub-byte tensor subclass support

Test Plan:
python test/test_tensors.py -k test_constructor

ghstack-source-id: 7461060077376810b3036adaa984ee01a5110705
Pull Request resolved: #117557
pytorchmergebot pushed a commit that referenced this pull request Sep 28, 2024
Summary:
Similar to #117208, we want to add int1 to int7 for edge use cases for weight quantization (https://www.internalfb.com/diff/D62464487)

Test Plan:
python test/test_quantization.py -k test_uint4_int4_dtype

Pull Request resolved: #136301
Approved by: https://github.com/ezyang
pytorchmergebot pushed a commit that referenced this pull request Oct 18, 2024
Summary:
Similar to #117208, we want to add int1 to int7 for edge use cases for weight quantization

Test Plan:
python test/test_quantization.py -k test_uint4_int4_dtype

Differential Revision: [D64344944](https://our.internmc.facebook.com/intern/diff/D64344944)
Pull Request resolved: #137928
Approved by: https://github.com/malfet
Labels: ciflow/trunk, Merged, release notes: quantization
5 participants