Add semi-structured sparse + dynamic int8 subclasses #36
Conversation
Can you move the benchmark_sam and other .py files to one of the other directories? Maybe make a torch/benchmarks dir?
int_data = w_int_repr.contiguous()
int_data = torch._cslt_compress(int_data)
Is it currently possible to replace this with
from torch.sparse import to_sparse_semi_structured
int_data = to_sparse_semi_structured(int_data)
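For context, a rough sketch of what that swap would look like; the tensor below is only a stand-in for w_int_repr (the real one comes from the quantization subclass), and it assumes a CUDA GPU that supports 2:4 semi-structured sparsity:

# Sketch only: stand-in int8 weight with a valid 2:4 sparsity pattern.
import torch
from torch.sparse import to_sparse_semi_structured

w_int_repr = torch.tensor([[0, 0, 1, 1]], dtype=torch.int8, device="cuda").tile(128, 64)  # (128, 256)
int_data = w_int_repr.contiguous()
int_data = to_sparse_semi_structured(int_data)  # backend-agnostic packing, instead of torch._cslt_compress(int_data)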
Leaving this one here because it's the special cuSPARSELt fuse-mul path, but I have changed the subclass to be backend agnostic (Int8DynamicallyQuantizedSemiStructuredSparseLinearWeight).
Side note: do you care much about the naming convention? This name is so long I kind of want to change it to something simpler like QuantizedSemiSparseLinearWeight.
class Int8DynamicallyQuantizedSemiStructuredSparseLinearWeight(QuantizedLinearWeightBase):
Is sparsity also implemented with a tensor subclass? I thought we should be able to compose them in some way?
We don't have nested subclassing support for tracing currently, so we can't compose them yet :( hence why we're landing in prototype.
I can tag you in the issue I'll make to raise this for core.
Approving for prototype. Thanks for sending this :D
afc9d3d to 4e0c8b3
…bs/ao into jcaip/quant+sparse_subclasses
This PR adds int8 dynamic quantization + semi-structured sparsity support to torchao.
This is implemented by extending the existing quantization subclasses to use sparse ops.
Ideally we would be able to compose subclasses and call to_sparse_semi_structured from inside the quantization subclass, but at the moment nested subclass tracing does not work with torch.compile, and for things like fusing scales into the sparse multiply you would probably want to implement it like this anyway. In particular, this PR adds two new subclasses:
For the cuSPARSELt subclass, I can extend Int8DynamicallyQuantizedWeightBase by storing the compressed representation in W_int_repr. FuseMulWeight will fuse one of the multiplies for the dequant into the cuSPARSELt matmul op. However, cuSPARSELt expects this in a float32 format, so this eats into our previous speedups, since we're now passing this as a bfloat16 tensor.
For the general subclass, however, I need to extend QuantizedLinearWeightBase, because I need to pass two tensors (packed and meta) for the CUTLASS sparse mm op. This relies on to_sparse_semi_structured to decide between CUTLASS and cuSPARSELt, which is the right choice for the UI but makes benchmarking between them kind of difficult, since it's a class var that decides which backend gets used. Maybe we should add a flag to to_sparse_semi_structured, because you might mix between CUTLASS and cuSPARSELt.
I've also added a benchmarking script for SAM. I don't know how we plan on handling dependencies in torchao, but let me know if there's a better place for that.
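For reference, the class-var switch mentioned above looks roughly like the sketch below. _FORCE_CUTLASS is a private attribute of SparseSemiStructuredTensor, so its name and default can change between PyTorch releases, and the weight here is just an illustrative fp16 2:4-sparse tensor:

# Sketch: the backend used by to_sparse_semi_structured is picked by a class variable,
# not by an argument, which is what makes side-by-side benchmarking awkward.
import torch
from torch.sparse import SparseSemiStructuredTensor, to_sparse_semi_structured

A = torch.Tensor([0, 0, 1, 1]).tile(128, 32).half().cuda()  # (128, 128), valid 2:4 pattern

SparseSemiStructuredTensor._FORCE_CUTLASS = True   # pack with the CUTLASS backend
A_cutlass = to_sparse_semi_structured(A)

SparseSemiStructuredTensor._FORCE_CUTLASS = False  # pack with the cuSPARSELt backend
A_cslt = to_sparse_semi_structured(A)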
On batch size 32, I see a 1.16x speedup over the bfloat16 torch.compile baseline, from 21.96 to 25.54 img/s.
Other benchmarks (BS=16)