Add `AffineQuantizedTensor` based workflow doc and examples #277

jerryzh168 · 2024-05-25T00:28:25Z

Summary:
As the title says.

Test Plan:
.

Reviewers:

Subscribers:

Tasks:

Tags:

pytorch-bot · 2024-05-25T00:28:27Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/277

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit accdb3d with merge base 6dd63b8:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

msaroufim · 2024-05-25T19:38:35Z

torchao/quantization/README.md

+
+
+# benchmark to see the speedup
+from torch.utils.benchmark import Timer


We have a benchmark util now https://github.com/pytorch/ao/blob/main/torchao/utils.py#L4

Mind if we update it in case yours is subtly different?

Sure, but we probably need to follow up and enable the benchmark util for CPU and MPS as well before doing that. I created an issue for it here: #287
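For readers following the thread, the before/after measurement being discussed can be sketched with the standard library's `timeit.Timer`, which `torch.utils.benchmark.Timer` is modeled on. This is only an illustration of the pattern, not the README snippet or the `torchao/utils.py` helper. `median_time`, `baseline`, and `optimized` are hypothetical names, and the two workloads are stand-ins for a float model and its quantized counterpart.

```python
import timeit


def median_time(fn, repeat=5, number=100):
    """Return the median seconds per call of `fn` across `repeat` timing runs."""
    timer = timeit.Timer("fn()", globals={"fn": fn})
    per_call = sorted(t / number for t in timer.repeat(repeat=repeat, number=number))
    return per_call[len(per_call) // 2]


# Hypothetical stand-ins for "model before" and "model after" the optimization.
def baseline():
    # Sum of squares computed with an explicit loop.
    return sum(i * i for i in range(2000))


def optimized():
    # Same result from the closed-form formula for 0^2 + ... + (n-1)^2.
    n = 2000
    return (n - 1) * n * (2 * n - 1) // 6


t_base = median_time(baseline)
t_opt = median_time(optimized)
speedup = t_base / t_opt
print(f"baseline: {t_base * 1e6:.2f} us, optimized: {t_opt * 1e6:.2f} us, speedup: {speedup:.1f}x")
```

A wall-clock helper like this runs on any device but does not account for asynchronous execution, so an accelerator backend would need its own synchronization before reading the clock.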

msaroufim · 2024-05-25T19:43:27Z

torchao/quantization/README.md

+        # optional filtering for module name, shape etc.
+        # quantization activation (needed by dynamic quantization)
+        # m.weight = nn.Parameter(to_laq(m.weight, device=..., layout=..., ...))
+        m.weight = nn.Parameter(to_aq(m.weight, device=..., layout=..., ...))


Can we make this a real snippet that people can copy and paste?
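As background for the request, the affine quantization that an affine-quantized weight represents can be sketched in plain Python. This is not torchao's API and does not fill in the elided `to_aq` arguments. `choose_qparams`, `quantize`, and `dequantize` are hypothetical helper names, and the int8 range and per-tensor granularity are assumptions made for the illustration.

```python
def choose_qparams(values, qmin=-128, qmax=127):
    """Pick a per-tensor scale and zero point covering the value range (assumed int8)."""
    lo = min(min(values), 0.0)  # include 0.0 so that zero is exactly representable
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # all-zero input: any positive scale works
    zero_point = int(round(qmin - lo / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """q = clamp(round(x / scale) + zero_point, qmin, qmax)."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]


def dequantize(qvalues, scale, zero_point):
    """x_hat = (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in qvalues]


weights = [-1.0, -0.25, 0.0, 0.5, 1.5]
scale, zero_point = choose_qparams(weights)
q_weights = quantize(weights, scale, zero_point)
recovered = dequantize(q_weights, scale, zero_point)
max_err = max(abs(a - b) for a, b in zip(weights, recovered))
print(q_weights, f"scale={scale:.5f}", f"zero_point={zero_point}", f"max_err={max_err:.5f}")
```

The integer values together with `scale` and `zero_point` are what gets stored in place of the float weight; the round trip loses at most about half a quantization step for values inside the covered range.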

msaroufim

Some minor feedback but thank you for writing this!

facebook-github-bot added the CLA Signed label May 25, 2024

jerryzh168 force-pushed the add-aqt-docs branch from 643403a to c0907aa Compare May 25, 2024 00:30

jerryzh168 requested review from cpuhrsch, msaroufim and HDCharles May 25, 2024 00:31

msaroufim reviewed May 25, 2024

msaroufim reviewed May 25, 2024

msaroufim reviewed May 25, 2024

msaroufim requested changes May 25, 2024

jerryzh168 force-pushed the add-aqt-docs branch from e934e1d to 192b92f Compare May 29, 2024 02:00

jerryzh168 requested a review from msaroufim May 29, 2024 02:07

Add AffineQuantizedTensor based workflow doc and examples

ac4af97

Summary: att Test Plan: . Reviewers: Subscribers: Tasks: Tags:

jerryzh168 force-pushed the add-aqt-docs branch from 192b92f to ac4af97 Compare May 29, 2024 02:08

msaroufim approved these changes May 29, 2024

Merge branch 'main' into add-aqt-docs

accdb3d

msaroufim merged commit cbc74ee into pytorch:main May 29, 2024
13 checks passed

jerryzh168 deleted the add-aqt-docs branch July 2, 2024 20:31

dbyoung18 pushed a commit to dbyoung18/ao that referenced this pull request Jul 31, 2024

Add AffineQuantizedTensor based workflow doc and examples (pytorch#277)

dfed332

yanbing-j pushed a commit to yanbing-j/ao that referenced this pull request Dec 9, 2024

zero temp sampling (pytorch#277)

309798d

Add special case for zero-temperature sampling. For stories15M on my devserver, seems to improve tokens/sec as follows: before: 189, 180, 166 after: 264, 285, 285
