
[low-bit optim] Add COAT for float8 optimizer #1231

Draft
MirMustafaAli wants to merge 52 commits into main

Conversation

@MirMustafaAli commented Nov 6, 2024

This is a work-in-progress PR for #1190.

As a draft PR, I have followed the first piece of advice by @gau-nernst of "extending OptimStateFp8". Instead of creating a different quantize_fp8 method, I created a separate dynamic range expansion function, since it is applied before quantization to achieve the larger representable range of the float8 datatype, and the class stores the value k to invert the expansion after dequantization.
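
For readers, a minimal self-contained sketch of that expand/invert pair (the choice of k and the e4m3 range constants below are my assumptions based on the dynamic-range-matching idea, not the PR's actual code):

```python
import torch

# Illustrative sketch only: expand the dynamic range of a tensor before fp8
# quantization and invert the expansion after dequantization. The formula for
# k below (matching the tensor's dynamic range to e4m3's) is an assumption.

E4M3_MAX = torch.finfo(torch.float8_e4m3fn).max   # largest e4m3 value (~448)
E4M3_MIN_NORMAL = 2.0 ** -6                       # smallest normal e4m3 value

def expand_dynamic_range(x: torch.Tensor, eps: float = 1e-12):
    """Return (x_expanded, k); fp8 quantization would then be applied to x_expanded."""
    absx = x.abs()
    absmax = absx.max().clamp_min(eps)
    nonzero = absx[absx > 0]
    absmin = nonzero.min() if nonzero.numel() > 0 else absmax
    range_fp8 = torch.log(torch.tensor(E4M3_MAX / E4M3_MIN_NORMAL))
    range_x = torch.log(absmax / absmin).clamp_min(eps)
    k = (range_fp8 / range_x).clamp_min(1.0)      # only expand, never shrink
    return x.sign() * absx.pow(k), k

def invert_dynamic_range(y: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """Undo the expansion on the dequantized tensor."""
    return y.sign() * y.abs().pow(1.0 / k)
```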

Requirements:
TBA
Additional Code/logic Added:
TBA
Logic/Code changes to existing codebase:
TBA
Outcome:
TBA
Scope of Usage:
TBA
Example
TBA

Changes:

  • Dynamic Range Expansion Function: implements the formula from the paper.
  • Created an OptimStateFp8WithDynamicRangeExpansion class by extending OptimStateFp8, referencing the OptimStateFp8 implementation; only the dequantize method is overridden.
  • Implemented aten.copy.default and aten.to_copy.default for OptimStateFp8WithDynamicRangeExpansion.

Benchmarks

Parameters

| Parameter | Value |
| --- | --- |
| Learning Rate (lr) | 0.0001 |
| Automatic Mixed Precision (amp) | bf16 |
| Seed | 42 |
| Model | timm/vit_base_patch16_224.augreg_in21k |
| Optimizer (optim) | AdamWFp8Ao_coat |
| Compile | False |
| Profile | False |
| Project | COAT-benchmarking |
| Number of Epochs | 10 |
| Run Name | AdamWFp8Ao_coat |
| Full BF16 | False |
| Number of Workers | 4 |
| Batch Size | 1024 |
| Weight Decay | 0 |
| Channels Last | False |
| Optimizer CPU Offload | None |
| Cosine LR Scheduler | False |
| Checkpoint Activations | False |

Results

(Three W&B charts, Nov 14-15, 2024, with the benchmark results.)

pytorch-bot (bot) commented Nov 6, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1231

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label on Nov 6, 2024
@MirMustafaAli marked this pull request as draft on November 6, 2024 at 11:35
@gau-nernst (Collaborator)

I was thinking you can just add a flag to the current OptimStateFp8, something like dynamic_range_expansion: bool, instead of subclassing it.

@MirMustafaAli (Author)

> I was thinking you can just add a flag to the current OptimStateFp8, something like dynamic_range_expansion: bool, instead of subclassing it.

I have added the flag to OptimStateFp8. Could you verify it's right?
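
For anyone following along, the flag-based shape might look roughly like this toy sketch (hypothetical names and signatures, not the actual OptimStateFp8 implementation; it reuses expand_dynamic_range/invert_dynamic_range from the sketch in the PR description):

```python
import torch

# Toy illustration of the suggested dynamic_range_expansion flag. This is NOT
# the real OptimStateFp8 API; torchao's per-block scaling and tensor-subclass
# mechanics are omitted for brevity.

class ToyOptimStateFp8:
    def __init__(self, codes, scale, dynamic_range_expansion=False, k=None):
        self.codes = codes                                # fp8-encoded values
        self.scale = scale                                # per-tensor scale
        self.dynamic_range_expansion = dynamic_range_expansion
        self.k = k                                        # exponent stored for inversion

    @classmethod
    def quantize(cls, x: torch.Tensor, dynamic_range_expansion: bool = False):
        k = None
        if dynamic_range_expansion:
            x, k = expand_dynamic_range(x)                # expand before quantization
        scale = x.abs().max().clamp_min(1e-12) / torch.finfo(torch.float8_e4m3fn).max
        codes = (x / scale).to(torch.float8_e4m3fn)
        return cls(codes, scale, dynamic_range_expansion, k)

    def dequantize(self) -> torch.Tensor:
        x = self.codes.to(torch.float32) * self.scale
        if self.dynamic_range_expansion:
            x = invert_dynamic_range(x, self.k)           # undo expansion after dequant
        return x
```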

@gau-nernst (Collaborator)

I think this requires a bit more work. You need to verify that you can create an optimizer with this (add a test to https://github.com/pytorch/ao/blob/main/test/prototype/test_low_bit_optim.py) as well as do some short training runs for sanity checks (using https://github.com/pytorch/ao/blob/main/benchmarks/benchmark_low_bit_adam.py).

I think for merging the PR, we should wait for the official code release to check numeric against them.

If you don't mind, we can discuss more details in GPU-MODE discord group https://discord.gg/gpumode. Just create a thread under torchao and tag me in (@gau.nernst)
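
A minimal smoke test in the spirit of this suggestion might look like the sketch below (AdamWFp8 is the existing torchao optimizer; the dynamic_range_expansion keyword is the one being discussed in this thread and may not match the final API, and fp8 support may require a recent GPU in practice):

```python
import pytest
import torch
import torch.nn as nn

from torchao.prototype.low_bit_optim import AdamWFp8

@pytest.mark.skipif(not torch.cuda.is_available(), reason="needs CUDA for fp8 optim state")
def test_adamw_fp8_dynamic_range_expansion_smoke():
    # dynamic_range_expansion is the flag proposed in this PR (hypothetical API)
    model = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 64)).cuda()
    optim = AdamWFp8(model.parameters(), lr=1e-3, dynamic_range_expansion=True)

    for _ in range(4):
        x = torch.randn(8, 64, device="cuda")
        loss = model(x).abs().mean()
        loss.backward()
        optim.step()
        optim.zero_grad()

    # a few steps should run end to end and keep parameters finite
    for p in model.parameters():
        assert torch.isfinite(p).all()
```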

@MirMustafaAli (Author) commented Nov 6, 2024

I understand the situation regarding merging the PR. I will be glad to keep working on this issue. Creating a thread in GPU-MODE.

(Three resolved review threads on torchao/prototype/low_bit_optim/subclass_fp8.py.)
@gau-nernst added the "topic: new feature" label on Nov 13, 2024
@gau-nernst (Collaborator) left a comment


Thanks for the update. The PR is coming out nicely. There are some failing CI tests. Can you fix them, including the ruff linter?

Some extra items once that is finished:

(Review threads on test/prototype/test_low_bit_optim.py, torchao/prototype/low_bit_optim/adam.py, and torchao/prototype/low_bit_optim/subclass_fp8.py; resolved.)