
Enable CPU Offload for Intel GPU #1324

Merged 11 commits on Nov 26, 2024

Conversation

dbyoung18
Contributor

Background

The current CPU offload in torchao only supports the CUDA backend. We would like to add support for Intel GPU via the device option "xpu".

Details

  • add "device" attribute to CPUOffloadOptimizer, default setting to "cuda"
  • enhance and verify UT test_optim_cpu_offload_correctness & test_optim_cpu_offload_save_load pass on Intel GPU
  • add "device" argument to benchmark_low_bit_adam.py. Users can use "--device xpu" to benchmark CPU Offload on Intel GPU. Currently it supports both full BF16 and BF16 AMP training w/ eager and compiled mode. Verified workloads on Intel GPU achieve memory saving and interleaving as expected as the description in reference PR:ao#584


pytorch-bot bot commented Nov 22, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1324

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 03ac00f with merge base 478d15b:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 22, 2024
@dbyoung18 dbyoung18 marked this pull request as draft November 22, 2024 06:02
@dbyoung18 dbyoung18 changed the title Enable CPU Offload for Intel GPU [WIP] Enable CPU Offload for Intel GPU Nov 22, 2024
@dbyoung18 dbyoung18 changed the title [WIP] Enable CPU Offload for Intel GPU Enable CPU Offload for Intel GPU Nov 22, 2024
@dbyoung18 dbyoung18 marked this pull request as ready for review November 22, 2024 07:07
Collaborator

@gau-nernst gau-nernst left a comment


Thanks for the feature addition! Hopefully once the device-agnostic API support arrives, we can eliminate the if-else checks 😆
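To illustrate the kind of per-backend branching being referred to (the function and names below are illustrative, not the exact torchao code), and what a device-agnostic accelerator API would collapse it to:

```python
import torch

def current_stream(device_type: str):
    # Explicit per-backend branching: the pattern this PR extends to cover XPU.
    if device_type == "cuda":
        return torch.cuda.current_stream()
    elif device_type == "xpu":
        return torch.xpu.current_stream()
    raise ValueError(f"unsupported device: {device_type}")

# With a device-agnostic API, the same lookup reduces to roughly
#   torch.get_device_module(device_type).current_stream()
# on recent PyTorch versions.
```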

Review threads:
  • benchmarks/benchmark_low_bit_adam.py (3 threads)
  • torchao/prototype/low_bit_optim/cpu_offload.py
  • test/prototype/test_low_bit_optim.py
@gau-nernst gau-nernst added the topic: improvement Use this tag if this PR is an improvement (doesn't fit into any of the other categories) label Nov 23, 2024
@gau-nernst
Collaborator

gau-nernst commented Nov 24, 2024

@dbyoung18 Can you run ruff format and push the formatted code? CUDA nightly is failing because bitsandbytes calls triton.ops (I think later versions of triton don't have triton.ops anymore, see bitsandbytes-foundation/bitsandbytes#1413). It's not related to this PR, but I'm not sure we can merge until that is fixed 😢. I think other PRs will be affected too.

Otherwise, everything else looks good already!

@dbyoung18
Contributor Author


Done with ruff format. Hope the bnb issue can be resolved soon. Thanks again for your review and quick feedback. :)

@gau-nernst
Collaborator

@dbyoung18 Can you merge from main? #1343 should fix the bnb issue.

Also, can you update the doc here? https://github.com/pytorch/ao/tree/main/torchao/prototype/low_bit_optim#optimizer-cpu-offload

After that we are good to merge 😃

@dbyoung18
Contributor Author


Done for both. We plan to gradually support torchao and PyTorch core on Intel GPU. This PR covers CPU offload only, and I will look into the remaining low-bit optimizers as a next step. Since we are also in the process of upstreaming the FlashAttention backend to PyTorch core (targeting v2.6 or v2.7), I would like to add benchmark data to the README once that is ready. For now, I have only modified the README so that the CPU offload section covers XPU. Thanks for the review, and I look forward to making further contributions soon. 😃

@gau-nernst
Collaborator

Sounds good! The low-bit optimizers rely entirely on the tensor subclass + torch.compile() stack, so as long as there is a triton build that supports the XPU backend, they should work out of the box!
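For context, a minimal sketch of what that out-of-the-box usage would look like, following the low_bit_optim README; the "xpu" placement is the assumption here, contingent on a triton build with XPU support as noted above:

```python
# Sketch: AdamW8bit is one of the 8-bit optimizers from torchao/prototype/low_bit_optim.
# Its quantized state is held in tensor subclasses and the step is compiled with
# torch.compile, so it only needs a working triton backend for the target device.
import torch
from torchao.prototype.low_bit_optim import AdamW8bit

model = torch.nn.Linear(1024, 1024).to("xpu")
optim = AdamW8bit(model.parameters(), lr=1e-3)

loss = model(torch.randn(4, 1024, device="xpu")).sum()
loss.backward()
optim.step()
optim.zero_grad()
```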

@msaroufim msaroufim self-requested a review November 26, 2024 03:21
@msaroufim msaroufim merged commit 6ff3904 into pytorch:main Nov 26, 2024
18 checks passed
yanbing-j pushed a commit to yanbing-j/ao that referenced this pull request Dec 9, 2024