Add Compressedbackend for Onebit optimizers #5473
Conversation
csrc/xpu/packbits/packing.cpp
at::Tensor packbits(at::Tensor tensor, int input_size, int rank)
{
    /*
@Liangliang-Ma the function documentation needs to be moved to line 39 right before the function def line.
Updated.
@tjruwase this PR is an approach to abstract the generic part of 1bit-adam and implement the accelerator-dependent part with a DeepSpeed custom op builder, so 1bit-adam does not need to depend on accelerator-specific libraries. @inkcherry I remember you investigated 1bit-adam portability before; FYI, this PR implements a portable version of 1bit-adam support.
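For readers following along, here is a rough sketch of how the accelerator-dependent part might be picked up at runtime. The builder name (PackbitsBuilder) and the packbits signature come from this PR; the loading path via get_accelerator().create_op_builder() is an assumption based on DeepSpeed's usual op-builder pattern, not the final API.

```python
import torch
from deepspeed.accelerator import get_accelerator

# Assumption: the accelerator abstraction resolves the builder by name; on XPU
# this would compile the SYCL kernels from csrc/xpu/packbits/packing.cpp.
# Other accelerators can register their own packbits/unpackbits builder
# without touching the optimizer code.
packbits_module = get_accelerator().create_op_builder("PackbitsBuilder").load()

rank = 0
grad_signs = torch.randn(1024, device=get_accelerator().device_name())
packed = packbits_module.packbits(grad_signs, grad_signs.numel(), rank)
```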
Hi @tjruwase, could you please help review this PR? Thanks!
add README.md for onebit tests
@tjruwase I have noticed that in the onebit unit tests, the onebit comm backend is assigned like this:
@tjruwase Hi, may I ask if you could help review my last comment, or merge this one first? Thanks
@Liangliang-Ma, apologies for the delay. I am still thinking about your last comment, but will not delay this PR.
In the process of adding onebit optimizer support for XPU devices, we noticed that across accelerators the main difference in the implementation of `compressed_allreduce` lies in `packbits` and `unpackbits`: CUDA uses cupy and NPU uses torch_npu. Instead of replacing these with XPU-only functions, we provide a CompressedBackend that does the `compressed_allreduce` work and lets users plug in their own packbits/unpackbits kernels, giving a general path for all kinds of accelerators. In this PR, we:
1. Add CompressedBackend for onebitAdam, onebitLamb and zerooneAdam
2. Add an XPU implementation of packbits/unpackbits with SYCL, built via PackbitsBuilder
3. Add tests for onebit with CompressedBackend
--------- Co-authored-by: Olatunji Ruwase <[email protected]>
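For readers unfamiliar with the compression step, the following is a minimal host-side sketch (plain PyTorch, illustration only, not the PR's SYCL kernels) of what a packbits/unpackbits pair does for 1-bit compression: each gradient element is reduced to its sign, and eight signs are packed into one byte before communication.

```python
import torch

# Bit weights for packing eight boolean sign flags into one uint8 byte.
_BIT_WEIGHTS = torch.tensor([128, 64, 32, 16, 8, 4, 2, 1], dtype=torch.uint8)

def packbits(tensor: torch.Tensor) -> torch.Tensor:
    """Pack the sign bits of `tensor` into a uint8 tensor (8 signs per byte)."""
    bits = (tensor >= 0).to(torch.uint8).flatten()
    pad = (-bits.numel()) % 8          # pad so every byte is fully populated
    if pad:
        bits = torch.cat([bits, bits.new_zeros(pad)])
    return (bits.view(-1, 8) * _BIT_WEIGHTS).sum(dim=1, dtype=torch.uint8)

def unpackbits(packed: torch.Tensor, numel: int) -> torch.Tensor:
    """Recover +/-1.0 sign values from a packed uint8 tensor."""
    bits = (packed.unsqueeze(1) & _BIT_WEIGHTS).ne(0).flatten()[:numel]
    return bits.to(torch.float32) * 2 - 1

grads = torch.randn(1000)
packed = packbits(grads)                   # 125 bytes instead of 4000
signs = unpackbits(packed, grads.numel())  # +/-1.0 per element
assert torch.equal(signs, torch.where(grads >= 0,
                                      torch.ones_like(grads),
                                      -torch.ones_like(grads)))
```

The CompressedBackend keeps this packing/unpacking as the only accelerator-specific piece, which is why a new device only needs to supply these two kernels.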
This one is a documentation supplement for #5473. --------- Co-authored-by: Logan Adams <[email protected]>