[Auto Parallel] Improve the APIs #45776

aoyulong · 2022-09-06T01:47:32Z

PR types

Others

PR changes

APIs

Describe

This PR systematically improves the auto parallel APIs and updates the related code and unit tests accordingly. The APIs work in both dynamic and static graph modes, and the difference is transparent to users. The main contributions:

The Engine class (in Paddle/python/paddle/distributed/auto_parallel/engine.py) provides the high-level APIs for distributed training, evaluation, and prediction. For example:

import paddle
import paddle.vision.transforms as T
import paddle.distributed.auto_parallel as auto
from paddle.vision.datasets import MNIST

transform = T.Compose([
    T.Transpose(),
    T.Normalize([127.5], [127.5])
])
train_dataset = MNIST(mode='train', transform=transform)
valid_dataset = MNIST(mode='test', transform=transform)

model = paddle.vision.models.LeNet()
loss = paddle.nn.CrossEntropyLoss() 
optimizer = paddle.optimizer.Adam(
    learning_rate=0.001, parameters=model.parameters())
metrics = paddle.metric.Accuracy(topk=(1, 2))

engine = auto.Engine(model, loss, optimizer, metrics) 
# fit 
engine.fit(train_dataset,
           epochs=2,
           batch_size=64)
# evaluate 
engine.evaluate(valid_dataset,
                batch_size=64)
# predict
engine.predict(valid_dataset,
               batch_size=64)
# save
engine.save("./my_model")
# load 
engine.load("./my_model")

shard_tensor and shard_op (in Paddle/python/paddle/distributed/auto_parallel/interface.py) provide the mid-level APIs that let users shard tensors or operators as they choose. For example:

import paddle
import paddle.distributed.auto_parallel as auto 

mesh = auto.ProcessMesh([[0, 1], [2, 3]], dim_names=["x", "y"])
a = paddle.ones([4, 6])
b = paddle.zeros([4, 6])

# shard_tensor
auto.shard_tensor(a, mesh, shard_spec=["x", "y"])
# shard_op, functional style
auto.shard_op(paddle.add, mesh,
              in_shard_specs=[["x", "y"], ["y", None]],
              out_shard_specs=[[None, "x"]])(a, b)
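To make the shard_spec notation above concrete, here is a minimal pure-Python sketch of how a shard spec could translate into a per-process tensor shape. The helper `local_shard_shape` is hypothetical and not part of Paddle. It assumes that each entry of a shard spec names the mesh dimension along which the matching tensor dimension is split, that `None` means the dimension is replicated, and that sizes divide evenly.

```python
def local_shard_shape(global_shape, mesh_shape, dim_names, shard_spec):
    """Return the per-process shape of a tensor sharded by `shard_spec`.

    Illustrative helper only (not part of Paddle): assumes each tensor
    dimension divides evenly by the mesh dimension it is mapped to.
    """
    if len(shard_spec) != len(global_shape):
        raise ValueError("shard_spec needs one entry per tensor dimension")
    mesh_size = dict(zip(dim_names, mesh_shape))
    local = []
    for size, name in zip(global_shape, shard_spec):
        if name is None:
            local.append(size)  # replicated along this tensor dimension
            continue
        parts = mesh_size[name]
        if size % parts != 0:
            raise ValueError(f"size {size} is not divisible by {parts}")
        local.append(size // parts)
    return local


# Mesh [[0, 1], [2, 3]] has shape [2, 2] with dim_names ["x", "y"].
print(local_shard_shape([4, 6], [2, 2], ["x", "y"], ["x", "y"]))   # [2, 3]
print(local_shard_shape([4, 6], [2, 2], ["x", "y"], ["y", None]))  # [2, 6]
print(local_shard_shape([4, 6], [2, 2], ["x", "y"], [None, "x"]))  # [4, 3]
```

Under these assumptions, the three calls correspond to the shard specs used for `a`, the second input of `paddle.add`, and its output in the example above.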

Strategy (in Paddle/python/paddle/distributed/auto_parallel/strategy.py) is used to configure the parallelization and optimization behaviors.
ProcessMesh (in Paddle/python/paddle/distributed/auto_parallel/process_mesh.py) is used to describe the topology of the processes used in the distributed computation.
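
As a rough illustration of what "topology" means for a process mesh, the sketch below maps each process id in a nested-list mesh to its named coordinates. The helper `process_coords` is hypothetical and written in plain Python. It does not use or reproduce the ProcessMesh implementation.

```python
def process_coords(mesh, dim_names):
    """Map each process id in a nested-list mesh to its named coordinates.

    Illustrative helper only (not part of Paddle).
    """
    coords = {}

    def walk(node, index):
        if isinstance(node, list):
            for i, child in enumerate(node):
                walk(child, index + (i,))
        else:
            if len(index) != len(dim_names):
                raise ValueError("dim_names must match the mesh rank")
            coords[node] = dict(zip(dim_names, index))

    walk(mesh, ())
    return coords


coords = process_coords([[0, 1], [2, 3]], ["x", "y"])
print(coords[2])  # {'x': 1, 'y': 0}

# Processes that share an "x" coordinate form one group along the "y" dimension.
row0 = sorted(p for p, c in coords.items() if c["x"] == 0)
print(row0)  # [0, 1]
```

In this picture, sharding a tensor dimension along "x" would place different slices on processes with different "x" coordinates, while processes that differ only in "y" would hold the same slice of that dimension.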

paddle-bot · 2022-09-06T01:47:35Z

Your PR has been submitted. Thanks for your contribution to the open-source project!
Please wait for the CI results first. See the Paddle CI Manual for details.

qingqing01 · 2022-09-14T11:21:46Z

python/paddle/distributed/auto_parallel/engine.py

+            valid_data=None,
+            valid_freq=1,
+            valid_batch_size=1,
+            valid_sample_split=None,


The evaluate interface below uses eval_data; it would be best to keep the naming here consistent with it.

JiabinYang

LGTM

XieYunshen

LGTM
Unit test time threshold change
Unit test migration

* [Auto Parallel] Use c++ dist attr in the completion process
* [Auto Parallel] Add minor changes
* [Auto Parallel] Use c++ dist attr in the completion process
* [Auto Parallel] Add minor changes
* [Auto Parallel] Add the serialization process for dist attrs
* [Auto Parallel] Remove unnecessary comments
* [Auto Parallel] Fix some bugs
* [Auto Parallel] Fix the code style
* [Auto Parallel] Remove unnecessary impls
* [Auto Parallel] Fix the importing error
* [Auto Parallel] Fix the copy from bugs of op dist attr
* [Auto Parallel] Replace the use of constexpr if
* [Auto Parallel] Redesign the shard_tensor, shard_op and ProcessMesh
* [Auto Parallel] Change API of the completion unittest
* [Auto Parallel] Fix the bug when set_attr an int
* [Auto Parallel] Add the unittest for the serialization
* [Auto Parallel] Add some unit tests
* [Auto Paralle] Unify the strategy
* [Auto Parallel] Improve the engine api
* [Auto Parallel] Reset the changes made to the framework
* [Auto Parallel] Change the engine unittest
* [Auto Parallel] Update API of the completion and partitioner
* [Auto Parallel] Update unit tests using engine api
* update shard annotation
* [Auto Parallel] Remove the modifications of other modules
* [Auto Parallel] Add docs for APIs
* add new strategy
* [Auto Parallel] Replace the logger
* [Auto Parallel] Restore the test_program.py
* [Auto Parallel] Change the import rules
* [Auto Parallel] Add the examples for Engine
* [Auto Parallel] Do some minor changes
* [Auto Parallel] Remove yaml dependency
* [Auto Parallel] Fix the unittests
* add valid after train
* bug fix

Co-authored-by: zhaoyingli <[email protected]>
Co-authored-by: caozhou <[email protected]>
Co-authored-by: caozhou <[email protected]>

* [AutoParallel] adapt gradient merge pass (#45915)
* adapt gradient merge
* fix op_role
* fix strategy
* [Auto Parallel] Gradient Fuse Allreduce (#45643)
* bugfix (#45332)
* dist embedding support lookup table v1
* add unitest
* customize wait_comm
* group gradients
* bugfix
* update program
* [Auto Parallel] Improve the APIs (#45776)
* [Auto Parallel] Use c++ dist attr in the completion process
* [Auto Parallel] Add minor changes
* [Auto Parallel] Use c++ dist attr in the completion process
* [Auto Parallel] Add minor changes
* [Auto Parallel] Add the serialization process for dist attrs
* [Auto Parallel] Remove unnecessary comments
* [Auto Parallel] Fix some bugs
* [Auto Parallel] Fix the code style
* [Auto Parallel] Remove unnecessary impls
* [Auto Parallel] Fix the importing error
* [Auto Parallel] Fix the copy from bugs of op dist attr
* [Auto Parallel] Replace the use of constexpr if
* [Auto Parallel] Redesign the shard_tensor, shard_op and ProcessMesh
* [Auto Parallel] Change API of the completion unittest
* [Auto Parallel] Fix the bug when set_attr an int
* [Auto Parallel] Add the unittest for the serialization
* [Auto Parallel] Add some unit tests
* [Auto Paralle] Unify the strategy
* [Auto Parallel] Improve the engine api
* [Auto Parallel] Reset the changes made to the framework
* [Auto Parallel] Change the engine unittest
* [Auto Parallel] Update API of the completion and partitioner
* [Auto Parallel] Update unit tests using engine api
* update shard annotation
* [Auto Parallel] Remove the modifications of other modules
* [Auto Parallel] Add docs for APIs
* add new strategy
* [Auto Parallel] Replace the logger
* [Auto Parallel] Restore the test_program.py
* [Auto Parallel] Change the import rules
* [Auto Parallel] Add the examples for Engine
* [Auto Parallel] Do some minor changes
* [Auto Parallel] Remove yaml dependency
* [Auto Parallel] Fix the unittests
* add valid after train
* bug fix

Co-authored-by: zhaoyingli <[email protected]>
Co-authored-by: caozhou <[email protected]>
Co-authored-by: caozhou <[email protected]>

* [Auto Parallel] Bugfix allreduce fuse for MP (#46086)
* bugfix
* bugfix
* typos fixed
* update strategy (#46138)

Co-authored-by: zhaoyingli <[email protected]>
Co-authored-by: JZ-LIANG <[email protected]>
Co-authored-by: zhaoyingli <[email protected]>
Co-authored-by: caozhou <[email protected]>
Co-authored-by: caozhou <[email protected]>

aoyulong added 25 commits August 24, 2022 13:18

[Auto Parallel] Use c++ dist attr in the completion process

45328ab

Merge branch 'develop' of github.com:PaddlePaddle/Paddle into new_dist_attr

5bab411

[Auto Parallel] Add minor changes

247d61e

Merge branch 'develop' of github.com:PaddlePaddle/Paddle into new_dist_attr

35cc6e3

[Auto Parallel] Use c++ dist attr in the completion process

26f9b9d

[Auto Parallel] Add minor changes

dafc2f3

[Auto Parallel] Add the serialization process for dist attrs

c37f036

[Auto Parallel] Remove unnecessary comments

76e193d

[Auto Parallel] Fix some bugs

e988924

Merge branch 'develop' into new_dist_attr

4b757a5

[Auto Parallel] Fix the code style

1897263

Merge branch 'new_dist_attr' into serialize_dist_attr

6215661

[Auto Parallel] Remove unnecessary impls

a243b06

[Auto Parallel] Fix the importing error

aeb113a

Merge branch 'new_dist_attr' into serialize_dist_attr

929fd58

[Auto Parallel] Fix the copy from bugs of op dist attr

8f10e71

Merge branch 'new_dist_attr' into serialize_dist_attr

748b1d2

[Auto Parallel] Replace the use of constexpr if

d5198e7

[Auto Parallel] Redesign the shard_tensor, shard_op and ProcessMesh

f77dc9a

[Auto Parallel] Change API of the completion unittest

020bf9e

[Auto Parallel] Fix the bug when set_attr an int

6660644

Merge branch 'develop' into serialize_dist_attr

58be315

Merge branch 'serialize_dist_attr' into new_api

66e17d7

[Auto Parallel] Add the unittest for the serialization

7f9ddc8

Merge branch 'serialize_dist_attr' into new_api

b971647

aoyulong added 2 commits September 6, 2022 01:49

[Auto Parallel] Add some unit tests

fb5d7f4

[Auto Paralle] Unify the strategy

1572348

aoyulong changed the title ~~[Auto Parallel] Improve the APIs~~ [WIP: Auto Parallel] Improve the APIs Sep 7, 2022

aoyulong changed the title ~~[WIP: Auto Parallel] Improve the APIs~~ [Auto Parallel] WIP: Improve the APIs Sep 7, 2022

zhaoyingli and others added 10 commits September 13, 2022 19:44

Merge branch 'new_api' of https://github.com/aoyulong/Paddle into AutoParallel/new_api

06174d1

add new strategy

a6bf0c2

[Auto Parallel] Replace the logger

4d92b53

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into new_api

6b1fed1

[Auto Parallel] Restore the test_program.py

23f6539

fix conflict

d148cb8

Merge pull request #6 from zhaoyinglia/AutoParallel/new_api

e94e263

[AutoParallel] Update to the new strategy impl

Merge branch 'develop' into new_api

31c4d71

[Auto Parallel] Change the import rules

a0fe2c3

[Auto Parallel] Add the examples for Engine

2b54978

aoyulong changed the title ~~[Auto Parallel] WIP: Improve the APIs~~ [Auto Parallel] Improve the APIs Sep 14, 2022

aoyulong added 4 commits September 14, 2022 06:03

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into new_api

eac9880

[Auto Parallel] Do some minor changes

3c5b79c

[Auto Parallel] Remove yaml dependency

fb435a0

[Auto Parallel] Fix the unittests

14bce35

qingqing01 reviewed Sep 14, 2022

zhaoyingli and others added 6 commits September 14, 2022 19:32

add valid after train

dd26467

Merge pull request #7 from zhaoyinglia/new_api_valid

28d6af8

[Auto Parallel] Add valid after training

bug fix

5b0603b

Merge pull request #8 from zhaoyinglia/new_api_fix

c0add9e

Bug fix

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into new_api

8e81432

Merge branch 'new_api' of https://github.com/aoyulong/Paddle into new_api

5093402

JiabinYang approved these changes Sep 15, 2022

XieYunshen approved these changes Sep 15, 2022

raindrops2sea approved these changes Sep 15, 2022

aoyulong merged commit b042a3b into PaddlePaddle:develop Sep 15, 2022

aoyulong mentioned this pull request Sep 17, 2022

[Cherry-pick][Auto Parallel] Improve the APIs #46164

Merged
